audio-transcribe-summarize
v1.0.1
Transcribe audio/video files to text and generate structured summaries using the SenseAudio ASR API. Use when the user asks to transcribe, summarize, or take notes from audio files, video files, recordings, meetings, lectures, podcasts, or interviews.
Audio/Video Transcription & Summarization
Transcribe audio/video files using the SenseASR API (api.senseaudio.cn), then summarize the content into structured notes.
{baseDir} refers to this skill's directory.
Prerequisites
- Environment variable SENSEAUDIO_API_KEY configured (get your key at https://senseaudio.cn/platform/api-key)
- Python 3.8+ with requests installed
- For large files (>10MB): ffmpeg installed for splitting (macOS: brew install ffmpeg; Windows: download from ffmpeg.org and add it to PATH; Linux: apt install ffmpeg)
Quick Start
1. Run the transcription script:
    python {baseDir}/scripts/transcribe.py <file> [--model sense-asr-pro] [--language zh] [--speakers] [--sentiment] [--translate en]
2. The script outputs a transcript .txt file alongside the source file.
3. Read the transcript and generate a summary (see Summary Format below).
Workflow
Step 1: Assess the Audio File
Check file size and format:
- Supported formats: wav, mp3, ogg, flac, aac, m4a, mp4
- Max file size per request: 10MB
- If the file is larger than 10MB, the script auto-splits it using ffmpeg
Step 2: Choose the Right Model
| Model | Use When |
| --- | --- |
| sense-asr-lite | Quick batch transcription, simple audio, cost-sensitive |
| sense-asr | General transcription, need speaker separation or timestamps |
| sense-asr-pro | High accuracy needed: meetings, interviews, complex audio |
| sense-asr-deepthink | Noisy audio, dialects, heavy jargon, speech-to-clean-text |
Default to sense-asr-pro for best quality.
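The Step 1 checks can be sketched in a few lines of Python. This is only an illustration, not the logic of transcribe.py itself: the function names are hypothetical, "10MB" is assumed to mean 10 MiB, and the split command uses ffmpeg's standard segment muxer, which may differ from how the script actually splits files.

```python
from pathlib import Path

# Formats and size limit as listed in Step 1.
SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "flac", "aac", "m4a", "mp4"}
MAX_REQUEST_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB


def assess_audio(path):
    """Return (is_supported, needs_split) for a local audio/video file."""
    p = Path(path)
    extension = p.suffix.lower().lstrip(".")
    is_supported = extension in SUPPORTED_FORMATS
    needs_split = p.stat().st_size > MAX_REQUEST_BYTES
    return is_supported, needs_split


def ffmpeg_split_command(path, segment_seconds, out_dir="."):
    """Build (but do not run) an ffmpeg command that cuts a file into
    fixed-length chunks without re-encoding."""
    p = Path(path)
    pattern = str(Path(out_dir) / f"{p.stem}_%03d{p.suffix}")
    return [
        "ffmpeg", "-i", str(p),
        "-f", "segment", "-segment_time", str(segment_seconds),
        "-c", "copy",
        pattern,
    ]
```

Choosing segment_seconds is left to the caller: a chunk length that keeps every piece under the per-request limit depends on the file's bitrate.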
Step 3: Transcribe
Run the transcription script. Key options:
    # Basic transcription
    python {baseDir}/scripts/transcribe.py recording.mp3

    # Meeting with multiple speakers + emotion
    python {baseDir}/scripts/transcribe.py meeting.wav \
      --model sense-asr-pro \
      --speakers --max-speakers 4 \
      --sentiment \
      --timestamps segment

    # Transcribe and translate to English
    python {baseDir}/scripts/transcribe.py lecture.mp3 \
      --model sense-asr \
      --translate en
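When the script is driven from another Python program rather than a shell, the same invocations can be assembled as an argument list. The sketch below is a hypothetical wrapper (build_transcribe_command and run_transcription are not part of the skill). It uses only the flags shown in the examples above and assumes the audio file is passed as the first positional argument, as those examples suggest.

```python
import subprocess
import sys


def build_transcribe_command(script, audio, model="sense-asr-pro", language=None,
                             speakers=False, max_speakers=None, sentiment=False,
                             timestamps=None, translate=None):
    """Assemble the argument list for transcribe.py from keyword options."""
    cmd = [sys.executable, script, audio, "--model", model]
    if language:
        cmd += ["--language", language]
    if speakers:
        cmd.append("--speakers")
        # --max-speakers only makes sense together with --speakers.
        if max_speakers is not None:
            cmd += ["--max-speakers", str(max_speakers)]
    if sentiment:
        cmd.append("--sentiment")
    if timestamps:
        cmd += ["--timestamps", timestamps]
    if translate:
        cmd += ["--translate", translate]
    return cmd


def run_transcription(cmd):
    """Run the assembled command and raise CalledProcessError on failure."""
    return subprocess.run(cmd, check=True)
```

For instance, build_transcribe_command("scripts/transcribe.py", "meeting.wav", speakers=True, max_speakers=4, sentiment=True, timestamps="segment") reproduces the meeting example. Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.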
Step 4: Summarize
After transcription, read the transcript file and produce a summary using the format below.
Summary Format
Generate summaries in this structure:
# [Title - inferred from content]
Source: filename.mp3
Duration: X min Y sec
Date: YYYY-MM-DD
Speakers: [if speaker diarization was used]
Key Points
- Point 1
- Point 2
- ...
Detailed Summary
[2-4 paragraph summary of the content organized by topic/chronology]
Action Items
- [ ] Action item 1 (assigned to Speaker X, if applicable)
- [ ] Action item 2
Notable Quotes
"Direct quote from transcript" — Speaker X, [timestamp if avAIlable]
Full Transcript
Click to expand full transcript
[Full transcript text here, with speaker labels and timestamps if available]
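If the summary is assembled programmatically, the structure above maps naturally onto a small rendering function. The following is a minimal sketch under stated assumptions: render_summary and its parameter names are hypothetical, optional sections are simply omitted when empty, and quote lines are expected to be pre-formatted by the caller.

```python
def render_summary(title, source, duration, date, key_points, detailed_summary,
                   action_items=(), quotes=(), transcript="", speakers=None):
    """Render the summary structure as plain text, one section after another."""
    lines = [
        f"# {title}",
        f"Source: {source}",
        f"Duration: {duration}",
        f"Date: {date}",
    ]
    if speakers:  # only present when speaker diarization was used
        lines.append("Speakers: " + ", ".join(speakers))
    lines.append("Key Points")
    lines += [f"- {point}" for point in key_points]
    lines += ["Detailed Summary", detailed_summary]
    if action_items:
        lines.append("Action Items")
        lines += [f"- [ ] {item}" for item in action_items]
    if quotes:
        lines.append("Notable Quotes")
        lines += list(quotes)  # already formatted with speaker and timestamp
    if transcript:
        lines += ["Full Transcript", transcript]
    return "\n".join(lines)
```

Leaving out empty sections keeps short recordings from producing headings with nothing under them.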
Adapt the template based on content type:
- Meeting: emphasize action items, decisions, speaker contributions
- Lecture/Talk: emphasize key concepts, learning points, structure
- Interview: emphasize Q&A pairs, key responses
- Podcast: emphasize topics discussed, interesting insights
API Reference
For full SenseASR API parameters and response formats, see api-reference.md.