Video Captions
v3
Generate professional captions and subtitles with multi-engine transcription, word-level timing, styling presets, and burn-in.
When to Use
User needs captions or subtitles for video content. The agent handles transcription, timing, formatting, styling, translation, and burn-in across all major formats and platforms.
Quick Reference

Topic                  File
Transcription engines  engines.md
Output formats         formats.md
Styling presets        styling.md
Platform requirements  platforms.md

Core Rules
- Engine Selection by Context
Default: Whisper local (turbo model). See engines.md for optional cloud alternatives.
- Format Selection by Platform
Ask for the user's target platform if not specified.
- Professional Timing Standards
Netflix-compliant (default):
  - Min duration: 5/6 second (0.833s)
  - Max duration: 7 seconds
  - Max chars/line: 42
  - Max lines: 2
  - Gap between subtitles: 2+ frames
Social media:
  - Shorter segments (2-4 words)
  - More frequent breaks
  - Centered or dynamic positioning
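The Netflix-compliant limits listed above lend themselves to an automated check. The following is a minimal Python sketch, not part of the skill itself: the `Cue` class, the `check_cue` function, and the 24 fps default used to turn the 2-frame gap into seconds are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Limits taken from the Netflix-compliant list above.
MIN_DURATION = 5 / 6        # seconds
MAX_DURATION = 7.0          # seconds
MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MIN_GAP_FRAMES = 2
EPS = 1e-9                  # tolerance for float comparisons


@dataclass
class Cue:
    """Hypothetical cue record: times in seconds, lines joined by newlines."""
    start: float
    end: float
    text: str


def check_cue(cue, next_start=None, fps=24.0):
    """Return a list of problems found in one cue (empty list means OK).

    fps is an assumed frame rate; pass the real one for the video.
    """
    problems = []
    duration = cue.end - cue.start
    if duration < MIN_DURATION - EPS:
        problems.append("too short")
    if duration > MAX_DURATION + EPS:
        problems.append("too long")
    lines = cue.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append("too many lines")
    if any(len(line) > MAX_CHARS_PER_LINE for line in lines):
        problems.append("line too long")
    if next_start is not None:
        gap = next_start - cue.end
        if gap < MIN_GAP_FRAMES / fps - EPS:
            problems.append("gap too small")
    return problems
```

Running `check_cue` over every cue, with `next_start` set to the start of the following cue, gives a quick pass/fail report before manual review.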
- Segmentation Rules
Break lines:
  - After punctuation marks
  - Before conjunctions (and, but, or)
  - Before prepositions
Never separate:
  - Article from noun
  - Adjective from noun
  - First name from last name
  - Verb from subject pronoun
  - Auxiliary from verb
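As a rough illustration of how these break rules could be applied, the sketch below splits one subtitle into two lines of at most 42 characters. It prefers a break after punctuation, then before a conjunction, then before a preposition, and refuses to break right after an article. The function name `split_two_lines` and the short word lists are hypothetical, and the other "never separate" rules (adjectives, names, pronouns, auxiliaries) would need part-of-speech tagging, which this sketch does not attempt.

```python
CONJUNCTIONS = {"and", "but", "or"}
PREPOSITIONS = {"in", "on", "at", "to", "for", "with", "from", "of", "by"}  # partial list
ARTICLES = {"a", "an", "the"}
PUNCTUATION = ".,;:!?"


def split_two_lines(text, max_chars=42):
    """Split text into at most two lines, each no longer than max_chars."""
    if len(text) <= max_chars:
        return [text]
    words = text.split()
    best = None
    for i in range(1, len(words)):
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        if len(first) > max_chars or len(second) > max_chars:
            continue
        prev_word = words[i - 1]
        next_word = words[i].lower().strip(PUNCTUATION)
        if prev_word.lower() in ARTICLES:
            continue  # never separate an article from its noun
        if prev_word[-1] in PUNCTUATION:
            priority = 3      # break after punctuation
        elif next_word in CONJUNCTIONS:
            priority = 2      # break before a conjunction
        elif next_word in PREPOSITIONS:
            priority = 1      # break before a preposition
        else:
            priority = 0
        balance = -abs(len(first) - len(second))  # prefer even line lengths
        if best is None or (priority, balance) > best[:2]:
            best = (priority, balance, [first, second])
    if best is None:
        raise ValueError("text does not fit in two lines")
    return best[2]
```

A `ValueError` here signals that the cue itself is too long and should be split into two cues rather than two lines.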
- Word-Level Timestamps
Use word timestamps for:
  - Karaoke-style highlighting
  - Precise sync verification
  - TikTok/Instagram animated captions
  - Quality-checking transcript accuracy
Enable with the --word_timestamps True option of the whisper CLI.
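Word-level timestamps can be regrouped into the short segments used for social-media captions. The sketch below assumes each word is a dict with `word`, `start`, and `end` keys (the shape Whisper's Python API is understood to return when word timestamps are enabled; treat that as an assumption and adapt the keys to your engine). The function name `chunk_words` is hypothetical.

```python
def _to_chunk(group):
    """Merge a list of word dicts into one caption chunk."""
    return {
        "start": group[0]["start"],
        "end": group[-1]["end"],
        "text": " ".join(w["word"].strip() for w in group),
    }


def chunk_words(words, max_words=3):
    """Group words into chunks of up to max_words, closing early at sentence ends."""
    chunks, group = [], []
    for w in words:
        group.append(w)
        ends_sentence = w["word"].strip().endswith((".", "!", "?"))
        if len(group) == max_words or ends_sentence:
            chunks.append(_to_chunk(group))
            group = []
    if group:  # flush any trailing words
        chunks.append(_to_chunk(group))
    return chunks
```

Each chunk keeps the start of its first word and the end of its last word, so the result can be written out directly as short cues.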
- Speaker Identification
For multi-speaker content:
  - Use diarization (pyannote local, or cloud APIs if configured)
  - Format: [Speaker 1] or [Name] if known
  - SDH format: JOHN: What do you think?
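The two label formats above can be produced by one small helper. This is an illustrative sketch only; `label_line` and its parameters are hypothetical names, and the speaker index is assumed to come from whatever diarization step is in use.

```python
def label_line(text, speaker_index, name=None, sdh=False):
    """Prefix a caption line with a speaker label.

    Standard format: "[Speaker 1] text" or "[Name] text" if the name is known.
    SDH format:      "NAME: text" in upper case.
    """
    label = name or f"Speaker {speaker_index}"
    if sdh:
        return f"{label.upper()}: {text}"
    return f"[{label}] {text}"
```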
- Quality Verification
Before delivering:
  - Check sync at start, middle, end
  - Verify character limits per line
  - Confirm speaker labels if multi-speaker
  - Test burn-in render quality

Workflow

Basic Transcription
# Auto-detect language, output SRT
whisper video.mp4 --model turbo --output_format srt

# Specify language
whisper video.mp4 --model turbo --language es --output_format srt

# Multiple formats
whisper video.mp4 --model turbo --output_format all

Word-Level Timestamps
# Using whisper-timestamped
whisper_timestamped video.mp4 --model large-v3 --output_format srt

# With VAD pre-processing (reduces hallucinations)
whisper_timestamped video.mp4 --vad silero --accurate

Styled Subtitles (ASS)
# Generate SRT first, then burn it in with a style override
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Arial,FontSize=24,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'" output.mp4

Burn-In for Social Media
# TikTok/Instagram style (centered, bold)
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Montserrat-Bold,FontSize=32,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=3,Shadow=0,Alignment=10,MarginV=50'" output.mp4

# Netflix style (bottom, clean)
ffmpeg -i video.mp4 -vf "subtitles=video.srt:force_style='FontName=Netflix Sans,FontSize=48,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2,Shadow=1,Alignment=2'" output.mp4
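
The burn-in commands above differ only in their style values, so the filter string can be assembled from a dictionary. The following is a sketch under that assumption; `subtitles_filter` and `burn_in_command` are hypothetical helper names, and the style keys are the ones already used in the commands above.

```python
def subtitles_filter(srt_path, style):
    """Build the ffmpeg -vf value: subtitles=<file>:force_style='k=v,...'."""
    force_style = ",".join(f"{key}={value}" for key, value in style.items())
    return f"subtitles={srt_path}:force_style='{force_style}'"


def burn_in_command(video, srt_path, style, output):
    """Return the ffmpeg argument list (suitable for subprocess.run)."""
    return ["ffmpeg", "-i", video, "-vf", subtitles_filter(srt_path, style), output]


# Example preset mirroring the Netflix-style command above.
NETFLIX_STYLE = {
    "FontName": "Netflix Sans",
    "FontSize": 48,
    "PrimaryColour": "&HFFFFFF",
    "OutlineColour": "&H000000",
    "Outline": 2,
    "Shadow": 1,
    "Alignment": 2,
}
```

Passing the list to `subprocess.run` avoids shell quoting problems, because the whole filter string travels as a single argument.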

Translation
# Transcribe + translate to English
whisper video.mp4 --model turbo --task translate --output_format srt

Format Conversion
# SRT to VTT
ffmpeg -i video.srt video.vtt

# SRT to ASS (for styling)
ffmpeg -i video.srt video.ass
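
When ffmpeg is not available, the SRT to VTT step is simple enough to do by hand: add a `WEBVTT` header and switch the millisecond separator in timing lines from a comma to a period. The sketch below shows that idea; `srt_to_vtt` is a hypothetical helper, and it handles only plain SRT without styling tags or positioning.

```python
import re

# Matches HH:MM:SS,mmm so the comma can be swapped for a period.
_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert plain SRT text to WebVTT text."""
    lines = ["WEBVTT", ""]
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are rewritten; commas in dialogue are untouched.
            line = _TIMING.sub(r"\1.\2", line)
        lines.append(line)
    return "\n".join(lines) + "\n"
```

The numeric cue counters from the SRT are kept, since WebVTT accepts them as cue identifiers.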

Caption Traps
  - Hallucinations on silence → Use VAD pre-processing or trim silent sections
  - Wrong language detection → Specify --language explicitly for mixed content
  - Timing drift in long videos → Use word timestamps + manual spot-check
  - Character limit violations → Set --max_line_width 42 for Netflix compliance (in the whisper CLI this also requires --word_timestamps True)
  - Missing speaker IDs → Enable diarization for multi-speaker content
  - Burn-in quality loss → Use high bitrate output (-b:v 8M)

Common Scenarios

YouTube Video
  - Transcribe: whisper video.mp4 --output_format vtt
  - Upload .vtt to YouTube Studio
  - Review auto-sync suggestions

TikTok/Instagram Reel
  - Transcribe with word timestamps
  - Apply bold animated style
  - Burn-in: ffmpeg -i video.mp4 -vf "subtitles=video.ass" -c:a copy output.mp4
  - Export at platform resolution

Netflix/Professional
  - Use Whisper large-v3 for best local accuracy
  - Export TTML format
  - Verify: 42 chars/line, 2 lines max, timing gaps
  - Include translator credit as last subtitle

Podcast/Interview
  - Enable speaker diarization
  - Format as dialogue: [SPEAKER]: text
  - SDH option: include [music], [laughter] descriptions

Foreign Film Translation
  - Transcribe in original language
  - Translate: --task translate for English
  - Or use external translation + timing sync

External Endpoints
Default: 100% LOCAL processing. No network calls.
Endpoint            Data Sent     When Used
Whisper (local)     None (local)  Default (always)
api.assemblyai.com  Audio file    Only if user sets ASSEMBLYAI_API_KEY
api.deepgram.com    Audio