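Video Subtitle Extractor
v3 uses ASR (speech-to-text) for cross-platform video subtitle extraction. It downloads audio from a video URL with yt-dlp, transcribes it with openai-whisper (small/medium/large-v3), and applies LLM-based text calibration for Chinese financial and technical content. Use cases: (1) extracting subtitles from Bilibili, YouTube, or any yt-dlp-supported platform; (2) the video has no built-in subtitles; (3) the user enters "下载字幕", "提取字幕", "语音转文字", "视频转文字", "字幕提取", or "ASR转写"; (4) an audio file needs to be transcribed to text; (5) processing Chinese video content that requires high-accuracy transcription. Dependency installation (ffmpeg, yt-dlp, openai-whisper) and model downloads are handled automatically. Supports GitHub, CLI, API, and more.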
Video Subtitle Extractor 🎬→📝
Cross-platform ASR subtitle extraction pipeline. Downloads audio from any yt-dlp-compatible video platform, transcribes it with openai-whisper, and applies LLM-based text calibration for Chinese content.
Tested and verified on Windows 11 with real Bilibili videos (medium model, ~95% accuracy for Chinese).
Quick Start
    # One-command full pipeline
    python scripts/run.py --model medium --language zh --output-dir ./output

    # Download audio only
    python scripts/download_audio.py <output_dir>

    # Transcribe existing audio
    python scripts/transcribe.py --model medium --language zh
When to Use This Skill
Use this skill when:
- The video has no built-in subtitles (Bilibili, YouTube, etc.)
- You need high-accuracy Chinese transcription (~95% with the medium model)
- You want multiple output formats (TXT, SRT, VTT, JSON)
- You need LLM-assisted text calibration for financial/technical terms
- The user says: "下载字幕", "提取字幕", "语音转文字", "视频转文字", "字幕提取", "ASR转写"
Workflow
Step 0: Install Dependencies (once)
    python scripts/install_deps.py
Auto-detects the OS and installs ffmpeg (winget/brew/apt), yt-dlp (pip), and openai-whisper (pip). Handles ffmpeg path detection on Windows even when ffmpeg is not in PATH.
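The OS detection and ffmpeg lookup described above could be sketched roughly as follows. This is a minimal illustration and not the contents of the install script: the function names, the command table, and the exact package names passed to winget/brew/apt are assumptions.

```python
import platform
import shutil

# Hypothetical mapping from platform.system() values to the package managers
# named in the text. The package names used here are assumptions.
FFMPEG_INSTALL_COMMANDS = {
    "Windows": ["winget", "install", "ffmpeg"],
    "Darwin": ["brew", "install", "ffmpeg"],
    "Linux": ["sudo", "apt", "install", "-y", "ffmpeg"],
}


def ffmpeg_install_command(system=None):
    """Return the install command for the given OS name, or None if unknown."""
    system = system or platform.system()
    return FFMPEG_INSTALL_COMMANDS.get(system)


def find_ffmpeg(extra_dirs=()):
    """Look for ffmpeg on PATH first, then in caller-supplied directories."""
    found = shutil.which("ffmpeg")
    if found:
        return found
    for directory in extra_dirs:
        # shutil.which accepts an explicit search path, which covers the
        # "installed but not in PATH" case mentioned for Windows.
        found = shutil.which("ffmpeg", path=str(directory))
        if found:
            return found
    return None
```

A caller would try `find_ffmpeg()` first and only fall back to `ffmpeg_install_command()` when it returns `None`.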
Step 1: Download Audio
Run scripts/download_audio.py [output_dir].
Uses yt-dlp to extract the best available audio format (m4a preferred). Supports Bilibili, YouTube, and 1800+ yt-dlp-compatible platforms. The script automatically detects ffmpeg even when it is not in the system PATH.
If the download fails, the video may require cookies. Try:
    yt-dlp --cookies-from-browser chrome
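For orientation, the audio download with an optional cookie fallback might be assembled like this. It is a sketch under assumptions: the function name, the output filename template, and the example URL are hypothetical, and the actual download script may build its command differently. The yt-dlp options used (`-f`, `-o`, `--cookies-from-browser`) are standard yt-dlp flags.

```python
from pathlib import Path


def build_audio_download_command(url, output_dir, cookies_browser=None):
    """Build a yt-dlp command that prefers an m4a audio-only stream."""
    # Hypothetical naming scheme: one file per video ID inside output_dir.
    output_template = str(Path(output_dir) / "%(id)s.%(ext)s")
    cmd = [
        "yt-dlp",
        "-f", "bestaudio[ext=m4a]/bestaudio",  # m4a preferred, else best audio
        "-o", output_template,
    ]
    if cookies_browser:
        # Reuse a logged-in browser session when the site requires cookies.
        cmd += ["--cookies-from-browser", cookies_browser]
    cmd.append(url)
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. If that raises, retrying with `cookies_browser="chrome"` mirrors the manual fallback shown above.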
Step 2: ASR Transcription
Run scripts/transcribe.py
Models are auto-downloaded on first use (disk space required):
| Model | RAM | Disk | Speed | Quality | Best For |
| --- | --- | --- | --- | --- | --- |
| small | ~2GB | 461MB | ~475 fps | ~90% | Quick tests |
| medium | ~5GB | 1.42GB | ~165 fps | ~95% | ✅ Recommended |
| large-v3 | ~10GB | 2.88GB | ~80 fps | ~97% | Best accuracy |
| large-v3-turbo | ~6GB | 1.6GB | ~120 fps | ~96% | Good balance |
⚠️ Windows note: With <16GB RAM, large-v3 may be killed (SIGKILL). Fall back to medium.
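The fallback rule can be made mechanical. The sketch below picks the highest-quality model whose approximate RAM figure from the model table fits into the memory the caller reports as free. The function name and the idea of passing free RAM in explicitly are assumptions, and measuring free RAM (for example with a third-party library) is left to the caller.

```python
# (model name, approximate RAM in GB, approximate quality in %), taken from
# the model table and ordered from highest to lowest quality.
MODEL_REQUIREMENTS = [
    ("large-v3", 10, 97),
    ("large-v3-turbo", 6, 96),
    ("medium", 5, 95),
    ("small", 2, 90),
]


def pick_model(free_ram_gb):
    """Return the best model that fits in free_ram_gb, or None if none fits."""
    for name, ram_gb, _quality in MODEL_REQUIREMENTS:
        if ram_gb <= free_ram_gb:
            return name
    return None
```

Because the RAM figures are approximate, leaving some headroom when computing `free_ram_gb` is a sensible precaution.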
Output formats: txt, srt, vtt, json (default: all).
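To illustrate what the srt output involves, here is a minimal sketch that turns transcription segments into SubRip text. It assumes segments shaped like the ones openai-whisper returns under the `segments` key of its result (dicts with `start`, `end`, and `text`). The function names are hypothetical and the project's own writer may differ.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render a list of {'start', 'end', 'text'} dicts as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Blank line between numbered cues, as the SRT format expects.
    return "\n".join(blocks)
```

VTT differs mainly in its `WEBVTT` header and in using a period instead of a comma before the milliseconds.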
See references/asr_models.md for the full model comparison.
Step 3: LLM Text Calibration
After transcription, read the .txt output and apply corrections. Key calibration categories:
- Homophone fixes (同音字): 硬钢→硬扛, 模→磨, 骨→股
- Company/product names: Deepseat→DeepSeek, 中繼續創→中际旭创, HPM→HBM
- Financial terms: 抛押→抛压, 护盘 (not 互盘), 筹码, K线收十字星 (not 14星)
- Common substitutions: 跟锋→跟风, 微转→微赚, 落带为安→落袋为安
- Traditional→Simplified: if the model outputs Traditional Chinese, convert it to Simplified
- Structural cleanup: add paragraph breaks at topic shifts and format the text as prose
See references/calibration_guide.md for the full 30+ pattern library.
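The unambiguous multi-character patterns can be applied with plain string replacement before (or alongside) the LLM pass. The sketch below is an illustration only: the dictionary layout, category keys, and function name are assumptions, and it covers just a few of the patterns listed above. Single-character homophones such as 模→磨 are deliberately left out, since they depend on context and are better handled by the LLM.

```python
from collections import Counter

# A small subset of the patterns listed above, grouped by category.
CALIBRATION_PATTERNS = {
    "homophone": {"硬钢": "硬扛"},
    "name": {"Deepseat": "DeepSeek", "中繼續創": "中际旭创"},
    "financial": {"抛押": "抛压", "互盘": "护盘"},
    "substitution": {"跟锋": "跟风", "微转": "微赚", "落带为安": "落袋为安"},
}


def apply_calibration(text, patterns=CALIBRATION_PATTERNS):
    """Apply literal replacements and count corrections per category."""
    counts = Counter()
    for category, mapping in patterns.items():
        for wrong, right in mapping.items():
            hits = text.count(wrong)
            if hits:
                text = text.replace(wrong, right)
                counts[category] += hits
    return text, dict(counts)
```

The returned counts feed directly into a per-category summary of corrections.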
Step 4: Deliver Results
Present the calibrated text. Always include:
- Model used (small/medium/large) and quality notes
- Any sections with low confidence or unclear audio
- Summary of corrections applied (counts by category)
Platform Support
| Platform | Status | Notes |
| --- | --- | --- |
| Bilibili | ✅ | Audio-only streams available without login. 720P+ video needs cookies. |
| YouTube | ✅ | Full support. Cookies may improve format selection. |
| Douyin/TikTok | ✅ | Via yt-dlp |
| All yt-dlp sites | ✅ | 1800+ supported platforms |
Extending with New ASR Models
scripts/transcribe.py is designed for backend extensibility:
1. Add model info to the MODEL_SIZES dict
2. Implement a transcribe_() function
3. Add a CLI flag in argparse
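The three steps above could look roughly like the following registry pattern. Everything here is hypothetical: the shape of `MODEL_SIZES`, the `BACKENDS` table, the `--backend` flag, the positional `audio` argument, and the stub backends are assumptions made for illustration, not the contents of scripts/transcribe.py.

```python
import argparse

# Step 1 (assumed shape): per-model metadata keyed by model name.
MODEL_SIZES = {
    "small": {"disk": "461MB"},
    "medium": {"disk": "1.42GB"},
}


# Step 2: one function per backend. These stubs only return a label so the
# dispatch logic can be exercised without any ASR library installed.
def transcribe_whisper(audio_path, model):
    return f"[whisper:{model}] {audio_path}"


def transcribe_echo(audio_path, model):
    return f"[echo:{model}] {audio_path}"


BACKENDS = {"whisper": transcribe_whisper, "echo": transcribe_echo}


# Step 3: expose the registered models and backends through argparse.
def build_parser():
    parser = argparse.ArgumentParser(description="Transcribe an audio file")
    parser.add_argument("audio")
    parser.add_argument("--model", choices=sorted(MODEL_SIZES), default="medium")
    parser.add_argument("--backend", choices=sorted(BACKENDS), default="whisper")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    return BACKENDS[args.backend](args.audio, args.model)
```

With this layout, adding a backend means writing one function and registering it in `BACKENDS`, and the CLI choices update automatically.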
Planned backends: faster-whisper (CTranslate2), whisper.cpp (native C++), Cloud APIs (AssemblyAI, iFlytek).
Troubleshooting
| Problem | Solution |
| --- | --- |
| SIGKILL during transcription | Model too large. Use --model medium or --model small. |
| yt-dlp download fails | Update yt-dlp: pip install -U yt-dlp. Try with cookies. |
| "No subtitles found" | Expected. This skill uses ASR, not built-in captions. |
| ffmpeg not found | Run install_deps.py (handles Windows non-PATH detection). |
| GPU not utilized | openai-whisper is CPU-only by default. Install faster-whisper for GPU. |
Performance Benchmarks (Tested)
| Video Duration | Model | Time | RAM Peak | Accuracy |
| --- | --- | --- | --- | --- |
| 6 min (Bilibili) | small | ~1m 17s | ~2.5GB | ~90% |
| 6 min (Bilibili) | medium | ~4m 30s | ~6GB | ~95% |
| 13 min (Bilibili) | medium | ~8m | ~6.5GB | ~95% |
Tested on Windows 11, Intel i7, 16GB RAM. Performance may vary by CPU speed.
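The benchmark rows can be turned into a rough real-time factor (audio seconds processed per wall-clock second) for estimating how long a longer video might take on similar hardware. This is back-of-the-envelope arithmetic on the approximate figures above, and the function names are hypothetical.

```python
# (model, audio duration in seconds, processing time in seconds), taken from
# the benchmark table. The source figures are approximate.
BENCHMARKS = [
    ("small", 6 * 60, 1 * 60 + 17),
    ("medium", 6 * 60, 4 * 60 + 30),
    ("medium", 13 * 60, 8 * 60),
]


def realtime_factor(audio_seconds, processing_seconds):
    """Audio seconds transcribed per second of wall-clock time."""
    return audio_seconds / processing_seconds


def estimate_processing_seconds(audio_seconds, model):
    """Conservative estimate using the slowest measured factor for the model."""
    factors = [realtime_factor(a, p) for m, a, p in BENCHMARKS if m == model]
    if not factors:
        raise ValueError(f"no benchmark for model {model!r}")
    return audio_seconds / min(factors)
```

On these numbers the small model runs at roughly 4.7x real time and the medium model at roughly 1.3x to 1.6x, so a one-hour video with the medium model would take on the order of 45 minutes on comparable hardware.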