YouTube Transcript Native Node
v1.0.2
Fetch a clean plain-text transcript from a YouTube video: native Node.js, zero npm dependencies, auditable in 5 minutes. Use when the user asks to transcribe, summarize, or extract captions from a YouTube URL. Wraps the `yt-dlp` binary (must be on PATH); writes subtitles to a temp dir, parses the .vtt, strips timestamps and HTML tags, prints clean text. No API keys required.
Version
1.0.2: Public-release hardening: added a 120-second yt-dlp timeout and a 2,000,000-character transcript output guard.
YouTube Transcript (Native Node)
Minimal, auditable YouTube transcript fetcher.
Native Node.js. Zero npm dependencies. Two files. Small enough to audit in a few minutes.
Wraps the external yt-dlp binary, which must be installed and on PATH.
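Because user input ends up on the yt-dlp command line, the URL and --lang checks listed under Security behavior carry most of the weight. The following is a minimal sketch of what such checks might look like, not the script's actual code: the function names are hypothetical, the host allowlist is copied from this document, and the language-code pattern is an assumption about what "simple" means.

```javascript
// Hypothetical helpers; the allowlist mirrors the hosts named in this document.
const ALLOWED_HOSTS = new Set([
  "youtube.com",
  "www.youtube.com",
  "m.youtube.com",
  "youtu.be",
]);

// Returns true only for http(s) URLs whose hostname is on the allowlist.
function isAllowedYouTubeUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
  return ALLOWED_HOSTS.has(parsed.hostname.toLowerCase());
}

// Assumed shape of a "simple" subtitle language code: 2-3 letters with an
// optional short region/script suffix, e.g. "en", "es", "pt-BR".
function isSimpleLangCode(lang) {
  return /^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})?$/.test(lang);
}
```

Matching the parsed hostname exactly, rather than searching the raw string for "youtube.com", is what rejects look-alike hosts such as youtube.com.example.org.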
Security behavior
Accepts only http(s) YouTube URLs on youtube.com, www.youtube.com, m.youtube.com, or youtu.be.
Validates --lang as a simple subtitle language code before invoking yt-dlp.
Spawns yt-dlp with an argv array and no shell; it does not execute user-provided commands.
Bounds the yt-dlp subprocess with a 120-second timeout.
Creates and removes a temporary subtitle directory under the OS temp path.
Refuses to print transcripts larger than 2,000,000 characters.
Reads no API keys, env secrets, credential files, or OpenClaw config.
Static-analysis note: child_process warnings are expected because this skill intentionally wraps the trusted yt-dlp binary. The operator owns the PATH/binary supply-chain trust boundary.
When to use
Trigger phrases: "transcribe this YouTube video", "get the transcript", "pull the captions", "summarize this video" (paired with a follow-up summarization step).
Use this when:
The user gives you a YouTube URL and wants the spoken text
You need clean plain text for downstream summarization, search, or quoting
The video has either creator-uploaded subtitles or auto-generated captions
Do NOT use this when:
The video has no subtitles in any language (this skill won't transcribe audio; it only extracts existing captions)
The user wants a different platform (Vimeo, TikTok, podcasts); yt-dlp may support some, but this skill is YouTube-targeted
The video is a live stream that hasn't ended yet
The content is privacy-sensitive (yt-dlp talks to youtube.com)
How to run
The script is in scripts/fetch.mjs.
Basic transcript (English, plain text):
node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID"
Different language:
node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --lang es
Keep timestamps in the output:
node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps
JSON output (title + transcript + metadata):
node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --json
(Where <skill-dir> is typically workspace/skills/youtube-transcript-native-node/.)
All flags
| Flag | Values | Default | Purpose |
| --- | --- | --- | --- |
| --url | YouTube URL | (required) | Video to fetch the transcript for |
| --lang | language code | en | Subtitle language (e.g. en, es, de) |
| --timestamps | flag | off | Keep [hh:mm:ss] prefixes in plain-text output |
| --json | flag | off | Output JSON: { url, title, lang, auto, timestamps, transcript } |
| --no-dedup | flag | off | Disable the rolling-window phrase dedup that runs on auto-captions. Use when the speaker deliberately repeats 3+ word phrases verbatim and you don't want those collapsed. |
| -h, --help | flag | - | Show help |
Credentials
None. This skill uses no API keys, no env vars, and reads no secrets.
It does require the yt-dlp binary to be installed and on PATH.
Install yt-dlp:
Windows: winget install yt-dlp
macOS: brew install yt-dlp
Cross-platform fallback: install from the official yt-dlp project instructions if package managers are unavailable
Verify with:
yt-dlp --version
Auto-caption rolling-window dedup
YouTube's auto-generated captions emit a 3-line scrolling window where each phrase appears across multiple overlapping cues. Concatenating the cues yields literal triplicate spam (e.g. "I'm about to show you I'm about to show you I'm about to show you...").
When auto: true and --timestamps is off, this skill runs a multi-pass dedup that collapses any consecutive identical 3- to 15-word phrase down to one copy. This typically reduces transcript size by 60-70% with no loss of information.
The dedup is intentionally conservative:
Only fires when auto: true (manual captions don't have the artifact)
Only fires when --timestamps is off (timestamps mode keeps cues separate so dedup would scramble alignment)
Only collapses consecutive repeats; non-adjacent repetition (a phrase used at minute 1 and again at minute 5) is preserved
Single-word repetition ("critical critical critical") is preserved (minimum match is 3 words)
If a speaker deliberately repeats a 3+ word phrase verbatim and you want it preserved, use --no-dedup to skip the pass entirely.
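The collapsing rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual implementation: the function names and loop structure are hypothetical, and the sketch works on a whitespace-split word stream, so line breaks are not preserved.

```javascript
// Collapse consecutive repeats of any 3- to 15-word phrase down to one copy.
// Hypothetical names; only the behavior is taken from the description above.
function collapseRepeatedPhrases(text, minWords = 3, maxWords = 15) {
  const words = text.split(/\s+/).filter(Boolean);
  let changed = true;
  while (changed) { // repeat passes until nothing collapses
    changed = false;
    for (let n = maxWords; n >= minWords; n--) {
      let i = 0;
      while (i + 2 * n <= words.length) {
        if (sameWindow(words, i, i + n, n)) {
          words.splice(i + n, n); // drop the adjacent duplicate, re-check same spot
          changed = true;
        } else {
          i++;
        }
      }
    }
  }
  return words.join(" ");
}

// True when the n words starting at a equal the n words starting at b.
function sameWindow(words, a, b, n) {
  for (let k = 0; k < n; k++) {
    if (words[a + k] !== words[b + k]) return false;
  }
  return true;
}
```

Comparing a window only against the window immediately after it is what keeps non-adjacent repetition intact, and the 3-word minimum is why a short run like "critical critical critical" passes through unchanged.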
Output format
Default (plain text): the cleaned transcript, one cue per line, timestamps and HTML tags stripped. Suitable for piping into a summarizer or saving to a file.
With --timestamps: each line is prefixed with [hh:mm:ss] so the user can locate moments in the video.
With --json: a single JSON object on stdout:
{
  "url": "https://www.youtube.com/watch?v=...",
  "title": "Video title from yt-dlp",
  "lang": "en",
  "auto": false,
  "timestamps": false,
  "transcript": "full cleaned transcript as a single string"
}
auto is true when only auto-generated captions were available.
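To make the plain-text and --timestamps formats concrete, here is a small sketch of turning WebVTT text into cleaned lines. It is not taken from scripts/fetch.mjs: the function name and options are hypothetical, it assumes cue timing lines of the form "hh:mm:ss.mmm --> hh:mm:ss.mmm" (hours optional), and it emits one output line per cue text line rather than merging multi-line cues.

```javascript
// Hypothetical helper: strips the header, timing lines, and inline tags
// from WebVTT text and returns the remaining caption lines.
function vttToLines(vtt, { timestamps = false } = {}) {
  const timing = /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.\d{3}\s+-->\s+/;
  const out = [];
  let stamp = null; // start time of the cue being read, or null outside a cue
  for (const raw of vtt.split(/\r?\n/)) {
    const line = raw.trim();
    const m = timing.exec(line);
    if (m) {
      stamp = `[${m[1] ?? "00"}:${m[2]}:${m[3]}]`;
      continue;
    }
    if (line === "") {
      stamp = null; // a blank line ends the current cue
      continue;
    }
    if (stamp === null) continue; // header, NOTE blocks, cue identifiers
    const text = line.replace(/<[^>]+>/g, "").trim(); // strip inline tags
    if (text === "") continue;
    out.push(timestamps ? `${stamp} ${text}` : text);
  }
  return out;
}
```

Joining the returned array with newlines gives the default output shape; passing { timestamps: true } adds the [hh:mm:ss] prefix described above.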
Agent usage pattern
When inv