YouTube Transcript Native Node

v1.0.2

Fetch a clean plain-text transcript from a YouTube video: native Node.js, zero npm dependencies, auditable in 5 minutes. Use when the user asks to transcribe, summarize, or extract captions from a YouTube URL. Wraps the `yt-dlp` binary (must be on PATH); writes subtitles to a temp dir, parses the .vtt, strips timestamps and HTML tags, prints clean text. No API keys required.


by @jwestburg·MIT-0



License

MIT-0


Free to use, modify, and redistribute; no attribution required.


Runtime dependencies

No special dependencies

Version

latest: v1.0.2

1.0.2: Public-release hardening: adds a 120-second yt-dlp timeout and a 2,000,000-character transcript output guard.

Install command


Official: npx clawhub@latest install youtube-transcript-native-node

Mirror: npx clawhub@latest install youtube-transcript-native-node --registry https://cn.longxiaskill.com


Skill documentation

YouTube Transcript (Native Node)

Minimal, auditable YouTube transcript fetcher.

Native Node.js. Zero npm dependencies. Two files. Small enough to audit in a few minutes.

Wraps the external yt-dlp binary, which must be installed and on PATH.

Security behavior

- Accepts only http(s) YouTube URLs on youtube.com, www.youtube.com, m.youtube.com, or youtu.be.
- Validates --lang as a simple subtitle language code before invoking yt-dlp.
- Spawns yt-dlp with an argv array and no shell; it does not execute user-provided commands.
- Bounds the yt-dlp subprocess with a 120-second timeout.
- Creates and removes a temporary subtitle directory under the OS temp path.
- Refuses to print transcripts larger than 2,000,000 characters.
- Reads no API keys, env secrets, credential files, or OpenClaw config.

Static-analysis note: child_process warnings are expected because this skill intentionally wraps the trusted yt-dlp binary. The operator owns the PATH/binary supply-chain trust boundary.

When to use

Trigger phrases: "transcribe this YouTube video", "get the transcript", "pull the captions", "summarize this video" (paired with a follow-up summarization step).

Use this when:

- The user gives you a YouTube URL and wants the spoken text
- You need clean plain text for downstream summarization, search, or quoting
- The video has either creator-uploaded subtitles or auto-generated captions

Do NOT use this when:

- The video has no subtitles in any language (this skill won't transcribe audio; it only extracts existing captions)
- The user wants a different platform (Vimeo, TikTok, podcasts); yt-dlp may support some, but this skill is YouTube-targeted
- Live streams that haven't ended yet
- Privacy-sensitive content (yt-dlp talks to youtube.com)

How to run

The script is in scripts/fetch.mjs.

Basic transcript (English, plain text):

node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID"

Different language:

node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --lang es

Keep timestamps in the output:

node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps

JSON output (title + transcript + metadata):

node "<skill-dir>/scripts/fetch.mjs" --url "https://www.youtube.com/watch?v=VIDEO_ID" --json

(Where <skill-dir> is typically workspace/skills/youtube-transcript-native-node/.)
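The Security behavior section says that --url is restricted to http(s) URLs on four hosts and that --lang must look like a simple language code. A minimal sketch of what such checks could look like follows; the function names and the language-code pattern are assumptions for illustration, not the code in scripts/fetch.mjs.

```javascript
// Hosts listed in the Security behavior section.
const ALLOWED_HOSTS = new Set([
  "youtube.com",
  "www.youtube.com",
  "m.youtube.com",
  "youtu.be",
]);

// Hypothetical helper: true only for http(s) URLs on an allowed host.
function isAllowedYouTubeUrl(input) {
  let url;
  try {
    url = new URL(input); // throws on anything that is not a valid URL
  } catch {
    return false;
  }
  const isHttp = url.protocol === "http:" || url.protocol === "https:";
  return isHttp && ALLOWED_HOSTS.has(url.hostname);
}

// Hypothetical helper: accepts codes such as "en", "es", or "pt-BR".
// The exact pattern the script uses is not documented; this one is assumed.
function isSimpleLangCode(lang) {
  return /^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})?$/.test(lang);
}

console.log(isAllowedYouTubeUrl("https://youtu.be/VIDEO_ID"));
console.log(isSimpleLangCode("es"));
```

Matching on the parsed hostname rather than on a substring means a look-alike such as www.youtube.com.example.org is rejected.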

All flags

| Flag | Values | Default | Purpose |
| --- | --- | --- | --- |
| --url | YouTube URL | (required) | Video to fetch the transcript for |
| --lang | language code | en | Subtitle language (e.g. en, es, de) |
| --timestamps | flag | off | Keep [hh:mm:ss] prefixes in plain-text output |
| --json | flag | off | Output JSON: { url, title, lang, auto, timestamps, transcript } |
| --no-dedup | flag | off | Disable the rolling-window phrase dedup that runs on auto-captions. Use when the speaker deliberately repeats 3+ word phrases verbatim and you don't want those collapsed. |
| -h, --help | flag | n/a | Show help |

Credentials

None. This skill uses no API keys, no env vars, and reads no secrets.

It does require the yt-dlp binary to be installed and on PATH.

Install yt-dlp:

- Windows: winget install yt-dlp
- macOS: brew install yt-dlp
- Cross-platform fallback: install from the official yt-dlp project instructions if package managers are unavailable

Verify with:

yt-dlp --version
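The same check can be made from Node in the style the Security behavior section describes: an argv array, no shell, and a bounded run time. The sketch below is illustrative only; the helper name binaryVersion is hypothetical, and reusing the 120-second limit for a version check is an assumption.

```javascript
import { spawnSync } from "node:child_process";

// Hypothetical helper: returns the trimmed version string, or null if the
// binary is missing, exits non-zero, or exceeds the timeout.
function binaryVersion(bin = "yt-dlp", timeoutMs = 120_000) {
  const result = spawnSync(bin, ["--version"], {
    shell: false,       // arguments go to the process as an argv array, never through a shell
    timeout: timeoutMs, // the child is killed if it runs longer than this
    encoding: "utf8",
  });
  // result.error is set when the binary cannot be started or the timeout fires.
  if (result.error || result.status !== 0) return null;
  return result.stdout.trim();
}

const version = binaryVersion();
console.log(version ? `yt-dlp ${version}` : "yt-dlp not found on PATH");
```

Because no shell is involved, a value such as "; rm -rf /" in an argument would reach the child process as a literal string rather than being interpreted as a command.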

Auto-caption rolling-window dedup

YouTube's auto-generated captions emit a 3-line scrolling window where each phrase appears across multiple overlapping cues. Concatenating the cues yields literal triplicate spam (e.g. "I'm about to show you I'm about to show you I'm about to show you...").

When auto: true and --timestamps is off, this skill runs a multi-pass dedup that collapses any consecutive identical 3- to 15-word phrase down to one copy. This typically reduces transcript size by 60-70% with no loss of information.

The dedup is intentionally conservative:

- Only fires when auto: true (manual captions don't have the artifact)
- Only fires when --timestamps is off (timestamps mode keeps cues separate, so dedup would scramble alignment)
- Only collapses consecutive repeats; non-adjacent repetition (a phrase used at minute 1 and again at minute 5) is preserved
- Single-word repetition ("critical critical critical") is preserved (minimum match is 3 words)

If a speaker deliberately repeats a 3+ word phrase verbatim and you want it preserved, use --no-dedup to skip the pass entirely.
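To make the rules above concrete, here is a minimal sketch of a consecutive-phrase dedup over whitespace-separated words. It is one possible reading of the description, not the script's actual implementation; the function names, the longest-phrase-first order, and the repeat-until-stable loop are all assumptions.

```javascript
// Hypothetical helper: collapse back-to-back repeats of n-word phrases.
function collapseRepeats(words, n) {
  const out = words.slice();
  let i = 0;
  while (i + 2 * n <= out.length) {
    let same = true;
    for (let k = 0; k < n; k++) {
      if (out[i + k] !== out[i + n + k]) {
        same = false;
        break;
      }
    }
    if (same) {
      out.splice(i + n, n); // drop the second copy, then re-check the same position
    } else {
      i++;
    }
  }
  return out;
}

// Hypothetical entry point: try phrase lengths from 15 words down to 3,
// and repeat the whole sweep until a pass changes nothing.
function dedupTranscript(text, minWords = 3, maxWords = 15) {
  let words = text.split(/\s+/).filter(Boolean);
  let changed = true;
  while (changed) {
    changed = false;
    for (let n = maxWords; n >= minWords; n--) {
      const next = collapseRepeats(words, n);
      if (next.length !== words.length) {
        words = next;
        changed = true;
      }
    }
  }
  return words.join(" ");
}

console.log(
  dedupTranscript("I'm about to show you I'm about to show you I'm about to show you next")
);
// prints "I'm about to show you next"
```

Only adjacent copies are compared, so a phrase that recurs later in the transcript is left alone, and runs shorter than three words never match.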

Output format

Default (plain text): the cleaned transcript, one cue per line, timestamps and HTML tags stripped. Suitable for piping into a summarizer or saving to a file.

With --timestamps: each line is prefixed with [hh:mm:ss] so the user can locate moments in the video.
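As a rough illustration of the two plain-text modes, the sketch below turns WebVTT text into one line per cue, stripping tags and optionally keeping an [hh:mm:ss] prefix taken from the cue start time. The function name vttToLines and its options object are hypothetical, and the real parser in scripts/fetch.mjs may differ.

```javascript
// Hypothetical sketch: convert WebVTT text into transcript lines.
function vttToLines(vtt, { timestamps = false } = {}) {
  // Cue timing line: optional hours, then mm:ss.ttt, followed by "-->".
  const timing = /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.\d{3}\s+-->/;
  const out = [];
  let cue = null;

  // Emit the cue collected so far, if it has any text.
  const flush = () => {
    if (cue && cue.parts.length > 0) {
      const text = cue.parts.join(" ");
      out.push(timestamps ? `${cue.stamp} ${text}` : text);
    }
    cue = null;
  };

  for (const raw of vtt.split(/\r?\n/)) {
    const line = raw.trim();
    const m = timing.exec(line);
    if (m) {
      flush();
      cue = { stamp: `[${m[1] ?? "00"}:${m[2]}:${m[3]}]`, parts: [] };
    } else if (line === "") {
      flush(); // a blank line ends the current cue
    } else if (cue) {
      // Strip HTML-style tags, including inline timing tags.
      const text = line.replace(/<[^>]*>/g, "").trim();
      if (text) cue.parts.push(text);
    }
    // Lines outside a cue (header, cue identifiers) are ignored.
  }
  flush();
  return out;
}

const sample = [
  "WEBVTT",
  "",
  "00:00:01.000 --> 00:00:03.000",
  "<c>Hello</c> <b>world</b>",
  "",
  "01:02.500 --> 01:05.000",
  "second cue",
].join("\n");

console.log(vttToLines(sample).join("\n"));
console.log(vttToLines(sample, { timestamps: true }).join("\n"));
```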

With --json: a single JSON object on stdout:

{
  "url": "https://www.youtube.com/watch?v=...",
  "title": "Video title from yt-dlp",
  "lang": "en",
  "auto": false,
  "timestamps": false,
  "transcript": "full cleaned transcript as a single string"
}

auto is true when only auto-generated captions were available.
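A caller that captures this stdout might parse it and confirm the documented fields before passing the transcript on. The sketch below assumes the field types shown in the sample object; the helper name parseTranscriptJson is hypothetical, and whether title can ever be null is not stated in the documentation.

```javascript
// Hypothetical helper: parse --json output and check the documented fields.
function parseTranscriptJson(stdout) {
  const data = JSON.parse(stdout);
  // Field types are inferred from the sample object, not from a published schema.
  const expected = {
    url: "string",
    title: "string",
    lang: "string",
    auto: "boolean",
    timestamps: "boolean",
    transcript: "string",
  };
  for (const [key, type] of Object.entries(expected)) {
    if (typeof data[key] !== type) {
      throw new TypeError(`expected "${key}" to be a ${type}`);
    }
  }
  return data;
}

// Stand-in for captured stdout; the values are placeholders.
const exampleStdout = JSON.stringify({
  url: "https://www.youtube.com/watch?v=VIDEO_ID",
  title: "Video title from yt-dlp",
  lang: "en",
  auto: false,
  timestamps: false,
  transcript: "full cleaned transcript as a single string",
});

const result = parseTranscriptJson(exampleStdout);
console.log(result.auto ? "auto-generated captions" : "uploaded captions");
console.log(result.transcript.length, "characters");
```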

Agent usage pattern

When inv

Data source: ClawHub · Chinese localization: 龙虾技能库