Douyin Content Tracker Skill
v1.0.0
This skill should be used when the user wants to scrape content from Douyin (TikTok China) creators, download the audio, and transcribe it with Whisper. It covers first-time setup, daily incremental tracking, cookie refresh, and debugging. All pipeline scripts are bundled in this skill directory and can be run directly without any installation beyond pip and MediaCrawler.
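Douyin Content Tracker Skill installation notes
Install command: npx clawhub@latest install douyin-content-tracker-skill
This skill performs Douyin-related operations and may require a corresponding platform account or API key.
Skill Documentation
Douyin Content Tracker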
Scrapes Douyin creator videos via MediaCrawler, downloads audio with ffmpeg, and transcribes speech with Whisper.
Finding the Skill Base Directory
All commands must run from this skill's directory. To locate it, run:
python -c "import pathlib; print(list(pathlib.Path.home().rglob('douyin-content-tracker-skill/SKILL.md')))"
Or check common locations:
- ~/.claude/skills/douyin-content-tracker-skill/
- The path shown when the skill was installed
Set it as a variable for convenience:
SKILL_DIR="$HOME/.claude/skills/douyin-content-tracker-skill"  # adjust to actual path
cd "$SKILL_DIR"
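The lookup above can also be sketched as a small Python helper that checks the common location first and only then falls back to a recursive scan. This is a minimal sketch, not part of the bundled scripts: the function name `find_skill_dir` is hypothetical, and it assumes the skill file is named `SKILL.md`.

```python
from pathlib import Path
from typing import Optional

SKILL_NAME = "douyin-content-tracker-skill"


def find_skill_dir(home: Optional[Path] = None) -> Optional[Path]:
    """Return the first directory under `home` that holds SKILL.md, or None."""
    home = home or Path.home()
    # Check the common install location first; it is much cheaper than a scan.
    common = home / ".claude" / "skills" / SKILL_NAME
    if (common / "SKILL.md").is_file():
        return common
    # Fall back to a recursive search of the home directory (can be slow).
    for match in home.rglob(f"{SKILL_NAME}/SKILL.md"):
        return match.parent
    return None
```

Calling `find_skill_dir()` with no argument searches the current user's home directory; passing a path restricts the search to that tree.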
First-Time Setup
Run these steps once on a new machine.
- Install Python dependencies
- Install MediaCrawler
# macOS/Linux
git clone https://github.com/NanmiCoder/MediaCrawler ~/MediaCrawler
cd ~/MediaCrawler && pip install -r requirements.txt
- Configure .env
Edit .env. Required field:
MEDIACRAWLER_DIR=D:/MediaCrawler # adjust to actual MediaCrawler path (use ~/MediaCrawler on macOS/Linux)
Optional overrides:
# Where to store data/audio/subtitles/models (default: ~/DouyinContentTracker or %USERPROFILE%\DouyinContentTracker)
OUTPUT_BASE_DIR=/Users/me/DouyinContentTracker
# Whisper model size (default: medium)
WHISPER_MODEL=small
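To illustrate how the required field and the optional overrides combine, here is a minimal sketch that resolves the three settings with the documented defaults. It assumes the values from .env are already available as environment-style key/value pairs; how the bundled scripts actually load .env is not stated here, and `resolve_settings` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve pipeline settings from environment-style key/value pairs."""
    # MEDIACRAWLER_DIR is the one required field; fail early if it is missing.
    crawler_dir = env.get("MEDIACRAWLER_DIR")
    if not crawler_dir:
        raise KeyError("MEDIACRAWLER_DIR is required")
    # Optional overrides fall back to the documented defaults.
    default_base = Path.home() / "DouyinContentTracker"
    output_base = env.get("OUTPUT_BASE_DIR") or str(default_base)
    return {
        "mediacrawler_dir": Path(crawler_dir).expanduser(),
        "output_base_dir": Path(output_base).expanduser(),
        "whisper_model": env.get("WHISPER_MODEL") or "medium",
    }
```

`Path.home()` resolves to the user profile directory on Windows and to the home directory on macOS/Linux, so one default covers both platforms.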
- Add target accounts
Edit accounts.txt (or set TRACKER_ACCOUNTS_FILE / pass --accounts-file when running):
Blogger name | https://www.douyin.com/user/MS4wLjABAAAA...
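The `name | url` line format shown above could be parsed along these lines. This is a sketch only: `parse_accounts` is a hypothetical helper, and skipping blank lines and `#` comments is an assumption rather than documented behavior of the bundled scripts.

```python
from typing import List, Tuple


def parse_accounts(text: str) -> List[Tuple[str, str]]:
    """Parse 'name | url' lines into (name, url) pairs."""
    accounts = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        # Split on the first '|' only, so the URL is kept intact.
        name, sep, url = line.partition("|")
        if not sep or not name.strip() or not url.strip():
            raise ValueError(f"expected 'name | url', got: {raw!r}")
        accounts.append((name.strip(), url.strip()))
    return accounts
```

Rejecting malformed lines early keeps a typo in accounts.txt from surfacing later as a scrape that returns 0 videos.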
- First login (generates cookie)
A browser opens. Scan the Douyin QR code to log in. The cookie is saved to .douyin_cookies.json.
Daily Usage
cd "$SKILL_DIR"
# Track the latest 3 videos per account (default). main.py mirrors track_latest.py
python scripts/track_latest.py
# or
python scripts/main.py
# Track the latest N videos
python scripts/track_latest.py --limit 5
# Use a custom account list (also works via env TRACKER_ACCOUNTS_FILE)
python scripts/track_latest.py --accounts-file /path/to/accounts.txt
# Skip audio download and transcription (data only)
python scripts/track_latest.py --no-audio
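When the tracker is driven from another Python program, the commands above can be assembled programmatically. The sketch below builds an argument list using only the three flags documented in this section; `build_track_command` is a hypothetical helper and not part of the skill.

```python
import sys
from typing import List, Optional


def build_track_command(limit: Optional[int] = None,
                        accounts_file: Optional[str] = None,
                        no_audio: bool = False) -> List[str]:
    """Build an argv list for scripts/track_latest.py from the documented flags."""
    cmd = [sys.executable, "scripts/track_latest.py"]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if accounts_file:
        cmd += ["--accounts-file", accounts_file]
    if no_audio:
        cmd.append("--no-audio")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, cwd=skill_dir, check=True)`, where `skill_dir` is the skill base directory, so the relative script path resolves correctly.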
Cookie Refresh
When scraping returns 0 videos or warns "Cookie 已 N 天未更新" (cookie not updated for N days):
cd "$SKILL_DIR"
python scripts/scrape_profile.py # opens browser, scan QR
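A staleness check of this kind can be approximated from the cookie file's modification time. The sketch below is an assumption about how such a check might work, not the skill's actual logic: the helper names and the 7-day threshold are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def cookie_age_days(cookie_file: Path, now: Optional[float] = None) -> Optional[float]:
    """Return the cookie file's age in days, or None if the file does not exist."""
    if not cookie_file.is_file():
        return None
    now = time.time() if now is None else now
    return (now - cookie_file.stat().st_mtime) / 86400


def needs_refresh(cookie_file: Path, max_age_days: float = 7.0) -> bool:
    """True when the cookie is missing or older than max_age_days (hypothetical threshold)."""
    age = cookie_age_days(cookie_file)
    return age is None or age > max_age_days
```

For example, `needs_refresh(Path(".douyin_cookies.json"))` run from the skill directory indicates whether rerunning the login step is likely to be needed.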
Pipeline Flow
accounts.txt (or the list pointed to by --accounts-file / TRACKER_ACCOUNTS_FILE)
↓ scripts/scrape_profile.py → MediaCrawler (CDP) → OUTPUT_BASE_DIR/data/.csv
↓ scripts/clean_data.py → normalized OUTPUT_BASE_DIR/data/cleaned_.csv
↓ scripts/download_video.py → Playwright + ffmpeg → OUTPUT_BASE_DIR/audio/{blogger}/.m4a
↓ scripts/extract_subtitle.py → Whisper → OUTPUT_BASE_DIR/subtitles/{blogger}/{video_id}.md
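The per-video artifacts in this flow follow a predictable layout, which the sketch below maps out. The helper names are hypothetical, and treating a missing transcript file as "still to do" is only one plausible reading of incremental tracking, not a confirmed description of the scripts.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def artifact_paths(output_base: Path, blogger: str, video_id: str) -> Dict[str, Path]:
    """Map one video to its audio, transcript, and merged-transcript paths."""
    return {
        "audio": output_base / "audio" / blogger / f"{video_id}.m4a",
        "subtitle": output_base / "subtitles" / blogger / f"{video_id}.md",
        "merged": output_base / "subtitles" / f"{blogger}.md",
    }


def missing_transcripts(output_base: Path, blogger: str,
                        video_ids: Iterable[str]) -> List[str]:
    """Return the video IDs that have no transcript file yet (assumed criterion)."""
    return [
        vid for vid in video_ids
        if not artifact_paths(output_base, blogger, vid)["subtitle"].is_file()
    ]
```

A check like this is handy when debugging a partial run, since it shows at a glance which videos stopped before the Whisper step.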
Output Locations
All generated files live under OUTPUT_BASE_DIR (defaults to ~/DouyinContentTracker on macOS/Linux, %USERPROFILE%\DouyinContentTracker on Windows).
| Subdir | Contents |
| --- | --- |
| data/cleaned_.csv | Scraped + normalized video metadata |
| audio/{blogger}/{video_id}.m4a | Extracted audio |
| subtitles/{blogger}/{video_id}.md | Whisper transcript (title as first line) |
| subtitles/{blogger}.md | All transcripts for one blogger merged |
Execution Logging Guide
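When running the pipeline, report progress to the user after each step completes. Do not wait until the entire pipeline finishes.
Step-by-step reporting template:
After each Bash tool call returns, immediately tell the user:
| Step | What to report |
| --- | --- |
| Scrape | Blogger name and number of videos scraped; if the step failed, the reason |
| Clean | Number of valid rows after cleaning |
| Audio download | Number of audio files downloaded successfully / total, and number skipped |
| Speech recognition (Whisper) | Number of subtitle files generated, and the output path |
| Done | Summary: bloggers processed, videos, subtitles generated, and the output directory path |
If a step fails, stop the pipeline, report the error output verbatim, and suggest the matching fix from references/troubleshooting.md before asking the user whether to continue.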
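Example output style:
[Step 1/4 Scrape] Blogger "XX": scrape complete, 10 videos
[Step 2/4 Clean] 10 valid rows → data/cleaned_profile_xxx.csv
[Step 3/4 Audio] Downloaded 8/10 (2 had no audio stream, skipped)
[Step 4/4 Subtitles] Generated 8 subtitle files → subtitles/XX/
[Done] 1 blogger · 10 videos · 8 subtitles, output directory: ~/DouyinContentTracker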
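The reporting rules in this guide (report after every step, stop at the first failure, pass the error text through unchanged) can be sketched as a small runner. This is an illustration under assumptions: `run_with_progress` and the `Step` type are hypothetical, and each step is modeled as a callable that returns a one-line summary rather than as a real Bash tool call.

```python
from typing import Callable, Sequence, Tuple

# (label, function returning a one-line summary of what the step did)
Step = Tuple[str, Callable[[], str]]


def run_with_progress(steps: Sequence[Step],
                      report: Callable[[str], None] = print) -> bool:
    """Run steps in order, reporting after each one; stop at the first failure."""
    total = len(steps)
    for index, (label, action) in enumerate(steps, start=1):
        try:
            summary = action()
        except Exception as exc:
            # Report the error text as-is and stop; later steps are not run.
            report(f"[Step {index}/{total} {label}] FAILED: {exc}")
            return False
        report(f"[Step {index}/{total} {label}] {summary}")
    report("[Done]")
    return True
```

Returning a boolean leaves the decision about whether to continue with the caller, which matches the instruction to ask the user before proceeding after a failure.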
References
Load these files into context when debugging or extending the pipeline:
- references/pipeline.md: per-script technical breakdown, data schemas, key function signatures
- references/troubleshooting.md: fixes for cookie, MediaCrawler, ffmpeg, Whisper, and data errors