Douyin Content Tracker Skill

v1.0.0

This skill should be used when the user wants to scrape Douyin (TikTok China) creator content, download audio, and transcribe it with Whisper. It covers first-time setup, daily incremental tracking, cookie refresh, and debugging. All pipeline scripts are bundled in this skill directory and can be run directly without any installation beyond pip and MediaCrawler.

by @gpttang (yibo)·MIT-0

Short-video content platform

Use cases: downloading Douyin videos, analyzing Douyin data, Douyin content creation, TikTok data acquisition

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install douyin-content-tracker-skill

Mirror: npx clawhub@latest install douyin-content-tracker-skill --registry https://cn.longxiaskill.com

Localization notes

This skill performs Douyin-related operations and may require a corresponding platform account or API key.

Skill documentation

Douyin Content Tracker

Scrapes Douyin creator videos via MediaCrawler, downloads audio with ffmpeg, and transcribes speech with Whisper.

Finding the Skill Base Directory

All commands must run from this skill's directory. To locate it, run:

    python -c "import pathlib; print([p for p in pathlib.Path.home().rglob('douyin-content-tracker-skill/SKILL.md')])"

Or check common locations:

~/.claude/skills/douyin-content-tracker-skill/
The path shown when the skill was installed

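Set it as a variable for convenience. Use $HOME rather than ~, because the shell does not expand a tilde inside double quotes:

    SKILL_DIR="$HOME/.claude/skills/douyin-content-tracker-skill"  # adjust to actual path
    cd "$SKILL_DIR"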

First-Time Setup

Run these steps once on a new machine.

Install Python dependencies

    cd "$SKILL_DIR"
    pip install -r scripts/requirements.txt
    python -m playwright install chromium

Install MediaCrawler

    # Windows
    git clone https://github.com/NanmiCoder/MediaCrawler D:/MediaCrawler
    cd D:/MediaCrawler && pip install -r requirements.txt

    # macOS/Linux
    git clone https://github.com/NanmiCoder/MediaCrawler ~/MediaCrawler
    cd ~/MediaCrawler && pip install -r requirements.txt

Configure .env

    cd "$SKILL_DIR"
    cp .env.template .env

Edit .env. The required field is:

    MEDIACRAWLER_DIR=D:/MediaCrawler  # adjust to actual MediaCrawler path (use ~/MediaCrawler on macOS/Linux)

Optional overrides:

    # Where to store data/audio/subtitles/models (default: ~/DouyinContentTracker or %USERPROFILE%\DouyinContentTracker)
    OUTPUT_BASE_DIR=/Users/me/DouyinContentTracker

    # Whisper model size (default: medium)
    WHISPER_MODEL=small
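To make the required and optional settings concrete, here is a minimal sketch of how such a .env file could be read and the stated defaults applied. It is not the skill's actual loader: `parse_env_text` and `resolve_settings` are hypothetical helpers, and the sketch assumes the variable names MEDIACRAWLER_DIR, OUTPUT_BASE_DIR, and WHISPER_MODEL and a plain KEY=VALUE syntax with optional trailing comments.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "D:/MediaCrawler # adjust ..."
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def resolve_settings(env: dict[str, str]) -> dict[str, object]:
    """Apply the documented defaults; MEDIACRAWLER_DIR is the only required key."""
    if not env.get("MEDIACRAWLER_DIR"):
        raise ValueError("MEDIACRAWLER_DIR is required in .env")
    # Path.home() covers both ~ on macOS/Linux and %USERPROFILE% on Windows.
    base = env.get("OUTPUT_BASE_DIR") or str(Path.home() / "DouyinContentTracker")
    return {
        "mediacrawler_dir": Path(env["MEDIACRAWLER_DIR"]).expanduser(),
        "output_base_dir": Path(base).expanduser(),
        "whisper_model": env.get("WHISPER_MODEL") or "medium",
    }
```

Calling `resolve_settings(parse_env_text(Path(".env").read_text()))` would then yield the effective configuration, with a clear error if the required path is missing.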

Add target accounts

Edit accounts.txt (or set TRACKER_ACCOUNTS_FILE / pass --accounts-file when running). Each line holds a blogger name and a profile URL separated by a pipe:

    Blogger name | https://www.douyin.com/user/MS4wLjABAAAA...
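The account line format above can be illustrated with a small parser. This is a sketch under stated assumptions, not the skill's own code: `Account` and `parse_accounts` are hypothetical names, and skipping blank lines and lines starting with `#` is an assumed convenience that the source does not confirm.

```python
from typing import NamedTuple


class Account(NamedTuple):
    name: str
    url: str


def parse_accounts(text: str) -> list[Account]:
    """Parse 'name | url' lines into Account records."""
    accounts: list[Account] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        name, sep, url = line.partition("|")
        name, url = name.strip(), url.strip()
        if not sep or not name or not url.startswith("https://www.douyin.com/user/"):
            raise ValueError(f"malformed account line: {raw!r}")
        accounts.append(Account(name, url))
    return accounts
```

Rejecting malformed lines early keeps a typo in accounts.txt from surfacing later as an empty scrape.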

First login (generates cookie)

    cd "$SKILL_DIR"
    python scripts/scrape_profile.py

A browser opens; scan the Douyin QR code to log in. The cookie is saved to .douyin_cookies.json.

Daily Usage

    cd "$SKILL_DIR"

    # Track the latest 3 videos per account (default); main.py mirrors track_latest.py
    python scripts/track_latest.py
    # or
    python scripts/main.py

    # Track the latest N videos
    python scripts/track_latest.py --limit 5

    # Use a custom account list (also works via the TRACKER_ACCOUNTS_FILE env variable)
    python scripts/track_latest.py --accounts-file /path/to/accounts.txt

    # Skip audio download and transcription (data only)
    python scripts/track_latest.py --no-audio
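For readers extending the command-line interface, the flags shown above could be declared roughly as follows. This is a hypothetical reconstruction with argparse, not the script's real source; in particular, the precedence of flag over environment variable over the default accounts.txt is an assumption.

```python
import argparse
import os
from typing import Mapping


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(prog="track_latest.py")
    parser.add_argument("--limit", type=int, default=3,
                        help="number of latest videos to track per account")
    parser.add_argument("--accounts-file", default=None,
                        help="path to a custom account list")
    parser.add_argument("--no-audio", action="store_true",
                        help="skip audio download and transcription")
    return parser


def resolve_accounts_file(args: argparse.Namespace,
                          environ: Mapping[str, str] = os.environ) -> str:
    """Assumed precedence: --accounts-file, then TRACKER_ACCOUNTS_FILE, then accounts.txt."""
    return args.accounts_file or environ.get("TRACKER_ACCOUNTS_FILE") or "accounts.txt"
```

For example, `build_parser().parse_args(["--limit", "5", "--no-audio"])` yields a namespace with `limit=5` and `no_audio=True`.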

Cookie Refresh

When scraping returns 0 videos or warns "Cookie 已 N 天未更新" (the cookie has not been updated for N days):

    cd "$SKILL_DIR"
    python scripts/scrape_profile.py  # opens browser, scan QR
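A staleness check like the warning above can be approximated from the cookie file's modification time. The sketch below is an assumption about how such a check might work, not the skill's implementation: the 7-day threshold is an illustrative value, and the real script may store its own timestamp instead of relying on mtime.

```python
import time
from pathlib import Path
from typing import Optional


def cookie_age_days(cookie_path: Path, now: Optional[float] = None) -> Optional[float]:
    """Return the cookie file's age in days, or None if the file is missing."""
    if not cookie_path.exists():
        return None
    current = time.time() if now is None else now
    return (current - cookie_path.stat().st_mtime) / 86400


def needs_refresh(cookie_path: Path, max_age_days: float = 7.0,
                  now: Optional[float] = None) -> bool:
    """A missing cookie or one older than max_age_days (hypothetical threshold) needs a refresh."""
    age = cookie_age_days(cookie_path, now)
    return age is None or age > max_age_days
```

A wrapper could call `needs_refresh(Path(".douyin_cookies.json"))` before scraping and prompt for a QR login when it returns True.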

Pipeline Flow

    accounts.txt (or the list pointed to by --accounts-file / TRACKER_ACCOUNTS_FILE)
        ↓
    scripts/scrape_profile.py → MediaCrawler (CDP) → OUTPUT_BASE_DIR/data/*.csv
        ↓
    scripts/clean_data.py → normalized OUTPUT_BASE_DIR/data/cleaned_*.csv
        ↓
    scripts/download_video.py → Playwright + ffmpeg → OUTPUT_BASE_DIR/audio/{blogger}/*.m4a
        ↓
    scripts/extract_subtitle.py → Whisper → OUTPUT_BASE_DIR/subtitles/{blogger}/{video_id}.md
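The per-stage output paths in the flow above follow a fixed layout under the base directory. As an illustration, that layout can be captured in a small helper; `OutputLayout` is a hypothetical class introduced here for clarity and does not appear in the skill's scripts.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class OutputLayout:
    """Builds the documented output paths beneath OUTPUT_BASE_DIR."""
    base: Path

    def data_dir(self) -> Path:
        # Scraped and cleaned CSV files live here.
        return self.base / "data"

    def audio(self, blogger: str, video_id: str) -> Path:
        # One .m4a audio file per video, grouped by blogger.
        return self.base / "audio" / blogger / f"{video_id}.m4a"

    def subtitle(self, blogger: str, video_id: str) -> Path:
        # One Markdown transcript per video.
        return self.base / "subtitles" / blogger / f"{video_id}.md"

    def merged_subtitles(self, blogger: str) -> Path:
        # All transcripts for one blogger merged into a single file.
        return self.base / "subtitles" / f"{blogger}.md"
```

Keeping path construction in one place makes it easier to check whether a given video has already been downloaded or transcribed.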

Output Locations

All generated files live under OUTPUT_BASE_DIR (defaults to ~/DouyinContentTracker on macOS/Linux, %USERPROFILE%\DouyinContentTracker on Windows).

| Subdir | Contents |
| --- | --- |
| data/cleaned_*.csv | Scraped + normalized video metadata |
| audio/{blogger}/{video_id}.m4a | Extracted audio |
| subtitles/{blogger}/{video_id}.md | Whisper transcript (title as first line) |
| subtitles/{blogger}.md | All transcripts for one blogger merged |

Execution Logging Guide

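When running the pipeline, report progress to the user after each step completes. Do not wait until the entire pipeline finishes.

Step-by-step reporting template:

After each Bash tool call returns, immediately tell the user:

| Step | What to report |
| --- | --- |
| Scrape (scrape) | Blogger name and number of videos scraped; if it failed, state the reason |
| Clean (clean) | Number of valid records after cleaning |
| Audio download (download) | Audio files downloaded successfully / total, and number skipped |
| Speech recognition (whisper) | Number of subtitle files generated and the output path |
| Done | Summary: bloggers processed, videos, subtitles generated, and the output directory path |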

If a step fails, stop the pipeline, report the error output verbatim, and suggest the matching fix from references/troubleshooting.md before asking the user whether to continue.
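The report-after-each-step and stop-on-failure rules can be sketched as a small driver loop. This is an illustration only: it assumes each stage script (names as read from the pipeline flow) can be invoked on its own without arguments, which the source does not confirm, and `run_script` and `run_pipeline` are hypothetical helpers.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Stage names and script paths as read from the pipeline flow (assumed standalone).
STEPS: Sequence[tuple[str, str]] = (
    ("scrape", "scripts/scrape_profile.py"),
    ("clean", "scripts/clean_data.py"),
    ("download", "scripts/download_video.py"),
    ("whisper", "scripts/extract_subtitle.py"),
)


def run_script(script: str) -> tuple[int, str]:
    """Run one stage and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run([sys.executable, script], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def run_pipeline(report: Callable[[str], None],
                 runner: Callable[[str], tuple[int, str]] = run_script,
                 steps: Sequence[tuple[str, str]] = STEPS) -> bool:
    """Report after every step; on the first failure, relay the output verbatim and stop."""
    total = len(steps)
    for index, (name, script) in enumerate(steps, start=1):
        code, output = runner(script)
        if code != 0:
            report(f"[step {index}/{total} {name}] failed with exit code {code}")
            report(output)  # error output passed through unchanged
            report("See references/troubleshooting.md for the matching fix.")
            return False
        report(f"[step {index}/{total} {name}] done")
    return True
```

Passing `print` as `report` gives immediate per-step feedback; the injectable `runner` makes the control flow testable without running the real scripts.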

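Example output style:

    [Step 1/4 Scrape] Blogger "ExampleBlogger": scrape complete, 10 videos
    [Step 2/4 Clean] 10 valid records → data/cleaned_profile_xxx.csv
    [Step 3/4 Audio] Downloaded 8/10 (2 had no audio stream, skipped)
    [Step 4/4 Subtitles] Generated 8 subtitle files → subtitles/ExampleBlogger/
    [Done] 1 blogger · 10 videos · 8 subtitles, output directory: ~/DouyinContentTracker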
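A consistent message style is easier to keep if the lines are produced by small formatting helpers. The functions below are a hypothetical sketch of one way to do that; the English labels and exact wording are illustrative choices, not fixed by the skill.

```python
def format_step(index: int, total: int, label: str, message: str) -> str:
    """Render one progress line in the bracketed step style."""
    return f"[Step {index}/{total} {label}] {message}"


def _count(n: int, noun: str) -> str:
    """Return a count with a naively pluralized noun, e.g. '1 video' or '10 videos'."""
    return f"{n} {noun}" + ("" if n == 1 else "s")


def format_summary(bloggers: int, videos: int, subtitles: int, output_dir: str) -> str:
    """Render the final summary line with totals and the output directory."""
    parts = " · ".join(
        [_count(bloggers, "blogger"), _count(videos, "video"), _count(subtitles, "subtitle")]
    )
    return f"[Done] {parts}, output directory: {output_dir}"
```

For instance, `format_step(3, 4, "Audio", "Downloaded 8/10")` produces a line that can be sent to the user as soon as the download step returns.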

References

Load these files into context when debugging or extending the pipeline:

references/pipeline.md: per-script technical breakdown, data schemas, key function signatures
references/troubleshooting.md: fixes for cookie, MediaCrawler, ffmpeg, Whisper, and data errors
