🗣️ AI Avatar & Talking Head Video — Pro Pack on RunComfy
AI avatar video on RunComfy. This RunComfy avatar video skill creates talking-head and lip-sync videos via the `runcomfy` CLI. Routes across ByteDance OmniHuman (RunComfy's lip-sync feature pick — audio-driven full-body avatar from one portrait + audio file), Wan-AI Wan 2-7 (open-weights audio-driven lip-sync via `audio_url` on a portrait), HappyHorse 1.0 (Arena #1 t2v / i2v with in-pass audio from prompt — no audio file needed), Seedance v2 Pro (multi-modal cinematic with reference audio + reference subject), and community Wan 2-2 Animate (stylized character animation). The RunComfy avatar video skill picks the right model for intent — UGC voiceover, virtual presenter, dubbed product demo, lip-synced character, dialog scene — and ships each model's documented prompting patterns plus the minimal `runcomfy run` invoke. Triggers on "talking head", "lip sync", "avatar video", "make X speak", "audio to video", "audio driven avatar", "virtual presenter", "AI spokesperson", "dubbed video", "UGC avatar", "HeyGen alternative", "Synthesia alternative", "digital human", "make this portrait talk", "video from voiceover", or any explicit ask to put words in a face with RunComfy.
🗣️ AI Avatar & Talking Head Video — Pro Pack on RunComfy
AI avatar video on RunComfy. Put words in a face. This RunComfy avatar video skill routes across RunComfy's audio-driven avatar models — OmniHuman, Wan 2-7 with audio_url, HappyHorse, Seedance v2 — picking the right path for the user's intent and shipping the documented prompts + the exact runcomfy run invoke for each.
runcomfy.com · Lip-sync feature · CLI docs
Powered by the RunComfy CLI
# 1. Install (see the runcomfy-cli skill for details)
npm i -g @runcomfy/cli
# or: npx -y @runcomfy/cli --version

# 2. Sign in
runcomfy login
# or in CI: export RUNCOMFY_TOKEN=<token>

# 3. Generate an avatar video
runcomfy run <model>/<endpoint> \
  --input '{"prompt": "...", "audio_url": "https://...", "image_url": "https://..."}' \
  --output-dir ./out

CLI deep dive: see the runcomfy-cli skill.
Pick the right model for the user's intent
Listed newest first. The agent classifies user intent — pre-recorded audio file or just a script? Photoreal portrait or stylized character? Single shot or cinematic composition? — and picks one route below (see the routing sketch after this list).
OmniHuman — bytedance/omnihuman/api (default)
ByteDance audio-driven full-body avatar. Feed one portrait + one audio file, get back a video where the subject speaks / sings / gestures naturally. Listed on RunComfy's /feature/lip-sync as the curated default. Pick for: UGC voiceover, virtual presenter, dubbed product demo, multi-language clips from the same portrait. Avoid for: no audio file available (need to generate speech from a script) — use HappyHorse 1.0.
HappyHorse 1.0 — happyhorse/happyhorse-1-0/text-to-video (t2v) · happyhorse/happyhorse-1-0/image-to-video (i2v)
Arena #1 t2v / i2v with in-pass audio generated from the prompt. No external audio file required — quote the spoken line inside the prompt. Pick for: written script with no audio file, "write a script → get a video", concept clips, i2v talking head from an existing portrait. Avoid for: precise lip-sync to a specific MP3 — audio is regenerated each call, not locked.
Seedance v2 Pro — bytedance/seedance-v2/pro
ByteDance multi-modal flagship — up to 9 reference images, 3 reference videos, and 3 reference audio tracks composed in one pass with cinematic motion / lens / lighting control. Pick for: cinematic monologue with reference subject + reference audio + reference scene; ad creative. Avoid for: simple "portrait + audio" jobs — overpowered, slower. Use OmniHuman.
Wan 2-7 with audio_url — wan-ai/wan-2-7/text-to-video
Open weights with an audio_url field — the prompt describes the scene, the audio file drives the mouth. Pick for: full scene control (not just a portrait), a specific voiceover MP3, open-weights pipelines. Avoid for: the simplest portrait-talks job — use OmniHuman.
Wan 2-2 Animate — community/wan-2-2-animate/api
Community-published variant on the Wan 2-2 base. Audio-driven full-body animation of stylized characters (illustration, anime, mascot). Pick for: stylized / illustrated character + audio (not a photoreal portrait). Avoid for: photoreal subjects — use OmniHuman or Wan 2-7.
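The routing decision above can be sketched as a small shell helper. A minimal sketch under stated assumptions: pick_route and its intent labels are hypothetical names for illustration, the skill's real classifier reads the user's request rather than a flag, and only the catalog routes listed above are used.

# Hypothetical helper: map a classified intent label to a catalog route.
pick_route() {
  case "$1" in
    portrait-plus-audio)  echo "bytedance/omnihuman/api" ;;                  # default lip-sync
    script-no-audio)      echo "happyhorse/happyhorse-1-0/text-to-video" ;;  # in-pass audio from prompt
    portrait-plus-script) echo "happyhorse/happyhorse-1-0/image-to-video" ;; # i2v talking head
    cinematic-refs)       echo "bytedance/seedance-v2/pro" ;;                # multi-reference composition
    scene-plus-audio)     echo "wan-ai/wan-2-7/text-to-video" ;;             # prompt scene + audio_url
    stylized-character)   echo "community/wan-2-2-animate/api" ;;            # illustrated / anime subjects
    *)                    echo "bytedance/omnihuman/api" ;;                  # fall back to the default
  esac
}

# Resolve the route, then invoke it as in the route sections below.
MODEL=$(pick_route portrait-plus-audio)
runcomfy run "$MODEL" \
  --input '{"image_url": "https://...", "audio_url": "https://..."}' \
  --output-dir ./out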
Route 1: OmniHuman — default audio-driven avatar
Model: bytedance/omnihuman/api · Catalog: omnihuman · /feature/lip-sync
ByteDance OmniHuman is the strongest single-shot path: feed it one portrait image + one audio file, get back a video where the subject speaks / sings / gestures naturally to the audio. No prompt required beyond the inputs.
Invoke
runcomfy run bytedance/omnihuman/api \
  --input '{
    "image_url": "https://your-cdn.example/presenter.jpg",
    "audio_url": "https://your-cdn.example/voiceover.mp3"
  }' \
  --output-dir ./out
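The multi-language use from the catalog notes (same portrait, one dub per language) is a plain loop over hosted voiceover files. A minimal sketch; the CDN URLs and the language list are placeholder assumptions:

# One OmniHuman call per language, same presenter portrait.
for lang in en de ja; do
  runcomfy run bytedance/omnihuman/api \
    --input "{\"image_url\": \"https://your-cdn.example/presenter.jpg\", \"audio_url\": \"https://your-cdn.example/voiceover-$lang.mp3\"}" \
    --output-dir "./out/$lang"
done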
Tips
Portrait framing works best — head-and-shoulders or upper body. Full-body still works but expects more "presenter" energy.
Audio quality drives output quality — clean voiceover (no music bed) → cleaner mouth sync. If your audio is a mix, isolate the voice stem first.
No prompt field — the model derives everything from image + audio. Don't fight that.
See the full input schema on the model page.
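For the isolate-the-voice-stem tip, one common pre-processing pass is source separation before uploading the audio. A hedged sketch using the open-source demucs tool; this is an assumption (any vocal-isolation tool works, and neither demucs nor ffmpeg is part of the runcomfy CLI):

# Assumption: demucs and ffmpeg are installed locally.
# --two-stems=vocals splits the mix into vocals + accompaniment.
demucs --two-stems=vocals mixed-voiceover.mp3
# The output path depends on the demucs model; with the default htdemucs
# the vocal stem lands at separated/htdemucs/mixed-voiceover/vocals.wav.
ffmpeg -i separated/htdemucs/mixed-voiceover/vocals.wav vocals.mp3
# Host vocals.mp3 and pass its URL as audio_url.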
Route 2: Wan 2-7 with audio_url — open-weights lip-sync
Model: wan-ai/wan-2-7/text-to-video · Catalog: wan-2-7
When you want full control over the scene (not just a portrait) and have a specific audio track. Wan 2-7 accepts an audio_url field — the model generates the scene from the prompt and locks the subject's mouth to the audio.
Invoke
runcomfy run wan-ai/wan-2-7/text-to-video \
  --input '{
    "prompt": "Studio portrait of a woman in her 30s, confident expression, soft window light, neutral gray background.",
    "audio_url": "https://your-cdn.example/voiceover.mp3",
    "duration": 8
  }' \
  --output-dir ./out
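Hand-writing JSON inside shell single quotes gets fragile once prompts contain apostrophes. A small sketch that composes the --input payload with jq instead; jq is an assumption here, not a runcomfy requirement:

# Build the input JSON with jq to avoid shell-quoting bugs.
PROMPT="Studio portrait of a woman in her 30s, confident expression, soft window light, neutral gray background."
INPUT=$(jq -n --arg p "$PROMPT" --arg a "https://your-cdn.example/voiceover.mp3" \
  '{prompt: $p, audio_url: $a, duration: 8}')
runcomfy run wan-ai/wan-2-7/text-to-video --input "$INPUT" --output-dir ./out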
Tips
The prompt describes the scene; the audio drives the mouth. Don't put the spoken words in the prompt — the model isn't reading them, it's syncing to the waveform.
Match the audio's emotional tone — "confident expression" / "warmly engaged" / "deadpan delivery" cues the face.
Camera language — "static portrait", "slow pu