To Speech Video

v1.0.0

Skip the learning curve of professional editing software. Describe what you want (for example, "convert this text script into a narrated video with visuals") and get narrated speech videos back in 1-2 minutes. Upload TXT, DOCX, PDF, or SRT files up to 200MB, and the AI handles text-to-speech video generation automatically. Ideal for marketers, content creators, and educators who need to produce voiced video content without recording audio themselves.

by @linmillsd7·MIT-0

Developer Tools, Code Generation, Documentation Tools, File Processing, AI Model Access

License

MIT-0

Free to use, modify, and redistribute without attribution.

Runtime Dependencies

No special dependencies

Install Command

Official: npx clawhub@latest install to-speech-video

Mirror (accelerated): npx clawhub@latest install to-speech-video --registry https://cn.longxiaskill.com (mirror available)

Skill Documentation

Getting Started

Share your text or script and I'll get started on text-to-speech video generation. Or just tell me what you're thinking.

Try saying:

"convert my text or script"
"export 1080p MP4"
"convert this text script into a"

First-Time Connection

When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").

Authentication: check whether NEMO_TOKEN is set in the environment. If it is, skip to step 2.

1. Obtain a free token: generate a random UUID as a client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id set to that UUID. The response field data.token is your NEMO_TOKEN, with 100 free credits, valid for 7 days.

2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
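The two setup calls can be sketched with the Python standard library. This is a minimal sketch, assuming the lowercase host and paths given above; the helper names (`new_client_id`, `build_token_request`, `extract_token`, `build_session_request`) are hypothetical, and the requests are only constructed here, not sent. The attribution headers described later can be merged in through `extra_headers`.

```python
import json
import urllib.request
import uuid
from typing import Dict, Optional

BASE_URL = "https://mega-api-prod.nemovideo.ai"  # host as given in this document


def new_client_id() -> str:
    """Random UUID used as the X-Client-Id value."""
    return str(uuid.uuid4())


def build_token_request(client_id: str,
                        extra_headers: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Step 1: request an anonymous token (the response JSON carries data.token)."""
    headers = {"X-Client-Id": client_id, **(extra_headers or {})}
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/anonymous-token", method="POST", headers=headers)


def extract_token(response_body: bytes) -> str:
    """Pull data.token out of the anonymous-token response body."""
    return json.loads(response_body)["data"]["token"]


def build_session_request(token: str, language: str,
                          extra_headers: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Step 2: create a session; the response is expected to include session_id."""
    body = json.dumps({"task_name": "project", "language": language}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        **(extra_headers or {}),
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        data=body, method="POST", headers=headers)
```

To send one of these, pass it to `urllib.request.urlopen` and read the body; retries and error handling are left out of the sketch.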

Keep setup communication brief. Don't display raw API responses or token values to the user.

Text to Speech Video: Convert Scripts Into Narrated Videos

Drop your text or script in the chat and tell me what you need. I'll handle the text-to-speech video generation on cloud GPUs, so you don't need anything installed locally.

Here's a typical use: you send a 200-word product description script, ask to convert this text script into a narrated video with visuals, and about 1-2 minutes later you've got an MP4 file ready to download. Output is 1080p by default.

One thing worth knowing: scripts under 100 words generate faster and sound more natural.

Matching Input to Actions

User prompts referencing text-to-speech video, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.
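A keyword-based router can be sketched as below. This is a deliberately simplified assumption: it uses plain substring matching over the trigger phrases in the routing table that follows, while the skill's actual keyword and intent classification is not specified. `ROUTES` and `route` are hypothetical names, and treating an attached file as an upload before any keyword check is an assumed precedence.

```python
from typing import Tuple

# Trigger phrases from the routing table; the matching strategy is assumed.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export", True),
    (("credits", "积分", "balance", "余额"), "credits", True),
    (("status", "状态", "show tracks"), "state", True),
    (("upload", "上传"), "upload", True),
]


def route(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse); anything unmatched goes through SSE."""
    if has_file:
        return ("upload", True)  # assumption: an attached file always means upload
    text = message.lower()
    for keywords, action, skip_sse in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, skip_sse)
    return ("sse", False)
```

For example, `route("add BGM")` falls through to `("sse", False)`, which matches the "everything else" row.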

User says... → Action (skip SSE?)
"export" / "导出" / "download" / "send me the video" → §3.5 Export (✅)
"credits" / "积分" / "balance" / "余额" → §3.3 Credits (✅)
"status" / "状态" / "show tracks" → §3.4 State (✅)
"upload" / "上传" / user sends a file → §3.2 Upload (✅)
Everything else (generate, edit, add BGM…) → §3.1 SSE (❌)

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

All calls go to https://mega-api-prod.nemovideo.ai. The main endpoints:

Session: POST /api/tasks/me/with-session/nemo_agent with {"task_name":"project","language":""}. Returns a session_id.
Chat (SSE): POST /run_sse with session_id and your message in new_message.parts[0].text. Set Accept: text/event-stream. Streams can run up to 15 min.
Upload: POST /api/upload-video/nemo_agent/me/ with a multipart file or JSON with URLs.
Credits: GET /api/credits/balance/simple returns available, frozen, and total.
State: GET /api/state/nemo_agent/me//latest returns the current draft and media info.
Export: POST /api/render/proxy/lambda with the render ID and draft JSON. Poll GET /api/render/proxy/lambda/ every 30s for completed status and the download URL.
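The export step's 30-second polling loop might look like this. It's a sketch under assumptions: `fetch_status` stands in for the GET on the render endpoint, the response fields `status`, `"completed"`, and `download_url` are hypothetical because the response shape isn't documented here, and the 5-minute timeout is an assumed value. Injecting `sleep` lets the loop be tested without real waiting.

```python
import time
from typing import Callable, Dict, Optional


def poll_export(fetch_status: Callable[[], Dict[str, str]],
                interval_s: float = 30.0,
                timeout_s: float = 300.0,  # assumed upper bound, not from the docs
                sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll the render job until it completes; return its download URL, or None on timeout."""
    waited = 0.0
    while waited <= timeout_s:
        status = fetch_status()
        # Hypothetical response fields: "status" and "download_url".
        if status.get("status") == "completed":
            return status.get("download_url")
        sleep(interval_s)
        waited += interval_s
    return None
```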

Formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
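Before uploading, a client could pre-check the extension and size. This sketch assumes that the script formats named in the introduction (TXT, DOCX, PDF, SRT) are accepted alongside the media formats listed here, and reads "200MB" as 200 × 1024 × 1024 bytes; both are assumptions, and `check_upload` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Media formats from the list above; script formats from the introduction.
MEDIA_FORMATS = {"mp4", "mov", "avi", "webm", "mkv", "jpg", "png", "gif",
                 "webp", "mp3", "wav", "m4a", "aac"}
SCRIPT_FORMATS = {"txt", "docx", "pdf", "srt"}
MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # assumption: "200MB" read as MiB


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in MEDIA_FORMATS | SCRIPT_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```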

Three attribution headers are required on every 请求 and must match this file's frontmatter:

Header → Value
X-Skill-Source → to-speech-video
X-Skill-Version → frontmatter version
X-Skill-Platform → auto-detect from install path: ClawHub / cursor / unknown

Include Authorization: Bearer <token> and all attribution headers on every request; omitting them triggers a 402 on export.
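A helper that assembles these headers could look like this. The platform detection is an assumption: it simply looks for "clawhub" or "cursor" in the install path, since this document does not say how auto-detection works. `build_headers` and `detect_platform` are hypothetical names; the version string would come from the skill's frontmatter (listed as v1.0.0 on this page).

```python
from typing import Dict


def detect_platform(install_path: str) -> str:
    """Guess the X-Skill-Platform value from the install path (assumed heuristic)."""
    path = install_path.lower()
    if "clawhub" in path:
        return "ClawHub"
    if "cursor" in path:
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> Dict[str, str]:
    """Authorization plus the three attribution headers, sent on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "to-speech-video",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```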

Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.

Example timeline summary:

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
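A timeline summary in this style could be produced from draft JSON along these lines. The sketch assumes a draft shaped as `{"t": [{"tt": ..., "sg": [{"d": ..., "m": {...}}]}]}`, that each track's segments play back-to-back from 0, and that a display name sits under a hypothetical `m["name"]` key; none of those details are confirmed by this document. Audio tracks are labeled "Audio" rather than "BGM", and volume is not shown.

```python
from typing import Any, Dict

TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}  # tt codes from the key list above


def summarize_draft(draft: Dict[str, Any]) -> str:
    """Render short-key draft JSON as a plain-text timeline summary."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for i, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        segments = track.get("sg", [])
        # Assumption: segments are contiguous and start at 0 ms.
        total_ms = sum(seg.get("d", 0) for seg in segments)
        name = segments[0].get("m", {}).get("name", "untitled") if segments else "empty"
        lines.append(f"{i}. {kind}: {name} (0-{total_ms / 1000:g}s)")
    return "\n".join(lines)
```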

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

"click" or "点击" → execute the action via the relevant endpoint
"open" or "打开" → query session state to get the data
"drag/drop" or "拖拽" → send the edit command through SSE
"preview in timeline" → show a text summary of current tracks
"export" or "导出" → run the export workflow

SSE Event Handling

Event → Action
Text response → apply GUI translation (§4), present to user
Tool call/result → process internally, don't forward
Heartbeat / empty data → keep waiting; every 2 min, tell the user "⏳ Still working..."
Stream closes → process the final response
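Reading the `text/event-stream` response from `/run_sse` can be sketched with a small parser. The parser follows simplified server-sent events framing (`data:` and `event:` fields, `:` comment lines, a blank line ends an event, an unterminated final event is discarded). The payload handling is an assumption: each `data` field is treated as JSON with a hypothetical `type` of `"text"`, `"tool_call"`, or `"tool_result"`, because the event schema isn't documented here. Only text is collected for the user; tool events and heartbeats are skipped, as in the table above.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from raw text/event-stream lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line dispatches the buffered event; events with no data lines are dropped.
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, commonly used for heartbeats
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
    # An unterminated final event is discarded, per SSE framing rules.


def collect_text(events: Iterable[Tuple[str, str]]) -> str:
    """Concatenate user-facing text; tool calls/results stay internal (assumed schema)."""
    texts: List[str] = []
    for _event, data in events:
        try:
            payload = json.loads(data)
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict) and payload.get("type") == "text":
            texts.append(str(payload.get("text", "")))
    return "".join(texts)
```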

About 30% of editing operations return no text in the SSE stream. When this happens, poll session state to verify the edit was applied, then summarize the changes for the user.
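The no-text fallback could be wrapped like this. It's a sketch, assuming the caller kept a snapshot of session state from before the edit and that comparing snapshots is a reasonable way to tell whether anything changed; `report_edit`, `fetch_state`, and `summarize` are hypothetical names (`summarize` could be any timeline formatter).

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def report_edit(sse_text: str, state_before: State,
                fetch_state: Callable[[], State],
                summarize: Callable[[State], str]) -> str:
    """Prefer the SSE text; if there is none, verify the edit via session state."""
    if sse_text.strip():
        return sse_text
    state_after = fetch_state()  # e.g. a GET on the latest-state endpoint
    if state_after == state_before:
        return "The edit does not appear to have been applied yet."
    return summarize(state_after)
```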

Error Codes

0: success, continue normally
1001: token expired or invalid; re-acquire via /api/auth/anonymous-token
1002: session not found; create a new one
2001: out of credits; anonymous us
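Mapping the listed error codes to recovery steps might look like the following. The `Recovery` enum, `recovery_for`, `call_with_recovery`, and the callback parameters are hypothetical; the sketch retries once after refreshing the token (1001) or creating a new session (1002), and leaves 2001 and unknown codes to the caller, since this document does not fully spell out the 2001 handling.

```python
from enum import Enum
from typing import Any, Callable, Tuple


class Recovery(Enum):
    CONTINUE = "continue"
    REFRESH_TOKEN = "refresh_token"
    NEW_SESSION = "new_session"
    OUT_OF_CREDITS = "out_of_credits"
    UNKNOWN = "unknown"


RECOVERY_BY_CODE = {
    0: Recovery.CONTINUE,
    1001: Recovery.REFRESH_TOKEN,  # re-acquire via /api/auth/anonymous-token
    1002: Recovery.NEW_SESSION,    # session not found
    2001: Recovery.OUT_OF_CREDITS,
}


def recovery_for(code: int) -> Recovery:
    return RECOVERY_BY_CODE.get(code, Recovery.UNKNOWN)


def call_with_recovery(call: Callable[[], Tuple[int, Any]],
                       refresh_token: Callable[[], None],
                       new_session: Callable[[], None]) -> Tuple[int, Any]:
    """Run call(); on 1001 or 1002, repair the token/session and retry once."""
    code, body = call()
    action = recovery_for(code)
    if action is Recovery.REFRESH_TOKEN:
        refresh_token()
        code, body = call()
    elif action is Recovery.NEW_SESSION:
        new_session()
        code, body = call()
    return code, body
```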
