Audio TTS

v1.0.0

Convert text or a script into voiced video clips with this skill. Works with TXT, DOCX, PDF, and SRT files up to 200MB. Content creators, marketers, and educators use it to generate AI voiceovers from written scripts. Processing takes 20-40 seconds on cloud GPUs and you get 1080p MP4 files.

by @susan4731-wilfordf·MIT-0

Tags: Developer tools, Code generation, Documentation tools, File processing, AI model access

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install audio-tts

Mirror: npx clawhub@latest install audio-tts --registry https://cn.longxiaskill.com

Skill documentation

Getting Started

Got text or a script to work with? Send it over and tell me what you need. I'll take care of the text-to-speech conversion.

Try saying:

"convert a 200-word product description script into a 1080p MP4"
"convert this script to a natural voiceover in English with a female voice"
"generating AI voiceovers from written scripts for content creators, marketers, educators"

First-Time Connection

When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").

Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 2.

1. Obtain a free token: Generate a random UUID as the client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to that UUID. The response data.token is your NEMO_TOKEN: 100 free credits, valid 7 days.
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
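
The two setup steps above can be sketched as plain request descriptions. This is a minimal illustration, not the skill's actual client: the helper names are hypothetical, the host and paths are written in lowercase on the assumption that this is their original spelling, and nothing is sent over the network, so any HTTP library could consume the returned dictionaries.

```python
import json
import os
import uuid

# Assumed lowercase spelling of the backend host named in this document.
BASE_URL = "https://mega-api-prod.nemovideo.ai"


def needs_new_token(env=os.environ):
    """Return True when NEMO_TOKEN is missing or empty in the environment."""
    return not env.get("NEMO_TOKEN")


def build_token_request(client_id=None):
    """Describe the anonymous-token POST; a random UUID is the client id."""
    client_id = client_id or str(uuid.uuid4())
    return {
        "method": "POST",
        "url": BASE_URL + "/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
        "body": None,
    }


def extract_token(response_json):
    """Pull the token out of a parsed response (data.token per the text)."""
    return response_json["data"]["token"]


def build_session_request(token, language):
    """Describe the session-creation POST for a detected language code."""
    return {
        "method": "POST",
        "url": BASE_URL + "/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"task_name": "project", "language": language}),
    }
```

Keeping the request description separate from the transport also makes it easy to avoid logging the token value, in line with the guidance on not exposing tokens to the user.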

Keep setup communication brief. Don't display raw API responses or token values to the user.

Audio TTS — Convert Text to Voiceover Video

Send me your text or script and describe the result you want. The text-to-speech conversion runs on remote GPU nodes, so there is nothing to install on your machine.

A quick example: upload a 200-word product description script, type "convert this script to a natural voiceover in English with a female voice", and you'll get a 1080p MP4 back in roughly 20-40 seconds. All rendering happens server-side.

Worth noting: shorter text segments produce more natural-sounding speech output.

Matching Input to Actions

User prompts referencing audio TTS, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.
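
The keyword half of this routing can be sketched as a simple lookup. This is only an illustration under stated assumptions: the English trigger phrases are assumed spellings of the keywords listed in this section's routing table, the action labels and the `route` helper are hypothetical, and real intent classification would need more than substring matching.

```python
# Trigger phrases per action; the English spellings are assumptions.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message, has_file=False):
    """Return (action, skip_sse) for a user message.

    A message with an attached file always goes to upload. Anything that
    matches no keyword falls through to the SSE chat endpoint.
    """
    if has_file:
        return ("upload", True)
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    return ("sse", False)
```

For example, `route("what's my balance?")` selects the credits action without opening an SSE stream, while `route("add BGM")` falls through to SSE.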

| User says... | Action | Skip SSE? |
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.
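
Waiting for an export can be sketched as a polling loop. The 30-second interval follows the endpoint notes in this document; everything else here is an assumption made for illustration: the `fetch_status` callable, the `status` and `download_url` response keys, and the five-minute timeout are hypothetical and not confirmed by the source.

```python
import time


def wait_for_export(fetch_status, interval_s=30, timeout_s=300, sleep=time.sleep):
    """Poll until the render reports completion and return its download URL.

    fetch_status is any callable returning a dict for the render job.
    The "status" and "download_url" keys are assumed, not documented.
    """
    waited = 0
    while True:
        info = fetch_status()
        if info.get("status") == "completed":
            return info.get("download_url")
        if waited >= timeout_s:
            raise TimeoutError("export did not complete in time")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Because an abandoned session orphans the job, a caller would normally keep the session open until this function returns.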

All calls go to https://mega-api-prod.nemovideo.ai. The main endpoints:

Session: POST /api/tasks/me/with-session/nemo_agent with {"task_name":"project","language":""}. Gives you a session_id.
Chat (SSE): POST /run_sse with session_id and your message in new_message.parts[0].text. Set Accept: text/event-stream. Up to 15 min.
Upload: POST /api/upload-video/nemo_agent/me/ with a multipart file or JSON with URLs.
Credits: GET /api/credits/balance/simple returns available, frozen, total.
State: GET /api/state/nemo_agent/me//latest returns the current draft and media info.
Export: POST /api/render/proxy/lambda with render ID and draft JSON. Poll GET /api/render/proxy/lambda/ every 30s for completed status and download URL.
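
As an illustration of the chat endpoint, the request can be described like this. It is a sketch under assumptions: only `session_id` and `new_message.parts[0].text` come from this document, so any other body fields the backend may expect are unknown; the JSON content type, the lowercase host spelling, and the `build_chat_request` helper are hypothetical. The attribution headers required on every request are omitted for brevity.

```python
import json

# Assumed lowercase spelling of the backend host named in this document.
BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_chat_request(token, session_id, text):
    """Describe a POST to /run_sse carrying one user message."""
    payload = {
        "session_id": session_id,
        # The message text lives at new_message.parts[0].text.
        "new_message": {"parts": [{"text": text}]},
    }
    return {
        "method": "POST",
        "url": BASE_URL + "/run_sse",
        "headers": {
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",  # assumed
            "Accept": "text/event-stream",
        },
        "body": json.dumps(payload),
        # The stream may stay open for up to 15 minutes.
        "timeout_s": 15 * 60,
    }
```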

Formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

Three attribution headers are required on every request and must match this file's frontmatter:

| Header | Value |
| X-Skill-Source | audio-tts |
| X-Skill-Version | frontmatter version |
| X-Skill-Platform | auto-detect: ClawHub / cursor / unknown from install path |

All requests must include: Authorization: Bearer <token>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
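
A header builder along these lines keeps the attribution values in one place. Treat it as a sketch: the header names are written in their assumed English form (X-Skill-Source, X-Skill-Version, X-Skill-Platform), the substring rule used to detect the platform from the install path is hypothetical, and the exact casing of the platform values is not confirmed by this document.

```python
def detect_platform(install_path):
    """Guess the platform from the install path (hypothetical rule)."""
    normalized = install_path.replace("\\", "/").lower()
    if "clawhub" in normalized:
        return "ClawHub"
    if "cursor" in normalized:
        return "cursor"
    return "unknown"


def build_headers(token, version, install_path):
    """Return the auth header plus the three attribution headers."""
    return {
        "Authorization": "Bearer " + token,
        "X-Skill-Source": "audio-tts",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers once and merging them into every request reduces the chance of an export failing with 402 because one call site forgot them.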

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
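
The short field names can be expanded into a readable track summary roughly as follows. The nesting is an assumption made for this sketch: it supposes that `t` is a list of tracks, each with a `tt` type code and an `sg` list of segments carrying `d` in milliseconds. The real draft JSON may be shaped differently, and `summarize_draft` is a hypothetical helper.

```python
# Track type codes taken from the draft field mapping above.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft):
    """Render a one-line-per-track summary of a draft dictionary."""
    tracks = draft.get("t", [])
    lines = ["Timeline (%d tracks):" % len(tracks)]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # Durations are stored in milliseconds; report seconds.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            "%d. %s: %d segment(s), %.1fs"
            % (index, kind, len(segments), total_ms / 1000)
        )
    return "\n".join(lines)
```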

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |

Reading the SSE Stream

Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working; show "⏳ Still working..." every 2 minutes.

About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
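
The stream handling described above can be sketched as a small line reader. The event payload format is not specified in this document, so this example assumes each `data:` line carries a JSON object and that user-facing text sits under a `text` key; both are hypothetical. Lines beginning with a colon are treated as heartbeats, following the comment-line convention of the Server-Sent Events format.

```python
import json


def read_sse(lines):
    """Collect user-facing text from SSE lines.

    Returns (texts, idle_count), where idle_count counts heartbeat
    comments and empty data: lines. An empty texts list signals that
    the caller should fall back to polling the draft state.
    """
    texts, idle = [], 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            if line.startswith(":"):
                idle += 1  # comment line used as a heartbeat
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            idle += 1  # empty data: line, backend still working
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore anything that is not JSON
        # Assumed shape: text events expose a string under "text";
        # tool calls and other events are kept internal.
        if isinstance(event, dict) and isinstance(event.get("text"), str):
            texts.append(event["text"])
    return texts, idle
```

When the stream closes and `texts` is empty, that is the case where the timeline should be re-checked before reporting what changed.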

Error Codes

0: success, continue normally
1001: token expired or invalid; re-acquire via /api/auth/anonymous-token
1002: session not found; create a new one
2001: out of credits; anonymous users get a registration link
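
These codes map naturally onto a dispatch table. The sketch below covers only the four codes listed here; the action labels, the fallback for unlisted codes, and the `next_step` helper are hypothetical, and where the code appears in the backend response is not stated in this document.

```python
# Recovery action per documented error code (labels are illustrative).
ERROR_ACTIONS = {
    0: "continue",
    1001: "reacquire_token",
    1002: "create_session",
    2001: "show_registration_link",
}


def next_step(code):
    """Return the recovery action for a backend error code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```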

Data source: ClawHub · Chinese localization: 龙虾技能库