Qwen3-TTS VoiceDesign: natural-language text-to-speech with custom voice design

Author: xiaoyaner0201


v1.0.0

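Qwen3-TTS VoiceDesign provides text-to-speech with custom voices, designed from a natural-language description plus a seed that fixes the timbre. It includes an OpenAI-compatible API server, one-click setup, and a batch seed exploration tool, and is suited to speech generation, voice design, and OpenClaw integration.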


by @xiaoyaner0201·MIT-0

Audio processing · AI model access · Developer tools · API tools


License

MIT-0

Last updated

2026/4/10

Security scan

VirusTotal: Suspicious

OpenClaw: Safe (medium confidence)

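The package and scripts are consistent with a self-hosted TTS server: they install Python dependencies, download the TTS model, and provide client and server scripts. The files neither request unrelated credentials nor contain hidden data-exfiltration endpoints, but some operational risks should be reviewed before installing.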

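Assessment recommendations

Expect a large download (~3.5GB) and many package installs (including torch/CUDA); run in a controlled environment or a VM/container.
The server clears the HTTP(S)_PROXY environment variables at startup; if you need a proxy, bind to 127.0.0.1 or configure a firewall.
Verify that you trust ModelScope/HuggingFace and the specified model repository.
Avoid passing untrusted text to the client shell scripts.
If you expose the server, secure it (firewall, reverse proxy, authentication).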

Detailed analysis

✓ Purpose and capabilities

Name/description (Qwen3-TTS VoiceDesign TTS server + client tools) matches the included scripts: a FastAPI server, client helpers, setup script and seed-batching tooling. The declared behavior (model download, one-click setup, OpenAI-compatible API) is consistent with the code.

ℹ Instruction scope

SKILL.md instructs running setup.sh which creates a venv, pip-installs dependencies, downloads the model (ModelScope or Hugging Face), and runs the server; the runtime scripts only reference their .env and local files. Notable scope items: the server code clears proxy environment variables at start (potentially bypassing a corporate proxy), and the docs show guidance to register scheduled tasks or systemd units (these are only instructions, not executed automatically). The client scripts build JSON bodies via shell interpolation (potential for malformed input/escaping issues if used with untrusted text).
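The escaping concern noted above can be avoided by letting a JSON library serialize the text instead of interpolating it into a string. The following is a minimal Python sketch, not code from the package: `build_tts_body` is a hypothetical helper, and the `text`, `seed`, and `instruct` field names are assumed from the `/tts` curl example in the skill documentation.

```python
import json


def build_tts_body(text, seed=None, instruct=None):
    """Serialize a /tts request body; json.dumps escapes quotes, backslashes, and newlines."""
    body = {"text": text}
    if seed is not None:
        body["seed"] = int(seed)
    if instruct is not None:
        body["instruct"] = instruct
    # ensure_ascii=False keeps Chinese text readable; the result is still valid JSON.
    return json.dumps(body, ensure_ascii=False)


if __name__ == "__main__":
    # Text that would break naive shell interpolation into a JSON string.
    tricky = 'He said "hi"; $(whoami) \\ end'
    print(build_tts_body(tricky, seed=201))
```

Because the text is only ever treated as data, quotes or shell metacharacters in it cannot alter the structure of the request body.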

ℹ Install mechanism

There is no platform install spec, but setup.sh will pip-install packages (qwen-tts, soundfile, pydub, uvicorn, fastapi, numpy and possibly modelscope and torch from the official PyTorch index). It downloads the ~3.5GB model via ModelScope or Hugging Face. These are expected for a local TTS runtime but do involve network access and large binary downloads; the sources used (ModelScope/HuggingFace, PyTorch wheel index) are standard release hosts rather than arbitrary shorteners.

✓ Credential requirements

The skill requests no credentials and exposes only environment variables relevant to running a local TTS server (seed, instruct, model path, host/port, format). The only surprising behavior is that the server explicitly clears HTTP(S) proxy environment variables at startup, which may affect network routing on hosts that rely on proxies; this is operational (not credential) behavior and not an attempt to read secrets.

✓ Persistence and privileges

The skill is not always-enabled and does not attempt to change other skills' config. setup.sh suggests how to create systemd units or a Windows scheduled task, but it does not automatically create system-level services or elevate privileges. You must run setup/start manually, so persistence is user-controlled.

Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.


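Runtime dependencies

No special dependencies

Versions

latest · v1.0.0 · 2026/2/25

Initial release: VoiceDesign voice design via natural language and seed fixation, an OpenAI-compatible API server, one-click setup, and a batch seed exploration tool.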


Install command

Official: npx clawhub@latest install qwen3-tts-voicedesign

Mirror: npx clawhub@latest install qwen3-tts-voicedesign --registry https://cn.clawhub-mirror.com

Skill documentation


Text → Speech with natural language voice descriptions + seed-based timbre fixation.

Quick Start

# Generate speech (uses server defaults)
TTS_URL=http://your-server:8881 scripts/say.sh "Hello world!"
# Save to file
scripts/say.sh "Save this" output.mp3

# Batch compare seeds (voice exploration)
scripts/batch_seeds.sh "Hello world!" 42 123 201 456 789 /tmp/seeds

Environment Variables

All configuration is done through environment variables; the text is the only required argument:

| Variable | Default | Description |
| --- | --- | --- |
| `TTS_URL` | `http://localhost:8881` | Server base URL (client side) |
| `TTS_SEED` | `4096` | Random seed → controls timbre |
| `TTS_INSTRUCT` | (generic female voice) | Voice description prompt |
| `TTS_MODEL_PATH` | `Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign` | Model weights path |
| `TTS_PORT` | `8881` | Server listen port |
| `TTS_HOST` | `0.0.0.0` | Server bind address |
| `TTS_FORMAT` | `mp3` | Output format: `mp3` / `wav` |

The server reads its configuration from the .env file in its directory; the client scripts read theirs from the shell environment.
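To make the lookup order concrete, here is a minimal Python sketch of how such settings could be resolved. It is not the code in `tts_server.py`: `parse_env_text` and `resolve_config` are hypothetical helpers, the defaults are copied from the table above, and the precedence (built-in defaults, then .env, then process environment) is an assumption rather than documented behavior.

```python
import os

# Defaults copied from the table above; TTS_INSTRUCT has no documented literal default.
DEFAULTS = {
    "TTS_URL": "http://localhost:8881",
    "TTS_SEED": "4096",
    "TTS_MODEL_PATH": "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    "TTS_PORT": "8881",
    "TTS_HOST": "0.0.0.0",
    "TTS_FORMAT": "mp3",
}


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_config(env_file_text="", environ=None):
    """Merge defaults, then .env contents, then TTS_* process variables (assumed order)."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    config.update(parse_env_text(env_file_text))
    config.update({k: v for k, v in environ.items() if k.startswith("TTS_")})
    return config


if __name__ == "__main__":
    print(resolve_config("TTS_SEED=201\nTTS_INSTRUCT='warm voice'\n", environ={}))
```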

Voice Description Example

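30岁男性播音员，声音低沉磁性，
语速稳重从容，咬字清晰标准，
像新闻联播主播的专业感，又带一点温暖。

(English gloss: a 30-year-old male broadcaster with a deep, magnetic voice, a steady and unhurried pace, and clear, standard articulation; the professional feel of a Xinwen Lianbo news anchor, with a touch of warmth.)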

Tip: Once you've found your perfect voice (description + seed), set them as server defaults in .env. Then client calls only need to pass text.

API

OpenAI-Compatible

curl -X POST $TTS_URL/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello!"}' -o speech.mp3
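The same call can be made from Python with only the standard library. This is a sketch under assumptions: the helper names are hypothetical, the endpoint path and the `input` field come from the curl example above, and the 120-second timeout is chosen to tolerate the cold start mentioned under Troubleshooting.

```python
import json
import urllib.request


def build_speech_request(base_url, text):
    """Build a POST request for /v1/audio/speech with a JSON body."""
    url = base_url.rstrip("/") + "/v1/audio/speech"
    data = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


def synthesize_to_file(base_url, text, out_path, timeout=120):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With a server running, `synthesize_to_file("http://localhost:8881", "Hello!", "speech.mp3")` would write the audio and return its size in bytes.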

Custom (seed + instruct override)

curl -X POST $TTS_URL/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "seed": 201, "instruct": "温柔女生"}' -o speech.mp3

GET (quick test)

curl "$TTS_URL/tts?text=Hello&seed=201" -o test.mp3
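The GET form works as written for a plain word like `Hello`, but text containing spaces, `&`, or non-ASCII characters has to be URL-encoded first. A small Python sketch, where `build_tts_get_url` is a hypothetical helper and the `text` and `seed` parameter names are taken from the curl example above:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_tts_get_url(base_url, text, seed=None):
    """Build a /tts GET URL with the query string percent-encoded."""
    params = {"text": text}
    if seed is not None:
        params["seed"] = seed
    return base_url.rstrip("/") + "/tts?" + urlencode(params)


if __name__ == "__main__":
    url = build_tts_get_url("http://localhost:8881", "你好 world & more", seed=201)
    print(url)
    # Decoding the query recovers the original text unchanged.
    print(parse_qs(urlsplit(url).query))
```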

Seed Mechanics

Same (description + seed) → same timbre. Different seeds → completely different voices.

⚠️ Seeds have no ordering or locality: seeds 42 and 43 can sound completely different. Finding a voice is like opening blind boxes.

Workflow: fix description → batch 30-40 seeds → listen → shortlist 2-3 → compare across scenarios → pick.
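The batch step of this workflow can be sketched in Python as follows. This is an illustration, not a port of `batch_seeds.sh`: `batch_seeds` and the `seed_<n>.<fmt>` file naming are hypothetical, and `synthesize` is a caller-supplied function (for example, one that posts to `/tts`) so the sketch runs without a server.

```python
import os
import tempfile


def batch_seeds(text, seeds, out_dir, synthesize, fmt="mp3"):
    """Render the same text once per seed and return {seed: output path}.

    `synthesize(text, seed)` must return the audio as bytes.
    """
    os.makedirs(out_dir, exist_ok=True)
    results = {}
    for seed in seeds:
        path = os.path.join(out_dir, f"seed_{seed}.{fmt}")
        with open(path, "wb") as handle:
            handle.write(synthesize(text, seed))
        results[seed] = path
    return results


if __name__ == "__main__":
    # Stand-in synthesizer so the sketch runs offline.
    def fake_synthesize(text, seed):
        return f"{text}:{seed}".encode("utf-8")

    with tempfile.TemporaryDirectory() as tmp:
        files = batch_seeds("Hello world!", [42, 123, 201], tmp, fake_synthesize)
        for seed, path in files.items():
            print(seed, os.path.basename(path))
```

Keeping the text and description fixed while only the seed varies is what makes the resulting files directly comparable by ear.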

Deploy Your Own

# One-click setup (Python 3.10+ and CUDA GPU required)
bash scripts/setup.sh ./my-tts
# Configure voice in .env
echo 'TTS_SEED=201' >> ./my-tts/.env
echo 'TTS_INSTRUCT=Your voice description here' >> ./my-tts/.env
# Start server
bash scripts/setup.sh start ./my-tts

Setup installs: qwen-tts, soundfile, pydub, uvicorn, fastapi, torch (CUDA). Downloads VoiceDesign model (~3.5GB) via ModelScope (China) or HuggingFace.

Requirements: CUDA GPU with 4GB+ VRAM, Python 3.10+, ~4GB disk.

Scripts

| Script | Purpose |
| --- | --- |
| `scripts/say.sh` | Generate speech: `say.sh "text" [output.mp3]` |
| `scripts/batch_seeds.sh` | Compare seeds: `batch_seeds.sh "text" seed1 seed2 ...` |
| `scripts/tts_server.py` | FastAPI server (fully env-configurable) |
| `scripts/setup.sh` | One-click deploy (venv + deps + model download) |

OpenClaw Integration

In openclaw.json:

{
  "env": { "OPENAI_TTS_BASE_URL": "http://your-server:8881/v1" },
  "messages": {
    "tts": {
      "provider": "openai",
      "openai": { "apiKey": "dummy", "model": "qwen3-tts", "voice": "default" },
      "timeoutMs": 120000
    }
  }
}

Server Management

# Health check
curl -s $TTS_URL/health
# Start (foreground)
python tts_server.py
# Start (background, Linux/macOS)
nohup python tts_server.py > server.log 2>&1 &
# Auto-restart (Windows — scheduled task + guard script)
# Create tts_guard.bat:
#   @echo off
#   :loop
#   python tts_server.py
#   timeout /t 10
#   goto loop
# Register: schtasks /create /tn "TTS-Guard" /tr "tts_guard.bat" /sc onlogon /rl highest
# Auto-restart (Linux — systemd)
# See setup.sh output for systemd unit template
# Stop
# Linux/macOS: kill $(lsof -ti:8881)
# Windows: for /f "tokens=5" %a in ('netstat -aon ^| findstr :8881') do taskkill /PID %a /F
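After starting the server in the background, a script usually needs to wait until the model has loaded before sending requests. A minimal Python sketch of polling the health endpoint: both function names are hypothetical, and treating an HTTP 200 from `/health` as "ready" is an assumption, since the response format is not documented here.

```python
import time
import urllib.error
import urllib.request


def http_health_probe(base_url, timeout=5):
    """Return True if GET /health answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as r:
            return r.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() until it returns True; return the attempt number, or None on failure."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

A typical call would be `wait_until_healthy(lambda: http_health_probe("http://localhost:8881"))`; the defaults allow roughly a minute, in line with the cold-start time noted under Troubleshooting.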

Troubleshooting

Connection refused → Server not running; start it
30s+ first request → Cold start (model loading ~60s); subsequent requests 10-15s
Behind proxy → Set NO_PROXY= on client side
Windows firewall → netsh advfirewall firewall add rule name="TTS" dir=in action=allow protocol=TCP localport=8881
No flash-attn on Windows → Expected; falls back to PyTorch SDPA (slower but works)
PowerShell corrupts Chinese → Edit .env/config via Python or SCP, not PowerShell Set-Content
Process dies on SSH disconnect → Use scheduled task (Windows) or systemd (Linux) instead of foreground
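For Python clients behind a proxy, an alternative to the NO_PROXY setting is to build an opener that ignores proxy configuration altogether. This sketch uses the standard library's `ProxyHandler` with an empty mapping; the helper names are hypothetical, and the `/health` path is taken from the Server Management section.

```python
import urllib.request


def build_direct_opener():
    """Return an opener that ignores HTTP(S)_PROXY and system proxy settings."""
    # An empty mapping tells ProxyHandler to use no proxies at all.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_health(base_url, timeout=5):
    """GET /health directly, bypassing any configured proxy."""
    opener = build_direct_opener()
    with opener.open(base_url.rstrip("/") + "/health", timeout=timeout) as response:
        return response.status, response.read()
```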

Voice Design Tips

Describe like casting a voice actor:

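Age/gender: "18岁女大学生" (18-year-old female college student) / "30岁男性播音员" (30-year-old male broadcaster)
Texture: "柔和温暖" (soft and warm) / "清脆明亮" (crisp and bright) / "低沉磁性" (deep and magnetic)
Emotion: "轻柔细腻" (gentle and delicate) / "活泼开朗" (lively and cheerful)
Accent: "南方口音软糯" (soft southern accent) / "台湾腔" (Taiwanese accent) / "东北大碴子味" (strong Northeastern accent)
Metaphor: "像棉花糖" (like cotton candy) / "像播音主持" (like a broadcast host); metaphors help the model capture the feeling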

⚠️ Timbre ≠ description. The description controls style and emotion; the seed controls timbre. Don't put personality traits ("灵动俏皮", lively and playful) in the description; that's the seed's job.
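The categories above can be assembled into a single description string before it is sent as `instruct`. A small Python sketch: `build_instruct` is a hypothetical helper, and joining the parts with a Chinese comma and closing with a full stop simply mirrors the punctuation of the Voice Description Example; the model may accept other phrasings equally well.

```python
def build_instruct(age_gender, texture=None, emotion=None, accent=None, metaphor=None):
    """Join the non-empty description parts into one Chinese sentence."""
    parts = [age_gender, texture, emotion, accent, metaphor]
    kept = [p.strip() for p in parts if p and p.strip()]
    return "，".join(kept) + "。"


if __name__ == "__main__":
    print(build_instruct("30岁男性播音员", texture="低沉磁性", metaphor="像播音主持"))
```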

Data source: ClawHub · Chinese localization: 龙虾技能库
