Local TTS

v1.0.0

Local text-to-speech using Qwen3-TTS with mlx_audio (macOS Apple Silicon) or qwen-tts (Linux/Windows). Privacy-first offline TTS with natural, realistic voice cloning and voice design. Use for local, secure, high-quality multilingual speech synthesis.

by @irachex·MIT-0

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install local-tts

Mirror (accelerated): npx clawhub@latest install local-tts --registry https://cn.longxiaskill.com

Skill documentation

Local TTS with Qwen3-TTS

Privacy-First | Offline | High-Quality | Natural Real Voices

Local text-to-speech synthesis using Qwen3-TTS models. Your text never leaves your machine.

Why Local TTS?

Unlike cloud TTS (Google, AWS, Azure), local-tts ensures:

- Zero data transmission: 100% on-device processing
- Works offline: no network required once the model weights have been downloaded
- No API keys: no dependency on external services
- GDPR/HIPAA friendly: simplified compliance

See Privacy & Security details.

Platform Overview

| Platform | Backend | Installation | Best For |
|---|---|---|---|
| macOS (Apple Silicon) | mlx_audio | pip install mlx-audio | M1/M2/M3/M4 Macs |
| Linux/Windows | qwen-tts | pip install qwen-tts | CUDA GPUs |

Quick Start

macOS

    pip install mlx-audio
    brew install ffmpeg

    # Natural female voice
    python -m mlx_audio.tts.generate \
      --text "Hello world" \
      --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
      --voice Chelsie

Linux/Windows

    pip install qwen-tts

    # With optimizations (FlashAttention, bfloat16, auto-device)
    python scripts/tts_linux.py "Hello world" --female
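The auto-device behavior named in the comment above (CUDA with a CPU fallback) can be approximated in a few lines. This is a minimal sketch, not the code in scripts/tts_linux.py: `pick_device` and `pick_dtype_name` are hypothetical helper names, the bfloat16-on-GPU policy is an assumption, and the only PyTorch call used is `torch.cuda.is_available()`. The import is guarded so the snippet also runs where PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda" when a usable CUDA GPU is present, otherwise "cpu"."""
    try:
        import torch  # optional: absent on machines without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def pick_dtype_name(device: str) -> str:
    """Assumed policy: bfloat16 on GPU, float32 on CPU."""
    return "bfloat16" if device == "cuda" else "float32"


if __name__ == "__main__":
    device = pick_device()
    print(f"device={device} dtype={pick_dtype_name(device)}")
```

If this reports `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build or driver is the first thing to check.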

Key Concepts

--voice vs --instruct (Important)

| Model | --voice | --instruct | Notes |
|---|---|---|---|
| CustomVoice | Select preset voice | Add style/emotion | Can be used together: voice + style control |
| VoiceDesign | N/A | Create voice from description | --instruct only |
| Base | N/A | N/A | For voice cloning with --ref_audio |
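The flag rules in the table above can be checked before a run is launched. The sketch below is illustrative only: `ALLOWED_FLAGS`, `model_family`, and `check_flags` are hypothetical names, the family is inferred from the model ID by substring match, and allowing `ref_text` for Base models is an assumption based on the voice cloning example rather than on the table.

```python
# Flags accepted by each model family, transcribed from the table above.
# "ref_text" for Base is an assumption taken from the voice cloning example.
ALLOWED_FLAGS = {
    "CustomVoice": {"voice", "instruct"},
    "VoiceDesign": {"instruct"},
    "Base": {"ref_audio", "ref_text"},
}


def model_family(model_id: str) -> str:
    """Infer the family from an ID such as '...-1.7B-CustomVoice-8bit'."""
    for family in ALLOWED_FLAGS:
        if f"-{family}" in model_id:
            return family
    raise ValueError(f"unrecognized model ID: {model_id}")


def check_flags(model_id: str, **flags) -> str:
    """Raise ValueError if a supplied flag does not apply to the model family."""
    family = model_family(model_id)
    supplied = {name for name, value in flags.items() if value is not None}
    unsupported = supplied - ALLOWED_FLAGS[family]
    if unsupported:
        names = ", ".join(sorted(unsupported))
        raise ValueError(f"{family} does not accept: {names}")
    if family == "VoiceDesign" and "instruct" not in supplied:
        raise ValueError("VoiceDesign requires --instruct")
    return family
```

For example, `check_flags("mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-8bit", voice="Serena")` raises `ValueError`, because VoiceDesign models take a description through --instruct and have no preset voices.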

CustomVoice with style control:

    python -m mlx_audio.tts.generate \
      --text "Hello there!" \
      --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
      --voice Serena \
      --instruct "excited and enthusiastic"

9 Preset Voices (Open Source CustomVoice)

| Voice | Gender | Language | Character |
|---|---|---|---|
| Chelsie | Female | English (American) | Gentle, empathetic |
| Serena | Female | English | Warm, gentle |
| Ono Anna | Female | Japanese | Playful |
| Sohee | Female | Korean | Warm |
| Aiden | Male | English (American) | Sunny |
| Dylan | Male | English | Natural |
| Eric | Male | English | Real |
| Ryan | Male | English | Natural |
| Uncle Fu | Male | Chinese | Youthful Beijing |

Defaults: Female=Serena, Male=Aiden

Usage Examples

CustomVoice (Preset Voices)

    # Natural female
    python -m mlx_audio.tts.generate \
      --text "Your text" --voice Serena --lang_code en \
      --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit

    # Real male
    python -m mlx_audio.tts.generate \
      --text "Your text" --voice Aiden --lang_code en \
      --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit

VoiceDesign (Text-Based)

    python -m mlx_audio.tts.generate \
      --text "Hello" \
      --model mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-8bit \
      --instruct "A warm female voice, professional and clear"

Long Text Generation

For long text, increase --max_tokens and enable --join_audio (macOS/MLX only):

    python -m mlx_audio.tts.generate \
      --text "Your very long text here..." \
      --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
      --voice Serena \
      --max_tokens 4096 \
      --join_audio \
      --output long_audio.wav
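Long text usually lives in a file rather than on the command line, so one option is to assemble the same command from Python. This is a sketch under stated assumptions: `build_long_text_command` and `run_long_text` are hypothetical helpers, and the flags are copied from the command above and have not been checked against any particular mlx-audio release.

```python
import subprocess
import sys
from pathlib import Path


def build_long_text_command(text_file, model, voice, output, max_tokens=4096):
    """Build the argv list for a long-text run from a UTF-8 text file."""
    text = Path(text_file).read_text(encoding="utf-8")
    return [
        sys.executable, "-m", "mlx_audio.tts.generate",
        "--text", text,
        "--model", model,
        "--voice", voice,
        "--max_tokens", str(max_tokens),  # raised from the documented default of 2048
        "--join_audio",                   # merge the generated segments into one file
        "--output", output,
    ]


def run_long_text(text_file, model, voice, output):
    """Run the command; raises CalledProcessError if generation fails."""
    cmd = build_long_text_command(text_file, model, voice, output)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or newlines.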

Voice Cloning

    python -m mlx_audio.tts.generate \
      --text "Cloned voice speaking" \
      --model mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit \
      --ref_audio sample.wav --ref_text "Sample transcript"
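Before a cloning run, it can help to confirm that the reference clip exists and to see its basic properties. The sketch below uses only Python's standard `wave` module and assumes the reference is an uncompressed PCM WAV file; `describe_reference` is a hypothetical helper, not part of mlx_audio or qwen-tts.

```python
import wave
from pathlib import Path


def describe_reference(path: str) -> dict:
    """Return channel count, sample rate, and duration of a PCM WAV clip."""
    wav_path = Path(path)
    if not wav_path.is_file():
        raise FileNotFoundError(f"reference audio not found: {wav_path}")
    with wave.open(str(wav_path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

Calling `describe_reference("sample.wav")` before the command above surfaces a missing or unreadable file early, instead of partway through model loading.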

Parameters

| Parameter | Description | Values |
|---|---|---|
| --text | Text to speak | Required |
| --model | Model ID | See table below |
| --voice | Preset voice (CustomVoice) | Chelsie, Serena, Aiden, Ryan... |
| --instruct | Voice description (VoiceDesign) or style/emotion (CustomVoice) | e.g., "excited", "calm", "professional" |
| --speed | Speaking rate | 0.5-2.0 (default: 1.0) |
| --pitch | Voice pitch | 0.5-2.0 (default: 1.0) |
| --lang_code | Language | en, cn, ja, ko, de, fr... |
| --ref_audio | Reference for cloning | File path |
| --output | Output file | Path (auto-generated if omitted) |
| --max_tokens | Max generation tokens | Integer (default: 2048); increase for long text |
| --join_audio | Merge audio segments | true (default) or false; recommended for long text |

Models

| Model | Size | Purpose |
|---|---|---|
| Qwen3-TTS-12Hz-1.7B-CustomVoice | 1.7B | 9 preset voices + style control |
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | 1.7B | Text-based voice creation |
| Qwen3-TTS-12Hz-1.7B-Base | 1.7B | Voice cloning |
| Qwen3-TTS-12Hz-0.6B-* | 0.6B | Lightweight versions |

macOS: Add the mlx-community/ prefix (e.g., mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit)
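The prefix rule can be wrapped in a small resolver so the same model name works on both platforms. This is a sketch, not part of the shipped scripts: `resolve_model_id` is a hypothetical helper, the `-8bit` suffix is inferred from the example above, and returning the name unchanged on Linux/Windows is an assumption.

```python
import platform


def resolve_model_id(name: str, system=None, quant: str = "8bit") -> str:
    """Map a model name from the table to the ID used on the given platform."""
    system = system or platform.system()
    if name.startswith("mlx-community/"):
        return name  # already a full MLX ID
    if system == "Darwin":
        # macOS: MLX builds use the mlx-community/ prefix; the quantization
        # suffix is assumed from the documented example.
        return f"mlx-community/{name}-{quant}"
    # Linux/Windows: assumed to take the name as given.
    return name


if __name__ == "__main__":
    print(resolve_model_id("Qwen3-TTS-12Hz-1.7B-Base"))
```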

Scripts

- scripts/tts_macos.py: macOS wrapper
- scripts/tts_linux.py: Linux/Windows wrapper with optimizations

Optimizations (Linux/Windows)

tts_linux.py automatically enables:

- FlashAttention: faster, less memory
- bfloat16: half the memory of float32 with the same dynamic range
- Auto device: CUDA → CPU fallback
- Mixed precision: speed + quality

Troubleshooting

| Issue | Solution |
|---|---|
| macOS: Model not found | Use the mlx-community/ prefix |
| macOS: Audio format | brew install ffmpeg |
| Linux: CUDA OOM | Use the 0.6B models |
| Linux: Slow | Check CUDA: torch.cuda.is_available() |

References

- macOS Details
- Linux/Windows Details
- Privacy & Security

Version

1.0.0 - See VERSION and package.json
