Supertonic TTS

v1.0.0

On-device multilingual text-to-speech using Supertonic (Supertone). Use when the user needs local/offline TTS, voice generation, speech synthesis, or text-to-audio conversion without cloud APIs. Triggers on mentions of supertonic, TTS, text-to-speech, voice synthesis, local speech, offline TTS, edge TTS, or multilingual voice generation. Runs entirely on-device via ONNX: no API key, no cloud, no network dependency.


by @pratyushchauhan (Pratyush Chauhan)·MIT-0


License

MIT-0


Free to use, modify, and redistribute; no attribution required.


Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install supertonic-tts

Mirror (faster downloads): npx clawhub@latest install supertonic-tts --registry https://cn.longxiaskill.com


Skill documentation

Supertonic TTS Skill

Local, multilingual text-to-speech powered by Supertone's Supertonic ONNX models.

Core Features

100% offline: No API key, no cloud, no network. Runs on-device via ONNX.
Tiny footprint: 66M–99M parameters. Runs on a Pi, in a browser, on an e-reader, or on a phone.
Stupid fast: Up to 167× real-time on consumer hardware. 4s of audio in ~25ms.
Studio output: 44.1kHz 16-bit mono WAV, no upsampler needed.
31 languages: Full multilingual support, with lang="na" as an auto-detect fallback.
Voice cloning: Clone any voice via Voice Builder and deploy it permanently offline.
Expression tags: Only the laughter tag is user-verified to produce an audible expression. The breathing and sighing tags are weak or unconfirmed. All others fail silently.

Prerequisites

Requires the Python SDK and model assets. Install once:

pip install supertonic

The first run auto-downloads ~400MB of ONNX models from Hugging Face into ~/.cache/supertonic3/.
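You can check whether that first-run download is still pending before you construct TTS. The sketch below makes two assumptions. First, the model files in ~/.cache/supertonic3/ carry an .onnx extension. Second, an empty or missing cache means a download will happen. Both helper names are hypothetical and are not part of the supertonic package.

```python
from pathlib import Path

# Cache directory named in the Prerequisites above; adjust if your install differs.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "supertonic3"


def cached_model_bytes(cache_dir: Path = DEFAULT_CACHE_DIR) -> int:
    """Total size in bytes of *.onnx files under cache_dir (0 if the dir is missing)."""
    if not cache_dir.is_dir():
        return 0
    # Assumption: model weights are stored as .onnx files somewhere below cache_dir.
    return sum(p.stat().st_size for p in cache_dir.rglob("*.onnx") if p.is_file())


def needs_download(cache_dir: Path = DEFAULT_CACHE_DIR) -> bool:
    """Heuristic: no cached .onnx files means the first run will fetch the models."""
    return cached_model_bytes(cache_dir) == 0


if __name__ == "__main__":
    if needs_download():
        print("No cached models found; TTS(auto_download=True) will download them.")
    else:
        print(f"Cached models: {cached_model_bytes() / 1e6:.0f} MB")
```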

Quick Use

Python SDK

from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")

wav, duration = tts.synthesize(
    text="Your text here",
    lang="en",          # language code, or "na" for auto-detect
    voice_style=style,
    total_steps=8,      # quality: 5 (low) to 12 (high)
    speed=1.0,          # 0.7 (slow) to 2.0 (fast)
)

tts.save_audio(wav, "output.wav")
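You can wrap the Quick Use calls in a small helper that fails early on out-of-range settings. This sketch uses only the methods shown above: get_voice_style, synthesize, and save_audio. The function name synthesize_to_file is hypothetical. The range checks mirror the documented ranges, and the SDK itself may or may not enforce them.

```python
from typing import Any


def synthesize_to_file(tts: Any, text: str, path: str, *, voice: str = "M1",
                       lang: str = "en", total_steps: int = 8,
                       speed: float = 1.0) -> float:
    """Synthesize text with a built-in voice, save it, and return its length in seconds.

    tts is expected to be a supertonic TTS instance, or any object exposing
    the same get_voice_style / synthesize / save_audio methods.
    """
    # Reject values outside the documented ranges before doing any work.
    if not 5 <= total_steps <= 12:
        raise ValueError("total_steps should be in 5..12")
    if not 0.7 <= speed <= 2.0:
        raise ValueError("speed should be in 0.7..2.0")
    style = tts.get_voice_style(voice_name=voice)
    wav, duration = tts.synthesize(
        text=text,
        lang=lang,
        voice_style=style,
        total_steps=total_steps,
        speed=speed,
    )
    tts.save_audio(wav, path)
    return float(duration[0])  # duration[0] is the length in seconds
```

Example usage, assuming supertonic is installed: `tts = TTS(auto_download=True)`, then `seconds = synthesize_to_file(tts, "Hello world", "hello.wav", voice="F1")`. The helper takes `tts` as an argument, which has two benefits. Several calls can share one loaded model instead of reloading it each time, and a stub can stand in for the model in tests.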

CLI (via the supertonic package)

# Basic synthesis
supertonic tts "Hello world" -o output.wav

# Pick voice and quality
supertonic tts "Use a different voice." -o output.wav --voice F1 --steps 10

# Custom cloned voice
supertonic tts "Hello in my voice." -o output.wav --custom-style-path voices/my_voice.json

# Multilingual
supertonic tts "こんにちは" -o japanese.wav --lang ja
supertonic tts "Bonjour" -o french.wav --lang fr
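When you call the CLI from Python, pass an argument list rather than a shell string. This avoids quoting problems with punctuation or non-ASCII text such as "こんにちは". The sketch below assembles only the flags shown above, and build_cli_args and run_cli are hypothetical helper names. Combining flags in ways not shown above, for example --custom-style-path with --lang, is an assumption.

```python
import subprocess
from typing import Optional


def build_cli_args(text: str, output: str, *, voice: Optional[str] = None,
                   steps: Optional[int] = None, lang: Optional[str] = None,
                   custom_style_path: Optional[str] = None) -> list[str]:
    """Assemble a `supertonic tts` command from the flags documented above."""
    args = ["supertonic", "tts", text, "-o", output]
    if voice is not None:
        args += ["--voice", voice]
    if steps is not None:
        args += ["--steps", str(steps)]
    if custom_style_path is not None:
        args += ["--custom-style-path", custom_style_path]
    if lang is not None:
        args += ["--lang", lang]
    return args


def run_cli(args: list[str]) -> None:
    # A list (no shell) keeps quotes and non-ASCII text intact; check=True raises on failure.
    subprocess.run(args, check=True)
```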

Skill Scripts

cd ~/.OpenClaw/workspace/skills/supertonic-tts/scripts
source ~/.OpenClaw/workspace/.browser-use-venv/bin/activate

# Quick synthesis
python3 synthesize.py "Hello world" --voice M1 --output ~/hello.wav

# With expression tags (only the laughter tag is confirmed to work)
python3 synthesize.py "You did it I am so proud." --voice M5 --output laugh.wav

# Custom voice
python3 synthesize.py "Hello" --custom-style my_voice.json --output cloned.wav

# Japanese
python3 synthesize.py "こんにちは" --voice F3 --lang ja

# List voices
python3 list_voices.py
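To render several clips in one go, you can drive the script calls above from a list. This sketch rests on three assumptions. It runs inside the activated venv from the scripts/ directory, so sys.executable is that venv's Python. synthesize.py accepts --voice, --lang, and --output together. The job list and output file naming are illustrative. JOBS and render_batch are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical jobs: (text, voice, lang) per clip.
JOBS = [
    ("Hello world", "M1", "en"),
    ("こんにちは", "F3", "ja"),
]


def synthesize_py_args(text: str, voice: str, lang: str, output: Path) -> list[str]:
    """Command line for scripts/synthesize.py using the flags shown above."""
    return [sys.executable, "synthesize.py", text,
            "--voice", voice, "--lang", lang, "--output", str(output)]


def render_batch(jobs, out_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Build one synthesize.py command per job; run them only if dry_run is False."""
    commands = []
    for i, (text, voice, lang) in enumerate(jobs):
        out_path = out_dir / f"clip_{i:02d}_{lang}.wav"
        commands.append(synthesize_py_args(text, voice, lang, out_path))
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing clip
    return commands


if __name__ == "__main__":
    for cmd in render_batch(JOBS, Path("clips")):
        print(" ".join(cmd))
```

The default dry run only prints the commands, so you can review them before anything is synthesized.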

Voices

10 built-in voices: F1–F5 (female), M1–M5 (male).

Voice cloning: Record a short clip → upload it to Voice Builder → export the JSON → load it with get_voice_style_from_path().

See references/voices.md for voice descriptions and the Voice Builder workflow.
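A caller that accepts either a built-in voice name or a cloned voice can choose between the two loaders named above, get_voice_style and get_voice_style_from_path. This is a sketch. It assumes get_voice_style_from_path() takes the JSON path as its single argument, because the exact signature is not documented here. load_voice_style is a hypothetical helper.

```python
from pathlib import Path
from typing import Any

# The 10 built-in voices: F1–F5 (female) and M1–M5 (male).
BUILT_IN_VOICES = frozenset(f"{g}{n}" for g in "FM" for n in range(1, 6))


def load_voice_style(tts: Any, voice: str) -> Any:
    """Return a style for a built-in voice name or a Voice Builder JSON export."""
    if voice.upper() in BUILT_IN_VOICES:
        return tts.get_voice_style(voice_name=voice.upper())
    path = Path(voice).expanduser()
    if path.suffix.lower() == ".json" and path.is_file():
        # Assumption: the path is passed as the only positional argument.
        return tts.get_voice_style_from_path(str(path))
    raise ValueError(
        f"unknown voice {voice!r}: use one of {sorted(BUILT_IN_VOICES)} "
        "or the path to an exported .json file"
    )
```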

Expression Tags

⚠️ Mostly non-functional in practice

Supertonic accepts inline self-closing tags, but only the laughter tag has been user-verified to produce a clearly audible expression (a laughter burst). The breathing and sighing tags may insert minor pauses, but they are not confirmed to produce audible breathing or sighing sounds.

Do not rely on tags for expression. Many other tags were tested, and none produced an audible effect.

Correct syntax (self-closing, inline):

text = "You did it I am so proud."
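Unverified tags fail silently, so it can help to list or strip them before synthesis. That way the text does not depend on them. This sketch assumes tags are written as <name/>, optionally with a space before the slash. The tag names used in the example and tests are placeholders, not real Supertonic tags.

```python
import re

# Inline self-closing tags such as <name/> or <name />.
SELF_CLOSING_TAG = re.compile(r"<\s*([A-Za-z_][\w-]*)\s*/>")


def find_tags(text: str) -> list[str]:
    """Names of all self-closing tags in text, in order of appearance."""
    return SELF_CLOSING_TAG.findall(text)


def strip_tags(text: str, keep: frozenset = frozenset()) -> str:
    """Drop tags whose names are not in keep, then collapse leftover spaces."""
    def replace(match: re.Match) -> str:
        return match.group(0) if match.group(1) in keep else " "

    cleaned = SELF_CLOSING_TAG.sub(replace, text)
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    sample = "Great news <foo/> we shipped <bar />!"  # placeholder tag names
    print(find_tags(sample))  # ['foo', 'bar']
    print(strip_tags(sample, keep=frozenset({"foo"})))
```

To keep only the verified laughter tag, pass its name in `keep`.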

Reliable alternative for emotion: explicit language + speed modulation:

Emotion   Technique
Happy     Upbeat words + speed=1.1
Sad       Subdued words + speed=0.85
Excited   Exclamations + speed=1.15
Urgent    Short imperatives + speed=1.2

See references/expression-tags.md for full test results.
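You can encode the speed half of the emotion table as a lookup. This is a sketch with two choices to note. It treats the table values as multipliers on a base speed, which is an assumption; when base is 1.0 the result equals the table values. It also clamps the result to the documented 0.7–2.0 speed range. The word-choice half still has to come from the text itself.

```python
# Speed values from the emotion table above (relative to the default 1.0).
EMOTION_SPEED = {
    "happy": 1.1,
    "sad": 0.85,
    "excited": 1.15,
    "urgent": 1.2,
}

SPEED_MIN, SPEED_MAX = 0.7, 2.0  # documented speed range


def speed_for(emotion: str, base: float = 1.0) -> float:
    """Speed for an emotion; unknown emotions keep the base speed."""
    factor = EMOTION_SPEED.get(emotion.strip().lower(), 1.0)
    return min(SPEED_MAX, max(SPEED_MIN, base * factor))


if __name__ == "__main__":
    for emotion in ("happy", "sad", "urgent", "neutral"):
        print(emotion, speed_for(emotion))
```

For example: `tts.synthesize(text=..., lang="en", voice_style=style, speed=speed_for("sad"))`.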

Parameters

Param             Range               Default  What It Does
total_steps       5–12                8        Quality vs. speed tradeoff
speed             0.7–2.0             1.0      Speech rate multiplier
max_chunk_length  any                 300      Break long text into chunks (120 for Korean)
silence_duration  any                 0.3      Pause between chunks (seconds)
lang              ISO 639-1 or "na"   "en"     "na" = language-agnostic auto-detect
verbose           True/False          False    Show detailed progress

Languages

31 languages + na (language-agnostic auto-detect). See references/languages.md for all codes.
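The max_chunk_length and silence_duration parameters mean that long inputs are split into chunks with pauses between them. The SDK does this itself. The sketch below only previews roughly how many chunks, and how much inserted silence, to expect. Its sentence splitting is an assumption and is not Supertonic's actual algorithm. chunk_text and total_pause_seconds are hypothetical names.

```python
import re
from typing import Optional

DEFAULT_MAX_CHUNK = 300  # default max_chunk_length
KOREAN_MAX_CHUNK = 120   # value the table suggests for Korean

# Sentence boundaries: whitespace after . ! ? (so "3.14" is not split), or
# directly after the last of a run of full-width 。！？ marks.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])(?![。！？])\s*")


def chunk_text(text: str, lang: str = "en", max_len: Optional[int] = None) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_len characters."""
    if max_len is None:
        max_len = KOREAN_MAX_CHUNK if lang == "ko" else DEFAULT_MAX_CHUNK
    if max_len < 1:
        raise ValueError("max_len must be positive")
    sentences = [s.strip() for s in _SENTENCE_BREAK.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single over-long sentence is cut at the character limit.
        while len(sentence) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:].lstrip()
        # Sentences are re-joined with one space (this also adds a space between CJK sentences).
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_len:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def total_pause_seconds(num_chunks: int, silence_duration: float = 0.3) -> float:
    """Silence added between chunks: one gap fewer than the number of chunks."""
    return max(0, num_chunks - 1) * silence_duration
```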

Output format: 44.1kHz 16-bit mono WAV
Returns: (wav_array, duration_array)
wav.shape = (1, num_samples)
duration[0] = length in seconds

Multi-runtime Deployment

Supertonic runs across: Python, Node.js, Browser (WebGPU), Java, C++, C#, Go, Swift, iOS, Rust, Flutter.
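The sketch below shows the Python output format in concrete terms. It writes a wav array of shape (1, num_samples) as a 44.1kHz 16-bit mono WAV using only the standard library. It assumes wav holds floating-point samples in [-1, 1]; if the array already holds int16 values, skip the scaling. write_wav_16bit_mono is a hypothetical name and does not replace the SDK's save_audio.

```python
import array
import sys
import wave
from typing import Iterable

SAMPLE_RATE = 44_100  # documented output rate (44.1kHz)


def write_wav_16bit_mono(samples: Iterable[float], path: str,
                         sample_rate: int = SAMPLE_RATE) -> float:
    """Write float samples in [-1, 1] as 16-bit mono PCM; return the duration in seconds."""
    # Clip to [-1, 1], then scale to the signed 16-bit range.
    pcm = array.array("h", (round(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(pcm.tobytes())
    return len(pcm) / sample_rate
```

Example usage: `seconds = write_wav_16bit_mono(wav[0], "output.wav")`. Here `wav[0]` selects the single row of the (1, num_samples) array. You can compare the returned value with `duration[0]`.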

Scripts

scripts/synthesize.py: CLI for quick text-to-speech (supports custom voices)
scripts/list_voices.py: Available voices and metadata

References

references/voices.md: Voice descriptions, selection guide, Voice Builder workflow
references/expression-tags.md: All tags, examples, caveats
references/languages.md: Supported language codes
references/deployment.md: Multi-runtime deployment options
