Local Voice (FluidAudio TTS/STT)
v3
Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Sub-second voice synthesis and transcription running entirely on-device via the Apple Neural Engine. Use when setting up local voice capabilities, integrating a voice assistant, or replacing cloud TTS/STT services.
Sub-second local voice AI for Apple Silicon Macs using FluidAudio's CoreML models.
Features
- TTS: Kokoro model with 54 voices, ~0.6-0.8s latency
- STT: Parakeet TDT v3, ~0.2-0.3s latency, 25 languages
- 100% local: No cloud, no cost, works offline
- Neural Engine: Runs on Apple's ANE for efficiency
Requirements
- macOS 14+ on Apple Silicon (M1/M2/M3/M4)
- Swift 5.9+
- espeak-ng (for TTS phoneme fallback)
Quick Setup
- Install Dependencies
- Build the Daemon
- Install Binary and Framework
- Create LaunchAgent
launchctl load ~/Library/LaunchAgents/com.stella.tts.plist
API Endpoints
The daemon listens on http://127.0.0.1:18790:
TTS - Text to Speech
# Simple text to WAV
curl -X POST http://127.0.0.1:18790/synthesize -d "Hello world" -o output.wav
# With speed control (0.5-2.0)
curl -X POST "http://127.0.0.1:18790/synthesize?speed=1.2" -d "Fast!" -o output.wav
# JSON endpoint
curl -X POST http://127.0.0.1:18790/synthesize/json \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello", "speed": 1.0, "deEss": true}'
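The JSON endpoint above can also be called from code. The following is a minimal sketch of a request builder, assuming the body fields shown in the curl example (text, speed, deEss). The helper name buildSynthesizeRequest, the clamping of speed to the documented 0.5-2.0 range, and the default of deEss to true are illustrative assumptions, not part of the daemon. The shape of the endpoint's response is not documented here, so the sketch stops at building the request.

```javascript
// Sketch only: buildSynthesizeRequest is a hypothetical helper, not part of the daemon.
const BASE_URL = "http://127.0.0.1:18790";

function buildSynthesizeRequest(text, { speed = 1.0, deEss = true } = {}) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new TypeError("text must be a non-empty string");
  }
  // Fall back to natural pace if speed is not a usable number (assumption).
  const requested = Number(speed);
  const safe = Number.isFinite(requested) ? requested : 1.0;
  // Clamp to the documented 0.5-2.0 range instead of sending an invalid value.
  const clamped = Math.min(2.0, Math.max(0.5, safe));
  return {
    url: `${BASE_URL}/synthesize/json`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, speed: clamped, deEss }),
    },
  };
}
```

The returned pair can be passed straight to fetch, for example `const { url, init } = buildSynthesizeRequest("Hello", { speed: 1.2 }); const res = await fetch(url, init);`.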
STT - Speech to Text
curl -X POST http://127.0.0.1:18790/transcribe \
  --data-binary @audio.wav \
  -H "Content-Type: audio/wav"
# Returns: {"text": "transcribed text"}
Health Check
curl http://127.0.0.1:18790/health
# Returns: ok
Voice Options
Default voice is af_sky. Change by modifying the source code.
Top Kokoro voices (American female):
- af_heart (A grade) - warm, natural
- af_bella (A-) - expressive
- af_sky (C-) - clear, light
All 54 voices: See references/VOICES.md
Expressiveness
Speed Control
- speed=0.8 → Calm, relaxed
- speed=1.0 → Natural pace
- speed=1.2 → Energetic, upbeat
Punctuation (automatic)
- ! → Excited tone
- ? → Rising intonation
- . → Neutral, falling
- ... → Pauses
SSML Tags
Kokoro Dr. 2024-01-15
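The speed presets above can be wrapped in a small lookup so callers pick a mood rather than a number. This is a sketch under assumptions: the mood names (calm, natural, energetic) and the helper synthesizeUrlForMood are hypothetical labels for the three documented speeds, and unknown moods fall back to natural pace.

```javascript
// Sketch only: mood names and synthesizeUrlForMood are hypothetical, chosen to
// mirror the three speed presets listed in this section.
const SPEED_BY_MOOD = new Map([
  ["calm", 0.8],
  ["natural", 1.0],
  ["energetic", 1.2],
]);

function synthesizeUrlForMood(mood, baseUrl = "http://127.0.0.1:18790") {
  // Unknown moods fall back to natural pace (assumption).
  const speed = SPEED_BY_MOOD.get(mood) ?? 1.0;
  const url = new URL("/synthesize", baseUrl);
  url.searchParams.set("speed", String(speed));
  return url.toString();
}
```

The resulting URL is used the same way as in the curl examples: POST the text as the request body and save the response as a WAV file.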
Helper Script
See scripts/stella-tts.sh for a convenient wrapper:
scripts/stella-tts.sh "Hello world" output.wav
scripts/stella-tts.sh "Hello world" output.mp3 # Auto-converts
Integration Example
For voice assistants, update your voice proxy to use the local endpoints:
// STT
const response = await fetch('http://127.0.0.1:18790/transcribe', {
  method: 'POST',
  headers: { 'Content-Type': 'audio/wav' },
  body: audioData
});
const { text } = await response.json();
// TTS
const audio = await fetch('http://127.0.0.1:18790/synthesize', {
  method: 'POST',
  body: textToSpeak
});
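The integration snippet does not check whether the daemon returned an error. A slightly more defensive sketch is shown below, assuming the daemon signals failures through non-2xx HTTP status codes (not confirmed by this document). The function names, the VOICE_DAEMON_URL constant, and the injectable fetchImpl parameter are hypothetical, the last one added only so the code can be exercised without a running daemon.

```javascript
// Sketch only: names below are hypothetical and not part of the daemon or its API.
const VOICE_DAEMON_URL = "http://127.0.0.1:18790";

async function postOrThrow(fetchImpl, path, init) {
  const response = await fetchImpl(`${VOICE_DAEMON_URL}${path}`, { method: "POST", ...init });
  // Assumes failures surface as non-2xx statuses.
  if (!response.ok) {
    throw new Error(`${path} failed with HTTP ${response.status}`);
  }
  return response;
}

// STT: send WAV bytes, return the transcribed text.
async function transcribe(audioData, fetchImpl = fetch) {
  const response = await postOrThrow(fetchImpl, "/transcribe", {
    headers: { "Content-Type": "audio/wav" },
    body: audioData,
  });
  const { text } = await response.json();
  return text;
}

// TTS: send plain text, return the audio bytes as an ArrayBuffer.
async function synthesize(textToSpeak, fetchImpl = fetch) {
  const response = await postOrThrow(fetchImpl, "/synthesize", { body: textToSpeak });
  return response.arrayBuffer();
}
```

A voice proxy could then chain the two calls, for example `const heard = await transcribe(audioData); const wav = await synthesize(replyFor(heard));`, where replyFor stands in for whatever produces the assistant's reply.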
Troubleshooting
Library not loaded (ESpeakNG)
- Ensure ESpeakNG.framework is in the same directory as the binary
- Run install_name_tool -add_rpath @executable_path /path/to/binary
Slow first request
- The first request loads the models (~8-10s)
- Subsequent requests are sub-second
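Because the first request is much slower than later ones, a client with a single short timeout may give up while the models are still loading. One way to handle this is a two-tier timeout, sketched below. The budgets (20 s cold, 3 s warm) are assumptions derived from the ~8-10s and sub-second figures above, and createTimeoutPolicy and synthesizeWithBudget are hypothetical helpers. The sketch also assumes a single warm flag is enough, although the TTS and STT models may load independently.

```javascript
// Sketch only: budgets and helper names are assumptions, not daemon settings.
const COLD_TIMEOUT_MS = 20000; // headroom over the ~8-10 s model load
const WARM_TIMEOUT_MS = 3000;  // generous for sub-second warm requests

function createTimeoutPolicy() {
  let warmed = false;
  return {
    // Timeout to apply to the next request.
    nextTimeoutMs: () => (warmed ? WARM_TIMEOUT_MS : COLD_TIMEOUT_MS),
    // Call once a request has succeeded, meaning the models are loaded.
    markWarm: () => { warmed = true; },
  };
}

async function synthesizeWithBudget(policy, text, fetchImpl = fetch) {
  const response = await fetchImpl("http://127.0.0.1:18790/synthesize", {
    method: "POST",
    body: text,
    // Aborts the request if it exceeds the current budget.
    signal: AbortSignal.timeout(policy.nextTimeoutMs()),
  });
  if (response.ok) policy.markWarm();
  return response;
}
```

An alternative design is to send one throwaway synthesize request when the proxy starts, so that the model load happens before any user-facing request.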
x86 vs ARM
- Must build and run natively on ARM64 (not under Rosetta)
- Check with uname -m (should show arm64)
Source Code
The daemon source is in the sources/ directory. It's a Swift package using:
- FluidAudio (TTS + STT models)
- Hummingbird (HTTP server)
Rebuild after modifying:
cd sources && swift build -c release