Local Voice (FluidAudio TTS/STT)


Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Sub-second voice synthesis and transcription running entirely on-device via the Apple Neural Engine. Use when setting up local voice capabilities, voice assistant integration, or replacing cloud TTS/STT services.


by @trondw (Trond Wuellner)·MIT-0


License

MIT-0

Free to use, modify, and redistribute; no attribution required.


Runtime Dependencies

No special dependencies

Install Command

Official: npx clawhub@latest install local-voice

Mirror: npx clawhub@latest install local-voice --registry https://cn.longxiaskill.com


Skill Documentation

Local Voice (FluidAudio TTS/STT)

Sub-second local voice AI for Apple Silicon Macs using FluidAudio's CoreML models.

Features

TTS: Kokoro model with 54 voices, ~0.6-0.8s latency
STT: Parakeet TDT v3, ~0.2-0.3s latency, 25 languages
100% local: No cloud, no cost, works offline
Neural Engine: Runs on Apple's ANE for efficiency

Requirements

macOS 14+ on Apple Silicon (M1/M2/M3/M4)
Swift 5.9+
espeak-ng (for TTS phoneme fallback)

Quick Setup

Install Dependencies

brew install espeak-ng

Build the Daemon

cd /path/to/skill/sources
swift build -c release

Install Binary and Framework

mkdir -p ~/clawd/bin
cp .build/release/StellaVoice ~/clawd/bin/
cp -R .build/arm64-apple-macosx/release/ESpeakNG.framework ~/clawd/bin/
install_name_tool -add_rpath @executable_path ~/clawd/bin/StellaVoice

Create LaunchAgent

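# EOF is left unquoted so the shell expands $HOME; launchd does not expand variables inside a plist.
cat > ~/Library/LaunchAgents/com.stella.tts.plist << EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.stella.tts</string>
    <key>ProgramArguments</key>
    <array>
        <string>$HOME/clawd/bin/StellaVoice</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>$HOME/.clawdbot/logs/stella-tts.log</string>
    <key>StandardErrorPath</key>
    <string>$HOME/.clawdbot/logs/stella-tts.err.log</string>
</dict>
</plist>
EOF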

launchctl load ~/Library/LaunchAgents/com.stella.tts.plist

API Endpoints

The daemon listens on http://127.0.0.1:18790:

TTS - Text to Speech

# Simple text to WAV
curl -X POST http://127.0.0.1:18790/synthesize -d "Hello world" -o output.wav

# With speed control (0.5-2.0)
curl -X POST "http://127.0.0.1:18790/synthesize?speed=1.2" -d "Fast!" -o output.wav

# JSON endpoint
curl -X POST http://127.0.0.1:18790/synthesize/json \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello", "speed": 1.0, "deEss": true}'

STT - Speech to Text

curl -X POST http://127.0.0.1:18790/transcribe \
  --data-binary @audio.wav \
  -H "Content-Type: audio/wav"
# Returns: {"text": "transcribed text"}

Health Check

curl http://127.0.0.1:18790/health
# Returns: ok
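The TTS endpoints above can also be called from code. The following is a minimal sketch in JavaScript that only builds the request arguments, so it can be read and tried without a running daemon. The helper names (BASE_URL, clampSpeed, synthesizeUrl, jsonSynthesizeRequest) are hypothetical and not part of the daemon; only the port, paths, and JSON field names come from the curl examples. Clamping to 0.5-2.0 is done on the client because this document does not say how the daemon treats out-of-range speeds.

```javascript
// Hypothetical helpers; only the port, paths, and field names come from the docs.
const BASE_URL = "http://127.0.0.1:18790";

// Clamp speed to the documented 0.5-2.0 range.
function clampSpeed(speed) {
  return Math.min(2.0, Math.max(0.5, speed));
}

// Build the URL for the plain-text /synthesize endpoint.
function synthesizeUrl(speed) {
  if (speed === undefined) return `${BASE_URL}/synthesize`;
  return `${BASE_URL}/synthesize?speed=${clampSpeed(speed)}`;
}

// Build [url, init] arguments for fetch() against /synthesize/json.
function jsonSynthesizeRequest(text, { speed = 1.0, deEss = true } = {}) {
  return [
    `${BASE_URL}/synthesize/json`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, speed: clampSpeed(speed), deEss }),
    },
  ];
}
```

With the daemon running, fetch(...jsonSynthesizeRequest("Hello")) would send the same request as the JSON curl example.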

Voice Options

Default voice is af_sky. Change by modifying the source code.

Top Kokoro voices (American female):

af_heart (A grade) - warm, natural
af_bella (A-) - expressive
af_sky (C-) - clear, light

All 54 voices: See references/VOICES.md

Expressiveness

Speed Control

speed=0.8 → Calm, relaxed
speed=1.0 → Natural pace
speed=1.2 → Energetic, upbeat

Punctuation (automatic)

! → Excited tone
? → Rising intonation
. → Neutral, falling
... → Pauses

SSML Tags

Kokoro
Dr.
2024-01-15
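The speed values listed under Speed Control can be kept in one lookup so callers ask for a tone rather than a number. This is a small sketch; the mood names and the function names speedForMood and urlForMood are hypothetical, and only the three speed values and the ?speed= query parameter come from this document.

```javascript
// Hypothetical mood names mapped to the documented speed values.
const SPEED_BY_MOOD = new Map([
  ["calm", 0.8],
  ["natural", 1.0],
  ["energetic", 1.2],
]);

// Return the speed for a mood, falling back to the natural pace.
function speedForMood(mood) {
  return SPEED_BY_MOOD.get(mood) ?? 1.0;
}

// Build the /synthesize URL for a given mood.
function urlForMood(mood) {
  return `http://127.0.0.1:18790/synthesize?speed=${speedForMood(mood)}`;
}
```

An unknown mood falls back to speed 1.0 instead of raising an error, which keeps a voice pipeline talking even when a caller passes an unexpected label.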

Helper Script

See scripts/stella-tts.sh for a convenient wrapper:

scripts/stella-tts.sh "Hello world" output.wav
scripts/stella-tts.sh "Hello world" output.mp3  # Auto-converts

Integration Example

For voice assistants, update your voice proxy to use local endpoints:

// STT
const response = await fetch('http://127.0.0.1:18790/transcribe', {
  method: 'POST',
  headers: { 'Content-Type': 'audio/wav' },
  body: audioData
});
const { text } = await response.json();

// TTS
const audio = await fetch('http://127.0.0.1:18790/synthesize', {
  method: 'POST',
  body: textToSpeak
});
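The two calls above can be chained into one listen-and-answer step. The sketch below is one possible arrangement, not the author's proxy: voiceRoundTrip, reply, and fetchImpl are hypothetical names, and the code assumes a runtime with a global fetch (Node 18+ or a browser). The fetch function is passed in as a parameter so the flow can be exercised with a stub when the daemon is not running.

```javascript
const BASE_URL = "http://127.0.0.1:18790";

// Transcribe WAV bytes, pass the text through `reply`, then synthesize the answer.
// `reply` is any (possibly async) function from heard text to the text to speak.
async function voiceRoundTrip(wavBytes, reply, fetchImpl = fetch) {
  const sttRes = await fetchImpl(`${BASE_URL}/transcribe`, {
    method: "POST",
    headers: { "Content-Type": "audio/wav" },
    body: wavBytes,
  });
  if (!sttRes.ok) throw new Error(`transcribe failed: HTTP ${sttRes.status}`);
  const { text } = await sttRes.json();

  const answer = await reply(text);

  const ttsRes = await fetchImpl(`${BASE_URL}/synthesize`, {
    method: "POST",
    body: answer,
  });
  if (!ttsRes.ok) throw new Error(`synthesize failed: HTTP ${ttsRes.status}`);

  // The response body is the WAV audio produced by the daemon.
  return { heard: text, spoken: answer, wav: await ttsRes.arrayBuffer() };
}
```

Checking res.ok before reading the body surfaces daemon errors as exceptions instead of handing a non-audio body to the player.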

Troubleshooting

Library not loaded (ESpeakNG)

Ensure ESpeakNG.framework is in the same directory as the binary
Run install_name_tool -add_rpath @executable_path /path/to/binary

Slow first request

First request loads models (~8-10s)
Subsequent requests are sub-second
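One way to hide the model-loading delay from users is to send a throwaway request when the assistant starts. The sketch below assumes a runtime with a global fetch; warmUp and fetchImpl are hypothetical names. It warms only the TTS path, since this document does not say whether the TTS and STT models load together.

```javascript
// Send one short synthesis request and discard the audio.
// Returns the elapsed time in milliseconds so the caller can log it.
async function warmUp(fetchImpl = fetch) {
  const started = Date.now();
  const res = await fetchImpl("http://127.0.0.1:18790/synthesize", {
    method: "POST",
    body: "Warm up.",
  });
  if (!res.ok) throw new Error(`warm-up failed: HTTP ${res.status}`);
  await res.arrayBuffer(); // Read the body fully, then drop it.
  return Date.now() - started;
}
```

Calling warmUp() once at startup moves the one-time load cost ahead of the first real request.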

x86 vs ARM

Must build and run on ARM64 native (not Rosetta)
Check with uname -m (should show arm64)

Source Code

The daemon source is in the sources/ directory. It's a Swift package using:

FluidAudio (TTS + STT models)
Hummingbird (HTTP server)

Rebuild after modifying:

cd sources && swift build -c release
