
🎙️ macOS Local Voice (Skill)

v1.0.0

[Auto-translated] Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully of...

by @strrl (STRRL)·MIT-0
Last updated
2026/4/11
Security scan
VirusTotal: Suspicious
OpenClaw: Safe (high confidence)
The skill's code, instructions, and required binaries line up with its stated purpose (local macOS STT/TTS) and do not request unrelated credentials or network access.
Assessment recommendations
This skill appears coherent and local-only, but review these before installing: 1) Install yap and ffmpeg from trusted package sources (Homebrew/taps shown in README). 2) The scripts invoke local system commands (yap, say, osascript, ffmpeg) — ensure you are comfortable granting the skill the ability to execute those on your Mac. 3) Generated audio is saved under ~/.openclaw/media/outbound and the SKILL suggests using the agent 'message' tool to send it — verify recipients before sending sensiti...
Detailed analysis
Purpose and capabilities
Name/description (local macOS STT/TTS) match the actual requirements and behavior: required binaries are yap, say, and osascript (all relevant), code uses Apple Speech.framework via yap and AVFoundation via osascript, and ffmpeg is optional for format conversion.
Instruction scope
SKILL.md directs the agent to run the included scripts and to use local system settings for downloading voices. The instructions do not ask the agent to read unrelated files, export environment secrets, or contact remote endpoints. It does reference the agent 'message' tool for sending generated audio (expected for delivering voice notes).
Install mechanism
This is instruction-only with included scripts (no install spec). The README suggests installing yap/ffmpeg via Homebrew — a standard, traceable source. No downloads from arbitrary URLs or archive extraction are present.
Credential requirements
No environment variables or credentials are requested. The scripts use HOME to write output under ~/.openclaw/media/outbound (reasonable for generated media). There are no requests for unrelated secrets or config paths.
Persistence and permissions
Skill is not forced always-on and does not modify other skills or system-wide settings. It only creates per-user media files in a reasonable path and requires user action to download premium voices.
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.0 · 2026/2/12

Initial release: Node.js rewrite. STT (yap) + TTS (say) + voice detection (JXA/AVFoundation). Fully offline, no API keys.

● Suspicious

Install command

Official: npx clawhub@latest install macos-local-voice
Mirror: npx clawhub@latest install macos-local-voice --registry https://cn.clawhub-mirror.com

Skill documentation

Fully local speech-to-text (STT) and text-to-speech (TTS) on macOS. No API keys, no network, no cloud. All processing happens on-device.

Requirements

  • macOS (Apple Silicon recommended, Intel works too)
  • yap CLI in PATH — install via brew install finnvoor/tools/yap
  • ffmpeg in PATH (optional, needed for ogg/opus output) — brew install ffmpeg
  • say and osascript are macOS built-in

Speech-to-Text (STT)

Transcribe an audio file to text using Apple's on-device speech recognition.

node {baseDir}/scripts/stt.mjs <audio_file> [locale]
  • audio_file: path to audio (ogg, m4a, mp3, wav, etc.)
  • locale: optional, e.g. zh_CN, en_US, ja_JP. If omitted, uses system default.
  • Outputs transcribed text to stdout.
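
A minimal sketch of how a caller might assemble the STT invocation from Node.js. The helper name `buildSttCommand` and the example paths are hypothetical; only the script path and the argument order (audio file, then optional locale) come from the usage described above.

```javascript
// Hypothetical helper: builds the argv for scripts/stt.mjs.
// Only the argument order (audio file, then optional locale) is taken from the skill docs.
function buildSttCommand(baseDir, audioFile, locale) {
  if (!audioFile) {
    throw new Error("audio_file is required");
  }
  const args = [`${baseDir}/scripts/stt.mjs`, audioFile];
  // Omitting the locale lets the script fall back to the system default.
  if (locale) {
    args.push(locale);
  }
  return { command: "node", args };
}

// Example: transcribe a voice note as Simplified Chinese (paths are placeholders).
const stt = buildSttCommand("/path/to/skill", "note.ogg", "zh_CN");
console.log(stt.command, stt.args.join(" "));
```

The result can be handed to `execFileSync(stt.command, stt.args, { encoding: "utf8" })` from `node:child_process`; the transcript is whatever the script writes to stdout.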

Supported STT locales

Use node {baseDir}/scripts/stt.mjs --locales to list all supported locales.

Key locales: en_US, en_GB, zh_CN, zh_TW, zh_HK, ja_JP, ko_KR, fr_FR, de_DE, es_ES, pt_BR, ru_RU, vi_VN, th_TH.

Language detection tips

  • If the user's recent messages are in Chinese → use zh_CN
  • If in English → use en_US
  • If mixed or unclear → try without locale (system default)
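
The tips above could be approximated with a simple character-count heuristic. This is only a sketch: the function name `pickSttLocale` and the 80% threshold are assumptions, and Han characters also occur in Japanese text, so a real implementation may need more signals.

```javascript
// Hypothetical heuristic: choose an STT locale from recent user messages.
// Returns "zh_CN", "en_US", or undefined (meaning: omit the locale argument).
function pickSttLocale(recentMessages) {
  const text = recentMessages.join(" ");
  // Count Han characters and ASCII letters; everything else is ignored.
  const han = (text.match(/\p{Script=Han}/gu) || []).length;
  const latin = (text.match(/[A-Za-z]/g) || []).length;
  const total = han + latin;
  if (total === 0) return undefined;
  // Assumption: an 80% majority counts as a clear signal; anything less is "mixed".
  if (han / total >= 0.8) return "zh_CN";
  if (latin / total >= 0.8) return "en_US";
  return undefined;
}

console.log(pickSttLocale(["今天天气怎么样"]));              // zh_CN
console.log(pickSttLocale(["What's the weather?"]));         // en_US
console.log(pickSttLocale(["帮我查一下 weather forecast"])); // undefined
```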

Text-to-Speech (TTS)

Convert text to an audio file using macOS native TTS.

node {baseDir}/scripts/tts.mjs "<text>" [voice_name] [output_path]
  • text: the text to speak
  • voice_name: optional, e.g. Yue (Premium), Tingting, Ava (Premium). If omitted, auto-selects the best available voice based on text language.
  • output_path: optional, defaults to a timestamped file in ~/.openclaw/media/outbound/
  • Outputs the generated audio file path to stdout.
  • If ffmpeg is available, output is ogg/opus (ideal for messaging platforms). Otherwise aiff.
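
A sketch of building the TTS invocation as an argument array, which sidesteps shell quoting when the text contains spaces or quotes. The helper `buildTtsCommand` is hypothetical, and it assumes the arguments are purely positional, so an output path cannot be given without a voice name before it; the actual script may behave differently.

```javascript
// Hypothetical helper: builds the argv for scripts/tts.mjs.
function buildTtsCommand(baseDir, text, voiceName, outputPath) {
  if (!text) {
    throw new Error("text is required");
  }
  // Assumption: arguments are purely positional, so an output path
  // cannot be passed without a voice name in front of it.
  if (outputPath && !voiceName) {
    throw new Error("output_path requires voice_name (positional arguments)");
  }
  const args = [`${baseDir}/scripts/tts.mjs`, text];
  if (voiceName) args.push(voiceName);
  if (outputPath) args.push(outputPath);
  return { command: "node", args };
}

// Example: the text is passed as a single argv entry, no manual quoting needed.
const tts = buildTtsCommand("/path/to/skill", 'She said "hi"', "Ava (Premium)");
console.log(tts.args);
```

When run through `execFileSync` from `node:child_process`, the script's stdout should contain the path of the generated audio file.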

Sending as voice note

After generating the audio file, send it using the message tool:

message action=send media=<audio_file_path> asVoice=true

Voice Management

List available voices, check readiness, or find the best voice for a language:

node {baseDir}/scripts/voices.mjs list [locale]     # List voices, optionally filter by locale
node {baseDir}/scripts/voices.mjs check "<voice_name>"     # Check if a specific voice is downloaded and ready
node {baseDir}/scripts/voices.mjs best <locale>        # Get the highest quality voice for a locale

Quality levels

  • 1 = compact (low quality, always available)
  • 2 = enhanced (mid quality, may need download)
  • 3 = premium (highest quality, needs download from System Settings)
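
To illustrate how the quality levels might drive voice selection, here is a small sketch. The record shape (`name`, `locale`, `quality`) and the sample values are assumptions made for illustration; the real output format of voices.mjs is not documented here and may differ.

```javascript
// Quality levels as listed in the skill docs.
const QUALITY_LABELS = { 1: "compact", 2: "enhanced", 3: "premium" };

// Hypothetical selection logic over hypothetical voice records.
function pickBestVoice(voices, locale) {
  const candidates = voices.filter((v) => v.locale === locale);
  if (candidates.length === 0) return undefined;
  // Highest quality wins; ties keep the first voice in list order.
  return candidates.reduce((best, v) => (v.quality > best.quality ? v : best));
}

// Illustrative sample data only, not real voices.mjs output.
const sampleVoices = [
  { name: "Tingting", locale: "zh_CN", quality: 1 },
  { name: "Yue (Premium)", locale: "zh_CN", quality: 3 },
  { name: "Ava (Premium)", locale: "en_US", quality: 3 },
];
const bestVoice = pickBestVoice(sampleVoices, "zh_CN");
console.log(bestVoice.name, QUALITY_LABELS[bestVoice.quality]);
```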

If a voice is not available

Tell the user: "Voice X is not downloaded. Go to System Settings → Accessibility → Spoken Content → System Voice → Manage Voices to download it."

Notes

  • The say command silently falls back to a default voice if the requested voice is not available (exit code 0, no error). Always use voices.mjs check before calling tts.mjs with a specific voice name.
  • Premium voices (e.g. Yue (Premium), Ava (Premium)) sound significantly better but must be manually downloaded by the user.
  • Siri voices are not accessible via the speech synthesis API.
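
Because `say` reports success even when it falls back to a default voice, a caller may want a guard that checks the voice before synthesizing. The sketch below is an assumption-laden outline: `speakWithVoice`, `isVoiceReady`, and `synthesize` are hypothetical names, and the two callbacks stand in for thin wrappers around voices.mjs check and tts.mjs, whose exact output formats are not specified here.

```javascript
// Hypothetical guard: verify the voice first, because `say` exits with
// code 0 even when it silently falls back to a default voice.
// `isVoiceReady` and `synthesize` are supplied by the caller.
function speakWithVoice(text, voiceName, { isVoiceReady, synthesize }) {
  if (!isVoiceReady(voiceName)) {
    return {
      ok: false,
      message:
        `Voice ${voiceName} is not downloaded. Go to System Settings → ` +
        "Accessibility → Spoken Content → System Voice → Manage Voices to download it.",
    };
  }
  return { ok: true, path: synthesize(text, voiceName) };
}

// Stubbed dependencies for illustration only.
const stubs = {
  isVoiceReady: (name) => name === "Tingting",
  synthesize: (text, voice) => `/tmp/${voice}.ogg`,
};
const missing = speakWithVoice("Hello", "Ava (Premium)", stubs);
console.log(missing.ok, missing.message);
```

Injecting the two callbacks keeps the guard independent of how the scripts report readiness, so it can be exercised without a Mac or installed voices.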

Data source: ClawHub · Chinese localization: 龙虾技能库