Qwen3-Audio
Overview
Qwen3-Audio is a high-performance audio processing library optimized for Apple Silicon (M1/M2/M3/M4). It delivers fast, efficient text-to-speech (TTS) and speech-to-text (STT) with support for multiple models, languages, and audio formats.
Prerequisites
Python 3.10+
Apple Silicon Mac (M1/M2/M3/M4)
Environment checks
Before using any capability, verify that all items in ./references/env-check-list.md are complete.
Capabilities
Text to Speech
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav"
Returns (JSON):
{ "audio_path": "/path_to_save.wav", "duration": 1.234, "sample_rate": 24000 }
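When the command is driven from another program, the call and its JSON result can be wrapped in a small helper. The following is a minimal sketch, not part of the skill itself: the function names are hypothetical, and it assumes the script prints only the JSON object shown above to stdout.

```python
import json
import subprocess

# Paths copied from the commands in this document; adjust to your checkout.
PYTHON = ".venv/bin/python"
SCRIPT = "./scripts/mlx-audio.py"


def build_tts_command(text, output, **options):
    """Build the argv list for the `tts` subcommand.

    Extra keyword arguments become `--name value` flags,
    for example ref_voice="abc12345" becomes `--ref_voice abc12345`.
    """
    cmd = ["uv", "run", "--python", PYTHON, SCRIPT, "tts",
           "--text", text, "--output", output]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def parse_tts_result(stdout):
    """Parse the result; assumes stdout contains only the JSON object."""
    result = json.loads(stdout)
    missing = {"audio_path", "duration", "sample_rate"} - set(result)
    if missing:
        raise ValueError(f"unexpected TTS result, missing keys: {sorted(missing)}")
    return result


def synthesize(text, output, **options):
    """Run the command and return the parsed result (needs the skill installed)."""
    proc = subprocess.run(build_tts_command(text, output, **options),
                          capture_output=True, text=True, check=True)
    return parse_tts_result(proc.stdout)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces.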
Voice Cloning
Clone any voice using a reference audio sample. Provide the wav file and its transcript:
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --ref_audio "sample_audio.wav" --ref_text "This is what my voice sounds like."
ref_audio: reference audio to clone
ref_text: transcript of the reference audio
Use Created Voice (Shortcut)
Use a voice created with the voice create command by passing its ID:
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --ref_voice "my-voice-id"
This automatically loads ref_audio and ref_text from the voice profile.
CustomVoice (Emotion Control)
Use predefined voices with emotion/style instructions:
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --speaker "Ryan" --language "English" --instruct "Very happy and excited."
VoiceDesign (Create Any Voice)
Create any voice from a text description:
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --language "English" --instruct "A cheerful young female voice with high pitch and energetic tone."
Automatic Speech Recognition (STT)
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" stt --audio "/sample_audio.wav" --output "/path_to_save.txt" --output-format srt
Test audio: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav
output-format: "txt" | "ass" | "srt" | "all"
Returns (JSON):
{ "text": "transcribed text content", "duration": 10.5, "sample_rate": 16000, "files": ["/path_to_save.txt", "/path_to_save.srt"] }
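The stt call can be prepared and its result unpacked in the same way. This is a sketch under stated assumptions: the helper names are hypothetical, the accepted format values are the four listed in this document, and the script is assumed to print only the JSON object to stdout.

```python
import json

# Paths copied from the commands in this document; adjust to your checkout.
PYTHON = ".venv/bin/python"
SCRIPT = "./scripts/mlx-audio.py"
OUTPUT_FORMATS = ("txt", "ass", "srt", "all")  # values listed in this document


def build_stt_command(audio, output, output_format):
    """Build the argv list for the `stt` subcommand, rejecting unknown formats early."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")
    return ["uv", "run", "--python", PYTHON, SCRIPT, "stt",
            "--audio", audio, "--output", output,
            "--output-format", output_format]


def transcript_and_files(stdout):
    """Return (text, files) from the result; assumes stdout contains only JSON."""
    result = json.loads(stdout)
    return result["text"], list(result.get("files", []))
```

The returned argv list can be passed to subprocess.run(..., capture_output=True, text=True, check=True), and the captured stdout to transcript_and_files.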
Voice Management
Voices are stored in the voices/ directory at the skill root level. Each voice has its own folder containing:
ref_audio.wav - Reference audio file
ref_text.txt - Reference text transcript
ref_instruct.txt - Voice style description
Create a Voice
Create a reusable voice profile using the VoiceDesign model. The --instruct parameter is required and describes the voice style:
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" voice create --text "This is a sample voice reference text." --instruct "A warm, friendly female voice with a professional tone." --language "English"
Optional: --id "my-voice-id" to specify a custom voice ID.
Returns (JSON):
{ "id": "abc12345", "ref_audio": "/path/to/skill/voices/abc12345/ref_audio.wav", "ref_text": "This is a sample voice reference text.", "instruct": "A warm, friendly female voice with a professional tone.", "duration": 3.456, "sample_rate": 24000 }
List Voices
List all created voice profiles:
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" voice list
Returns (JSON):
[ { "id": "abc12345", "ref_audio": "/path/to/skill/voices/abc12345/ref_audio.wav", "ref_text": "This is a sample voice reference text.", "instruct": "A warm, friendly female voice with a professional tone.", "duration": 3.456, "sample_rate": 24000 } ]
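Because each voice is a plain folder with the three files described under Voice Management, the profiles can also be inspected without calling the script. The sketch below only reads that layout; the function names are hypothetical, and the script's own voice list command may apply additional logic (for example, computing duration and sample_rate).

```python
from pathlib import Path


def list_voice_ids(voices_dir):
    """Return the IDs of all folders under voices_dir that contain ref_audio.wav."""
    root = Path(voices_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if (p / "ref_audio.wav").is_file())


def load_voice_profile(voices_dir, voice_id):
    """Read one voice folder into a dict that mirrors part of the JSON shown above."""
    folder = Path(voices_dir) / voice_id
    audio = folder / "ref_audio.wav"
    if not audio.is_file():
        raise FileNotFoundError(f"no ref_audio.wav for voice {voice_id!r}")

    def read_optional(name):
        # Text files are treated as optional here; that is an assumption.
        path = folder / name
        return path.read_text(encoding="utf-8").strip() if path.is_file() else None

    return {
        "id": voice_id,
        "ref_audio": str(audio),
        "ref_text": read_optional("ref_text.txt"),
        "instruct": read_optional("ref_instruct.txt"),
    }
```

A typical call would be load_voice_profile("voices", "abc12345"), with "voices" resolved relative to the skill root.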
Use a Created Voice
After creating a voice, use it for TTS with the --ref_voice parameter. The instruct will be automatically loaded:
uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "New text to speak" --output "/output.wav" --ref_voice "abc12345"
Predefined Speakers (CustomVoice)
For Qwen3-TTS-12Hz-1.7B/0.6B-CustomVoice models, the supported speakers and their descriptions are listed below. We recommend using each speaker's native language for best quality. Each speaker can still speak any language supported by the model.
Speaker | Voice Description | Native Language
Vivian | Bright, slightly edgy young female voice. | Chinese
Serena | Warm, gentle young female voice. | Chinese
Uncle_Fu | Seasoned male voice with a low, mellow timbre. | Chinese
Dylan | Youthful Beijing male voice with a clear, natural timbre. | Chinese (Beijing Dialect)
Eric | Lively Chengdu male voice with a slightly husky brightness. | Chinese (Sichuan Dialect)
Ryan | Dynamic male voice with strong rhythmic drive. | English
Aiden | Sunny American male voice with a clear midrange. | English
Ono_Anna | Playful Japanese female voice with a light, nimble timbre. | Japanese
Sohee | Warm Korean female voice with rich emotion. | Korean
Released Models
Model | Features | Language Support | Instruction Control | Q