Qwen3 Audio

v0.1.1

High-performance audio library for Apple Silicon with text-to-speech (TTS) and speech-to-text (STT).


by @darknoah (noah)·MIT-0


License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install qwen3-audio

Mirror: npx clawhub@latest install qwen3-audio --registry https://cn.longxiaskill.com

Skill documentation

Qwen3-Audio

Overview

Qwen3-Audio is a high-performance audio processing library optimized for Apple Silicon (M1/M2/M3/M4). It provides TTS and STT with support for multiple models, languages, and audio formats.

Prerequisites

Python 3.10+
Apple Silicon Mac (M1/M2/M3/M4)

Environment checks

Before using any capability, verify that every item in ./references/env-check-list.md is complete.
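The two prerequisites listed above can also be checked from Python. This is a minimal sketch, not the contents of the environment checklist (which is not reproduced here): the function name `check_prerequisites` is hypothetical, and it assumes a native (non-Rosetta) Python, where `platform.machine()` reports `arm64` on Apple Silicon.

```python
import platform
import sys


def check_prerequisites(version_info=None, system=None, machine=None):
    """Return a list of problems; an empty list means both prerequisites hold.

    The arguments default to the running interpreter and host, and exist
    only so the check can be exercised with other values.
    """
    if version_info is None:
        version_info = sys.version_info
    if system is None:
        system = platform.system()
    if machine is None:
        machine = platform.machine()

    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    # Assumption: native Python on Apple Silicon reports Darwin / arm64.
    if system != "Darwin" or machine != "arm64":
        problems.append("An Apple Silicon Mac is required")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

An empty result only covers the Python version and hardware; the checklist file may contain further items.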

Capabilities

Text to Speech

uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav"

Returns (JSON):

{ "audio_path": "/path_to_save.wav", "duration": 1.234, "sample_rate": 24000 }
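When calling the TTS command from another program, the JSON result can be parsed rather than read by eye. The sketch below is an illustration under assumptions: the helper names are hypothetical, the output flag is assumed to be spelled `--output`, and the script is assumed to print only the JSON object to stdout.

```python
import json
import subprocess

REQUIRED_KEYS = {"audio_path", "duration", "sample_rate"}


def build_tts_command(text, output_path, extra_args=()):
    """Build the argument list for the documented tts invocation."""
    return [
        "uv", "run", "--python", ".venv/bin/python",
        "./scripts/mlx-audio.py", "tts",
        "--text", text,
        "--output", output_path,  # assumed flag spelling
        *extra_args,
    ]


def parse_tts_result(stdout):
    """Parse the JSON result and confirm the documented keys are present."""
    result = json.loads(stdout)
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"missing keys in TTS result: {sorted(missing)}")
    return result


def run_tts(text, output_path, extra_args=()):
    """Run the command; assumes stdout contains nothing but the JSON object."""
    completed = subprocess.run(
        build_tts_command(text, output_path, extra_args),
        capture_output=True, text=True, check=True,
    )
    return parse_tts_result(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces.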

Voice Cloning

Clone any voice using a reference audio sample. Provide the wav file and its transcript:

uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --ref_audio "sample_audio.wav" --ref_text "This is what my voice sounds like."

ref_audio: reference audio to clone
ref_text: transcript of the reference audio
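Because cloning depends on both a readable WAV file and a transcript, it can help to sanity-check the pair before running the command. This sketch uses only Python's standard `wave` module; the function names are hypothetical, and the source states no minimum or maximum length for the reference audio, so only basic readability is checked.

```python
import wave


def write_silence(path, seconds, sample_rate=24000):
    """Write a mono 16-bit WAV of silence; used here only to demo the check."""
    frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * frames)


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path, transcript):
    """Return a list of problems with a ref_audio / ref_text pair."""
    problems = []
    if not transcript.strip():
        problems.append("ref_text is empty")
    try:
        duration = wav_duration_seconds(path)
    except (wave.Error, OSError, EOFError):
        problems.append("ref_audio is not a readable WAV file")
    else:
        if duration <= 0:
            problems.append("ref_audio contains no audio frames")
    return problems
```

For example, `check_reference("sample_audio.wav", "This is what my voice sounds like.")` returns an empty list when the file opens as a WAV and the transcript is non-empty.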

Use Created Voice (Shortcut)

Use a voice created with voice create by its ID:

uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --ref_voice "my-voice-id"

This automatically loads ref_audio and ref_text from the voice profile.

CustomVoice (Emotion Control)

Use predefined voices with emotion/style instructions:

uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --speaker "Ryan" --language "English" --instruct "Very happy and excited."

VoiceDesign (Create Any Voice)

Create any voice from a text description:

uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "hello world" --output "/path_to_save.wav" --language "English" --instruct "A cheerful young female voice with high pitch and energetic tone."

Automatic Speech Recognition (STT)

uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" stt --audio "/sample_audio.wav" --output "/path_to_save.txt" --output-format srt

Test audio: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav
output-format: "txt" | "ass" | "srt" | "all"

Returns (JSON):

{ "text": "transcribed text content", "duration": 10.5, "sample_rate": 16000, "files": ["/path_to_save.txt", "/path_to_save.srt"] }
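The `files` list in the STT result can be indexed by extension, and an `.srt` file can be split into timed cues for further processing. The sketch below is illustrative: both function names are hypothetical, and the parser assumes the conventional SubRip layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, text lines, blank line), which the source does not spell out.

```python
import json
import re
from pathlib import Path

_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def files_by_extension(stt_stdout):
    """Map each output file's extension (e.g. 'srt') to its path."""
    result = json.loads(stt_stdout)
    return {Path(p).suffix.lstrip("."): p for p in result.get("files", [])}


def _to_seconds(timestamp):
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    match = _TIMESTAMP.fullmatch(timestamp.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Return (start_seconds, end_seconds, text) tuples from SubRip content."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that is not a well-formed cue
        start, end = lines[1].split("-->")
        cues.append((_to_seconds(start), _to_seconds(end), "\n".join(lines[2:])))
    return cues
```

A caller could read the path returned under the `srt` key and pass the file's text to `parse_srt` to get cue boundaries in seconds.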

Voice Management

Voices are stored in the voices/ directory at the skill root level. Each voice has its own folder containing:

ref_audio.wav - Reference audio file
ref_text.txt - Reference text transcript
ref_instruct.txt - Voice style description

Create a Voice

Create a reusable voice profile using the VoiceDesign model. The --instruct parameter is required and describes the voice style:

uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" voice create --text "This is a sample voice reference text." --instruct "A warm, friendly female voice with a professional tone." --language "English"

Optional: --id "my-voice-id" to specify a custom voice ID.

Returns (JSON):

{ "id": "abc12345", "ref_audio": "/path/to/skill/voices/abc12345/ref_audio.wav", "ref_text": "This is a sample voice reference text.", "instruct": "A warm, friendly female voice with a professional tone.", "duration": 3.456, "sample_rate": 24000 }

List Voices

List all created voice profiles:

uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" voice list

Returns (JSON):

[ { "id": "abc12345", "ref_audio": "/path/to/skill/voices/abc12345/ref_audio.wav", "ref_text": "This is a sample voice reference text.", "instruct": "A warm, friendly female voice with a professional tone.", "duration": 3.456, "sample_rate": 24000 } ]
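A similar listing could be approximated by reading the voices/ directory directly, given the folder layout described under Voice Management. This is a sketch under assumptions, not the script's own implementation: the function names are hypothetical, the files are assumed to be UTF-8 text, and `duration` and `sample_rate` are omitted because they would require decoding the audio.

```python
from pathlib import Path


def load_voice(voice_dir):
    """Read one voice folder into a dict mirroring part of the listing output."""
    voice_dir = Path(voice_dir)

    def read_optional(name):
        # Return the stripped file contents, or None when the file is absent.
        path = voice_dir / name
        return path.read_text(encoding="utf-8").strip() if path.exists() else None

    return {
        "id": voice_dir.name,
        "ref_audio": str(voice_dir / "ref_audio.wav"),
        "ref_text": read_optional("ref_text.txt"),
        "instruct": read_optional("ref_instruct.txt"),
    }


def list_voices(voices_root):
    """Return a dict for every sub-folder that contains ref_audio.wav."""
    root = Path(voices_root)
    if not root.is_dir():
        return []
    return [
        load_voice(entry)
        for entry in sorted(root.iterdir())
        if entry.is_dir() and (entry / "ref_audio.wav").exists()
    ]
```

Calling `list_voices("voices")` from the skill root returns one entry per voice folder, sorted by folder name.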

Use a Created Voice

After creating a voice, use it for TTS with the --ref_voice parameter. The instruct will be automatically loaded:

uv run --python ".venv/bin/python" "./scripts/mlx-audio.py" tts --text "New text to speak" --output "/output.wav" --ref_voice "abc12345"

Predefined Speakers (CustomVoice)

For Qwen3-TTS-12Hz-1.7B/0.6B-CustomVoice models, the supported speakers and their descriptions are listed below. We recommend using each speaker's native language for best quality. Each speaker can still speak any language supported by the model.

| Speaker | Voice Description | Native Language |
| --- | --- | --- |
| Vivian | Bright, slightly edgy young female voice. | Chinese |
| Serena | Warm, gentle young female voice. | Chinese |
| Uncle_Fu | Seasoned male voice with a low, mellow timbre. | Chinese |
| Dylan | Youthful Beijing male voice with a clear, natural timbre. | Chinese (Beijing Dialect) |
| Eric | Lively Chengdu male voice with a slightly husky brightness. | Chinese (Sichuan Dialect) |
| Ryan | Dynamic male voice with strong rhythmic drive. | English |
| Aiden | Sunny American male voice with a clear midrange. | English |
| Ono_Anna | Playful Japanese female voice with a light, nimble timbre. | Japanese |
| Sohee | Warm Korean female voice with rich emotion. | Korean |

Released Models

| Model | Features | Language Support | Instruction Control | Q
