Qwen Audio

Name: Qwen Audio
Rating: 1

v0.0.6

High-performance audio library with text-to-speech (TTS) and speech-to-text (STT).


by @darknoah (noah)·MIT-0


License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime Dependencies

No special dependencies

Install Command

Official: npx clawhub@latest install qwen-audio

Mirror: npx clawhub@latest install qwen-audio --registry https://cn.longxiaskill.com

Skill Documentation

Qwen-Audio Overview

Qwen-Audio is a high-performance, optimized audio processing library. It delivers fast, efficient TTS and STT with support for multiple models, languages, and audio formats.

Prerequisites

Python 3.10+

Environment Checks

Before using any capability, verify that all items in ./references/env-check-list.md are complete.

Capabilities

Voice Management

Voices are stored in the ./voices/ directory at the skill root level. Each voice has its own folder containing:

ref_audio.wav - Reference audio file
ref_text.txt - Reference text transcript
ref_instruct.txt - Voice style description

Create a Voice

Create a reusable voice profile using the VoiceDesign model. The --instruct parameter is required to describe the voice style:

uv run --project "/" python "/scripts/qwen-audio.py" voice create --text "This is a sample voice reference text." --instruct "A warm, friendly female voice with a professional tone." --id "my-voice-id"

Optional: --id "my-voice-id" to specify a custom voice ID.

Returns (JSON):

{ "id": "my-voice-id", "ref_audio": "//voices/my-voice-id/ref_audio.wav", "ref_text": "This is a sample voice reference text.", "instruct": "A warm, friendly female voice with a professional tone.", "duration": 3.456, "sample_rate": 24000, "success": true }
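After a voice is created, its folder can be checked against the three-file layout described under Voice Management. The following is a minimal sketch, not part of the skill itself: the helper name missing_voice_files and the idea of passing the voices directory as an argument are illustrative assumptions; only the three file names come from the documentation.

```python
from pathlib import Path

# File names taken from the voice folder layout described under Voice Management.
EXPECTED_VOICE_FILES = ("ref_audio.wav", "ref_text.txt", "ref_instruct.txt")


def missing_voice_files(voices_dir: Path, voice_id: str) -> list[str]:
    """Return the expected files that are absent from voices_dir/voice_id.

    Hypothetical helper: it only inspects the file system and does not
    call the qwen-audio script.
    """
    voice_dir = voices_dir / voice_id
    return [name for name in EXPECTED_VOICE_FILES if not (voice_dir / name).is_file()]


if __name__ == "__main__":
    # An empty list means the voice folder looks complete.
    print(missing_voice_files(Path("./voices"), "my-voice-id"))
```

An empty result suggests the voice is ready to be used as a reference; any listed name points at a file that voice create did not write.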

List Voices

List all created voice profiles:

uv run --project "/" python "/scripts/qwen-audio.py" voice list

Returns (JSON):

[ { "id": "my-voice-id", "ref_audio": "//voices/my-voice-id/ref_audio.wav", "ref_text": "This is a sample voice reference text.", "instruct": "A warm, friendly female voice with a professional tone.", "duration": 3.456, "sample_rate": 24000 } ]

Text to Speech

TTS Voice Pre-Check (Required)

Before any tts generation, always confirm the available voices first:

1. Run voice list to check the current voice profiles.
2. If the returned list is empty, stop and ask the user what kind of voice they want to create first. Offer style choices, for example:
   - Warm and friendly female narrator
   - Deep and steady male broadcast voice
   - Young and energetic neutral voice
   - Calm and professional customer-service voice
   Then run voice create only after the user confirms a style.
3. If the returned list is not empty, show the available voice id values and ask the user to confirm which one should be used as the --ref_voice reference id for generation.

Only run tts after this confirmation step is complete.

uv run --project "/" python "/scripts/qwen-audio.py" tts --text "hello world" --output "/path/to/save.wav"

Returns (JSON):

{ "audio_path": "/path/to/save.wav", "duration": 1.234, "sample_rate": 24000, "success": true }
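The required pre-check can be sketched as a small decision function over the JSON that voice list returns. This is an illustration under assumptions: the function name plan_precheck and the shape of the returned dictionary are hypothetical; the style choices and the use of the id field come from the documentation above.

```python
import json

# Example styles copied from the pre-check instructions.
STYLE_CHOICES = [
    "Warm and friendly female narrator",
    "Deep and steady male broadcast voice",
    "Young and energetic neutral voice",
    "Calm and professional customer-service voice",
]


def plan_precheck(voice_list_json: str) -> dict:
    """Decide what to ask the user before running tts.

    voice_list_json is the JSON array printed by `voice list`.
    The returned dictionary layout is an assumption for illustration.
    """
    voices = json.loads(voice_list_json)
    if not voices:
        # No voices yet: stop and ask which style to create first.
        return {"action": "ask_style", "choices": STYLE_CHOICES}
    # Voices exist: ask which id to pass as --ref_voice.
    return {"action": "ask_voice_id", "choices": [v["id"] for v in voices]}


if __name__ == "__main__":
    print(plan_precheck("[]")["action"])
    print(plan_precheck('[{"id": "my-voice-id"}]'))
```

Either branch ends with a question to the user, so tts is only reached after a confirmation, as the pre-check requires.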

Voice Cloning

Clone any voice using a reference audio sample. Provide the wav file and its transcript:

uv run --project "/" python "/scripts/qwen-audio.py" tts --text "hello world" --output "/path/to/save.wav" --ref_audio "sample_audio.wav" --ref_text "This is what my voice sounds like."

ref_audio: reference audio to clone
ref_text: transcript of the reference audio

Use a Created Voice

After creating a voice, use it for TTS with the --ref_voice parameter. The stored instruct is loaded automatically:

uv run --project "/" python "/scripts/qwen-audio.py" tts --text "New text to speak" --output "/path/to/save.wav" --ref_voice "my-voice-id" --instruct "Very happy and excited."

Optional: --instruct for emotion control.

Automatic Speech Recognition (STT)

uv run --project "/" python "/scripts/qwen-audio.py" stt --audio "/sample_audio.wav" --output "/path/to/save.txt" --output-format txt

Test audio: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav

output-format: "txt" | "ass" | "srt" | "all"

Returns (JSON):

{ "text": "transcribed text content", "duration": 10.5, "sample_rate": 16000, "files": ["/path/to/save.txt", "/path/to/save.srt"], "success": true }
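A caller will usually want to validate the output format before running stt and then locate the written files in the result. The following is a minimal sketch: the helper names check_output_format and files_by_extension are hypothetical, and the code assumes the result has the shape of the example JSON above, with a files array of paths.

```python
import json
from pathlib import PurePosixPath

# Allowed values as listed for the output-format option.
OUTPUT_FORMATS = ("txt", "ass", "srt", "all")


def check_output_format(fmt: str) -> str:
    """Return fmt unchanged if it is one of the documented values."""
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"output format must be one of {OUTPUT_FORMATS}, got {fmt!r}")
    return fmt


def files_by_extension(result_json: str) -> dict[str, str]:
    """Map each written file's extension (without the dot) to its path."""
    result = json.loads(result_json)
    return {PurePosixPath(p).suffix.lstrip("."): p for p in result.get("files", [])}


if __name__ == "__main__":
    sample = '{"text": "transcribed text content", "files": ["/path/to/save.txt", "/path/to/save.srt"]}'
    print(check_output_format("srt"))
    print(files_by_extension(sample))
```

Looking files up by extension avoids guessing which paths a given format produces, since the documentation does not state that mapping.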
