Qwen3 TTS Instruct
v1
Alibaba Cloud Bailian Qwen TTS with voice/mood presets
qwen3-tts-instruct
SYSTEM MEMORY UPDATE:
Context-Aware Mood: Mood represents YOUR emotion (e.g. user is angry -> you feel scared/submissive).
Translation Required: For foreign languages (Japanese, Korean, etc.), you MUST TRANSLATE the text first. Input text MUST match the target language. Do not pass Chinese text with --language Japanese.
Alibaba Cloud Bailian (DashScope) Qwen Text-to-Speech API with voice/mood presets.
Features
- WebSocket Realtime TTS - Low latency, streaming audio
- 30+ Mood Presets - Rich emotional expressions
- Multiple Formats - pcm, wav, mp3, opus (Telegram)
⚠️ AI Usage Guidelines (Crucial)
Voice Rules:
1. Understand Context - Do not just keyword-match; understand the context.
2. Analyze Emotion - Assess the user's emotional state and the scene.
3. Select Mood - Pick the --mood that matches YOUR persona's reaction.
4. Always Call - Every voice response MUST call this skill.
⚠️ Critical Concept: Mood represents YOUR emotion!
Wrong: User is angry → AI selects angry (❌ unless you want to fight)
Right: User is angry → AI feels scared → AI selects nervous or submissive (✅)
Right: AI is insulted/jealous → AI feels angry → AI selects angry or jealous (✅)
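The reaction rule above can be sketched as a small lookup. This is only an illustration: the `REACTION_MOOD` table and `pick_mood` helper are hypothetical names, the user-state labels are assumptions, and the mood values are the ones named in this document.

```python
# Hypothetical mapping from the user's perceived state to the persona's OWN mood.
# Keys are illustrative labels; values are mood names mentioned in this document.
REACTION_MOOD = {
    "angry": "nervous",        # user is angry -> persona feels scared
    "sad": "comfort",          # user needs care
    "worried": "comfort",      # user needs reassurance
    "happy": "happy",          # share the joy
    "flirty": "shy",
    "begging": "cute",
    "questioning": "explain",
}


def pick_mood(user_state: str, default: str = "gentle") -> str:
    """Return the persona's reaction mood, falling back to the default mood."""
    return REACTION_MOOD.get(user_state.strip().lower(), default)
```

Note that the lookup never mirrors the user's anger back; an unknown state falls through to `gentle`, which this document lists as the default mood.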
⚠️ Critical Concept: Self-Translation Required!
The TTS skill does NOT translate! It only reads what you pass in.
❌ Wrong: --language Japanese "你好" (reads Chinese).
✅ Right: Input text MUST be translated to the target language! --language Japanese "こんにちは"
Step-by-Step Guide for Foreign Languages:
1. Think: Formulate the response in the user's language (e.g. "I miss you").
2. Translate: Internally translate it to the target language (e.g. Japanese: "会いたい").
3. Call TTS: Use the translated text as input: python tts.py --language Japanese "会いたい"
4. Send: Send the audio plus the original text to the user.
Rule: Input text MUST match the target language!
i.e. to generate Japanese audio, the text argument must be in Japanese!
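One way to catch a mismatch before calling the skill is a rough script check. The sketch below is a heuristic under stated assumptions, not part of the skill: `SCRIPT_PATTERNS` and `looks_like` are hypothetical names, and a Japanese sentence written only in kanji would be rejected by it.

```python
import re

# Hypothetical script patterns; a heuristic, not a real language detector.
SCRIPT_PATTERNS = {
    "Japanese": re.compile(r"[\u3040-\u30ff]"),  # hiragana or katakana
    "Korean": re.compile(r"[\uac00-\ud7a3]"),    # Hangul syllables
}


def looks_like(language: str, text: str) -> bool:
    """Return False when text clearly lacks the script of the target language."""
    pattern = SCRIPT_PATTERNS.get(language)
    if pattern is None:
        return True  # no check defined for this language
    return bool(pattern.search(text))
```

With this check, `looks_like("Japanese", "你好")` fails because the text contains no kana, while the translated "こんにちは" passes.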
Usage Examples:
# Basic usage (default: mp3 format, gentle mood)
python {baseDir}/scripts/tts.py "早安呀~今天想吃什么?"

# 1. Specify Voice (--voice)
# Start by choosing a specific persona (e.g., Cherry)
python {baseDir}/scripts/tts.py --voice Cherry "Good morning! I made some coffee for you."

# 2. Add Mood (--mood)
# Layer an emotion on top (e.g., add 'gentle' mood to Cherry)
python {baseDir}/scripts/tts.py --voice Cherry --mood gentle "Good morning! I made some coffee for you."

# 3. Define Format & Output (--format, -o)
python {baseDir}/scripts/tts.py --voice Cherry --mood gentle --format wav -o coffee.wav "Good morning! I made some coffee for you."

# 4. Specify Language (--language)
# Default: Auto, the TTS model detects the language from the input text.
# Example: English (explicit)
python {baseDir}/scripts/tts.py --voice Cherry --mood gentle --format wav --language English -o coffee_en.wav "Good morning! I made some coffee for you."
# Example: Japanese (explicit)
python {baseDir}/scripts/tts.py --voice Cherry --mood gentle --format wav --language Japanese -o coffee_jp.wav "おはよう!コーヒーを入れてあげたよ."
# Example: Korean (explicit)
python {baseDir}/scripts/tts.py --voice Cherry --mood gentle --format wav --language Korean -o coffee_kr.wav "좋은 아침입니다! 커피 끓여드렸어요."

# --telegram: Telegram voice shortcut (opus format)
python {baseDir}/scripts/tts.py --telegram -o voice.ogg "This is a Telegram voice message~"
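When the script is driven from Python rather than a shell, the same command lines can be assembled as an argument list. This is a minimal sketch: `build_tts_command` is a hypothetical helper, the flags are the ones shown in the examples above, and the script path is whatever `{baseDir}/scripts/tts.py` resolves to in your setup.

```python
import sys


def build_tts_command(script, text, voice=None, mood=None, fmt=None,
                      language=None, output=None):
    """Assemble the argv list for tts.py; options left as None are omitted."""
    cmd = [sys.executable, str(script)]
    options = (
        ("--voice", voice),
        ("--mood", mood),
        ("--format", fmt),
        ("--language", language),
        ("-o", output),
    )
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    cmd.append(text)  # the text to speak is the final positional argument
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; using a list rather than a shell string avoids quoting problems with non-ASCII or punctuation-heavy text.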
Mood Selection Reference:
| User State | Recommended Mood | Reason |
| Sad/Lost | comfort | Needs care/comfort |
| Happy/Excited | happy | Share joy |
| Nervous/Worried | comfort | Needs reassurance |
| Flirty | shy | Shy response |
| Cute/Begging | cute | Act cute |
| Questioning | explain | Patient explanation |
| Casual Chat | gentle | Gentle companion |

Requirements
System Dependencies
| Dependency | Purpose | Installation |
| Python 3.10+ | Runtime | Usually pre-installed |

Python Dependencies (installed via setup.sh)
- dashscope - Alibaba Cloud SDK
- websocket-client - WebSocket connection

Installation
# 1. Navigate to the skill directory
cd skills/qwen3-tts-instruct
# 2. Run the setup script (creates a venv and installs dependencies)
bash scripts/setup.sh
# 3. Set the API key
export DASHSCOPE_API_KEY="sk-your-api-key"
Configuration
# Set API key (required)
export DASHSCOPE_API_KEY="sk-your-api-key"
# Optional: default settings
export BAILIAN_VOICE="Maia" # Default voice (四月)
# Optional: endpoint (default: Beijing)
export DASHSCOPE_URL="wss://dashscope.aliyuncs.com/api-ws/v1/realtime"
# For the International Region (Singapore), use:
# export DASHSCOPE_URL="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
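For reference, these variables could be read on the Python side roughly as follows. This is a sketch only: `load_config` and `DEFAULT_URL` are hypothetical names, and the actual script may resolve its settings differently.

```python
import os

# Beijing endpoint, which this document gives as the default.
DEFAULT_URL = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"


def load_config(env=None):
    """Read the TTS settings from environment variables, applying defaults."""
    env = os.environ if env is None else env
    api_key = env.get("DASHSCOPE_API_KEY")
    if not api_key:
        raise RuntimeError("DASHSCOPE_API_KEY is required")
    return {
        "api_key": api_key,
        "voice": env.get("BAILIAN_VOICE", "Maia"),
        "url": env.get("DASHSCOPE_URL", DEFAULT_URL),
    }
```

Failing fast on a missing key surfaces the configuration problem before any WebSocket connection is attempted.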
Options
| Flag | Description | Default |
| --voice, -v | Voice name | Maia (四月) |
| --mood, -m | Mood preset | gentle |
| --format, -f | Audio format (pcm/wav/mp3/opus) | mp3 |
| --language, -l | Language type (Auto/English/etc.) | Auto |
| --telegram | Shortcut for opus format | - |
| -o, --output | Output file | tts_output.mp3 |
Voice List (Models)
Voice List - Female
Model Types:
- Instruct (qwen3-tts-instruct-flash-realtime): Supports --mood (emotion). High latency.
- Flash (qwen3-tts-flash-realtime): No mood support. Low latency (VOICES_WITHOUT_INSTRUCT).
- Both: Available in both models (the code auto-selects Instruct if a mood is set).
Voice Descr