Polyphone TTS
v1
Fix Chinese polyphone (多音字) mispronunciation in TTS by auto-detecting ambiguous characters and applying pinyin annotations. Use when users complain about wrong pronunciation, need precise tone control, or are synthesizing text with characters like 行/干/量/好/了/得/地/的/着/过. Triggers on "读音不对" ("the pronunciation is wrong"), "这个字读错了" ("this character was read wrong"), "多音字" ("polyphone"), "标注拼音" ("annotate pinyin"), "银行行长" ("bank president"), "绕口令" ("tongue twister"), or any request to correct TTS pronunciation.
SenseAudio Polyphone TTS (多音字)
Precise pronunciation control for Chinese TTS via pinyin annotation. The dictionary parameter lets you override how specific characters are read, which is essential for polyphones (多音字) that the model might guess wrong.
The dictionary parameter only works with cloned voices and the model SenseAudio-TTS-1.5. System voices (male_0004_a etc.) do not support it.
Step 1: Scan for Polyphones
When the user provides text, scan it for these common polyphones and flag any that appear:
| Character | Readings | Context clues |
|---|---|---|
| 行 | háng (行业/银行/行列) / xíng (行走/行动/可行) | 银行、行长、行业 → háng |
| 干 | gān (干净/干燥) / gàn (干活/干部) | 干部、干活 → gàn |
| 量 | liáng (量体温/测量) / liàng (数量/重量) | 数量、质量 → liàng |
| 铺 | pū (铺床/铺路) / pù (店铺/铺子) | 店铺、铺面 → pù |
| 好 | hǎo (好的/很好) / hào (好奇/爱好) | 爱好、好学 → hào |
| 了 | le (吃了/来了) / liǎo (了解/了结) | 了解、了不起 → liǎo |
| 得 | de (跑得快) / dé (得到) / děi (得去) | 得到 → dé; meaning "must" → děi |
| 地 | de (慢慢地) / dì (土地/地方) | adverbial usage → de |
| 的 | de (我的) / dí (的确) / dì (目的) | 目的 → dì; 的确 → dí |
| 着 | zhe (看着) / zháo (着火) / zhuó (着装) | 着火、着急 → zháo; 着装 → zhuó |
| 长 | cháng (长度/很长) / zhǎng (成长/行长) | 行长、生长 → zhǎng |
| 重 | zhòng (重量/重要) / chóng (重复/重新) | 重复、重新 → chóng |
| 中 | zhōng (中间/中国) / zhòng (中奖/中毒) | 中奖、中毒 → zhòng |
| 还 | hái (还有/还是) / huán (还钱/归还) | 还钱、偿还 → huán |
| 发 | fā (发现/发展) / fà (头发/理发) | 头发、理发 → fà |
| 数 | shù (数字/数量) / shǔ (数数/数一数) | 数数、数落 → shǔ |
| 参 | cān (参加/参考) / shēn (人参/党参) | 人参、党参 → shēn |
| 差 | chā (差别/差距) / chà (差不多) / chāi (出差) | 出差 → chāi; 差不多 → chà |
Show the user which polyphones were found and your best guess at the intended reading, then ask them to confirm or correct before synthesizing.
Example:
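Polyphones detected:
- "行" (2nd character): 银行 → suggested reading háng [hang2] ✓, or xíng [xing2]?
- "行" (3rd character): 行长 → suggested reading háng [hang2] ✓, or xíng [xing2]?
- "长" (4th character): 行长 → suggested reading zhǎng [zhang3] ✓, or cháng [chang2]?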
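The scan described in this step could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `POLYPHONES`, `CONTEXT_HINTS`, and `scan_polyphones` are hypothetical names, and the two tables hold only a small subset of the entries listed above.

```python
# Hypothetical subset of the polyphone table above; extend as needed.
POLYPHONES = {
    "行": ["hang2", "xing2"],
    "长": ["chang2", "zhang3"],
    "好": ["hao3", "hao4"],
    "重": ["zhong4", "chong2"],
}

# Context phrases that suggest a reading (taken from the "Context clues" column).
CONTEXT_HINTS = {
    "银行": {"行": "hang2"},
    "行长": {"行": "hang2", "长": "zhang3"},
    "好奇": {"好": "hao4"},
    "重复": {"重": "chong2"},
}


def scan_polyphones(text):
    """Return one finding per polyphone occurrence, in text order."""
    findings = []
    for index, char in enumerate(text):
        if char not in POLYPHONES:
            continue
        guess = None
        for phrase, readings in CONTEXT_HINTS.items():
            if char not in readings:
                continue
            # Try every alignment of the hint phrase that covers this character.
            for offset, phrase_char in enumerate(phrase):
                start = index - offset
                if phrase_char == char and start >= 0 and text.startswith(phrase, start):
                    guess = readings[char]
        findings.append({
            "char": char,
            "position": index + 1,  # 1-based, matching the example above
            "candidates": POLYPHONES[char],
            "guess": guess,  # None means no hint matched: ask the user
        })
    return findings


if __name__ == "__main__":
    for finding in scan_polyphones("银行行长"):
        print(finding)
```

A finding whose `guess` is `None` still needs a question to the user; the guess is only a suggestion to confirm, never a final answer.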
Step 2: Build the Dictionary
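Convert confirmed readings into the dictionary array. Each entry covers one phrase containing the polyphone:
Original phrase → replacement format: insert [pinyin] before the polyphonic character and leave the other characters unchanged.
Pinyin format: [initial + final + tone number], e.g., [hang2], [xing2], [zhang3]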
Example:
original: 银行行长
replacement: 银[hang2]行[hang2]行[zhang3]长
Build the full dictionary array:
"dictionary": [
  {"original": "银行行长", "replacement": "银[hang2]行[hang2]行[zhang3]长"},
  {"original": "好奇心", "replacement": "[hao4]好奇心"}
]
Each original should be a short phrase (3–8 characters) that uniquely identifies the occurrence in context. Avoid single-character originals, since they may match unintended occurrences.
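As a rough sketch, entries could be generated from confirmed readings like this. It assumes the rule as stated above (bracketed pinyin placed before the character, every original character kept); `annotate` and `PINYIN_RE` are hypothetical names, and the accepted tone digits (1 to 5) are an assumption, not something this document specifies.

```python
import re

# Assumed pinyin shape: lowercase letters followed by one tone digit.
PINYIN_RE = re.compile(r"^[a-z]+[1-5]$")


def annotate(phrase, readings):
    """Build one dictionary entry.

    `readings` maps a 0-based character index in `phrase` to its confirmed pinyin.
    """
    if len(phrase) < 2:
        raise ValueError("single-character originals may match unintended occurrences")
    out_of_range = [i for i in readings if not 0 <= i < len(phrase)]
    if out_of_range:
        raise ValueError(f"indexes outside the phrase: {out_of_range}")
    parts = []
    for index, char in enumerate(phrase):
        pinyin = readings.get(index)
        if pinyin is not None:
            if not PINYIN_RE.match(pinyin):
                raise ValueError(f"unexpected pinyin format: {pinyin!r}")
            parts.append(f"[{pinyin}]")  # annotation goes before the character
        parts.append(char)  # the character itself is kept
    return {"original": phrase, "replacement": "".join(parts)}


if __name__ == "__main__":
    print(annotate("银行行长", {1: "hang2", 2: "hang2", 3: "zhang3"}))
```

Rejecting malformed pinyin before the request is sent keeps a typo such as `hang` (no tone digit) from silently producing an unannotated reading.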
Step 3: Synthesize
The user must provide a cloned voice ID. If they don't have one, remind them that dictionary requires a cloned voice and suggest using the senseaudio-voice-cloner skill first.
curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SenseAudio-TTS-1.5",
    "text": "<text to synthesize>",
    "stream": false,
    "voice_setting": { "voice_id": "<cloned voice ID>" },
    "audio_setting": { "format": "mp3" },
    "dictionary": <dictionary array from Step 2>
  }' \
  -o response.json

jq -r '.data.audio' response.json | xxd -r -p > output.mp3
Check base_resp.status_code == 0 before decoding.
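The check-then-decode order can also be expressed in a few lines of Python. This is a sketch under the assumption that the response has the shape implied above (`base_resp.status_code` plus hex-encoded audio in `data.audio`); `decode_audio` is a hypothetical helper name.

```python
import json


def decode_audio(response):
    """Return the audio bytes, refusing to decode a failed response."""
    base = response.get("base_resp", {})
    code = base.get("status_code")
    if code != 0:
        raise RuntimeError(f"synthesis failed: status_code={code}, base_resp={base}")
    # data.audio is treated as a hex string, mirroring `xxd -r -p`.
    return bytes.fromhex(response["data"]["audio"])


def save_audio(response_path="response.json", output_path="output.mp3"):
    """Read a saved response and write the decoded audio next to it."""
    with open(response_path, encoding="utf-8") as handle:
        response = json.load(handle)
    with open(output_path, "wb") as handle:
        handle.write(decode_audio(response))
```

Checking the status first avoids writing an empty or corrupt output.mp3 when the request failed, for example because the voice ID is not a cloned voice.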
Step 4: Iterate
After the user listens, they may find additional mispronunciations. Update the dictionary array and re-synthesize. Keep the previous response.json until the new one succeeds.
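One way to keep the previous response until the new one succeeds is to write each re-synthesis to a separate file and promote it only after the status check passes. The sketch below assumes the new response is saved under a different name first (for instance by pointing curl's `-o` at a hypothetical `response.new.json`); `promote_response` is likewise a hypothetical helper.

```python
import json
import os


def promote_response(new_path, current_path):
    """Replace current_path with new_path only if the new response succeeded.

    Returns True when the new response was promoted, False when the
    previous response was kept.
    """
    with open(new_path, encoding="utf-8") as handle:
        response = json.load(handle)
    if response.get("base_resp", {}).get("status_code") != 0:
        return False  # leave the previous response untouched
    os.replace(new_path, current_path)  # overwrites current_path in one step
    return True
```

A failed iteration then leaves the last good response.json and its audio available for comparison.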
Report: file path, duration (jq '.extra_info.audio_length' response.json, in ms), character count, and which dictionary entries were applied.
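The report fields could be gathered as in the sketch below. It rests on two assumptions that this document does not confirm: the character count is the length of the input text in Unicode characters, and an entry counts as applied when its original phrase occurs in the text. `build_report` is a hypothetical name.

```python
def build_report(path, text, dictionary, response):
    """Collect the fields to report after a successful synthesis."""
    # Assumption: an entry was applied if its original phrase occurs in the text.
    applied = [entry for entry in dictionary if entry["original"] in text]
    return {
        "file": path,
        "duration_ms": response["extra_info"]["audio_length"],
        "characters": len(text),  # assumption: one count per Unicode character
        "applied_entries": applied,
    }
```

Listing entries whose original never occurs in the text separately is a cheap way to spot a dictionary phrase that was mistyped.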