Polyphone TTS

Fix Chinese polyphone (多音字) mispronunciation in TTS by auto-detecting ambiguous characters and applying pinyin annotations. Use when users complain about wrong pronunciation, need precise tone control, or are synthesizing text with characters like 行/干/量/好/了/得/地/的/着/过. Triggers on "读音不对", "这个字读错了", "多音字", "标注拼音", "银行行长", "绕口令", or any request to correct TTS pronunciation.

by @scikkk·MIT-0

License

MIT-0


Free to use, modify, and redistribute; no attribution required.


Runtime dependencies

No special dependencies

Install command


Official: npx clawhub@latest install polyphone

Mirror: npx clawhub@latest install polyphone --registry https://cn.longxiaskill.com


Skill documentation

SenseAudio Polyphone TTS (多音字)

Precise pronunciation control for Chinese TTS via pinyin annotation. The dictionary parameter lets you override how specific characters are read, which is essential for polyphones (多音字) that the model might guess wrong.

The dictionary parameter only works with cloned voices and the SenseAudio-TTS-1.5 model. System voices (male_0004_a etc.) do not support it.

Step 1: Scan for Polyphones

When the user provides text, scan it for these common polyphones and flag any that appear:

| Character | Readings | Context clues |
| --- | --- | --- |
| 行 | háng (行业/银行/行列) / xíng (行走/行动/可行) | 银行、行长、行业 → háng |
| 干 | gān (干净/干燥) / gàn (干活/干部) | 干部、干活 → gàn |
| 量 | liáng (量体温/测量) / liàng (数量/重量) | 数量、质量 → liàng |
| 铺 | pū (铺床/铺路) / pù (店铺/铺子) | 店铺、铺面 → pù |
| 好 | hǎo (好的/很好) / hào (好奇/爱好) | 爱好、好学 → hào |
| 了 | le (吃了/来了) / liǎo (了解/了结) | 了解、了不起 → liǎo |
| 得 | de (跑得快) / dé (得到) / děi (得去) | 得到 → dé; meaning "must" → děi |
| 地 | de (慢慢地) / dì (土地/地方) | adverbial use → de |
| 的 | de (我的) / dí (的确) / dì (目的) | 目的 → dì; 的确 → dí |
| 着 | zhe (看着) / zháo (着火) / zhuó (着装) | 着火、着急 → zháo; 着装 → zhuó |
| 长 | cháng (长度/很长) / zhǎng (成长/行长) | 行长、生长 → zhǎng |
| 重 | zhòng (重量/重要) / chóng (重复/重新) | 重复、重新 → chóng |
| 中 | zhōng (中间/中国) / zhòng (中奖/中毒) | 中奖、中毒 → zhòng |
| 还 | hái (还有/还是) / huán (还钱/归还) | 还钱、偿还 → huán |
| 发 | fā (发现/发展) / fà (头发/理发) | 头发、理发 → fà |
| 数 | shù (数字/数量) / shǔ (数数/数一数) | 数数、数落 → shǔ |
| 参 | cān (参加/参考) / shēn (人参/党参) | 人参、党参 → shēn |
| 差 | chā (差别/差距) / chà (差不多) / chāi (出差) | 出差 → chāi; 差不多 → chà |

Show the user which polyphones were found and your best guess at the intended reading, then ask them to confirm or correct before synthesizing.
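The scan can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: `POLYPHONES`, `match_clue`, and `scan_polyphones` are hypothetical names, the table holds only a handful of the characters listed above, and the fallback reading used when no clue phrase matches is an assumption.

```python
# Hypothetical subset of the table above: character -> (fallback reading, context clues).
# Each clue is (phrase, reading of the character inside that phrase).
POLYPHONES = {
    "行": ("xing2", [("银行", "hang2"), ("行长", "hang2"), ("行业", "hang2")]),
    "长": ("chang2", [("行长", "zhang3"), ("生长", "zhang3")]),
    "好": ("hao3", [("爱好", "hao4"), ("好奇", "hao4"), ("好学", "hao4")]),
    "重": ("zhong4", [("重复", "chong2"), ("重新", "chong2")]),
}


def match_clue(text, i, clues):
    """Return (reading, phrase) for the first clue phrase that covers position i."""
    for phrase, reading in clues:
        for k, ch in enumerate(phrase):
            start = i - k
            if ch == text[i] and start >= 0 and text.startswith(phrase, start):
                return reading, phrase
    return None, None


def scan_polyphones(text):
    """List every polyphone occurrence with a best-guess reading to confirm."""
    findings = []
    for i, ch in enumerate(text):
        if ch not in POLYPHONES:
            continue
        fallback, clues = POLYPHONES[ch]
        reading, phrase = match_clue(text, i, clues)
        findings.append({
            "position": i + 1,            # 1-based character position
            "char": ch,
            "guess": reading or fallback,
            "clue": phrase,               # None means the guess is only the fallback
        })
    return findings


for finding in scan_polyphones("银行行长很好学"):
    print(finding)
```

Findings whose `clue` is `None` rest on the fallback alone, so they are the ones most worth asking the user about.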

Example:

Polyphones detected:

"行" (character 2): 银行 → suggested reading háng [hang2] ✓ or xíng [xing2]?
"长" (character 4): 行长 → suggested reading zhǎng [zhang3] ✓ or cháng [chang2]?

Step 2: Build the Dictionary

Convert confirmed readings into the dictionary array. Each entry covers one phrase containing the polyphone:

Source fragment → replacement format: insert [pinyin] before the polyphone and leave the other characters unchanged.

Pinyin format: [initial + final + tone digit], e.g., [hang2], [xing2], [zhang3]

Example:

original: 银行行长
replacement: 银[hang2]行行[zhang3]长

Build the full dictionary array:

"dictionary": [
  {"original": "银行行长", "replacement": "银[hang2]行行[zhang3]长"},
  {"original": "好奇心", "replacement": "[hao4]好奇心"}
]

Each original should be a short phrase (3–8 chars) that uniquely identifies the occurrence in context. Avoid single-character originals, since they may match unintended occurrences.
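As a rough sketch of this step, the helper below builds one entry from a phrase and a map of confirmed readings. It follows the rule as stated (annotation placed immediately before the character, every character kept); the exact placement the API expects should be confirmed against the SenseAudio reference. `build_entry` is a hypothetical name, and accepting tone digit 5 for the neutral tone is an assumption.

```python
import re

# Assumption: lowercase letters plus one tone digit; 5 for the neutral tone is a guess.
PINYIN_RE = re.compile(r"[a-z]+[1-5]")
ANNOTATION_RE = re.compile(r"\[[a-z]+[1-5]\]")


def build_entry(text, phrase, readings):
    """Build one dictionary entry for `phrase`.

    `readings` maps a 0-based index inside `phrase` to the confirmed pinyin
    for the character at that index.
    """
    # The phrase has to pin down a single occurrence in the text.
    if text.count(phrase) != 1:
        raise ValueError(f"{phrase!r} must occur exactly once in the text")
    for idx, pinyin in readings.items():
        if not 0 <= idx < len(phrase):
            raise ValueError(f"index {idx} is outside {phrase!r}")
        if not PINYIN_RE.fullmatch(pinyin):
            raise ValueError(f"bad pinyin {pinyin!r}")
    # Put [pinyin] in front of each annotated character; keep the rest as-is.
    replacement = "".join(
        f"[{readings[i]}]{ch}" if i in readings else ch
        for i, ch in enumerate(phrase)
    )
    # Sanity check: removing the annotations must give back the original phrase.
    assert ANNOTATION_RE.sub("", replacement) == phrase
    return {"original": phrase, "replacement": replacement}


dictionary = [
    build_entry("银行行长很有好奇心", "银行行长", {1: "hang2", 3: "zhang3"}),
    build_entry("银行行长很有好奇心", "好奇心", {0: "hao4"}),
]
print(dictionary)
```

Rejecting a phrase that occurs more than once keeps each entry tied to a single occurrence, which is the reason given above for avoiding single-character originals.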

Step 3: Synthesize

The user must provide a cloned voice ID. If they don't have one, remind them that dictionary requires a cloned voice and suggest using the senseaudio-voice-cloner skill first.

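# Replace <TEXT>, <CLONED_VOICE_ID>, and <DICTIONARY_ARRAY> (the array from Step 2) before running.
curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SenseAudio-TTS-1.5",
    "text": "<TEXT>",
    "stream": false,
    "voice_setting": { "voice_id": "<CLONED_VOICE_ID>" },
    "audio_setting": { "format": "mp3" },
    "dictionary": <DICTIONARY_ARRAY>
  }' -o response.json

# Decode the hex-encoded audio into an MP3 file.
jq -r '.data.audio' response.json | xxd -r -p > output.mp3

Check that base_resp.status_code == 0 before decoding.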
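The same request body and decoding step can be sketched in Python. This only builds the payload and decodes a response that has already been fetched; it does not call the API. The field names are taken from the curl request and jq filter above, while `build_payload` and `decode_audio` are hypothetical helper names.

```python
import json


def build_payload(text, voice_id, dictionary):
    """Assemble the request body with the same fields as the curl call."""
    return {
        "model": "SenseAudio-TTS-1.5",
        "text": text,
        "stream": False,
        "voice_setting": {"voice_id": voice_id},   # must be a cloned voice
        "audio_setting": {"format": "mp3"},
        "dictionary": dictionary,
    }


def decode_audio(response):
    """Check base_resp.status_code, then turn the hex string into bytes."""
    base_resp = response.get("base_resp", {})
    if base_resp.get("status_code") != 0:
        raise RuntimeError(f"synthesis failed: base_resp={base_resp}")
    # Same effect as `xxd -r -p`: plain hex digits back to raw bytes.
    return bytes.fromhex(response["data"]["audio"])


payload = build_payload(
    "银行行长来了",
    "my-cloned-voice",  # hypothetical voice ID
    [{"original": "银行行长", "replacement": "银[hang2]行行[zhang3]长"}],
)
# ensure_ascii=False keeps the Chinese text readable in the request body.
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Checking the status code inside `decode_audio` makes it impossible to decode an error response by accident.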

Step 4: Iterate

After the user listens, they may find additional mispronunciations. Update the dictionary array and re-synthesize. Keep the previous response.json until the new one succeeds.

Report: file path, duration (jq '.extra_info.audio_length' response.json, in ms), character count, and which dictionary entries were applied.
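One possible shape for the iteration loop is sketched below, assuming the response layout used in Step 3 (`base_resp.status_code`, `extra_info.audio_length`). `merge_entries`, `save_if_ok`, and `build_report` are hypothetical helpers, and treating an entry as "applied" when its original appears in the text is a simplification.

```python
import json
import os


def merge_entries(current, updates):
    """Corrections replace earlier entries with the same original; order is kept."""
    merged = {entry["original"]: entry for entry in current}
    for entry in updates:
        merged[entry["original"]] = entry
    return list(merged.values())


def save_if_ok(response, path="response.json"):
    """Overwrite `path` only when the new response reports success."""
    if response.get("base_resp", {}).get("status_code") != 0:
        return False  # the previous file stays untouched
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(response, fh, ensure_ascii=False)
    os.replace(tmp, path)  # swap in the new file in one step
    return True


def build_report(audio_path, text, dictionary, response):
    """Collect the fields to report back to the user."""
    return {
        "file": audio_path,
        "duration_ms": response.get("extra_info", {}).get("audio_length"),
        "characters": len(text),
        "applied": [e["original"] for e in dictionary if e["original"] in text],
    }
```

Writing to a temporary file and swapping it in afterwards is what keeps the last good response.json around if a re-synthesis fails.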
