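Text to Speech
v2.22.0. Calls the HeyGen Starfish TTS model to turn any text into a high-quality speech file. It supports multiple voices and suits voice-overs, announcements, and spoken-content creation.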
Version
Auto-published from commit ce047e148b0ae2d2b598d79d38ee6712cb4a6a67
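Generate speech audio files from text through the v3 API using HeyGen's proprietary Starfish TTS model. This skill covers standalone audio generation, separate from video creation.
Authentication
Every request requires the X-Api-Key header. Set the HEYGEN_API_KEY environment variable.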
curl -X GET "https://api.heygen.com/v3/voices?engine=starfish" \
-H "X-Api-Key: $HEYGEN_API_KEY"
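The same header setup can be sketched in Python. This is a minimal illustration, assuming the key is stored in HEYGEN_API_KEY as described above; the helper name heygen_headers and its optional env argument are hypothetical and exist only to make the lookup easy to exercise. It fails fast when the variable is unset, so a missing key surfaces before any request is sent.

```python
import os


def heygen_headers(env=None):
    """Build the X-Api-Key header from HEYGEN_API_KEY (hypothetical helper)."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    api_key = env.get("HEYGEN_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("HEYGEN_API_KEY is not set")
    return {"X-Api-Key": api_key}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client is in use.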
Tool selection
If HeyGen MCP tools (mcp__heygen__*) are available, prefer them over direct HTTP API calls.
| Task | MCP tool | Fallback (direct API) |
|---|---|---|
| List TTS voices | mcp__heygen__list_audio_voices | GET /v3/voices?engine=starfish |
| Generate speech audio | mcp__heygen__text_to_speech | POST /v3/voices/speech |
Default workflow
- List voices with mcp__heygen__list_audio_voices (or GET /v3/voices?engine=starfish)
- Pick a voice that matches the required language, gender, and features
- Call mcp__heygen__text_to_speech (or POST /v3/voices/speech) with the text and voice_id
- Download or play the audio from the returned audio_url
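The voice-selection step of this workflow could look roughly like the sketch below. It assumes each voice is a dictionary with the language and gender fields shown in the voice list response; the function name pick_voice and its matching rule (case-insensitive substring on language, exact match on gender) are illustrative choices, not part of the HeyGen API.

```python
def pick_voice(voices, language, gender=None):
    """Return the first voice matching language (and gender, if given), or None."""
    wanted_language = language.lower()
    for voice in voices:
        # Substring match so "english" also matches a label like "English".
        if wanted_language not in voice.get("language", "").lower():
            continue
        if gender is not None and voice.get("gender") != gender:
            continue
        return voice
    return None
```

Returning None rather than raising leaves it to the caller to decide whether to fall back to another language or stop.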
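Listing TTS voices
Retrieve the voices that are compatible with the Starfish TTS model.
Note: Use the unified GET /v3/voices endpoint with the engine=starfish filter to return only voices that support TTS. Not all video voices support Starfish TTS. The response is paginated; use next_token to fetch further pages.
Query parameters
| Parameter | Type | Description |
|---|---|---|
| engine | string | Filter by engine (use starfish for TTS voices) |
| type | string | public or private |
| language | string | Filter by language |
| gender | string | Filter by gender |
| limit | integer | Results per page, 1-100 |
| token | string | Pagination cursor taken from next_token |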
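To show how these parameters combine, here is a small sketch that assembles the request URL with the standard library. The function name build_voices_url and the voice_type argument (mapped to the type parameter) are hypothetical, and the accepted values for language and gender are not specified in this document, so the example treats them as opaque strings.

```python
from urllib.parse import urlencode

VOICES_URL = "https://api.heygen.com/v3/voices"


def build_voices_url(language=None, gender=None, voice_type=None,
                     limit=None, token=None):
    """Build a GET /v3/voices URL restricted to Starfish-compatible voices."""
    if limit is not None and not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    # engine=starfish is always sent so only TTS-capable voices come back.
    params = {"engine": "starfish"}
    optional = {"type": voice_type, "language": language, "gender": gender,
                "limit": limit, "token": token}
    # Only send the filters the caller actually set.
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{VOICES_URL}?{urlencode(params)}"
```

Validating limit locally avoids a round trip for a value the table above already rules out.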
curl
curl -X GET "https://api.heygen.com/v3/voices?engine=starfish" \
-H "X-Api-Key: $HEYGEN_API_KEY"
TypeScript
interface AudioVoiceItem {
  voice_id: string;
  name: string;
  language: string;
  gender: "female" | "male" | "unknown";
  preview_audio_url: string | null;
  support_pause: boolean;
  support_locale: boolean;
  type: string;
}

interface TTSVoicesResponse {
  error: null | string;
  data: AudioVoiceItem[];
  has_more: boolean;
  next_token: string | null;
}

// Fetch every Starfish-compatible voice, following next_token across pages.
async function listTTSVoices(): Promise<AudioVoiceItem[]> {
  const allVoices: AudioVoiceItem[] = [];
  let token: string | null = null;

  do {
    const url = new URL("https://api.heygen.com/v3/voices");
    url.searchParams.set("engine", "starfish");
    if (token) url.searchParams.set("token", token);

    const response = await fetch(url.toString(), {
      headers: { "X-Api-Key": process.env.HEYGEN_API_KEY! },
    });

    const json: TTSVoicesResponse = await response.json();
    if (json.error) {
      throw new Error(json.error);
    }

    allVoices.push(...json.data);
    token = json.next_token;
  } while (token);

  return allVoices;
}
Python
import os

import requests


def list_tts_voices() -> list:
    """Fetch every Starfish-compatible voice, following pagination."""
    all_voices = []
    token = None

    while True:
        params = {"engine": "starfish"}
        if token:
            params["token"] = token

        response = requests.get(
            "https://api.heygen.com/v3/voices",
            headers={"X-Api-Key": os.environ["HEYGEN_API_KEY"]},
            params=params,
        )

        data = response.json()
        if data.get("error"):
            raise Exception(data["error"])

        all_voices.extend(data["data"])

        # Stop once the API reports no further pages.
        if not data.get("has_more"):
            break
        token = data.get("next_token")

    return all_voices
Response format
{
"error": null,
"data": [
{
"voice_id": "f38a635bee7a4d1f9b0a654a31d050d2",
"name": "Chill Brian",
"language": "English",
"gender": "male",
"preview_audio_url": "https://resource.heygen.ai/text_to_speech/WpSDQvmLGXEqXZVZQiVeg6.mp3",
"support_pause": true,
"support_locale": false,
"type": "public"
}
],
"has_more": false,
"next_token": null
}
Generating speech audio
Convert text to speech audio with a specified voice.
Endpoint
POST https://api.heygen.com/v3/voices/speech
Request fields
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Y | Text to convert (1-5000 characters) |
| voice_id | string | Y | Voice ID from GET /v3/voices?engine=starfish |
| input_type | string | N | "text" (default) or "ssml" for full SSML markup |
| speed | number | N | Speech speed, 0.5-2.0 (default: 1.0) |
| language | string | N | Base language code (e.g. "en", "pt"). Auto-detected if omitted |
| locale | string | N | BCP-47 locale for multilingual voices (e.g. "en-US", "pt-BR") |
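The limits in this table can be checked on the client before a request is sent. The sketch below is one possible way to do that; build_tts_payload is a hypothetical helper, and it assumes the 1-5000 character limit corresponds to Python's len() on the string, which the API may count differently.

```python
def build_tts_payload(text, voice_id, input_type="text", speed=1.0,
                      language=None, locale=None):
    """Validate inputs against the documented limits and return the JSON body."""
    # Assumption: the character limit matches len() of the Python string.
    if not 1 <= len(text) <= 5000:
        raise ValueError("text must be 1-5000 characters")
    if not voice_id:
        raise ValueError("voice_id is required")
    if input_type not in ("text", "ssml"):
        raise ValueError('input_type must be "text" or "ssml"')
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    payload = {"text": text, "voice_id": voice_id,
               "input_type": input_type, "speed": speed}
    # Optional fields are left out entirely when unset.
    if language:
        payload["language"] = language
    if locale:
        payload["locale"] = locale
    return payload
```

The resulting dictionary can be serialized as the body of POST /v3/voices/speech.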
curl
curl -X POST "https://api.heygen.com/v3/voices/speech" \
-H "X-Api-Key: $HEYGEN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "Hello! Welcome to our product demo.",
"voice_id": "YOUR_VOICE_ID",
"speed": 1.0
}'
TypeScript
interface TTSRequest {
  text: string;
  voice_id: string;
  input_type?: "text" | "ssml";
  speed?: number;
  language?: string;
  locale?: string;
}

interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

interface TTSResponse {
  error: null | string;
  data: {
    audio_url: string;
    duration: number;
    request_id?: string;
    word_timestamps?: WordTimestamp[];
  };
}

// POST the request and return the data object (audio_url, duration, ...).
async function textToSpeech(request: TTSRequest): Promise<TTSResponse["data"]> {
  const response = await fetch(
    "https://api.heygen.com/v3/voices/speech",
    {
      method: "POST",
      headers: {
        "X-Api-Key": process.env.HEYGEN_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(request),
    }
  );

  const json: TTSResponse = await response.json();
  if (json.error) {
    throw new Error(json.error);
  }

  return json.data;
}
Python
import os

import requests


def text_to_speech(
    text: str,
    voice_id: str,
    input_type: str = "text",
    speed: float = 1.0,
    language: str | None = None,
    locale: str | None = None,
) -> dict:
    """POST the request and return the data object from the response."""
    payload = {
        "text": text,
        "voice_id": voice_id,
        "speed": speed,
    }

    # Only send optional fields that differ from the defaults.
    if input_type != "text":
        payload["input_type"] = input_type
    if language:
        payload["language"] = language
    if locale:
        payload["locale"] = locale

    response = requests.post(
        "https://api.heygen.com/v3/voices/speech",
        headers={
            "X-Api-Key": os.environ["HEYGEN_API_KEY"],
            "Content-Type": "application/json",
        },
        json=payload,
    )

    data = response.json()
    if data.get("error"):
        raise Exception(data["error"])

    return data["data"]
Response format
{
"error": null,
"data": {
"audio_url": "https://resource2.heygen.ai/text_to_speech/.../id=365d46bb.wav",
"duration": 5.526,
"request_id": "p38QJ52hfgNlsYKZZmd9",
"word_timestamps": [
{ "word": "", "start": 0.0, "end": 0.0 },
{ "word": "Hey", "start": 0.079, "end": 0.219 },
{ "word": "there,", "start": 0.239, "end": 0.459 },
{ "word": "", "start": 5.526, "end": 5.526 }
]
}
}
Usage examples
Basic TTS
const result = await textToSpeech({
  text: "Welcome to our quarterly earnings call.",
  voice_id: "YOUR_VOICE_ID",
});

console.log(`Audio URL: ${result.audio_url}`);
console.log(`Duration: ${result.duration}s`);
Adjusting speed
const result = await textToSpeech({
text: "We're thrilled to announce our newest feature!",
voice_id: "YOUR_VOICE_ID",
speed: 1.1,
});
Language and locale for multilingual voices
const result = await textToSpeech({
text: "Bem-vindo ao nosso produto.",
voice_id: "MULTILINGUAL_VOICE_ID",
language: "pt",
locale: "pt-BR",
});
Using SSML input
const result = await textToSpeech({
text: 'Hello and welcome!',
voice_id: "YOUR_VOICE_ID",
input_type: "ssml",
});
Find a voice and generate audio
async function generateSpeech(text: string, language: string): Promise<string> {
  const voices = await listTTSVoices();

  // Take the first voice whose language label contains the requested language.
  const voice = voices.find(
    (v) => v.language.toLowerCase().includes(language.toLowerCase())
  );

  if (!voice) {
    throw new Error(`No TTS voice found for language: ${language}`);
  }

  const result = await textToSpeech({
    text,
    voice_id: voice.voice_id,
  });

  return result.audio_url;
}

const audioUrl = await generateSpeech("Hello and welcome!", "english");
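Pauses with break tags
Use SSML-style break tags in the text to add pauses:
word <break time="1s"/> word
Rules:
- Give the duration in seconds with an s suffix: <break time="1.5s"/>
- Leave a space before and after the tag
- Use the self-closing tag format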
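These rules are easy to get wrong when text is assembled by hand, so a small helper may be useful. The sketch below assumes the standard SSML form of the tag, a self-closing break element with a time attribute; the helper names break_tag and join_with_pause are hypothetical.

```python
def break_tag(seconds):
    """Return a self-closing break tag with an s-suffixed duration."""
    if seconds <= 0:
        raise ValueError("pause must be positive")
    # :g drops trailing zeros, so 1.0 becomes "1" and 1.5 stays "1.5".
    return f'<break time="{seconds:g}s"/>'


def join_with_pause(before, after, seconds):
    """Join two text fragments with a break tag, spaced on both sides."""
    return f"{before.strip()} {break_tag(seconds)} {after.strip()}"
```

Stripping the fragments first guarantees exactly one space on each side of the tag, whatever whitespace the inputs carried.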
With v3 you can also set input_type: "ssml" for full SSML support, which allows richer markup than break tags alone:
{
"text": "Welcome! Let's get started.",
"voice_id": "YOUR_VOICE_ID",
"input_type": "ssml"
}
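Best practices
- Use GET /v3/voices?engine=starfish to find compatible voices. The unified /v3/voices endpoint serves every voice type, so the engine=starfish filter is essential for TTS
- Check support_locale before setting locale. Only multilingual voices support locale selection
- Keep speed between 0.8 and 1.2 for natural-sounding output
- Preview a voice with preview_audio_url before generating (it may be null for some voices)
- Use word_timestamps from the response for caption sync or timed text overlays
- Insert pauses in the text with SSML break tags: word <break time="1s"/> word
- Use input_type: "ssml" when you need full SSML markup control beyond simple break tags
- Paginate the voice list. The v3 endpoint returns paged results; use has_more and next_token to fetch every voice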
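Two of these practices, the speed range and the support_locale check, can be applied mechanically before a request is built. The following is a minimal sketch under that reading; natural_speed and locale_for are hypothetical helpers, and the 0.8-1.2 bounds are the recommendation from the list above, not a limit enforced by the API.

```python
def natural_speed(speed, low=0.8, high=1.2):
    """Clamp speed into the range suggested for natural-sounding output."""
    return max(low, min(high, speed))


def locale_for(voice, locale):
    """Return locale only when the voice reports support_locale, else None."""
    return locale if voice.get("support_locale") else None
```

Dropping an unsupported locale silently is one design choice; raising an error instead may be preferable when the locale matters to the caller.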