Text to Speech (convert text to speech with the HeyGen Starfish model)

Name: Text to Speech (convert text to speech with the HeyGen Starfish model)
Rating: 1 (1 review)
Author: Michael Wang


v2.22.0

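Generates speech audio from text with HeyGen's Starfish TTS model. Use it to produce standalone voice files, convert text to speech (with voice selection and speed control), or create audio for videos, podcasts, and similar projects.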

1 · 697 · 5 current · 5 total

by @michaelwang11394 (Michael Wang)·MIT-0

AI model access · Code generation · File processing


License

MIT-0

Last updated

2026/4/6

Security scan

VirusTotal

Harmless


OpenClaw

Safe

high confidence

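The skill's requests and instructions are consistent with a HeyGen Starfish TTS integration: it requires only HEYGEN_API_KEY, and SKILL.md directs calls to HeyGen's v3 TTS endpoints or MCP tools.

Recommendations

The skill is internally consistent in its use of HeyGen TTS, but its source is unknown (no homepage). Before installing: (1) confirm that you trust the publisher, or compare the API calls against HeyGen's official documentation; (2) provide only a dedicated HeyGen API key, and rotate or revoke it when necessary; (3) avoid sending sensitive or personally identifiable text; (4) keep the cost and rate limits of your HeyGen account in mind; (5) if you use the MCP tools, review their implementation or policies. ...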

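Detailed analysis

✓ Purpose & capability

The name and description (text to speech via HeyGen Starfish) match the declared requirements: a single HEYGEN_API_KEY and calls to https://api.heygen.com/v3/*. Nothing is requested beyond the scope of a TTS integration (no extra credentials, binaries, or config paths).

✓ Instruction scope

SKILL.md instructs the agent to list voices and POST to /v3/voices/speech (or use the mcp__heygen__* tools). It does not ask the agent to read unrelated files or other environment variables, or to send data to any endpoint outside HeyGen. The example code references only HEYGEN_API_KEY.

✓ Install mechanism

This is an instruction-only skill with no install spec and no code files; nothing is written to disk or downloaded during installation. This is the lowest-risk install model.

✓ Credential requirements

Only HEYGEN_API_KEY is required, and it is declared as primaryEnv. The instructions use this key consistently and reference no additional secrets or unrelated credentials.

✓ Persistence & privileges

The skill does not request always:true, system-wide changes, or access to other skills' configuration. Autonomous invocation is allowed by default, which is expected for a functional skill.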

Security is layered: review the code before running it.

License

MIT-0

Free to use, modify, and redistribute without attribution.


Runtime dependencies

No special dependencies

Versions

latest · v2.22.0 · 2026/3/4

Auto-published from commit ce047e148b0ae2d2b598d79d38ee6712cb4a6a67

● Harmless

Install command

Official: npx clawhub@latest install text-to-speech-heygen

Mirror (accelerated): npx clawhub@latest install text-to-speech-heygen --registry https://cn.clawhub-mirror.com

Skill Documentation

Text to Speech (HeyGen Starfish)


Generate speech audio files from text using HeyGen's in-house Starfish TTS model via the v3 API. This skill is for standalone audio generation — separate from video creation.

Authentication

All requests require the X-Api-Key header. Set the HEYGEN_API_KEY environment variable.

curl -X GET "https://api.heygen.com/v3/voices?engine=starfish" \
  -H "X-Api-Key: $HEYGEN_API_KEY"
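The header requirement above can be wrapped in a small helper so every call fails fast when the key is missing, instead of sending an empty `X-Api-Key`. This is a minimal Python sketch: the function name `heygen_headers` is hypothetical (not part of any HeyGen SDK), and it assumes Python 3.9+ with the key read from the environment at call time.

```python
import os


def heygen_headers(json_body: bool = False) -> dict[str, str]:
    """Build HeyGen v3 request headers from the HEYGEN_API_KEY environment variable."""
    api_key = os.environ.get("HEYGEN_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message rather than sending an empty X-Api-Key header.
        raise RuntimeError("HEYGEN_API_KEY is not set")
    headers = {"X-Api-Key": api_key}
    if json_body:
        # POST endpoints such as /v3/voices/speech take a JSON body.
        headers["Content-Type"] = "application/json"
    return headers
```

The result can be passed as `headers=` to `requests.get` or `requests.post`.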

Tool Selection

If HeyGen MCP tools are available (mcp__heygen__*), prefer them over direct HTTP API calls.

Task	MCP Tool	Fallback (Direct API)
List TTS voices	`mcp__heygen__list_audio_voices`	`GET /v3/voices?engine=starfish`
Generate speech audio	`mcp__heygen__text_to_speech`	`POST /v3/voices/speech`
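The selection rule in the table can be expressed as a lookup that prefers the MCP tool whenever the agent exposes it. A sketch only: the task keys `list_voices` and `text_to_speech` and the function `select_route` are hypothetical labels; the MCP tool names and HTTP routes are copied from the table.

```python
# Task -> (preferred MCP tool, direct API fallback), as listed in the table above.
ROUTES = {
    "list_voices": ("mcp__heygen__list_audio_voices", "GET /v3/voices?engine=starfish"),
    "text_to_speech": ("mcp__heygen__text_to_speech", "POST /v3/voices/speech"),
}


def select_route(task: str, available_tools: set[str]) -> str:
    """Return the MCP tool name if the agent has it, otherwise the HTTP fallback."""
    mcp_tool, http_fallback = ROUTES[task]
    return mcp_tool if mcp_tool in available_tools else http_fallback
```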

Default Workflow

1. List voices with mcp__heygen__list_audio_voices (or GET /v3/voices?engine=starfish)
2. Pick a voice matching desired language, gender, and features
3. Call mcp__heygen__text_to_speech (or POST /v3/voices/speech) with text and voice_id
4. Use the returned audio_url to download or play the audio
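Step 2 of the workflow (picking a voice) can be sketched as a filter over the voice list. This assumes each voice is a dict shaped like the `GET /v3/voices` response items documented below; `pick_voice` is a hypothetical helper, and it matches `language` exactly (case-insensitive), whereas the TypeScript example later in this skill uses a substring match.

```python
from typing import Optional


def pick_voice(
    voices: list[dict],
    language: str,
    gender: Optional[str] = None,
    need_pause: bool = False,
) -> Optional[dict]:
    """Return the first voice matching language, optional gender, and pause support."""
    for voice in voices:
        # Language names in the response look like "English", so compare case-insensitively.
        if voice.get("language", "").lower() != language.lower():
            continue
        if gender is not None and voice.get("gender") != gender:
            continue
        # support_pause indicates whether break tags are honored by this voice.
        if need_pause and not voice.get("support_pause", False):
            continue
        return voice
    return None
```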

List TTS Voices

Retrieve voices compatible with the Starfish TTS model.

Note: This uses the unified GET /v3/voices endpoint with the engine=starfish filter to return only TTS-compatible voices. Not all video voices support Starfish TTS. The response is paginated — use next_token to fetch additional pages.

Query Parameters

Param	Type	Description
`engine`	string	Filter by engine (use `starfish` for TTS voices)
`type`	string	`public` or `private`
`language`	string	Filter by language
`gender`	string	Filter by gender
`limit`	integer	Results per page, 1-100
`token`	string	Pagination cursor from `next_token`
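The query parameters above can be assembled and range-checked on the client. This sketch assumes you want to reject out-of-range values before calling the API (the docs do not say how the server handles them); `build_voice_query` is a hypothetical name, and the returned dict can be passed as `params=` to `requests.get`, as the Python example below does.

```python
from typing import Optional


def build_voice_query(
    voice_type: Optional[str] = None,
    language: Optional[str] = None,
    gender: Optional[str] = None,
    limit: Optional[int] = None,
    token: Optional[str] = None,
) -> dict[str, str]:
    """Build GET /v3/voices query parameters, always filtered to Starfish TTS voices."""
    params = {"engine": "starfish"}  # engine=starfish keeps only TTS-compatible voices
    if voice_type is not None:
        if voice_type not in ("public", "private"):
            raise ValueError("type must be 'public' or 'private'")
        params["type"] = voice_type
    if language:
        params["language"] = language
    if gender:
        params["gender"] = gender
    if limit is not None:
        if not 1 <= limit <= 100:
            raise ValueError("limit must be between 1 and 100")
        params["limit"] = str(limit)
    if token:
        params["token"] = token  # cursor copied from a previous response's next_token
    return params
```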

curl

curl -X GET "https://api.heygen.com/v3/voices?engine=starfish" \
  -H "X-Api-Key: $HEYGEN_API_KEY"

TypeScript

interface AudioVoiceItem {
  voice_id: string;
  name: string;
  language: string;
  gender: "female" | "male" | "unknown";
  preview_audio_url: string | null;
  support_pause: boolean;
  support_locale: boolean;
  type: string;
}
interface TTSVoicesResponse {
  error: null | string;
  data: AudioVoiceItem[];
  has_more: boolean;
  next_token: string | null;
}
async function listTTSVoices(): Promise<AudioVoiceItem[]> {
  const allVoices: AudioVoiceItem[] = [];
  let token: string | null = null;
  do {
    // engine=starfish restricts results to TTS-compatible voices
    const url = new URL("https://api.heygen.com/v3/voices");
    url.searchParams.set("engine", "starfish");
    if (token) url.searchParams.set("token", token);
    const response = await fetch(url.toString(), {
      headers: { "X-Api-Key": process.env.HEYGEN_API_KEY! },
    });
    const json: TTSVoicesResponse = await response.json();
    if (json.error) {
      throw new Error(json.error);
    }
    allVoices.push(...json.data);
    // Follow the pagination cursor only while more pages exist
    token = json.has_more ? json.next_token : null;
  } while (token);
  return allVoices;
}

Python

import requests
import os
def list_tts_voices() -> list[dict]:
    all_voices = []
    token = None
    while True:
        # engine=starfish restricts results to TTS-compatible voices
        params = {"engine": "starfish"}
        if token:
            params["token"] = token
        response = requests.get(
            "https://api.heygen.com/v3/voices",
            headers={"X-Api-Key": os.environ["HEYGEN_API_KEY"]},
            params=params,
            timeout=30,
        )
        data = response.json()
        if data.get("error"):
            raise RuntimeError(data["error"])
        all_voices.extend(data["data"])
        # Stop when the API reports no further pages
        if not data.get("has_more"):
            break
        token = data.get("next_token")
    return all_voices

Response Format

{
  "error": null,
  "data": [
    {
      "voice_id": "f38a635bee7a4d1f9b0a654a31d050d2",
      "name": "Chill Brian",
      "language": "English",
      "gender": "male",
      "preview_audio_url": "https://resource.heygen.ai/text_to_speech/WpSDQvmLGXEqXZVZQiVeg6.mp3",
      "support_pause": true,
      "support_locale": false,
      "type": "public"
    }
  ],
  "has_more": false,
  "next_token": null
}
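To work with voices as typed records rather than raw dicts, one page of this response can be parsed into a dataclass. This is a sketch assuming the fields match the example above; `AudioVoice` and `parse_voices_page` are hypothetical names, and the function returns the next cursor only when `has_more` is true.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioVoice:
    voice_id: str
    name: str
    language: str
    gender: str
    preview_audio_url: Optional[str]
    support_pause: bool
    support_locale: bool
    type: str


def parse_voices_page(payload: dict) -> tuple[list[AudioVoice], Optional[str]]:
    """Convert one page of the voices response into records plus the next cursor."""
    if payload.get("error"):
        raise RuntimeError(payload["error"])
    voices = [
        AudioVoice(
            voice_id=item["voice_id"],
            name=item["name"],
            language=item["language"],
            gender=item["gender"],
            preview_audio_url=item.get("preview_audio_url"),  # may be null
            support_pause=bool(item.get("support_pause", False)),
            support_locale=bool(item.get("support_locale", False)),
            type=item.get("type", ""),
        )
        for item in payload.get("data", [])
    ]
    # Only hand back a cursor when the server says more pages exist.
    next_token = payload.get("next_token") if payload.get("has_more") else None
    return voices, next_token
```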

Generate Speech Audio

Convert text to speech audio using a specified voice.

Endpoint

POST https://api.heygen.com/v3/voices/speech

Request Fields

Field	Type	Req	Description
`text`	string	Y	Text content to convert (1-5000 characters)
`voice_id`	string	Y	Voice ID from `GET /v3/voices?engine=starfish`
`input_type`	string	N	`"text"` (default) or `"ssml"` for full SSML markup
`speed`	number	N	Speech speed, 0.5-2.0 (default: 1.0)
`language`	string	N	Base language code (e.g., `"en"`, `"pt"`). Auto-detected if omitted
`locale`	string	N	BCP-47 locale for multilingual voices (e.g., `"en-US"`, `"pt-BR"`)
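The field constraints in this table can be checked locally before a request is sent. A minimal sketch: `build_tts_payload` is hypothetical, and `len(text)` counts Python code points, which may not match exactly how the API counts its 1-5000 character limit. Optional fields are left out when unset, mirroring the Python example further down.

```python
from typing import Optional


def build_tts_payload(
    text: str,
    voice_id: str,
    input_type: str = "text",
    speed: float = 1.0,
    language: Optional[str] = None,
    locale: Optional[str] = None,
) -> dict:
    """Build a POST /v3/voices/speech body and check it against the documented limits."""
    if not 1 <= len(text) <= 5000:
        raise ValueError("text must be 1-5000 characters")
    if not voice_id:
        raise ValueError("voice_id is required")
    if input_type not in ("text", "ssml"):
        raise ValueError("input_type must be 'text' or 'ssml'")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    payload = {"text": text, "voice_id": voice_id, "speed": speed}
    if input_type != "text":
        payload["input_type"] = input_type  # omit the default to keep the body minimal
    if language:
        payload["language"] = language  # base code such as "en"; auto-detected if omitted
    if locale:
        payload["locale"] = locale  # BCP-47 tag such as "pt-BR"
    return payload
```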

curl

curl -X POST "https://api.heygen.com/v3/voices/speech" \
  -H "X-Api-Key: $HEYGEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! Welcome to our product demo.",
    "voice_id": "YOUR_VOICE_ID",
    "speed": 1.0
  }'

TypeScript

interface TTSRequest {
  text: string;
  voice_id: string;
  input_type?: "text" | "ssml";
  speed?: number;
  language?: string;
  locale?: string;
}
interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}
interface TTSResponse {
  error: null | string;
  data: {
    audio_url: string;
    duration: number;
    request_id?: string;
    word_timestamps?: WordTimestamp[];
  };
}
async function textToSpeech(request: TTSRequest): Promise<TTSResponse["data"]> {
  const response = await fetch(
    "https://api.heygen.com/v3/voices/speech",
    {
      method: "POST",
      headers: {
        "X-Api-Key": process.env.HEYGEN_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(request),
    }
  );
  const json: TTSResponse = await response.json();
  if (json.error) {
    throw new Error(json.error);
  }
  return json.data;
}

Python

import requests
import os
def text_to_speech(
    text: str,
    voice_id: str,
    input_type: str = "text",
    speed: float = 1.0,
    language: str | None = None,
    locale: str | None = None,
) -> dict:
    payload = {
        "text": text,
        "voice_id": voice_id,
        "speed": speed,
    }
    if input_type != "text":
        payload["input_type"] = input_type
    if language:
        payload["language"] = language
    if locale:
        payload["locale"] = locale
    response = requests.post(
        "https://api.heygen.com/v3/voices/speech",
        headers={
            "X-Api-Key": os.environ["HEYGEN_API_KEY"],
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=60,
    )
    data = response.json()
    if data.get("error"):
        raise RuntimeError(data["error"])
    return data["data"]

Response Format

{
  "error": null,
  "data": {
    "audio_url": "https://resource2.heygen.ai/text_to_speech/.../id=365d46bb.wav",
    "duration": 5.526,
    "request_id": "p38QJ52hfgNlsYKZZmd9",
    "word_timestamps": [
      { "word": "", "start": 0.0, "end": 0.0 },
      { "word": "Hey", "start": 0.079, "end": 0.219 },
      { "word": "there,", "start": 0.239, "end": 0.459 },
      { "word": "", "start": 5.526, "end": 5.526 }
    ]
  }
}

Usage Examples

Basic TTS

const result = await textToSpeech({
  text: "Welcome to our quarterly earnings call.",
  voice_id: "YOUR_VOICE_ID",
});
console.log(`Audio URL: ${result.audio_url}`);
console.log(`Duration: ${result.duration}s`);
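Saving the returned `audio_url` to disk needs nothing beyond the Python standard library. A sketch, assuming the URL is directly fetchable without extra headers; `download_audio` is a hypothetical helper that names the local file after the last URL path segment.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Union
from urllib.parse import urlparse


def download_audio(audio_url: str, dest_dir: Union[str, Path] = ".") -> Path:
    """Save the file behind audio_url into dest_dir and return its local path."""
    # Reuse the URL's last path segment as the file name, with a generic fallback.
    name = Path(urlparse(audio_url).path).name or "speech_audio"
    target = Path(dest_dir) / name
    # Stream the response to disk instead of reading the whole clip into memory.
    with urllib.request.urlopen(audio_url, timeout=60) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```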

With Speed Adjustment

const result = await textToSpeech({
  text: "We're thrilled to announce our newest feature!",
  voice_id: "YOUR_VOICE_ID",
  speed: 1.1,
});

With Language and Locale for Multilingual Voices

const result = await textToSpeech({
  text: "Bem-vindo ao nosso produto.",
  voice_id: "MULTILINGUAL_VOICE_ID",
  language: "pt",
  locale: "pt-BR",
});

With SSML Input

const result = await textToSpeech({
  text: 'Hello <break time="1s"/> and welcome!',
  voice_id: "YOUR_VOICE_ID",
  input_type: "ssml",
});

Find a Voice and Generate Audio

async function generateSpeech(text: string, language: string): Promise<string> {
  const voices = await listTTSVoices();
  // Pick the first Starfish voice whose language name contains the requested one
  const voice = voices.find(
    (v) => v.language.toLowerCase().includes(language.toLowerCase())
  );
  if (!voice) {
    throw new Error(`No TTS voice found for language: ${language}`);
  }
  const result = await textToSpeech({
    text,
    voice_id: voice.voice_id,
  });
  return result.audio_url;
}
const audioUrl = await generateSpeech("Hello and welcome!", "english");

Pauses with Break Tags

Use SSML-style break tags in your text for pauses:

word <break time="1s"/> word

Rules:

Use seconds with an s suffix: <break time="1.5s"/>
Must have spaces before and after the tag
Self-closing tag format
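These rules can be enforced by generating the tags rather than typing them by hand. This sketch assumes the tag follows the standard SSML `<break time="1.5s"/>` form; `join_with_pauses` is a hypothetical helper.

```python
def join_with_pauses(segments: list[str], seconds: float) -> str:
    """Join text segments with a self-closing break tag, spaced on both sides."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    # Seconds with an "s" suffix; :g drops a trailing ".0" (1.0 -> "1").
    tag = f'<break time="{seconds:g}s"/>'
    # Strip stray whitespace so exactly one space sits on each side of the tag.
    return f" {tag} ".join(segment.strip() for segment in segments)
```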

With v3, you can also use input_type: "ssml" for full SSML support, allowing richer markup beyond just break tags:

{
  "text": "Welcome! <break time=\"1s\"/> Let's get started.",
  "voice_id": "YOUR_VOICE_ID",
  "input_type": "ssml"
}

Best Practices

Use GET /v3/voices?engine=starfish to find compatible voices — the unified /v3/voices endpoint serves all voice types, so the engine=starfish filter is essential for TTS
Check support_locale before setting a locale — only multilingual voices support locale selection
Keep speed between 0.8-1.2 for natural-sounding output
Preview voices using the preview_audio_url before generating (may be null for some voices)
Use word_timestamps in the response for caption syncing or timed text overlays
Use SSML break tags in your text for pauses: word <break time="1s"/> word
Use input_type: "ssml" when you need full SSML markup control beyond simple break tags
Paginate voice listing — the v3 endpoint returns paginated results; use has_more and next_token to fetch all voices
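Two of these practices (checking `support_locale` before setting a locale, and keeping speed near 0.8-1.2) can be applied as a guard before each request. A sketch with assumptions: `tts_options_for_voice` is hypothetical, it raises rather than silently dropping an unsupported locale, and it derives `language` from the locale's primary subtag (so `"pt-BR"` gives `"pt"`), matching the pairing in the multilingual example rather than any documented rule.

```python
import warnings
from typing import Optional


def tts_options_for_voice(voice: dict, speed: float = 1.0, locale: Optional[str] = None) -> dict:
    """Build voice-dependent TTS options, enforcing the locale and speed best practices."""
    options: dict = {"voice_id": voice["voice_id"], "speed": speed}
    if not 0.8 <= speed <= 1.2:
        # Still allowed by the API (0.5-2.0), but flagged as likely to sound unnatural.
        warnings.warn(f"speed {speed} is outside 0.8-1.2 and may sound unnatural")
    if locale:
        if not voice.get("support_locale", False):
            raise ValueError(f"voice {voice['voice_id']} does not support locale selection")
        options["locale"] = locale
        # Assumption: base language is the BCP-47 primary subtag, e.g. "pt-BR" -> "pt".
        options["language"] = locale.split("-")[0].lower()
    return options
```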

Data source: ClawHub · Chinese localization: 龙虾技能库

