ElevenLabs TTS: Speech Synthesis

Rating: 1 (6 reviews)
Author: shaharsh

v2.4.0

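ElevenLabs TTS is the ElevenLabs integration for OpenClaw. It provides text-to-speech with emotional audio tags and supports sending WhatsApp voice messages. It uses the ElevenLabs API to generate realistic AI voices, with support for multiple languages and emotion tags.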

by @shaharsha (shaharsh)·MIT-0

API tools · AI model access · code generation · testing tools

License

MIT-0

Last updated

2026/4/11

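Security scan

VirusTotal: Suspicious

OpenClaw: Safe (medium confidence)

The skill's requirements and runtime instructions are consistent with an ElevenLabs TTS integration, but its metadata is inconsistent and its provenance is only weakly attested. Verify it before installing.

Assessment and recommendations

The skill does what it claims: it generates ElevenLabs v3 TTS with audio tags and converts audio for WhatsApp. Before installing: 1) verify the source; 2) make sure ffmpeg is installed; 3) use a dedicated ElevenLabs API key; 4) be aware of the exec command permissions; 5) if you need stronger assurance, ask the publisher for a canonical source or a signed package. ...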

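Detailed analysis

✓ Purpose and capabilities

The name and description (ElevenLabs TTS) match the instructions: call ElevenLabs v3, use audio tags, select a voice, and convert audio for WhatsApp.

ℹ Instruction scope

SKILL.md contains only instructions, and all of them fall within the TTS scope.

✓ Install mechanism

No install spec and no code files, which is the lowest-risk category.

ℹ Credential requirements

Only ELEVENLABS_API_KEY is required, as the primary credential.

✓ Persistence and permissions

always is false; the skill can be invoked by the user, and normal automatic invocation is allowed.

Security is layered. Review the code before running it.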

License

MIT-0

Free to use, modify, and redistribute. No attribution required.

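Runtime dependencies

No special dependencies

Version

latest · v2.4.0 · 2026/2/4

Fix: updated the TTS file path to /tmp/openclaw/tts-*/, added a safety check for the latest file, and added a workspace copy step for compatibility with the message tool.

● Suspicious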

Install command

Official: npx clawhub@latest install elevenlabs-tts

Mirror: npx clawhub@latest install elevenlabs-tts --registry https://cn.clawhub-mirror.com

Skill documentation

Generate expressive voice messages using ElevenLabs v3 with audio tags.

Prerequisites

ElevenLabs API Key (ELEVENLABS_API_KEY): Required. Get one at elevenlabs.io → Profile → API Keys. Configure in openclaw.json under messages.tts.elevenlabs.apiKey.
ffmpeg: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). Must be installed and available on PATH.
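
The two prerequisites above can be checked before any audio is generated. The following is a minimal sketch, not part of the skill: the function name missing_prerequisites is hypothetical, and it assumes the key may come either from the ELEVENLABS_API_KEY environment variable or from messages.tts.elevenlabs.apiKey in a config dict that has already been loaded from openclaw.json. Whether OpenClaw itself honors the environment variable is an assumption.

```python
import os
import shutil


def missing_prerequisites(config, env=None, which=shutil.which):
    """Return a list of problems; an empty list means both prerequisites look OK."""
    env = os.environ if env is None else env
    problems = []

    # Assumption: the key may live in the environment or in openclaw.json.
    config_key = (
        config.get("messages", {})
        .get("tts", {})
        .get("elevenlabs", {})
        .get("apiKey")
    )
    if not (env.get("ELEVENLABS_API_KEY") or config_key):
        problems.append("ElevenLabs API key is not set")

    # ffmpeg must be resolvable on PATH for the MP3 -> Opus conversion.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")

    return problems
```

Calling missing_prerequisites(loaded_config) with the defaults inspects the real environment and PATH; the env and which parameters exist only so the check can be exercised without touching either.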

Quick Start Examples

Storytelling (emotional journey):

[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!

Horror/Suspense (building dread):

[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!

Conversation with reactions:

[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.

Hebrew (romantic moment):

[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?

Spanish (celebration to reflection):

[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.

Configuration (OpenClaw)

In openclaw.json, configure TTS under messages.tts:

{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "pNInz6obpgDQGcFmaJgB",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}

Getting your API Key:

Go to https://elevenlabs.io
Sign up/login
Click profile → API Keys
Copy your key

Recommended Voices for v3

These premade voices are optimized for v3 and work well with audio tags:

Voice	ID	Gender	Accent	Best For
Adam	`pNInz6obpgDQGcFmaJgB`	Male	American	Deep narration, general use
Rachel	`21m00Tcm4TlvDq8ikWAM`	Female	American	Calm narration, conversational
Brian	`nPczCjzI2devNBz1zQrb`	Male	American	Deep narration, podcasts
Charlotte	`XB0fDUnXU5powFXDhCwa`	Female	English-Swedish	Expressive, video games
George	`JBFqnCBsd6RMkjVDRZzb`	Male	British	Raspy narration, storytelling

Finding more voices:

Browse: https://elevenlabs.io/voice-library
v3-optimized collection: https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH
API: GET https://api.elevenlabs.io/v1/voices
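
As a rough illustration of the API route, the voice list can be fetched with the Python standard library. This is a sketch under assumptions the text above does not state: that the key is sent in an xi-api-key header, and that the response is a JSON object with a "voices" array whose entries carry "name" and "voice_id". The function names are hypothetical.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key):
    """Build (but do not send) the GET request for the voice list."""
    return urllib.request.Request(
        VOICES_URL,
        # Assumption: ElevenLabs authenticates via the xi-api-key header.
        headers={"xi-api-key": api_key, "Accept": "application/json"},
        method="GET",
    )


def list_voices(api_key, timeout=30):
    """Send the request and return (name, voice_id) pairs."""
    request = build_voices_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: the payload looks like {"voices": [{"name": ..., "voice_id": ...}]}.
    return [(v.get("name"), v.get("voice_id")) for v in payload.get("voices", [])]
```

A returned voice_id is the value that would go into voiceId in openclaw.json.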

Voice selection tips:

Use IVC (Instant Voice Clone) or premade voices; PVC (Professional Voice Clone) is not yet optimized for v3
Match voice character to your use case (whispering voice won't shout well)
For expressive IVCs, include varied emotional tones in training samples

Model Settings

Model: eleven_v3 (alpha) - ONLY model supporting audio tags
Languages: 70+ supported with full audio tag control

Stability Modes

Mode	Stability	Description
Creative	0.3-0.5	More emotional/expressive, may hallucinate
Natural	0.5-0.7	Balanced, closest to original voice
Robust	0.7-1.0	Highly stable, less responsive to tags

For audio tags, use Creative (0.5) or Natural. Higher stability reduces tag responsiveness.

Speed Control

Range: 0.7 (slow) to 1.2 (fast), default 1.0

Extreme values affect quality. For pacing, prefer audio tags like [rushed] or [drawn out].
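
The stability and speed guidance above can be turned into a small lint for a voiceSettings block. This is only a sketch: check_voice_settings is a hypothetical helper, the defaults of 0.5 and 1.0 are taken from the example config rather than from any documented behavior, and treating everything above 0.7 as Robust is one reading of the overlapping ranges in the table.

```python
def check_voice_settings(settings, uses_audio_tags=True):
    """Return warnings for a voiceSettings dict; an empty list means no findings."""
    warnings = []
    stability = settings.get("stability", 0.5)  # assumed default
    speed = settings.get("speed", 1.0)          # default per the text above

    if not 0.0 <= stability <= 1.0:
        warnings.append(f"stability {stability} is outside 0.0-1.0")
    elif uses_audio_tags and stability > 0.7:
        # Robust mode is described as less responsive to tags.
        warnings.append(f"stability {stability} is in the Robust range; tags may be ignored")

    if not 0.7 <= speed <= 1.2:
        warnings.append(f"speed {speed} is outside 0.7-1.2")

    return warnings
```

For the example configuration (stability 0.5, speed 1) the function returns an empty list.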

Critical Rules

Length Limits

Optimal: <800 characters per segment (best quality)
Maximum: 10,000 characters (API hard limit)
Quality degrades with longer text - voice becomes inconsistent

Audio Tags - Best Practices for Natural Sound

How many tags to use:

1-2 tags per sentence or phrase (not more!)
Tags persist until the next tag - no need to repeat
Overusing tags sounds unnatural and robotic
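
The 1-2 tags per sentence guideline is easy to check mechanically. The sketch below is a hypothetical helper, not part of the skill. It assumes a tag is any bracketed run of text and that sentences end at ., ! or ? followed by whitespace, which is a simplification.

```python
import re

TAG = re.compile(r"\[[^\[\]]+\]")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def overtagged_sentences(text, max_tags=2):
    """Return (sentence, tag_count) for each sentence with more than max_tags tags."""
    findings = []
    for sentence in SENTENCE_END.split(text.strip()):
        count = len(TAG.findall(sentence))
        if count > max_tags:
            findings.append((sentence, count))
    return findings
```

Because tags persist until the next tag, a flagged sentence can usually be fixed by deleting tags rather than moving them.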

Where to place tags:

At emotional transition points
Before key dramatic moments
When energy/pace changes

Context matters:

Write text that matches the tag emotion
Longer text with context = better interpretation
Example: [nervous] I... I'm not sure about this. What if it doesn't work? works better than [nervous] Hello.

Combine tags for nuance:

[nervously][whispers] = nervous whispering
[excited][laughs] = excited laughter
Keep combinations to 2 tags max

Regenerate for best results:

v3 is non-deterministic - same text = different outputs
Generate 3+ versions, pick the best
Small text tweaks can improve results

Match tag to voice:

Don't use [shouts] on a whispering voice
Don't use [whispers] on a loud/energetic voice
Test tags with your chosen voice

SSML Not Supported

v3 does NOT support SSML break tags. Use audio tags and punctuation instead.

Punctuation Effects (use with tags!)

Punctuation enhances audio tags:

Ellipses (...) → dramatic pauses: [nervous] I... I don't know...
CAPS → emphasis: [excited] That's AMAZING!
Dashes (—) → interruptions: [explaining] So what you do is— [interrupting] Wait!
Question marks → uncertainty: [nervous] Are you sure about this?
Exclamation! → energy boost: [happy] We did it!

Combine tags + punctuation for maximum effect:

[tired] It was a long day... [sighs] Nobody listens anymore.

WhatsApp Voice Messages

Complete Workflow

1. Generate with the tts tool (returns Opus in /tmp/openclaw/tts-*/)

2. Copy to the workspace (the message tool only allows workspace paths)

3. Send with the message tool

4. Clean up: delete the workspace copy

Step-by-Step

1. Generate TTS (add [pause] at end to prevent cutoff):

tts text="[excited] This is amazing! [pause]" channel=whatsapp

2. Find the LATEST file (⚠️ CRITICAL - always use the newest file!):

find /tmp/openclaw/tts-* /tmp/tts-* -name "*.opus" -o -name "*.mp3" -o -name "*.ogg" 2>/dev/null | xargs ls -t | head -1

The tts tool now outputs to /tmp/openclaw/tts-*/ (NOT /tmp/tts-*/). Old files may exist in /tmp/tts-*/ from previous sessions - never use those!

3. If file is MP3, convert to Opus:

ffmpeg -i /path/to/voice.mp3 -c:a libopus -b:a 64k -vbr on -application voip /path/to/voice.ogg

If already .opus, skip this step.

4. Copy to workspace and send:

cp /tmp/openclaw/tts-xxx/voice.opus ~/.openclaw/workspace/voice-temp.ogg

message action=send channel=whatsapp target="+972..." filePath="/root/.openclaw/workspace/voice-temp.ogg" asVoice=true message=" "

5. Cleanup:

rm /root/.openclaw/workspace/voice-temp.ogg

WhatsApp requires a non-empty message body to send voice notes. Use a single space as the message.
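
Steps 2, 4 and 5 (pick the newest file, copy it into the workspace, delete the copy afterwards) can be sketched in Python. This is an illustration under assumptions: the helper names are hypothetical, audio files are assumed to sit directly inside /tmp/openclaw/tts-*/, and the search deliberately ignores the legacy /tmp/tts-*/ directories.

```python
import contextlib
import glob
import os
import shutil

AUDIO_EXTENSIONS = (".opus", ".mp3", ".ogg")


def newest_tts_file(base_dir="/tmp/openclaw"):
    """Return the most recently modified audio file in base_dir/tts-*/, or None."""
    candidates = [
        path
        for path in glob.glob(os.path.join(base_dir, "tts-*", "*"))
        if os.path.isfile(path) and path.lower().endswith(AUDIO_EXTENSIONS)
    ]
    return max(candidates, key=os.path.getmtime, default=None)


@contextlib.contextmanager
def staged_voice_file(src, workspace, name="voice-temp.ogg"):
    """Copy src into the workspace, yield the copy's path, then delete the copy."""
    os.makedirs(workspace, exist_ok=True)
    staged = os.path.join(workspace, name)
    shutil.copyfile(src, staged)
    try:
        yield staged
    finally:
        # Cleanup runs even if sending fails, so no stale copy is left behind.
        if os.path.exists(staged):
            os.remove(staged)
```

Inside a "with staged_voice_file(...) as path:" block, path is what would be passed as filePath to the message tool. If newest_tts_file() returns an MP3, convert it to Opus before staging.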

Why Opus?

Format	iOS	Android	Transcribe
MP3	✅ Works	❌ May fail	❌ No
Opus (.ogg)	✅ Works	✅ Works	✅ Yes

Always convert to Opus - it's the only format that:

Works on all devices (iOS + Android)
Supports WhatsApp's transcribe button
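
For scripted use, the ffmpeg conversion from the step-by-step section can be wrapped in Python. This is a minimal sketch with hypothetical function names. It mirrors the flags of the command shown earlier and assumes ffmpeg is on PATH.

```python
import subprocess


def opus_command(src, dst):
    """Build the ffmpeg argument list for the MP3 -> Opus conversion."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:a", "libopus",
        "-b:a", "64k",
        "-vbr", "on",
        "-application", "voip",
        dst,
    ]


def convert_to_opus(src, dst):
    """Convert src to Opus at dst and return the path that should be sent."""
    if src.lower().endswith(".opus"):
        return src  # already Opus, so the conversion step is skipped
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(opus_command(src, dst), check=True)
    return dst
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.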

Audio Cutoff Fix

ElevenLabs sometimes cuts off the last word. Always add [pause] or ... at the end:

[excited] This is amazing! [pause]
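
If text is assembled programmatically, the trailing pause can be added automatically. A small sketch with a hypothetical helper name:

```python
def with_trailing_pause(text):
    """Append [pause] unless the text already ends with [pause] or an ellipsis."""
    stripped = text.rstrip()
    if not stripped:
        return stripped  # nothing to speak, nothing to pad
    if stripped.endswith("[pause]") or stripped.endswith("..."):
        return stripped
    return stripped + " [pause]"
```

The function is idempotent, so applying it twice does not stack pauses.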

Long-Form Audio (Podcasts)

For content >800 chars:

Split into short segments (<800 chars each)
Generate each with tts tool
Concatenate with ffmpeg:

cat > list.txt << EOF
file '/path/file1.mp3'
file '/path/file2.mp3'
EOF
ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3

Convert to Opus for WhatsApp
Send as single voice message

Important: Don't mention "part 2" or "chapter" - keep it seamless.
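
The splitting and list-file steps can be sketched as follows. This is one possible approach, not the skill's own implementation: split_segments and concat_list are hypothetical names, sentences are assumed to end at ., ! or ? followed by whitespace, and a single sentence longer than the limit is left whole rather than cut mid-sentence.

```python
import re


def split_segments(text, limit=800):
    """Greedily pack whole sentences into segments shorter than limit characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) < limit or not current:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments


def concat_list(paths):
    """Render the list.txt body for ffmpeg's concat demuxer."""
    # Assumption: paths contain no single quotes, which would need escaping.
    return "".join(f"file '{path}'\n" for path in paths)
```

Each segment would be passed to the tts tool separately, and the resulting file paths written out with concat_list before running the ffmpeg concat command.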

Multi-Speaker Dialogue

v3 can handle multiple characters in one generation:

Jessica: [whispers] Did you hear that?
Chris: [interrupting] —I heard it too!
Jessica: [panicking] We need to hide!

Dialogue tags: [interrupting], [overlapping], [cuts in], [interjecting]

Audio Tags Quick Reference

Category	Tags	When to Use
Emotions	[excited], [happy], [sad], [angry], [nervous], [curious]	Main emotional state - use 1 per section
Delivery	[whispers], [shouts], [soft], [rushed], [drawn out]	Volume/speed changes
Reactions	[laughs], [sighs], [gasps], [clears throat], [gulps]	Natural human moments - sprinkle sparingly
Pacing	[pause], [hesitates], [stammers], [breathes]	Dramatic timing
Character	[French accent], [British accent], [robotic tone]	Character voice shifts
Dialogue	[interrupting], [overlapping], [cuts in]	Multi-speaker conversations

Most effective tags (reliable results):

Emotions: [excited], [nervous], [sad], [happy]
Reactions: [laughs], [sighs], [whispers]
Pacing: [pause]

Less reliable (test and regenerate):

Sound effects: [explosion], [gunshot]
Accents: results vary by voice

Full tag list: See references/audio-tags.md

Troubleshooting

Tags read aloud?

Verify using eleven_v3 model
Use IVC/premade voices, not PVC
Simplify tags (no "tone" suffix)
Increase text length (250+ chars)

Voice inconsistent?

Segment is too long - split at <800 chars
Regenerate (v3 is non-deterministic)
Try lower stability setting

WhatsApp won't play?

Convert to Opus format (see above)

No emotion despite tags?

Voice may not match tag style
Try Creative stability mode (0.5)
Add more context around the tag

Source: ClawHub · Chinese localization: 龙虾技能库
