deAPI Audio v1.0.1
Text-to-speech, voice cloning, voice design, and audio transcription via the deAPI GPU network. Trigger on 'text to speech', 'TTS', 'generate voice', 'read aloud', 'voice clone', 'clone voice', 'voice design', 'design voice', 'custom voice', 'transcribe audio', 'STT'. For video/YouTube transcription use deAPI-video instead.
deAPI Audio
Text-to-speech, voice cloning, voice design, and audio transcription via the deAPI decentralized GPU network.
Scripts

| Script | Use when... |
| --- | --- |
| scripts/text-to-speech.sh | User wants to convert text to spoken audio |
| scripts/voice-clone.sh | User wants to clone/replicate a voice from a sample audio file |
| scripts/voice-design.sh | User wants to generate speech with a voice described in natural language |
| scripts/speech-to-text.sh | User wants to transcribe an audio file (AAC, MP3, OGG, WAV, WebM, FLAC, max 10 MB) |

Your config
! cat ${CLAUDE_SKILL_DIR}/config.json 2>/dev/null || echo "NOT_CONFIGURED"
If the config above is NOT_CONFIGURED, ask the user:
What is your deAPI API key? (Get one at https://deapi.ai, free $5 credit)
Then write the answer to ${CLAUDE_SKILL_DIR}/config.json as { "api_key": "their_key" }.
Alternatively, the user can set the DEAPI_API_KEY environment variable directly, which takes priority over config.json.
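The lookup order described above (environment variable first, then config.json) could be sketched as follows. This is only an illustration: `resolve_api_key` is a hypothetical helper, not something the skill's scripts are known to contain, and the `sed` extraction assumes a flat one-key JSON object with no escaped quotes in the value.

```shell
# Hypothetical helper: print the deAPI key, preferring DEAPI_API_KEY over
# config.json. The config path defaults to ${CLAUDE_SKILL_DIR}/config.json.
resolve_api_key() {
  local config_file key
  config_file="${1:-${CLAUDE_SKILL_DIR:-.}/config.json}"

  # 1. The environment variable takes priority.
  if [ -n "${DEAPI_API_KEY:-}" ]; then
    printf '%s\n' "$DEAPI_API_KEY"
    return 0
  fi

  # 2. Fall back to the key stored in config.json. The "api" part of the
  #    key name is matched in either case (an assumption made for safety).
  if [ -f "$config_file" ]; then
    key=$(sed -n 's/.*"[Aa][Pp][Ii]_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config_file" | head -n 1)
    if [ -n "$key" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  fi

  # 3. Nothing found: report on stderr and fail.
  echo "NOT_CONFIGURED" >&2
  return 1
}
```

A caller might run `key=$(resolve_api_key) || exit 1` before invoking any of the scripts; a real JSON parser would be a safer choice than `sed` if the config ever grows beyond a single key.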
Gotchas

- For YouTube/video transcription, use the deAPI-video skill instead. This skill handles audio-only files (.mp3, .wav, .m4a, .flac, .ogg).
- Three TTS models: Kokoro (default), Chatterbox, Qwen3. Use --model Chatterbox or --model Qwen3 to switch.
- Kokoro: Voice ID format is {lang}{gender}_{name}. Language is auto-detected from the voice prefix if --lang is omitted.
- Chatterbox: voice is always default, speed is fixed at 1, supports 22 languages. Text limit 10-2000 chars.
- Kokoro: text limit 3-10001 chars. Long text may time out; split it into segments and generate each separately.
- TTS output format defaults to mp3. WAV files are much larger but lossless.
- Kokoro: speed range is 0.5-2.0. Values outside this range cause errors.
- Qwen3 Voice Clone (voice-clone.sh): ref audio must be 5-15 seconds. Too short or too long degrades quality. Formats: MP3, WAV, FLAC, OGG, M4A. URLs are downloaded automatically.
- Qwen3 Voice Design (voice-design.sh): quality depends on the --instruct description. Encourage specific details: gender, age, accent, speaking style, emotion.
- Qwen3 models use full language names (English, French, etc.), NOT language codes. 10 supported languages: English, Italian, Spanish, Portuguese, Russian, French, German, Korean, Japanese, Chinese.
- Qwen3 TTS (--model Qwen3): 9 voices available, default Vivian. Chinese language lacks the Ryan voice. Qwen3 text limit is 10-5000 chars. Speed is fixed at 1.
- Voice Clone and Voice Design use voice=default.
- Audio transcription accepts a local file path or URL (--audio). Formats: AAC, MP3, OGG, WAV, WebM, FLAC. Max 10 MB.
- Result URLs expire in 24 hours. Download promptly.

Quick examples

# Basic TTS
bash scripts/text-to-speech.sh --text "Hello world"
# British voice
bash scripts/text-to-speech.sh --text "Good morning" --voice bf_emma
# Chatterbox model (multilingual)
bash scripts/text-to-speech.sh --model Chatterbox --text "Bonjour le monde" --lang fr
# Qwen3 model
bash scripts/text-to-speech.sh --model Qwen3 --text "Hello world" --voice Serena --lang English
# Clone a voice from a sample
bash scripts/voice-clone.sh --text "Hello, this is my cloned voice" --ref-audio /path/to/sample.mp3
# Clone with reference transcript for better accuracy
bash scripts/voice-clone.sh --text "Welcome to the show" --ref-audio /path/to/sample.wav --ref-text "This is the original transcript"
# Design a custom voice from description
bash scripts/voice-design.sh --text "Good morning everyone" --instruct "A warm, deep male voice with a slight British accent"
# Voice design in another language
bash scripts/voice-design.sh --text "Bonjour tout le monde" --instruct "A cheerful young female voice" --lang French
# Transcribe audio file (local or URL)
bash scripts/speech-to-text.sh --audio /path/to/recording.mp3
bash scripts/speech-to-text.sh --audio "https://example.com/podcast.mp3"
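Because out-of-range text lengths and speeds are listed among the gotchas, a pre-flight check before calling text-to-speech.sh may save a failed request. The sketch below is an assumption-laden illustration: `check_tts_text` and `check_kokoro_speed` are hypothetical helpers, the limits are simply copied from the Gotchas list, and model names are lowercased before matching because the exact spelling the scripts accept is not confirmed here.

```shell
# Hypothetical pre-flight checks; the limits are copied from the Gotchas list.
# ${#text} counts characters in the current locale, which may differ from
# how the API counts them.
check_tts_text() {
  local model text min max len
  model=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  text=$2
  case "$model" in
    kokoro)     min=3;  max=10001 ;;
    chatterbox) min=10; max=2000 ;;
    qwen3)      min=10; max=5000 ;;
    *) echo "unknown model: $1" >&2; return 2 ;;
  esac
  len=${#text}
  if [ "$len" -lt "$min" ] || [ "$len" -gt "$max" ]; then
    echo "$model: text is $len chars, expected $min-$max" >&2
    return 1
  fi
}

# Kokoro only: speed must lie within 0.5-2.0 (the other models fix it at 1).
check_kokoro_speed() {
  # Reject empty input and anything containing characters other than
  # digits and '.'.
  case "$1" in
    ''|*[!0-9.]*) echo "speed must be a number" >&2; return 1 ;;
  esac
  # Shell arithmetic is integer-only, so the comparison is done in awk.
  if ! awk -v s="$1" 'BEGIN { exit !(s + 0 >= 0.5 && s + 0 <= 2.0) }'; then
    echo "speed $1 is outside 0.5-2.0" >&2
    return 1
  fi
}
```

For instance, `check_tts_text chatterbox "$TEXT" && bash scripts/text-to-speech.sh --model Chatterbox --text "$TEXT"` would skip the call when the text falls outside 10-2000 characters.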
For the full voice list and language codes, see references/voices.md.
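The advice to split long text into segments can be approximated with a small word-based splitter. This is a rough sketch, not part of the skill: `split_text` is a hypothetical helper that breaks only at whitespace, so a single word longer than the limit is still emitted whole, and splitting at sentence boundaries would likely sound more natural.

```shell
# Hypothetical helper: print TEXT as lines of at most MAX characters,
# breaking only at whitespace. MAX defaults to 2000 (an arbitrary choice
# that fits the smallest text limit listed for the three models).
split_text() {
  local max
  max="${2:-2000}"
  # Put one word per line, then greedily pack words into chunks.
  printf '%s\n' "$1" | tr -s '[:space:]' '\n' | {
    chunk=""
    while IFS= read -r word; do
      [ -n "$word" ] || continue
      if [ -z "$chunk" ]; then
        chunk=$word
      elif [ $(( ${#chunk} + 1 + ${#word} )) -le "$max" ]; then
        chunk="$chunk $word"
      else
        # Adding the next word would exceed MAX: emit the chunk, start anew.
        printf '%s\n' "$chunk"
        chunk=$word
      fi
    done
    if [ -n "$chunk" ]; then printf '%s\n' "$chunk"; fi
  }
}

# Possible usage (not executed here): one TTS call per segment.
# split_text "$LONG_TEXT" 2000 | while IFS= read -r segment; do
#   bash scripts/text-to-speech.sh --text "$segment"
# done
```

Each segment yields its own result, so the audio files would need to be downloaded and joined afterwards if a single file is wanted.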