Generate high-quality spoken audio from text using OpenAI's TTS API.
Authentication
The API key is provided as an environment variable:
OPENAI_API_KEY
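It can help to fail early with a clear message when the variable is unset. The sketch below is illustrative only: `require_api_key` is a hypothetical helper, not part of the OpenAI SDK, and the optional `env` argument exists just to make it easy to test.

```python
import os


def require_api_key(env=None):
    """Return the API key from the environment or raise a descriptive error.

    `env` defaults to os.environ; passing a dict is a testing convenience.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    return key
```

Calling `require_api_key()` once at startup, before constructing the client, keeps the failure close to its cause.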
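Models
- gpt-4o-mini-tts - newest and most reliable. Supports tone/style instructions.
- tts-1 - lower latency, lower quality
- tts-1-hd - higher quality, higher latency
Voice options
Built-in voices (optimized for English):
- alloy, ash, ballad, coral, echo, fable
- nova, onyx, sage, shimmer, verse
- marin, cedar - recommended for best quality
Note: tts-1 and tts-1-hd support only alloy, ash, coral, echo, fable, onyx, nova, sage, and shimmer.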
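The model/voice restriction in the note above can be checked before a request is sent. This is a minimal sketch: the voice sets are copied from this document and may drift as the API changes, and `check_voice` is a hypothetical helper, not an SDK function.

```python
# Voice lists copied from this document; treat them as a snapshot, not a spec.
ALL_VOICES = {
    "alloy", "ash", "ballad", "coral", "echo", "fable", "nova",
    "onyx", "sage", "shimmer", "verse", "marin", "cedar",
}
LEGACY_VOICES = {
    "alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer",
}
LEGACY_MODELS = {"tts-1", "tts-1-hd"}


def check_voice(model, voice):
    """Return the voice if the model supports it, otherwise raise ValueError."""
    allowed = LEGACY_VOICES if model in LEGACY_MODELS else ALL_VOICES
    if voice not in allowed:
        raise ValueError(
            f"voice {voice!r} is not available for model {model!r}; "
            f"choose one of {sorted(allowed)}"
        )
    return voice
```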
Python example
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Basic usage
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Hello, world!",
) as response:
    response.stream_to_file("output.mp3")

# With tone instructions (gpt-4o-mini-tts only)
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Today is a wonderful day!",
    instructions="Speak in a cheerful and positive tone.",
) as response:
    response.stream_to_file("output.mp3")
Handling long text
For long documents, split the text into chunks, synthesize each chunk, and concatenate the audio:
import os
import re
import tempfile

from openai import OpenAI
from pydub import AudioSegment  # pydub needs ffmpeg installed to read and write MP3

client = OpenAI()


def chunk_text(text, max_chars=4000):
    """Split text into chunks at sentence boundaries."""
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        if len(current_chunk) + len(sentence) < max_chars:
            current_chunk += sentence + " "
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence + " "
    # Skip a trailing chunk that contains only whitespace (e.g. empty input).
    if current_chunk.strip():
        chunks.append(current_chunk.strip())
    return chunks


def text_to_audiobook(text, output_path):
    """Convert long text into a single audio file."""
    chunks = chunk_text(text)
    if not chunks:
        raise ValueError("text is empty; nothing to synthesize")
    audio_segments = []
    for chunk in chunks:
        # Reserve a temp file name; the handle is closed before the API writes to it.
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
            tmp_path = tmp.name
        try:
            with client.audio.speech.with_streaming_response.create(
                model="gpt-4o-mini-tts",
                voice="coral",
                input=chunk,
            ) as response:
                response.stream_to_file(tmp_path)
            audio_segments.append(AudioSegment.from_mp3(tmp_path))
        finally:
            os.unlink(tmp_path)  # remove the temp file even if the request fails
    # Concatenate all segments
    combined = audio_segments[0]
    for segment in audio_segments[1:]:
        combined += segment
    combined.export(output_path, format="mp3")
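The `chunk_text` function above never splits inside a sentence, so a single sentence longer than `max_chars` produces an oversized chunk. One possible variant, sketched below, falls back to splitting at whitespace (and, for a single very long word, at fixed offsets). `split_long_sentence` and `chunk_text_strict` are hypothetical names, and the sentence regex still assumes Latin punctuation.

```python
import re


def split_long_sentence(sentence, max_chars):
    """Break one sentence into pieces of at most max_chars characters."""
    pieces, current = [], ""
    for word in sentence.split():
        # A single word longer than the limit is cut into fixed-size slices.
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        if not word:
            continue
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text_strict(text, max_chars=4000):
    """Group sentences into chunks that never exceed max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        for piece in split_long_sentence(sentence, max_chars):
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting mid-sentence can produce an audible seam between segments, so the fallback is best treated as a safety net rather than the normal path.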
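Output formats
- mp3 - default, general-purpose
- opus - low-latency streaming
- aac - digital compression (YouTube, iOS)
- flac - lossless compression
- wav - uncompressed, low latency
- pcm - raw samples (24 kHz, 16-bit)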
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Hello!",
    response_format="wav",  # choose the output format
) as response:
    response.stream_to_file("output.wav")
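Since `pcm` output is raw samples with no header, a player generally has to be told the sample format. A minimal sketch, using only the standard library, wraps raw samples in a WAV container. The 24 kHz rate and 16-bit width come from the format list above; the single (mono) channel is an assumption, and `pcm_to_wav` is a hypothetical helper.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=24000, sample_width=2, channels=1):
    """Wrap raw PCM samples in a WAV header and return the WAV bytes.

    Defaults: 24 kHz and 2 bytes per sample (from the format list);
    channels=1 is an assumption, adjust it if the audio is not mono.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

The returned bytes can be written to a `.wav` file with `Path("output.wav").write_bytes(...)`.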
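Best practices
- Use the marin or cedar voice for best quality
- Split long content at sentence boundaries
- Use wav or pcm for the lowest latency
- Add the instructions parameter to control tone/style (gpt-4o-mini-tts only)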
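These practices can be gathered into a single place that builds the request arguments. The sketch below is one way to do that; `build_speech_request` and its defaults are hypothetical and chosen only for illustration. It rejects `instructions` for any model other than gpt-4o-mini-tts, following the restriction stated above.

```python
def build_speech_request(text, model="gpt-4o-mini-tts", voice="marin",
                         response_format="mp3", instructions=None):
    """Return keyword arguments for a speech request as a plain dict."""
    if instructions is not None and model != "gpt-4o-mini-tts":
        raise ValueError("instructions are only supported by gpt-4o-mini-tts")
    request = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }
    # Only include the key when the caller actually supplied instructions.
    if instructions is not None:
        request["instructions"] = instructions
    return request
```

The dict can then be unpacked into the call, as in `client.audio.speech.with_streaming_response.create(**request)`. The default `marin` voice is not among the voices listed for tts-1 and tts-1-hd, so pass a supported voice explicitly with those models.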