Gemini Live Phone
v1.0.1
Bridge Twilio phone calls to the Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware required. Includes VAD and echo suppression.
Gemini Live Phone Bridge
Real-time voice AI over phone calls using Google Gemini's native audio capabilities.
Architecture

Phone ↔ Twilio ↔ WebSocket (μ-law 8 kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24 kHz PCM)
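The "PCM transcoding" step in the pipeline above can be illustrated with a minimal pure-Python sketch. It decodes G.711 μ-law bytes (the format of Twilio Media Stream audio) into 16-bit PCM samples and doubles the sample rate by linear interpolation. This is not the bridge's actual code: the function names are hypothetical, the 2x upsampling target (16 kHz) is an assumption, and a production bridge would likely use a proper resampling filter.

```python
def ulaw_decode_byte(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def ulaw_to_pcm16(payload: bytes) -> list[int]:
    """Decode a whole mu-law payload into a list of PCM samples."""
    return [ulaw_decode_byte(b) for b in payload]


def upsample_linear_2x(samples: list[int]) -> list[int]:
    """Double the sample rate (e.g. 8 kHz to 16 kHz, an assumed target)
    by inserting the midpoint between each pair of neighbouring samples."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)
    return out


pcm_8k = ulaw_to_pcm16(bytes([0xFF, 0x80, 0x00]))
pcm_16k = upsample_linear_2x(pcm_8k)
```

The return path (Gemini audio back to the caller) would need the reverse: downsampling to 8 kHz and μ-law encoding before sending frames to Twilio.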
Quick Start

    # Set required env vars
    export GOOGLE_API_KEY="your-key"
    export TWILIO_AUTH_TOKEN="your-token"

    # Run the bridge
    python scripts/bridge.py --port 3335
Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /gemini-live/status | GET | Health check + active calls |
| /gemini-live/incoming | POST | TwiML for inbound calls (Twilio webhook) |
| /gemini-live/stream | WS | Twilio Media Stream WebSocket |
| /gemini-live/call | POST | Initiate outbound call |
| /gemini-live/twiml | POST | TwiML for outbound calls |
| /gemini-live/call-status | POST | Twilio call status webhook |

Outbound Call API

    curl -X POST https://your-domain/gemini-live/call \
      -H 'Content-Type: application/json' \
      -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'
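For callers that prefer Python over curl, the same outbound-call request can be built with the standard library. This is a sketch only: `build_call_request` is a hypothetical helper, `https://your-domain` is the placeholder host from the example above, and the response format of the endpoint is not documented here, so the snippet only prints whatever comes back.

```python
import json
import urllib.request


def build_call_request(base_url: str, to: str, greeting: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /gemini-live/call."""
    body = json.dumps({"to": to, "greeting": greeting}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/gemini-live/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_call_request("https://your-domain", "+1234567890", "Hello! This is Marcia.")

# Sending requires a running, reachable bridge, so it is left commented out:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```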
Configuration
All settings can be supplied via CLI arguments or environment variables:

Core
- --model: Gemini model (default: gemini-2.5-flash-native-audio-latest)
- --voice: Gemini voice, one of Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
- --from-number: Twilio outbound number (default: env TWILIO_FROM)
- --system-prompt: AI persona system prompt
- --max-duration: Max call seconds (default: 300)

VAD (Voice Activity Detection)
- --vad-enabled / --no-vad: Toggle server-side VAD (default: on)
- --vad-silence-ms: Silence duration to trigger activityEnd (default: 500)
- --vad-energy-threshold: RMS energy threshold (default: 0.01)
- --vad-speech-min-ms: Min speech duration before activityStart (default: 100)

Echo Suppression
- --echo-multiplier: VAD threshold multiplier during agent speech (default: 3.0)
- --echo-decay-ms: Decay time after agent stops speaking (default: 300)

Twilio Setup
1. Buy a phone number on Twilio
2. Set Voice webhook: https://your-domain/gemini-live/incoming (HTTP POST)
3. Set Call Status URL: https://your-domain/gemini-live/call-status (HTTP POST)
4. Ensure geo-permissions are enabled for target countries

Network Requirements
The bridge must be accessible from the internet, because Twilio connects to it. Recommended: a Caddy reverse proxy with WebSocket support.

    # Caddy config example
    handle /gemini-live/* {
        reverse_proxy localhost:3335 {
            flush_interval -1
            transport http {
                read_timeout 0
                write_timeout 0
            }
        }
    }
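Because Twilio reaches the bridge through this public proxy, the TwiML returned for a call has to point Twilio at the public `wss://` address of the stream endpoint. The bridge's real TwiML is not shown in this document; the sketch below only illustrates the general shape using Twilio's `<Connect><Stream>` verb, with `stream_twiml` as a hypothetical helper and `your-domain` as the placeholder host.

```python
import xml.etree.ElementTree as ET


def stream_twiml(public_host: str, path: str = "/gemini-live/stream") -> str:
    """Return TwiML that connects the call to a bidirectional Media Stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # The URL must be the externally reachable wss:// address, not localhost.
    ET.SubElement(connect, "Stream", url=f"wss://{public_host}{path}")
    return ET.tostring(response, encoding="unicode")


twiml = stream_twiml("your-domain")
print(twiml)
```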
Performance
Latency benchmarks (Gemini 2.5 Flash Native Audio):
| Config | Median | Min | Max |
|---|---|---|---|
| No VAD, 200ms buffer | 3,660ms | 2,360ms | 5,180ms |
| Server VAD, 50ms buffer | 2,500ms | 2,080ms | 6,980ms |
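Server-side VAD with the 50ms buffer reduces median latency by ~32% (3,660ms to 2,500ms), although the maximum observed latency was higher (6,980ms vs 5,180ms).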
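To make the server-side VAD behind these numbers more concrete, here is a minimal energy-based sketch showing how the VAD and echo-suppression settings from the Configuration section could interact. It is an illustration under assumptions, not the bridge's implementation: the class and method names are hypothetical, the 20 ms frame size is assumed, and the linear decay of the echo threshold is one plausible reading of --echo-decay-ms.

```python
import math
from dataclasses import dataclass


@dataclass
class VadConfig:
    energy_threshold: float = 0.01  # --vad-energy-threshold
    silence_ms: int = 500           # --vad-silence-ms
    speech_min_ms: int = 100        # --vad-speech-min-ms
    echo_multiplier: float = 3.0    # --echo-multiplier
    echo_decay_ms: int = 300        # --echo-decay-ms


def rms(samples):
    """RMS energy of 16-bit PCM samples, normalised to the range 0..1."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0


class EnergyVad:
    def __init__(self, cfg, frame_ms=20):  # frame size is an assumption
        self.cfg = cfg
        self.frame_ms = frame_ms
        self.speaking = False
        self.speech_ms = 0
        self.silence_ms = 0
        self.ms_since_agent = None  # None: the agent has not spoken yet

    def threshold(self, agent_speaking):
        """Raise the threshold during agent speech, then decay it linearly."""
        cfg = self.cfg
        if agent_speaking:
            self.ms_since_agent = 0
            return cfg.energy_threshold * cfg.echo_multiplier
        if self.ms_since_agent is None or self.ms_since_agent >= cfg.echo_decay_ms:
            return cfg.energy_threshold
        frac = 1.0 - self.ms_since_agent / cfg.echo_decay_ms
        return cfg.energy_threshold * (1.0 + (cfg.echo_multiplier - 1.0) * frac)

    def process(self, samples, agent_speaking=False):
        """Feed one frame; return 'activityStart', 'activityEnd', or None."""
        cfg = self.cfg
        thr = self.threshold(agent_speaking)
        if not agent_speaking and self.ms_since_agent is not None:
            self.ms_since_agent += self.frame_ms
        if rms(samples) >= thr:
            self.speech_ms += self.frame_ms
            self.silence_ms = 0
            if not self.speaking and self.speech_ms >= cfg.speech_min_ms:
                self.speaking = True
                return "activityStart"
        else:
            self.silence_ms += self.frame_ms
            if not self.speaking:
                self.speech_ms = 0  # too short to count as speech
            elif self.silence_ms >= cfg.silence_ms:
                self.speaking = False
                self.speech_ms = 0
                return "activityEnd"
        return None


vad = EnergyVad(VadConfig())
loud, quiet = [3000] * 160, [0] * 160
events = [vad.process(loud) for _ in range(5)] + [vad.process(quiet) for _ in range(25)]
```

With the defaults, the caller must be louder than the threshold for 100 ms before a turn starts and quiet for 500 ms before it ends; while the agent is talking, the threshold is tripled so that the agent's own audio leaking back through the phone line is less likely to be treated as an interruption.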