首页龙虾技能列表 › Gemini STT — 技能工具

🎤 Gemini STT — 技能工具

v1.1.0

Transcribe audio files using Google's Gemini API or Vertex AI

2· 3,029·11 当前·11 累计
by @araa47·MIT-0
下载技能包
License
MIT-0
最后更新
2026/2/26
安全扫描
VirusTotal
无害
查看报告
OpenClaw
安全
high confidence
The skill appears to do what it claims—transcribe audio via Gemini or Vertex AI—and its code and instructions are consistent with that purpose, but the registry metadata omits required authentication details and should be corrected/verified before use.
评估建议
This skill is coherent with its stated purpose, but before installing: (1) be aware it requires authentication—either set GEMINI_API_KEY or run 'gcloud auth application-default login' and ensure a proper GCP project is configured; the registry metadata currently omits these requirements. (2) Using ADC (gcloud) will cause the script to call 'gcloud auth print-access-token' and use your ADC permissions to call Vertex; prefer a least-privilege service account or isolated environment if you are conc...
详细分析 ▾
用途与能力
Skill name/description (Gemini/Vertex STT) match the code and runtime instructions. The only mismatch is registry metadata claiming 'no required env vars' while SKILL.md and the script require either GEMINI_API_KEY or Google ADC (gcloud). This is an inconsistency in metadata, not in functionality.
指令范围
Runtime instructions and the script are scoped to reading an audio file, base64-encoding it, and calling Google Gemini or Vertex endpoints. It invokes 'gcloud' only to obtain an access token/project configuration. It does not read unrelated system files or send data to unexpected endpoints.
安装机制
No install spec; the skill is instruction-only with a single Python script that uses only the standard library. Low risk from installation artifacts.
凭证需求
Authentication requirements (GEMINI_API_KEY or gcloud ADC and possibly GOOGLE_CLOUD_PROJECT/CLOUDSDK_CORE_PROJECT) are appropriate for contacting Gemini/Vertex. However, the skill metadata declares no required environment variables or primary credential, which is inaccurate and could mislead users about needed credentials.
持久化与权限
The skill does not request permanent inclusion (always:false), does not modify other skills or system settings, and does not persist credentials. It runs commands locally (gcloud) but does not escalate privileges or change system-wide configuration.
安全有层次,运行前请审查代码。

License

MIT-0

可自由使用、修改和再分发,无需署名。

运行时依赖

🖥️ OSLinux · macOS

版本

latestv1.1.02026/1/14

Added support for Google Vertex AI with Application Default Credentials (ADC). Now supports both GEMINI_API_KEY and gcloud ADC authentication methods. Auto-detects authentication method.

● 无害

安装命令 点击复制

官方npx clawhub@latest install gemini-stt
镜像加速npx clawhub@latest install gemini-stt --registry https://cn.clawhub-mirror.com

技能文档

Transcribe audio files using Google's Gemini API or Vertex AI. Default model is gemini-2.0-flash-lite for fastest transcription.

Authentication (choose one)

Option 1: Vertex AI with Application Default Credentials (Recommended)

gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID

The script will automatically detect and use ADC when available.

Option 2: Direct Gemini API Key

Set GEMINI_API_KEY in environment (e.g., ~/.env or ~/.clawdbot/.env)

Requirements

  • Python 3.10+ (no external dependencies)
  • Either GEMINI_API_KEY or gcloud CLI with ADC configured

Supported Formats

  • .ogg / .opus (Telegram voice messages)
  • .mp3
  • .wav
  • .m4a

Usage

# Auto-detect auth (tries ADC first, then GEMINI_API_KEY)
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg

# Force Vertex AI python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex

# With a specific model python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --model gemini-2.5-pro

# Vertex AI with specific project and region python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex --project my-project --region us-central1

# With Clawdbot media python ~/.claude/skills/gemini-stt/transcribe.py ~/.clawdbot/media/inbound/voice-message.ogg

Options

OptionDescription
Path to the audio file (required)
--model, -mGemini model to use (default: gemini-2.0-flash-lite)
--vertex, -vForce use of Vertex AI with ADC
--project, -pGCP project ID (for Vertex, defaults to gcloud config)
--region, -rGCP region (for Vertex, default: us-central1)

Supported Models

Any Gemini model that supports audio input can be used. Recommended models:

ModelNotes
gemini-2.0-flash-liteDefault. Fastest transcription speed.
gemini-2.0-flashFast and cost-effective.
gemini-2.5-flash-liteLightweight 2.5 model.
gemini-2.5-flashBalanced speed and quality.
gemini-2.5-proHigher quality, slower.
gemini-3-flash-previewLatest flash model.
gemini-3-pro-previewLatest pro model, best quality.
See Gemini API Models for the latest list.

How It Works

  • Reads the audio file and base64 encodes it
  • Auto-detects authentication:
- If ADC is available (gcloud), uses Vertex AI endpoint - Otherwise, uses GEMINI_API_KEY with direct Gemini API
  • Sends to the selected Gemini model with transcription prompt
  • Returns the transcribed text

Example Integration

For Clawdbot voice message handling:

# Transcribe incoming voice message
TRANSCRIPT=$(python ~/.claude/skills/gemini-stt/transcribe.py "$AUDIO_PATH")
echo "User said: $TRANSCRIPT"

Error Handling

The script exits with code 1 and prints to stderr on:

  • No authentication available (neither ADC nor GEMINI_API_KEY)
  • File not found
  • API errors
  • Missing GCP project (when using Vertex)

Notes

  • Uses Gemini 2.0 Flash Lite by default for fastest transcription
  • No external Python dependencies (uses stdlib only)
  • Automatically detects MIME type from file extension
  • Prefers Vertex AI with ADC when available (no API key management needed)
数据来源:ClawHub ↗ · 中文优化:龙虾技能库
OpenClaw 技能定制 / 插件定制 / 私有工作流定制

免费技能或插件可能存在安全风险,如需更匹配、更安全的方案,建议联系付费定制

了解定制服务