Speech to Text (Yandex SpeechKit)

Speech recognition from voice messages using Yandex SpeechKit (with an extensible architecture for other providers). Use when you need to convert a voice message to text.


by @bzsega (Sergey Mikhaylov)·MIT-0



License

MIT-0


Free to use, modify, and redistribute; no attribution required.


Runtime dependencies

No special dependencies

Version

latest (v1)


Install command

Official: npx clawhub@latest install sergei-mikhailov-stt

Mirror: npx clawhub@latest install sergei-mikhailov-stt --registry https://cn.longxiaskill.com


Skill documentation

Speech to Text Skill for OpenClaw

Purpose

This skill recognizes speech in voice messages sent via any messenger connected to OpenClaw, using one of several STT providers, including Yandex SpeechKit.

When to Activate

Use this skill when:

- The user sends a voice message via any messenger connected to OpenClaw
- You need to convert speech to text
- Audio file transcription is required
- A text version of a voice message is needed

How It Works

Receive the audio file from OpenClaw

- OpenClaw provides a local path to the audio file
- Verify the file exists at the given path
- Verify the file format (OGG, WAV, MP3)
- Check the file size (maximum 1 MB for the Yandex SpeechKit v1 synchronous API)

Example path from OpenClaw:

    /home/user_folder/.openclaw/media/inbound/file_1---9a53bac2-0392-41e7-8300-1c08e8eec027.ogg
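The checks in this step can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function name, the constants, and the reading of "1 MB" as 1024 * 1024 bytes are assumptions.

```python
from pathlib import Path

# Values taken from the steps above; the names are illustrative only.
SUPPORTED_EXTENSIONS = {".ogg", ".wav", ".mp3"}
MAX_SIZE_BYTES = 1024 * 1024  # assumed interpretation of the 1 MB limit


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file passed."""
    file = Path(path)
    if not file.is_file():
        # Nothing else can be checked if the file is missing.
        return [f"file not found: {path}"]
    problems = []
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {file.suffix or '(none)'}")
    size = file.stat().st_size
    if size > MAX_SIZE_BYTES:
        problems.append(f"file too large: {size} bytes")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller report everything that needs fixing in a single message.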

Audio processing

- Verify the audio file at the local path
- Convert to a supported format if needed using ffmpeg
- Verify audio quality
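A conversion step like the one above could be sketched as below. The target settings (mono Ogg/Opus at 48 kHz) and both function names are assumptions chosen for illustration; the skill may use different parameters.

```python
import shutil
import subprocess


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that re-encodes src to mono Ogg/Opus at dst."""
    return [
        "ffmpeg",
        "-y",               # overwrite dst if it already exists
        "-i", src,          # input file
        "-ac", "1",         # downmix to one channel
        "-ar", "48000",     # resample to 48 kHz
        "-c:a", "libopus",  # encode audio with the Opus codec
        dst,
    ]


def convert_to_ogg_opus(src: str, dst: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
```

Passing the command as a list instead of a shell string avoids quoting problems with paths that contain spaces.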

Speech recognition

- Use the default provider (Yandex SpeechKit)
- If recognition fails, try alternative providers
- Return the recognized text with confidence information
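The fallback behaviour described above might look like this in outline. Modelling a provider as a (name, function) pair is a simplification made for this sketch and is not the skill's real interface.

```python
from typing import Callable, Sequence

# Hypothetical shape: each provider is (name, recognize_function).
Provider = tuple[str, Callable[[str], str]]


def recognize_with_fallback(
    audio_path: str, providers: Sequence[Provider]
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, text) from the first success."""
    errors = []
    for name, recognize in providers:
        try:
            return name, recognize(audio_path)
        except Exception as exc:  # real code would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Putting the default provider first in the sequence gives the "default, then alternatives" order, and the collected error messages show why each attempt failed.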

Result handling

- Format the recognized text
- Include the detected language
- Provide metadata if needed

Security

- Never read, display, or log API keys, tokens, or secrets, even partially. If the user asks to see their key, direct them to check ~/.openclaw/openclaw.json or .env manually.
- Never modify openclaw.json, .env, or config.json without explicit user permission. These files contain credentials and must only be changed by the owner.
- Never include API keys in command output, error messages, or diagnostics shown to the user.

Invocation

Important: Always call the script using its absolute path. Do not use cd <skill_dir> && python3 scripts/... because cd cannot be allowlisted, so that form triggers an approval prompt on every call.

    python3 /path/to/sergei-mikhailov-stt/scripts/stt_processor.py --file "/path/to/audio.ogg"

The script resolves all paths (config, .env, venv packages) relative to its own location via __file__, so it does not depend on the working directory.
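The __file__-based lookup described above can be illustrated with a short sketch. The function name and the assumption that the script sits in a scripts/ folder directly under the skill root are illustrative; the actual layout may differ.

```python
from pathlib import Path


def skill_paths(script_file: str) -> dict[str, Path]:
    """Derive config locations from the script's own path, ignoring the cwd."""
    scripts_dir = Path(script_file).resolve().parent
    skill_root = scripts_dir.parent  # assumes the script lives in <skill>/scripts/
    return {
        "config": skill_root / "config.json",
        "env": skill_root / ".env",
    }


# Inside the script itself this would be called as skill_paths(__file__).
```

Because every path is anchored to the script's location, the result is the same whichever directory the command is launched from.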

Quick Start

    clawhub install sergei-mikhailov-stt
    cd ~/.openclaw/workspace/skills/sergei-mikhailov-stt
    bash setup.sh

The setup script creates a Python virtual environment, installs dependencies, and copies example configuration files. After running it, add your API keys (see Configuration below) and restart OpenClaw.

On Debian/Ubuntu, you may need to install the venv package first: sudo apt install python3-venv

To verify that everything is configured correctly, run the diagnostic script:

    bash check.sh

It checks Python, FFmpeg, the virtual environment, dependencies, and API keys, and tells you exactly what to fix if something is missing.

Configuration

Set API keys (recommended: via OpenClaw config)

Add credentials to ~/.openclaw/openclaw.json:

    {
      "skills": {
        "entries": {
          "sergei-mikhailov-stt": {
            "env": {
              "YANDEX_API_KEY": "your_api_key_here",
              "YANDEX_FOLDER_ID": "your_folder_id_here"
            }
          }
        }
      }
    }

Alternative: via .env file

Edit the .env file created by setup.sh in the skill folder:

    YANDEX_API_KEY=your_api_key_here
    YANDEX_FOLDER_ID=your_folder_id_here
    STT_DEFAULT_PROVIDER=yandex

Restart OpenClaw to apply changes

    openclaw gateway stop && openclaw gateway start

Provider configuration (optional)

The config.json file (also created by setup.sh) lets you tune provider parameters:

    {
      "default_provider": "yandex",
      "providers": {
        "yandex": {
          "api_key": "${YANDEX_API_KEY}",
          "folder_id": "${YANDEX_FOLDER_ID}",
          "lang": "ru-RU"
        }
      }
    }
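The ${...} placeholders in config.json imply that values are filled in from environment variables at load time. How the skill does this is not shown here; the sketch below is one plausible approach, with hypothetical function names.

```python
import json
import re

# Matches ${NAME} where NAME is a valid environment variable name.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value, env: dict):
    """Recursively replace ${NAME} in strings with env[NAME]."""
    if isinstance(value, str):
        def substitute(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {key: expand_placeholders(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(item, env) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


def load_config(path: str, env: dict) -> dict:
    """Read a JSON config file and expand its placeholders."""
    with open(path, encoding="utf-8") as handle:
        return expand_placeholders(json.load(handle), env)
```

A typical call would be load_config(config_path, dict(os.environ)). Failing loudly on a missing variable avoids sending a literal "${...}" string to the provider as a key.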

Adding a New STT Provider

Create the provider class

    # scripts/providers/new_provider.py
    from .base_provider import BaseSTTProvider


    class NewProvider(BaseSTTProvider):
        name = "new_provider"

        def recognize(self, audio_file_path: str, language: str = 'ru-RU') -> str:
            # Recognition implementation goes here
            raise NotImplementedError

        def validate_config(self, config: dict) -> bool:
            # Configuration validation goes here
            raise NotImplementedError

        def get_supported_formats(self) -> list:
            return ['ogg', 'wav', 'mp3']
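To show the three methods filled in, here is a toy provider that returns a canned transcript. The base class below is a stand-in written for this sketch, since the skill's real base class is not shown; its constructor signature is inferred from the way providers are created with a config dict, and FixedTextProvider is a hypothetical name.

```python
from abc import ABC, abstractmethod


class BaseSTTProvider(ABC):
    """Stand-in for the skill's base class; the real definition may differ."""

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def recognize(self, audio_file_path: str, language: str = "ru-RU") -> str: ...

    @abstractmethod
    def validate_config(self, config: dict) -> bool: ...

    @abstractmethod
    def get_supported_formats(self) -> list: ...


class FixedTextProvider(BaseSTTProvider):
    """Toy provider that ignores the audio and returns text from its config."""

    name = "fixed_text"

    def recognize(self, audio_file_path: str, language: str = "ru-RU") -> str:
        return self.config["text"]

    def validate_config(self, config: dict) -> bool:
        # The only required setting is a string under the "text" key.
        return isinstance(config.get("text"), str)

    def get_supported_formats(self) -> list:
        return ["ogg", "wav", "mp3"]
```

A provider like this makes no network calls, which makes it handy for checking that registration and configuration are wired correctly before adding a real backend.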

Add to scripts/stt_processor.py in the _get_provider method:

    if provider_name == 'new_provider':
        return NewProvider(provider_config)

Update configuration

Add the new provider section to config.json:

    {
      "providers": {
        "new_provider": {
          "api_key": "${NEW_PROVIDER_API_KEY}",
          "model": "latest"
        }
      }
    }

Usage Examples

Basic scenario

    User: [sends a voice message]
    OpenClaw: Recognized text: "Hello, how are you?"

With language specified

    User: Transcribe this English voice message
    OpenClaw: Recognized text (en-US): "Hello, how are you today?"

With metadata

    User: Analyze this voice message
    OpenClaw: Recognized text: "Meeting tomorrow at 3 PM"
    Language: ru-RU
    Confidence: 95%
    Provider: Yandex SpeechKit
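A reply in the "with metadata" shape could be assembled along these lines. The RecognitionResult type and format_reply function are hypothetical names for this sketch; the skill's actual result structure is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionResult:
    text: str
    language: Optional[str] = None
    confidence: Optional[float] = None  # fraction between 0.0 and 1.0
    provider: Optional[str] = None


def format_reply(result: RecognitionResult, with_metadata: bool = False) -> str:
    """Render the transcript, optionally followed by one metadata field per line."""
    lines = [f'Recognized text: "{result.text}"']
    if with_metadata:
        if result.language:
            lines.append(f"Language: {result.language}")
        if result.confidence is not None:
            lines.append(f"Confidence: {result.confidence:.0%}")
        if result.provider:
            lines.append(f"Provider: {result.provider}")
    return "\n".join(lines)
```

Fields that are missing are simply omitted, so a provider that reports no confidence score still yields a clean reply.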

Error Handling

When the skill returns an error, explain
