salute speech

v1.0.1

Transcribe audio files using the Sber Salute Speech async API. Russian-first STT with support for ru-RU, en-US, kk-KZ, ky-KG, uz-UZ.

by @chorus12·MIT-0

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install salute-speech

Mirror: npx clawhub@latest install salute-speech --registry https://cn.longxiaskill.com

Skill documentation

Audio Transcription with Sber Salute Speech

Transcribe audio/video files to text with timestamps via the Salute Speech async REST API.

Requirements

API key: the environment variable SALUTE_AUTH_DATA must be set (Base64-encoded client_id:client_secret, or the raw authorization key from https://developers.sber.ru/studio/).

SSL note: the script disables SSL verification by default (verify_ssl=False) because Sber's certificate chain is non-standard. This is expected.

Supported formats & encodings

Audio encoding   Content-Type   Typical extensions
MP3              audio/mpeg     .mp3
PCM_S16LE        audio/wav      .wav
OPUS             audio/ogg      .ogg, .opus
FLAC             audio/flac     .flac
ALAW             audio/alaw     .alaw
MULAW            audio/mulaw    .mulaw

Supported languages

ru-RU, en-US, kk-KZ (Kazakh), ky-KG (Kyrgyz), uz-UZ (Uzbek).
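The credential format described under Requirements (Base64 of client_id:client_secret) can be produced with a few lines of Python. This is a minimal sketch: the function name build_auth_data and the placeholder credentials are illustrative and are not taken from the skill's script.

```python
import base64


def build_auth_data(client_id: str, client_secret: str) -> str:
    """Return the Base64 encoding of 'client_id:client_secret'."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Placeholder values; substitute the credentials issued for your project.
    print(build_auth_data("my-client-id", "my-client-secret"))
```

The printed value is what would be exported as SALUTE_AUTH_DATA when a ready-made authorization key is not at hand.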

Workflow

1. Identify input files from the user request.
2. Read the API key from the host environment.
3. Run transcription: execute salute_transcribe.py with uv and the appropriate arguments.
4. Deliver results: present a human-readable transcript with timestamps to the user and give a direct link to the files.

Usage

uv run --with requests {baseDir}/salute_transcribe.py \
  --file /path/to/audio.mp3 \
  --output_dir ~/.OpenClaw/workspace/transcriptions \
  --lang ru-RU
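When the transcription step is driven from Python rather than typed into a shell, the same invocation can be assembled as an argument list. The sketch below is an assumption about how a caller might do this: build_command is a hypothetical helper, and base_dir stands in for the {baseDir} placeholder from the usage line.

```python
from pathlib import Path


def build_command(base_dir, audio_file, output_dir, lang="ru-RU"):
    """Assemble the documented uv invocation as an argv list (hypothetical helper)."""
    script = Path(base_dir) / "salute_transcribe.py"
    return [
        "uv", "run", "--with", "requests", str(script),
        "--file", str(audio_file),
        # Expand "~" here because no shell is involved to do it for us.
        "--output_dir", str(Path(output_dir).expanduser()),
        "--lang", lang,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True); using a list instead of a shell string avoids quoting problems with file names that contain spaces.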

Arguments

Argument          Required  Default                                 Description
--file            Yes       (none)                                  Path to audio/video file
--output_dir      No        ~/.OpenClaw/workspace/transcribations   Output directory for results
--lang            No        ru-RU                                   Language code: ru-RU, en-US, kk-KZ, ky-KG, uz-UZ
--audio-encoding  No        MP3                                     Codec: MP3, PCM_S16LE, OPUS, FLAC, ALAW, MULAW
--model           No        general                                 Recognition model: general or callcenter
--hyp-count       No        1                                       Number of alternative hypotheses: 1 or 2
--max-wait-time   No        300                                     Max seconds to wait for the async result
--print           No        off                                     Also print transcription to stdout

Content-Type mapping

The script's default content type is audio/mpeg (MP3). When the input file is not an MP3, adjust content_type in the script or add logic that selects it from the file extension; for example, use audio/wav for .wav files.
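One way to add such logic is a lookup keyed on the file extension, using the pairs from the supported formats table. This is a sketch only: FORMATS and guess_format are hypothetical names, and how the values would be wired into salute_transcribe.py is not shown in this document.

```python
from pathlib import Path

# Extension -> (audio encoding, Content-Type), copied from the supported formats table.
FORMATS = {
    ".mp3": ("MP3", "audio/mpeg"),
    ".wav": ("PCM_S16LE", "audio/wav"),
    ".ogg": ("OPUS", "audio/ogg"),
    ".opus": ("OPUS", "audio/ogg"),
    ".flac": ("FLAC", "audio/flac"),
    ".alaw": ("ALAW", "audio/alaw"),
    ".mulaw": ("MULAW", "audio/mulaw"),
}


def guess_format(path):
    """Return (audio_encoding, content_type) for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported audio extension: {suffix or '(none)'}")
    return FORMATS[suffix]
```

The first element matches the values accepted by --audio-encoding, so the same lookup can feed both the command-line flag and the content type.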

Output files

For input file meetingABC.mp3 the script produces:

File                              Description
meetingABC_recognition_orig.json  Raw API response (full JSON with all hypotheses, timing, confidence)
meetingABC_pretty.txt             Formatted human-readable transcript with timestamps

Output text format

[00:01 - 00:20]: Ну, даже если сосредоточиться на идее узкой щели.

[00:20 - 00:45]: Следующий фрагмент текста здесь.
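Lines in this format are easy to read back for downstream processing. The sketch below parses one line into start and end offsets in seconds plus the text; parse_line is a hypothetical helper, and it assumes timestamps are always written as minutes:seconds, as in the sample above.

```python
import re

# Matches "[MM:SS - MM:SS]: text" as shown in the sample output.
LINE_RE = re.compile(r"^\[(\d+):(\d{2}) - (\d+):(\d{2})\]: (.*)$")


def parse_line(line):
    """Return (start_seconds, end_seconds, text), or None for non-matching lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    start_min, start_sec, end_min, end_sec, text = match.groups()
    start = int(start_min) * 60 + int(start_sec)
    end = int(end_min) * 60 + int(end_sec)
    return start, end, text
```

Blank separator lines return None, so a caller can filter them out while iterating over meetingABC_pretty.txt.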

Notes

- The token is valid for ~30 minutes; the script fetches a new one on each run.
- Large files (>1 hour) may need --max-wait-time increased beyond 300s.
- The callcenter model is optimized for telephony audio (8kHz, mono).
- The profanity filter is disabled by default (enable_profanity_filter=False).
- The script uses normalized text by default (numbers as digits, abbreviations expanded). Raw text is also available in the JSON output.
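The role of --max-wait-time can be illustrated with a generic poll-until-deadline loop. This is not the script's actual code: wait_for_result, the check callback, and the 5-second polling interval are all assumptions made for the illustration.

```python
import time


def wait_for_result(check, max_wait_time=300.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a non-None value or the deadline passes."""
    deadline = clock() + max_wait_time
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"No result after {max_wait_time} seconds")
        sleep(interval)
```

With a loop of this shape, a long recording that is still being processed when the deadline passes fails with a timeout, which is why raising the limit helps for files longer than an hour.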
