Audio Transcription with Sber Salute Speech
Transcribe audio/video files to text with timestamps via the Salute Speech async REST API.
Requirements
- API Key: the environment variable SALUTE_AUTH_DATA must be set (Base64-encoded client_id:client_secret, or the raw authorization key from https://developers.sber.ru/studio/).
- SSL note: the script disables SSL verification by default (verify_ssl=False) because Sber's certificate chain is non-standard. This is expected.
Supported formats & encodings
| Audio encoding | Content-Type | Typical extensions |
|---|---|---|
| MP3 | audio/mpeg | .mp3 |
| PCM_S16LE | audio/wav | .wav |
| OPUS | audio/ogg | .ogg, .opus |
| FLAC | audio/flac | .flac |
| ALAW | audio/alaw | .alaw |
| MULAW | audio/mulaw | .mulaw |
Supported languages
ru-RU, en-US, kk-KZ (Kazakh), ky-KG (Kyrgyz), uz-UZ (Uzbek).
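The Requirements section allows SALUTE_AUTH_DATA to hold a Base64-encoded client_id:client_secret pair. As a minimal sketch, assuming standard Base64 over the UTF-8 string, the value could be built and read as follows. `build_auth_data` and `read_auth_data` are hypothetical helper names used for illustration, not functions taken from salute_transcribe.py.

```python
import base64
import os


def build_auth_data(client_id: str, client_secret: str) -> str:
    """Return Base64 of 'client_id:client_secret' (assumes standard Base64, UTF-8)."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def read_auth_data(env=None) -> str:
    """Read SALUTE_AUTH_DATA from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    value = env.get("SALUTE_AUTH_DATA", "").strip()
    if not value:
        raise RuntimeError("SALUTE_AUTH_DATA is not set")
    return value
```

Checking the variable before starting a run avoids uploading a large file only to fail on authorization.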
Workflow
1. Identify input files: take them from the user request.
2. Read the API key from the host environment.
3. Run transcription: execute salute_transcribe.py with uv and the appropriate arguments.
4. Deliver results: present a human-readable transcript with timestamps to the user and give a direct link to the files.
Usage
    uv run --with requests {baseDir}/salute_transcribe.py \
      --file /path/to/audio.mp3 \
      --output_dir ~/.OpenClaw/workspace/transcriptions \
      --lang ru-RU
Arguments
| Argument | Required | Default | Description |
|---|---|---|---|
| --file | Yes | - | Path to audio/video file |
| --output_dir | No | ~/.OpenClaw/workspace/transcribations | Output directory for results |
| --lang | No | ru-RU | Language code: ru-RU, en-US, kk-KZ, ky-KG, uz-UZ |
| --audio-encoding | No | MP3 | Codec: MP3, PCM_S16LE, OPUS, FLAC, ALAW, MULAW |
| --model | No | general | Recognition model: general or callcenter |
| --hyp-count | No | 1 | Number of alternative hypotheses: 1 or 2 |
| --max-wait-time | No | 300 | Max seconds to wait for the async result |
| --print | No | off | Also print transcription to stdout |
Content-Type mapping
When the file extension doesn't correspond to audio/mpeg, adjust content_type in the script or add logic to select it. The current default is audio/mpeg (MP3). For .wav files use audio/wav, and so on.
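One way such selection logic might look is a lookup keyed on the file extension, using the pairs from the supported-formats table. This is only a sketch: `FORMATS` and `guess_format` are hypothetical names, and the fallback to MP3 simply mirrors the default described above rather than anything confirmed in the script.

```python
from pathlib import Path

# Extension -> (audio encoding, Content-Type), copied from the supported-formats table.
FORMATS = {
    ".mp3": ("MP3", "audio/mpeg"),
    ".wav": ("PCM_S16LE", "audio/wav"),
    ".ogg": ("OPUS", "audio/ogg"),
    ".opus": ("OPUS", "audio/ogg"),
    ".flac": ("FLAC", "audio/flac"),
    ".alaw": ("ALAW", "audio/alaw"),
    ".mulaw": ("MULAW", "audio/mulaw"),
}


def guess_format(path: str) -> tuple[str, str]:
    """Return (audio_encoding, content_type) for a file path.

    Unknown extensions fall back to the MP3 default (an assumption).
    """
    return FORMATS.get(Path(path).suffix.lower(), ("MP3", "audio/mpeg"))
```

The first element matches the values accepted by --audio-encoding, so the same lookup could feed both the argument and the Content-Type.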
Output files
For input file meetingABC.mp3 the script produces:
| File | Description |
|---|---|
| meetingABC_recognition_orig.json | Raw API response (full JSON with all hypotheses, timing, confidence) |
| meetingABC_pretty.txt | Formatted human-readable transcript with timestamps |
Output text format
[00:01 - 00:20]: Ну, даже если сосредоточиться на идее узкой щели.
[00:20 - 00:45]: Следующий фрагмент текста здесь.
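The naming and line format above can be reproduced with a few lines of Python. This is a sketch rather than the script's actual code: `output_paths`, `format_timestamp`, and `format_segment` are hypothetical helpers, and it assumes segment times are given in seconds and that minutes are not wrapped into hours.

```python
from pathlib import Path


def output_paths(input_file: str, output_dir: str) -> tuple[Path, Path]:
    """Derive the raw-JSON and pretty-text paths from the input file's stem."""
    stem = Path(input_file).stem
    out = Path(output_dir)
    return out / f"{stem}_recognition_orig.json", out / f"{stem}_pretty.txt"


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, truncating fractions (minutes may exceed 59)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcript line as '[MM:SS - MM:SS]: text'."""
    return f"[{format_timestamp(start)} - {format_timestamp(end)}]: {text}"
```

For example, `format_segment(1.2, 20.9, "hello")` yields `[00:01 - 00:20]: hello`.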
Notes
- The token is valid for ~30 minutes; the script fetches a new one on each run.
- Large files (>1 hour) may need --max-wait-time increased beyond 300s.
- The callcenter model is optimized for telephony audio (8kHz, mono).
- The profanity filter is disabled by default (enable_profanity_filter=False).
- The script uses normalized text by default (numbers as digits, abbreviations expanded). Raw text is also available in the JSON output.
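To illustrate how a --max-wait-time limit typically bounds an async job, here is a generic polling loop. It is a sketch under stated assumptions, not the script's implementation: `wait_for_result`, the `check` callable, and the 2-second interval are all hypothetical, and no Salute Speech endpoint is called.

```python
import time


def wait_for_result(check, max_wait_time=300.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a non-None value or the time limit passes.

    check is a caller-supplied function (hypothetical) that returns the result
    when the job is done and None while it is still running.
    """
    deadline = clock() + max_wait_time
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no result after {max_wait_time} seconds")
        sleep(interval)
```

With a loop of this shape, raising the limit for long recordings only changes how long the caller waits, not how often the job status is checked.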