audio to text and video to text

v1.0.0

Transcribe audio and video files into text using OpenAI's Whisper API. Use this skill whenever a user wants to convert any audio or video file to text, including MP3, MP4, WAV, M4A, OGG, WebM, MOV, AVI, FLAC, and more. Trigger this skill for any request involving: "transcribe", "convert audio to text", "speech to text", "get transcript of", "extract audio from video", "meeting notes from recording", "subtitles", "captions", or similar. Also trigger when the user uploads or references a media file and asks what was said, discussed, or mentioned in it. If unsure whether audio/video transcription is involved, use this skill.

by @ahqazi-dev·MIT-0

Developer tools · Code generation · API development · Network tools · Browser automation

License

MIT-0

Free to use, modify, and redistribute, with no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install audio-to-text-and-video-to-text

Mirror: npx clawhub@latest install audio-to-text-and-video-to-text --registry https://cn.longxiaskill.com

Skill documentation

Transcription Skill

Converts audio and video files into clean, readable text using OpenAI's Whisper API and ffmpeg for media handling.

Overview

This skill handles the full pipeline:

Media extraction: use ffmpeg to strip audio from video files and convert it to a Whisper-compatible format
Chunking: split large files (>25 MB) into overlapping segments to stay within API limits
Transcription: send each chunk to OpenAI's Whisper API
Assembly: merge chunk transcripts, adjusting timestamps, into a single clean output
Post-processing: optionally clean up with Claude (punctuation, speaker labels, summaries)

Requirements

ffmpeg must be installed (run which ffmpeg to verify; it's usually pre-installed in claude.ai's environment)
OpenAI API key stored in the environment as OPENAI_API_KEY (the user must provide this)
Python packages: openai, pydub (install via pip if needed)

Quick Start

When a user provides a media file, run the transcription script:

# Install dependencies if missing
pip install openai pydub --break-system-packages -q

# Run transcription
python /home/claude/transcription/scripts/transcribe.py \
  --input "/path/to/media/file" \
  --output "/mnt/user-data/outputs/transcript.txt" \
  --api-key "$OPENAI_API_KEY"

See scripts/transcribe.py for the full implementation.
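The contents of scripts/transcribe.py are not shown here, so the following is only a minimal sketch of what a single transcription call might look like. It assumes the `openai` Python SDK (v1 or later), whose client exposes `audio.transcriptions.create`. The function name `transcribe_file` is hypothetical, and the client is passed in as an argument so the function can be tried with a stand-in object.

```python
from pathlib import Path


def transcribe_file(client, path, model="whisper-1", language=None, prompt=None):
    """Send one audio file to the transcription endpoint and return its text.

    `client` is assumed to behave like `openai.OpenAI()`; this helper is an
    illustration, not the skill's actual implementation.
    """
    # Forward the optional hints only when the caller supplied them.
    extra = {}
    if language:
        extra["language"] = language
    if prompt:
        extra["prompt"] = prompt

    with Path(path).open("rb") as audio:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio,
            response_format="text",
            **extra,
        )

    # With response_format="text" the SDK is expected to return a plain
    # string; fall back to a .text attribute for object-style responses.
    return result if isinstance(result, str) else result.text
```

With the real SDK installed, a call might look like `transcribe_file(OpenAI(), "chunk_000.mp3", language="en")`, where `OpenAI` is imported from `openai` and reads OPENAI_API_KEY from the environment.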

Supported formats

Category | Formats
Audio | mp3, wav, m4a, ogg, flac, aac, opus, wma
Video | mp4, mov, avi, mkv, webm, wmv, m4v

ffmpeg handles extraction from any of these.
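As an illustration of the extraction step, the sketch below classifies a file by extension and builds an ffmpeg command that drops the video stream and writes mono 16 kHz MP3 audio. The helper names and the specific encoding settings (mono, 16 kHz, 64 kbit/s) are assumptions chosen for the example, not values taken from the skill's script.

```python
from pathlib import Path

# Extension lists copied from the supported-formats table.
AUDIO_EXTS = {"mp3", "wav", "m4a", "ogg", "flac", "aac", "opus", "wma"}
VIDEO_EXTS = {"mp4", "mov", "avi", "mkv", "webm", "wmv", "m4v"}


def media_kind(path):
    """Return "audio", "video", or None based on the file extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return None


def build_extract_command(src, dst):
    """Return an ffmpeg argv that writes the audio of `src` to `dst`."""
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it exists
        "-i", str(src),   # input media file
        "-vn",            # drop any video stream
        "-ac", "1",       # downmix to mono
        "-ar", "16000",   # resample to 16 kHz
        "-b:a", "64k",    # audio bitrate (assumed value)
        str(dst),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building the argv as a list avoids shell quoting problems with paths that contain spaces.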

Options & Flags

Flag | Default | Description
--model | whisper-1 | Whisper model to use (whisper-1, gpt-4o-transcribe)
--language | auto-detect | ISO 639-1 language code (e.g. en, ar, fr)
--format | txt | Output format: txt, srt, vtt, json
--timestamps | off | Include timestamps in output
--chunk-size | 20 | Max chunk size in MB (must be ≤ 25)
--prompt | none | Context hint to improve accuracy (e.g. domain vocab)

Output formats

txt: plain text, ideal for most uses
srt: SubRip subtitle format (for video players)
vtt: WebVTT format (for web video)
json: full Whisper JSON with segments and timestamps

Step-by-Step Workflow

Check for the file

Ask the user to upload the file or provide a local path. Check:

ls /mnt/user-data/uploads/

Check ffmpeg and install deps

which ffmpeg && ffmpeg -version 2>&1 | head -1
pip install openai pydub --break-system-packages -q 2>&1 | tail -3
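The same check can be done from Python before any work starts. This is a small sketch using only the standard library; the function names `preflight` and `missing` are hypothetical.

```python
import importlib.util
import shutil


def preflight(binaries=("ffmpeg",), modules=("openai", "pydub")):
    """Report which required binaries and Python modules are available."""
    report = {}
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        report[name] = shutil.which(name) is not None
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported.
        report[name] = importlib.util.find_spec(name) is not None
    return report


def missing(report):
    """Return the sorted names of everything the report marks unavailable."""
    return sorted(name for name, ok in report.items() if not ok)
```

A caller could run `missing(preflight())` and, if the list is non-empty, fall back to the install commands above.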

Get the API key

If OPENAI_API_KEY is not set in the environment, ask the user:

"Please provide your OpenAI API key. It starts with sk-. You can get one at https://platform.openai.com/api-keys"

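The key lookup described above could be sketched as follows. The `resolve_api_key` helper is hypothetical; the `sk-` prefix check mirrors the wording of the prompt to the user and is only a light sanity check, not real validation of the key.

```python
import os


def resolve_api_key(env=None):
    """Return (key, None) when a plausible key is set, else (None, message)."""
    env = os.environ if env is None else env
    key = (env.get("OPENAI_API_KEY") or "").strip()
    if not key:
        return None, "OPENAI_API_KEY is not set; ask the user for a key."
    if not key.startswith("sk-"):
        return None, "The key does not start with 'sk-'; ask the user to verify it."
    return key, None
```

Passing the environment in as a mapping keeps the function easy to exercise without touching the real process environment.
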
Run the script

python /home/claude/transcription/scripts/transcribe.py \
  --input "" \
  --output "/mnt/user-data/outputs/transcript.txt"
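When the command is assembled programmatically, the options from the flags table can be validated first. The sketch below is an assumption-laden illustration: the `build_transcribe_argv` helper is hypothetical, and it assumes `--timestamps` is a bare on/off switch, which the table does not confirm.

```python
SCRIPT = "/home/claude/transcription/scripts/transcribe.py"
FORMATS = {"txt", "srt", "vtt", "json"}


def build_transcribe_argv(input_path, output_path, model="whisper-1",
                          language=None, fmt="txt", timestamps=False,
                          chunk_size_mb=20, prompt=None):
    """Build the argv for transcribe.py, rejecting out-of-range options."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if not 0 < chunk_size_mb <= 25:
        raise ValueError("chunk size must be greater than 0 and at most 25 MB")

    argv = [
        "python", SCRIPT,
        "--input", str(input_path),
        "--output", str(output_path),
        "--model", model,
        "--format", fmt,
        "--chunk-size", str(chunk_size_mb),
    ]
    if language:
        argv += ["--language", language]
    if timestamps:
        argv.append("--timestamps")  # assumed to be a bare switch
    if prompt:
        argv += ["--prompt", prompt]
    return argv
```

The list can then be handed to `subprocess.run(argv, check=True)`.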

Post-process (optional but recommended)

After transcription, offer to:

Clean up punctuation/formatting with Claude
Summarize the content
Extract action items, speakers, or key topics
Translate to another language

Use the transcript text directly in the conversation for these steps.

Handling Large Files

The script automatically splits files larger than 20 MB (the default --chunk-size) into overlapping chunks, with a 1-second overlap for continuity. Each chunk is transcribed separately and the results are merged.
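One way such a split could be planned is sketched below. It is not the script's actual algorithm: it assumes a roughly constant bitrate, so that a size limit maps to a duration limit, and the `plan_chunks` name is hypothetical.

```python
def plan_chunks(duration_s, size_bytes, max_mb=20, overlap_s=1.0):
    """Return (start, end) windows in seconds covering the whole recording.

    Consecutive windows overlap by `overlap_s` seconds. Assumes a constant
    bitrate, so each window's size is proportional to its duration.
    """
    max_bytes = max_mb * 1024 * 1024
    if size_bytes <= max_bytes:
        return [(0.0, float(duration_s))]

    bytes_per_s = size_bytes / duration_s
    chunk_s = max_bytes / bytes_per_s  # longest window that fits the limit
    if chunk_s <= overlap_s:
        raise ValueError("chunk length must exceed the overlap")

    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_s, float(duration_s))
        windows.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so no speech is lost at the boundary.
        start = end - overlap_s
    return windows
```

For a 128-second, 40 MiB file with the default 20 MB limit, this yields three windows: 0 to 64 s, 63 to 127 s, and 126 to 128 s.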

For very long recordings (> 1 hour), warn the user that transcription may take a few minutes and show progress.
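Merging per-chunk results requires shifting each chunk's timestamps by the chunk's start time and discarding text repeated in the overlap. The sketch below shows one simple heuristic, assuming each segment is a dict with `start`, `end`, and `text` keys (the shape of segments in Whisper's verbose JSON). The `merge_chunks` name and the drop-if-already-covered rule are assumptions, not the script's confirmed behavior.

```python
def merge_chunks(chunks):
    """Merge chunk-relative segments into one absolute timeline.

    `chunks` is a list of (chunk_start_s, segments) pairs in playback order.
    """
    merged = []
    last_end = 0.0
    for chunk_start, segments in chunks:
        for seg in segments:
            start = seg["start"] + chunk_start
            end = seg["end"] + chunk_start
            # A segment ending inside already-covered audio is treated as a
            # duplicate produced by the overlap and skipped.
            if end <= last_end:
                continue
            merged.append({
                "start": max(start, last_end),
                "end": end,
                "text": seg["text"].strip(),
            })
            last_end = end
    return merged
```

This rule can drop or keep a partial phrase at a boundary; a more careful merge would compare the overlapping text itself.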

Error Handling

Error | Fix
AuthenticationError | Invalid API key; ask the user to verify it
RateLimitError | Wait 60s and retry, or use --chunk-size 10
InvalidRequestError: file too large | Reduce --chunk-size below 25
ffmpeg not found | sudo apt install ffmpeg or brew install ffmpeg
No audio stream found | File may be corrupt or in the wrong format

Example Interaction

User: "Can you transcribe this meeting recording?" [uploads meeting.mp4]

→ Check file exists in /mnt/user-data/uploads/
→ Run transcribe.py on it
→ Save transcript to /mnt/user-data/outputs/
→ present_files() to the user
→ Offer to summarize or extract action items
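The "wait 60s and retry" guidance for RateLimitError in the error table could be implemented with a small wrapper like the one below. This is a sketch: `call_with_retry` is a hypothetical helper, and the exception types to retry on are passed in by the caller (for the `openai` SDK this would be `openai.RateLimitError`).

```python
import time


def call_with_retry(fn, retry_on, attempts=3, wait_s=60.0, sleep=time.sleep):
    """Call `fn()`; on a `retry_on` exception, wait and try again.

    Re-raises the last exception once `attempts` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            # Fixed delay, matching the 60-second wait in the error table.
            sleep(wait_s)
```

The `sleep` parameter is injectable so the wrapper can be exercised without actually waiting.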

Notes for OpenClaw.AI

Always save output to /mnt/user-data/outputs/ so users can download it
Use present_files() to share the transcript file with the user after saving
For business users, suggest the srt or vtt format if they're adding captions to video
The --prompt flag is useful for technical/domain-specific content: pass a few domain keywords to improve accuracy
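For the captioning case mentioned above, timestamped segments can be rendered as SubRip text. This is a minimal sketch assuming segments are dicts with `start`, `end` (seconds), and `text` keys; the function names are hypothetical and the skill's own srt output may differ in detail.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

WebVTT uses a very similar layout, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.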
