📦 NEXUS Voice Transcriber
v1.0.0 · Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3 or local Whisper. Transcribes audio messages, saves audio files and...
Setup
On first use, read references/whisper-models.md and references/troubleshooting.md. Make sure the dependencies are installed: ffmpeg, python3, and the required Python packages (openai-whisper; deepgram-sdk is optional).
When to Use
- The user sends a voice note, audio file, or video file that needs transcription.
- You need to archive both the original audio and the text transcript.
- You want speaker detection (Deepgram with diarization).
- You want quick local transcription without external APIs (Whisper).

Architecture
Memory lives in ~/voice-transcriber/. See below for structure.
    ~/voice-transcriber/
    ├── memory.md       # Provider preferences, defaults, history
    ├── transcripts/    # Saved transcripts (txt, json, srt)
    ├── audio/          # Saved original audio files
    └── temp/           # Processing workspace (auto-cleaned)
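
The layout above could be created on first use with a few lines of Python. This is a minimal sketch, not the skill's actual setup code: the function name `ensure_layout` is hypothetical, and `memory.md` is deliberately left out because its template is described separately.

```python
from pathlib import Path

# Subdirectories taken from the layout above.
SUBDIRS = ("transcripts", "audio", "temp")


def ensure_layout(root: Path) -> Path:
    """Create the workspace directories if they do not exist yet."""
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```

A caller would typically pass `Path.home() / "voice-transcriber"` as `root`. Because `exist_ok=True` is set, calling the function repeatedly is harmless.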
Quick Reference
| Topic | File |
| --- | --- |
| Whisper model guide | references/whisper-models.md |
| Troubleshooting | references/troubleshooting.md |
| Main script | scripts/transcribe.py |

Core Rules
- Detect Input Type
Before transcription:
  - Local file path → verify it exists, check the format (mp3, wav, m4a, mp4, etc.)
  - URL → download to temp/, then process
  - Voice memo → usually single speaker, short
  - Meeting / interview → likely multiple speakers, consider diarization
- Choose Provider Based on Context
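
The first two checks above can be sketched as a small classifier. This is an illustration under assumptions, not the logic in scripts/transcribe.py: the name `classify_input` is hypothetical, and the extension set covers only the formats named explicitly (the source's "etc." means the real list is longer).

```python
from pathlib import Path
from urllib.parse import urlparse

# Only the extensions named in the rule; extend as needed.
KNOWN_EXTENSIONS = {".mp3", ".wav", ".m4a", ".mp4"}


def classify_input(source: str) -> str:
    """Return "url" or "file"; raise if a local path is missing or unsupported."""
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    path = Path(source).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    if path.suffix.lower() not in KNOWN_EXTENSIONS:
        raise ValueError(f"unsupported format: {path.suffix or '(none)'}")
    return "file"
```

Whether a recording is a voice memo or a meeting cannot be read from the path, so that decision is left to the caller.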
- Handle Long Audio
Files >25 MB or >2 hours:
  - Split into chunks with ffmpeg (see scripts/transcribe.py --split)
  - Process each chunk
  - Merge the transcripts, shifting each chunk's timestamps by its offset in the original file
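
The threshold check and the merge step might look like the sketch below. It assumes fixed-length chunks and segments shaped as dicts with `start`, `end`, and `text`; the function names are hypothetical, and 25 MB is read here as 25 × 1024 × 1024 bytes, which the source does not specify.

```python
MAX_BYTES = 25 * 1024 * 1024  # 25 MB threshold (binary megabytes assumed)
MAX_SECONDS = 2 * 60 * 60     # 2 hours


def needs_split(size_bytes: int, duration_s: float) -> bool:
    """True when either limit from the rule is exceeded."""
    return size_bytes > MAX_BYTES or duration_s > MAX_SECONDS


def merge_chunks(chunks: list, chunk_seconds: float) -> list:
    """Shift each chunk's segment times by the chunk's position, then concatenate.

    `chunks` holds one list of segments per chunk, in playback order.
    """
    merged = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_seconds
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged
```

Chunks produced with stream copy may not be exactly `chunk_seconds` long, so a more careful version would use each chunk's measured duration as the offset.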
- Save Artifacts
After successful transcription:
  - Save the transcript to ~/voice-transcriber/transcripts/ with a meaningful name
  - Save the original audio to ~/voice-transcriber/audio/ if the user wants archival
  - Update memory.md with date, file, provider, duration
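
One way to produce a meaningful name and a memory.md entry is sketched below. Both the naming scheme (date plus a slug of the source file name) and the entry layout are assumptions made for illustration; the source only lists which fields to record.

```python
import re
from datetime import date
from pathlib import Path


def transcript_name(source: str, day: date, ext: str = "txt") -> str:
    """Build a name such as 2024-05-01_team-sync.txt from the source file name."""
    stem = re.sub(r"[^a-z0-9]+", "-", Path(source).stem.lower()).strip("-")
    return f"{day.isoformat()}_{stem or 'transcript'}.{ext}"


def memory_entry(day: date, file: str, provider: str, duration_s: float) -> str:
    """One line holding the fields the rule lists: date, file, provider, duration."""
    minutes, seconds = divmod(int(round(duration_s)), 60)
    return f"- {day.isoformat()} | {file} | {provider} | {minutes}m{seconds:02d}s"
```

For example, `transcript_name("Team Sync (final).m4a", date(2024, 5, 1))` returns `2024-05-01_team-sync-final.txt`.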
- Output Formats
Default to plain text (.txt). Offer alternatives:
  - .txt: clean text, no timestamps
  - .srt / .vtt: subtitles with timing
  - .json: structured, with word-level timing (Deepgram) or segment timing (Whisper)

Common Traps
- Assuming one provider fits all → Whisper lacks diarization; Deepgram needs an API key.
- Uploading huge files directly → timeouts. Split first.
- Ignoring audio quality → noisy audio may need preprocessing (ffmpeg noise reduction).
- Not checking language → Whisper auto-detects but can fail on mixed-language content.
- Forgetting to save audio → the user may want the original file archived.

Requirements
Required:
- ffmpeg (audio conversion, splitting)
- python3 + pip
- Python packages: openai-whisper (local), requests (for Deepgram if used)
Optional API keys (only if using Deepgram):
- DEEPGRAM_API_KEY: for Deepgram Nova-3 (speaker diarization available)
Local Whisper works without any API keys.
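
A preflight check for these requirements could be sketched as follows. The function name `missing_requirements` is hypothetical. The check relies on the openai-whisper package being importable as `whisper`; an unrelated package with the same import name would also pass, so treat the result as a hint rather than proof.

```python
import importlib.util
import os
import shutil


def missing_requirements(provider: str = "whisper") -> list:
    """Return the names of requirements that are not satisfied."""
    missing = []
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    if provider == "whisper" and importlib.util.find_spec("whisper") is None:
        # openai-whisper installs a module named `whisper`.
        missing.append("openai-whisper")
    if provider == "deepgram" and not os.environ.get("DEEPGRAM_API_KEY"):
        missing.append("DEEPGRAM_API_KEY")
    return missing
```

An empty list means the chosen provider should be usable on this machine.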
Provider Quick Reference

Local Whisper (No API Key)
    # Install
    pip install openai-whisper

    # Basic transcription (via script)
    python3 scripts/transcribe.py --file audio.wav --provider whisper --model base

    # Output formats: txt (default), srt, vtt, json
    python3 scripts/transcribe.py --file audio.wav --provider whisper --model medium --format srt
Models: tiny (fastest) → base → small → medium → large (most accurate).
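
For the srt format mentioned above, segment timing has to be rendered as numbered cues with `HH:MM:SS,mmm` timestamps. The sketch below shows one way to do that, assuming segments are dicts with `start`, `end` (seconds), and `text`; how scripts/transcribe.py actually writes SRT is not shown in this document.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp form SRT uses."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list) -> str:
    """Render segments as SRT cues separated by blank lines."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        cues.append(
            f"{number}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```

With openai-whisper installed, `whisper.load_model("base").transcribe("audio.wav")["segments"]` returns segments that carry these three keys, so its output can be passed to `segments_to_srt` directly.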
Deepgram Nova-3 (API Key Required)
    # Set environment variable
    export DEEPGRAM_API_KEY="your_key_here"

    # Transcribe with speaker diarization
    python3 scripts/transcribe.py --file audio.wav --provider deepgram --diarize

    # Output JSON with speaker labels
    python3 scripts/transcribe.py --file audio.wav --provider deepgram --format json
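
For readers curious what a Deepgram call involves, the sketch below builds (but does not send) a request to the `/v1/listen` endpoint. It uses the standard library's `urllib` instead of the `requests` package listed in the requirements so that it stays self-contained. The `model=nova-3` and `diarize=true` query parameters and the `Token` authorization scheme reflect my understanding of Deepgram's pre-recorded audio API and should be checked against its documentation. `build_request` is a hypothetical name, not a function from scripts/transcribe.py.

```python
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(audio: bytes, api_key: str, diarize: bool = False) -> urllib.request.Request:
    """Build a POST request carrying raw WAV bytes; the caller decides when to send it."""
    params = {"model": "nova-3"}
    if diarize:
        params["diarize"] = "true"
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{urllib.parse.urlencode(params)}",
        data=audio,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",  # assumes WAV input
        },
        method="POST",
    )
```

Sending it would be a matter of `urllib.request.urlopen(request)` and parsing the JSON body; keeping construction separate makes the upload step easy to gate behind user confirmation.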
Audio Preprocessing

Extract Audio from Video
    ffmpeg -i video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav

Reduce Noise
    ffmpeg -i noisy.wav -af "afftdn=nf=-25" clean.wav

Split Long Audio (10-minute chunks)
    ffmpeg -i long.mp3 -f segment -segment_time 600 -c copy temp/chunk_%03d.mp3
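
When these commands are driven from Python, passing arguments as a list avoids shell quoting problems with file names. The helpers below mirror the extraction and splitting commands above; the function names are hypothetical and this is a sketch rather than the contents of scripts/transcribe.py.

```python
import subprocess


def extract_audio_cmd(video: str, out_wav: str) -> list:
    """16 kHz mono 16-bit PCM WAV, matching the extraction command above."""
    return ["ffmpeg", "-i", video, "-vn", "-acodec", "pcm_s16le",
            "-ar", "16000", "-ac", "1", out_wav]


def split_cmd(src: str, pattern: str, seconds: int = 600) -> list:
    """Stream-copy `src` into fixed-length chunks named by `pattern`."""
    return ["ffmpeg", "-i", src, "-f", "segment",
            "-segment_time", str(seconds), "-c", "copy", pattern]


def run(cmd: list) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

For example, `run(split_cmd("long.mp3", "temp/chunk_%03d.mp3"))` performs the 10-minute split shown above.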
Security & Privacy
Data that stays local:
- Transcripts in ~/voice-transcriber/transcripts/
- Original audio in ~/voice-transcriber/audio/
- Local Whisper processes entirely on-device
Data that leaves your machine (if using Deepgram):
- Audio file sent to the Deepgram API (api.deepgram.com)
- Transcript returned and stored locally

This skill does NOT:
- Store API keys in plain text (use environment variables)
- Auto-upload without confirmation
- Retain files on external servers after processing

External Endpoints
| Endpoint | Data Sent | Purpose |
| --- | --- | --- |
| api.deepgram.com/v1/listen | Audio file | Deepgram transcription |

The endpoint is called only when the user explicitly chooses the Deepgram provider. Local Whisper sends nothing.
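
The policy in this section (local by default, Deepgram only on explicit choice, key from the environment, no upload without confirmation) can be expressed as a small gate. This is an illustrative sketch; `resolve_provider` and its parameters are hypothetical names.

```python
import os


def resolve_provider(requested=None, confirmed_upload=False, env=None) -> str:
    """Return the provider to use, refusing Deepgram unless every condition is met."""
    env = os.environ if env is None else env
    if requested != "deepgram":
        return "whisper"  # default: nothing leaves the machine
    if not env.get("DEEPGRAM_API_KEY"):
        raise RuntimeError("DEEPGRAM_API_KEY is not set")
    if not confirmed_upload:
        raise PermissionError("upload to Deepgram was not confirmed")
    return "deepgram"
```

Raising instead of silently falling back to Whisper keeps the user aware that their explicit provider choice was not honored.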
Memory Template
Create ~/voice-transcriber/memory.md with this structure:
# Voice Transcriber Memory
Status
status: ongoing
version: 1.0.0
last: YYYY-MM-DD
integration: pending

Context
Notes
*Updated: