Audio Processing Skill
A comprehensive toolset for audio manipulation and analysis, with built-in security validation.
Security
- File paths are validated to prevent path traversal attacks.
- Access to system directories (/etc, /proc, /sys, /root) is blocked.
- TTS text input is limited to 10,000 characters.
- All file operations use resolved absolute paths.

Tool API
audio_tool
Performs audio operations such as transcription, text-to-speech, and feature extraction.
Parameters:
- action (string, required): One of transcribe, tts, extract_features, vad_segments, transform.
- file_path (string, optional): Path to the input audio file.
- text (string, optional): Text for TTS (max 10,000 characters).
- output_path (string, optional): Path for the output file (default: auto-generated).
- model (string, optional): Whisper model size (tiny, base, small, medium, large). Default: base.
- ops (string, optional): JSON string of operations for the transform action.
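These parameters map naturally onto a command-line parser. The sketch below assumes `tool.py` uses Python's `argparse` with the action as a positional argument, as the usage examples suggest; `build_parser` is a hypothetical name, and the real script may parse its arguments differently.

```python
import argparse
import json

# Hypothetical parser mirroring the documented parameters of audio_tool.
ACTIONS = ["transcribe", "tts", "extract_features", "vad_segments", "transform"]
MODELS = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tool.py")
    # The action is positional, e.g. `tool.py transcribe --file_path input.wav`.
    parser.add_argument("action", choices=ACTIONS)
    parser.add_argument("--file_path", help="input audio file")
    parser.add_argument("--text", help="text for tts (max 10,000 characters)")
    parser.add_argument("--output_path", help="output file; auto-generated if omitted")
    parser.add_argument("--model", choices=MODELS, default="base", help="Whisper model size")
    # Using json.loads as the type decodes --ops; invalid JSON becomes a usage error.
    parser.add_argument("--ops", type=json.loads, default=None, help="JSON list for transform")
    return parser
```

For example, `build_parser().parse_args(["transcribe", "--file_path", "input.wav"])` yields `action="transcribe"` with `model="base"`, and an unknown action or model is rejected by `choices` before any audio is touched.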
Usage:
# Transcribe an audio file
uv run --with "openai-whisper" --with "pydub" --with "numpy" skills/audio-processing/tool.py transcribe --file_path input.wav

# Transcribe with a specific model
uv run --with "openai-whisper" skills/audio-processing/tool.py transcribe --file_path input.wav --model small

# Text-to-speech
uv run --with "gTTS" skills/audio-processing/tool.py tts --text "Hello world" --output_path hello.mp3

# Extract audio features
uv run --with "librosa" --with "numpy" --with "soundfile" skills/audio-processing/tool.py extract_features --file_path input.wav

# Voice activity detection (find speech segments)
uv run --with "pydub" skills/audio-processing/tool.py vad_segments --file_path input.wav

# Transform audio (trim, resample, normalize)
uv run --with "pydub" skills/audio-processing/tool.py transform --file_path input.wav --ops '[{"op": "trim", "start": 10, "end": 30}, {"op": "normalize"}]'
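Quoting the `--ops` JSON by hand in a shell is easy to get wrong. When calling the tool from another Python program, one option is to build the command as an argument list and let `json.dumps` produce the JSON, so no shell quoting is involved. `transform_command` and `run_transform` are hypothetical helpers; the script path and flags are copied from the examples above, and `run_transform` assumes the tool prints a single JSON object on stdout.

```python
import json
import shlex
import subprocess


def transform_command(file_path: str, ops: list) -> list:
    """Build the documented transform invocation as an argv list (hypothetical helper)."""
    return [
        "uv", "run", "--with", "pydub",
        "skills/audio-processing/tool.py", "transform",
        "--file_path", file_path,
        "--ops", json.dumps(ops),
    ]


def run_transform(file_path: str, ops: list) -> dict:
    """Run the tool without a shell; assumes it prints one JSON object on stdout."""
    result = subprocess.run(transform_command(file_path, ops),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


cmd = transform_command("input.wav", [{"op": "trim", "start": 10, "end": 30}, {"op": "normalize"}])
print(shlex.join(cmd))  # shell-escaped form, handy for logging or copy-pasting
```

Passing a list to `subprocess.run` bypasses the shell entirely, so the JSON reaches the tool exactly as `json.dumps` produced it.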
Actions

transcribe
Convert speech to text using OpenAI Whisper.
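A minimal sketch of this action with the `openai-whisper` package, whose `whisper.load_model(name)` returns a model whose `transcribe(path)` method produces a dict containing `text` and `segments`. The function names, and the choice to keep only `start`, `end`, and `text` from each segment, are assumptions; the real tool may return Whisper's full segment objects.

```python
# Sketch assuming the openai-whisper package; it is imported inside transcribe()
# so this module loads (and validates arguments) even without Whisper installed.
VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def summarize_segments(segments: list) -> list:
    """Keep start, end, and stripped text from each Whisper segment dict."""
    return [{"start": s["start"], "end": s["end"], "text": s["text"].strip()}
            for s in segments]


def transcribe(file_path: str, model_name: str = "base") -> dict:
    if model_name not in VALID_MODELS:
        raise ValueError(f"unknown Whisper model: {model_name}")
    import whisper  # heavy import; model weights are downloaded on first use

    model = whisper.load_model(model_name)
    # Whisper decodes the input with the ffmpeg CLI, so ffmpeg must be on PATH.
    result = model.transcribe(file_path)  # dict with "text", "segments", "language"
    return {"text": result["text"].strip(),
            "segments": summarize_segments(result["segments"])}
```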
Returns: { "text": "...", "segments": [...] }
Models: tiny, base, small, medium, large (larger models are more accurate but slower)

tts
Generate speech from text using Google TTS.
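A minimal sketch of this action, assuming the `gTTS` package (`gTTS(text=..., lang=...).save(path)`), which needs network access to Google's TTS service. The 10,000-character limit comes from the Security notes; the temporary-file naming used when no `output_path` is given is a hypothetical stand-in for the tool's auto-generated path.

```python
import os
import tempfile

MAX_TTS_CHARS = 10_000  # limit stated in the Security notes


def check_tts_text(text: str) -> str:
    """Reject empty text and text longer than the documented limit."""
    if not text or not text.strip():
        raise ValueError("text is required for tts")
    if len(text) > MAX_TTS_CHARS:
        raise ValueError(f"text exceeds {MAX_TTS_CHARS} characters")
    return text


def default_output_path(suffix: str = ".mp3") -> str:
    """Hypothetical auto-generated output path (the real naming is undocumented)."""
    fd, path = tempfile.mkstemp(suffix=suffix, prefix="tts_")
    os.close(fd)  # gTTS reopens the file by name when saving
    return path


def text_to_speech(text: str, output_path=None, lang: str = "en") -> dict:
    check_tts_text(text)
    output_path = output_path or default_output_path()
    from gtts import gTTS  # imported lazily; requires network access

    gTTS(text=text, lang=lang).save(output_path)
    return {"file_path": output_path, "status": "created"}
```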
Returns: { "file_path": "output.mp3", "status": "created" }
Language: English (default)

extract_features
Extract audio features for analysis.
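The usage example runs this action with librosa. As an illustration of what the simpler returned features mean, here is a stdlib-only approximation (using the `wave` module, 16-bit PCM WAV only) of duration, sample rate, and mean frame RMS. It omits MFCCs, which need a library such as librosa, and its framing (non-overlapping 2048-sample frames) is a simplifying assumption, so its `rms_mean` will not match librosa's `rms` output exactly.

```python
import array
import math
import sys
import wave


def basic_features(path: str, frame_length: int = 2048) -> dict:
    """Duration, sample rate, and mean frame RMS of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        n_frames = wf.getnframes()
        samples = array.array("h", wf.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Average interleaved channels to mono and scale to [-1.0, 1.0).
    mono = [sum(samples[i:i + channels]) / channels / 32768.0
            for i in range(0, len(samples), channels)]
    # RMS of each non-overlapping frame (the last frame may be shorter).
    rms_values = []
    for start in range(0, len(mono), frame_length):
        frame = mono[start:start + frame_length]
        rms_values.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return {
        "duration": n_frames / sample_rate,
        "sample_rate": sample_rate,
        "rms_mean": sum(rms_values) / len(rms_values) if rms_values else 0.0,
    }
```

For a pure sine of amplitude A, each frame's RMS is close to A/√2, which makes a convenient sanity check.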
Returns: duration, sample_rate, mfcc_mean, rms_mean
Useful for audio classification and quality analysis.

vad_segments
Detect speech segments using silence detection.
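This action relies on ffmpeg's `silencedetect` filter, which logs `silence_start` and `silence_end` lines to stderr. The sketch below turns those silence intervals into speech segments by taking the gaps between them. The mapping from aggressiveness level to noise threshold and minimum silence duration in `THRESHOLDS` is purely hypothetical, since the tool does not document its values.

```python
import re

# Hypothetical mapping from aggressiveness (1-3) to silencedetect settings:
# (noise threshold, minimum silence duration in seconds).
THRESHOLDS = {1: ("-40dB", 0.8), 2: ("-30dB", 0.5), 3: ("-25dB", 0.3)}

SILENCE_START = re.compile(r"silence_start:\s*(-?[\d.]+)")
SILENCE_END = re.compile(r"silence_end:\s*(-?[\d.]+)")


def silencedetect_cmd(path: str, aggressiveness: int = 2) -> list:
    """ffmpeg command that only analyzes the audio (output discarded via -f null)."""
    noise, min_silence = THRESHOLDS[aggressiveness]
    return ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
            "-af", f"silencedetect=noise={noise}:d={min_silence}", "-f", "null", "-"]


def speech_segments(ffmpeg_log: str, total_duration: float) -> list:
    """Invert the silence intervals in an ffmpeg log into speech segments."""
    silences, pending = [], None
    for line in ffmpeg_log.splitlines():
        start, end = SILENCE_START.search(line), SILENCE_END.search(line)
        if start:
            pending = max(0.0, float(start.group(1)))  # clamp tiny negative timestamps
        elif end and pending is not None:
            silences.append((pending, float(end.group(1))))
            pending = None
    if pending is not None:  # silence that runs to the end of the file
        silences.append((pending, total_duration))

    segments, cursor = [], 0.0
    for silence_start, silence_end in silences:
        if silence_start > cursor:
            segments.append({"start": cursor, "end": silence_start})
        cursor = max(cursor, silence_end)
    if cursor < total_duration:
        segments.append({"start": cursor, "end": total_duration})
    return segments
```

To run it, pass the `stderr` of `subprocess.run(silencedetect_cmd("input.wav"), capture_output=True, text=True)` to `speech_segments` together with the file's duration.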
Returns: { "segments": [{ "start": 0.5, "end": 3.2 }, ...] }
Uses the FFmpeg silencedetect filter.
Aggressiveness: 1-3 (default: 2)

transform
Apply transformations to audio files.
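A minimal sketch of this action using pydub, which slices `AudioSegment` objects in milliseconds and provides `set_frame_rate` and `pydub.effects.normalize`. The `start`/`end` keys (in seconds) come from the usage example; the `rate` key for `resample` and the WAV output format are assumptions, since the tool does not document them. `check_ops` runs before any audio is loaded, so malformed input fails fast.

```python
# Sketch assuming pydub; it is imported inside apply_ops so check_ops works without it.
KNOWN_OPS = {"trim", "resample", "normalize"}


def check_ops(ops: list) -> list:
    """Reject unknown or malformed operations before any audio is loaded."""
    for op in ops:
        name = op.get("op") if isinstance(op, dict) else None
        if name not in KNOWN_OPS:
            raise ValueError(f"unsupported operation: {op!r}")
        if name == "trim" and not 0 <= op.get("start", 0) < op.get("end", float("inf")):
            raise ValueError("trim needs 0 <= start < end, in seconds")
        # "rate" is a hypothetical key; resample's parameters are undocumented.
        if name == "resample" and not op.get("rate", 0) > 0:
            raise ValueError("resample needs a positive 'rate' in Hz")
    return ops


def apply_ops(file_path: str, ops: list, output_path: str) -> dict:
    from pydub import AudioSegment
    from pydub.effects import normalize

    audio = AudioSegment.from_file(file_path)
    for op in check_ops(ops):
        if op["op"] == "trim":
            start_ms = int(op.get("start", 0) * 1000)  # pydub slices in milliseconds
            end = op.get("end")
            audio = audio[start_ms:] if end is None else audio[start_ms:int(end * 1000)]
        elif op["op"] == "resample":
            audio = audio.set_frame_rate(int(op["rate"]))
        else:  # normalize: adjust gain so the peak sits just below full scale
            audio = normalize(audio)
    audio.export(output_path, format="wav")
    return {"file_path": output_path}
```

Operations are applied in list order, so `[{"op": "trim", ...}, {"op": "normalize"}]` normalizes only the trimmed region.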
Operations: trim, resample, normalize
Returns: { "file_path": "output.wav" }

Requirements
- ffmpeg: required for VAD and transform operations
- Python 3.8+: required for all operations
- Disk space: Whisper models range from 100MB (tiny) to 3GB (large)

Error Handling
- Returns a JSON error object on failure
- Validates all file paths before processing
- Gracefully handles missing dependencies
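The path checks from the Security section and the JSON error objects described under Error Handling could look roughly like this. `validate_path`, `run_safely`, and `BLOCKED_DIRS` are hypothetical names, and the real tool's checks may differ in detail.

```python
import json
from pathlib import Path

# Resolve the blocked roots too, so symlinked system dirs (e.g. /etc on macOS) still match.
BLOCKED_DIRS = [Path(p).resolve() for p in ("/etc", "/proc", "/sys", "/root")]


def validate_path(raw: str, must_exist: bool = True) -> Path:
    """Resolve a user-supplied path and refuse anything inside a blocked directory."""
    path = Path(raw).expanduser().resolve()  # collapses "..", follows symlinks
    for blocked in BLOCKED_DIRS:
        if path == blocked or blocked in path.parents:
            raise PermissionError(f"access to {blocked} is blocked")
    if must_exist and not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    return path


def run_safely(func, *args, **kwargs) -> str:
    """Call func and return its result, or the failure, as a JSON string."""
    try:
        return json.dumps(func(*args, **kwargs), default=str)
    except ImportError as exc:  # e.g. whisper, gTTS, librosa, or pydub not installed
        return json.dumps({"error": f"missing dependency: {exc.name}"})
    except Exception as exc:
        return json.dumps({"error": str(exc), "type": type(exc).__name__})
```

Comparing resolved path components (`blocked in path.parents`) rather than string prefixes avoids false matches such as `/etcetera`, and resolving first means `../` sequences and symlinks cannot route a path into a blocked directory.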