Audio Processing (Iyeque)

v1.1.1

Audio ingestion, analysis, transformation, and generation (Transcribe, TTS, VAD, Features).

by @iyeque·MIT-0

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install iyeque-audio-processing

Mirror: npx clawhub@latest install iyeque-audio-processing --registry https://cn.longxiaskill.com

Skill documentation

Audio Processing Skill

A comprehensive toolset for audio manipulation and analysis with security validations.

Security

File paths are validated to prevent path traversal attacks
Access to system directories (/etc, /proc, /sys, /root) is blocked
TTS text input is limited to 10,000 characters
All file operations use resolved absolute paths

Tool API

audio_tool

Perform audio operations like transcription, text-to-speech, and feature extraction.

Parameters:

action (string, required): One of transcribe, tts, extract_features, vad_segments, transform.
file_path (string, optional): Path to input audio file.
text (string, optional): Text for TTS (max 10,000 chars).
output_path (string, optional): Path for output file (default: auto-generated).
model (string, optional): Whisper model size (tiny, base, small, medium, large). Default: base.
ops (string, optional): JSON string of operations for the transform action.
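The parameter list above maps naturally onto a command-line parser. The following is a minimal sketch of how such a parser could look, not the skill's actual implementation; the names build_parser, parse_args, and MAX_TTS_CHARS are hypothetical, and the action names are taken from the list above.

```python
import argparse

# Action and model names as documented in the parameter list.
ACTIONS = ["transcribe", "tts", "extract_features", "vad_segments", "transform"]
MODELS = ["tiny", "base", "small", "medium", "large"]
MAX_TTS_CHARS = 10_000  # documented TTS input limit


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented parameters (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="tool.py")
    parser.add_argument("action", choices=ACTIONS)
    parser.add_argument("--file_path", help="path to input audio file")
    parser.add_argument("--text", help="text for TTS")
    parser.add_argument("--output_path", help="path for output file")
    parser.add_argument("--model", choices=MODELS, default="base")
    parser.add_argument("--ops", help="JSON string of transform operations")
    return parser


def parse_args(argv):
    """Parse argv and enforce the documented TTS length limit."""
    args = build_parser().parse_args(argv)
    if args.text is not None and len(args.text) > MAX_TTS_CHARS:
        raise ValueError("text exceeds 10,000 characters")
    return args
```

Rejecting over-long text at parse time keeps the limit in one place, before any TTS library is imported.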

Usage:

# Transcribe audio file
uv run --with "openai-whisper" --with "pydub" --with "numpy" skills/audio-processing/tool.py transcribe --file_path input.wav

# Transcribe with specific model
uv run --with "openai-whisper" skills/audio-processing/tool.py transcribe --file_path input.wav --model small

# Text-to-speech
uv run --with "gTTS" skills/audio-processing/tool.py tts --text "Hello world" --output_path hello.mp3

# Extract audio features
uv run --with "librosa" --with "numpy" --with "soundfile" skills/audio-processing/tool.py extract_features --file_path input.wav

# Voice activity detection (find speech segments)
uv run --with "pydub" skills/audio-processing/tool.py vad_segments --file_path input.wav

# Transform audio (trim, resample, normalize)
uv run --with "pydub" skills/audio-processing/tool.py transform --file_path input.wav --ops '[{"op": "trim", "start": 10, "end": 30}, {"op": "normalize"}]'
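Hand-writing the --ops JSON inside shell quotes is easy to get wrong. When calling the tool from Python, one option is to build the JSON with json.dumps and pass the command as an argument list. This is a sketch only; build_transform_command is a hypothetical helper, and the script path and flags are copied from the usage examples above.

```python
import json
import shlex


def build_transform_command(file_path, ops):
    """Return an argv list for the transform action (hypothetical helper).

    `ops` is a list of operation dicts, serialized into the --ops JSON string.
    """
    return [
        "uv", "run", "--with", "pydub",
        "skills/audio-processing/tool.py", "transform",
        "--file_path", file_path,
        "--ops", json.dumps(ops),
    ]


ops = [{"op": "trim", "start": 10, "end": 30}, {"op": "normalize"}]
argv = build_transform_command("input.wav", ops)

# shlex.join produces a shell-safe string for logging or copy-pasting.
print(shlex.join(argv))
```

The list can be passed directly to subprocess.run(argv, capture_output=True, text=True). Whether the result arrives as JSON on standard output is an assumption to check against the tool itself.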

Actions

transcribe

Convert speech to text using OpenAI Whisper.

Returns: { "text": "...", "segments": [...] }
Models: tiny, base, small, medium, large (larger = more accurate, slower)

tts

Generate speech from text using Google TTS.

Returns: { "file_path": "output.mp3", "status": "created" }
Language: English (default)

extract_features

Extract audio features for analysis.

Returns: duration, sample_rate, mfcc_mean, rms_mean
Useful for audio classification, quality analysis

vad_segments

Detect speech segments using silence detection.
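Silence detection reports where the audio is quiet, so the speech segments are the gaps between silent intervals. The sketch below shows one way to do that inversion. It assumes the silent intervals are already available as (start, end) pairs in seconds; speech_segments and min_len are hypothetical names, not part of the skill.

```python
def speech_segments(silences, duration, min_len=0.0):
    """Invert silent intervals into speech segments.

    silences: iterable of (start, end) pairs in seconds (assumed input shape).
    duration: total length of the audio in seconds.
    min_len:  segments not longer than this are dropped.
    """
    segments = []
    cursor = 0.0  # end of the last silence seen so far
    for s_start, s_end in sorted(silences):
        # Clamp each interval to the audio bounds.
        s_start = max(0.0, min(s_start, duration))
        s_end = max(0.0, min(s_end, duration))
        if s_start - cursor > min_len:
            segments.append({"start": cursor, "end": s_start})
        cursor = max(cursor, s_end)
    # Anything after the final silence is speech.
    if duration - cursor > min_len:
        segments.append({"start": cursor, "end": duration})
    return segments
```

Tracking the cursor with max() makes overlapping or nested silent intervals harmless, since a later interval can never move the cursor backwards.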

Returns: { "segments": [{ "start": 0.5, "end": 3.2 }, ...] }
Uses FFmpeg silencedetect filter
Aggressiveness: 1-3 (default: 2)

transform

Apply transformations to audio files.

Operations: trim, resample, normalize
Returns: { "file_path": "output.wav" }

Requirements

ffmpeg: Required for VAD and transform operations
Python 3.8+: All operations
Disk Space: Whisper models range from 100MB (tiny) to 3GB (large)

Error Handling

Returns a JSON error object on failure
Validates all file paths before processing
Gracefully handles missing dependencies
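The path checks described under Security and the JSON error behaviour above could fit together roughly as follows. This is a minimal sketch under stated assumptions, not the skill's code: validate_path and check_input are hypothetical names, and the {"error": ...} shape is assumed because the documentation does not specify the error object's fields.

```python
import json
from pathlib import Path

# System directories the documentation lists as blocked.
BLOCKED_DIRS = ("/etc", "/proc", "/sys", "/root")


def validate_path(raw):
    """Resolve a path to absolute form and reject blocked system directories."""
    resolved = Path(raw).expanduser().resolve()
    for blocked in BLOCKED_DIRS:
        # Resolve the blocked directory too, in case it is a symlink.
        blocked_path = Path(blocked).resolve()
        if resolved == blocked_path or blocked_path in resolved.parents:
            raise ValueError(f"access to {blocked} is blocked")
    return resolved


def check_input(raw):
    """Return a JSON string: the resolved path, or an error object (assumed shape)."""
    try:
        return json.dumps({"file_path": str(validate_path(raw))})
    except (ValueError, OSError) as exc:
        return json.dumps({"error": str(exc)})
```

Resolving before comparing is what defeats traversal: a path such as /tmp/../etc/passwd is checked as the file it actually points to, not as the string that was typed.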
