Zhipu ASR - Skill Tool

Name: Zhipu ASR - Skill Tool
Author: xiaofei

v1.0.2

[Auto-translated] Automatic Speech Recognition (ASR) using Zhipu AI (BigModel) GLM-ASR model. Use when you need to transcribe audio files to text. Supports Chinese audi...

by @franklu0819-lang (xiaofei)·MIT-0

Tags: AI model access, automation, file processing, developer tools

License

MIT-0

Last updated

2026/3/13

Security scan

VirusTotal

Benign

OpenClaw

Safe

high confidence

The skill is internally consistent for an ASR integration: it converts audio locally and uploads it to Zhipu's transcription API using a single API key; nothing in the code or docs requests unrelated secrets or system access.

Assessment and recommendations

This skill appears to do exactly what it says: convert audio locally (ffmpeg) and upload it to Zhipu's transcription API using ZHIPU_API_KEY. Before installing, confirm you trust the destination (open.bigmodel.cn) and are comfortable sending audio (which may contain sensitive PII) to that external service. Ensure ffmpeg/curl/jq are installed from trusted package sources. Note the small metadata inconsistencies (package.json/_meta.json versions and declared required bins) — these look like packag...

Detailed analysis

ℹ Purpose and capabilities

The name/description, SKILL.md, and the included shell script all align with an ASR transcription skill. The only minor inconsistency is metadata in package.json/_meta.json (and package.json lists only jq in openclaw.requires while the script also requires curl and ffmpeg); this looks like sloppy packaging rather than malicious behavior.

✓ Instruction scope

Runtime instructions and the script stay within scope: they read the provided audio file, optionally convert it with ffmpeg, check size/duration, and POST the file and optional context/hotwords to the declared Zhipu API endpoint. The script does not read other environment variables, system files, or contact unexpected endpoints.

✓ Install mechanism

There is no install spec (instruction-only with a shipped shell script). That is low-risk; no external archives or binaries are downloaded by the skill itself. The script requires standard system tools (jq, curl, ffmpeg) already expected for this functionality.

✓ Credential requirements

Only ZHIPU_API_KEY is requested, which is proportionate to uploading audio to Zhipu's API. No other secrets or unrelated credentials are required or referenced.

✓ Persistence and privileges

The always flag is false, and the skill does not attempt to modify other skills or system settings. It does not request persistent elevated presence beyond normal user invocation.

Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.2 · 2026/2/21

Added curl and ffmpeg to requirements; added 30s duration check; improved script cleanup and silent transcoding.

● Benign

Install command

Official: npx clawhub@latest install zhipu-asr

Mirror (faster): npx clawhub@latest install zhipu-asr --registry https://cn.clawhub-mirror.com

Skill documentation

Transcribe Chinese audio files to text using Zhipu AI's GLM-ASR model.

Setup

1. Get your API key from the Zhipu AI Console.

2. Set it in your environment:

export ZHIPU_API_KEY="your-key-here"
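Before the first run it can help to confirm that the key is set and that the tools the script relies on (jq, curl, and ffmpeg, per the security analysis above) are installed. The following is a minimal sketch of such a pre-flight check; the function names missing_tools and api_key_is_set are hypothetical helpers and are not part of the shipped skill.

```bash
# Hypothetical pre-flight helpers (not part of the shipped skill).

# Print the name of every listed tool that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed only when ZHIPU_API_KEY is set and non-empty.
api_key_is_set() {
    [ -n "${ZHIPU_API_KEY:-}" ]
}
```

For example, running missing_tools jq curl ffmpeg prints nothing when all three tools are available, and api_key_is_set || echo "ZHIPU_API_KEY is not set" reports a missing key.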

Supported Audio Formats

WAV - Recommended, best quality
MP3 - Widely supported
OGG - Auto-converted to MP3
M4A - Auto-converted to MP3
AAC - Auto-converted to MP3
FLAC - Auto-converted to MP3
WMA - Auto-converted to MP3

Note: The script automatically converts unsupported formats to MP3 using ffmpeg. Only WAV and MP3 are accepted by the API, but you can use any format that ffmpeg supports.
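The conversion step described in the note can be approximated as follows. This is a sketch under assumptions, not the shipped script's code: the function names are hypothetical, the decision is made from the file extension only, and the ffmpeg flags shown are one reasonable way to transcode quietly.

```bash
# Hypothetical helpers illustrating the WAV/MP3 pass-through rule.

# Succeed (return 0) when the file extension is anything other than wav or mp3.
needs_conversion() {
    local ext="${1##*.}"
    # Lowercase the extension so that .WAV and .Mp3 are recognized too.
    ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
    case "$ext" in
        wav|mp3) return 1 ;;
        *)       return 0 ;;
    esac
}

# Transcode $1 to the MP3 file $2, printing only errors.
# ffmpeg picks the MP3 encoder from the .mp3 output extension.
convert_to_mp3() {
    local src="$1" dst="$2"
    ffmpeg -hide_banner -loglevel error -y -i "$src" "$dst"
}
```

A caller might run needs_conversion clip.m4a && convert_to_mp3 clip.m4a clip.mp3 before uploading.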

File Constraints

Maximum file size: 25 MB
Maximum duration: 30 seconds
Recommended sample rate: 16000 Hz or higher
Audio channels: Mono or stereo
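The size and duration limits above can be checked locally before uploading. The sketch below is illustrative only: the function names are hypothetical, "25 MB" is assumed to mean 25 × 1024 × 1024 bytes, and ffprobe (installed alongside ffmpeg) is assumed to be available for reading the duration.

```bash
# Hypothetical limit checks; the shipped script may implement these differently.
MAX_BYTES=$((25 * 1024 * 1024))   # assumption: "25 MB" means 25 MiB
MAX_SECONDS=30

# Succeed when the file at $1 is no larger than MAX_BYTES.
file_size_ok() {
    local bytes
    bytes=$(( $(wc -c < "$1") ))
    [ "$bytes" -le "$MAX_BYTES" ]
}

# Succeed when the duration $1 (seconds, may be fractional) is within MAX_SECONDS.
duration_ok() {
    awk -v d="$1" -v max="$MAX_SECONDS" 'BEGIN { exit !(d <= max) }'
}

# Print the duration of the audio file $1 in seconds using ffprobe.
probe_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

A typical check would be file_size_ok recording.wav && duration_ok "$(probe_duration recording.wav)".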

Usage

Basic Transcription

Transcribe an audio file with default settings:

bash scripts/speech_to_text.sh recording.wav

Transcription with Context

Provide previous transcription or context for better accuracy:

bash scripts/speech_to_text.sh recording.wav "这是之前的转录内容，有助于提高准确性"

Transcription with Hotwords

Use custom vocabulary to improve recognition of specific terms:

bash scripts/speech_to_text.sh recording.mp3 "" "人名,地名,专业术语,公司名称"

Full Options

Combine context and hotwords:

bash scripts/speech_to_text.sh recording.wav "会议记录片段" "张三,李四,项目名称"

Parameters:

audio_file (required): Path to the audio file (WAV or MP3; other formats are converted to MP3 by the script)
prompt (optional): Previous transcription or context text (max 8000 characters)
hotwords (optional): Comma-separated list of specific terms (max 100 terms)
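The documented limits on prompt and hotwords can be validated before calling the script. The helpers below are a hypothetical sketch; whether the shipped script or the API enforces these limits itself is not stated in this document.

```bash
# Hypothetical argument checks based on the documented limits.
MAX_PROMPT_CHARS=8000
MAX_HOTWORDS=100

# Print the number of comma-separated terms in $1 (0 for an empty string).
hotword_count() {
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    # Count the commas and add one.
    echo $(( $(printf '%s' "$1" | tr -cd ',' | wc -c) + 1 ))
}

# Succeed when the hotword list $1 has at most MAX_HOTWORDS terms.
hotwords_ok() {
    [ "$(hotword_count "$1")" -le "$MAX_HOTWORDS" ]
}

# Succeed when the prompt $1 has at most MAX_PROMPT_CHARS characters.
# Note: ${#1} counts characters in a UTF-8 locale but bytes in the C locale.
prompt_ok() {
    [ "${#1}" -le "$MAX_PROMPT_CHARS" ]
}
```

For example, hotword_count "张三,李四,项目名称" prints 3.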

Features

Context Prompts

Why use context prompts:

Improves accuracy in long conversations
Helps with domain-specific terminology
Maintains consistency across multiple segments

When to use:

Multi-part conversations or meetings
Technical or specialized content
Continuing from previous transcriptions

Example:

bash scripts/speech_to_text.sh part2.wav "第一部分的转录内容：讨论了项目进展和下一步计划"

Hotwords

What are hotwords: Custom vocabulary list that boosts recognition accuracy for specific terms.

Best use cases:

Proper names (people, places)
Domain-specific terminology
Company names and products
Technical jargon
Industry-specific terms

Examples:

# Medical transcription
bash scripts/speech_to_text.sh medical.wav "" "患者,症状,诊断,治疗方案"

# Business meeting
bash scripts/speech_to_text.sh meeting.wav "" "张经理,李总,项目代号,预算"

# Tech discussion
bash scripts/speech_to_text.sh tech.wav "" "API,数据库,算法,框架"

Workflow Examples

Transcribe a Meeting

# Part 1
bash scripts/speech_to_text.sh meeting_part1.wav

# Part 2 with context
bash scripts/speech_to_text.sh meeting_part2.wav "第一部分讨论了项目进度" "张总,李经理,项目名称"

# Part 3 with context
bash scripts/speech_to_text.sh meeting_part3.wav "前两部分讨论了项目进度和预算" "张总,李经理,项目名称"
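The meeting workflow can also be automated by passing each part's transcript as the context prompt for the next part. The sketch below is hypothetical: it assumes the script prints the JSON described under Output Format to standard output and exits non-zero on failure, and it uses jq to read the text field. The transcribe wrapper exists only so the script path lives in one place.

```bash
# Hypothetical chaining helper; assumes the script prints JSON with a "text" field.

# Thin wrapper around the skill's script.
transcribe() {
    bash scripts/speech_to_text.sh "$@"
}

# Usage: transcribe_chain "hotword1,hotword2" part1.wav part2.wav ...
# Prints one transcript per line, feeding each transcript to the next call as context.
transcribe_chain() {
    local hotwords="$1" context="" file json
    shift
    for file in "$@"; do
        json="$(transcribe "$file" "$context" "$hotwords")" || return 1
        context="$(printf '%s' "$json" | jq -r '.text // empty')"
        printf '%s\n' "$context"
    done
}
```

Only the immediately preceding transcript is used as context here, which keeps the prompt short; a longer running summary would need trimming to stay within the 8000-character limit.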

Transcribe a Lecture

bash scripts/speech_to_text.sh lecture.wav "" "教授,课程名称,专业术语1,专业术语2"

Process Multiple Files

for file in recording_*.wav; do
    bash scripts/speech_to_text.sh "$file"
done
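The loop above prints every result to the terminal. A variant that stores each result next to its audio file and keeps going after a failure might look like the sketch below. The function names are hypothetical, and the sketch assumes the script writes its JSON to standard output and exits non-zero on failure.

```bash
# Hypothetical batch helper; assumes JSON on stdout and a non-zero exit on failure.

# Map an audio path to its transcript path: recording_01.wav -> recording_01.json
transcript_path() {
    printf '%s.json\n' "${1%.*}"
}

# Transcribe every file given as an argument; succeed only if all of them succeed.
transcribe_batch() {
    local file failed=0
    for file in "$@"; do
        if bash scripts/speech_to_text.sh "$file" > "$(transcript_path "$file")"; then
            echo "ok: $file" >&2
        else
            echo "failed: $file" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

It could be invoked as transcribe_batch recording_*.wav.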

Audio Quality Tips

Best practices for accurate transcription:

Clear audio source

- Minimize background noise
- Use a good-quality microphone
- Speak clearly and at a moderate pace

Optimal audio settings

- Sample rate: 16000 Hz or higher
- Bit depth: 16-bit or higher
- Single channel (mono) is sufficient

File preparation

- Remove silence from beginning/end
- Normalize audio levels
- Ensure consistent volume

Output Format

The script outputs JSON with:

id: Task ID
created: Request timestamp (Unix timestamp)
request_id: Unique request identifier
model: Model name used
text: Transcribed text

Example output:

{
  "id": "task-12345",
  "created": 1234567890,
  "request_id": "req-abc123",
  "model": "glm-asr-2512",
  "text": "你好，这是转录的文本内容"
}
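Since jq is already one of the script's required tools, the transcribed text can be pulled out of this JSON with a short filter. This is a minimal sketch; the function name extract_text is hypothetical.

```bash
# Hypothetical helper: read the script's JSON on stdin and print only the text field.
# Prints nothing when the field is missing or null.
extract_text() {
    jq -r '.text // empty'
}
```

For example: bash scripts/speech_to_text.sh recording.wav | extract_text > recording.txt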

Troubleshooting

File Size Issues:

Split audio files larger than 25 MB
Reduce sample rate or bit depth
Use compression (MP3) for smaller files

Duration Issues:

Split recordings longer than 30 seconds
Process segments separately
Use context prompts to maintain continuity
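One way to split a long recording is ffmpeg's segment muxer. The sketch below is an assumption-laden illustration, not part of the skill: it handles WAV input only, the function names are hypothetical, and it uses 25-second segments to leave a margin under the 30-second limit, because stream-copy splitting may not cut at exactly the requested time.

```bash
# Hypothetical splitting helpers for WAV input.
SEGMENT_SECONDS=25   # assumption: a margin below the 30-second limit

# Print how many segments a recording of $1 whole seconds needs
# (optional $2 overrides the segment length).
segments_needed() {
    local duration="$1" segment="${2:-$SEGMENT_SECONDS}"
    echo $(( (duration + segment - 1) / segment ))
}

# Split the WAV file $1 into ${2}_000.wav, ${2}_001.wav, ... without re-encoding.
split_wav() {
    local src="$1" prefix="$2"
    ffmpeg -hide_banner -loglevel error -i "$src" \
        -f segment -segment_time "$SEGMENT_SECONDS" -c copy \
        "${prefix}_%03d.wav"
}
```

Fixed-length cuts can land in the middle of a word, so passing the previous segment's transcript as the context prompt is a reasonable way to soften the boundary.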

Poor Accuracy:

Improve audio quality
Use hotwords for specific terms
Provide context prompts
Ensure clear speech and minimal noise

Format Issues:

Ensure the file is WAV or MP3, or another format that ffmpeg can convert
Check file is not corrupted
Verify audio can be played by standard players

Limitations

Maximum audio duration: 30 seconds per request
File size limit: 25 MB
Maximum hotwords: 100 terms
Context prompt limit: 8000 characters
Best performance with Chinese language audio

Performance Notes

Typical transcription time: 1-3 seconds
Real-time or faster for most audio
Processing time scales with audio quality and length

Data source: ClawHub · Chinese localization: 龙虾技能库
