# 📦 Vision Helper — AI Image Analysis

v1.0.0 — Analyze images using local or cloud vision models via Ollama to identify content, UI elements, and screenshots, or extract text with OCR support.
## 📸 Vision Helper — Image Analysis

Analyze images using vision models via Ollama, with extended timeout support for cloud-based models.
### Why Not Use the Built-in Image Tool?

The built-in image tool has limited timeout settings that cause failures with cloud vision models, which often need 40–120 seconds. This skill calls the Ollama API directly with a 180-second timeout, supporting both local and cloud models reliably.

It also bypasses the built-in tool's file path restrictions, allowing analysis of images from any readable directory.
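As a rough sketch of what "calling the Ollama API directly with a long timeout" can look like, the snippet below posts a base64-encoded image to Ollama's standard `/api/chat` endpoint. The function names are illustrative, not the skill's actual internals, and the env-var defaults mirror the ones documented later in this page:

```python
import base64
import json
import os
import urllib.request

# Defaults match this skill's documented environment variables.
OLLAMA_API_URL = os.environ.get("OLLAMA_API_URL", "http://localhost:11434/api/chat")
VISION_MODEL = os.environ.get("VISION_MODEL", "gemma4:31b")
VISION_TIMEOUT = int(os.environ.get("VISION_TIMEOUT", "180"))


def build_payload(image_path: str, prompt: str, model: str = VISION_MODEL) -> dict:
    """Build an Ollama chat request with one base64-encoded image attached."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt, "images": [image_b64]}],
    }


def analyze_image(image_path: str, prompt: str) -> str:
    """POST the payload with a generous timeout so slow cloud models can finish."""
    req = urllib.request.Request(
        OLLAMA_API_URL,
        data=json.dumps(build_payload(image_path, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=VISION_TIMEOUT) as resp:
        return json.loads(resp.read())["message"]["content"]
```

The long `timeout=` on the HTTP request is the whole trick: the built-in tool's shorter limit is what kills cloud-model calls.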
### Usage

Basic:

```bash
# Analyze an image (default: English description)
python3 <skill-dir>/scripts/analyze_image.py /path/to/image.png

# With a custom prompt
python3 <skill-dir>/scripts/analyze_image.py /path/to/image.png "Is this a chess game? Describe the board state"

# With a specific model
python3 <skill-dir>/scripts/analyze_image.py /path/to/image.png "Describe content" kimi-k2.5:cloud
```
`<skill-dir>` resolves to your OpenClaw skill installation directory, typically `~/.OpenClaw/workspace/skills/vision-helper/`.
### In Conversation

When you need to analyze an image, use the exec tool:

exec: python3 <skill-dir>/scripts/analyze_image.py /path/to/image.png "What do you see?"

Important: set the exec timeout to 120–180 seconds, as cloud vision models are slow.
### Screenshot + Analysis Workflow

Option A: Browser screenshot → analyze

- browser(action="screenshot") → get the screenshot path (MEDIA: xxx)
- exec("<skill-dir>/scripts/analyze_image.py 'Describe this UI'")
- Act on the analysis result
Option B: Desktop screenshot → analyze

macOS:

- exec("screencapture -x /tmp/screen.png")
- exec("<skill-dir>/scripts/analyze_image.py /tmp/screen.png 'Describe the desktop'")

Linux:

- exec("gnome-screenshot -f /tmp/screen.png")
- exec("<skill-dir>/scripts/analyze_image.py /tmp/screen.png 'Describe the desktop'")
Option C: Game/app UI → analyze → act

- Screenshot the current screen
- Use vision-helper to identify UI elements, buttons, and text
- Execute clicks/inputs based on the analysis
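The screenshot-then-analyze loop from Options B and C can be wrapped in a small driver. This is a sketch, not part of the skill: `SKILL_DIR` is a placeholder you would point at your own install, and the command lines simply mirror the ones above:

```python
import platform
import subprocess

SKILL_DIR = "~/.OpenClaw/workspace/skills/vision-helper"  # placeholder install path


def screenshot_command(out_path: str) -> list[str]:
    """Pick the platform's screenshot CLI, matching Option B above."""
    if platform.system() == "Darwin":
        return ["screencapture", "-x", out_path]
    return ["gnome-screenshot", "-f", out_path]


def analyze_command(image_path: str, prompt: str) -> list[str]:
    """Invoke the skill's analyzer script exactly as the docs show."""
    return ["python3", f"{SKILL_DIR}/scripts/analyze_image.py", image_path, prompt]


def screenshot_and_analyze(prompt: str, out_path: str = "/tmp/screen.png") -> str:
    subprocess.run(screenshot_command(out_path), check=True)
    result = subprocess.run(
        analyze_command(out_path, prompt),
        capture_output=True, text=True, check=True,
        timeout=180,  # generous timeout for cloud models
    )
    return result.stdout
```

An agent loop for Option C would call `screenshot_and_analyze(...)`, parse the description, then issue clicks/inputs before repeating.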
### Environment Variables

| Variable | Default | Description |
|---|---|---|
| VISION_MODEL | gemma4:31b | Default vision model |
| VISION_TIMEOUT | 180 | Request timeout in seconds |
| OLLAMA_API_URL | http://localhost:11434/api/chat | Ollama API endpoint |

### Supported Models

| Model | Vision | Speed | Recommendation |
|---|---|---|---|
| gemma4:31b | ✅ | Local, fast | ⭐ Primary (privacy-first, no API needed; runs offline) |
| kimi-k2.6:cloud | ✅ | 40–120s | 🔬 Advanced (high quality, cloud) |
| kimi-k2.5:cloud | ✅ | 40–90s | Alternative cloud option |
| qwen3.5:cloud | ✅ | 30–60s | Fast cloud recognition |
| qwen3.5:397b-cloud | ✅ | 40–90s | High-quality cloud |
Note: Cloud models require the model to be available in your Ollama instance. Use the VISION_MODEL env var to switch.
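Since cloud models must already be pulled into your Ollama instance, it can help to check before switching. A hedged sketch using Ollama's standard model-listing endpoint (`GET /api/tags`); the helper names are illustrative:

```python
import json
import urllib.request


def available_models(base_url: str = "http://localhost:11434") -> set[str]:
    """Return the names of models currently available in this Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        data = json.loads(resp.read())
    return {m["name"] for m in data.get("models", [])}


def pick_model(preferred: str, models: set[str], fallback: str = "gemma4:31b") -> str:
    """Fall back to the local default when the preferred model isn't pulled."""
    return preferred if preferred in models else fallback
```

For example, `pick_model("kimi-k2.6:cloud", available_models())` quietly degrades to the local model when the cloud one is missing, instead of failing mid-request.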
### FAQ

Q: Can I use the built-in image tool instead?

A: It works for local models but will time out on cloud vision models. Always prefer this skill's script for reliable results.

Q: What image formats are supported?

A: PNG, JPG, JPEG, GIF, WebP, BMP, TIFF, SVG. Maximum file size: 20 MB.
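A small pre-flight check matching the limits stated above can look like this. It is a sketch; the skill's real script may enforce these limits differently:

```python
from pathlib import Path

# Extensions and size cap taken from the FAQ above.
SUPPORTED_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".svg"}
MAX_BYTES = 20 * 1024 * 1024  # 20 MB


def validate_image(path: str) -> None:
    """Raise ValueError when the file's extension or size is out of spec."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTS:
        raise ValueError(f"unsupported format: {p.suffix}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError(f"file exceeds {MAX_BYTES} bytes")
```

Calling `validate_image("/tmp/screen.png")` before invoking the analyzer avoids burning a slow cloud round-trip on a file the model can't read.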
Q: Where should I save screenshots?
A: Any readable directory works — /tmp/, your workspace, etc. This script has no path restrictions.
Q: How do I use a Chinese prompt?

A: Pass it as the second argument: python3 <skill-dir>/scripts/analyze_image.py /tmp/img.png "请描述这张图片的内容" ("Please describe the content of this image")
### Automation Ideas

- Game automation: Screenshot → analyze game state → decide next action
- Browser verification: Screenshot → verify the page loaded correctly
- Desktop monitoring: Periodic screenshots → detect changes
- UI testing: Screenshot → verify rendered output
- OCR: Extract text content from images