📦 Vision Helper — AI Image Analysis

v1.0.0

Analyze images using local or cloud vision models via Ollama to identify content, UI elements, and screenshots, or extract text with OCR support.

by @ravenquasar (U3UT7)
Security Scan

VirusTotal: harmless
OpenClaw: safe (high confidence)
The skill does what it says — it reads an image file, encodes it, and sends it to an Ollama endpoint for vision analysis — and its code, instructions, and purpose are internally consistent, though it carries the expected privacy risks of screenshotting and sending images to a network endpoint.
Evaluation Notes
This skill appears to be what it claims: a helper that reads an image file and sends it to an Ollama instance for analysis. Before installing or using it, consider the following:

  • Privacy: The script will read any readable file with an allowed extension and base64-encode it. If you take desktop/browser screenshots you may capture passwords, private chats, or other sensitive data.
  • Endpoint trust: By default the script posts to http://localhost:11434/api/chat. If you change OLLAMA_API_URL to a remote URL, those images (and any t...
Detailed Analysis
Purpose & Capabilities
The name and description match the implementation: the included Python script encodes an image and calls an Ollama chat API with a vision model. The script supports model selection and an extended timeout, as advertised.
Instruction Scope
SKILL.md explicitly instructs using exec to take and analyze screenshots (browser, desktop tools) and to 'act' on analysis results (clicks/input). That is within the skill's stated automation use-cases, but it carries privacy and automation-safety implications: desktop screenshots may contain sensitive data, and automated actions driven by model output can have undesired effects.
Installation Mechanism
Instruction-only skill with no install spec; the included script is plain Python, and there are no downloads or external installers. This is a low-risk installation surface.
Credential Requirements
The registry metadata lists no required env vars, but SKILL.md and the script use optional env vars (OLLAMA_API_URL, VISION_MODEL, VISION_TIMEOUT). Defaults point to localhost, which is reasonable, but changing OLLAMA_API_URL to a remote endpoint would send base64-encoded images off-host. The env usage is proportionate to the functionality, but it carries obvious exfiltration/privacy risks if pointed at an untrusted service. Also, the script enforces allowed extensions by filename only (and a simple '..' check), which could be abused if non-image data is disguised with an allowed extension.
Persistence & Permissions
always is false, and the skill does not request ongoing system presence or modify other skills. It runs on-demand via exec and does not request elevated privileges.
Security is layered; review the code before running it.

Runtime Dependencies

No special dependencies

Versions

latest: v1.0.0

Installation

Official: npx clawhub@latest install vision-helper
Mirror (CN): npx clawhub@latest install vision-helper --registry https://cn.longxiaskill.com

Skill Documentation

📸 Vision Helper — Image Analysis

Analyze images using vision models via Ollama, with extended timeout support for cloud-based models.

Why Not Use the Built-in Image Tool?

The built-in image tool has limited timeout settings that cause failures with cloud vision models, which often need 40–120 seconds. This skill calls the Ollama API directly with a 180-second timeout, supporting both local and cloud models reliably.

It also bypasses the built-in tool's file path restrictions, allowing analysis of images from any readable directory.
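The direct call is straightforward to reproduce. The sketch below is illustrative rather than the skill's actual script: it assumes the standard Ollama /api/chat schema (base64-encoded images in the messages array) and reuses the env-var defaults from the table further down; the function names are made up for this example.

```python
import base64
import json
import os
import urllib.request

# Defaults mirror the documented environment variables.
OLLAMA_API_URL = os.environ.get("OLLAMA_API_URL", "http://localhost:11434/api/chat")
VISION_MODEL = os.environ.get("VISION_MODEL", "gemma4:31b")
VISION_TIMEOUT = int(os.environ.get("VISION_TIMEOUT", "180"))


def build_vision_request(image_path, prompt, model=VISION_MODEL):
    """Base64-encode an image and build an Ollama /api/chat payload."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt, "images": [encoded]}],
    }


def analyze(image_path, prompt):
    """POST the payload; the long timeout accommodates slow cloud models."""
    data = json.dumps(build_vision_request(image_path, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=VISION_TIMEOUT) as resp:
        return json.loads(resp.read())["message"]["content"]
```

The 180-second urlopen timeout is the whole point: it replaces the built-in tool's shorter limit so cloud models have time to respond.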

Usage

Basic:

# Analyze an image (default: English description)
python3 <skill-dir>/scripts/analyze_image.py

# With a custom prompt
python3 <skill-dir>/scripts/analyze_image.py "Is this a chess game? Describe the board state"

# With a specific model
python3 <skill-dir>/scripts/analyze_image.py "Describe content" kimi-k2.5:cloud

<skill-dir> resolves to your OpenClaw skill installation directory, typically ~/.OpenClaw/workspace/skills/vision-helper/.

In Conversation

When you need to analyze an image, use the exec tool:

exec: python3 <skill-dir>/scripts/analyze_image.py /path/to/image.png "What do you see?"

Important: set the exec timeout to 120–180 seconds, as cloud vision models are slow.

Screenshot + Analysis Workflows

Option A: Browser screenshot → analyze

  • browser(action="screenshot") → get the screenshot path (MEDIA: xxx)
  • exec("<skill-dir>/scripts/analyze_image.py 'Describe this UI'")
  • Act on the analysis result

Option B: Desktop screenshot → analyze

macOS:

  • exec("screencapture -x /tmp/screen.png")
  • exec("<skill-dir>/scripts/analyze_image.py /tmp/screen.png 'Describe the desktop'")

Linux:

  • exec("gnome-screenshot -f /tmp/screen.png")
    — or — exec("import /tmp/screen.png")  # ImageMagick
    — or — exec("scrot /tmp/screen.png")
  • exec("<skill-dir>/scripts/analyze_image.py /tmp/screen.png 'Describe the desktop'")

Option C: Game/app UI → analyze → act

  • Screenshot the current screen
  • Use vision-helper to identify UI elements, buttons, text
  • Execute clicks/input based on the analysis
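The desktop-screenshot options can be scripted end to end. The helper below only selects a platform-appropriate screenshot command, using the tool names listed above; actually capturing would be a subprocess.run of the returned command, followed by the analyze_image.py call. The function name and fallback order are illustrative, not part of the skill.

```python
import shutil
import sys


def screenshot_command(out_path="/tmp/screen.png", platform=None):
    """Pick a desktop screenshot command for the current platform.

    On Linux, falls back through gnome-screenshot, ImageMagick's
    `import`, and scrot, in that order, using whichever is installed.
    """
    platform = platform or sys.platform
    if platform == "darwin":
        return ["screencapture", "-x", out_path]
    for tool, args in (
        ("gnome-screenshot", ["-f", out_path]),
        ("import", [out_path]),  # ImageMagick
        ("scrot", [out_path]),
    ):
        if shutil.which(tool):
            return [tool] + args
    raise RuntimeError("no screenshot tool found")
```

A workflow would then run subprocess.run(screenshot_command()) and pass /tmp/screen.png to the analysis script.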

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| VISION_MODEL | gemma4:31b | Default vision model |
| VISION_TIMEOUT | 180 | Request timeout in seconds |
| OLLAMA_API_URL | http://localhost:11434/api/chat | Ollama API endpoint |

Supported Models

| Model | Vision | Speed | Recommendation |
| --- | --- | --- | --- |
| gemma4:31b | ✅ | Local, fast | ⭐ Primary (privacy, no API needed) |
| kimi-k2.6:cloud | ✅ | 40–120s | 🔬 Advanced (high quality, cloud) |
| kimi-k2.5:cloud | ✅ | 40–90s | Alternative cloud option |
| qwen3.5:cloud | ✅ | 30–60s | Fast cloud recognition |
| qwen3.5:397b-cloud | ✅ | 40–90s | High quality cloud |
| gemma4:31b | ✅ | Local, fast | Privacy-first (runs offline) |

Note: Cloud models must be available in your Ollama instance. Use the VISION_MODEL env var to switch.

FAQ

Q: Can I use the built-in image tool instead?

A: It works for local models but will time out on cloud vision models. Prefer this skill's script for reliable results.

Q: What image formats are supported?

A: PNG, JPG, JPEG, GIF, WebP, BMP, TIFF, SVG. Maximum file size: 20 MB.
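For reference, a filename-based validation like the one the review describes (allowed extensions, a simple '..' check, and the 20 MB cap) might look roughly like this; the function name and check ordering are illustrative, and as the review notes, an extension check inspects only the filename, not the file's content.

```python
import os

# Formats and size limit from the FAQ above.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".svg"}
MAX_BYTES = 20 * 1024 * 1024  # 20 MB


def validate_image_path(path):
    """Reject unsupported extensions, '..' traversal, and oversized files."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    if ".." in path:
        raise ValueError("path traversal not allowed")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("file exceeds 20 MB limit")
    return path
```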

Q: Where should I save screenshots?

A: Any readable directory works — /tmp/, your workspace, etc. This script has no path restrictions.

Q: How do I use a Chinese prompt?

A: Pass it as the second argument: python3 <skill-dir>/scripts/analyze_image.py /tmp/img.png "请描述这张图片的内容"

Automation Ideas

  • Game automation: Screenshot → analyze game state → decide the next action
  • Browser verification: Screenshot → verify the page loaded correctly
  • Desktop monitoring: Periodic screenshots → detect changes
  • UI testing: Screenshot → verify rendered output
  • OCR: Extract text content from images

Data source: ClawHub · Chinese localization: 龙虾技能库