Sn Da Image Caption

v1.0.0

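Image understanding and data extraction skill. Use it when an image file (.png/.jpg/.jpeg/.gif/.webp/.bmp) is the primary input and the user needs to understand it, extract data from it, or analyze its content. It ships a pre-configured caption script (scripts/caption.py) that converts images into text descriptions via a vision model, with no extra API key setup required. Covers: (1) captioning charts, tables, screenshots, and flowcharts with scripts/caption.py; (2) parsing the caption text into a structured DataFrame; (3) regenerating charts from the extracted data; (4) exporting to Excel/CSV. **Use this skill proactively, instead of guessing what an image contains, whenever any of the following applies**: ① the user uses a trigger phrase: image analysis / chart extraction / table recognition / OCR / image description / screenshot analysis / chart data / extract data from an image / image to table / recognize an image / image caption / chart analysis / table OCR; ② the user uploads or points to an image file (.png / .jpg / .jpeg / .gif / .webp / .bmp) and asks to understand it, extract data from it, or analyze its content; ③ the task requires extracting structured information from chart screenshots, table screenshots, UI screenshots, or flowcharts; ④ the user asks to turn data in an image into Excel/CSV or to regenerate a chart from it. Not for: image editing (cropping, filters, resizing), image generation, or describing landscape/portrait photos that contain no data.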

by @tsunamiblue (Tsunami Planeptune)·MIT-0

Data & API

Use case: data and API tasks with Sn Da Image Caption

License

MIT-0

Free to use, modify, and redistribute without attribution.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install sn-da-image-caption

Accelerated mirror: npx clawhub@latest install sn-da-image-caption --registry https://cn.longxiaskill.com (mirror sync in progress)

Localization notes

Installing Sn Da Image Caption: openclaw skills install sn-da-image-caption

Skill documentation

Image Caption Analysis: Image Description and Data Extraction

Overview

Analyze, extract data from, or understand image files (.png, .jpg, .jpeg, .gif, .webp, .bmp). The core workflow:

1. Run scripts/caption.py to get a text description of the image.
2. Parse the description into structured data (a DataFrame, etc.).
3. Analyze, visualize, or export the result.

scripts/caption.py: Image Caption

The script converts images into text descriptions via a vision model. Configure it with SN_API_KEY (the minimum required), or use SN_VISION_API_KEY / SN_VISION_BASE_URL / SN_VISION_MODEL for fine-grained control. See the project's environment variable spec for the full fallback chain.
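The full fallback chain lives in the project's environment variable spec, which is not reproduced here. As a rough sketch, assuming the vision-specific variables take precedence over SN_API_KEY and that the model falls back to the documented --model default, a resolver could look like this. resolve_vision_config and its ordering are hypothetical, not part of caption.py.

```python
import os
from typing import Mapping, Optional

# Default model as documented for caption.py's --model option.
DEFAULT_MODEL = "sensenova-6.7-flash-lite"


def resolve_vision_config(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical resolver: vision-specific variables win, SN_API_KEY is the fallback.

    The real precedence is defined by the project's environment variable spec;
    this ordering is an assumption for illustration only.
    """
    api_key: Optional[str] = env.get("SN_VISION_API_KEY") or env.get("SN_API_KEY")
    if not api_key:
        raise RuntimeError("Set SN_API_KEY (or SN_VISION_API_KEY) before running caption.py")
    return {
        "api_key": api_key,
        # No default base URL is documented, so None means "let the script decide".
        "base_url": env.get("SN_VISION_BASE_URL"),
        "model": env.get("SN_VISION_MODEL") or DEFAULT_MODEL,
    }
```

Passing a plain dict instead of os.environ makes it easy to check which key and model a given environment would select.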

Usage

    # Basic: get a text description
    python3 scripts/caption.py /mnt/data/image.png

    # Custom prompt: guide what to extract
    python3 scripts/caption.py /mnt/data/chart.png --prompt "Extract all numeric values as a Markdown table"

    # JSON output: includes detected type, usage stats, and cache info
    python3 scripts/caption.py /mnt/data/image.png --json

    # Batch: process all images in a directory
    python3 scripts/caption.py /mnt/data/images/ --batch --output /mnt/data/captions.json

    # Override the model (optional)
    python3 scripts/caption.py /mnt/data/image.png --model gemini-3.1-flash-lite-preview

Options

| Option | Description |
| --- | --- |
| --prompt, -p | Custom prompt (overrides auto-detection) |
| --model, -m | Vision model (default: sensenova-6.7-flash-lite) |
| --json | Output structured JSON instead of plain text |
| --batch | Process all images in a directory |
| --output, -o | Output file for batch results |
| --no-cache | Skip the MD5 cache |

What it does automatically

- Type detection: detects the image type from the filename (chart/table/UI/diagram/general) and picks the best-matching prompt.
- Compression: images larger than 5MB or 2048px are compressed before sending.
- Caching: the same image with the same prompt returns the cached result instantly, at no API cost.
- Error handling: retries on failure and returns an error message on permanent failure.

JSON output format

    {
      "file": "/mnt/data/image.png",
      "type": "chart",
      "description": "This is a bar chart...",
      "usage": {"prompt_tokens": 1100, "completion_tokens": 400},
      "cached": false
    }
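The script's internals are not shown, but the automatic behaviors listed above can be illustrated. This sketch assumes filename keywords drive type detection (the real rules are not documented, and these keywords only match ASCII filenames), that "5MB" means 5 * 1024 * 1024 bytes, that the 2048px limit applies to the longer side, and that the cache key combines an MD5 of the image with the prompt. guess_type, pick_prompt, needs_compression, and cache_key are hypothetical names; the prompts are English renderings of the ones recommended under Prompt Strategy.

```python
import hashlib
import re
from pathlib import Path
from typing import Optional

# English renderings of the prompts recommended under "Prompt Strategy".
PROMPTS = {
    "chart": "Extract the chart title, axes, every data point value, and the legend. Output as a Markdown table.",
    "table": "Extract all table contents as a Markdown table, keep the row/column structure, and do not round numbers.",
    "ui": "Describe from a front-end developer's perspective: layout, components, text, colors.",
    "diagram": "Describe all nodes, connections (A→B), and branch conditions.",
}

# Hypothetical filename keywords; caption.py's real detection rules are not documented.
KEYWORDS = {
    "chart": {"chart", "plot", "graph", "bar", "line", "pie"},
    "table": {"table", "sheet"},
    "ui": {"ui", "screen", "screenshot", "page"},
    "diagram": {"diagram", "flow", "flowchart", "architecture"},
}

# Documented limits; reading "5MB" as 5 * 1024 * 1024 bytes is an assumption.
MAX_BYTES = 5 * 1024 * 1024
MAX_SIDE_PX = 2048


def guess_type(filename: str) -> str:
    """Match whole words in the file stem, e.g. 'sales_chart.png' -> 'chart'."""
    tokens = set(re.split(r"[^a-z0-9]+", Path(filename).stem.lower()))
    for image_type, words in KEYWORDS.items():  # first matching type wins
        if tokens & words:
            return image_type
    return "general"


def pick_prompt(filename: str) -> Optional[str]:
    """None for 'general', meaning: keep the default prompt."""
    return PROMPTS.get(guess_type(filename))


def needs_compression(size_bytes: int, width: int, height: int) -> bool:
    """True if the file is over 5MB or its longer side is over 2048px."""
    return size_bytes > MAX_BYTES or max(width, height) > MAX_SIDE_PX


def cache_key(image_bytes: bytes, prompt: str) -> str:
    """Same image + same prompt -> same key (hypothetical scheme)."""
    # Hash the image first; the fixed-length hex digest keeps the combined
    # string unambiguous when the prompt is appended.
    image_digest = hashlib.md5(image_bytes).hexdigest()
    return hashlib.md5(f"{image_digest}:{prompt}".encode("utf-8")).hexdigest()
```

Because the key depends on both inputs, changing only the prompt produces a fresh cache entry, which matches the documented "same image + same prompt" rule.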

Calling from Python

    import json
    import subprocess

    CAPTION = "/path/to/skills/sn-da-image-caption/scripts/caption.py"

    # Single image
    result = subprocess.run(
        ["python3", CAPTION, "/mnt/data/chart.png", "--json",
         "--prompt", "Extract the chart data and output it as a Markdown table"],
        capture_output=True, text=True, timeout=60,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    data = json.loads(result.stdout)
    description = data["description"]

    # Batch: results are written to the --output file
    subprocess.run(
        ["python3", CAPTION, "/mnt/data/images/", "--batch", "--output", "/mnt/data/captions.json"],
        capture_output=True, text=True, timeout=300, check=True,
    )
    with open("/mnt/data/captions.json", encoding="utf-8") as f:
        all_captions = json.load(f)
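For repeated calls it can help to wrap the subprocess call once. A minimal sketch, using only the documented options and assuming caption.py exits with a non-zero code on failure (the docs only say it "returns an error message"); build_caption_cmd and caption_image are hypothetical helper names, and CAPTION is a placeholder path.

```python
import json
import subprocess
from typing import List, Optional

# Placeholder location; point this at the real scripts/caption.py in your install.
CAPTION = "/path/to/skills/sn-da-image-caption/scripts/caption.py"


def build_caption_cmd(image: str, prompt: Optional[str] = None,
                      model: Optional[str] = None, no_cache: bool = False) -> List[str]:
    """Assemble argv from the documented caption.py options only."""
    cmd = ["python3", CAPTION, image, "--json"]
    if prompt:
        cmd += ["--prompt", prompt]
    if model:
        cmd += ["--model", model]
    if no_cache:
        cmd.append("--no-cache")
    return cmd


def caption_image(image: str, timeout: int = 60, **kwargs) -> dict:
    """Run caption.py and return the parsed JSON record (file, type, description, usage, cached)."""
    result = subprocess.run(build_caption_cmd(image, **kwargs),
                            capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # Assumes failures exit non-zero; surface the script's own error text.
        raise RuntimeError(f"caption.py failed ({result.returncode}): {result.stderr.strip()}")
    return json.loads(result.stdout)
```

Usage: caption_image("/mnt/data/chart.png", prompt="Extract all numeric values as a Markdown table")["description"]. Keeping the argv builder separate makes the flag handling easy to check without calling the vision model.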

Prompt Strategy

Different image types need different prompts. The script auto-detects the type, but passing --prompt explicitly gives better results.

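| Image type | When | Recommended --prompt |
| --- | --- | --- |
| Data chart | Bar, line, and pie charts | "Extract the chart title, axes, every data point value, and the legend. Output as a Markdown table." |
| Table screenshot | Screenshots of tables | "Extract all table contents as a Markdown table, keep the row/column structure, and do not round numbers." |
| UI screenshot | Interface screenshots | "Describe from a front-end developer's perspective: layout, components, text, colors." |
| Diagram | Flowcharts, architecture diagrams | "Describe all nodes, connections (A→B), and branch conditions." |
| General | Photos, other images | Omit --prompt and use the default |

Parsing Caption Results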

Captions usually come back as Markdown tables; parse them into a DataFrame:

    import pandas as pd


    def parse_markdown_table(text):
        """Parse the first Markdown table in a caption into a DataFrame (or None)."""
        lines = text.strip().split('\n')
        table_lines = []
        in_table = False
        for line in lines:
            stripped = line.strip()
            if '|' in stripped:
                in_table = True
                table_lines.append(stripped)
            elif in_table:
                break  # stop at the first non-table line after the table starts

        data_lines = []
        for l in table_lines:
            # Drop the outer pipes but keep empty inner cells so columns stay aligned
            cells = [c.strip() for c in l.strip('|').split('|')]
            # Skip the |---|:---:| alignment row
            if cells and not all(set(c) <= set('-: ') for c in cells):
                data_lines.append(cells)

        if len(data_lines) < 2:
            return None

        header = data_lines[0]
        rows = [r for r in data_lines[1:] if len(r) == len(header)]
        df = pd.DataFrame(rows, columns=header)

        # Auto numeric conversion
        for col in df.columns:
            try:
                cleaned = df[col].str.replace(',', '').str.strip()
                if cleaned.str.endswith('%').any():
                    df[col] = pd.to_numeric(cleaned.str.rstrip('%'), errors='coerce')
                else:
                    converted = pd.to_numeric(cleaned, errors='coerce')
                    # Convert only if more than half the values are numeric
                    if converted.notna().sum() > len(df) * 0.5:
                        df[col] = converted
            except Exception:
                pass  # leave non-string or duplicate-named columns unchanged
        return df
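parse_markdown_table stops at the first table, but a caption can contain more than one. A variant that collects every pipe-delimited block is sketched below; parse_all_markdown_tables and its helpers are hypothetical names, and the cells stay as strings so the numeric conversion step above can be applied per table.

```python
from typing import List

import pandas as pd


def _cells(line: str) -> List[str]:
    # Drop the outer pipes, keep empty inner cells so columns stay aligned.
    return [c.strip() for c in line.strip().strip("|").split("|")]


def _is_separator(cells: List[str]) -> bool:
    # Alignment rows look like |---|:---:|---:|
    return all(c and set(c) <= set("-: ") for c in cells)


def parse_all_markdown_tables(text: str) -> List[pd.DataFrame]:
    """Return one DataFrame per Markdown table found in the caption text."""
    blocks, current = [], []
    for line in text.splitlines():
        if "|" in line:
            current.append(line)
        elif current:  # a non-table line closes the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    tables = []
    for block in blocks:
        rows = [r for r in (_cells(l) for l in block) if not _is_separator(r)]
        if len(rows) < 2:
            continue  # header only, or not really a table
        header = rows[0]
        body = [r for r in rows[1:] if len(r) == len(header)]
        tables.append(pd.DataFrame(body, columns=header))
    return tables
```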

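Visualization

Chinese Font Setup (MANDATORY)

    import os

    import matplotlib
    import matplotlib.pyplot as plt
    from matplotlib import font_manager

    font_path = '/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc'
    if os.path.exists(font_path):
        # Register the file so matplotlib can resolve the family name
        font_manager.fontManager.addfont(font_path)
        matplotlib.rcParams['font.family'] = 'WenQuanYi Zen Hei'
    # Render minus signs correctly when a CJK font is active
    matplotlib.rcParams['axes.unicode_minus'] = False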

Color Palette

    COLORS = ['#4C72B0', '#55A868', '#C44E52', '#8172B2', '#CCB974', '#64B5CD']

Save & Display

    plt.savefig('/mnt/data/chart.png', dpi=150, bbox_inches='tight')
    plt.show()
    print("!chart")
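The palette and save settings can be combined into a small helper that redraws a bar chart from an extracted DataFrame. This is a sketch: plot_bar is a hypothetical name, it switches to the headless Agg backend, and the Chinese font setup should be applied first if labels contain Chinese text.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

COLORS = ['#4C72B0', '#55A868', '#C44E52', '#8172B2', '#CCB974', '#64B5CD']


def plot_bar(df: pd.DataFrame, x: str, ys: list, out_path: str, title: str = "") -> str:
    """Grouped bar chart from an extracted table, one palette color per series."""
    ax = df.plot(x=x, y=ys, kind="bar", color=COLORS[:len(ys)], figsize=(8, 4.5), rot=0)
    ax.set_title(title)
    ax.set_ylabel(", ".join(ys))
    fig = ax.get_figure()
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # free memory when generating many charts
    return out_path
```

Example call with placeholder values (not data from a real image): plot_bar(pd.DataFrame({"Month": ["Jan", "Feb"], "Revenue": [10, 12]}), "Month", ["Revenue"], "/mnt/data/chart_redrawn.png"). Writing to a new filename avoids overwriting the source image.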

Export to Excel

    from openpyxl.styles import Font, PatternFill, Alignment

    output_path = "/
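The export snippet above is cut off in the source, so its output path and styling are not known. The sketch below shows one way to apply the imported Font, PatternFill, and Alignment styles to a header row, plus a CSV export; export_table, the "Data" sheet name, and the header color are assumptions.

```python
from typing import Optional

import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Font, PatternFill, Alignment


def export_table(df: pd.DataFrame, xlsx_path: str, csv_path: Optional[str] = None,
                 sheet_name: str = "Data") -> None:
    """Write df to Excel with a styled header row, and optionally to CSV."""
    df.to_excel(xlsx_path, sheet_name=sheet_name, index=False, engine="openpyxl")

    wb = load_workbook(xlsx_path)
    ws = wb[sheet_name]
    header_fill = PatternFill(start_color="4C72B0", end_color="4C72B0", fill_type="solid")
    for cell in ws[1]:  # row 1 holds the column names
        cell.font = Font(bold=True, color="FFFFFF")
        cell.fill = header_fill
        cell.alignment = Alignment(horizontal="center", vertical="center")
    # Rough column widths from the longest value in each column, capped at 50.
    for col_cells in ws.columns:
        width = max(len(str(c.value)) if c.value is not None else 0 for c in col_cells)
        ws.column_dimensions[col_cells[0].column_letter].width = min(width + 2, 50)
    wb.save(xlsx_path)

    if csv_path:
        # utf-8-sig writes a BOM so Excel opens Chinese text in the CSV correctly.
        df.to_csv(csv_path, index=False, encoding="utf-8-sig")
```

Styling is applied after pandas writes the file because DataFrame.to_excel itself only writes values.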
