see-video

v1.0.0

Use when the user sends a video file or asks about video content. Extracts frames and injects them as an image grid directly into the LLM context, with no proxy model and no description handoff. uniform mode (default): evenly spaced sampling. highlight mode: scene-change biased sampling. ⚠️ Requires a vision-capable (multimodal) model.


by @john-ver·MIT-0


License

MIT-0


Free to use, modify, and redistribute; no attribution required.


Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install see-video

Mirror: npx clawhub@latest install see-video --registry https://cn.longxiaskill.com (mirror available)


Skill documentation

see-video

Extract frames from a video and inject them as a grid image plus XML timestamps into the LLM context.

Setup (first time only)

cd <skill directory>
npm install

Usage

node {baseDir}/scripts/inject.mjs [--mode uniform|highlight] [--start N] [--end N]

On success, outputs JSON to stdout:

{
  "gridPath": "/tmp/video_llm-frames.jpg",
  "description": "...",
  "duration": 1326,
  "frameCount": 28,
  "layout": { "cols": 4, "rows": 7, "cellW": 384, "cellH": 216 },
  "videoWidth": 854,
  "videoHeight": 480,
  "inputSizeMb": 42.3
}
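The layout object describes the grid geometry, so a frame index can be turned into a cell position. The following is a minimal sketch, not the script's own code: `cellForFrame` is a hypothetical helper, and it assumes cells are filled in row-major order, that there are no margins or gutters between cells, and that the caller knows the label of the first cell (`firstIndex`, defaulting to 0 here).

```javascript
// Hypothetical helper: locate a frame's cell inside the grid image.
// Assumptions: row-major order, no gutters, firstIndex is the label of the first cell.
function cellForFrame(frameIndex, layout, firstIndex = 0) {
  const n = frameIndex - firstIndex;
  if (!Number.isInteger(n) || n < 0 || n >= layout.cols * layout.rows) {
    throw new RangeError(`frame index out of range: ${frameIndex}`);
  }
  const col = n % layout.cols;
  const row = Math.floor(n / layout.cols);
  return {
    row,
    col,
    x: col * layout.cellW, // left edge of the cell in pixels
    y: row * layout.cellH, // top edge of the cell in pixels
    width: layout.cellW,
    height: layout.cellH,
  };
}

const exampleLayout = { cols: 4, rows: 7, cellW: 384, cellH: 216 };
// Frame 9 lands in row 2, column 1, at pixel offset (384, 432).
console.log(cellForFrame(9, exampleLayout));
```

If the actual grid numbers its cells from 1, pass `1` as the third argument.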

If the video exceeds 10 minutes and uniform mode was used without --start/--end, a hint field is included:

{ "hint": "Video is 30 minutes long. This is a uniform overview. For better scene coverage re-run with --mode highlight, or use --start/--end to zoom into a specific section." }

Recommended workflow for long videos:

1. First run with --mode highlight, which shows key scene changes across the whole video.
2. If the user wants detail on a specific section, re-run with --start N --end N.
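The two passes of this workflow differ only in their flags, so the argument list can be built by one small function. This is a sketch under assumptions: `buildInjectArgs` is a hypothetical helper, the video path is placed before the flags as in the Step 1 example, the segment flags are taken to be `--start` and `--end`, and the 600 and 720 second values are purely illustrative.

```javascript
// Hypothetical helper: build the argument array for inject.mjs.
// Only the documented flags are used: --mode, --start, --end.
function buildInjectArgs(scriptPath, videoPath, opts = {}) {
  const args = [scriptPath, videoPath];
  if (opts.mode !== undefined) {
    if (opts.mode !== "uniform" && opts.mode !== "highlight") {
      throw new Error(`unknown mode: ${opts.mode}`);
    }
    args.push("--mode", opts.mode);
  }
  if (opts.start !== undefined) args.push("--start", String(opts.start));
  if (opts.end !== undefined) args.push("--end", String(opts.end));
  return args;
}

// Pass 1: overview biased toward scene changes.
const overviewArgs = buildInjectArgs("scripts/inject.mjs", "/path/to/video.mp4", { mode: "highlight" });
// Pass 2: zoom into one section (example values, in seconds).
const zoomArgs = buildInjectArgs("scripts/inject.mjs", "/path/to/video.mp4", { start: 600, end: 720 });

console.log(overviewArgs.join(" "));
console.log(zoomArgs.join(" "));
```

Keeping the arguments as an array (rather than one shell string) means they can be handed to Node's `child_process.execFile("node", args, callback)` without worrying about quoting spaces in the video path.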

On error, writes ERROR: and Hint: messages to stderr and exits with status 1.
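A caller may want to separate the two messages before rephrasing the hint for the user. The sketch below assumes that each message sits on its own line and begins with the literal prefix `ERROR:` or `Hint:`; the exact stderr layout is not specified here, so treat that as an assumption. `parseInjectError` is a hypothetical name, and the hint text in the example is made up for illustration.

```javascript
// Hypothetical helper: pull the ERROR: and Hint: messages out of stderr text.
// Assumption: each message is on its own line and starts with its prefix.
function parseInjectError(stderr) {
  const out = { error: null, hint: null };
  for (const line of stderr.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (out.error === null && trimmed.startsWith("ERROR:")) {
      out.error = trimmed.slice("ERROR:".length).trim();
    } else if (out.hint === null && trimmed.startsWith("Hint:")) {
      out.hint = trimmed.slice("Hint:".length).trim();
    }
  }
  return out;
}

// Illustrative input only; the real hint wording may differ.
const exampleStderr = "ERROR: ffmpeg not found\nHint: check the ffmpeg installation\n";
console.log(parseInjectError(exampleStderr));
```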

Injection procedure

Step 1: Run the script (bash tool):

node {baseDir}/scripts/inject.mjs "/path/to/video.mp4"

Step 2: Parse the JSON and extract gridPath and description.
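For callers that handle the script output programmatically, Step 2 might look like the sketch below. `parseInjectOutput` is a hypothetical helper; the check that the grid has room for `frameCount` frames is an extra sanity check assumed here, not documented behavior of the script.

```javascript
// Hypothetical helper: parse the JSON that inject.mjs prints on success.
function parseInjectOutput(stdout) {
  const result = JSON.parse(stdout);
  // These two fields are the ones the injection procedure relies on.
  for (const key of ["gridPath", "description"]) {
    if (typeof result[key] !== "string") {
      throw new Error(`missing or non-string field: ${key}`);
    }
  }
  const { cols, rows } = result.layout ?? {};
  // Assumption: the grid should be able to hold every extracted frame.
  if (Number.isInteger(cols) && Number.isInteger(rows) && cols * rows < result.frameCount) {
    throw new Error("layout is too small for frameCount");
  }
  return result;
}

const sampleStdout =
  '{"gridPath":"/tmp/video_llm-frames.jpg","description":"...","duration":1326,' +
  '"frameCount":28,"layout":{"cols":4,"rows":7,"cellW":384,"cellH":216}}';
const parsedOutput = parseInjectOutput(sampleStdout);
console.log(parsedOutput.gridPath);
console.log(parsedOutput.description);
```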

Step 3: Inject the image (read tool):

read

The read tool injects the JPEG as a native multimodal image block into the context. After viewing the grid, use the timestamps in the description XML to reference frames:

"Look at the grid image above. Use the timestamps in the description XML to analyze the video. The number in the top-left of each cell is the frame index."

On error:

- Translate the Hint: message into natural language for the user. Do not paste raw error output.
- If read fails: /tmp/ files are ephemeral. Re-run the script and read immediately.

Options

| Option | Default | Description |
| --- | --- | --- |
| --mode uniform | ✅ | Evenly spaced frames |
| --mode highlight | | Scene-change biased sampling |
| --start N | 0 | Segment start (seconds) |
| --end N | end of video | Segment end (seconds) |

Diagnostics

| Error | Cause | Action |
| --- | --- | --- |
| input file not found | File missing or dropped by channel media size limit | Ask the user to share the file path directly as text |
| corrupt, incomplete, or unsupported format | Damaged file, interrupted transfer, or unsupported codec | Try a different file, or use --start/--end to skip problematic sections |
| moov atom not found | Incomplete mp4 (streaming not finished) | Retry with a complete file |
| ffmpeg not found | ffmpeg not installed | Check the ffmpeg installation |

Notes

- Frame count and cell size are determined automatically from video duration and aspect ratio.
- Grid is ~1500×1500px, with a cell long side of 384–512px.
- Timestamps are in the description XML only, not overlaid on the image.
- Portrait and landscape videos are both supported.
- Telegram users: if a video file is not attached to the message, check channels.telegram.mediaMaxMb in the OpenClaw config; the file may have been dropped at the channel level before reaching the agent.
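The Diagnostics entries can also be held as data so that an error message maps to its suggested action. This is only a sketch: `suggestAction` is a hypothetical helper, and matching by case-insensitive substring is an assumption about how the real error text is worded.

```javascript
// Diagnostics as data: [substring of the error message, suggested action].
const DIAGNOSTICS = [
  ["input file not found", "Ask the user to share the file path directly as text."],
  ["corrupt, incomplete, or unsupported format", "Try a different file, or use --start/--end to skip problematic sections."],
  ["moov atom not found", "Retry with a complete file."],
  ["ffmpeg not found", "Check the ffmpeg installation."],
];

// Hypothetical helper: pick the suggested action for an error message.
// Assumption: a case-insensitive substring match is good enough.
function suggestAction(errorMessage) {
  const msg = errorMessage.toLowerCase();
  const hit = DIAGNOSTICS.find(([needle]) => msg.includes(needle));
  return hit ? hit[1] : null; // null means the error is not in the table
}

console.log(suggestAction("ERROR: moov atom not found"));
console.log(suggestAction("ERROR: something unexpected"));
```

An unmatched message returns `null`, in which case falling back to the script's own Hint: text is the safer choice.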
