see-video
v1.0.0
Use when the user sends a video file or asks about video content. Extracts frames and injects them as an image grid directly into the LLM context, with no proxy model and no description handoff. uniform mode (default): evenly spaced sampling. highlight mode: scene-change biased sampling. ⚠️ Requires a vision-capable (multimodal) model.
Extract frames from a video and inject them into the LLM context as a grid image plus XML timestamps.
Setup (first time only)
cd <skill directory>
npm install
Usage
node {baseDir}/scripts/inject.mjs <video path> [--mode uniform|highlight] [--start N] [--end N]
On success, the script prints JSON to stdout:
{
  "gridPath": "/tmp/video_llm-frames.jpg",
  "description": "...",
  "duration": 1326,
  "frameCount": 28,
  "layout": { "cols": 4, "rows": 7, "cellW": 384, "cellH": 216 },
  "videoWidth": 854,
  "videoHeight": 480,
  "inputSizeMb": 42.3
}
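The layout field is enough to estimate the pixel size of the grid image. The following is a minimal sketch, assuming cells tile edge to edge with no padding or borders (the source does not say either way); gridPixelSize is a hypothetical helper, not part of the skill.

```javascript
// Estimate the grid image's pixel size from the `layout` field of the script output.
// Assumption: cells tile edge to edge, with no padding or borders between them.
function gridPixelSize(layout) {
  return {
    width: layout.cols * layout.cellW,
    height: layout.rows * layout.cellH,
    cells: layout.cols * layout.rows,
  };
}

// Layout values taken from the sample output above.
const sampleLayout = { cols: 4, rows: 7, cellW: 384, cellH: 216 };
console.log(gridPixelSize(sampleLayout)); // { width: 1536, height: 1512, cells: 28 }
```

For the sample output this gives 1536×1512 px across 28 cells, which matches the reported frameCount.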
If the video is longer than 10 minutes and uniform mode was used without --start/--end, a hint field is included:
{ "hint": "Video is 30 minutes long. This is a uniform overview. For better scene coverage re-run with --mode highlight, or use --start/--end to zoom into a specific section." }
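The condition for the hint can be written as a small predicate. This is a hypothetical reconstruction of the rule as described, not the script's actual code; shouldIncludeHint and its option names are illustrative, and it assumes that passing either --start or --end is enough to suppress the hint.

```javascript
// Hypothetical reconstruction of the documented hint rule (not the script's own code).
const LONG_VIDEO_SECONDS = 10 * 60;

function shouldIncludeHint({ duration, mode = "uniform", start, end }) {
  // Assumption: supplying either bound counts as "zoomed in" and suppresses the hint.
  const hasSegment = start !== undefined || end !== undefined;
  return duration > LONG_VIDEO_SECONDS && mode === "uniform" && !hasSegment;
}

console.log(shouldIncludeHint({ duration: 1326 })); // true
console.log(shouldIncludeHint({ duration: 1326, mode: "highlight" })); // false
console.log(shouldIncludeHint({ duration: 1326, start: 60, end: 120 })); // false
console.log(shouldIncludeHint({ duration: 300 })); // false
```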
Recommended workflow for long videos:
1. First run with --mode highlight, which shows key scene changes across the whole video.
2. If the user wants detail on a specific section, re-run with --start N --end N.
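The two passes above differ only in their flags, so the argument list can be built by one helper. A minimal sketch follows, assuming the video path is the first positional argument (as in the Step 1 example) and that flags follow it; buildInjectArgs is a hypothetical name, and the 600 and 720 second bounds are illustrative values.

```javascript
// Build the argv for `node inject.mjs ...` using only the documented flags.
// Assumption: the video path comes first, followed by optional flags.
function buildInjectArgs(scriptPath, videoPath, { mode, start, end } = {}) {
  const args = [scriptPath, videoPath];
  if (mode !== undefined) args.push("--mode", mode);
  if (start !== undefined) args.push("--start", String(start));
  if (end !== undefined) args.push("--end", String(end));
  return args;
}

const script = "scripts/inject.mjs"; // relative to the skill directory
const video = "/path/to/video.mp4";

// Pass 1: overview biased toward scene changes.
const overviewArgs = buildInjectArgs(script, video, { mode: "highlight" });
// Pass 2: zoom into a section the user asked about (seconds, illustrative values).
const zoomArgs = buildInjectArgs(script, video, { start: 600, end: 720 });

console.log(["node", ...overviewArgs].join(" "));
console.log(["node", ...zoomArgs].join(" "));
```

Returning an array rather than a single string means the arguments can be handed to Node's child_process.execFile without shell quoting concerns for paths that contain spaces.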
On error, the script writes ERROR: and Hint: messages to stderr and exits with status 1.
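Separating the two messages makes it easier to rephrase the hint for the user. The sketch below assumes each message sits on its own line and starts with its prefix; only the ERROR: and Hint: prefixes come from the source, and both parseErrorOutput and the sample stderr text are hypothetical.

```javascript
// Split stderr into its ERROR: and Hint: parts.
// Assumption: each message is on its own line and begins with its prefix.
function parseErrorOutput(stderr) {
  const result = { error: null, hint: null };
  for (const rawLine of stderr.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("ERROR:")) {
      result.error = line.slice("ERROR:".length).trim();
    } else if (line.startsWith("Hint:")) {
      result.hint = line.slice("Hint:".length).trim();
    }
  }
  return result;
}

// Illustrative stderr text; the real wording may differ.
const sampleStderr = "ERROR: ffmpeg not found\nHint: Check ffmpeg installation\n";
console.log(parseErrorOutput(sampleStderr));
```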
Injection procedure
Step 1: Run the script (bash tool):
node {baseDir}/scripts/inject.mjs "/path/to/video.mp4"
Step 2: Parse the JSON and extract gridPath and description.
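Step 2 could look like the following in code. This is a minimal sketch, assuming stdout contains only the JSON object shown earlier; parseInjectOutput is a hypothetical helper, and the choice to throw on missing fields is this sketch's own.

```javascript
// Parse the script's stdout and pull out the fields needed for injection.
// Assumption: stdout holds exactly one JSON object and nothing else.
function parseInjectOutput(stdout) {
  const data = JSON.parse(stdout);
  if (typeof data.gridPath !== "string" || typeof data.description !== "string") {
    throw new Error("inject output is missing gridPath or description");
  }
  // `hint` is only present for long uniform runs, so default it to null.
  return { gridPath: data.gridPath, description: data.description, hint: data.hint ?? null };
}

const sampleStdout = JSON.stringify({
  gridPath: "/tmp/video_llm-frames.jpg",
  description: "...",
  duration: 1326,
  frameCount: 28,
});
console.log(parseInjectOutput(sampleStdout).gridPath); // /tmp/video_llm-frames.jpg
```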
Step 3: Inject the image (read tool):
read <gridPath>
The read tool injects the JPEG into context as a native multimodal image block. After viewing the grid, use the timestamps in the description XML to reference frames:
"Look at the grid image above. Use the timestamps in the description XML to analyze the video. The number in the top-left of each cell is the frame index."
On error:
- Translate the Hint: message into natural language for the user. Do not paste raw error output.
- If read fails: /tmp/ files are ephemeral. Re-run the script and read immediately.

Options

| Option | Default | Description |
| --- | --- | --- |
| --mode uniform | ✅ | Evenly spaced frames |
| --mode highlight | | Scene-change biased sampling |
| --start N | 0 | Segment start (seconds) |
| --end N | end of video | Segment end (seconds) |

Diagnostics

| Error | Cause | Action |
| --- | --- | --- |
| Input file not found | File missing or dropped by channel media size limit | Ask the user to share the file path directly as text |
| corrupt, incomplete, or unsupported format | Damaged file, interrupted transfer, or unsupported codec | Try a different file, or use --start/--end to skip problematic sections |
| moov atom not found | Incomplete mp4 (streaming not finished) | Retry with a complete file |
| ffmpeg not found | ffmpeg not installed | Check the ffmpeg installation |

Notes

- Frame count and cell size are determined automatically from video duration and aspect ratio.
- Grid is ~1500×1500px, with a cell long side of 384–512px.
- Timestamps are in the description XML only, not overlaid on the image.
- Portrait and landscape videos are both supported.
- Telegram users: if a video file is not attached to the message, check channels.telegram.mediaMaxMb in the OpenClaw config. The file may have been dropped at the channel level before reaching the agent.
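The Diagnostics table can be turned into a lookup that maps an error message to its suggested action. The sketch below assumes case-insensitive substring matching against the error text, which is a guess at how the messages appear in practice; DIAGNOSTICS and suggestAction are hypothetical names, and the action strings are copied from the table.

```javascript
// Lookup derived from the Diagnostics table.
// Assumption: a case-insensitive substring match is enough to identify each error.
const DIAGNOSTICS = [
  { match: "input file not found", action: "Ask the user to share the file path directly as text." },
  { match: "unsupported format", action: "Try a different file, or use --start/--end to skip problematic sections." },
  { match: "moov atom not found", action: "Retry with a complete file." },
  { match: "ffmpeg not found", action: "Check the ffmpeg installation." },
];

function suggestAction(errorText) {
  const lower = errorText.toLowerCase();
  const entry = DIAGNOSTICS.find((d) => lower.includes(d.match));
  // Unknown errors return null so the caller can fall back to the script's own Hint: text.
  return entry ? entry.action : null;
}

console.log(suggestAction("moov atom not found")); // Retry with a complete file.
console.log(suggestAction("something unexpected")); // null
```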