Midscene Web
v1
Vision-driven browser automation using Midscene. Operates from screenshots; no DOM or accessibility labels needed. Runs in headless Puppeteer and does NOT take over the user's mouse or keyboard. Also supports CDP mode and Bridge mode to connect to an existing Chrome.
Use this skill when the user wants to:
- Browse, navigate, or open web pages
- Scrape, extract, or collect data from websites
- Fill out forms, click buttons, or interact with web elements
- Verify, validate, test, or QA frontend UI behavior
- Take screenshots of web pages
- Automate multi-step web workflows
- Test what was just built, see if it works in browser
- Connect to Chrome via CDP, DevTools Protocol, or remote debugging
- Connect to user's Chrome browser, control my browser, operate my Chrome
Powered by Midscene.js (https://midscenejs.com)
Install command
openclaw skills install midscene-web
Skill documentation
Browser Automation
CRITICAL RULES (VIOLATIONS WILL BREAK THE WORKFLOW):
- Never run midscene commands in the background. Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
- Run only one midscene command at a time. Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
- Allow enough time for each command to complete. Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex act commands may need even longer.
- Always report task results before finishing. After completing the automation task, you MUST proactively summarize the results to the user, including key data found, actions completed, screenshots taken, and any relevant findings. Never silently end after the last automation step; the user expects a complete response in a single interaction.
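One way to follow the synchronous, one-at-a-time rule from a shell is a small foreground wrapper with a generous time budget. This is only a sketch: `run_step` and `STEP_TIMEOUT` are hypothetical names, the 180-second default is an assumed budget rather than a Midscene setting, and `timeout` is the GNU coreutils utility, which may not be installed on every platform.

```shell
#!/bin/sh
# Hypothetical helper, not part of Midscene.
# Runs exactly one command in the foreground and waits for it to finish,
# giving it up to STEP_TIMEOUT seconds (assumed default: 180).
run_step() {
  timeout "${STEP_TIMEOUT:-180}" "$@"
}

# Intended use (subcommand and arguments left to the caller):
#   run_step npx -y @midscene/web@1 <subcommand and arguments>
# Read the output, then decide on the next step before calling run_step again.
```

With GNU `timeout`, a step that exceeds the budget exits with status 124, which distinguishes a slow command from one that failed on its own.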
Automate web browsing using npx -y @midscene/web@1. By default, it launches a headless Chrome via Puppeteer that persists across CLI calls, so no session is lost between commands. It also supports CDP mode and Bridge mode to connect to an existing Chrome browser.
What act Can Do
Inside a single act call in the browser, Midscene can click, right-click, double-click, hover, type or clear text, press keys, scroll, drag, long-press, and continue through multi-step page flows based on what is currently visible. When touch input is enabled, it can also handle swipe- or pinch-style interactions on touch-oriented pages.
When to Use
This skill has three modes. Choose based on the user's intent:
Mode Selection Guide

| Mode | When to use | How it works |
| --- | --- | --- |
| Puppeteer (default) | User wants to browse a URL, scrape data, or test UI, with no need for their own browser | Launches a new headless Chrome, isolated from the user's browser |
| CDP mode | User says "connect to my Chrome", "control my browser", "CDP", "remote debugging", or wants to operate their existing browser. Also use when the task implicitly requires login state (e.g., "check my orders", "open my dashboard", "look at my account") | Connects to the user's Chrome via the DevTools Protocol. Requires remote debugging enabled (chrome://inspect > "Allow remote debugging"). No extension needed |
| Bridge mode | User explicitly mentions "bridge" or "extension", or has the Midscene Chrome extension installed and prefers to use it | Connects to the user's Chrome via the Midscene Chrome extension |
CDP vs Bridge: Both control the user's real Chrome with login sessions preserved. CDP only needs a Chrome setting toggled; Bridge needs a Chrome extension installed. If the user doesn't specify, prefer CDP mode as it has fewer prerequisites.
Precheck: Detect available connection modes
Before using CDP or Bridge mode, run a quick precheck to verify the target is reachable. This avoids long timeouts when the user hasn't enabled remote debugging or installed the extension.
# CDP precheck (port 9222, 2s timeout): returns "101" if available
curl -s --max-time 2 -o /dev/null -w "%{http_code}" -H "Upgrade: websocket" -H "Connection: Upgrade" -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" http://127.0.0.1:9222/devtools/browser
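# Bridge precheck (port 3766, 2s timeout): returns "200" or "400" if the extension is listening
# The URL is quoted so the shell does not treat "&" as a background operator or "?" as a glob.
curl -s --max-time 2 -o /dev/null -w "%{http_code}" "http://127.0.0.1:3766/socket.io/?EIO=4&transport=polling"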
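The two probes and the selection rules in this section can be folded into a pair of helpers. The following is a minimal sketch under the ports and status codes stated here; `probe_code` and `choose_mode` are illustrative names, not Midscene commands, and the `puppeteer` fallback label is simply a marker for "use the default mode".

```shell
#!/bin/sh
# Illustrative helpers; the function names are hypothetical, not part of Midscene.

# Print the HTTP status code curl observes for the given curl arguments.
# curl prints "000" when nothing answers within the 2-second limit.
probe_code() {
  curl -s --max-time 2 -o /dev/null -w "%{http_code}" "$@"
}

# Map the CDP and Bridge status codes to a connection mode, preferring CDP.
choose_mode() {
  cdp_code=$1
  bridge_code=$2
  if [ "$cdp_code" = "101" ]; then
    echo "--cdp"
  elif [ "$bridge_code" = "200" ] || [ "$bridge_code" = "400" ]; then
    echo "--bridge"
  else
    echo "puppeteer"   # neither endpoint answered: fall back to the default mode
  fi
}

# Intended use:
#   cdp=$(probe_code -H "Upgrade: websocket" -H "Connection: Upgrade" \
#         -H "Sec-WebSocket-Version: 13" \
#         -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
#         http://127.0.0.1:9222/devtools/browser)
#   bridge=$(probe_code "http://127.0.0.1:3766/socket.io/?EIO=4&transport=polling")
#   choose_mode "$cdp" "$bridge"
```

Keeping the decision in a pure function separate from the network probes makes the preference order (CDP first, then Bridge, then the default) easy to check without Chrome running.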
How to use precheck results:
- CDP returns 101 → CDP mode is available, use --cdp
- Bridge returns 200 or 400 → Bridge extension is listening, use --bridge
- Both fail → Chrome may not be running. Try opening Chrome using a shell command appropriate for the current platform, wait 2-3 seconds, then re-run the precheck. If it still fails, fall back to Puppeteer mode or ask the user to check their Chrome settings.
- Both available and user didn't specify → prefer CDP

Prerequisites
Midscene requires models with strong visual grounding capabilities. The following environment variables must be configured, either as system environment variables or in a .env file in the current working directory (Midscene loads .env automatically):
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
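Before launching a command, it can help to confirm that the four variables are actually visible. The helper below is a sketch only: `missing_midscene_vars` is a hypothetical name, and its .env check assumes simple `NAME=value` lines at the start of a line, which is narrower than what a real .env loader accepts.

```shell
#!/bin/sh
# Hypothetical helper, not part of Midscene: print each required variable that is
# neither exported with a non-empty value nor assigned in ./.env.
missing_midscene_vars() {
  for name in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME \
              MIDSCENE_MODEL_BASE_URL MIDSCENE_MODEL_FAMILY; do
    if [ -n "$(printenv "$name")" ]; then
      continue   # exported in the environment
    fi
    if [ -f .env ] && grep -q "^${name}=" .env; then
      continue   # assigned in the local .env file (simple NAME=value form assumed)
    fi
    echo "$name"
  done
}

# Intended use:
#   missing=$(missing_midscene_vars)
#   [ -z "$missing" ] || echo "Missing configuration: $missing"
```

An empty result means every variable was found in one of the two places; any printed name points at configuration that still needs to be supplied.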
Example: Gemini (Gemini-3-Flash)
MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"
Example: Qwen 3.5
MIDSCENE_MODEL_API_KEY="your-aliyun-api-key"
MIDSCENE_MODEL_NAME="qwen3.5-plus"
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_FAMILY="qwen3.5"
MIDSCENE_MODEL_REASONING_ENABLED="false"
# If using OpenRouter, set:
# MIDSCENE_MODEL_API_KEY="your-openrouter-api-key"
# MIDSCENE_MODEL_NAME="qw