GetMarkdown (Get Markdown) install command: openclaw skills install webcrawler
Skill documentation
WebcrawlerAPI Skill
Use WebcrawlerAPI to get page content as markdown (single-page scrape) or to crawl entire websites (multi-page).
Setup: API Key
The API key must be set as an environment variable before running any curl commands:
    export WEBCRAWLERAPI_API_KEY="your_api_key"
Get your key:
1. Go to https://webcrawlerapi.com/
2. Sign up at https://dash.webcrawlerapi.com/
3. Visit https://dash.webcrawlerapi.com/access
4. Copy your API key
If WEBCRAWLERAPI_API_KEY is not set, stop and ask the user to set it before proceeding.
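The check described above can be sketched as a small guard to run before any request. This is a minimal sketch, assuming the variable is named WEBCRAWLERAPI_API_KEY; the function name `require_api_key` is hypothetical and not part of the skill.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 when the API key is set and non-empty,
# otherwise prints a hint to stderr and returns 1.
require_api_key() {
  if [ -z "${WEBCRAWLERAPI_API_KEY:-}" ]; then
    echo "WEBCRAWLERAPI_API_KEY is not set. Run: export WEBCRAWLERAPI_API_KEY=\"your_api_key\"" >&2
    return 1
  fi
}
```

A caller would typically run `require_api_key || exit 1` before the first curl command.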
Decision: Scrape vs Crawl
Scrape (single page) is the default when the user asks for:
- Content/markdown of a specific page or URL
- "Get me this page", "scrape this URL", "what does this page say"
- No mention of "website", "full site", "all pages", "crawl"
Crawl (multi-page) applies when the user:
- Says "Crawl this website", "get all pages from", "full website content"
- Mentions a domain broadly (not a specific path)
- Wants multiple pages
Scrape: Single Page
Use POST /v2/scrape. The call is synchronous: the result is returned immediately.
    curl --fail --silent --show-error \
      --request POST \
      --url "https://api.webcrawlerapi.com/v2/scrape" \
      --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
      --header "Content-Type: application/json" \
      --data '{
        "url": "",
        "output_formats": ["markdown"]
      }'
Scrape response
The response contains the markdown field directly, so no polling is needed:
    {
      "success": true,
      "status": "done",
      "markdown": "## Page Title\n\nPage content...",
      "page_status_code": 200,
      "page_title": "Page Title"
    }
On success
Output the markdown content directly to the user. There is no need to save scrape results to files.
On failure
If success is false, show the error_code and error_message to the user.
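The success and failure handling above could be sketched as one filter that reads the scrape response on stdin. This is an illustrative sketch, assuming the response fields are named `success`, `markdown`, `error_code`, and `error_message` as described; the function name `handle_scrape_response` is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads a scrape response (JSON) on stdin.
# On success it prints the markdown to stdout; otherwise it prints
# error_code and error_message to stderr and returns non-zero.
handle_scrape_response() {
  python3 -c '
import json, sys

resp = json.load(sys.stdin)
if resp.get("success"):
    print(resp.get("markdown", ""))
else:
    print("Scrape failed: {} {}".format(resp.get("error_code"), resp.get("error_message")), file=sys.stderr)
    sys.exit(1)
'
}
```

It is meant to be placed at the end of the curl pipeline, for example `curl ... | handle_scrape_response`.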
Crawl: Full Website
Use POST /v1/crawl. The call is asynchronous: it returns a job ID, which you then poll for results.
Step 1: Start the crawl
    curl --fail --silent --show-error \
      --request POST \
      --url "https://api.webcrawlerapi.com/v1/crawl" \
      --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
      --header "Content-Type: application/json" \
      --data '{
        "url": "",
        "items_limit": 25,
        "output_formats": ["markdown"]
      }'
Response:
    { "id": "" }
Step 2: Poll job status (background loop)
Use a background Bash job to poll every 10 seconds until status is done or error:
    JOB_ID=""
    while true; do
      # Fetch the current job state; stop polling if the request itself fails
      RESULT=$(curl --fail --silent --show-error \
        --request GET \
        --url "https://api.webcrawlerapi.com/v1/job/${JOB_ID}" \
        --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}") || break
      # Extract the status field from the JSON response
      STATUS=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
      echo "Job status: $STATUS"
      if [ "$STATUS" = "done" ] || [ "$STATUS" = "error" ]; then
        echo "$RESULT"
        break
      fi
      sleep 10
    done
Step 3: Download and save results
When the job is done, for each job_item with status: done:
- Fetch the content from markdown_content_url
- Save to .webcrawlerapi//.md
    mkdir -p ".webcrawlerapi/"
    # For each job_item, fetch markdown_content_url and save:
    curl --silent "" \
      --output ".webcrawlerapi//.md"
Sanitize filenames: replace ://, /, ?, #, and : with _, then trim leading underscores.
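The sanitizing rule above can be sketched with sed. The function name `sanitize_filename` is hypothetical; the substitutions follow the rule as stated (replace `://` first, then the single characters, then trim leading underscores).

```shell
#!/usr/bin/env bash

# Hypothetical helper: turns a URL or path into a safe filename stem.
# 1. "://" becomes "_"
# 2. each of / ? # : becomes "_"
# 3. leading underscores are removed
sanitize_filename() {
  printf '%s' "$1" | sed -e 's|://|_|g' -e 's|[/?#:]|_|g' -e 's|^_*||'
}

sanitize_filename "https://example.com/docs/page?x=1#top"
# prints: https_example.com_docs_page_x=1_top
```

Replacing `://` as a unit before the single characters keeps the scheme separator from turning into three consecutive underscores.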
Step 4: Report to user
After saving, tell the user:
- Total pages crawled
- How many succeeded vs failed
- Where files were saved: .webcrawlerapi//
- List the saved files
Notes
- Default items_limit for crawl: 25 (ask the user if they want more)
- For scrape, just output the markdown; don't save to disk
- For crawl, always save to the .webcrawlerapi/ directory in the current working dir
- If the job returns error status, show last_error from job items and the job-level error if present
- Never hardcode the API key; always use ${WEBCRAWLERAPI_API_KEY}
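The counts for the Step 4 report could be derived from the final job response. This is a rough sketch under stated assumptions: the array key `job_items` and the per-item `status` values `done` and `error` are assumed from the wording above, not confirmed field names, and `summarize_job` is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads the final job JSON on stdin and prints
# the totals for the report. The "job_items" key is an assumption.
summarize_job() {
  python3 -c '
import json, sys

job = json.load(sys.stdin)
items = job.get("job_items", [])
done = sum(1 for item in items if item.get("status") == "done")
failed = sum(1 for item in items if item.get("status") == "error")
print("Total pages crawled: {}".format(len(items)))
print("Succeeded: {}".format(done))
print("Failed: {}".format(failed))
'
}
```

It can be fed the last polled response, for example `echo "$RESULT" | summarize_job`.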