GetMarkdown

Scrape a single page or crawl a full website using WebCrawlerAPI. Trigger for: fetching page content, getting markdown from a URL, scraping a page, crawling a website/domain, web search, web fetch.

by @n10ty (Andrey M)·MIT-0

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install webcrawler

Mirror: npx clawhub@latest install webcrawler --registry https://cn.longxiaskill.com

Localization notes

GetMarkdown install command: openclaw skills install webcrawler

Skill documentation

WebCrawlerAPI Skill

Use WebCrawlerAPI to get page content as markdown (single-page scrape) or to crawl entire websites (multi-page).

Setup: API Key

The API key must be set as an environment variable before running any curl commands:

export WEBCRAWLERAPI_API_KEY="your_api_key"

Get your key:

1. Go to https://webcrawlerapi.com/
2. Sign up at https://dash.webcrawlerapi.com/
3. Visit https://dash.webcrawlerapi.com/access
4. Copy your API key

If WEBCRAWLERAPI_API_KEY is not set, stop and ask the user to set it before proceeding.
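The "stop if the key is missing" rule can be sketched as a small guard. This is only an illustration: the helper name `require_api_key` is hypothetical, and it assumes the variable is spelled `WEBCRAWLERAPI_API_KEY`.

```python
import os

# Assumed spelling of the environment variable used by this skill.
API_KEY_ENV = "WEBCRAWLERAPI_API_KEY"


def require_api_key(env=os.environ):
    """Return the API key, or raise so the caller can stop and ask the user."""
    key = env.get(API_KEY_ENV, "").strip()
    if not key:
        raise RuntimeError(
            f"{API_KEY_ENV} is not set; ask the user to export it before continuing."
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.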

Decision: Scrape vs Crawl

Scrape (single page) is the default when the user asks for:

- The content or markdown of a specific page or URL
- "Get me this page", "scrape this URL", "what does this page say"
- No mention of "website", "full site", "all pages", or "crawl"

Crawl (multi-page) applies when the user:

- Says "Crawl this website", "get all pages from", or "full website content"
- Mentions a domain broadly (not a specific path)
- Wants multiple pages

Scrape: Single Page

Use POST /v2/scrape. The call is synchronous: the result is returned immediately.

curl --fail --silent --show-error \
  --request POST \
  --url "https://api.webcrawlerapi.com/v2/scrape" \
  --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
  --header "Content-Type: application/json" \
  --data '{ "url": "", "output_formats": ["markdown"] }'
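For readers who prefer Python to curl, the same request could be assembled roughly as follows. This is a sketch, not an official client: `build_scrape_request` is a hypothetical helper, the host is assumed to be `api.webcrawlerapi.com`, and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumed endpoint, mirroring the curl command above.
SCRAPE_URL = "https://api.webcrawlerapi.com/v2/scrape"


def build_scrape_request(page_url, api_key):
    """Build (but do not send) a POST request asking for markdown output."""
    body = json.dumps({"url": page_url, "output_formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

The request can then be sent with `urllib.request.urlopen(req)` and the body decoded with `json.load`.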

Scrape response

The response contains the markdown field directly, so no polling is needed:

{
  "success": true,
  "status": "done",
  "markdown": "## Page Title\n\nPage content...",
  "page_status_code": 200,
  "page_title": "Page Title"
}

On success

Output the markdown content directly to the user. There is no need to save scrape results to files.

On failure

If success is false, show the error_code and error_message to the user.
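The success and failure handling above might look like this once the response JSON has been parsed into a dict. The function name and the fallback strings are illustrative assumptions; the field names (`success`, `markdown`, `error_code`, `error_message`) are the ones described above.

```python
def render_scrape_response(resp):
    """Return the text to show the user for a parsed scrape response."""
    if resp.get("success"):
        # On success the markdown is shown as-is; nothing is written to disk.
        return resp.get("markdown", "")
    code = resp.get("error_code", "unknown")
    message = resp.get("error_message", "no message provided")
    return f"Scrape failed [{code}]: {message}"
```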

Crawl: Full Website

Use POST /v1/crawl. The call is asynchronous: it returns a job ID, which you then poll for results.

Step 1: Start the crawl

curl --fail --silent --show-error \
  --request POST \
  --url "https://api.webcrawlerapi.com/v1/crawl" \
  --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}" \
  --header "Content-Type: application/json" \
  --data '{ "url": "", "items_limit": 25, "output_formats": ["markdown"] }'

Response:

{ "id": "" }

Step 2: Poll job status (background loop)

Use a background Bash job to poll every 10 seconds until the status is done or error:

JOB_ID=""
while true; do
  # Fetch the current job state
  RESULT=$(curl --fail --silent --show-error \
    --request GET \
    --url "https://api.webcrawlerapi.com/v1/job/${JOB_ID}" \
    --header "Authorization: Bearer ${WEBCRAWLERAPI_API_KEY}")
  # Extract the status field from the JSON response
  STATUS=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
  echo "Job status: $STATUS"
  if [ "$STATUS" = "done" ] || [ "$STATUS" = "error" ]; then
    echo "$RESULT"
    break
  fi
  sleep 10
done
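The same polling logic can be sketched in Python with the HTTP call injected, which makes the loop easy to test. `poll_job`, `fetch_job`, and the `max_polls` cap are hypothetical additions for illustration; the source loop has no cap and polls until the status is done or error.

```python
import time

# Statuses at which polling stops, as described above.
TERMINAL_STATUSES = {"done", "error"}


def poll_job(fetch_job, interval_seconds=10, sleep=time.sleep, max_polls=None):
    """Call fetch_job() until the job reaches a terminal status, then return it.

    fetch_job is any zero-argument callable returning the parsed job JSON.
    max_polls (an assumption, not part of the source loop) bounds the wait.
    """
    polls = 0
    while True:
        job = fetch_job()
        polls += 1
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job still running after {polls} polls")
        sleep(interval_seconds)
```

In real use, `fetch_job` would wrap the GET request to the job endpoint shown in the Bash loop.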

Step 3: Download and save results

When the job is done, for each job_item with status: done:

- Fetch the content from markdown_content_url
- Save to .webcrawlerapi//.md

mkdir -p ".webcrawlerapi/"

# For each job_item, fetch markdown_content_url and save:
curl --silent "" \
  --output ".webcrawlerapi//.md"

Sanitize filenames: replace ://, /, ?, #, and : with _. Trim leading underscores.
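The sanitization rule is compact enough to show as a sketch. The function name is hypothetical, and this version treats "://" as one unit, so it collapses to a single underscore; that reading is an assumption about the rule above.

```python
import re


def sanitize_filename(url):
    """Replace '://', '/', '?', '#' and ':' with '_' and trim leading underscores."""
    # '://' is listed first so it is matched as a unit before the lone ':' or '/'.
    name = re.sub(r"://|[/?#:]", "_", url)
    return name.lstrip("_")
```

For example, `sanitize_filename("https://example.com/docs/intro?page=2#top")` yields `https_example.com_docs_intro_page=2_top`.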

Step 4: Report to user

After saving, tell the user:

- Total pages crawled
- How many succeeded vs. failed
- Where files were saved: .webcrawlerapi//
- List the saved files

Notes

- Default items_limit for crawl: 25 (ask the user if they want more)
- For scrape, just output the markdown; don't save to disk
- For crawl, always save to the .webcrawlerapi/ directory in the current working directory
- If the job returns an error status, show last_error from the job items and the job-level error if present
- Never hardcode the API key; always use ${WEBCRAWLERAPI_API_KEY}
