
XPR Web Scraping — Skill Tools

v0.2.11

Tools for fetching and extracting cleaned text, metadata, and links from single or multiple web pages, with format options and link filtering.

0 · 1,817 · 12 current · 13 cumulative
by @paulgnz · MIT-0
License
MIT-0
Last updated
2026/4/11
Security scans
VirusTotal
Suspicious
OpenClaw
Safe
medium confidence
The skill's code, declared requirements, and runtime instructions align with a straightforward web-scraping tool and do not request unrelated credentials or install artifacts. However, part of the source shown was truncated, so the full file should be reviewed before deployment.
Assessment recommendations
This skill appears to be a coherent, self-contained web scraper that doesn't request secrets or install external code. Before installing: (1) review the full src/index.ts (the provided snippet was truncated) to confirm there are no hidden network callbacks or logging endpoints; (2) ensure use complies with target sites' robots.txt, terms of service, and legal/privacy rules; (3) enforce rate limits and avoid scraping protected or paywalled content; (4) if you run in a sensitive environment, sandb...
Detailed analysis
Purpose and capabilities
Name/description (fetching, extracting text/links/metadata) match the actual tools and code: scrape_url, extract_links, scrape_multiple. No unrelated env vars, binaries, or services are requested.
Instruction scope
SKILL.md describes limited scraping actions (single page, link extraction, multi-page up to 10). Instructions recommend rate-limiting and content-size limits and do not instruct access to unrelated files, credentials, or external endpoints beyond the target pages.
Installation mechanism
No install spec; skill is instruction-plus-code and relies on built-in Node fetch. No downloads, package registry installs, or archive extraction are present in the provided metadata.
Credential requirements
Skill requires no environment variables, credentials, or config paths. The code uses only network fetch and in-memory parsing; requested access is proportional to web-scraping functionality.
Persistence and permissions
The always flag is false and disable-model-invocation is false (normal defaults). The skill does not request persistent system-wide privileges or modify other skills. Autonomous invocation is allowed by platform default but is not combined with other red flags.
Security verdicts are tiered; review the code before running.

License

MIT-0

Free to use, modify, and redistribute; attribution is not required.

Runtime dependencies

No special dependencies

Versions

latest · v0.2.11 · 2026/2/14

- Added SKILL.md documentation detailing available web scraping tools and usage guidelines.
- Described supported actions: single-page scraping, link extraction, and multi-page scraping.
- Clarified content formats (text, markdown, html) and their recommended uses.
- Provided best practices for scraping frequency, file size limits, and saving results.


Install command

Official: npx clawhub@latest install xpr-web-scraping
Mirror (CN): npx clawhub@latest install xpr-web-scraping --registry https://cn.clawhub-mirror.com

Skill documentation

Web Scraping

You have web scraping tools for fetching and extracting data from web pages:

Single page:

  • scrape_url — fetch a URL and get cleaned text content + metadata (title, description, link count)
  - Use format="text" (default) for most tasks — strips all HTML
  - Use format="markdown" to preserve headings, links, lists, bold/italic
  - Use format="html" only when you need raw HTML
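
As a rough illustration of what format="text" cleaning involves, here is a minimal TypeScript sketch. The function name and regex-based stripping are hypothetical; the skill's actual src/index.ts may work differently (a real implementation would likely use a proper HTML parser):

```typescript
// Hypothetical sketch of format="text" cleaning; not the skill's actual code.
function toCleanText(html: string): string {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, "") // drop script/style blocks entirely
    .replace(/<[^>]+>/g, " ")                       // strip remaining tags
    .replace(/&nbsp;/g, " ")                        // decode the most common entity
    .replace(/\s+/g, " ")                           // collapse whitespace
    .trim();
}

const sample =
  '<html><head><style>p{color:red}</style></head>' +
  '<body><h1>Title</h1><p>Hello&nbsp;world</p></body></html>';
console.log(toCleanText(sample)); // → "Title Hello world"
```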

Link discovery:

  • extract_links — fetch a page and extract all links with text and type (internal/external)
  - Use the pattern parameter to filter by regex (e.g. "\\.pdf$" for PDF links)
  - Links are deduplicated and resolved to absolute URLs
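
The deduplication, absolute-URL resolution, and pattern filtering described above can be sketched as follows. This is a hypothetical helper assuming the WHATWG URL class; the skill's real implementation may differ:

```typescript
// Hypothetical sketch of link extraction; not the skill's actual code.
function extractLinks(html: string, baseUrl: string, pattern?: string): string[] {
  const re = /<a\s[^>]*href="([^"]+)"/gi;
  const seen = new Set<string>(); // Set gives deduplication for free
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    try {
      const abs = new URL(m[1], baseUrl).href; // resolve relative → absolute
      if (!pattern || new RegExp(pattern).test(abs)) seen.add(abs);
    } catch {
      // skip malformed URLs rather than failing the whole page
    }
  }
  return [...seen];
}

const page =
  '<a href="/docs/a.pdf">A</a><a href="https://ex.com/b.html">B</a>' +
  '<a href="/docs/a.pdf">dup</a>';
console.log(extractLinks(page, "https://ex.com", "\\.pdf$"));
// → ["https://ex.com/docs/a.pdf"]
```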

Multi-page research:

  • scrape_multiple — fetch up to 10 URLs in parallel for comparison/research
  - One failure doesn't block others (uses Promise.allSettled)
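
A minimal sketch of the Promise.allSettled pattern, assuming Node 18+'s built-in fetch; the function shape here is hypothetical and not the skill's actual code:

```typescript
// Hypothetical sketch: fetch up to 10 URLs in parallel; one failure doesn't block others.
async function scrapeMultiple(
  urls: string[]
): Promise<Array<{ url: string; ok: boolean; body?: string; error?: string }>> {
  const results = await Promise.allSettled(
    urls.slice(0, 10).map(async (url) => {
      const res = await fetch(url); // built-in fetch in Node 18+
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return { url, body: await res.text() };
    })
  );
  // allSettled never rejects, so failed fetches become per-URL error entries
  return results.map((r, i) =>
    r.status === "fulfilled"
      ? { url: urls[i], ok: true, body: r.value.body }
      : { url: urls[i], ok: false, error: String(r.reason) }
  );
}
```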

Best practices:

  • Prefer "text" format for content extraction, "markdown" for preserving structure
  • Don't scrape the same domain more than 5 times per minute
  • Combine with store_deliverable to save scraped content as job evidence
  • For very large pages, the content is limited to 5MB
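
The 5-requests-per-domain-per-minute guideline can be enforced client-side with a small sliding-window check. This is a hypothetical sketch, not part of the skill:

```typescript
// Hypothetical sliding-window rate limiter for the "5 per domain per minute" guideline.
const WINDOW_MS = 60_000;
const MAX_PER_DOMAIN = 5;
const hits = new Map<string, number[]>();

function allowRequest(url: string, now = Date.now()): boolean {
  const domain = new URL(url).hostname;
  // keep only timestamps still inside the one-minute window
  const recent = (hits.get(domain) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_PER_DOMAIN) return false; // over budget: caller should wait
  recent.push(now);
  hits.set(domain, recent);
  return true;
}
```

Calling allowRequest before each fetch and sleeping when it returns false keeps per-domain traffic within the documented limit.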
Data source: ClawHub · Chinese localization: 龙虾技能库