Scrapling Web Scraping: Zero-Bot-Detection Web Scraping Tool

Name: Scrapling Web Scraping (zero-bot-detection web scraping tool)
Author: zhengxinjipai

v1.0.0

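Scrapling Web Scraping is a zero-bot-detection web scraping tool designed for OpenClaw. It can bypass Cloudflare protection, handle JavaScript-heavy sites, and adapt automatically to website changes. It supports three modes: basic (fast HTTP), stealth (anti-bot evasion), and dynamic (browser automation). Data is extracted with CSS selectors, JSON output is available, and the tool can be driven by natural-language commands or through a Python API.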

Tags: browser automation, API tools, automation, developer tools, cloud services

License

MIT-0

Last updated

2026/3/6

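Security scan

VirusTotal

Benign

OpenClaw

Suspicious

medium confidence

The skill's code and instructions are broadly consistent with a web scraping tool, but there are significant inconsistencies and operational risks (an undeclared installation step, external downloads, and explicit instructions to bypass anti-bot protections). Exercise caution before installing.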

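Assessment recommendations

The skill appears to be a legitimate scraper wrapper, but it requires installing and running an external Python package and downloading browsers at runtime; these operations pull code and binaries from the network and execute them. Before installing or running it: 1) Verify the upstream project (check that the scrapling package comes from PyPI/GitHub, and check the download URLs and checksums used by 'scrapling install'). 2) Prefer running pip install inside an isolated environment (container or VM) to limit the blast radius. 3) Audit the scrapling package's dependencies and any post-install scripts for network endpoints or credential use. 4) Consider the legal and ethical implications: stealth mode may violate terms of service or the law, so use it only against targets you own or have explicit permission to scrape. 5) If you decide to proceed, run network monitoring to see what is downloaded, and avoid running with elevated privileges unless you understand and accept the risk. ...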
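
Step 1 above mentions checking checksums. A minimal sketch of such a check in Python, assuming a trusted SHA-256 digest has been obtained from the upstream project. The helper names `sha256_of` and `matches_checksum` are illustrative and are not part of Scrapling or OpenClaw.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A mismatch only shows that the file differs from the one the digest was published for; it does not say which of the two is trustworthy.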

Detailed analysis

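ℹ Purpose and capabilities

The name, description, and bundled CLI wrapper (scrapling_tool.py) are consistent with a scraper that supports basic, stealth, and dynamic modes. Requiring no credentials and no special system paths is reasonable for a scraping helper. However, SKILL.md instructs users to run pip install 'scrapling[all]' and 'scrapling install' (which downloads browsers), even though the skill metadata provides no install specification or source; this mismatch is worth noting.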

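⚠ Instruction scope

The runtime instructions tell the operator to install an external package and run its installer to download browsers and other components. The documentation explicitly advertises "bypass Cloudflare" and an "undetectable" stealth mode, which are capability claims that can be used to evade protections. SKILL.md also uses absolute /root/.openclaw paths in its examples (assuming a root environment) and asks users to create custom scripts in the skill directory. The instructions therefore go beyond a simple CLI wrapper: they direct network installs and possible binary downloads from outside the skill package, and they include operational guidance for evading protections.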

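⚠ Install mechanism

The registry declares no install specification, yet SKILL.md tells users to run pip install 'scrapling[all]' and 'scrapling install', which downloads browsers. Installing an external PyPI package and letting it fetch browser binaries carries medium-to-high risk, because the registry metadata records no package source, checksums, or URLs. The skill itself does not bundle these downloaded assets, so at runtime code and binaries are pulled from remote endpoints controlled by the third-party package.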
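
One lightweight way to keep such an install away from the system interpreter is a dedicated virtual environment. The sketch below uses the standard-library `venv` module. The helper names and the environment directory are assumptions, and the package spec `scrapling[all]` is the one quoted from SKILL.md. A virtual environment isolates Python packages only, so it is weaker than the container or VM recommended above.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, package: str) -> list[str]:
    """Command that installs `package` into the environment only."""
    return [str(venv_python(env_dir)), "-m", "pip", "install", package]


def install_isolated(env_dir: Path, package: str = "scrapling[all]") -> None:
    """Create the environment, then install into it (needs network access)."""
    venv.create(env_dir, with_pip=True)
    subprocess.run(pip_install_command(env_dir, package), check=True)
```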

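ℹ Credential requirements

The skill declares no required environment variables or credentials, and the bundled Python wrapper does not read secrets. That is reasonable for a scraping helper. However, stealth and Cloudflare-solving features sometimes rely on external solving services or browser automation that may need undeclared API keys or services. No credentials or configuration for such services are declared here, which is an implementation and provenance gap that should be verified.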

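✓ Persistence and privileges

The skill does not request 'always: true' and declares no system-wide configuration changes. The provided code is a simple CLI wrapper that only calls the scrapling package. Nothing in the bundled files indicates that the skill persistently modifies other skills or global agent settings.

Security is layered; review the code before running it.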

MIT-0 terms: free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

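Version

latest · v1.0.0 · 2026/3/6

Initial release of Scrapling Web Scraping for OpenClaw:
- Enables zero-bot-detection web scraping, bypassing Cloudflare and handling JavaScript-heavy sites.
- Supports three modes: basic (fast HTTP), stealth (undetectable, anti-bot bypass), and dynamic (browser automation).
- Extracts data with CSS selectors and provides JSON output.
- Provides natural-language integration and a Python API.
- Includes a command-line interface with options for mode selection, Cloudflare solving, waiting for elements, and custom scripts.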

Install command

Official: npx clawhub@latest install scrapling-web-scraper-openclaw

Mirror: npx clawhub@latest install scrapling-web-scraper-openclaw --registry https://cn.clawhub-mirror.com

Skill documentation

Zero-bot-detection web scraping for OpenClaw. Bypass Cloudflare, handle JavaScript-heavy sites, and adapt to website changes automatically.

Quick Start

# Install Scrapling
pip install "scrapling[all]"
scrapling install

# Basic usage
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://example.com

# Bypass Cloudflare
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://protected-site.com --mode stealth --cloudflare

# Extract specific data
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://example.com --selector ".product-title"

# JavaScript-heavy sites
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://spa-app.com --mode dynamic --wait ".content-loaded"
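
The commands above hard-code `/root/.openclaw`. A small sketch that assembles the same command lines from the current user's home directory, assuming the skill is installed under `~/.openclaw/skills/scrapling-web-scraping`. The helper name `build_command` is hypothetical; the flags are the ones documented for this skill.

```python
import sys
from pathlib import Path

# Assumed install location; the examples in this document use /root/.openclaw.
SKILL_DIR = Path.home() / ".openclaw" / "skills" / "scrapling-web-scraping"


def build_command(url, mode="basic", selector=None, cloudflare=False,
                  wait=None, as_json=False, skill_dir=SKILL_DIR):
    """Return an argv list for scrapling_tool.py using the documented flags."""
    cmd = [sys.executable, str(skill_dir / "scrapling_tool.py"), url]
    if mode != "basic":  # basic is the documented default
        cmd += ["--mode", mode]
    if selector:
        cmd += ["--selector", selector]
    if cloudflare:
        cmd.append("--cloudflare")
    if wait:
        cmd += ["--wait", wait]
    if as_json:
        cmd.append("--json")
    return cmd
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Building an argv list rather than a shell string avoids quoting problems with selectors such as `.quote .text`.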

Usage with OpenClaw

Natural Language Commands

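Basic scraping:

"Use Scrapling to scrape the title and all links from https://example.com"

Bypass protection:

"Scrape https://protected-site.com in stealth mode and bypass Cloudflare"

Extract data:

"Scrape the product names and prices from https://shop.com; the CSS selector is .product"

Dynamic content:

"Scrape https://spa-app.com and wait for the .data-loaded element to finish loading"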

Python Code

# Basic scraping
from scrapling.fetchers import Fetcher
page = Fetcher.get('https://example.com')
title = page.css('title::text').get()

# Bypass Cloudflare
from scrapling.fetchers import StealthyFetcher
page = StealthyFetcher.fetch('https://protected.com',
                              headless=True,
                              solve_cloudflare=True)

# JavaScript sites
from scrapling.fetchers import DynamicFetcher
page = DynamicFetcher.fetch('https://spa-app.com',
                             headless=True,
                             network_idle=True)
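
The three snippets above can be folded into one helper that picks a fetcher from a mode name. This is a sketch, not the code in `scrapling_tool.py`: `plan_fetch` and `fetch_page` are hypothetical names, and the keyword arguments are limited to the ones shown above.

```python
def plan_fetch(mode="basic", cloudflare=False):
    """Map a mode name to (fetcher class name, method name, keyword args)."""
    if mode == "basic":
        return "Fetcher", "get", {}
    if mode == "stealth":
        return "StealthyFetcher", "fetch", {
            "headless": True,
            "solve_cloudflare": cloudflare,
        }
    if mode == "dynamic":
        return "DynamicFetcher", "fetch", {
            "headless": True,
            "network_idle": True,
        }
    raise ValueError(f"unknown mode: {mode!r}")


def fetch_page(url, mode="basic", cloudflare=False):
    """Fetch `url` with the fetcher chosen by plan_fetch (needs scrapling)."""
    import scrapling.fetchers as fetchers  # imported lazily on purpose

    class_name, method_name, kwargs = plan_fetch(mode, cloudflare)
    fetcher = getattr(fetchers, class_name)
    return getattr(fetcher, method_name)(url, **kwargs)
```

Keeping the planning step separate from the network call means the mode mapping can be inspected or tested without installing Scrapling or launching a browser.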

Features

Feature	Command	Description
Basic Scrape	`--mode basic`	Fast HTTP requests
Stealth Mode	`--mode stealth`	Bypass Cloudflare/anti-bot
Dynamic Mode	`--mode dynamic`	Handle JavaScript sites
CSS Selectors	`--selector ".class"`	Extract specific elements
JSON Output	`--json`	Machine-readable output

Examples

1. Scrape with CSS Selector

python3 scrapling_tool.py https://quotes.toscrape.com --selector ".quote .text" --json

2. Bypass Cloudflare

python3 scrapling_tool.py https://nopecha.com/demo/cloudflare --mode stealth --cloudflare

3. Wait for Dynamic Content

python3 scrapling_tool.py https://spa-app.com --mode dynamic --wait ".loaded" --json

CLI Reference

python3 scrapling_tool.py URL [options]

Options:
  --mode {basic,stealth,dynamic}  Scraping mode (default: basic)
  --selector, -s CSS_SELECTOR     Extract specific elements
  --cloudflare                    Solve Cloudflare (stealth mode only)
  --wait SELECTOR                 Wait for element (dynamic mode only)
  --json, -j                      Output as JSON
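
The option list above maps naturally onto `argparse`. The following is a sketch of such a parser under that assumption. It is not taken from `scrapling_tool.py`, and rejecting `--cloudflare` or `--wait` outside their stated modes is only one interpretation of the "only" notes.

```python
import argparse


def build_parser():
    """Parser mirroring the options listed in the CLI reference."""
    parser = argparse.ArgumentParser(prog="scrapling_tool.py")
    parser.add_argument("url", metavar="URL")
    parser.add_argument("--mode", choices=["basic", "stealth", "dynamic"],
                        default="basic", help="Scraping mode")
    parser.add_argument("--selector", "-s", metavar="CSS_SELECTOR",
                        help="Extract specific elements")
    parser.add_argument("--cloudflare", action="store_true",
                        help="Solve Cloudflare (stealth mode only)")
    parser.add_argument("--wait", metavar="SELECTOR",
                        help="Wait for element (dynamic mode only)")
    parser.add_argument("--json", "-j", action="store_true",
                        help="Output as JSON")
    return parser


def parse_args(argv=None):
    """Parse argv and enforce the mode restrictions (assumed behaviour)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.cloudflare and args.mode != "stealth":
        parser.error("--cloudflare requires --mode stealth")
    if args.wait and args.mode != "dynamic":
        parser.error("--wait requires --mode dynamic")
    return args
```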

Advanced: Custom Scripts

Create custom scraping scripts in /root/.openclaw/skills/scrapling-web-scraping/:

from scrapling.fetchers import StealthyFetcher

# Your custom scraper
def scrape_products(url):
    page = StealthyFetcher.fetch(url, headless=True)
    products = []
    for item in page.css('.product'):
        products.append({
            'name': item.css('.name::text').get(),
            'price': item.css('.price::text').get(),
            'link': item.css('a::attr(href)').get()
        })
    return products
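
Records built this way often contain missing fields, stray whitespace, and relative links. A possible post-processing step, using only the standard library. `clean_product` and the sample record are illustrative and not part of the skill.

```python
import json
from urllib.parse import urljoin


def clean_product(raw, base_url):
    """Strip whitespace from text fields and make the link absolute."""
    name = (raw.get("name") or "").strip() or None
    price = (raw.get("price") or "").strip() or None
    link = raw.get("link")
    return {
        "name": name,
        "price": price,
        "link": urljoin(base_url, link) if link else None,
    }


# Hypothetical record in the shape returned by scrape_products().
sample = {"name": "  Widget \n", "price": " $9.99", "link": "/p/1"}
print(json.dumps(clean_product(sample, "https://shop.com/list"), indent=2))
```

Empty or missing fields become `None` rather than empty strings, so they serialize as JSON `null` and are easy to filter downstream.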

Notes

Requires Python 3.10+
First run: execute `scrapling install` to download browsers
Respect website Terms of Service
Use responsibly
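
The first two notes can be checked before a run. A minimal preflight sketch follows; the function names are illustrative. It only tests whether the `scrapling` package is importable and cannot tell whether `scrapling install` has already downloaded the browsers.

```python
import importlib.util
import sys


def python_ok(version=None, minimum=(3, 10)):
    """True if the interpreter version meets the documented minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def scrapling_available():
    """True if the scrapling package can be found on the import path."""
    return importlib.util.find_spec("scrapling") is not None


def preflight():
    """Return a list of human-readable problems; empty means ready."""
    problems = []
    if not python_ok():
        problems.append("Python 3.10+ is required")
    if not scrapling_available():
        problems.append('scrapling is not installed: pip install "scrapling[all]"')
    return problems
```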

Created: 2026-03-05 by 老二
Source: https://github.com/D4Vinci/Scrapling

Data source: ClawHub · Chinese localization: 龙虾技能库
