📦 Scrapling Web Scraping — 技能工具
v1.0.0[自动翻译] Zero-bot-detection web scraping for OpenClaw. Bypass Cloudflare, handle JavaScript-heavy sites, and adapt to website changes automatically. Use when y...
详细分析 ▾
运行时依赖
版本
Initial release of Scrapling Web Scraping skill. - Supports fast, stealth, and dynamic web scraping modes to handle protected and JavaScript-heavy sites. - Bypasses anti-bot measures, including Cloudflare, with automatic website adaptation. - Allows data extraction using CSS selectors and provides JSON output. - Includes both command-line and Python integration examples. - Enables custom scripting for advanced scraping workflows.
安装命令
点击复制技能文档
Zero-bot-detection web scraping for OpenClaw. Bypass Cloudflare, handle JavaScript-heavy sites, and adapt to website changes automatically.
Quick Start
# Install Scrapling pip install "scrapling[all]" scrapling install# Basic usage python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://example.com
# Bypass Cloudflare python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://protected-site.com --mode stealth --cloudflare
# Extract specific data python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://example.com --selector ".product-title"
# JavaScript-heavy sites python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://spa-app.com --mode dynamic --wait ".content-loaded"
Usage with OpenClaw
Natural Language Commands
Basic scraping:
"用Scrapling抓取 https://example.com 的标题和所有链接"
Bypass protection:
"用隐身模式抓取 https://protected-site.com,绕过Cloudflare"
Extract data:
"抓取 https://shop.com 的商品名称和价格,CSS选择器是 .product"
Dynamic content:
"抓取 https://spa-app.com,等待 .data-loaded 元素加载完成"
Python Code
# Basic scraping from scrapling.fetchers import Fetcher page = Fetcher.get('https://example.com') title = page.css('title::text').get()# Bypass Cloudflare from scrapling.fetchers import StealthyFetcher page = StealthyFetcher.fetch('https://protected.com', headless=True, solve_cloudflare=True)
# JavaScript sites from scrapling.fetchers import DynamicFetcher page = DynamicFetcher.fetch('https://spa-app.com', headless=True, network_idle=True)
Features
| Feature | Command | Description |
|---|---|---|
| Basic Scrape | --mode basic | Fast HTTP requests |
| Stealth Mode | --mode stealth | Bypass Cloudflare/anti-bot |
| Dynamic Mode | --mode dynamic | Handle JavaScript sites |
| CSS Selectors | --selector ".class" | Extract specific elements |
| JSON Output | --json | Machine-readable output |
Examples
1. Scrape with CSS Selector
python3 scrapling_tool.py https://quotes.toscrape.com --selector ".quote .text" --json
2. Bypass Cloudflare
python3 scrapling_tool.py https://nopecha.com/demo/cloudflare --mode stealth --cloudflare
3. Wait for Dynamic Content
python3 scrapling_tool.py https://spa-app.com --mode dynamic --wait ".loaded" --json
CLI Reference
python3 scrapling_tool.py URL [options]
Options: --mode {basic,stealth,dynamic} Scraping mode (default: basic) --selector, -s CSS_SELECTOR Extract specific elements --cloudflare Solve Cloudflare (stealth mode only) --wait SELECTOR Wait for element (dynamic mode only) --json, -j Output as JSON
Advanced: Custom Scripts
Create custom scraping scripts in /root/.openclaw/skills/scrapling-web-scraping/:
from scrapling.fetchers import StealthyFetcher
# Your custom scraper def scrape_products(url): page = StealthyFetcher.fetch(url, headless=True) products = [] for item in page.css('.product'): products.append({ 'name': item.css('.name::text').get(), 'price': item.css('.price::text').get(), 'link': item.css('a::attr(href)').get() }) return products
Notes
- Requires Python 3.10+
- First run:
scrapling installto download browsers - Respect website Terms of Service
- Use responsibly
Created: 2026-03-05 by 老二 Source: https://github.com/D4Vinci/Scrapling