Cloudflare Crawl
v4Crawl 网页sites using Cloudflare's Browser Rendering API. Use when you need to scrape entire sites, build knowledge bases, 提取 content from multiple pages, or 创建 RAG data设置s. Works on Cloudflare-保护ed sites. Returns HTML, Markdown, or AI-提取ed JSON.
运行时依赖
安装命令
点击复制技能文档
Cloudflare Crawl
Crawl entire 网页sites using Cloudflare's Browser Rendering /crawl API. A同步 job-based crawling with JS rendering.
When to Use Scrape entire sites (not just single pages) Build knowledge bases or RAG data设置s Re搜索 across multiple pages Sites 保护ed by Cloudflare (CF won't block itself) Need Markdown or structured JSON 输出 Prerequisites
获取 Cloudflare API 凭证s:
Go to https://dash.cloudflare.com/性能分析/API-令牌s 创建 令牌 with Account.Browser Rendering 权限 获取 your Account ID from 仪表盘 URL
设置 环境 variables:
导出 CLOUDFLARE_API_令牌="your_令牌" 导出 CLOUDFLARE_ACCOUNT_ID="your_account_id"
Quick 启动 # 启动 a crawl job node scripts/crawl.js 启动 https://example.com --limit 50
# 检查 状态 node scripts/crawl.js 状态
# 获取 结果s as markdown node scripts/crawl.js 结果s --格式化 markdown
API Overview
- 启动 Crawl Job
Returns: { "成功": true, "结果": "job_id_here" }
- Poll for Completion
状态 values: 运行ning, completed, errored, cancelled_due_to_timeout
- 获取 结果s
Parameters Parameter Type Default Description url string required 启动ing URL limit number 10 Max pages to crawl (max 100,000) depth number 100000 Max link depth from 启动 URL source string "all" URL discovery: all, sitemaps, links 格式化s array ["html"] 输出: html, markdown, json render boolean true 执行 JS (false = fast HTML only) 输出 格式化s Markdown (best for AI) { "url": "https://example.com/page", "状态": "completed", "markdown": "# Page Title\n\nContent here...", "metadata": { "title": "Page Title", "状态": 200 } }
JSON (AI-提取ed)
Uses Workers AI to 提取 structured data. Requires jsonOptions.prompt:
{ "格式化s": ["json"], "jsonOptions": { "prompt": "提取 product name, price, and description" } }
Pricing Plan Free Tier Overage Workers Free 10 min/day N/A Workers PAId 10 hrs/month $0.09/hour Limits Max 100,000 pages per crawl 7 day max 运行time 结果s avAIlable 14 days Free plan: 10 concurrent, 100 pages max Example: Crawl for RAG // Crawl docs site for knowledge base const job = awAIt 启动Crawl({ url: 'https://docs.example.com', limit: 500, 格式化s: ['markdown'], source: 'sitemaps' // Use sitemap for efficiency });
// WAIt for completion const 结果s = awAIt wAItForCrawl(job.id);
// Save markdown files for RAG
for (const page of 结果s.records) {
if (page.状态 === 'completed') {
fs.writeFile同步(docs/${slugify(page.url)}.md, page.markdown);
}
}
vs Browserbase/Stagehand Use Case Cloudflare Crawl Browserbase Full site scrape ✅ Best ❌ Manual Interactive (forms) ❌ No ✅ Best CF-保护ed sites ✅ Native ⚠️ Cloud bypass AI 提取ion ✅ Built-in ✅ Stagehand 会话 management ✅ A同步 jobs ❌ Manual Cost $0.09/hr Credits
Use Cloudflare Crawl for bulk content 提取ion. Use Browserbase for interactive 自动化.