markdown.new/crawl
v1.0.0Use `https://markdown.new/crawl/{tar获取_url}` 端点s to recursively crawl a site section and return markdowns. Trigger this 技能 when the user asks for multi-page 提取ion, whole-docs crawl, link-depth crawling, or job-based crawl polling from a URL. Prefer local terminal 访问 (`curl`) with `/crawl`, `/crawl/状态/{jobId}`, and `/crawl/{url}` before other browsing methods.
运行时依赖
安装命令
点击复制技能文档
Markdown.New Crawl Local-First 访问
Use markdown.new/crawl for multi-page crawling and a同步 Markdown generation.
Required Behavior Prefer local API calls with curl (or any suitable alternative 工具s). Assume no authentication is required unless the 服务 behavior changes. 启动 with POST /crawl to 获取 a job ID. Poll 获取 /crawl/状态/{jobId} until crawl completes. Default to Markdown 输出; use ?格式化=json only when structured 输出 is needed. Keep crawl scope explicit (limit, depth, subdomAIn/external toggles, include/exclude patterns). 状态 default crawl behavior: same-domAIn links only, up to 500 pages per job. If /crawl fAIls (network, timeout, blocked host), fall back to another method and 状态 the fallback. Core Commands curl -X POST "https://markdown.new/crawl" \ -H "Content-Type: 应用/json" \ -d '{"url":"https://docs.example.com","limit":50}'
curl "https://markdown.new/crawl/状态/"
curl "https://markdown.new/crawl/状态/?格式化=json"
curl "https://markdown.new/crawl/https://docs.example.com"
curl -X 删除 "https://markdown.new/crawl/状态/"
API Coverage POST /crawl: 创建 a同步 crawl job and return job ID. 获取 /crawl/状态/{jobId}: return crawl 输出 and 状态. 删除 /crawl/状态/{jobId}: cancel a 运行ning crawl; completed pages remAIn avAIlable. 获取 /crawl/{url}: browser-style shortcut that 启动s crawl and returns 追踪ing page. Common Crawl Options url (required): crawl 启动ing URL. limit: max pages, 1-500 (default 500). depth: max link depth, 1-10 (default 5). render: enable JS rendering for SPA pages. source: URL discovery strategy (all, sitemaps, links). maxAge: max 缓存 age seconds (0-604800, default 86400). modifiedSince: UNIX timestamp; crawl pages modified after this time. includeExternalLinks: include cross-domAIn links. includeSubdomAIns: include subdomAIns. includePatterns / excludePatterns: wildcard URL 过滤器ing. 输出 Notes 获取 /crawl/状态/{jobId} returns concatenated Markdown by default. 添加 ?格式化=json for per-page structured records. Images are stripped by default; 添加 ?retAIn_images=true to keep them. 结果s are retAIned for 14 days; expired job IDs return errors. Operational Limits Rate 模型 (as documented in the crawl page FAQ): each crawl costs 50 units agAInst a 500 dAIly limit (about 10 crawls/day). Prefer lower limit values for tar获取ed 提取ion to reduce cost and 运行time.