Links to PDFs

v0.0.1

Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.

by @chrisling-dev (Chris Ling)·MIT-0

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install links-to-pdfs

Mirror (faster): npx clawhub@latest install links-to-pdfs --registry https://cn.longxiaskill.com

Skill documentation

docs-scraper

A CLI tool that scrapes documents from various sources into local PDF files using browser automation.

Installation

npm install -g docs-scraper

Quick start

Scrape any document URL to PDF:

docs-scraper scrape https://example.com/document

Returns local path: ~/.docs-scraper/output/1706123456-abc123.pdf

Basic scraping

Scrape with daemon (recommended, keeps browser warm):

docs-scraper scrape <url>

Scrape with a named profile (for authenticated sites):

docs-scraper scrape <url> -p <profile-name>

Scrape with pre-filled data (e.g., email for DocSend):

docs-scraper scrape <url> -D email=user@example.com

Direct mode (single-shot, no daemon):

docs-scraper scrape <url> --no-daemon
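When these commands are driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of docs-scraper itself: the helper name `build_scrape_command` is hypothetical, it uses only the flags shown above (`-p`, `-D`, `--no-daemon`), and it assumes those flags may be combined freely after the URL.

```python
from typing import Dict, List, Optional


def build_scrape_command(
    url: str,
    profile: Optional[str] = None,
    data: Optional[Dict[str, str]] = None,
    use_daemon: bool = True,
) -> List[str]:
    """Assemble argv for `docs-scraper scrape` (hypothetical helper).

    Only flags documented above are emitted. Combining them in a single
    invocation is an assumption, not something the documentation states.
    """
    cmd = ["docs-scraper", "scrape", url]
    if profile:
        # Named profile for authenticated sites.
        cmd += ["-p", profile]
    for key, value in (data or {}).items():
        # One -D key=value pair per pre-filled field.
        cmd += ["-D", f"{key}={value}"]
    if not use_daemon:
        # Direct, single-shot mode.
        cmd.append("--no-daemon")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; keeping it as a list rather than a shell string avoids quoting problems with values such as names containing spaces.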

Authentication workflow

When a document requires authentication (login, email verification, passcode):

Initial scrape returns a job ID:

docs-scraper scrape https://docsend.com/view/xxx
# Output: Scrape blocked
# Job ID: abc123

Retry with data:

docs-scraper update abc123 -D email=user@example.com
# or with password
docs-scraper update abc123 -D email=user@example.com -D password=1234
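The blocked-then-retry flow above can be sketched for scripted use. This is an illustration under assumptions: the helper names are hypothetical, and the parser assumes the CLI prints a line of the form `Job ID: <id>` as in the example output, which may not match the exact format of a given version.

```python
import re
from typing import Dict, List, Optional

# Assumed output shape, based only on the example above: "Job ID: abc123".
JOB_ID_PATTERN = re.compile(r"Job ID:\s*(\S+)")


def extract_job_id(output: str) -> Optional[str]:
    """Return the job ID from a blocked scrape's output, or None if absent."""
    match = JOB_ID_PATTERN.search(output)
    return match.group(1) if match else None


def build_update_command(job_id: str, data: Dict[str, str]) -> List[str]:
    """Assemble argv for `docs-scraper update <job-id> -D key=value ...`."""
    cmd = ["docs-scraper", "update", job_id]
    for key, value in data.items():
        cmd += ["-D", f"{key}={value}"]
    return cmd
```

A caller would run the initial scrape, pass its captured output to `extract_job_id`, and, if an ID comes back, collect the missing fields from the user before running the update command.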

Profile management

Profiles store session cookies for authenticated sites.

docs-scraper profiles list # List saved profiles
docs-scraper profiles clear # Clear all profiles
docs-scraper scrape <url> -p myprofile # Use a profile

Daemon management

The daemon keeps browser instances warm for faster scraping.

docs-scraper daemon status # Check status
docs-scraper daemon start # Start manually
docs-scraper daemon stop # Stop daemon

Note: The daemon starts automatically when you run scrape commands.

Cleanup

PDFs are stored in ~/.docs-scraper/output/. The daemon automatically cleans up files older than 1 hour.

Manual cleanup:

docs-scraper cleanup # Delete all PDFs
docs-scraper cleanup --older-than 1h # Delete PDFs older than 1 hour
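Because output files are short-lived, a caller that needs a PDF for longer than an hour should copy it elsewhere first. The sketch below shows one way to check which files have already passed an age cutoff. It illustrates the age-based rule only and is not the daemon's actual implementation; the function name and the use of file modification time are assumptions.

```python
import time
from pathlib import Path
from typing import List, Optional


def find_stale_pdfs(
    output_dir: Path,
    max_age_seconds: float = 3600.0,
    now: Optional[float] = None,
) -> List[Path]:
    """List PDFs in output_dir last modified more than max_age_seconds ago.

    Uses modification time as the age signal, which is an assumption about
    how "older than 1 hour" is measured.
    """
    current = time.time() if now is None else now
    cutoff = current - max_age_seconds
    return sorted(
        path for path in output_dir.glob("*.pdf")
        if path.stat().st_mtime < cutoff
    )
```

For example, `find_stale_pdfs(Path.home() / ".docs-scraper" / "output")` lists the files that the one-hour rule would already cover.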

Job management

docs-scraper jobs list # List blocked jobs awaiting auth

Supported sources

Direct PDF links - Downloads PDF directly
Notion pages - Exports Notion page to PDF
DocSend documents - Handles DocSend viewer
LLM fallback - Uses Claude API for any other webpage

Scraper Reference

Each scraper accepts specific -D data fields. Use the appropriate fields based on the URL type.

DirectPdfScraper

Handles: URLs ending in .pdf

Data fields: None (downloads directly)

Example:

docs-抓取器 scrape https://example.com/document.pdf

DocSendScraper

Handles: docsend.com/view/, docsend.com/v/, and subdomains (e.g., org-a.docsend.com)

URL patterns:

Documents: https://docsend.com/view/{id} or https://docsend.com/v/{id}
Folders: https://docsend.com/view/s/{id}
Subdomains: https://{subdomain}.docsend.com/view/{id}

Data fields:

| Field | Type | Description |
| --- | --- | --- |
| email | email | Email address for document access |
| password | password | Passcode/password for protected documents |
| name | text | Your name (required for NDA-gated documents) |

Examples:

# Pre-fill email for DocSend
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com

# With password protection
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123

# With NDA name requirement
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name="John Doe"

# Retry blocked job
docs-scraper update abc123 -D email=user@example.com -D password=secret123

Notes:

DocSend may require any combination of email, password, and name
Folders are scraped as a table of contents PDF with document links
The scraper auto-checks NDA checkboxes when name is provided

NotionScraper

Handles: notion.so/*, *.notion.site/*

Data fields:

| Field | Type | Description |
| --- | --- | --- |
| email | email | Notion account email |
| password | password | Notion account password |

Examples:

# Public page (no auth needed)
docs-scraper scrape https://notion.so/Public-Page-abc123

# Private page with login
docs-scraper scrape https://notion.so/Private-Page-abc123 \
  -D email=user@example.com -D password=mypassword

# Custom domain
docs-scraper scrape https://docs.company.notion.site/Page-abc123

Notes:

Public Notion pages don't require authentication
Toggle blocks are automatically expanded before PDF generation
Uses session profiles to persist login across scrapes

LlmFallbackScraper

Handles: Any URL not matched by other scrapers (automatic fallback)
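Taken together, the "Handles" rules for the four scrapers suggest a simple routing order. The sketch below is an illustration of that idea and not the tool's actual dispatch code: the function name `pick_scraper`, the order of the checks, and the acceptance of `www.notion.so` are all assumptions.

```python
from urllib.parse import urlparse


def pick_scraper(url: str) -> str:
    """Guess which scraper would handle a URL (illustrative only)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path

    # DirectPdfScraper: URLs ending in .pdf (checked first by assumption).
    if path.lower().endswith(".pdf"):
        return "DirectPdfScraper"

    # DocSendScraper: docsend.com or a subdomain, under /view/ or /v/.
    is_docsend_host = host == "docsend.com" or host.endswith(".docsend.com")
    if is_docsend_host and path.startswith(("/view/", "/v/")):
        return "DocSendScraper"

    # NotionScraper: notion.so pages and *.notion.site custom domains.
    # Accepting www.notion.so is an assumption beyond the documented patterns.
    if host in ("notion.so", "www.notion.so") or host.endswith(".notion.site"):
        return "NotionScraper"

    # Anything else falls through to the LLM-based scraper.
    return "LlmFallbackScraper"
```

A routing guess like this is mainly useful for deciding which -D fields to collect from the user before the first scrape attempt.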

Data fields: Dynamic - determined by Claude analyzing the page

The LLM scraper uses Claude to analyze the page HTML and detect:

Login forms (extracts field names dynamically)
Cookie banners (auto-dismisses)
Expandable content (auto-expands)
CAPTCHAs (reports as blocked)
Paywalls (reports as blocked)

Common dynamic fields:

| Field | Type | Description |
| --- | --- | --- |
| email | email | Login email (if detected) |
| password | password | Login password (if detected) |
| username | text | Username (if login uses username) |

Examples:

# Generic webpage (no auth)
docs-scraper scrape https://example.com/article

# Webpage requiring login
docs-scraper scrape https://members.example.com/article \
