Resilient PDF
v1.1.0
Recover PDF extraction and summarization workflows when native PDF handling fails, hangs, times out, or rejects large files. Use when working with local or remote PDFs, especially research papers, manuals, system cards, or other long documents that exceed provider limits or fail in the built-in `pdf` tool. Supports URL download, local extraction via `uvx` + `markitdown[pdf]`, optional chunking, and first-pass summary artifacts.
Use this skill as a fallback workflow for PDFs that break normal analysis paths.
Overview
Prefer the built-in pdf tool first when it is likely to work. If it fails, hangs, times out, or the file is too large, switch to this local workflow.
Read references/patterns.md if you need the rationale, chunking heuristics, or fallback guidance.
Workflow
Confirm the PDF source.
If remote, download it into the workspace first. If local, confirm the path and file size.
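The confirm step can be sketched in Python as below. This is a minimal illustration, not part of the skill: `download_pdf` and `confirm_local_pdf` are hypothetical helper names, and the `%PDF-` header check is an added heuristic (PDF files normally carry that marker at the start) rather than something the helper script is documented to do.

```python
import os
import shutil
import urllib.request

PDF_MAGIC = b"%PDF-"  # header marker that PDF files normally begin with


def download_pdf(url: str, dest: str, timeout: float = 60.0) -> str:
    """Stream a remote PDF to dest (inside the workspace) and return dest."""
    os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in blocks, not all in memory
    return dest


def confirm_local_pdf(path: str) -> int:
    """Check that path exists and looks like a PDF; return its size in bytes."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    with open(path, "rb") as f:
        # Many readers tolerate a few bytes before the header,
        # so search the first 1 KiB instead of only offset 0.
        head = f.read(1024)
    if PDF_MAGIC not in head:
        raise ValueError(f"{path} has no {PDF_MAGIC!r} header in its first 1 KiB")
    return os.path.getsize(path)
```

The size returned by `confirm_local_pdf` is what you compare against the provider's upload limit when deciding whether the native path is worth trying.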
Decide whether the normal path is already broken.
Trigger this skill when the built-in pdf tool aborts, provider-native upload fails, or file limits make direct analysis unlikely to work.
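One way to express this trigger is to give the native path a time budget and switch on any error or timeout. In this sketch `primary` stands in for the built-in pdf tool or a provider upload and `fallback` for the local extraction; both are hypothetical callables, and the timeout value is yours to choose.

```python
import concurrent.futures
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T],
                  timeout_s: float) -> tuple[str, T]:
    """Run primary; if it raises or exceeds timeout_s, run fallback instead.

    Returns ("primary", result) or ("fallback", result) so the caller can
    record which path produced the answer.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary)
    try:
        return "primary", future.result(timeout=timeout_s)
    except Exception:  # the primary's own error, or concurrent.futures.TimeoutError
        return "fallback", fallback()
    finally:
        # Do not wait on a hung primary: Python cannot kill the thread,
        # so it is abandoned here (the interpreter still joins it at exit).
        pool.shutdown(wait=False)
```

If the native path is an external process, running it with `subprocess.run(..., timeout=...)` is a sturdier way to enforce the budget, because the child process is actually killed on timeout.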
Run the helper extractor.
Use scripts/extract_pdf.py to extract markdown locally. Use --url to download a remote PDF first. Add --chunk-dir when the output will be too large to read in one pass. Add --summary-out to generate a lightweight first-pass summary artifact.
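To call the helper from another script, the flags named above can be assembled into an argument list. Assumptions: the local PDF is passed as a positional argument (the skill describes the script as accepting a local file path without naming a flag for it), the script is run from the workspace root, and `build_extract_cmd` is a hypothetical wrapper, not part of the skill.

```python
from typing import Optional

SCRIPT = "skills/resilient-pdf/scripts/extract_pdf.py"


def build_extract_cmd(out: str, pdf_path: Optional[str] = None,
                      url: Optional[str] = None,
                      download_to: Optional[str] = None,
                      chunk_dir: Optional[str] = None,
                      summary_out: Optional[str] = None,
                      chunk_chars: Optional[int] = None,
                      chunk_overlap: Optional[int] = None) -> list[str]:
    """Assemble an argv list for the helper script (does not run it)."""
    if (pdf_path is None) == (url is None):
        raise ValueError("pass exactly one of pdf_path or url")
    cmd = ["python3", SCRIPT]
    if pdf_path is not None:
        cmd.append(pdf_path)  # assumption: the local path is positional
    else:
        cmd += ["--url", url]
        if download_to:
            cmd += ["--download-to", download_to]
    cmd += ["--out", out]
    if chunk_dir:
        cmd += ["--chunk-dir", chunk_dir]
    if summary_out:
        cmd += ["--summary-out", summary_out]
    if chunk_chars is not None:
        cmd += ["--chunk-chars", str(chunk_chars)]
    if chunk_overlap is not None:
        cmd += ["--chunk-overlap", str(chunk_overlap)]
    cmd.append("--json")  # ask for the machine-readable result
    return cmd
```

Running it is then `subprocess.run(cmd, capture_output=True, text=True, check=True)`; if `--json` prints the result to stdout (an assumption), `json.loads` on the captured output gives the machine-readable result.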
Inspect the extracted output.
Read the head, table of contents, or key sections first. Do not trust a summary until the extracted text looks sane.
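A quick, mechanical sanity pass can precede reading. The sketch below reports how much of the text is ordinary letters, digits, whitespace, and punctuation, counts U+FFFD replacement characters, and lists markdown headings. The signals and the name `sanity_report` are illustrative assumptions; an empty heading list is not a failure by itself, since PDF conversions do not always emit markdown headings.

```python
import re

# ATX-style markdown headings; [ \t] (not \s) keeps a match on one line.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
PUNCT = set(".,;:!?()[]{}-'\"/%#*_+=<>&|")


def sanity_report(markdown: str, head_chars: int = 1500) -> dict:
    """Cheap signals about whether extracted text is worth reading further."""
    total = len(markdown)
    readable = sum(1 for ch in markdown
                   if ch.isalnum() or ch.isspace() or ch in PUNCT)
    return {
        "chars": total,
        # A low ratio often points to encoding debris or binary noise.
        "readable_ratio": readable / total if total else 0.0,
        # U+FFFD marks bytes that could not be decoded.
        "replacement_chars": markdown.count("\ufffd"),
        "headings": HEADING.findall(markdown)[:20],
        "head": markdown[:head_chars],  # what to eyeball first
    }
```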
Summarize or analyze.
For short outputs, read the extracted markdown directly. For long outputs, read selected chunks or key sections. Use the generated first-pass summary as a navigation aid, not as final truth. Keep quoted claims and numeric claims grounded in the extracted text.
Helper script
Local file command:
python3 skills/resilient-pdf/scripts/extract_pdf.py <input.pdf> --out <output.md> --json
Remote URL command:
python3 skills/resilient-pdf/scripts/extract_pdf.py \
  --url <pdf-url> \
  --out <output.md> \
  --download-to <downloaded.pdf> \
  --json
Chunked plus summary command:
python3 skills/resilient-pdf/scripts/extract_pdf.py <input.pdf> \
  --out <output.md> \
  --chunk-dir <chunk-dir> \
  --summary-out <summary.md> \
  --chunk-chars 120000 \
  --chunk-overlap 4000 \
  --json
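The two chunking parameters can be read as a sliding window: each chunk holds at most `--chunk-chars` characters and repeats the last `--chunk-overlap` characters of the previous chunk, so text cut at a boundary reappears at the start of the next chunk. The script's actual splitting rule is not documented here (it may, for example, prefer paragraph breaks), so treat `chunk_text` as a hypothetical illustration of what the parameters mean.

```python
def chunk_text(text: str, chunk_chars: int = 120_000,
               chunk_overlap: int = 4_000) -> list[str]:
    """Split text into windows of at most chunk_chars characters, each
    starting chunk_overlap characters before the previous one ended."""
    if chunk_chars <= 0 or not 0 <= chunk_overlap < chunk_chars:
        raise ValueError("need chunk_chars > 0 and 0 <= chunk_overlap < chunk_chars")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break  # last window reached the end of the text
        start = end - chunk_overlap  # step back to create the overlap
    return chunks
```

With the values in the command above, each new chunk starts 116,000 characters after the previous one (120000 minus 4000), so a 250,000-character document yields three chunks.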
The script:
- accepts either a local file path or --url
- downloads remote PDFs when needed
- looks for uvx
- invokes uvx --from 'markitdown[pdf]' markitdown
- writes extracted markdown
- optionally writes chunk files
- optionally writes a lightweight first-pass summary markdown file
- emits a machine-readable JSON result
If dependencies are missing
If uvx is not available, tell the operator the exact command to install it:
python3 -m pip install --user --break-system-packages uv
Do not silently install dependencies unless the user asked you to.
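The dependency check can stay read-only. This sketch looks up `uvx` on `PATH` with `shutil.which`, builds the same `uvx --from 'markitdown[pdf]' markitdown` invocation the script is described as using, and raises with the install command instead of installing anything. The function names are hypothetical; passing the PDF path as markitdown's positional argument and reading markdown from stdout follows markitdown's documented CLI usage, but the helper script may call it differently.

```python
import shutil
import subprocess
from typing import Callable, Optional

INSTALL_HINT = "python3 -m pip install --user --break-system-packages uv"


def markitdown_argv(pdf_path: str,
                    which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the uvx command for converting pdf_path, or raise with the
    install command to show the operator. Never installs anything itself."""
    uvx = which("uvx")
    if uvx is None:
        raise RuntimeError(f"uvx not found; ask the operator to run: {INSTALL_HINT}")
    # No shell is involved, so markitdown[pdf] needs no quoting here.
    return [uvx, "--from", "markitdown[pdf]", "markitdown", pdf_path]


def convert(pdf_path: str, timeout_s: float = 600.0) -> str:
    """Run the conversion and return the markdown markitdown prints to stdout."""
    result = subprocess.run(markitdown_argv(pdf_path), capture_output=True,
                            encoding="utf-8", check=True, timeout=timeout_s)
    return result.stdout
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is used.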
Output expectations
A successful run should give you:
- the downloaded PDF path (when using --url)
- the extracted markdown path
- a byte count
- a text character count
- optional chunk paths
- an optional first-pass summary path
Use those outputs as the source of truth for later summarization.
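Before relying on the reported numbers, they can be recomputed from the files themselves. The skill does not say whether the byte count refers to the PDF or the markdown, so this hypothetical `measure_outputs` helper measures the markdown (assumed to be UTF-8) and lists `.md` files in a chunk directory (an assumed naming convention); compare its figures with the JSON result rather than replacing it.

```python
import os
from typing import Optional


def measure_outputs(md_path: str, chunk_dir: Optional[str] = None) -> dict:
    """Recompute size figures for an extracted markdown file and its chunks."""
    with open(md_path, "rb") as f:
        raw = f.read()
    text = raw.decode("utf-8")  # assumption: the markdown is UTF-8
    report = {
        "md_bytes": len(raw),   # differs from md_chars for non-ASCII text
        "md_chars": len(text),
        "chunks": [],
    }
    if chunk_dir and os.path.isdir(chunk_dir):
        report["chunks"] = sorted(
            os.path.join(chunk_dir, name)
            for name in os.listdir(chunk_dir)
            if name.endswith(".md")  # assumed chunk file extension
        )
    return report
```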
Notes
- This skill does not replace the built-in pdf tool. It is the fallback when that path is unreliable.
- Prefer workspace-local outputs so later reads and summaries are reproducible.
- If the extracted markdown is noisy, inspect section headers and sample passages before making strong claims.
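Keeping quoted and numeric claims grounded in the extracted text can be partly checked by machine. This sketch normalizes whitespace and case before a substring test, because PDF extraction often breaks lines mid-sentence, and flags numbers in a claim that never occur in the extracted text. It is a hypothetical aid, not part of the skill, and it does not repair words hyphenated across line breaks, so a miss means "look again", not "wrong".

```python
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")  # 4, 87.5, 1,024, 12%


def _normalize(s: str) -> str:
    # Collapse runs of whitespace (including newlines) and fold case so
    # line wrapping in the extraction does not cause false negatives.
    return re.sub(r"\s+", " ", s).strip().casefold()


def is_grounded(quote: str, extracted: str) -> bool:
    """True if the quote appears verbatim, modulo whitespace and case."""
    return _normalize(quote) in _normalize(extracted)


def ungrounded_numbers(claim: str, extracted: str) -> list[str]:
    """Numbers in claim that never appear anywhere in the extracted text."""
    found = set(NUMBER.findall(extracted))
    return [n for n in NUMBER.findall(claim) if n not in found]
```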