PDF Toolkit
v1.0.10
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Powered by Evolink.AI
When to Use
Use this skill when you need to:
- Extract text or tables from PDF documents
- Merge multiple PDFs into one
- Split a PDF into separate pages
- Create new PDFs programmatically
- Fill out PDF forms
- Add watermarks or rotate pages
- Extract metadata or images from PDFs

Usage
This is an instruction-only skill. Claude will use the Python libraries and command-line tools described below to perform PDF operations.
⚠️ Prerequisites: Before performing any task, Claude should verify that the required Python libraries are installed. If any are missing, guide the user to run:
    pip install pypdf pdfplumber reportlab pytesseract pdf2image
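The prerequisite check described above could be sketched as follows. This is a minimal illustration, not part of the skill itself: the helper name `missing_modules` and the constant `REQUIRED_MODULES` are hypothetical, and the sketch assumes each package's import name matches its pip name, which holds for the five packages listed.

```python
import importlib.util

# Import names for the packages in the pip command above (assumed to match
# their pip names, which is the case for these five).
REQUIRED_MODULES = ["pypdf", "pdfplumber", "reportlab", "pytesseract", "pdf2image"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    # Only suggest installing what is actually absent.
    print("Missing libraries. Run: pip install " + " ".join(missing))
else:
    print("All required PDF libraries are installed.")
```

Using `importlib.util.find_spec` avoids actually importing the packages, so the check stays fast and has no side effects.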
Python Libraries
- pypdf - Basic operations (merge, split, rotate, encrypt)
- pdfplumber - Text and table extraction with layout preservation
- reportlab - Create PDFs from scratch
- pytesseract + pdf2image - OCR for scanned PDFs

Command-Line Tools
- pdftotext (poppler-utils) - Extract text
- qpdf - Merge, split, rotate, decrypt
- pdftk - Alternative PDF manipulation tool
Configuration
EvoLink API (Optional)
For AI-powered PDF analysis and processing, set your EvoLink API key:
    export EVOLINK_API_KEY="your-key-here"
Default model: claude-opus-4-6 (no configuration needed).
To use a different model:
    export EVOLINK_MODEL="claude-sonnet-4-5-20250929"
For other available models, see the documentation. 👉 Get a free API key
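As a rough sketch of how a script might read these optional settings, the snippet below falls back to the default model when no override is set. The function name `load_evolink_config` and the returned dictionary shape are hypothetical, and the variable name `EVOLINK_MODEL` is an assumption reconstructed from the export line above; how EvoLink itself consumes these values is not covered here.

```python
import os

DEFAULT_MODEL = "claude-opus-4-6"  # default model stated in this document


def load_evolink_config(env=None):
    """Read the optional EvoLink settings from an environment mapping."""
    env = os.environ if env is None else env
    api_key = env.get("EVOLINK_API_KEY", "").strip()
    # EVOLINK_MODEL is an assumed variable name; empty means "use the default".
    model = env.get("EVOLINK_MODEL", "").strip() or DEFAULT_MODEL
    return {"api_key": api_key or None, "model": model, "enabled": bool(api_key)}


config = load_evolink_config()
print("EvoLink enabled:", config["enabled"], "| model:", config["model"])
```

Passing an explicit mapping instead of reading `os.environ` directly makes the function easy to exercise without touching the real environment.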
Python Libraries
This skill provides instructions for using standard Python PDF libraries. No additional configuration is required for basic operations.
Example
Extract Text from PDF
    from pypdf import PdfReader

    reader = PdfReader("document.pdf")
    text = ""
    for page in reader.pages:
        text += page.extract_text()
    print(text)
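When text from many pages is concatenated, it can help to label page boundaries and to tolerate pages that yield no text. The following is one possible sketch; the helper names `join_page_texts` and `extract_pdf_text` are hypothetical, and pypdf is imported lazily so the pure helper can be used even where the library is not installed.

```python
def join_page_texts(page_texts):
    """Join per-page text, labelling each page and tolerating empty pages."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        body = (text or "").strip()  # treat None or whitespace as empty
        parts.append(f"--- Page {number} ---\n{body}")
    return "\n\n".join(parts)


def extract_pdf_text(path):
    """Extract labelled text from every page of a PDF."""
    from pypdf import PdfReader  # lazy import; requires: pip install pypdf

    reader = PdfReader(path)
    return join_page_texts(page.extract_text() for page in reader.pages)
```

A call such as `extract_pdf_text("document.pdf")` would then return one string with a marker line before each page's text.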
Merge Multiple PDFs
    from pypdf import PdfWriter, PdfReader

    writer = PdfWriter()
    for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
        reader = PdfReader(pdf_file)
        for page in reader.pages:
            writer.add_page(page)

    with open("merged.pdf", "wb") as output:
        writer.write(output)
Extract Tables
    import pdfplumber

    with pdfplumber.open("document.pdf") as pdf:
        for page in pdf.pages:
            tables = page.extract_tables()
            for table in tables:
                print(table)
Create New PDF
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    c = canvas.Canvas("output.pdf", pagesize=letter)
    c.drawString(100, 750, "Hello World!")
    c.save()
Common Operations
Split PDF into Pages
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader("input.pdf")
    for i, page in enumerate(reader.pages):
        # One single-page output file per input page
        writer = PdfWriter()
        writer.add_page(page)
        with open(f"page_{i+1}.pdf", "wb") as output:
            writer.write(output)
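Splitting one page per file is the simplest case; a common variation is pulling out a chosen set of pages. The sketch below parses a 1-based spec such as "1-3,5" into 0-based indices and copies those pages with pypdf. The function names and the spec syntax are illustrative assumptions, not something this skill defines.

```python
def parse_page_ranges(spec, total_pages):
    """Turn a 1-based spec like "1-3,5" into sorted 0-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start  # "5" means the single page 5
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"Page range {part!r} is outside 1-{total_pages}")
        indices.update(range(start - 1, end))
    return sorted(indices)


def extract_pages(src, dest, spec):
    """Write the pages selected by spec from src into dest."""
    from pypdf import PdfReader, PdfWriter  # lazy import; needs pypdf

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in parse_page_ranges(spec, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dest, "wb") as output:
        writer.write(output)
```

Validating the ranges before touching the writer means a bad spec fails early instead of producing a partial output file.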
Rotate Pages
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader("input.pdf")
    writer = PdfWriter()

    page = reader.pages[0]
    page.rotate(90)  # Rotate 90 degrees clockwise
    writer.add_page(page)

    with open("rotated.pdf", "wb") as output:
        writer.write(output)
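pypdf rotates pages in increments of 90 degrees, so it can be worth validating an angle before applying it to a whole document. The following sketch shows one way to do that; `normalize_rotation` and `rotate_all_pages` are hypothetical helper names.

```python
def normalize_rotation(angle):
    """Map an angle to 0, 90, 180, or 270; reject angles that are not right angles."""
    if angle % 90 != 0:
        raise ValueError(f"Rotation must be a multiple of 90 degrees, got {angle}")
    return angle % 360


def rotate_all_pages(src, dest, angle):
    """Rotate every page of src clockwise by angle and save the result to dest."""
    from pypdf import PdfReader, PdfWriter  # lazy import; needs pypdf

    turn = normalize_rotation(angle)
    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        if turn:
            page.rotate(turn)
        writer.add_page(page)
    with open(dest, "wb") as output:
        writer.write(output)
```

With this normalization, a counter-clockwise quarter turn can be requested as `-90`, which is applied as a 270-degree clockwise rotation.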
Extract Metadata
    from pypdf import PdfReader

    reader = PdfReader("document.pdf")
    meta = reader.metadata
    print(f"Title: {meta.title}")
    print(f"Author: {meta.author}")
    print(f"Subject: {meta.subject}")
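Some PDFs carry no document information at all, in which case `reader.metadata` may be `None` and attribute access would fail; individual fields can also be unset. A defensive sketch, with the hypothetical helper name `metadata_summary` and an arbitrary placeholder string, might look like this:

```python
from types import SimpleNamespace


def metadata_summary(meta):
    """Return title, author, and subject, substituting a placeholder when unset."""
    fields = ("title", "author", "subject")
    # getattr with a default also covers the case where meta itself is None.
    return {name: (getattr(meta, name, None) or "(not set)") for name in fields}


# Stand-in object for illustration; in practice pass PdfReader(...).metadata.
example = SimpleNamespace(title="Annual Report", author=None, subject="")
for key, value in metadata_summary(example).items():
    print(f"{key.capitalize()}: {value}")
```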
Add Password Protection
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader("input.pdf")
    writer = PdfWriter()

    for page in reader.pages:
        writer.add_page(page)

    # First argument is the user password, second the owner password
    writer.encrypt("userpassword", "ownerpassword")

    with open("encrypted.pdf", "wb") as output:
        writer.write(output)
Extract Tables to Excel
    # Requires pandas and openpyxl in addition to pdfplumber
    import pdfplumber
    import pandas as pd

    with pdfplumber.open("document.pdf") as pdf:
        all_tables = []
        for page in pdf.pages:
            tables = page.extract_tables()
            for table in tables:
                if table:
                    # Treat the first row as the header
                    df = pd.DataFrame(table[1:], columns=table[0])
                    all_tables.append(df)
        if all_tables:
            combined_df = pd.concat(all_tables, ignore_index=True)
            combined_df.to_excel("output.xlsx", index=False)
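Tables returned by `extract_tables()` are lists of rows whose cells are strings or `None`, so a light cleanup pass before export often helps. The sketch below uses only the standard library and writes CSV rather than Excel; the helper names `clean_table` and `table_to_csv` are hypothetical, and dropping fully empty rows is a design choice that may not suit every document.

```python
import csv
import io


def clean_table(table):
    """Replace None cells with "", strip whitespace, and drop empty rows."""
    cleaned = []
    for row in table:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):  # keep the row only if some cell has content
            cleaned.append(cells)
    return cleaned


def table_to_csv(table):
    """Render one extracted table as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(clean_table(table))
    return buffer.getvalue()


# Hand-written sample shaped like pdfplumber output, for illustration only.
sample = [["Name ", "Qty"], [None, ""], ["Widget", " 3"]]
print(table_to_csv(sample))
```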
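OCR Scanned PDFs
    # Requires the Tesseract and Poppler binaries on the system
    import pytesseract
    from pdf2image import convert_from_path

    # Render each PDF page to an image, then run OCR on it
    images = convert_from_path("scanned.pdf")
    text = ""
    for i, image in enumerate(images):
        text += f"Page {i+1}:\n"
        text += pytesseract.image_to_string(image)
        text += "\n\n"
    print(text)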
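OCR is slow compared with direct text extraction, so one plausible approach is to try normal extraction first and fall back to OCR only when a page yields almost no text. The heuristic below is an assumption for illustration: the helper name `needs_ocr` and the threshold of 20 characters are arbitrary and would need tuning for real documents.

```python
def needs_ocr(extracted_text, min_chars=20):
    """Guess that a page is scanned when it has too little extractable text."""
    # Ignore all whitespace so blank lines and spaces do not count as content.
    visible = "".join((extracted_text or "").split())
    return len(visible) < min_chars


# Illustrative values only: an empty page versus a page with real text.
print(needs_ocr(""))
print(needs_ocr("Quarterly revenue grew across all regions."))
```

A caller could run `page.extract_text()` for each page and route only the pages where `needs_ocr` returns true through pytesseract.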
Command-Line Examples
    # Extract text, preserving layout
    pdftotext -layout input.pdf output.txt

    # Merge PDFs
    qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

    # Extract pages 1-5
    qpdf input.pdf --pages . 1-5 -- pages1-5.pdf

    # Remove password
    qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf

    # Extract images
    pdfimages -j input.pdf output_prefix
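When these commands are driven from Python, building the argument list explicitly avoids shell quoting problems with file names that contain spaces. The sketch below mirrors the qpdf merge command shown above; the helper names `qpdf_merge_command` and `run_if_available` are hypothetical.

```python
import shutil
import subprocess


def qpdf_merge_command(inputs, output):
    """Build the argument list for: qpdf --empty --pages <inputs> -- <output>."""
    if not inputs:
        raise ValueError("At least one input PDF is required")
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def run_if_available(command):
    """Run the command if its executable is on PATH; return None otherwise."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=True)


command = qpdf_merge_command(["file1.pdf", "file2.pdf"], "merged.pdf")
print(" ".join(command))
```

Checking `shutil.which` first gives a chance to tell the user that qpdf is missing instead of surfacing a raw `FileNotFoundError`.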
Quick Reference
Task               Best Tool     Example
Extract text       pdfplumber    page.extract_text()
Extract tables     pdfplumber    page.extract_tables()
Merge PDFs         pypdf         writer.add_page(page)
Split PDFs         pypdf         One page per file
Create PDFs        reportlab     Canvas or Platypus
OCR scanned PDFs   pytesseract   Con