PDF Reader Skill
The pdf-reader skill extracts text and retrieves metadata from PDF files using PyMuPDF (fitz).
Tool API
The skill provides two commands:
extract
Extracts plain text from the specified PDF file.
Parameters:
file_path (string, required): Path to the PDF file to extract text from.
--max_pages (integer, optional): Maximum number of pages to extract.
Usage:
python3 skills/pdf-reader/reader.py extract /path/to/document.pdf
python3 skills/pdf-reader/reader.py extract /path/to/document.pdf --max_pages 5
Output: Plain text content from the PDF.
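To illustrate how a page-limited extraction could work with PyMuPDF, here is a minimal sketch. It is not the actual contents of reader.py: the function names page_limit and extract_text are hypothetical, and joining pages with a newline is an assumption about the output layout.

```python
def page_limit(page_count, max_pages=None):
    """Return how many pages to read: all of them, or at most max_pages."""
    if max_pages is None:
        return page_count
    return max(0, min(page_count, max_pages))


def extract_text(file_path, max_pages=None):
    """Return plain text from the first pages of a PDF (hypothetical helper)."""
    # PyMuPDF is imported lazily so page_limit() stays usable without it.
    import pymupdf

    with pymupdf.open(file_path) as doc:
        limit = page_limit(doc.page_count, max_pages)
        # get_text() with no arguments returns the page's plain text.
        return "\n".join(doc[i].get_text() for i in range(limit))
```

Clamping the limit to the page count means a --max_pages value larger than the document simply reads every page.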
metadata
Retrieve metadata about the document.
Parameters: file_path (string, required): Path to the PDF file.
Usage:
python3 skills/pdf-reader/reader.py metadata /path/to/document.pdf
Output: JSON object with PDF metadata including:
title: Document title
author: Document author
subject: Document subject
creator: Application that created the PDF
producer: PDF producer
creationDate: Creation date
modDate: Modification date
format: PDF format version
encryption: Encryption information (if any)
Implementation Notes
Uses PyMuPDF (imported as pymupdf) for fast, reliable PDF processing
Supports encrypted PDFs (returns an error if a password is required)
Handles large PDFs efficiently with the max_pages option
Returns structured JSON for the metadata command
Example
# Extract text from the first 3 pages
python3 skills/pdf-reader/reader.py extract report.pdf --max_pages 3

# Get document metadata
python3 skills/pdf-reader/reader.py metadata report.pdf
# Output:
# {
#   "title": "Annual Report 2024",
#   "author": "John Doe",
#   "creationDate": "D:20240115120000",
#   ...
# }
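The creationDate and modDate values use the PDF date string form (D:YYYYMMDDHHmmSS, optionally followed by a timezone offset). A caller consuming the JSON may want a regular datetime. The sketch below shows one way to do that with the standard library only. The helper parse_pdf_date is hypothetical and, as a simplification, ignores any timezone suffix.

```python
import json
from datetime import datetime


def parse_pdf_date(value):
    """Parse the 'D:YYYYMMDDHHmmSS' prefix of a PDF date; return None if absent or malformed."""
    if not value or not value.startswith("D:"):
        return None
    digits = value[2:16]  # 14 digits after "D:"; any timezone suffix is dropped
    try:
        return datetime.strptime(digits, "%Y%m%d%H%M%S")
    except ValueError:
        return None


# Sample JSON shaped like the metadata output in the example above.
sample = '{"title": "Annual Report 2024", "author": "John Doe", "creationDate": "D:20240115120000"}'
meta = json.loads(sample)
created = parse_pdf_date(meta.get("creationDate"))
print(created.isoformat())  # 2024-01-15T12:00:00
```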
Error Handling
Returns an error message if the file is not found or is not a valid PDF
Returns an error if the PDF is encrypted and requires a password
Gracefully handles corrupted or malformed PDFs
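The checks listed above could be implemented roughly as follows. This is a sketch under assumptions, not the skill's actual code: the function name open_pdf_or_error, the (document, message) return shape, and the message wording are all hypothetical. It relies on PyMuPDF's Document.is_pdf and Document.needs_pass properties.

```python
import os


def open_pdf_or_error(file_path):
    """Return (doc, None) on success, or (None, message) on failure (hypothetical helper)."""
    if not os.path.isfile(file_path):
        return None, f"File not found: {file_path}"

    # Imported lazily so the existence check above does not require PyMuPDF.
    import pymupdf

    try:
        doc = pymupdf.open(file_path)
    except Exception as exc:  # deliberately broad: corrupted or unreadable input
        return None, f"Could not open file as PDF: {exc}"

    if not doc.is_pdf:
        # PyMuPDF can open other document types too, so check explicitly.
        doc.close()
        return None, "Not a valid PDF"
    if doc.needs_pass:
        doc.close()
        return None, "PDF is encrypted and requires a password"
    return doc, None
```

Returning a message instead of raising keeps the command-line layer simple: it can print the message and exit without a traceback.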