PDF OCR Tool
v1.2.0 Intelligent PDF and image to Markdown converter using Ollama GLM-OCR with smart content detection (text/table/figure)
PDF OCR Tool - Intelligent PDF to Markdown Converter
Uses the Ollama GLM-OCR model to recognize text, tables, and figures in PDF pages, applying the most appropriate prompt to each content type and outputting structured Markdown documents.
Features
✅ Smart Content Detection: Automatically identifies page content type (text/table/figure)
✅ Mixed Mode: Splits pages into multiple regions for processing different content types
✅ Multiple Processing Modes: Supports text, table, figure, mixed, and auto modes
✅ PDF Page-by-Page Processing: Converts PDF to images and processes each page
✅ Image OCR: Supports OCR for single images
✅ Custom Prompts: Adjustable OCR prompts based on requirements
✅ Flexible Configuration: Customizable Ollama host, port, and model
✅ uv Package Management: Uses uv for Python dependency management
Installation
- Prerequisites
# Install poppler-utils (for PDF to image conversion)
sudo apt install poppler-utils  # Debian/Ubuntu
brew install poppler  # macOS
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
- Install with uv (Recommended)
- Install via ClawHub
- Manual Installation
# Create virtual environment and install dependencies
cd ~/.OpenClaw/workspace/skills/pdf-ocr-tool
uv venv
source .venv/bin/activate
uv add requests Pillow
# Run post-install script
bash hooks/post-install.sh
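Before the first run, it can help to confirm that the prerequisites listed under Installation are actually on PATH. The following is a minimal sketch, not part of the tool: `missing_commands` is a hypothetical helper, and it assumes the PDF to image conversion relies on the `pdftoppm` executable that ships with poppler-utils.

```python
import shutil

# Executables provided by the prerequisites. That the tool invokes
# `pdftoppm` directly is an assumption; it is the PDF-to-image
# converter that ships with poppler-utils.
REQUIRED_COMMANDS = ("pdftoppm", "uv")


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands from `commands` that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


missing = missing_commands()
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All prerequisites found.")
```

An empty list means both commands were found; anything else names what still needs to be installed.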
Usage
Basic Usage
# Auto-detect content type (recommended)
python ocr_tool.py --input document.pdf --output result.md
# Specify processing mode
python ocr_tool.py --input document.pdf --output result.md --mode text
python ocr_tool.py --input document.pdf --output result.md --mode table
python ocr_tool.py --input document.pdf --output result.md --mode figure
# Mixed mode: split page into regions
python ocr_tool.py --input document.pdf --output result.md --granularity region
# Process a single image
python ocr_tool.py --input image.png --output result.md --mode mixed
Advanced Configuration
# Specify Ollama host and port
python ocr_tool.py --input document.pdf --output result.md \
  --host localhost --port 11434
# Use a different model
python ocr_tool.py --input document.pdf --output result.md \
  --model glm-ocr:q8_0
# Custom prompt
python ocr_tool.py --input image.png --output result.md \
  --prompt "Convert this table to Markdown format, keeping rows and columns aligned"
# Save figure region images
python ocr_tool.py --input document.pdf --output result.md --save-images
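The host, port, model, and prompt options above all end up in a request to the Ollama server. As a rough illustration of what such a request looks like, the sketch below posts one image to Ollama's `/api/generate` endpoint. The helper names `build_payload` and `ocr_image` are hypothetical, and the tool's own implementation may differ (it installs `requests`, whereas this sketch uses only the standard library).

```python
import base64
import json
import urllib.request


def build_payload(image_bytes, prompt, model="glm-ocr:q8_0"):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON object instead of a token stream.
        "stream": False,
    }


def ocr_image(image_bytes, prompt, host="localhost", port=11434,
              model="glm-ocr:q8_0"):
    """Send one image to a running Ollama server and return the generated text."""
    url = f"http://{host}:{port}/api/generate"
    body = json.dumps(build_payload(image_bytes, prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With an Ollama server running and the model pulled, `ocr_image(open("image.png", "rb").read(), "Convert this table to Markdown format")` would return the recognized text for that image.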
Environment Configuration
# Set default configuration
export OLLAMA_HOST="localhost"
export OLLAMA_PORT="11434"
export OCR_MODEL="glm-ocr:q8_0"
# Run
python ocr_tool.py --input document.pdf --output result.md
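One plausible way to combine these environment variables with the command-line flags is shown below. This is a sketch under assumptions: the precedence order (flag, then environment variable, then built-in default), the default values, and the helper name `resolve_config` are not confirmed by the tool's documentation.

```python
import os

# Assumed fallbacks, taken from the example values used in this document.
DEFAULTS = {"host": "localhost", "port": "11434", "model": "glm-ocr:q8_0"}
ENV_NAMES = {"host": "OLLAMA_HOST", "port": "OLLAMA_PORT", "model": "OCR_MODEL"}


def resolve_config(cli_args=None, env=None):
    """Pick each setting from CLI args, then the environment, then defaults."""
    cli_args = cli_args or {}
    env = os.environ if env is None else env
    config = {}
    for key, default in DEFAULTS.items():
        config[key] = cli_args.get(key) or env.get(ENV_NAMES[key]) or default
    # Ports arrive as strings from the environment; normalize to int.
    config["port"] = int(config["port"])
    return config
```

For example, `resolve_config({"host": "gpu-box"})` would keep the port and model from the environment (or the defaults) while overriding only the host.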
Processing Modes
| Mode | Description | Use Case |
|---|---|---|
| auto | Auto-detect content type | General use (default) |
| text | Pure text recognition | Academic papers, articles, reports |
| table | Table recognition | Data tables, financial reports |
| figure | Chart/figure recognition | Statistical charts, flowcharts, diagrams |
| mixed | Mixed mode | Pages with multiple content types |
Mixed Mode (Granularity)
When using --granularity region:
- Page is split vertically into multiple regions (default: 3)
- Each region is independently analyzed for content type
- Corresponding prompts are used for OCR
- Final results are combined into complete Markdown
Output Format
PDF Output Example
# PDF to Markdown Result
Total Pages: 15
Model: glm-ocr:q8_0
Mode: auto
Generated: 2026-02-27T01:00:00+08:00
Page 1
Type: mixed
Region 1 (text)
[OCR recognized text content]
Region 2 (table)
| Column 1 | Column 2 |
|---|---|
| Data 1 | Data 2 |
Region 3 (figure)
[Chart description]
!Chart
Image Output Example
# image.png OCR Result
Model: glm-ocr:q8_0
Mode: table
[OCR recognized result]
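The Region 1 to Region 3 layout in the PDF example comes from the region split described under Mixed Mode. A minimal sketch of that split follows, assuming "split vertically" means equal-height bands stacked from top to bottom; `split_regions` is a hypothetical helper, and the boxes use the `(left, upper, right, lower)` convention that Pillow's `Image.crop` accepts.

```python
def split_regions(width, height, count=3):
    """Return (left, upper, right, lower) boxes that tile the page top to bottom."""
    if count < 1:
        raise ValueError("count must be at least 1")
    boxes = []
    for index in range(count):
        # Integer arithmetic keeps the bands contiguous even when the
        # height is not evenly divisible by the region count.
        upper = height * index // count
        lower = height * (index + 1) // count
        boxes.append((0, upper, width, lower))
    return boxes
```

Each box can then be passed to `page_image.crop(box)` to obtain the region image that is classified and sent for OCR.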
Prompt Templates
The tool includes four built-in prompt templates in the prompts/ directory:
Text Mode (prompts/text.md)
Convert the text in this region to Markdown format.
- Preserve paragraph structure and heading levels
- Handle lists correctly
- Preserve mathematical formulas
- Maintain citations and references
Table Mode (prompts/table.md)
Convert the table in this region to Markdown table format.
- Maintain row and column alignment
- Preserve all data and values
- Handle merged cells
- Preserve headers and units
Figure Mode (prompts/figure.md)
Analyze the chart or image in this region:
- Chart type (bar, line, pie, flowchart, etc.)
- Titles and axis labels
- Data trends and key observations
- Important values and anomalies
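Since each mode maps to a file in prompts/, selecting a prompt can be as simple as reading the matching template, with a custom `--prompt` taking priority. The sketch below illustrates that idea; `load_prompt` is a hypothetical helper, and the rule that a custom prompt overrides the template is an assumption rather than documented behavior.

```python
from pathlib import Path


def load_prompt(mode, prompts_dir="prompts", custom_prompt=None):
    """Return the custom prompt if given, else the template for `mode`."""
    if custom_prompt:
        return custom_prompt
    # Templates are assumed to be named after the mode, e.g. prompts/text.md.
    path = Path(prompts_dir) / f"{mode}.md"
    if not path.is_file():
        raise FileNotFoundError(f"No prompt template for mode {mode!r}: {path}")
    return path.read_text(encoding="utf-8").strip()
```

Calling `load_prompt("table")` from the skill directory would return the contents of prompts/table.md.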
Using in OpenClaw
import subprocess
from pathlib import Path
# Process PDF (auto mode) s