PDF OCR Tool

v1.2.0

Intelligent PDF and image to Markdown converter using Ollama GLM-OCR with smart content detection (text/table/figure)


by @tsukisama9292 (Xuan-You Lin)·MIT

Tags: document tools, file processing, image processing


License

MIT


Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install pdf-ocr-tool

Mirror: npx clawhub@latest install pdf-ocr-tool --registry https://cn.longxiaskill.com

Skill documentation

PDF OCR Tool - Intelligent PDF to Markdown Converter

Uses the Ollama GLM-OCR model to recognize text, tables, and figures in PDF pages, applies the most appropriate prompt to each content type, and outputs a structured Markdown document.

Features

✅ Smart Content Detection: Automatically identifies page content type (text/table/figure)
✅ Mixed Mode: Splits pages into multiple regions for processing different content types
✅ Multiple Processing Modes: Supports text, table, figure, mixed, and auto modes
✅ PDF Page-by-Page Processing: Converts PDF to images and processes each page
✅ Image OCR: Supports OCR for single images
✅ Custom Prompts: Adjustable OCR prompts based on requirements
✅ Flexible Configuration: Customizable Ollama host, port, and model
✅ uv Package Management: Uses uv for Python dependency management

Installation

Prerequisites

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull glm-ocr:q8_0

# Install poppler-utils (for PDF to image conversion)
sudo apt install poppler-utils  # Debian/Ubuntu
brew install poppler            # macOS

# Install the uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
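Before running the tool, it can help to confirm that the prerequisites ended up on PATH. The following is a minimal sketch, not part of the skill itself: `missing_commands` is a hypothetical helper, and the command list assumes that `pdftoppm` (the converter shipped with poppler) is the binary used for PDF to image conversion.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executables the prerequisites above should put on PATH.
# Assumption: pdftoppm (shipped with poppler-utils / poppler) is the
# converter the tool relies on.
REQUIRED_COMMANDS = ("ollama", "pdftoppm", "uv")


def missing_commands(
    commands: Iterable[str] = REQUIRED_COMMANDS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the commands that cannot be found on PATH."""
    return [name for name in commands if which(name) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The `which` parameter exists only so the check can be exercised without touching the real PATH.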

Install with uv (Recommended)

cd skills/pdf-ocr-tool
uv venv
source .venv/bin/activate
uv add requests Pillow

Install via ClawHub

npx clawhub install pdf-ocr-tool

Manual Installation

# Clone or download the skill
git clone <repository-url> ~/.OpenClaw/workspace/skills/pdf-ocr-tool

# Create a virtual environment and install dependencies
cd ~/.OpenClaw/workspace/skills/pdf-ocr-tool
uv venv
source .venv/bin/activate
uv add requests Pillow

# Run the post-install script
bash hooks/post-install.sh

Usage

Basic Usage

# Auto-detect content type (recommended)
python ocr_tool.py --input document.pdf --output result.md

# Specify processing mode
python ocr_tool.py --input document.pdf --output result.md --mode text
python ocr_tool.py --input document.pdf --output result.md --mode table
python ocr_tool.py --input document.pdf --output result.md --mode figure

# Mixed mode: split page into regions
python ocr_tool.py --input document.pdf --output result.md --granularity region

# Process a single image
python ocr_tool.py --input image.png --output result.md --mode mixed
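The feature list says PDFs are converted to images and processed page by page, and the prerequisites install poppler for that conversion. As a rough sketch of what that step could look like, assuming the conversion goes through poppler's `pdftoppm` command (the source does not show the actual implementation), the helper names and the 200 DPI default below are illustrative only:

```python
import subprocess
from pathlib import Path
from typing import List


def pdftoppm_command(pdf_path: str, output_prefix: str, dpi: int = 200) -> List[str]:
    """Build a pdftoppm call that writes one PNG per page.

    pdftoppm names the files <output_prefix>-<page>.png.
    """
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, output_prefix]


def pdf_to_page_images(pdf_path: str, work_dir: str, dpi: int = 200) -> List[Path]:
    """Rasterize every page and return the image paths in page order."""
    out_dir = Path(work_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / "page"
    subprocess.run(pdftoppm_command(pdf_path, str(prefix), dpi), check=True)
    # pdftoppm zero-pads page numbers to a common width, so a
    # lexicographic sort matches page order.
    return sorted(out_dir.glob("page-*.png"))
```

Each returned image could then be passed to the OCR step in turn, which matches the per-page structure of the output shown later in this document.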

Advanced Configuration

# Specify Ollama host and port
python ocr_tool.py --input document.pdf --output result.md \
  --host localhost --port 11434

# Use a different model
python ocr_tool.py --input document.pdf --output result.md \
  --model glm-ocr:q8_0

# Custom prompt
python ocr_tool.py --input image.png --output result.md \
  --prompt "Convert this table to Markdown format, keeping rows and columns aligned"

# Save figure region images
python ocr_tool.py --input document.pdf --output result.md --save-images
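The host, port, model, and prompt options above plausibly map onto a single request to Ollama's HTTP API. The sketch below uses Ollama's `/api/generate` endpoint with a base64-encoded image. Whether ocr_tool.py calls this endpoint is an assumption, and `build_generate_payload` and `ocr_image` are hypothetical names. It uses only the standard library to stay self-contained.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> Dict[str, Any]:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a stream
    }


def ocr_image(image_bytes: bytes, prompt: str, model: str = "glm-ocr:q8_0",
              host: str = "localhost", port: int = 11434,
              timeout: float = 300.0) -> str:
    """Send one image to Ollama and return the generated text."""
    url = f"http://{host}:{port}/api/generate"
    body = json.dumps(build_generate_payload(model, prompt, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Keeping the payload builder separate from the network call makes the request shape easy to inspect without a running Ollama server.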

Environment Configuration

# Set default configuration
export OLLAMA_HOST="localhost"
export OLLAMA_PORT="11434"
export OCR_MODEL="glm-ocr:q8_0"

# Run
python ocr_tool.py --input document.pdf --output result.md
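One way a script could pick up these variables is sketched below. It assumes the model variable is spelled `OCR_MODEL` and that the values shown above serve as fallbacks when a variable is unset. `OllamaConfig` and `load_config` are hypothetical names, and how the real tool prioritizes environment variables against command-line flags is not stated in the source.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OllamaConfig:
    host: str
    port: int
    model: str

    @property
    def base_url(self) -> str:
        """Base URL of the Ollama server."""
        return f"http://{self.host}:{self.port}"


def load_config(env: Mapping[str, str] = os.environ) -> OllamaConfig:
    """Read the three variables, falling back to the values shown above."""
    return OllamaConfig(
        host=env.get("OLLAMA_HOST", "localhost"),
        port=int(env.get("OLLAMA_PORT", "11434")),  # assumed fallback
        model=env.get("OCR_MODEL", "glm-ocr:q8_0"),  # assumed variable name
    )
```

Passing a plain dictionary as `env` makes it possible to try different settings without modifying the process environment.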

Processing Modes

Mode	Description	Use Case
auto	Auto-detect content type	General use (default)
text	Pure text recognition	Academic papers, articles, reports
table	Table recognition	Data tables, financial reports
figure	Chart/figure recognition	Statistical charts, flowcharts, diagrams
mixed	Mixed mode	Pages with multiple content types

Mixed Mode (Granularity)

When using --granularity region:

Page is split vertically into multiple regions (default: 3)
Each region is independently analyzed for content type
Corresponding prompts are used for OCR
Final results are combined into complete Markdown

Output Format

PDF Output Example

# PDF to Markdown Result

Total Pages: 15
Model: glm-ocr:q8_0
Mode: auto
Generated: 2026-02-27T01:00:00+08:00

Page 1

Type: mixed

Region 1 (text)

[OCR recognized text content]

Region 2 (table)

Column 1	Column 2
Data 1	Data 2

Region 3 (figure)

[Chart description]
[Chart image]
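The three regions in the example above correspond to the split described under Mixed Mode (Granularity). As a minimal sketch, assuming "split vertically" means full-width bands stacked from top to bottom and that boundaries are evenly spaced (the source does not say how they are chosen), the region boxes could be computed as follows. `split_into_bands` is a hypothetical helper, not a function of ocr_tool.py.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower)


def split_into_bands(width: int, height: int, regions: int = 3) -> List[Box]:
    """Divide a page into `regions` full-width bands stacked top to bottom."""
    if regions < 1:
        raise ValueError("regions must be at least 1")
    # Integer boundaries measured from the top; the final edge is exactly
    # `height`, so the bands cover the page with no gaps or overlaps.
    edges = [height * i // regions for i in range(regions + 1)]
    return [(0, edges[i], width, edges[i + 1]) for i in range(regions)]
```

The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so each one could be passed to `crop` directly before the region is sent for OCR.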

Image Output Example

# image.png OCR Result

Model: glm-ocr:q8_0
Mode: table

[OCR recognized result]

Prompt Templates

The tool includes four built-in prompt templates in the prompts/ directory:

Text Mode (prompts/text.md)

Convert the text in this region to Markdown format.

Preserve paragraph structure and heading levels
Handle lists correctly
Preserve mathematical formulas
Maintain citations and references

Table Mode (prompts/table.md)

Convert the table in this region to Markdown table format.

Maintain row and column alignment
Preserve all data and values
Handle merged cells
Preserve headers and units

Figure Mode (prompts/figure.md)

Analyze the chart or image in this region:

Chart type (bar, line, pie, flowchart, etc.)
Titles and axis labels
Data trends and key observations
Important values and anomalies

Describe in Markdown format.
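To illustrate how a detected content type might be turned into a prompt, here is a small sketch that maps a mode to its template file and lets a custom prompt take precedence. The function names are hypothetical, the precedence rule is an assumption, and only the three templates shown above are covered because the fourth is not listed in this document.

```python
from pathlib import Path
from typing import Optional

PROMPT_DIR = Path("prompts")
# Only the three templates shown above; the fourth built-in template
# is not listed in this document, so it is left out here.
TEMPLATE_MODES = ("text", "table", "figure")


def prompt_path(mode: str, prompt_dir: Path = PROMPT_DIR) -> Path:
    """Map a content type to its template file, e.g. table -> prompts/table.md."""
    if mode not in TEMPLATE_MODES:
        raise ValueError(f"no template for mode {mode!r}")
    return prompt_dir / f"{mode}.md"


def resolve_prompt(mode: str, custom_prompt: Optional[str] = None,
                   prompt_dir: Path = PROMPT_DIR) -> str:
    """Prefer an explicit prompt (as with --prompt), else read the template."""
    if custom_prompt:
        return custom_prompt
    return prompt_path(mode, prompt_dir).read_text(encoding="utf-8").strip()
```

Reading templates from files rather than hard-coding them means the prompts can be edited without changing the code.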

Using in OpenClaw

import subprocess
from pathlib import Path

# Process PDF (auto mode) s
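The original example is cut off at this point. As a minimal sketch of how a caller might wrap the command line shown under Usage with subprocess, the helpers below use only the --input, --output, and --mode flags documented above. `build_ocr_command` and `run_ocr` are hypothetical names, and the script is assumed to be in the current working directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_ocr_command(input_path: Path, output_path: Path,
                      mode: Optional[str] = None,
                      script: Path = Path("ocr_tool.py")) -> List[str]:
    """Assemble the ocr_tool.py command line from the documented flags."""
    command = [sys.executable, str(script),
               "--input", str(input_path), "--output", str(output_path)]
    if mode is not None:  # omit --mode to get the default auto-detection
        command += ["--mode", mode]
    return command


def run_ocr(input_path: Path, output_path: Path, mode: Optional[str] = None) -> str:
    """Run the tool and return the generated Markdown."""
    subprocess.run(build_ocr_command(input_path, output_path, mode), check=True)
    return output_path.read_text(encoding="utf-8")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.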
