PDF OCR Parse

Extract text from scanned PDFs using Tesseract OCR. Supports multiple languages, page selection, DPI control, and word-level bounding boxes.


by @rishabhdugar (Rishabh Dugar)·MIT-0


License

MIT-0


Free to use, modify, and redistribute, with no attribution required.


Runtime dependencies

No special dependencies

Install command


Official: npx clawhub@latest install pdf-ocr-parse

Mirror (faster in some regions): npx clawhub@latest install pdf-ocr-parse --registry https://cn.longxiaskill.com


Skill documentation

PDF OCR Parse

What It Does

Rasterises each selected page of a PDF at the given DPI, then runs Tesseract OCR on each page image. Returns per-page text with confidence scores, and optionally per-word bounding boxes.

When to Use

- Extract text from scanned PDF documents
- OCR invoices, receipts, or legacy documents in PDF format
- Extract digits-only data (invoice amounts) with char_whitelist
- Process multi-language documents

Required Inputs

Provide one of:

- url: URL to a scanned PDF
- base64_pdf: base64-encoded PDF
- Multipart upload with a file field

Authentication

Send your API key in the CLIENT-API-KEY header.

Get your free API key at https://pdfapihub.com. Full API documentation is available at https://pdfapihub.com/docs.
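The input choices and the authentication header above can be combined in a small request builder. The following is a minimal Python sketch, not an official client: the function name build_ocr_request is hypothetical, the endpoint and header name are taken from the curl example in this document (the lowercase URL spelling is an assumption), and the multipart upload variant is left out. The request is only constructed here, not sent.

```python
import base64
import json
import urllib.request

# Assumption: endpoint copied from the curl example in this document,
# written in lowercase.
OCR_ENDPOINT = "https://pdfapihub.com/api/v1/pdf/ocr/parse"


def build_ocr_request(api_key, url=None, pdf_bytes=None, **options):
    """Build (but do not send) a JSON POST request for the OCR endpoint.

    Exactly one of `url` or `pdf_bytes` must be given. `options` carries
    extra fields such as lang, dpi, or pages and is passed through as-is.
    This helper is hypothetical and not part of the documented API.
    """
    if (url is None) == (pdf_bytes is None):
        raise ValueError("provide exactly one of url or pdf_bytes")

    payload = dict(options)
    if url is not None:
        payload["url"] = url
    else:
        # base64_pdf expects the PDF content as a base64-encoded string.
        payload["base64_pdf"] = base64.b64encode(pdf_bytes).decode("ascii")

    return urllib.request.Request(
        OCR_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "CLIENT-API-KEY": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call. The response format is not described in this section, so parsing it is left to the full API documentation.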

Use Cases

- Scanned Invoice Processing: OCR scanned PDF invoices to extract text for accounting systems
- Legacy Document Digitization: Convert old scanned paper documents into searchable text
- Insurance Claims: Extract text from scanned claim forms and medical documents
- Legal Discovery: OCR scanned legal documents for full-text search and review
- Multi-Language Documents: Process documents in Hindi, French, German, etc. with language-specific models
- Form Digitization: Extract filled field values from scanned paper forms

Tesseract Configuration

Param | Default | Description
lang | eng | Language code(s), + separated
psm | 3 | Page segmentation mode (0–13)
oem | 3 | OCR engine mode (0=legacy, 1=LSTM, 3=default)
dpi | 200 | Rasterisation DPI (72–400)
char_whitelist | (none) | Restrict to specific characters

Example Usage

curl -X POST https://pdfapihub.com/api/v1/pdf/ocr/parse \
  -H "CLIENT-API-KEY: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://pdfapihub.com/sample-pdfinvoice-with-image.pdf",
    "pages": "1-3",
    "lang": "eng",
    "dpi": 300,
    "detail": "words"
  }'
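The defaults and ranges in the Tesseract Configuration table can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python. The helper validate_ocr_options is hypothetical, and whether the server enforces the same limits is not stated in this document. Tesseract itself defines engine modes 0 to 3, so the check accepts that range even though the table lists only 0, 1, and 3.

```python
def validate_ocr_options(lang="eng", psm=3, oem=3, dpi=200, char_whitelist=None):
    """Validate OCR options client-side and return them as a dict.

    Defaults and ranges follow the Tesseract Configuration table. This is
    a hypothetical helper, not part of the documented API.
    """
    # lang is one or more language codes joined by "+", e.g. "eng+fra".
    if not lang or not all(code.strip() for code in lang.split("+")):
        raise ValueError(f"invalid lang: {lang!r}")
    if psm not in range(0, 14):
        raise ValueError("psm must be between 0 and 13")
    # Tesseract defines engine modes 0-3; the table lists 0, 1, and 3.
    if oem not in range(0, 4):
        raise ValueError("oem must be between 0 and 3")
    if not 72 <= dpi <= 400:
        raise ValueError("dpi must be between 72 and 400")

    options = {"lang": lang, "psm": psm, "oem": oem, "dpi": dpi}
    if char_whitelist is not None:
        if not char_whitelist:
            raise ValueError("char_whitelist must not be empty when given")
        options["char_whitelist"] = char_whitelist
    return options


# Digits-only extraction (for example invoice amounts) at a higher DPI.
invoice_options = validate_ocr_options(dpi=300, char_whitelist="0123456789.")
```

The returned dict can be merged into the JSON body next to url or base64_pdf.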
