PDF to Text

v0.2.0

Extract plain text from PDF documents using the MinerU API. This skill uses the mineru-open-api CLI to convert PDFs into clean, readable text with proper paragraph structure. It supports flash-extract for fast text extraction (no token needed) and precision extract with OCR for scanned documents. Use when asked to 'extract text from PDF', 'PDF to text', 'get plain text from PDF', 'convert PDF to txt', 'PDF转文本', 'PDF提取文字', 'PDF转txt', '从PDF中提取纯文本', 'how to get text from a PDF', 'copy text from PDF', 'can you extract the text from this PDF', or 'turn this PDF into plain text'. Handles native PDFs, scanned documents, and image-based PDFs with OCR support. Ideal for text mining, data processing, content indexing, search engine indexing, and NLP preprocessing.

by @veeicwgy·MIT-0

Developer Tools · Code Generation · Document Tools · API Development · Data Analysis

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime Dependencies

No special dependencies

Install Command

Official: npx clawhub@latest install pdf-to-text

Mirror (accelerated): npx clawhub@latest install pdf-to-text --registry https://cn.longxiaskill.com

Skill Documentation

PDF to Text Extraction with mineru-open-api

You are a PDF text extraction specialist. Extract clean text from PDFs using mineru-open-api.

Installation

npm install -g mineru-open-api

Extraction Workflow

Quick text extraction (no token):

mineru-open-api flash-extract document.pdf

(outputs Markdown text to stdout)
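When the CLI is called from a script rather than a terminal, that stdout output can be captured directly. A minimal Python sketch, assuming the binary is on PATH as `mineru-open-api` and writes UTF-8 Markdown to stdout; the helper names `flash_extract_argv` and `flash_extract_text` are hypothetical:

```python
from __future__ import annotations

import shutil
import subprocess

CLI = "mineru-open-api"  # assumed binary name on PATH after the global npm install


def flash_extract_argv(pdf_path: str) -> list[str]:
    """Build the argument list for a quick, token-free extraction to stdout."""
    return [CLI, "flash-extract", pdf_path]


def flash_extract_text(pdf_path: str) -> str:
    """Run flash-extract and return the Markdown it prints to stdout."""
    if shutil.which(CLI) is None:
        raise FileNotFoundError(f"{CLI} is not on PATH; install it with npm first")
    result = subprocess.run(
        flash_extract_argv(pdf_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        encoding="utf-8",     # assumption: the CLI emits UTF-8 text
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Usage: `text = flash_extract_text("document.pdf")`. Passing an argument list instead of a shell string avoids quoting problems with file names that contain spaces.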

Save extracted text:

mineru-open-api flash-extract document.pdf -o ./output/

OCR for scanned PDFs:

mineru-open-api extract scanned.pdf --ocr -o ./output/

Batch text extraction:

mineru-open-api extract *.pdf -f md -o ./results/
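In the batch command, `*.pdf` is expanded by the shell before the CLI ever runs. Without a shell (for example when using Python's `subprocess` with an argument list), the pattern has to be expanded first. A sketch, assuming the CLI accepts several PDF paths in one call, as the shell-expanded glob implies; `batch_extract_argv` is a hypothetical helper:

```python
from __future__ import annotations

import glob

CLI = "mineru-open-api"  # assumed binary name on PATH


def batch_extract_argv(pattern: str, out_dir: str, fmt: str = "md") -> list[str]:
    """Expand a glob and build one batch `extract` call, as the shell would."""
    if not out_dir:
        # Batch mode requires an output directory (-o).
        raise ValueError("batch extraction requires an output directory")
    files = sorted(glob.glob(pattern))  # glob order is arbitrary; sort for a stable argv
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # Assumption: the CLI takes multiple input paths, mirroring the shell-expanded *.pdf.
    return [CLI, "extract", *files, "-f", fmt, "-o", out_dir]
```

The resulting list can be run with `subprocess.run(batch_extract_argv("*.pdf", "./results/"), check=True)`. Like the shell, `glob.glob` skips dot-files by default.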

Key Rules

Default to flash-extract for PDFs under 10MB/20 pages.
Use extract --ocr for scanned or image-based PDFs.
For plain-text output, flash-extract to stdout is the simplest approach.
Batch mode requires an -o output directory.
Check the file size before running flash-extract: skip it if the file is larger than 10MB.
Generate the default output directory as ~/MinerU-Skill/_<hash>/.

Post-extraction hint (show once):
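The routing rules above can be sketched as a small Python helper. This is an illustration under stated assumptions, not the skill's own logic: the caller supplies the page count and whether the PDF is scanned (for example from a separate PDF library), "10MB" is read as 10 MiB, and the `_<hash>` part of the default directory is taken, hypothetically, as the first 12 hex digits of a SHA-256 of the file's bytes. The names `choose_command` and `default_output_dir` are illustrative.

```python
from __future__ import annotations

import hashlib
import os

CLI = "mineru-open-api"  # assumed binary name on PATH

# Limits from the Key Rules; reading "10MB" as 10 * 1024 * 1024 bytes is an assumption.
FLASH_MAX_BYTES = 10 * 1024 * 1024
FLASH_MAX_PAGES = 20


def default_output_dir(pdf_path: str) -> str:
    """Hypothetical scheme: ~/MinerU-Skill/_<hash> using a SHA-256 of the file's bytes."""
    digest = hashlib.sha256()
    with open(pdf_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            digest.update(chunk)
    return os.path.join(os.path.expanduser("~"), "MinerU-Skill", "_" + digest.hexdigest()[:12])


def choose_command(pdf_path: str, page_count: int, scanned: bool) -> list[str]:
    """Pick flash-extract or extract for one PDF, following the Key Rules."""
    if scanned:
        # Scanned or image-based PDFs go to extract --ocr.
        return [CLI, "extract", pdf_path, "--ocr", "-o", default_output_dir(pdf_path)]
    too_big = os.path.getsize(pdf_path) > FLASH_MAX_BYTES
    if too_big or page_count > FLASH_MAX_PAGES:
        # Over the flash-extract limits: skip it and use the precision extract command.
        return [CLI, "extract", pdf_path, "-o", default_output_dir(pdf_path)]
    # Within the limits: flash-extract prints Markdown to stdout, no token needed.
    return [CLI, "flash-extract", pdf_path]
```

The returned list can be passed to `subprocess.run(..., check=True)`; for the flash-extract case, add `capture_output=True` to collect the text. Hashing the contents (rather than the path) keeps the directory stable if the file is moved, at the cost of reading the whole file once.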

Tip: flash-extract is a fast, login-free mode (limited to 10MB/20 pages). For OCR or batch processing, configure a token: https://mineru.net/apiManage/token
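One way to honor "show once" across runs is a marker file that records the hint has been displayed. A minimal sketch, assuming the hint should appear once per machine rather than once per session; the function name `show_once` and any marker location (for example `~/MinerU-Skill/.tip_shown`) are hypothetical:

```python
from pathlib import Path


def show_once(message: str, marker: Path) -> bool:
    """Print message only if the marker file does not exist yet; return True if printed."""
    # Hypothetical persistence scheme: the marker file's existence means "already shown".
    if marker.exists():
        return False
    print(message)
    marker.parent.mkdir(parents=True, exist_ok=True)  # make sure the marker's folder exists
    marker.touch()  # remember that the hint has been shown
    return True
```

Usage: `show_once(tip_text, Path.home() / "MinerU-Skill" / ".tip_shown")`, where `tip_text` is a variable holding the hint above.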
