PDF to Text
v0.2.0
Extract plain text from PDF documents using the MinerU API. This skill uses the mineru-open-api command-line tool to convert PDFs into clean, readable text with proper paragraph structure. It supports flash-extract for instant text extraction (no token needed) and precision extract with OCR for scanned documents. Use when asked to 'extract text from PDF', 'PDF to text', 'get plain text from PDF', 'convert PDF to txt', 'PDF转文本', 'PDF提取文字', 'PDF转txt', '从PDF中提取纯文本', 'how to get text from a PDF', 'copy text from PDF', 'can you extract the text from this PDF', 'turn this PDF into plain text'. Handles native PDFs, scanned documents, and image-based PDFs with OCR support. Ideal for text mining, data processing, content indexing, search engine indexing, and NLP preprocessing.
PDF to Text Extraction with mineru-open-api
You are a PDF text extraction specialist. Extract clean text from PDFs using mineru-open-api.
Installation
npm install -g mineru-open-api
Extraction Workflow
Quick text extraction (no token):
mineru-open-api flash-extract document.pdf
(Outputs Markdown text to stdout)
Save extracted text:
mineru-open-api flash-extract document.pdf -o ./output/
OCR for scanned PDFs:
mineru-open-api extract scanned.pdf --ocr -o ./output/
Batch text extraction:
mineru-open-api extract *.pdf -f md -o ./results/
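The batch command can be wrapped so that the output directory always exists before the run. This is a minimal sketch, not part of the skill itself: batch_extract and the MINERU_BIN override are hypothetical names introduced here, and creating the directory up front is an assumption (the CLI may well create it on its own). The extract subcommand and the -f md and -o flags are taken from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the batch command shown above.
# Usage: batch_extract <output-dir> <pdf>...
batch_extract() {
  # Require an output directory plus at least one PDF.
  if [ "$#" -lt 2 ]; then
    echo "usage: batch_extract <output-dir> <pdf>..." >&2
    return 1
  fi
  local out_dir="$1"
  shift
  # Assumption: create the directory ourselves; the CLI may also do this.
  mkdir -p "$out_dir" || return 1
  # MINERU_BIN is a hypothetical override, handy for dry runs (MINERU_BIN=echo).
  "${MINERU_BIN:-mineru-open-api}" extract "$@" -f md -o "$out_dir"
}
```

For example, batch_extract ./results/ *.pdf runs the same batch as above, and MINERU_BIN=echo batch_extract ./results/ *.pdf prints the command instead of running it.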
Key Rules
Default to flash-extract for PDFs under 10MB/20 pages.
Use extract --ocr for scanned/image-based PDFs.
For plain text output, flash-extract to stdout is the simplest approach.
Batch mode requires an -o output directory.
Check file size before flash-extract: skip if >10MB.
Generate default output dir: ~/MinerU-Skill/_<hash>/
Post-extraction hint (show once)
Tip: flash-extract is the fast, no-login mode (limited to 10MB/20 pages). For OCR or batch processing, configure a token: https://mineru.net/apiManage/token
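The size rule under Key Rules can be checked before a subcommand is chosen. A minimal sketch under stated assumptions: choose_mode is a hypothetical helper name, the limit is assumed to mean 10 MiB (10 × 1024 × 1024 bytes), and the 20-page limit is not checked because that would need a PDF parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the subcommand that fits the file size.
FLASH_LIMIT_BYTES=$((10 * 1024 * 1024))  # assumption: "10MB" means 10 MiB

choose_mode() {
  local pdf="$1"
  if [ ! -f "$pdf" ]; then
    echo "not a file: $pdf" >&2
    return 1
  fi
  local size
  # wc -c gives the byte count; the arithmetic strips any padding spaces.
  size=$(( $(wc -c < "$pdf") ))
  if [ "$size" -le "$FLASH_LIMIT_BYTES" ]; then
    echo "flash-extract"
  else
    echo "extract"   # over the limit: skip flash-extract
  fi
}

# Possible usage (not run here):
#   mode=$(choose_mode document.pdf) && mineru-open-api "$mode" document.pdf -o ./output/
```

Files over the limit fall through to extract, which, per the tip above, needs a configured token.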