Logics-Parsing: Alibaba Document Parsing

v1.0.0

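Alibaba's intelligent document parsing tool: converts PDFs and images into structured HTML. Supports complex layouts, formula recognition, chemical structures, code blocks, flowcharts, sheet music, and more.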


by @smseow001 (SMS)

File processing


Security scan

VirusTotal: Suspicious

OpenClaw: Safe (high confidence)

This is an instruction-only skill that documents how to install and run Alibaba's Logics-Parsing tool; the described steps, downloads, and runtime commands match the skill's stated purpose and there are no unexplained credential or system requirements.

Assessment recommendations

This skill is coherent with its purpose, but note that following the instructions will clone and execute third-party code and download large models. Before running: (1) review the GitHub repository and license to ensure it's legitimate; (2) run installs in an isolated environment (container/VM) or dedicated conda env; (3) be prepared for large downloads and GPU requirements; (4) if using Hugging Face model downloads, you may need an HF token; provide only tokens you trust and avoid sharing broader credentials; and (5) if you cannot inspect the repo, treat executio...

Detailed analysis

✓ Purpose and capability

The name and description (PDF/image → structured HTML, formulas, chemistry, diagrams, etc.) are consistent with the SKILL.md content. The instructions (git clone the repo, create a Python environment, pip install requirements, download models, run inference scripts or the Python API) align with the stated capability.

ℹ Instruction scope

The SKILL.md instructs the agent/user to clone a GitHub repo and run model-download and inference scripts. These actions are within the tool's purpose but do involve network I/O and execution of third-party code from the cloned repository; the instructions do not request unrelated files or environment variables.

ℹ Install mechanism

There is no built-in install spec in the skill bundle; the SKILL.md recommends cloning https://github.com/alibaba/Logics-Parsing.git and using conda/pip to install dependencies and download models (ModelScope or Hugging Face). These are common installation steps, but they cause the agent to fetch and execute external code and large model files: a moderate operational risk, but expected for this purpose. The referenced hosts (GitHub, ModelScope, Hugging Face) are standard.

✓ Credential requirements

The skill declares no required environment variables, credentials, or config paths. The instructions may optionally require a Hugging Face token or ModelScope access if model repositories are private, but the SKILL.md does not demand unrelated secrets. No excessive or unexplained credential requests are present.

✓ Persistence and privileges

always is false and the skill is instruction-only with no code persisted inside the skill bundle. It does not request elevated or persistent agent privileges or modify other skills' configurations.

Security is layered; review the code before running it.

Runtime dependencies

No special dependencies

Install commands

Official: npx clawhub@latest install logics-parsing

Mirror (accelerated, available): npx clawhub@latest install logics-parsing --registry https://cn.longxiaskill.com


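Skill documentation

Logics-Parsing Document Parsing Tool

1. Core positioning

This skill integrates Alibaba's Logics-Parsing document parsing tool. Its core idea:

End-to-End Document Parsing: structured results are produced directly from document images, with no complex pipeline required.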

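2. Version comparison

Dimension	v1	v2 (recommended)
Release	2025-09	2026-02
Performance	Baseline SOTA	Leading across the board
LogicsDocBench	Baseline	82.16
OmniDocBench	Baseline	93.23
Parsing-2.0	❌ Not supported	✅ Supported
Structured content	Formulas/chemistry	+ flowcharts/sheet music/code

3. Core capabilities

3.1 Supported content types

Type	Output format	Notes
Text paragraphs	HTML	Automatically identifies headings/headers/footers
Tables	HTML Table	Merges cross-page tables
Scientific formulas	LaTeX / MathML	Accurate recognition of complex formulas
Chemical structures	SMILES format	Standardized molecular formulas
Flowcharts	Mermaid syntax	New in v2
Sheet music	ABC Notation	New in v2
Code blocks	Syntax-highlighted code	New in v2
Handwritten content	Separately labeled	Distinguishes printed from handwritten text

3.2 Output structure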

Each element contains:

category: element type (paragraph/table/formula/figure, etc.)
bbox: bounding box coordinates
text: OCR-recognized text

4. Benchmark performance

4.1 LogicsDocBench (self-built benchmark)

Model	Overall score
Logics-Parsing-v2	82.16 ✅
GPT-5	46.0
Gemini 2.5 pro	26.0
Qwen2.5VL-72B	34.9
SmolDocling	92.7

4.2 OmniDocBench-v1.5 (public benchmark)

Model	Overall score
Logics-Parsing-v2	93.23 ✅
GPT-5	46.0
Gemini 2.5 pro	46.0
Qwen2VL-72B	35.9
Doubao-1.6	31.7

5. Installation

5.1 Basic installation (v2 recommended)

# 1. Clone the repository
git clone https://github.com/alibaba/Logics-Parsing.git
cd Logics-Parsing

# 2. Create the environment (Python 3.10)
conda create -n logics-parsing python=3.10
conda activate logics-parsing

# 3. Install dependencies
pip install -r requirements.txt

# 4. Download the model (ModelScope)
pip install modelscope
python download_model_v2.py -t modelscope

# Or from Hugging Face
pip install huggingface_hub
python download_model_v2.py -t huggingface

5.2 Quick installation (v1 only)

conda create -n logics-parsing python=3.10
conda activate logics-parsing
pip install -r requirements.txt

# Download the model
python download_model.py -t modelscope

6. Quick start

6.1 v2 inference command

python3 inference_v2.py \
  --image_path PATH_TO_INPUT_IMG \
  --output_path PATH_TO_OUTPUT \
  --model_path PATH_TO_MODEL

6.2 v1 inference command

python3 inference.py \
  --image_path PATH_TO_INPUT_IMG \
  --output_path PATH_TO_OUTPUT \
  --model_path PATH_TO_MODEL
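The v1 and v2 commands differ only in the script name, so one small helper can assemble either. The following is a minimal sketch, assuming the script names and the three flags shown in sections 6.1 and 6.2; the helper `build_inference_command` and the example paths are hypothetical and not part of Logics-Parsing.

```python
import shlex
import sys


def build_inference_command(image_path, output_path, model_path, version=2):
    """Return the argv list for the v1 or v2 inference script (hypothetical helper)."""
    if version not in (1, 2):
        raise ValueError("version must be 1 or 2")
    # Script names are taken from sections 6.1 (v2) and 6.2 (v1).
    script = "inference_v2.py" if version == 2 else "inference.py"
    return [
        sys.executable, script,
        "--image_path", str(image_path),
        "--output_path", str(output_path),
        "--model_path", str(model_path),
    ]


if __name__ == "__main__":
    # Placeholder paths, used for illustration only.
    cmd = build_inference_command("page.png", "output", "model")
    # Print the command for inspection. To run it, call
    # subprocess.run(cmd, check=True) from inside the cloned repository.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell quoting problems when paths contain spaces.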

6.3 Python API

from logics_parsing import LogicsParser

# Initialize the parser
parser = LogicsParser(model_path="path/to/model")

# Parse a document
result = parser.parse("document.jpg")

# Print the HTML
print(result.html)

# Print the structured JSON
print(result.to_json())

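7. Application scenarios

7.1 Academic document processing

Scenario	Capability
Paper PDF parsing	Extracts formulas/tables/references
Chemistry papers	Molecular structures in SMILES format
Math lecture notes	Accurate LaTeX formula extraction
Textbooks	Handles complex layouts (multi-column/cross-page)

7.2 Business document processing

Scenario	Capability
Contract parsing	Structures clauses and tables
Financial statements	Numeric table extraction
Invoice recognition	Form field extraction
Newspaper clippings	Handles complex typesetting

7.3 Parsing-2.0 scenarios (new in v2)

Scenario	Output format
Flowcharts	Mermaid code
Sheet music	ABC Notation
Code blocks	Syntax-highlighted code
Pseudocode	Structured pseudocode

8. Output example

8.1 Input

[Image of an academic paper with a complex layout, including multi-column text, cross-page tables, and chemical structure diagrams]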

8.2 Structured output (HTML)

We introduce a new document parsing model...

E = mc^2

CC(=O)OC(=O)C

Method	Score
Logics-Parsing	82.16
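Because tables are emitted as HTML tables (section 3.1), they can be read back with the Python standard library. The following is a minimal sketch, assuming the output uses ordinary `<table>`, `<tr>`, `<th>`, and `<td>` markup without nested tables; the class `TableExtractor`, the function `extract_tables`, and the sample string are illustrative and are not actual tool output.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a document as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished and in-progress tables
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def extract_tables(html):
    """Return all tables found in `html` as nested lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Illustrative markup only; real output may carry extra attributes.
sample = (
    "<table><tr><th>Method</th><th>Score</th></tr>"
    "<tr><td>Logics-Parsing</td><td>82.16</td></tr></table>"
)
print(extract_tables(sample))
# prints [[['Method', 'Score'], ['Logics-Parsing', '82.16']]]
```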

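9. Related skills

This skill	Related skill	Relationship
Logics-Parsing	ai-research-tools	Paper parsing + research automation
Logics-Parsing	browser-use	Web content scraping + parsing
Logics-Parsing	obsidian-handbook	Stores parsing results in Obsidian
Logics-Parsing	math-theory-notes	Math formula recognition

10. FAQ

Problem	Solution
Slow model download	Use ModelScope (recommended in mainland China)
Insufficient GPU memory	Reduce the image_size parameter
Garbled OCR output	Check the font configuration
Inaccurate table recognition	Use v2, which performs better

11. Notes

⚠️ Notes: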

Python 3.10+ required
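GPU required (8 GB+ of VRAM recommended)
Model files are large (~2 GB); downloading them requires network access
Some features require additional font support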
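Two of the notes above can be checked before installation using only the standard library. The following is a minimal sketch, assuming the 3.10 minimum and the approximate 2 GB model size stated above; the function names are hypothetical, and the GPU and font requirements are not checked because doing so depends on libraries this document does not specify.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)             # "Python 3.10+ required"
MODEL_SIZE_BYTES = 2 * 1024**3   # "~2 GB" from the notes, treated as a rough lower bound


def check_python(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def check_disk(path=".", needed=MODEL_SIZE_BYTES):
    """Return True when `path` has at least `needed` bytes of free space."""
    return shutil.disk_usage(path).free >= needed


if __name__ == "__main__":
    print("Python >= 3.10:", check_python())
    print("Room for the model download:", check_disk())
```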

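12. Usage

Trigger scenarios

User says "Parse this PDF" → invoke Logics-Parsing v2
User says "Extract the formulas from this paper" → invoke Logics-Parsing
User says "Recognize this chemical structure" → SMILES output
User says "Convert this PDF to HTML" → structured HTML output
User says "Parse this sheet music" → v2 Parsing-2.0 feature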

Combined use

User: "Help me extract the key formulas and tables from this paper"
→ Parse with Logics-Parsing v2
→ Extract formulas (LaTeX) + tables (HTML)
→ Store in Obsidian or a knowledge base
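The extraction step in this workflow amounts to filtering parsed elements by category. The following is a minimal sketch, assuming the structured result is a JSON list of objects with the `category`, `bbox`, and `text` fields described in section 3.2; the exact JSON shape is not shown in this document, so the sample data and the function `select_elements` are illustrative.

```python
import json


def select_elements(elements, wanted=("formula", "table")):
    """Group elements by category, keeping only the wanted categories."""
    grouped = {category: [] for category in wanted}
    for element in elements:
        category = element.get("category")
        if category in grouped:
            grouped[category].append(element)
    return grouped


# Made-up sample in the assumed shape; bbox values are placeholders.
sample_json = """
[
  {"category": "paragraph", "bbox": [0, 0, 100, 20], "text": "We introduce ..."},
  {"category": "formula", "bbox": [0, 30, 100, 50], "text": "E = mc^2"},
  {"category": "table", "bbox": [0, 60, 100, 120], "text": "<table>...</table>"}
]
"""

picked = select_elements(json.loads(sample_json))
for category, items in picked.items():
    print(category, [item["text"] for item in items])
```

The grouped result can then be written to whatever store is in use, such as notes in Obsidian.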

This skill consolidates the complete installation and usage guide for Alibaba's Logics-Parsing document parsing tool.
