DOCX TO HTML CONVERTER
v16
Use this skill whenever the user has a DOCX file (.docx) and wants to convert, read, view, extract content from, or process it in any way, including summarization, displaying in a browser, extracting tables or lists, or feeding into AI pipelines. Always use this skill for any task involving .docx files, even if the request seems simple. Triggers include: 'convert docx', 'open word file', 'read word document', 'extract tables from docx', or any mention of a .docx filename.
This skill provides a straightforward method to convert Microsoft Word (.docx) documents into clean, semantic HTML, making them suitable for various web-based and AI-driven applications.
Compatibility
- Python 3 (for the conversion wrapper)
- Node.js with mammoth installed (core conversion engine)
To install Node.js dependencies, run once from the scripts/ directory:
npm install
Use Cases
- Browser-Based Viewing: Convert DOCX documents for display in web browsers without requiring Microsoft Word.
- AI-Ready Content: Prepare DOCX content for LLMs for tasks like summarization, Q&A, and semantic search.
- Web Integration: Integrate Word document content into web applications, CMS, or online editors.
- Data Extraction: Extract structured data (tables, lists, headings) from DOCX files for automated reporting and analysis.
- Search and Indexing: Enable full-text and vector search by converting DOCX content into easily indexable HTML.
Workflow
1. Locate DOCX File: Identify the path to the .docx file to convert.
2. Run Conversion Script: Execute the Python wrapper from the skill's scripts/ directory:
   python3 <skill-dir>/scripts/convert.py <input_path.docx> <output_path.html>
   Replace <skill-dir> with the actual path where this skill is installed.
3. Verify Output: Open the generated .html file in a browser and check:
   - Headings (<h1>, <h2>, etc.) appear at the correct hierarchy levels
   - Tables render with the expected rows and columns
   - Lists appear as bullet or numbered items (not plain text)
   - Bold, italic, and inline formatting are preserved
   - Images are visible (embedded as base64 by default)
4. Process HTML: Use the resulting HTML for further tasks like summarization, indexing, or display.
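The workflow above can be sketched in Python. This is a minimal sketch under stated assumptions, not the skill's own code: the helper names build_convert_command, count_structure, and convert_and_check are hypothetical, and it assumes convert.py takes the input and output paths as its two positional arguments, as in the command shown above. The structure check uses only the standard library's html.parser to count headings, tables, lists, and images, which gives a rough automated version of the manual checklist.

```python
import subprocess
import sys
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path


def build_convert_command(skill_dir, input_path, output_path):
    """Build the argv list for the Python wrapper (hypothetical helper)."""
    script = Path(skill_dir) / "scripts" / "convert.py"
    # sys.executable stands in for "python3" so the same interpreter is reused.
    return [sys.executable, str(script), str(input_path), str(output_path)]


class StructureCounter(HTMLParser):
    """Count the structural tags that the verification checklist looks for."""

    TRACKED = {"h1", "h2", "h3", "h4", "h5", "h6", "table", "ul", "ol", "img"}

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag in self.TRACKED:
            self.counts[tag] += 1


def count_structure(html_text):
    """Return a dict mapping each tracked tag name to its number of occurrences."""
    parser = StructureCounter()
    parser.feed(html_text)
    parser.close()
    return dict(parser.counts)


def convert_and_check(skill_dir, docx_path, html_path):
    """Run the wrapper, then report which structural elements the output contains."""
    command = build_convert_command(skill_dir, docx_path, html_path)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return count_structure(Path(html_path).read_text(encoding="utf-8"))
```

Calling convert_and_check with the skill directory, a .docx path, and an .html path returns the tag counts. A document known to contain tables that reports no table entry is worth opening in a browser for a closer look.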
Bundled Resources
- scripts/docx-converter.js: Core Node.js conversion logic using mammoth.js.
- scripts/convert.py: Python wrapper for invoking the Node.js converter.
- scripts/package.json: Node.js dependency manifest (includes mammoth).
Technical Details
The conversion leverages mammoth.js, which prioritizes semantic meaning over visual replication:
- Semantic Conversion: Document structure maps to proper HTML: headings become <h1>/<h2>, lists become <ul>/<ol>, etc.
- Basic Styling: Bold, italics, and common paragraph styles are preserved.
- Image Embedding: Images are extracted and embedded as base64 data URIs in the HTML output.
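Because images arrive as base64 data URIs, they can be recovered from the HTML output without going back to the original DOCX. The following is a minimal sketch using only the Python standard library. The name extract_embedded_images is hypothetical, and the sketch assumes each image appears as an img tag whose src has the form data:<mime-type>;base64,<payload>.

```python
import base64
from html.parser import HTMLParser


class DataUriImageCollector(HTMLParser):
    """Collect (mime_type, raw_bytes) pairs from <img> tags with base64 data URIs."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if not src.startswith("data:"):
            return  # externally linked image, nothing embedded to decode
        header, separator, payload = src[len("data:"):].partition(",")
        if not separator or not header.endswith(";base64"):
            return  # not a base64-encoded data URI
        mime_type = header[: -len(";base64")]
        self.images.append((mime_type, base64.b64decode(payload)))


def extract_embedded_images(html_text):
    """Return a list of (mime_type, raw_bytes) for every embedded image."""
    collector = DataUriImageCollector()
    collector.feed(html_text)
    collector.close()
    return collector.images
```

The decoded bytes can be written to disk or passed to an image-capable model separately, which keeps the HTML sent to a text pipeline small.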
Troubleshooting
Problem | Likely Cause | Fix
node: command not found | Node.js not installed | Install Node.js (v16+)
Cannot find module 'mammoth' | npm dependencies missing | Run npm install in scripts/
Empty or garbled output | Corrupted or password-protected DOCX | Try re-saving the file from Microsoft Word
Missing images | Large embedded images | Check mammoth.js image size limits in docx-converter.js
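The first three rows of the table can be checked before running a conversion. The sketch below is illustrative only: find_problems is a hypothetical helper, it assumes npm install places the dependency at scripts/node_modules/mammoth (the usual npm layout for a local install), and it relies on the fact that a readable .docx is a ZIP archive containing word/document.xml. A file that is not a ZIP at all is often corrupted or encrypted, though this check cannot tell the two apart.

```python
import shutil
import zipfile
from pathlib import Path


def find_problems(scripts_dir, docx_path, which=shutil.which):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Row 1: is the node executable on PATH?
    if which("node") is None:
        problems.append("node: command not found (install Node.js)")

    # Row 2: has `npm install` been run? (assumed install location)
    if not (Path(scripts_dir) / "node_modules" / "mammoth").is_dir():
        problems.append("mammoth not found (run npm install in scripts/)")

    # Row 3: does the input look like a readable DOCX?
    docx = Path(docx_path)
    if not docx.is_file():
        problems.append(f"input file not found: {docx}")
    elif not zipfile.is_zipfile(docx):
        problems.append("input is not a ZIP-based DOCX (corrupted or password-protected?)")
    else:
        with zipfile.ZipFile(docx) as archive:
            if "word/document.xml" not in archive.namelist():
                problems.append("archive has no word/document.xml (not a Word document?)")

    return problems
```

The which parameter exists only so the PATH lookup can be replaced in tests; in normal use it can be left at its default.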
Limitations
Advanced or highly specific styling from the original DOCX may not be perfectly replicated in the HTML output.
Features like tracked changes, comments, or complex layout elements may be simplified or omitted.