
Webfetch Md: Skill Tool

v1.1.0

Fetch any webpage and convert its main content into clean Markdown, preserving image links and resolving relative URLs.

by @shijianwen (ShiJianwen)·MIT-0
License
MIT-0
Last updated
2026/4/8
Security scan
VirusTotal: suspicious
OpenClaw: safe (high confidence)
The skill's files, dependencies, and runtime instructions align with its stated purpose (fetch a webpage and convert it to Markdown); there are no obvious mismatches or hidden endpoints, though it can fetch arbitrary URLs so use in a networked agent should be considered.
Assessment recommendations
This skill appears to do exactly what it says: fetch a provided URL and convert the main content to Markdown using cheerio and turndown. Before installing or enabling it for autonomous agents, consider: (1) runtime Node version — cheerio notes Node >= 20.18.1 in package metadata, so ensure compatibility; (2) network risk — the skill can fetch any URL you pass it, so do not let untrusted prompts cause the agent to fetch internal URLs or sensitive endpoints (SSRF/internal network exposure); (3)...
Detailed analysis
Purpose and capabilities
The name/description (fetch webpage → Markdown) matches the code and SKILL.md. index.js implements HTML fetching, content extraction, URL resolution, and turndown conversion; package.json lists cheerio and turndown as dependencies and node is required. The included CLI and OpenClaw tool wrappers call the same core function, which is coherent.
Instruction scope
SKILL.md instructs the agent to run the CLI/tool with a URL and the code only fetches and processes the provided URL; it does not read local files or environment secrets. However, because the skill will fetch arbitrary URLs provided at runtime, it can reach internal network endpoints or external sites — this is expected for the stated purpose but carries the usual SSRF / internal-network-access risk if run in a privileged network context.
Installation mechanism
There is no remote install script or unusual download URL. This is an instruction-and-code bundle relying on Node and standard npm packages (cheerio, turndown). package-lock.json points to packages on the npm registry (registry.npmjs.org), not personal servers or shortened URLs, so the install footprint is conventional.
Credential requirements
The skill requests no environment variables, credentials, or config paths. All required runtime inputs are provided as URL parameters, which is proportionate to the stated functionality.
Persistence and privileges
The skill is not marked always:true and does not attempt to modify other skills or system-wide configuration. It performs no autonomous privilege-escalating actions in the provided code.
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute, with no attribution required.

Runtime dependencies

No special dependencies

Versions

latest: v1.1.0 (2026/2/15)

Unified the CLI and OpenClaw tool entry points, improved error handling, and updated the documentation.


Install command

Official: npx clawhub@latest install webfetch-md
Mirror: npx clawhub@latest install webfetch-md --registry https://cn.clawhub-mirror.com

Skill documentation

Fetches any webpage and converts it to clean Markdown, preserving image links.

Usage

Calling as an OpenClaw tool

webfetch-md url="https://example.com"

CLI usage

# Basic usage (outputs JSON)
npx webfetch-md https://example.com

# Or use the --url flag
npx webfetch-md --url https://example.com

# Extract the Markdown content (with jq)
npx webfetch-md https://example.com | jq -r '.markdown'

# Save to a file
npx webfetch-md https://example.com | jq -r '.markdown' > article.md

Output format

The CLI and the OpenClaw tool both emit the same JSON format:

{
  "success": true,
  "title": "Article title",
  "markdown": "# Article title\n\nBody content...",
  "images": ["https://example.com/img1.png"],
  "imageCount": 1,
  "contentLength": 1523
}

Using as a module

const { fetchAsMarkdown } = require('./index');

// CommonJS has no top-level await, so wrap the call in an async function.
(async () => {
  const result = await fetchAsMarkdown('https://example.com');
  console.log(result.markdown);
})();

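Features

  • ✅ Fetches the HTML of any webpage
  • ✅ Extracts the main content (filters out navigation, ads, and similar elements)
  • ✅ Preserves image links (converted to ![alt](url) format)
  • ✅ Converts relative paths to absolute URLs automatically
  • ✅ Outputs clean Markdown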

Dependencies

  • turndown: HTML-to-Markdown conversion
  • cheerio: HTML parsing and extraction

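Technical implementation

Core pipeline

  • Page fetching: uses the fetch API to retrieve the HTML, sending a browser-like User-Agent
  • HTML parsing: uses cheerio to load and parse the HTML
  • Content extraction: identifies the main content area and filters out unrelated elements
  • URL handling: converts relative paths to absolute URLs
  • Markdown conversion: uses turndown to produce standard Markdown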
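
The URL-handling step can be illustrated with the URL class built into Node.js. This is a minimal sketch rather than the skill's actual code: the helper name resolveUrl and its fallback for unparsable input are assumptions made for illustration.

```javascript
// Hypothetical helper: resolve a possibly relative link against the page URL.
// Uses only the WHATWG URL class that ships with Node.js.
function resolveUrl(href, pageUrl) {
  try {
    // new URL(input, base) applies standard URL resolution rules;
    // inputs that are already absolute ignore the base.
    return new URL(href, pageUrl).href;
  } catch {
    // Assumption: leave unparsable values untouched rather than dropping them.
    return href;
  }
}

const page = 'https://example.com/blog/post.html';
console.log(resolveUrl('../img/a.png', page));                  // https://example.com/img/a.png
console.log(resolveUrl('/img/b.png', page));                    // https://example.com/img/b.png
console.log(resolveUrl('https://cdn.example.org/c.png', page)); // https://cdn.example.org/c.png
```

Resolving against the full page URL, not just the site origin, matters for links such as ../img/a.png, which depend on the directory of the page that contains them.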

Content extraction algorithm

The main content container is selected in priority order:
  • article tag
  • main tag
  • [role="main"] attribute
  • .post-content / .entry-content
  • .content / .post
  • #content / #main ID
  • falls back to the body tag
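
The priority order above amounts to a first-match search over a selector list. The sketch below shows that idea only; the helper name pickContainerSelector, the hasMatch callback, and the relative order inside each paired entry (for example .post-content before .entry-content) are assumptions, not details confirmed by the skill's source.

```javascript
// Selectors in the priority order listed above; 'body' is the final fallback.
const CONTAINER_SELECTORS = [
  'article',
  'main',
  '[role="main"]',
  '.post-content',
  '.entry-content',
  '.content',
  '.post',
  '#content',
  '#main',
];

// Hypothetical helper: hasMatch(selector) reports whether the document
// contains at least one element matching the selector.
function pickContainerSelector(hasMatch) {
  for (const selector of CONTAINER_SELECTORS) {
    if (hasMatch(selector)) return selector; // first hit wins
  }
  return 'body';
}

// Usage with a stand-in for a parsed page that has <main> and #content.
const present = new Set(['main', '#content']);
console.log(pickContainerSelector((s) => present.has(s))); // prints "main"
console.log(pickContainerSelector(() => false));           // prints "body"
```

With cheerio, the callback could be (selector) => $(selector).length > 0, where $ is the value returned by cheerio.load(html).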

Automatically filtered elements

  • Script and style tags
  • Navigation, headers, and footers
  • Sidebars and ad areas
  • Comment sections
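
Filtering of this kind is commonly done by removing elements that match a list of selectors before conversion. The sketch below assumes a cheerio-style root function; the selector list and the helper name removeNoise are illustrative guesses that cover the categories above, and the skill's real list may differ.

```javascript
// Assumed selector list covering the categories above; the real list may differ.
const NOISE_SELECTORS = [
  'script', 'style',                      // scripts and styles
  'nav', 'header', 'footer',              // navigation, page header, page footer
  'aside', '.sidebar', '.ad', '.ads',     // sidebars and ad areas
  '.comments', '#comments',               // comment sections
];

// `$` is expected to behave like a cheerio root: $(selector).remove()
// detaches every matching element from the document.
function removeNoise($) {
  for (const selector of NOISE_SELECTORS) {
    $(selector).remove();
  }
  return $;
}
```

With cheerio installed, usage would look like const $ = cheerio.load(html); removeNoise($); before handing the remaining HTML to turndown.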

Error handling

The tool returns a uniform JSON format whose success field indicates whether the operation succeeded:

{
  "success": false,
  "error": "Error message"
}
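
Callers that prefer exceptions over checking success by hand can wrap the result in a small helper. This is a sketch under stated assumptions: unwrapResult is a hypothetical name, the sample error text is invented for illustration, and the code relies only on the success and error fields shown above.

```javascript
// Hypothetical helper: turn the { success, error } envelope into either
// the payload or a thrown Error, so callers can use ordinary try/catch.
function unwrapResult(result) {
  if (!result || result.success !== true) {
    // Assumption: `error` is a human-readable string when success is false.
    const message = result && result.error ? result.error : 'Unknown webfetch-md error';
    throw new Error(message);
  }
  return result;
}

// Usage with literal objects shaped like the documented output.
const ok = unwrapResult({ success: true, title: 'Example', markdown: '# Example' });
console.log(ok.title); // prints "Example"

try {
  unwrapResult({ success: false, error: 'HTTP 404' }); // illustrative error text
} catch (err) {
  console.log(err.message); // prints "HTTP 404"
}
```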

Development notes

Project structure

webfetch-md/
├── index.js          # Core module
├── cli.js           # CLI and OpenClaw tool entry point
├── package.json     # Dependency configuration
├── test.js          # Test script
└── SKILL.md         # Skill documentation

Testing

# Run the tests
npm test

# Or test a URL directly
node test.js https://example.com

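Version history

  • v1.1.0 (current): unified the CLI and OpenClaw tool entry points, improved error handling
  • v1.0.1: basic functionality, with webpage fetching and Markdown conversion
  • v1.0.0: initial release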
Data source: ClawHub · Chinese localization: 龙虾技能库