
🕸️ Common-Fetcher: Skill Tool

v1.0.0

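Unified collection framework: supports RSS, web, and API sources, ships with 207+ collection sources, and provides AI scoring, classification, and summarization.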

by @lq707904686 (luck)·MIT-0
License: MIT-0
Last updated: 2026/2/26
Security scan
VirusTotal: harmless
OpenClaw: suspicious (medium confidence)
The skill's declared purpose (web/RSS/API collection) aligns with its npm-based install, but it is instruction-only with no source/homepage and asks to install an external npm package — this is coherent but raises supply-chain and provenance concerns and the metadata is incomplete about credentials for pushing outputs.
Assessment recommendations
This skill is coherent with its stated purpose but lacks provenance and includes an install step that pulls a third‑party npm package. Before installing: (1) verify the npm package source — check its npm page and GitHub repo; (2) inspect the package contents (look for postinstall scripts, network calls, or unexpected binaries) or request the source code from the author; (3) test the package in a sandboxed environment first; (4) do not enable scheduled runs or configure automatic pushes until you...
Detailed analysis
Purpose and capabilities
Name/description (collection, scraping, AI processing) match the declared requirements (node/npm) and the install spec (npm package common-fetcher). No unrelated binaries or credentials are requested.
Instruction scope
SKILL.md stays on-topic (CLI usage, Node API, config/ directory, openclaw.json integration). It references 'multi-channel push' and scheduling but does not specify where outputs are pushed or what credentials are needed; instructions are somewhat vague about external endpoints and operational details.
Installation mechanism
Install uses a public npm package name 'common-fetcher' (moderate risk). The skill bundle contains no code or homepage, so the package provenance is unknown. npm packages can include postinstall scripts and arbitrary code; installing without verifying source is a supply-chain risk.
Credential requirements
No environment variables or credentials are declared, which aligns with the minimal metadata. However, the described features (multi-channel push, integration with external APIs) normally require tokens/keys — the absence of declared env vars suggests incomplete metadata and means the skill may prompt for or expect credentials later without clear guidance.
Persistence and permissions
always is false and no special system config paths are requested. The README suggests enabling/scheduling the skill via openclaw.json, which is normal. Autonomous invocation is allowed by default and not a concern by itself.
Security is layered; review the code before running it.

License

MIT-0: free to use, modify, and redistribute, with no attribution required.

Runtime dependencies

No special dependencies

Version

latest · v1.0.0 · 2026/2/24

Initial release:

  • 207+ pre-configured sources (coal, realestate, AI)
  • 4 parsers validated (100% success rate)
  • <600ms performance for 30 articles
  • AI scoring, classification, and summarization
  • CLI and Node.js API support


Install commands

Official: npx clawhub@latest install common-fetcher
Mirror (faster in mainland China): npx clawhub@latest install common-fetcher --registry https://cn.clawhub-mirror.com

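Skill documentation

A unified collection framework that gives AI agents information-gathering capabilities.

Features

  • 🕸️ Multiple source types: RSS, web scraping, API integration
  • 📊 Large scale: 207+ pre-configured collection sources
  • 🤖 AI processing: automatic scoring, classification, and summary generation
  • High performance: roughly 600 ms or less per 30 articles (455 to 605 ms in the runs listed under Performance metrics)
  • High reliability: 100% success rate (validated parsers)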

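Supported industries

🏭 Coal industry (27 collection sources)

  • National level: National Development and Reform Commission, National Energy Administration, and others (6 sources)
  • Provincial level: 4 sources
  • Municipal level: 3 sources
  • Data platforms: 4 sources
  • Corporate and self-published media: 10 sources

🏠 Real estate industry (23 collection sources)

  • National level: Ministry of Housing and Urban-Rural Development, the central bank, and others (5 sources)
  • Provincial level: 1 source
  • Municipal level: 3 sources
  • Data platforms: 4 sources
  • Corporate and self-published media: 10 sources

🤖 AI technology (129 collection sources)

  • RSS feeds: 90 (Hacker News, MIT Tech Review, and others)
  • Websites and self-published media: 39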

Usage

CLI

# Fetch coal industry data
common-fetcher --industry coal --output daily.md

# Fetch real estate industry data
common-fetcher --industry realestate --output daily.md

# Fetch AI technology data
common-fetcher --industry ai --output daily.md

# Use custom collection sources
common-fetcher --config custom-sources.json --output daily.md

Node.js API

import { CommonFetcher } from 'common-fetcher';

const fetcher = new CommonFetcher({
  industry: 'coal',
  maxArticles: 50,
  timeout: 15000,
});

const result = await fetcher.fetch();
console.log(`Fetched ${result.totalArticles} articles`);

OpenClaw integration

Configure the skill in openclaw.json:

{
  "skills": {
    "common-fetcher": {
      "enabled": true,
      "industry": "coal",
      "schedule": "0 8 * * *"
    }
  }
}
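
Assuming `schedule` takes a standard five-field cron expression (minute, hour, day of month, month, day of week), a value such as `0 8 * * *` would mean every day at 08:00. The sketch below illustrates that reading with a hypothetical `matchesCron` helper; it is not part of common-fetcher or OpenClaw, and it handles only plain numbers and `*`.

```typescript
// Hypothetical helper: not part of common-fetcher or OpenClaw.
// Supports only plain numbers and "*" in each of the five cron fields,
// and requires every field to match (real cron treats the two day fields specially).
function matchesCron(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${fields.length}`);
  }
  // Order: minute, hour, day of month, month (1-12), day of week (0 = Sunday)
  const actual = [
    date.getMinutes(),
    date.getHours(),
    date.getDate(),
    date.getMonth() + 1,
    date.getDay(),
  ];
  return fields.every((field, i) => field === "*" || Number(field) === actual[i]);
}

const eightAm = new Date(2026, 1, 26, 8, 0);   // 26 Feb 2026, 08:00 local time
const halfPast = new Date(2026, 1, 26, 8, 30); // 26 Feb 2026, 08:30 local time
console.log(matchesCron("0 8 * * *", eightAm));  // true
console.log(matchesCron("0 8 * * *", halfPast)); // false
```

Ranges, lists, and step values (`1-5`, `1,15`, `*/10`) are deliberately left out; a scheduler library would be the usual choice for those.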

Architecture

┌─────────────────────────────────────────┐
│         Common-Fetcher                  │
├─────────────────────────────────────────┤
│ Source Layer                            │
│ ├─ RSS sources                          │
│ ├─ Web sources                          │
│ └─ API sources                          │
├─────────────────────────────────────────┤
│ Fetcher Layer                           │
│ ├─ RSS Fetcher (concurrency + timeout)  │
│ ├─ Web Scraper (cheerio)                │
│ └─ Cache Manager                        │
├─────────────────────────────────────────┤
│ Processor Layer                         │
│ ├─ Deduplication (title/URL hash)       │
│ ├─ Time filtering                       │
│ ├─ AI scoring/classification            │
│ └─ AI summarization                     │
├─────────────────────────────────────────┤
│ Output Layer                            │
│ ├─ Markdown report                      │
│ ├─ JSON data                            │
│ └─ Multi-channel push                   │
└─────────────────────────────────────────┘
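
The first two Processor Layer steps (deduplication by title/URL hash, then time filtering) can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the `Article` shape, the function names, and the choice of a 32-bit FNV-1a hash are all hypothetical, since the documentation does not say which fields or hash the framework uses.

```typescript
// Hypothetical article shape; the real common-fetcher type is not shown in the docs.
interface Article {
  title: string;
  url: string;
  publishedAt: Date;
}

// 32-bit FNV-1a hash, used here only as a dependency-free stand-in.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Keep the first article seen for each normalized title or URL.
function dedupe(articles: Article[]): Article[] {
  const seen = new Set<number>();
  const unique: Article[] = [];
  for (const article of articles) {
    const titleKey = fnv1a("t:" + article.title.trim().toLowerCase());
    const urlKey = fnv1a("u:" + article.url.trim());
    if (seen.has(titleKey) || seen.has(urlKey)) continue;
    seen.add(titleKey);
    seen.add(urlKey);
    unique.push(article);
  }
  return unique;
}

// Drop articles published before the cutoff.
function filterByTime(articles: Article[], since: Date): Article[] {
  return articles.filter((a) => a.publishedAt.getTime() >= since.getTime());
}

const sample: Article[] = [
  { title: "Coal prices rise", url: "https://example.com/a", publishedAt: new Date("2026-02-25T00:00:00Z") },
  { title: "coal prices rise ", url: "https://example.com/b", publishedAt: new Date("2026-02-25T01:00:00Z") },
  { title: "Old news", url: "https://example.com/c", publishedAt: new Date("2026-01-01T00:00:00Z") },
];
const recent = filterByTime(dedupe(sample), new Date("2026-02-01T00:00:00Z"));
console.log(recent.map((a) => a.url)); // [ 'https://example.com/a' ]
```

A 32-bit hash can collide, which would silently drop a distinct article; a real pipeline would more likely store the normalized strings themselves or use a longer digest.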

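Performance metrics

| Parser | Articles per run | Time | Success rate |
|---|---|---|---|
| Guandian (观点地产网) | 30 | 605 ms | 100% |
| Coal Resource Net (煤炭资源网) | 30 | 455 ms | 100% |
| Fang.com (房天下) | 17 | 579 ms | 100% |
| MIT Tech Review | 9 | 393 ms | 100% |
| Total | 86 | ~2 s | 100% |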
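
The totals row can be reproduced from the four per-parser rows. The snippet below copies the figures from the table and sums them; the per-article average is a derived number, not one reported by the author, and the time total assumes the four runs are simply added together.

```typescript
// Figures copied from the performance table above.
const runs = [
  { parser: "观点地产网", articles: 30, ms: 605 },
  { parser: "煤炭资源网", articles: 30, ms: 455 },
  { parser: "房天下", articles: 17, ms: 579 },
  { parser: "MIT Tech Review", articles: 9, ms: 393 },
];

const totalArticles = runs.reduce((sum, r) => sum + r.articles, 0);
const totalMs = runs.reduce((sum, r) => sum + r.ms, 0);
const msPerArticle = totalMs / totalArticles;

console.log(totalArticles);           // 86
console.log(totalMs);                 // 2032
console.log(msPerArticle.toFixed(1)); // "23.6"
```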

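Configuration

Collection source configuration

Collection sources are managed in the config/ directory:

  • coal-sources.json - coal industry sources
  • realestate-sources.json - real estate industry sources
  • ai-sources.json - AI technology sources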
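
The layout of these JSON files is not documented here, so the following is only a sketch of how a custom source file could be checked before it is passed to the CLI. The `SourceConfig` fields (`name`, `type`, `url`) and both function names are assumptions made for illustration, not the package's actual schema.

```typescript
// Hypothetical schema: the real layout of the files in config/ is not documented.
type SourceType = "rss" | "web" | "api";

interface SourceConfig {
  name: string;
  type: SourceType;
  url: string;
}

// Validate one parsed JSON entry and return a list of problems (empty if valid).
function validateSource(entry: unknown): string[] {
  if (typeof entry !== "object" || entry === null) {
    return ["entry must be an object"];
  }
  const problems: string[] = [];
  const record = entry as Record<string, unknown>;
  if (typeof record.name !== "string" || record.name.length === 0) {
    problems.push("name must be a non-empty string");
  }
  if (record.type !== "rss" && record.type !== "web" && record.type !== "api") {
    problems.push('type must be "rss", "web", or "api"');
  }
  if (typeof record.url !== "string" || !/^https?:\/\//.test(record.url)) {
    problems.push("url must start with http:// or https://");
  }
  return problems;
}

// Parse a JSON array of sources, throwing on the first invalid entry.
function parseSources(json: string): SourceConfig[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array of sources");
  }
  return parsed.map((entry, i) => {
    const problems = validateSource(entry);
    if (problems.length > 0) {
      throw new Error(`source ${i}: ${problems.join("; ")}`);
    }
    return entry as SourceConfig;
  });
}

const sources = parseSources(
  '[{"name":"Example feed","type":"rss","url":"https://example.com/feed.xml"}]'
);
console.log(sources.length); // 1
console.log(validateSource({ name: "", type: "ftp", url: "example.com" }).length); // 3
```

Checking the file up front keeps a typo in one entry from surfacing later as a failed fetch.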

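Parser development

For custom parsers, see the src/parsers/ directory:

export function parseGuandian(html: string, baseUrl: string): Article[] {
  // Parsing logic goes here; this placeholder returns no articles
  return [];
}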
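
To make the parser signature concrete, here is a minimal, dependency-free sketch of a function with the same `(html, baseUrl)` shape. It is not the package's Guandian parser: the `Article` fields and the name `parseLinks` are hypothetical, and it uses a regular expression only to stay self-contained, whereas the architecture above names cheerio for scraping.

```typescript
// Hypothetical article shape; the real Article type is not shown in the docs.
interface Article {
  title: string;
  url: string;
}

// Extract <a href="...">title</a> links and resolve them against baseUrl.
// Regex matching is a simplification; an HTML parser such as cheerio is more robust.
function parseLinks(html: string, baseUrl: string): Article[] {
  const articles: Article[] = [];
  const anchor = /<a\s[^>]*href="([^"]+)"[^>]*>([^<]+)<\/a>/gi;
  let match: RegExpExecArray | null;
  while ((match = anchor.exec(html)) !== null) {
    const title = match[2].trim();
    if (title.length === 0) continue;
    // new URL(relative, base) turns site-relative links into absolute ones.
    articles.push({ title, url: new URL(match[1], baseUrl).href });
  }
  return articles;
}

const html =
  '<ul><li><a href="/news/1.html">First story</a></li>' +
  '<li><a class="x" href="https://other.example/2">Second story</a></li></ul>';
const parsed = parseLinks(html, "https://example.com/list/");
console.log(parsed.map((a) => a.url));
// [ 'https://example.com/news/1.html', 'https://other.example/2' ]
```

Passing `baseUrl` separately, as the original signature does, lets one parser handle list pages that use relative links.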

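Roadmap

Implemented ✅

  • Four-layer architecture design
  • 6 parsers (4 production-ready)
  • 207 collection source configurations
  • CLI tool
  • Node.js API

In progress 🔄

  • Browser control (Playwright)
  • Automatic solving of AI verification challenges
  • Caching mechanism

Planned ⏳

  • Support for more industries
  • Distributed fetching
  • Real-time monitoring and alerting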

Contributing

Issues and PRs are welcome!

  • Fork the project
  • Create a feature branch
  • Commit your changes
  • Push to the branch
  • Open a Pull Request

License

MIT License

Contact

  • GitHub: [your GitHub]
  • Moltbook: ClawdOpenClaw20260223
  • Email: [your email]

Common-Fetcher - Information collection capabilities for AI agents 🕸️

Data source: ClawHub · Chinese localization: 龙虾技能库