📦 Data

v1.0.3

Cleans and deduplicates multi-format data using AI field detection, standardizes formats, merges multiple sources, and outputs to Excel, CSV, or Feishu Bitable.

by @billjamno58 (YK-Global) · MIT
License
MIT
Last updated
2026/4/24
Security scan
VirusTotal: benign
OpenClaw: suspicious (medium confidence)
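The package's declared metadata and its runtime instructions disagree on the required credentials and on the Feishu integration. The skill calls external services (billing and AI), and SKILL.md is vague about which dataset contents are sent. Review before installing or supplying keys.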
Assessment recommendations
This skill includes runnable Python code that will call external services: a billing endpoint (skillpay.me) and AI APIs (OPENAI_API_BASE) for field detection/classification. Before installing or providing credentials:
- Verify the publisher and source: registry metadata shows "Source: unknown" and no homepage. Prefer packages with a known maintainer and repository.
- Ask the author to explain the credential gap: the registry shows no required env vars but SKILL.md requires OPENAI_API_KEY, SKILL_BILLING_API_KEY and FEISHU_USER_ID. Also ask how Feishu writ...
Detailed analysis
Purpose and capabilities
The skill advertises Feishu Bitable write-back and AI-driven features, which plausibly require external credentials, but the registry metadata at the top lists "Required env vars: none" and "Primary credential: none", which is inconsistent with SKILL.md and the included code. SKILL.md lists OPENAI_API_KEY, SKILL_BILLING_API_KEY and FEISHU_USER_ID; the code implements billing (skillpay.me) and AI field identification/classification, so those environment variables make sense. Feishu write-back, however, would normally require Feishu API credentials (app token/client secret), and none are declared. This mismatch between claimed metadata and actual requirements is a red flag.
Instruction scope
The runtime invokes scripts/main.py, which orchestrates parsing, cleaning, AI classification, billing, reporting and Feishu output. SKILL.md and the code indicate that data will be sent to AI APIs (OPENAI_API_BASE / OPENAI_API_KEY) for field identification/classification, but the doc does not explicitly describe which parts of the user's dataset are transmitted or how sensitive fields are handled. Billing transmits the FEISHU_USER_ID to skillpay.me. The instructions grant broad discretion (AI auto-detection, semantic inference) that could cause sensitive PII to be sent to external AI endpoints; SKILL.md neither documents this clearly nor asks for consent.
Installation mechanism
There is no external install spec and there are no downloads; the skill runs local Python scripts bundled in the package. There is no evidence of fetching code from untrusted URLs or executing remote installers. That reduces supply-chain risk compared to arbitrary downloads, but the included code will execute locally and may make outbound network calls.
Credential requirements
SKILL.md requests multiple sensitive environment variables: OPENAI_API_KEY / OPENAI_API_BASE (AI inference), SKILL_BILLING_API_KEY (SkillPay billing), SKILL_BILLING_SKILL_ID and FEISHU_USER_ID. These map to the skill's features, but they are not listed in the registry metadata (incoherent). Additionally, Feishu write-back capability would normally require Feishu API credentials (app tokens / access tokens), but no such variables are declared; either the feature is incomplete or it expects implicit platform-level Feishu credentials (not disclosed). billing.py also implements a "dev mode" where a missing billing API key or a network error causes the code to "fail open" and allow operation without charge; that behavior should be explicit to users because it affects billing correctness and abuse risk.
Persistence and permissions
The skill does not request always:true and does not claim to modify other skills or global agent settings. It runs as a subprocess (scripts/main.py). Autonomous invocation is allowed (platform default) but not combined with excessive declared privileges here; that's acceptable, but be aware an autonomously-invoked skill can still exfiltrate data if given API keys.
Security is layered; review the code before running it.

License

MIT

Free to use, modify, and redistribute, provided the copyright notice is retained.

Runtime dependencies

No special dependencies

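Versions

latest · v1.0.3 · 2026/4/24

- No code or documentation changes were detected in this release.
- The version number was bumped to 1.0.3 with no file or feature changes.
- Features, usage, and billing are unchanged.
- Existing functionality and documentation remain the same.

Benign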

Install command

Official: `npx clawhub@latest install data-cleaner-ai`
Mirror (accelerated): `npx clawhub@latest install data-cleaner-ai --registry https://cn.longxiaskill.com`

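Skill documentation

Upload messy data → get clean, structured results. Supports multi-format parsing, AI field identification, smart deduplication/filling/formatting, multi-source merging, and native Feishu output (Bitable plus a quality-report document).

Typical scenarios: e-commerce order cleanup, CRM customer data cleaning, bank reconciliation, staff roster tidying, and merging data from multiple systems.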

---

Capabilities

F1 · Multi-format parsing

  • Excel (.xlsx / .xls)
  • CSV / TSV
  • JSON (semi-structured)
  • Text pasted from the clipboard
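
As a rough illustration of format dispatch, the sketch below parses CSV, TSV, and JSON text into row dictionaries using only the standard library. The names `parse_text` and `parse_file` are hypothetical and are not taken from the skill's `parser.py`; Excel input is left out because it needs a third-party reader.

```python
import csv
import io
import json
from pathlib import Path


def parse_text(text: str, fmt: str) -> list[dict]:
    """Parse CSV, TSV, or JSON text into a list of row dicts."""
    fmt = fmt.lower()
    if fmt in ("csv", "tsv"):
        delimiter = "\t" if fmt == "tsv" else ","
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if fmt == "json":
        data = json.loads(text)
        # Accept either a list of records or a single record.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported format: {fmt}")


def parse_file(path: str) -> list[dict]:
    """Dispatch on the file extension (hypothetical helper, text formats only)."""
    p = Path(path)
    return parse_text(p.read_text(encoding="utf-8"), p.suffix.lstrip("."))


rows = parse_text("name,phone\nJohn,13800138000", "csv")
print(rows)
```

Pasted clipboard text can go through `parse_text` directly once the caller states its format, much like the `-t ... -f csv` form of the CLI.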

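F2 · Smart field identification

  • AI auto-detection of name, mobile number, email, address, amount, date, SKU, order number, national ID number, gender, and similar fields
  • User-defined field mappings override the detected ones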
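
The skill's detector is AI-based and its prompts are not shown here, so the following is only a regex stand-in that illustrates the shape of the result: one label per column, with user overrides applied last. The patterns, the function names, and the `overrides` parameter are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; the real detector calls an AI model instead.
# Order matters: an 11-digit phone number would also match "amount".
PATTERNS = {
    "phone": re.compile(r"^1\d{10}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "date": re.compile(r"^\d{4}[-/]\d{1,2}[-/]\d{1,2}$"),
    "amount": re.compile(r"^-?\d+(\.\d+)?$"),
}


def guess_field(values) -> str:
    """Return the first label whose pattern matches every non-empty value."""
    samples = [v.strip() for v in values if v and v.strip()]
    if not samples:
        return "unknown"
    for label, pattern in PATTERNS.items():
        if all(pattern.match(v) for v in samples):
            return label
    return "unknown"


def identify_fields(rows, overrides=None) -> dict:
    """Guess a label per column, then let user-supplied overrides win."""
    columns = list(rows[0].keys()) if rows else []
    mapping = {c: guess_field([r.get(c, "") for r in rows]) for c in columns}
    mapping.update(overrides or {})
    return mapping


rows = [
    {"col_a": "13800138000", "col_b": "a@example.com", "col_c": "2024-01-05", "col_d": "John"},
    {"col_a": "13912345678", "col_b": "b@example.com", "col_c": "2024/2/7", "col_d": "Mary"},
]
mapping = identify_fields(rows, overrides={"col_d": "name"})
print(mapping)
```

A pre-filter like this could also limit how many raw values are sent to an external model, although whether the skill does so is not stated.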

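F3 · Data cleaning

  • Deduplication: exact match plus fuzzy deduplication (FuzzyWuzzy, 88% threshold)
  • Missing-value filling: mean / mode / semantic inference / leave blank
  • Format standardization
    - Phone → 1xx-xxxx-xxxx
    - Date → YYYY-MM-DD
    - Amount → two decimal places
    - Address → standardized province/city/district/street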
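
The normalization targets and the 88% threshold above can be sketched with the standard library alone. This is not the skill's `cleaner.py`: `difflib.SequenceMatcher` stands in for FuzzyWuzzy (its scores are similar but not guaranteed identical), half-up rounding is an assumption, and address standardization is omitted.

```python
import re
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP
from difflib import SequenceMatcher


def normalize_phone(raw: str) -> str:
    """Format an 11-digit mobile number as 1xx-xxxx-xxxx; otherwise return it unchanged."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        return f"{digits[:3]}-{digits[3:7]}-{digits[7:]}"
    return raw


def normalize_date(raw: str) -> str:
    """Try a few common layouts and emit YYYY-MM-DD; otherwise return it unchanged."""
    for layout in ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d", "%Y%m%d"):
        try:
            return datetime.strptime(raw.strip(), layout).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return raw


def normalize_amount(raw: str) -> str:
    """Round to two decimals (half-up assumed); raises on non-numeric input."""
    cleaned = raw.replace(",", "").strip()
    return str(Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def similarity(a: str, b: str) -> float:
    """Similarity on a 0-100 scale, a difflib stand-in for a FuzzyWuzzy ratio."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def fuzzy_dedupe(values, threshold=88.0):
    """Keep a value unless it is at least `threshold` similar to one already kept."""
    kept = []
    for v in values:
        if not any(similarity(v.lower(), k.lower()) >= threshold for k in kept):
            kept.append(v)
    return kept


print(normalize_phone("13800138000"))
print(normalize_date("2024/1/5"))
print(normalize_amount("1,234.5"))
print(fuzzy_dedupe(["Zhang San Trading Co", "Zhang San Trading Co.", "Li Si Logistics"]))
```

The pairwise comparison in `fuzzy_dedupe` is quadratic in the number of kept values, which is fine for a sketch but would need blocking or indexing on large files.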

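F4 · Classification / tagging (PRO)

  • 8 built-in business rules (high-value customer, dormant user, VIP, enterprise customer, and others)
  • Custom JSON rules are supported
  • AI auto-tagging (requires PRO and an AI API key)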
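
The JSON rule format is not documented here, so the sketch below invents a minimal one (`tag`, `field`, `op`, `value`) purely to show how rule-based tagging might work. The schema, the field names, and `apply_rules` are hypothetical and may not match what `classifier.py` accepts.

```python
import json
import operator

# Comparison operators allowed in the hypothetical rule schema.
OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}

# Hypothetical rules; thresholds are illustrative, not the built-in ones.
RULES_JSON = """
[
  {"tag": "high_value", "field": "total_spend", "op": ">=", "value": 10000},
  {"tag": "dormant", "field": "days_since_last_order", "op": ">", "value": 180}
]
"""


def apply_rules(row: dict, rules: list) -> list:
    """Return the tag of every rule the row satisfies; skip rules whose field is missing."""
    tags = []
    for rule in rules:
        value = row.get(rule["field"])
        if value is not None and OPS[rule["op"]](value, rule["value"]):
            tags.append(rule["tag"])
    return tags


rules = json.loads(RULES_JSON)
customer = {"name": "John", "total_spend": 12500, "days_since_last_order": 200}
tags = apply_rules(customer, rules)
print(tags)
```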

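F5 · Multi-source merge (PRO)

  • Joins records across files on a key field
  • Falls back to fuzzy merging (FuzzyWuzzy) when no exact key exists
  • Conflict resolution: source order or most recent timestamp wins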
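
A minimal sketch of the two conflict policies, assuming rows are dictionaries and that a timestamp column named `updated_at` holds ISO-formatted strings. The function name, the `prefer` parameter, and the column name are illustrative assumptions; fuzzy key matching is not covered.

```python
def merge_sources(sources, key, prefer="first"):
    """Merge lists of row dicts on `key`.

    prefer="first":  earlier sources win conflicts (source order).
    prefer="latest": the row with the greater `updated_at` wins conflicts.
    Empty fields are always filled from whichever source has a value.
    """
    merged = {}
    for rows in sources:
        for row in rows:
            k = row[key]
            if k not in merged:
                merged[k] = dict(row)  # copy so the input rows stay untouched
                continue
            current = merged[k]
            # ISO date strings compare correctly as plain strings.
            newer = row.get("updated_at", "") > current.get("updated_at", "")
            for field, value in row.items():
                if value in ("", None):
                    continue
                if current.get(field) in ("", None):
                    current[field] = value  # fill a gap
                elif prefer == "latest" and newer:
                    current[field] = value  # newer row overrides
    return list(merged.values())


customers = [{"phone": "13800138000", "name": "John", "city": "", "updated_at": "2024-01-01"}]
orders = [
    {"phone": "13800138000", "name": "Johnny", "city": "Shanghai", "updated_at": "2024-03-01"},
    {"phone": "13900000000", "name": "Mary", "city": "Beijing", "updated_at": "2024-02-01"},
]
by_order = merge_sources([customers, orders], key="phone")
by_time = merge_sources([customers, orders], key="phone", prefer="latest")
print(by_order[0]["name"], by_time[0]["name"])
```

With source order, the first file's `name` survives and only the blank `city` is filled; with latest-wins, the March row replaces the January values.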

F6 · Native Feishu output

  • Excel / CSV export
  • Write-back to Feishu Bitable
  • Data quality report generated automatically as a Feishu document (Markdown)

---

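Feature matrix

| Feature | FREE | PRO |
|------|:----:|:---:|
| Multi-format parsing | ✅ | ✅ |
| Basic deduplication | ✅ | ✅ |
| Smart filling | ❌ | ✅ |
| Format standardization | ❌ | ✅ |
| Fuzzy deduplication | ❌ | ✅ |
| Multi-source merge | ❌ | ✅ |
| AI classification | ❌ | ✅ |
| Data quality report | ❌ | ✅ |
| Feishu Bitable output | ❌ | ✅ |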
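
One way to enforce a matrix like this in code is a per-tier feature set checked before each step. The feature keys and `require_feature` below are hypothetical; the skill's `tier_limits.py` may be organized differently.

```python
# Hypothetical feature keys mirroring the rows of the feature matrix.
FREE_FEATURES = {"parse", "basic_dedup"}
PRO_FEATURES = FREE_FEATURES | {
    "smart_fill", "standardize", "fuzzy_dedup", "merge",
    "ai_classify", "quality_report", "bitable_output",
}
TIER_FEATURES = {"FREE": FREE_FEATURES, "PRO": PRO_FEATURES}


def require_feature(tier: str, feature: str) -> None:
    """Raise PermissionError if the tier does not include the feature."""
    if feature not in TIER_FEATURES.get(tier.upper(), set()):
        raise PermissionError(f"{feature!r} is not available on the {tier} tier")


require_feature("FREE", "basic_dedup")  # allowed, returns None
try:
    require_feature("FREE", "merge")
except PermissionError as exc:
    print(exc)
```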

---

Billing

Pay per run (no monthly fee):

| Tier | Price per run |
|------|---------|
| FREE | $0.00 USDT |
| PRO | $0.01 USDT |

Each cleaning or merge run counts as one billable call.
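
For budgeting, the per-run prices translate directly into a total. The helper below is a hypothetical convenience that is not part of the skill; it uses `Decimal` so that cent-level amounts do not pick up floating-point error.

```python
from decimal import Decimal

# Per-run prices in USDT, taken from the pricing table.
PRICE_PER_RUN = {"FREE": Decimal("0.00"), "PRO": Decimal("0.01")}


def estimate_cost(tier: str, runs: int) -> Decimal:
    """Total USDT for `runs` cleaning or merge runs on the given tier."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return PRICE_PER_RUN[tier.upper()] * runs


print(estimate_cost("PRO", 250))  # 2.50
```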

---

Usage

Feishu triggers

```
data cleaning
deduplication
spreadsheet cleanup
CRM data cleanup
Excel cleaning
```

CLI

```bash
python scripts/main.py clean -i data.xlsx -o cleaned.xlsx
python scripts/main.py clean -t "name,phone\nJohn,13800138000" -f csv -o cleaned.csv
python scripts/main.py merge --sources customers.xlsx orders.csv --on phone -o merged.xlsx
```

Python API

```python
from main import run_clean_pipeline

result = run_clean_pipeline(
    sources=["orders.xlsx"],
    output_format="xlsx",
    output_path="/tmp/cleaned.xlsx",
    dedup_strategy="auto",
    fill_strategy="auto",
    classify=True,
    ai_model="deepseek",
    generate_report=True,
)
```

---

Directory structure

```
data-cleaner-ai/
├── SKILL.md
└── scripts/
    ├── main.py              # Entry point: run_clean_pipeline / run_merge_pipeline
    ├── parser.py            # F1: multi-format parsing
    ├── field_identifier.py  # F2: AI field identification
    ├── cleaner.py           # F3: cleaning engine
    ├── classifier.py        # F4: classification / tagging
    ├── merger.py            # F5: multi-source merge
    ├── reporter.py          # F6: quality report generation
    ├── output.py            # F6: output (Excel/CSV/Bitable/Feishu document)
    ├── tier_limits.py       # Tier permission control
    └── billing.py           # SkillPay billing integration
```

---

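Billing

This skill uses SkillPay (skillpay.me) for pay-per-run billing.

  • Fee: $0.0100 USDT per execution (all paid tiers)
  • External API: https://skillpay.me/api/v1/billing
  • Data transmitted: user identifier (the FEISHU_USER_ID environment variable)

Billing is triggered at the start of each cleaning or merge run; if the balance is insufficient, a `payment_url` is returned for top-up.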
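
The charge-then-run order can be sketched without touching the network by injecting the charge step as a callable. Everything here is an assumption made for illustration: the `{"ok": ..., "payment_url": ...}` response shape, the function names, and the placeholder URL are not taken from the SkillPay API or from `billing.py`.

```python
from typing import Callable


def run_with_billing(user_id: str,
                     charge: Callable[[str], dict],
                     pipeline: Callable[[], dict]) -> dict:
    """Charge first, then run the pipeline only if the charge succeeded."""
    receipt = charge(user_id)
    if not receipt.get("ok"):
        # Insufficient balance: surface the top-up link instead of running.
        return {"status": "payment_required", "payment_url": receipt.get("payment_url")}
    return {"status": "done", "result": pipeline()}


# Stand-in charge functions; a real one would call the billing endpoint.
def charge_ok(user_id: str) -> dict:
    return {"ok": True}


def charge_insufficient(user_id: str) -> dict:
    return {"ok": False, "payment_url": "https://example.invalid/top-up"}


def fake_pipeline() -> dict:
    return {"rows_cleaned": 3}


print(run_with_billing("ou_demo", charge_ok, fake_pipeline))
print(run_with_billing("ou_demo", charge_insufficient, fake_pipeline))
```

This sketch fails closed: a charge that does not succeed never reaches the pipeline.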

---

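Required environment variables

| Variable | Description |
|------|------|
| `FEISHU_USER_ID` | Feishu user open_id, used to identify the user for billing |
| `OPENAI_API_KEY` | API key for the AI model (OpenAI, MiniMax, or a compatible endpoint) |
| `OPENAI_API_BASE` | Base URL of the AI API (optional; defaults to the MiniMax endpoint) |
| `SKILL_BILLING_API_KEY` | Builder API key for skillpay.me (required for paid calls) |
| `SKILL_BILLING_SKILL_ID` | Skill slug on SkillPay (defaults to `data-cleaner-ai`) |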
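
A preflight check along these lines can report missing variables before any data is read. `load_config` is a hypothetical helper; treating `SKILL_BILLING_API_KEY` as always required is an assumption (the table only requires it for paid calls), and the default for `OPENAI_API_BASE` is left unset because its exact URL is not given.

```python
import os

# Assumption: all three are treated as required in this sketch.
REQUIRED = ("FEISHU_USER_ID", "OPENAI_API_KEY", "SKILL_BILLING_API_KEY")
OPTIONAL_DEFAULTS = {"SKILL_BILLING_SKILL_ID": "data-cleaner-ai"}


def load_config(env=None) -> dict:
    """Collect settings from the environment; raise if a required variable is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name, default in OPTIONAL_DEFAULTS.items():
        config[name] = env.get(name) or default
    # Optional; None means "use whatever default the skill applies".
    config["OPENAI_API_BASE"] = env.get("OPENAI_API_BASE")
    return config


demo_env = {
    "FEISHU_USER_ID": "ou_demo",
    "OPENAI_API_KEY": "sk-demo",
    "SKILL_BILLING_API_KEY": "builder-demo",
}
print(load_config(demo_env)["SKILL_BILLING_SKILL_ID"])
```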

---

Error handling

| Error | Handling |
|------|----------|
| Insufficient balance | Returns `payment_url` for top-up |
| Network error | … |

Data source: ClawHub · Chinese localization: 龙虾技能库