Batch File Processor: Parallel Batch File Processing

Author: ddpie

v1.0.0

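Parallel batch processing of large file sets using sub-agents (summarization, analysis, extraction, conversion). Use it when the same operation must be applied to many files in a directory, such as generating a file index or summaries, bulk content analysis, bulk information extraction, or format conversion.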

Tags: developer tools, API tools, automation, file processing

License: MIT-0

Last updated: 2026/3/22

Security scan

VirusTotal: benign

OpenClaw: safe (high confidence)

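The skill's instructions and requirements are internally consistent with a batch file processing helper, but it implicitly expects file-system read access and the ability to spawn and collect sub-agents. Confirm and restrict these permissions before use.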

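Assessment recommendations

The skill matches its stated purpose, but confirm runtime permissions before use: run it only on trusted directories, or configure an allowlist of paths and file types. Verify that your agent platform supports spawning sub-agents and that sessions_yield behavior is safe and rate-limited. Consider testing on non-sensitive sample files first, set size and time limits (the template mentions a timeout), and avoid running it with elevated privileges or against folders that contain secrets (keys, credentials, and so on). Where possible, add explicit safeguards (path allowlist/denylist, maximum file size, redaction rules) before processing large or sensitive file sets. ...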

Detailed analysis

ℹ Purpose and capabilities

Name/description match the instructions: the SKILL.md explains batching, sub-agents, and file summaries. The skill does not declare required environment variables or config paths, yet the runtime instructions assume access to the host filesystem and a sub-agent/session API. This omission is a declaration gap (not necessarily malicious) — the skill legitimately needs file-read and sub-agent capabilities but doesn't state them explicitly.

ℹ Instruction scope

Instructions explicitly tell sub-agents to 'Read the following files completely' and use shell find to enumerate files. That is coherent for summarization/analysis, but it means the skill will read full file contents (which can include secrets). There are no instructions requiring other unrelated data sources, network exfiltration endpoints, or environment variables. The guidance lacks safeguards (allowlist/denylist, size limits, or redaction) which raises privacy risk if run against sensitive directories.
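
The safeguards named above (allowlist/denylist, size limits) could be applied before any file path reaches a sub-agent. The following is a minimal sketch under stated assumptions: the suffix allowlist, the deny patterns, the 1 MB cap, and the function names are all illustrative choices, not part of the skill.

```python
from fnmatch import fnmatch
from pathlib import Path

# All three settings are illustrative assumptions, not part of the skill.
ALLOWED_SUFFIXES = {".md", ".txt"}
DENIED_PATTERNS = ["*.pem", "*.key", ".env*", "*credentials*"]
MAX_BYTES = 1_000_000  # skip files larger than about 1 MB


def is_safe_to_read(path: Path, size_bytes: int) -> bool:
    """Return True only if the file passes every safeguard."""
    name = path.name.lower()
    if any(fnmatch(name, pattern) for pattern in DENIED_PATTERNS):
        return False  # denylist wins over everything else
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        return False  # not on the file-type allowlist
    return size_bytes <= MAX_BYTES


def filter_files(root: Path) -> list[Path]:
    """Walk root and keep only regular files that pass is_safe_to_read."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and is_safe_to_read(p, p.stat().st_size)
    )
```

Checking the denylist first means a file such as credentials.md is rejected even though its suffix is allowed. Redaction of file contents is not covered by this sketch.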

✓ Install mechanism

No install spec and no code files — the skill is instruction-only, so nothing is written to disk or downloaded during install. This is the lowest-risk install model.

ℹ Credential requirements

The skill requests no credentials or env vars in metadata, which is appropriate. However, it implicitly requires permission to read arbitrary files and to spawn/collect sub-agents (sessions_yield). Ensure those implicit privileges are minor and scoped; otherwise the ability to read many files could expose sensitive data.

✓ Persistence and privileges

always is false and the skill does not request persistent/privileged presence or claim to modify other skills or system-wide settings. Autonomous invocation is allowed (platform default) but not combined with other red flags.

Security is layered; review the code before running.

License terms (MIT-0): free to use, modify, and redistribute; no attribution required.

Runtime dependencies: none

Version

latest: v1.0.0 (2026/3/22)

Initial release: parallel batch file processing method using sub-agents

Install command

Official: npx clawhub@latest install batch-file-processor

Mirror: npx clawhub@latest install batch-file-processor --registry https://cn.clawhub-mirror.com

Skill documentation

Batch File Processor

Process large numbers of files in parallel using sub-agents, avoiding main agent context overflow.

Workflow

1. List files

find <directory> -type f -name "*.md" | sort

2. Group

Split into batches of 2-4 files each (3 is optimal).
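
Steps 1 and 2 can also be done in one place. The sketch below is a rough Python counterpart, not the skill's own code: list_files and make_batches are hypothetical names, and folding a trailing single-file batch into the previous one is an added assumption to keep every batch within 2-4 files.

```python
from pathlib import Path


def list_files(root: str, pattern: str = "*.md") -> list[Path]:
    """Rough Python counterpart of the `find ... | sort` step."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def make_batches(files: list, size: int = 3) -> list[list]:
    """Split files into consecutive batches of `size` (2-4 per the workflow)."""
    if not 2 <= size <= 4:
        raise ValueError("batch size should stay within 2-4 files")
    batches = [files[i:i + size] for i in range(0, len(files), size)]
    # Assumption: avoid a trailing single-file batch by folding it into the
    # previous batch, as long as that batch stays at 4 files or fewer.
    if len(batches) > 1 and len(batches[-1]) == 1 and len(batches[-2]) < 4:
        leftover = batches.pop()
        batches[-1].extend(leftover)
    return batches
```

For example, make_batches(list_files("docs")) yields lists of three paths each, with the last batch holding two to four.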

3. Dispatch sub-agents

One sub-agent per batch. Task template:

Read the following files completely and generate a brief summary (under 50 words) for each.
/path/to/file1.md
/path/to/file2.md
/path/to/file3.md
Return ONLY a JSON array:
[{"file": "relative/path/file1.md", "summary": "..."},...]

Key parameters:

mode: "run" (one-shot task)
runTimeoutSeconds: 120 (increase to 180 for large files)
label: descriptive label, e.g. idx-project-batch1
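
As an illustration, the task text and the key parameters for one batch can be assembled programmatically. This sketch assumes nothing about the platform's spawn call itself: build_summary_task and spawn_params are hypothetical helper names, and the 200,000-byte threshold for "large files" is an assumed value. Only the template wording and the three parameter names come from the text above.

```python
LARGE_FILE_BYTES = 200_000  # assumed threshold for "large files"; tune as needed


def build_summary_task(paths: list[str]) -> str:
    """Render the summarization task template for one batch of file paths."""
    lines = [
        "Read the following files completely and generate a brief summary "
        "(under 50 words) for each.",
        *paths,
        "Return ONLY a JSON array:",
        '[{"file": "relative/path/file1.md", "summary": "..."},...]',
    ]
    return "\n".join(lines)


def spawn_params(label_prefix: str, batch_number: int, largest_file_bytes: int) -> dict:
    """Build the key parameters; use the longer timeout for large files."""
    timeout = 180 if largest_file_bytes > LARGE_FILE_BYTES else 120
    return {
        "mode": "run",  # one-shot task
        "runTimeoutSeconds": timeout,
        "label": f"{label_prefix}-batch{batch_number}",
    }
```

How these values are passed to the spawn call depends on the agent platform and is not shown here.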

4. Collect results

Sub-agents push results on completion. Use sessions_yield to wait and collect incrementally.
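
Incremental collection comes down to parsing each reply as it arrives and folding it into a running result set. The sketch below assumes each reply is a string; parse_batch_reply and merge_reply are hypothetical names. Tolerating stray text around the JSON array is an added assumption, since sub-agents do not always return only JSON.

```python
import json


def parse_batch_reply(reply: str) -> list[dict]:
    """Parse one sub-agent reply that is expected to contain a JSON array."""
    # Assumption: take the span from the first "[" to the last "]".
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("reply contains no JSON array")
    records = json.loads(reply[start:end + 1])  # raises ValueError on bad JSON
    for record in records:
        if not isinstance(record, dict) or "file" not in record:
            raise ValueError(f"malformed record: {record!r}")
    return records


def merge_reply(collected: dict[str, dict], reply: str) -> None:
    """Fold one reply into `collected`, keyed by file path (last write wins)."""
    for record in parse_batch_reply(reply):
        collected[record["file"]] = record
```

A reply that fails to parse raises ValueError, which gives the main agent a clear signal to re-dispatch that batch.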

5. Compile output

Once all results are in, the main agent compiles the final deliverable (index file, report, etc.).
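
A minimal sketch of the compile step, assuming the merged records follow the summarization format (a "file" and a "summary" per record). The plain-text index layout and the names compile_index and missing_files are illustrative choices, not prescribed by the skill.

```python
def missing_files(expected: list[str], records: list[dict]) -> list[str]:
    """List expected files that no sub-agent reported on."""
    seen = {record["file"] for record in records}
    return sorted(set(expected) - seen)


def compile_index(records: list[dict], title: str = "File index") -> str:
    """Render merged records as a plain-text index, sorted by file path."""
    lines = [title, ""]
    for record in sorted(records, key=lambda r: r["file"]):
        summary = record.get("summary", "(no summary)")
        lines.append(f'{record["file"]}: {summary}')
    return "\n".join(lines) + "\n"
```

Running missing_files before compiling catches batches that timed out or dropped a file. It only works if the expected list uses the same path form (relative or absolute) that the sub-agents return.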

Rules

2-4 files per sub-agent — never let one sub-agent process an entire directory sequentially
Read full file content — no head/tail truncation; partial reads produce incomplete summaries
Standardize output format — JSON makes it easy for the main agent to parse and merge
One spawn per turn — system limitation; use multiple spawn + yield cycles
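
The spawn + yield cycle can be pictured as a loop. This is only a control-flow sketch: spawn and wait are hypothetical callables standing in for the platform's spawn call and sessions_yield, whose real signatures and semantics are not described in this document. Here wait is assumed to return the replies that arrived since the last call, keyed by label.

```python
from typing import Callable


def run_batches(
    batches: list[list[str]],
    spawn: Callable[[str, list[str]], None],
    wait: Callable[[], dict[str, str]],
) -> dict[str, str]:
    """Spawn one sub-agent per cycle, then wait, until every label has replied."""
    replies: dict[str, str] = {}
    pending: set[str] = set()
    for number, batch in enumerate(batches, start=1):
        label = f"batch{number}"
        spawn(label, batch)  # exactly one spawn in this cycle
        pending.add(label)
        for done_label, reply in wait().items():  # then yield and collect
            replies[done_label] = reply
            pending.discard(done_label)
    while pending:  # drain sub-agents that are still running
        arrived = wait()
        if not arrived:
            raise TimeoutError(f"no reply from: {sorted(pending)}")
        for done_label, reply in arrived.items():
            replies[done_label] = reply
            pending.discard(done_label)
    return replies
```

Passing the two callables in keeps the loop testable with fakes and independent of any particular agent platform.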

Anti-patterns

Mistake	Consequence
`head -20` to skim file headers	Poor summary quality, key information missed
One sub-agent processes entire directory	Context overflow, timeout failure
Main agent reads all files sequentially	Context window exhausted, later files unreadable
One sub-agent per large directory	Large directories time out, small ones waste capacity

Benchmarks

70 files → 25 sub-agents (about 3 files each) → parallel execution → completed in 5 minutes → high-accuracy summaries
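
A quick arithmetic check of these figures: 70 files across 25 sub-agents averages 2.8 files per sub-agent, so the batches were presumably a mix of sizes within the 2-4 range rather than exactly 3 each. The variable names below are illustrative.

```python
import math

files, sub_agents = 70, 25

average_batch_size = files / sub_agents      # 2.8 files per sub-agent
batches_at_three = math.ceil(files / 3)      # 24 batches if each held 3 files

print(average_batch_size, batches_at_three)
```

With a strict batch size of 3, the same 70 files would need 24 sub-agents, the last one receiving a single file.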

Task Template Variants

File summarization (default)

Generate a brief summary (under 50 words) for each file.

Information extraction

Extract the following fields from each file: project name, budget, key contacts, risks.
Return JSON: [{"file": "...", "project": "...", "budget": "...", "contacts": [...], "risks": [...]}]
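
Sub-agents may omit fields or return the wrong type, so the main agent can check each extraction record before merging. This is a minimal sketch; validate_extraction is a hypothetical name, and treating contacts and risks as lists follows the `[...]` placeholders in the template.

```python
REQUIRED_FIELDS = {"file", "project", "budget", "contacts", "risks"}


def validate_extraction(record: dict) -> list[str]:
    """Return the problems found in one extraction record (empty if OK)."""
    missing = sorted(REQUIRED_FIELDS - set(record))
    problems = [f"missing field: {name}" for name in missing]
    for name in ("contacts", "risks"):
        if name in record and not isinstance(record[name], list):
            problems.append(f"{name} should be a list")
    return problems
```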

Content classification

Classify each file by checking for these topics: security, compliance, migration.
Return JSON: [{"file": "...", "has_security": true/false, "has_compliance": true/false, "has_migration": true/false}]
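
Once classification records are merged, a per-topic tally is a natural deliverable. The sketch assumes `true/false` in the template stands for a JSON boolean and that every topic flag uses the `has_` prefix; topic_counts is a hypothetical name.

```python
from collections import Counter


def topic_counts(records: list[dict]) -> dict[str, int]:
    """Count how many files were flagged true for each has_<topic> field."""
    counts: Counter = Counter()
    for record in records:
        for key, value in record.items():
            if key.startswith("has_") and value is True:
                counts[key.removeprefix("has_")] += 1
    return dict(counts)
```

Topics that no file was flagged for are simply absent from the result.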

Code analysis

Analyze each source file: count lines, list imports/dependencies, identify main functions.
Return JSON: [{"file": "...", "lines": N, "imports": [...], "main_functions": [...]}]
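
To spot-check a sub-agent's code analysis, the same three fields can be computed locally. The sketch below covers Python source only, using the standard ast module. Treating top-level function definitions as the "main functions" is an assumption, and analyze_python_source is a hypothetical name.

```python
import ast


def analyze_python_source(source: str) -> dict:
    """Compute line count, imported modules, and top-level function names."""
    tree = ast.parse(source)
    imports = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or ".")  # "." for `from . import x`
    functions = [
        node.name for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return {
        "lines": len(source.splitlines()),
        "imports": sorted(set(imports)),
        "main_functions": functions,
    }
```

The "file" field is left out here because the function works on source text, not a path.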

Data source: ClawHub · Chinese localization: 龙虾技能库
