Agent Benchmark - Skill Tool

Name: Agent Benchmark - Skill Tool
Author: yuyonghao-123

v0.1.0

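Automatically evaluates an AI agent's overall capability through 12 standardized tasks across five dimensions: file operations, data processing, system operations, robustness, and code quality.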

0 · 83 · 0 current · 0 cumulative

Agents · AI model access · Automation · Data analysis

License

MIT-0

Last updated

2026/3/26

Security scan

VirusTotal

Suspicious

OpenClaw

Suspicious

medium confidence

The package generally matches a benchmark tool, but there are several mismatches between the documentation and the actual code (runner language, undeclared runtime requirements, and an unexpected write into a 'memory' folder) that warrant caution before installing or running it.

Assessment recommendations

What to consider before installing/running:
- Clarify the runner: SKILL.md describes a PowerShell runner, but the package includes index.js (Node) that actually runs tasks. Ask the author which runner is intended.
- Runtimes required: index.js may spawn python, node, and go; the registry metadata declares no required binaries. Ensure you want those interpreters available on the host and that you trust code executed by them.
- Arbitrary code execution: the benchmark executes task-provided code by...

Detailed analysis

ℹ Purpose and capabilities

The stated purpose (agent capability benchmark) aligns with the included tasks and scoring logic. However the SKILL.md emphasizes a PowerShell runner (src/benchmark-runner.ps1) while the repository contains a Node.js index.js that implements a runner and executes arbitrary language code (python/node/go). The package metadata declares no required binaries, yet index.js expects interpreters/runtimes (python, node, go). This mismatch is disproportionate to the documented purpose and should be clarified.
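
Because the metadata declares no required binaries, it can help to check which of the runtimes named in the scan are present on the host before running anything. The following is a minimal sketch, not part of the package: the executable names `python`, `node`, and `go` are taken from the scan text, and whether index.js looks them up under exactly those names is an assumption.

```python
import shutil

# Runtime names as mentioned in the scan report. The exact executable
# names that index.js spawns are an assumption.
RUNTIMES = ("python", "node", "go")


def find_runtimes(names=RUNTIMES):
    """Map each runtime name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_runtimes(found):
    """Return the sorted names of runtimes that were not found on PATH."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_runtimes()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    print("missing:", missing_runtimes(found))
```

A runtime that is present is not automatically one you want the benchmark to use; the check only makes the undeclared requirements visible.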

⚠ Instruction scope

SKILL.md instructs users to run a PowerShell script (src/benchmark-runner.ps1) and includes PowerShell task scripts, but the actual executable logic is index.js (Node). index.js writes files, creates temp directories, writes and executes user-supplied code (from tasks.json/tasks) by spawning child processes, and includes behavior not documented in SKILL.md. The instructions in SKILL.md do not fully describe what will be executed on the host.

ℹ Install mechanism

There is no install spec (instruction-only claimed), which is lower risk, but the package contains Node code that will be executed if you run it. The tool expects external runtimes (python/node/go) though no required-binaries are declared. No remote download or obscure URLs are present in the package, which reduces installer risk, but the lack of declared runtime requirements is an inconsistency.

ℹ Credential requirements

The skill does not declare required environment variables, which matches registry metadata. index.js spawns processes inheriting process.env and some benchmark tasks intentionally read environment variables (task-011). That's reasonable for 'system operations' tests, but reports capture task outputs (which may include env values) and the tool will persist those outputs—so running tasks that print sensitive environment values could leak them into local reports.
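
One way to limit this exposure is to launch the benchmark (or any task it runs) with a reduced environment instead of the full inherited one. The sketch below only illustrates the idea and is not how index.js behaves: the allowlist `SAFE_VARS` and the helper names are hypothetical, and which variables a given task really needs is an assumption.

```python
import os
import subprocess
import sys

# Hypothetical allowlist. Which variables a task actually needs is an
# assumption; extend it deliberately rather than passing everything through.
SAFE_VARS = ("PATH", "SYSTEMROOT", "TEMP", "TMP", "HOME")


def minimal_env(source=None, allow=SAFE_VARS):
    """Return a copy of the environment restricted to allowlisted names."""
    source = os.environ if source is None else source
    return {key: value for key, value in source.items() if key in allow}


def run_with_env(cmd, env, timeout=30):
    """Run a command with the given environment and return its stdout."""
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout


if __name__ == "__main__":
    # The child only sees the allowlisted variables, so a task that prints
    # its environment cannot copy other values into a report.
    child = [sys.executable, "-c", "import os; print(sorted(os.environ))"]
    print(run_with_env(child, minimal_env()))
```

The trade-off is that tasks which legitimately read a specific variable will fail unless that variable is added to the allowlist.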

⚠ Persistence and privileges

index.js generates reports and explicitly writes a report to a relative '../../memory/benchmark-results.md' (i.e., escapes the package directory). Writing into a 'memory' path outside the skill directory can place results into agent persistent storage; SKILL.md did not document this. The skill is not marked always:true, but this unexpected persistent write and the discrepancy between documented runner and shipped Node runner is a privilege/persistence concern and should be clarified.
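
To see why the relative path matters, the following sketch resolves a target against a base directory and reports whether the result lands outside it. It is an illustration only: the install location `/opt/skills/agent-benchmark` is hypothetical, and whether index.js resolves the path against the package directory or the current working directory is not stated in the scan.

```python
from pathlib import Path


def escapes_root(root, relative_target):
    """Return True if root/relative_target resolves outside of root."""
    root_path = Path(root).resolve()
    target = (root_path / relative_target).resolve()
    # Inside means the target is the root itself or has the root as an ancestor.
    return target != root_path and root_path not in target.parents


if __name__ == "__main__":
    # Hypothetical install location, used only for illustration.
    base = "/opt/skills/agent-benchmark"
    for rel in ("../../memory/benchmark-results.md", "reports/results.md"):
        print(rel, "escapes" if escapes_root(base, rel) else "stays inside")
```

With two `..` segments, the path climbs two directories above whatever base it is resolved against, so any realistic base directory puts the write outside the skill's own folder.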

⚠ index.js:89

Shell command execution detected (child_process).

Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v0.1.0 · 2026/3/26

agent-benchmark v0.1.0
- Initial release of the skill.
- Includes 12 standardized benchmark tasks across 5 ability dimensions.
- Automated scoring system and Markdown report generation.
- Supports custom task definition and analysis by category.
- Provides PowerShell-based test runner and clear reporting structure.

● Suspicious

Install command

Official: npx clawhub@latest install agent-benchmark

Mirror (accelerated): npx clawhub@latest install agent-benchmark --registry https://cn.clawhub-mirror.com

Data source: ClawHub · Chinese localization: 龙虾技能库
