Agent Benchmark: Skill Tool

v0.1.0

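Automatically evaluates an AI agent's overall capability with 12 standardized tasks spanning five dimensions: file operations, data processing, system operations, robustness, and code quality.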

by @yuyonghao-123·MIT-0
License
MIT-0
Last updated
2026/3/26
Security scan
VirusTotal
Suspicious
OpenClaw
Suspicious
medium confidence
The package generally matches a benchmark tool, but there are several mismatches between the documentation and the actual code (runner language, undeclared runtime requirements, and an unexpected write into a 'memory' folder) that warrant caution before installing or running it.
Assessment recommendations
What to consider before installing/running:
- Clarify the runner: SKILL.md describes a PowerShell runner, but the package includes index.js (Node) that actually runs tasks. Ask the author which runner is intended.
- Runtimes required: index.js may spawn python, node, and go; the registry metadata declares no required binaries. Ensure you want those interpreters available on the host and that you trust code executed by them.
- Arbitrary code execution: the benchmark executes task-provided code by...
Detailed analysis
Purpose and capability
The stated purpose (agent capability benchmark) aligns with the included tasks and scoring logic. However the SKILL.md emphasizes a PowerShell runner (src/benchmark-runner.ps1) while the repository contains a Node.js index.js that implements a runner and executes arbitrary language code (python/node/go). The package metadata declares no required binaries, yet index.js expects interpreters/runtimes (python, node, go). This mismatch is disproportionate to the documented purpose and should be clarified.
Instruction scope
SKILL.md instructs users to run a PowerShell script (src/benchmark-runner.ps1) and includes PowerShell task scripts, but the actual executable logic is index.js (Node). index.js writes files, creates temp directories, writes and executes user-supplied code (from tasks.json/tasks) by spawning child processes, and includes behavior not documented in SKILL.md. The instructions in SKILL.md do not fully describe what will be executed on the host.
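The behavior described above (write task code to a temp directory, then spawn a child process to run it) can be sketched in a more contained form. The following is a minimal illustration in Python, not the package's actual index.js logic: `run_snippet` is a hypothetical helper, and the choice to pass only `PATH` (plus `SYSTEMROOT` on Windows) to the child is an assumption made here for illustration. It is not a sandbox; it only limits the working directory, the inherited environment, and the run time.

```python
import os
import subprocess
import sys
import tempfile


def run_snippet(code, timeout=10.0):
    """Run Python source in a fresh temp directory with a minimal environment.

    Hypothetical helper: this does not sandbox the code, it only avoids
    inheriting the full parent environment and bounds the run time.
    """
    # Pass through only what the interpreter needs to start.
    env = {k: os.environ[k] for k in ("PATH", "SYSTEMROOT") if k in os.environ}
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "task.py")
        with open(script, "w", encoding="utf-8") as fh:
            fh.write(code)
        # subprocess.run raises subprocess.TimeoutExpired if the limit is hit.
        return subprocess.run(
            [sys.executable, script],
            cwd=workdir,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )


if __name__ == "__main__":
    result = run_snippet("print('hello from task')")
    print(result.returncode, result.stdout.strip())
```

A child started this way does not see variables such as API tokens that happen to be set in the parent shell, which narrows what a task can print into a report.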
Install mechanism
There is no install spec (instruction-only claimed), which is lower risk, but the package contains Node code that will be executed if you run it. The tool expects external runtimes (python/node/go) though no required-binaries are declared. No remote download or obscure URLs are present in the package, which reduces installer risk, but the lack of declared runtime requirements is an inconsistency.
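Because the runtime requirements are undeclared, it may help to check which of the interpreters named in the report are actually present before running anything. A small sketch, assuming the binaries are invoked by the plain names `python`, `node`, and `go` (the report does not say which exact command names index.js uses):

```python
import shutil

# Runtime names taken from the scan report; the exact command names
# used by index.js are an assumption here.
EXPECTED_RUNTIMES = ("python", "node", "go")


def find_runtimes(names=EXPECTED_RUNTIMES):
    """Map each runtime name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_runtimes(names=EXPECTED_RUNTIMES):
    """Return the runtime names that could not be found on PATH."""
    return [name for name, path in find_runtimes(names).items() if path is None]


if __name__ == "__main__":
    for name, path in find_runtimes().items():
        print(f"{name}: {path or 'not found'}")
```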
Credential requirements
The skill does not declare required environment variables, which matches registry metadata. index.js spawns processes inheriting process.env and some benchmark tasks intentionally read environment variables (task-011). That's reasonable for 'system operations' tests, but reports capture task outputs (which may include env values) and the tool will persist those outputs—so running tasks that print sensitive environment values could leak them into local reports.
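One way to reduce the leak described above is to mask environment values in captured output before a report is written. The sketch below is illustrative only: `redact_env_values`, the name pattern, and the `[REDACTED]` marker are assumptions made here, not part of the package.

```python
import os
import re

# Hypothetical heuristic: treat variables with these substrings in their
# names as sensitive.
SENSITIVE_NAME = re.compile(r"(TOKEN|SECRET|PASSWORD|KEY)", re.IGNORECASE)


def redact_env_values(text, env=None, min_length=6):
    """Replace values of sensitive-looking environment variables in text."""
    env = os.environ if env is None else env
    secrets = [
        value
        for name, value in env.items()
        if SENSITIVE_NAME.search(name) and len(value) >= min_length
    ]
    # Replace longer values first so a value containing another is masked whole.
    for value in sorted(secrets, key=len, reverse=True):
        text = text.replace(value, "[REDACTED]")
    return text


if __name__ == "__main__":
    demo_env = {"API_TOKEN": "abc123xyz", "HOME": "/home/user"}
    print(redact_env_values("token=abc123xyz home=/home/user", demo_env))
```

Name-based matching misses secrets stored under innocuous variable names, so this complements, rather than replaces, running tasks with a trimmed environment.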
Persistence and privileges
index.js generates reports and explicitly writes a report to a relative '../../memory/benchmark-results.md' (i.e., escapes the package directory). Writing into a 'memory' path outside the skill directory can place results into agent persistent storage; SKILL.md did not document this. The skill is not marked always:true, but this unexpected persistent write and the discrepancy between documented runner and shipped Node runner is a privilege/persistence concern and should be clarified.
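To see why '../../memory/benchmark-results.md' leaves the package directory, the relative path can be resolved against a base directory and compared with it. A minimal sketch; the install location `/opt/skills/agent-benchmark` is hypothetical and used only for illustration:

```python
import os


def escapes_base(base_dir, relative_path):
    """Return True if relative_path resolves to a location outside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, relative_path))
    try:
        return os.path.commonpath([base, target]) != base
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return True


if __name__ == "__main__":
    base = "/opt/skills/agent-benchmark"  # hypothetical install location
    print(escapes_base(base, "../../memory/benchmark-results.md"))
    print(escapes_base(base, "reports/benchmark-results.md"))
```

With that base, the first path resolves two levels up and lands in a sibling `memory` directory, so the check reports an escape; the second stays inside the package.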
index.js:89
Shell command execution detected (child_process).
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v0.1.0 · 2026/3/26

agent-benchmark v0.1.0
- Initial release of the skill.
- Includes 12 standardized benchmark tasks across 5 ability dimensions.
- Automated scoring system and Markdown report generation.
- Supports custom task definition and analysis by category.
- Provides PowerShell-based test runner and clear reporting structure.

● Suspicious

Install command

Official: npx clawhub@latest install agent-benchmark
Mirror (faster in China): npx clawhub@latest install agent-benchmark --registry https://cn.clawhub-mirror.com
Data source: ClawHub · Chinese localization: 龙虾技能库