
Agent Eval

v1.0.0

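A closed-loop quantitative evaluation system based on Karpathy's AutoResearch and multi-agent retrospectives, providing automatic yes/no task judgments and continuous optimization and upgrades.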

0 · 78 · 1 current · 1 total
by @luaqnyin·MIT-0
License: MIT-0
Last updated: 2026/4/10
Security scan
VirusTotal: harmless
OpenClaw: suspicious (medium confidence)
The skill's instructions match an agent-evaluation purpose, but the runtime instructions require reading/writing specific local agent memory files and sending reports while the metadata declares no required config paths or credentials — an incoherence that warrants caution.
Assessment recommendations
Before installing, verify and restrict what files this skill may access and how reports are sent: (1) ask the author to declare explicit required config paths and minimal file-permissions (read-only vs write) for memory/YYYY-MM-DD.md, memory/evolution/<agent-id>.md, patterns.md, AGENTS.md, HEARTBEAT.md; (2) confirm the exact delivery mechanism for 'send to boss' (email? internal message?) and ensure it cannot exfiltrate data to arbitrary endpoints; (3) run the skill in an isolated/test environme...
Detailed analysis
Purpose and capabilities
Name/description and the SKILL.md content consistently describe a closed-loop agent evaluation system (generate → evaluate → modify → rerun). However, the manifest declares no required config paths or environment variables while the instructions explicitly expect access to many local files (e.g., memory/YYYY-MM-DD.md, memory/evolution/<agent-id>.md, patterns.md, AGENTS.md, HEARTBEAT.md). That mismatch (declared zero file access vs. explicit file I/O in SKILL.md) is an inconsistency.
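The mismatch described above can be checked mechanically. The following is a minimal sketch, not the scanner's actual method: it assumes the instruction text is available as a string and that the manifest's declared paths can be represented as a set. The function name `undeclared_paths` and the regular expression are hypothetical.

```python
import re

# Hypothetical pattern: file-like tokens ending in ".md", allowing
# placeholders such as YYYY-MM-DD and <agent-id> inside the path.
PATH_PATTERN = re.compile(r"[\w./<>-]+\.md")


def undeclared_paths(skill_text: str, declared: set[str]) -> list[str]:
    """Return .md paths mentioned in skill_text that are not declared."""
    found = set(PATH_PATTERN.findall(skill_text))
    return sorted(found - declared)


# Sample text built from the file names listed in the analysis above.
sample = (
    "Read memory/YYYY-MM-DD.md and memory/evolution/<agent-id>.md, "
    "then update patterns.md, AGENTS.md, HEARTBEAT.md."
)

# The manifest declares no paths, so every referenced file is reported.
missing = undeclared_paths(sample, declared=set())
print(missing)
```

With an empty declaration set, all five referenced files are reported; once each path is declared, the result is an empty list.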
Instruction scope
The runtime instructions tell the agent to read daily agent task logs and many repository files, and to write evolution logs, PAT records, and patterns.md. They also say to "send the overall score trend to the boss" (将整体评分趋势发给老板) without specifying a delivery mechanism. The file reads and writes fall within the skill's evaluation purpose, but the instructions are broad and partly vague about how reports are transmitted, which gives the agent wide latitude to access and potentially transmit sensitive data.
Install mechanism
Instruction-only skill with no install steps and no code files; nothing is written to disk by an installer. This is the lowest install risk.
Credential requirements
The skill requests no credentials or declared config paths, yet SKILL.md requires reading/writing multiple local files (agent memory and config-like documents). The absence of declared required config paths/permissions underrepresents the actual access the skill needs and prevents applying least-privilege controls.
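One way to make least-privilege enforceable is an explicit allow-list that maps each declared path pattern to the modes it permits. The sketch below is illustrative only: the `DECLARED_ACCESS` table, the read/write split, and the `is_allowed` helper are assumptions, not part of the skill or of any ClawHub manifest format.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical declaration: glob pattern -> permitted modes ("r", "w").
# The read/write split is an assumption made for illustration.
DECLARED_ACCESS = {
    "memory/????-??-??.md": {"r"},
    "memory/evolution/*.md": {"r", "w"},
    "patterns.md": {"r", "w"},
    "AGENTS.md": {"r"},
    "HEARTBEAT.md": {"r"},
}


def is_allowed(path: str, mode: str) -> bool:
    """Return True if a relative path may be opened in the given mode."""
    normalized = posixpath.normpath(path)
    # Reject absolute paths and anything that escapes the working directory.
    if normalized.startswith("/") or normalized == ".." or normalized.startswith("../"):
        return False
    return any(
        fnmatchcase(normalized, pattern) and mode in modes
        for pattern, modes in DECLARED_ACCESS.items()
    )


print(is_allowed("memory/2026-04-10.md", "r"))
print(is_allowed("memory/2026-04-10.md", "w"))
```

Normalizing before matching matters because `*` in `fnmatchcase` also matches `/`, so a path such as `memory/evolution/../../secrets.md` would otherwise slip through the evolution-log pattern.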
Persistence and permissions
always:false (no forced always-on). The skill envisions scheduled daily/weekly evaluation loops and autonomous agent actions. Autonomous invocation combined with file access and vague report-sending instructions increases blast radius if misused; however, autonomous invocation alone is the platform default and not itself flagged as high risk.
Security is layered; review the code before running.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.0 · 2026/4/10

Initial release introducing a modular, quantifiable agent evaluation framework with self-improvement feedback loops.
- Provides standardized yes/no checklists and scoring rules for diverse agent types (Content, Legal, Science, Literature, Analysis, Medical, etc.).
- Establishes weighted, dimension-specific evaluation items and time-based auto-evaluation workflows (daily self-review, weekly CEO reports).
- Defines clear scoring tiers with actionable triggers for optimization and tracking.
- Integrates with existing memory, quality, and research systems for seamless agent evolution.
- Prioritizes real-world task sets and explicit improvement cycles.
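The release notes mention weighted yes/no checklist items and scoring tiers but do not give the actual weights or thresholds. The sketch below shows one plausible way such a score could be computed; the `CheckItem` type, the example questions, the weights, and the `TIERS` thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckItem:
    question: str   # yes/no question about the agent's output
    weight: float   # relative importance of this item (assumed value)
    passed: bool    # True means the answer was "yes"


# Hypothetical tiers as (minimum score, label), highest first.
TIERS = [(0.9, "excellent"), (0.7, "acceptable"), (0.0, "needs optimization")]


def weighted_score(items: list[CheckItem]) -> float:
    """Return the weight-normalized fraction of items answered "yes"."""
    total = sum(item.weight for item in items)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(item.weight for item in items if item.passed) / total


def tier_for(score: float) -> str:
    """Map a score in [0, 1] to the first tier whose minimum it meets."""
    for minimum, label in TIERS:
        if score >= minimum:
            return label
    return TIERS[-1][1]


items = [
    CheckItem("Does the output cite its sources?", 3.0, True),
    CheckItem("Is the requested format followed?", 2.0, True),
    CheckItem("Was the task finished within the time limit?", 1.0, False),
]
score = weighted_score(items)
print(round(score, 3), tier_for(score))
```

Here two of three items pass with weights 3 and 2 out of a total of 6, so the score is 5/6 and falls in the middle tier. A tier label like this could then serve as the trigger for an optimization cycle.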

● Harmless

Install command

Official: npx clawhub@latest install agent-eval
Mirror: npx clawhub@latest install agent-eval --registry https://cn.clawhub-mirror.com
Data source: ClawHub · Chinese localization: 龙虾技能库