📦 Skill Eval — Skill Evaluation
v1.1.1 — An all-in-one skill evaluation framework supporting trigger-rate testing, quality comparison, and model comparison. Generates sessions with one click and collects history data to quantify evaluation effectiveness.
Last updated: 2026/4/21
Security scan
OpenClaw
Safe
High confidence: The skill appears to do what it claims (an agent-driven evaluation framework) and does not request extra credentials or remote installs, but it reads your OpenClaw config, spawns subagents that inherit the agent environment, and persists full session histories — so review usage and storage implications before running on sensitive data.
Assessment recommendations
This skill is internally consistent with its stated purpose, but take these precautions before running it:
- Review SKILL.md and the bundled scripts (especially anything that writes files) so you understand what will be stored under eval-workspace/.
- Don't run evaluations against skills or prompts that will surface sensitive credentials or personal data; persisted histories include tool calls and tool results and may capture secrets.
- Because the workflow uses sandbox="inherit" and cleanup="k...
✓ Purpose & capabilities
Name/description match the included files and runtime instructions: the repository contains resolver and analysis scripts, example evals, and a SKILL.md that instructs the agent to spawn subagents and run local Python analysis. The requested actions (reading skill paths, running trigger/quality/model workflows, writing per-iteration workspaces) are coherent with an evaluation framework.
ℹ Instruction scope
Runtime instructions explicitly tell the agent to read ~/.openclaw/openclaw.json to locate skill directories, call sessions_spawn and sessions_history, run local Python scripts via exec, and write full evaluation data to eval-workspace/. Those actions are expected for an eval tool, but they grant the skill access to user config and full conversation histories (including tool calls/results).
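As a sketch of what "reading ~/.openclaw/openclaw.json to locate skill directories" involves — the `skills.dirs` key here is an assumption for illustration; the real schema of openclaw.json may differ:

```python
import json
from pathlib import Path

def find_skill_dirs(config_path: Path) -> list[Path]:
    """Return skill directories listed in an OpenClaw config.

    NOTE: "skills.dirs" is a hypothetical key used for illustration;
    check the actual openclaw.json schema on your machine.
    """
    config = json.loads(config_path.read_text())
    dirs = config.get("skills", {}).get("dirs", [])
    # Expand "~" so entries like "~/skills" resolve to absolute paths.
    return [Path(d).expanduser() for d in dirs]
```

Even a read-only lookup like this means the skill sees whatever else lives in your config file, which is why the scan flags config access explicitly.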
✓ Installation mechanism
No install spec is present (instruction-only skill). Scripts are bundled in the repo and meant to be run locally via exec. There are no remote downloads or installers referenced in SKILL.md; requirements.txt lists requests but analysis scripts are documented as offline. This is low install risk.
ℹ Credential requirements
The skill declares no required env vars or credentials (good). However, it reads ~/.openclaw/openclaw.json and requires subagents to use sandbox="inherit" in spawn calls, which means the spawned sessions may inherit the main agent's registration environment/skill context. While not an explicit credential request, this can expose the same runtime environment to subagents — the behavior is explainable by the tool's purpose but worth noting.
⚠ Persistence & permissions
Workflows require cleanup="keep" and saving full_history.json / raw transcripts to eval-workspace/<skill>/iter-N/. Persisting full session histories (tool_use + tool_result) can retain sensitive data (API keys, tokens, user-provided secrets) if any eval touches them. Combined with sandbox="inherit", retained histories may contain environment-derived data. This is expected for an evaluation tool but represents a real privacy/storage risk that users must manage.
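One way to manage that risk is to screen persisted histories for secret-like strings before archiving or sharing anything under eval-workspace/. A minimal sketch with illustrative regex patterns — this is not the skill's own tooling, and the patterns should be extended for your environment:

```python
import re
from pathlib import Path

# Heuristic patterns for secret-like strings (illustrative, not exhaustive).
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),          # OpenAI-style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),             # AWS access key IDs
    re.compile(r"(?i)bearer\s+[a-z0-9._-]{20,}"),  # bearer tokens
]

def scan_history(path: Path) -> list[str]:
    """Return secret-like strings found in one persisted session history."""
    text = path.read_text(errors="replace")
    hits: list[str] = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits

def scan_workspace(root: Path) -> dict[str, list[str]]:
    """Scan every full_history.json under an eval-workspace tree."""
    return {
        str(p): hits
        for p in root.rglob("full_history.json")
        if (hits := scan_history(p))
    }
```

Running a pass like this after each evaluation, and before retaining iter-N directories long-term, limits how long any captured secret survives on disk.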
⚠ scripts/analyze_latency.py:219
Dynamic code execution detected.
⚠ scripts/analyze_model_compare.py:330
Dynamic code execution detected.
⚠ scripts/analyze_quality.py:210
Dynamic code execution detected.
⚠ scripts/analyze_triggers.py:243
Dynamic code execution detected.
⚠ scripts/build_evals_with_context.py:89
Dynamic code execution detected.
⚠ scripts/legacy/run_compare.py:91
Dynamic code execution detected.
⚠ scripts/legacy/run_diagnostics.py:605
Dynamic code execution detected.
⚠ scripts/legacy/run_latency_profile.py:495
Dynamic code execution detected.
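Warnings like these can be checked locally before trusting or dismissing them. A minimal sketch of AST-based detection of dynamic code execution in Python sources — a simplification of whatever the scanner actually does, and it only catches direct calls by name:

```python
import ast

# Builtins whose call usually indicates dynamic code execution.
DYNAMIC_CALLS = {"eval", "exec", "compile", "__import__"}

def find_dynamic_exec(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for direct calls to eval/exec/compile/__import__."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DYNAMIC_CALLS:
                hits.append((node.lineno, node.func.id))
    return hits
```

Point it at the flagged files (e.g. scripts/analyze_latency.py) and inspect the surrounding code at each reported line to judge whether the dynamic execution is benign.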
The security picture is nuanced; review the code before running.
Runtime dependencies
No special dependencies
Versions
latest · v1.1.1 · 2026/3/19
Security: Add runtime actions disclosure, change fake-tool setup to manual (no auto gateway restart or skill install).
● Suspicious
Install commands
Official: npx clawhub@latest install openclaw-skill-eval
Mirror (accelerated): npx clawhub@latest install openclaw-skill-eval --registry https://cn.longxiaskill.com