📦 Skill Eval — Skill Evaluation

v1.1.1

A one-stop skill evaluation framework supporting trigger-rate testing, quality comparison, and model comparison. Generates sessions with one click, collects history data, and quantifies evaluation results.

by @xiaoxing9 (Xiaoxing9)
Last updated: 2026/4/21
Security scan

VirusTotal: Suspicious
OpenClaw: Safe (high confidence)
The skill appears to do what it claims (an agent-driven evaluation framework) and does not request extra credentials or remote installs, but it reads your OpenClaw config, spawns subagents that inherit the agent environment, and persists full session histories — so review usage and storage implications before running on sensitive data.
Evaluation advice
This skill is internally consistent with its stated purpose, but take these precautions before running it:
- Review SKILL.md and the bundled scripts (especially anything that writes files) so you understand what will be stored under eval-workspace/.
- Don't run evaluations against skills or prompts that will surface sensitive credentials or personal data; persisted histories include tool calls and tool results and may capture secrets.
- Because the workflow uses sandbox="inherit" and cleanup="k...
Detailed analysis
Purpose and capabilities
Name/description match the included files and runtime instructions: the repository contains resolver and analysis scripts, example evals, and a SKILL.md that instructs the agent to spawn subagents and run local Python analysis. The requested actions (reading skill paths, running trigger/quality/model workflows, writing per-iteration workspaces) are coherent with an evaluation framework.
Instruction scope
Runtime instructions explicitly tell the agent to read ~/.openclaw/openclaw.json to locate skill directories, call sessions_spawn and sessions_history, run local Python scripts via exec, and write full evaluation data to eval-workspace/. Those actions are expected for an eval tool, but they grant the skill access to user config and full conversation histories (including tool calls/results).
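The config-reading step described above can be sketched as follows. This is a minimal illustration: only the ~/.openclaw/openclaw.json path is stated in the instructions, so the `skills.dirs` key used here is an assumed schema, not the documented one.

```python
# Hypothetical sketch of how a skill might locate skill directories from
# the OpenClaw config. The "skills" -> "dirs" layout is an assumption.
import json
from pathlib import Path

def load_skill_dirs(config_path: Path = Path.home() / ".openclaw" / "openclaw.json") -> list[str]:
    """Return the skill directories listed in the OpenClaw config, if any."""
    try:
        config = json.loads(config_path.read_text())
    except FileNotFoundError:
        return []
    # Assumed key layout; adjust to the real config schema.
    return config.get("skills", {}).get("dirs", [])
```

Reading this file is what gives the skill knowledge of where your installed skills live, which is why the review above flags it as config access.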
Install mechanism
No install spec is present (instruction-only skill). Scripts are bundled in the repo and meant to be run locally via exec. There are no remote downloads or installers referenced in SKILL.md; requirements.txt lists requests but analysis scripts are documented as offline. This is low install risk.
Credential requirements
The skill declares no required env vars or credentials (good). However, it reads ~/.openclaw/openclaw.json and requires subagents to use sandbox="inherit" in spawn calls, which means the spawned sessions may inherit the main agent's registration environment/skill context. While not an explicit credential request, this can expose the same runtime environment to subagents — the behavior is explainable by the tool's purpose but worth noting.
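For concreteness, the spawn parameters the review is concerned about would look roughly like this. Only `sandbox="inherit"` and `cleanup="keep"` are confirmed by the skill's instructions; the `task` field and overall call shape are illustrative assumptions.

```python
# Hypothetical parameter set for a sessions_spawn call as SKILL.md
# describes it. Only sandbox/cleanup values are confirmed; the rest
# is an assumed shape for illustration.
spawn_params = {
    "task": "Run trigger evaluation for the skill under test",  # assumed field
    "sandbox": "inherit",  # subagent inherits the main agent's environment
    "cleanup": "keep",     # session history persists after the run
}
```

The combination of these two flags is what makes the spawned sessions both environment-sharing and persistent, which drives the privacy notes below.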
Persistence and permissions
Workflows require cleanup="keep" and saving full_history.json / raw transcripts to eval-workspace/<skill>/iter-N/. Persisting full session histories (tool_use + tool_result) can retain sensitive data (API keys, tokens, user-provided secrets) if any eval touches them. Combined with sandbox="inherit", retained histories may contain environment-derived data. This is expected for an evaluation tool but represents a real privacy/storage risk that users must manage.
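One practical mitigation for the retention risk above is to scan persisted histories for secret-like strings before archiving or sharing eval-workspace/ output. A minimal sketch, with illustrative (not exhaustive) patterns:

```python
# Minimal sketch: scan a persisted full_history.json for strings that
# look like credentials. The regexes are illustrative examples only.
import json
import re

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),               # OpenAI-style keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS access key IDs
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]{20,}"),  # bearer tokens
]

def find_suspect_strings(history_path: str) -> list[str]:
    """Return substrings in the history file matching secret-like patterns."""
    with open(history_path) as f:
        text = json.dumps(json.load(f))
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits
```

Any non-empty result is a signal to redact the iteration directory before it leaves your machine.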
- scripts/analyze_latency.py:219 - Dynamic code execution detected.
- scripts/analyze_model_compare.py:330 - Dynamic code execution detected.
- scripts/analyze_quality.py:210 - Dynamic code execution detected.
- scripts/analyze_triggers.py:243 - Dynamic code execution detected.
- scripts/build_evals_with_context.py:89 - Dynamic code execution detected.
- scripts/legacy/run_compare.py:91 - Dynamic code execution detected.
- scripts/legacy/run_diagnostics.py:605 - Dynamic code execution detected.
- scripts/legacy/run_latency_profile.py:495 - Dynamic code execution detected.
Security findings are tiered; review the code before running.
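A quick way to review the flagged call sites yourself is to list every dynamic-execution call in the bundled scripts. This sketch is an auditing aid written for this review, not part of the skill:

```python
# Audit sketch: report (file, line, source) for each eval/exec/compile
# call site under a directory, so each can be reviewed before running.
import re
from pathlib import Path

DYNAMIC_EXEC = re.compile(r"\b(eval|exec|compile)\s*\(")

def audit_scripts(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line_number, line) for each dynamic-execution call site."""
    findings = []
    for path in sorted(Path(root).rglob("*.py")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if DYNAMIC_EXEC.search(line):
                findings.append((str(path), lineno, line.strip()))
    return findings
```

Running this over the skill's scripts/ directory should reproduce the file:line pairs listed above, and the surrounding source tells you what each dynamic call actually does.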

Runtime dependencies

No special dependencies

Versions

latest: v1.1.1 (2026/3/19)

Security: Add runtime actions disclosure, change fake-tool setup to manual (no auto gateway restart or skill install).

Suspicious

Install command

Official: npx clawhub@latest install openclaw-skill-eval
Mirror (CN): npx clawhub@latest install openclaw-skill-eval --registry https://cn.longxiaskill.com
Data source: ClawHub · Chinese localization: 龙虾技能库 (Longxia Skill Library)