📦 Skill Eval — Skill Evaluation

v1.1.1

A one-stop skill evaluation framework supporting trigger-rate testing, quality comparison, and model comparison. Generates sessions with one click, collects history data, and quantifies evaluation results.

by @xiaoxing9 (Xiaoxing9)
Last updated: 2026/4/21
Security scan

VirusTotal: Suspicious
OpenClaw: Safe (high confidence)
The skill appears to do what it claims (an agent-driven evaluation framework) and does not request extra credentials or remote installs, but it reads your OpenClaw config, spawns subagents that inherit the agent environment, and persists full session histories — so review usage and storage implications before running on sensitive data.
Evaluation advice
This skill is internally consistent with its stated purpose, but take these precautions before running it:
- Review SKILL.md and the bundled scripts (especially anything that writes files) so you understand what will be stored under eval-workspace/.
- Don't run evaluations against skills or prompts that will surface sensitive credentials or personal data; persisted histories include tool calls and tool results and may capture secrets.
- Because the workflow uses sandbox="inherit" and cleanup="k...
Detailed analysis
Purpose and capabilities
Name/description match the included files and runtime instructions: the repository contains resolver and analysis scripts, example evals, and a SKILL.md that instructs the agent to spawn subagents and run local Python analysis. The requested actions (reading skill paths, running trigger/quality/model workflows, writing per-iteration workspaces) are coherent with an evaluation framework.
Instruction scope
Runtime instructions explicitly tell the agent to read ~/.openclaw/openclaw.json to locate skill directories, call sessions_spawn and sessions_history, run local Python scripts via exec, and write full evaluation data to eval-workspace/. Those actions are expected for an eval tool, but they grant the skill access to user config and full conversation histories (including tool calls/results).
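The config-reading step described above can be sketched as follows. This is a minimal illustration: only the ~/.openclaw/openclaw.json path is stated in the instructions, so the `skills.dirs` key used here is an assumed schema, not the documented one.

```python
# Hypothetical sketch of how a skill might locate skill directories from
# the OpenClaw config. The "skills" -> "dirs" layout is an assumption.
import json
from pathlib import Path

def load_skill_dirs(config_path: Path = Path.home() / ".openclaw" / "openclaw.json") -> list[str]:
    """Return the skill directories listed in the OpenClaw config, if any."""
    try:
        config = json.loads(config_path.read_text())
    except FileNotFoundError:
        return []
    # Assumed key layout; adjust to the real config schema.
    return config.get("skills", {}).get("dirs", [])
```

Reading this file is what gives the skill knowledge of where your installed skills live, which is why the review above flags it as config access.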
Install mechanism
No install spec is present (instruction-only skill). Scripts are bundled in the repo and meant to be run locally via exec. There are no remote downloads or installers referenced in SKILL.md; requirements.txt lists requests but analysis scripts are documented as offline. This is low install risk.
Credential requirements
The skill declares no required env vars or credentials (good). However, it reads ~/.openclaw/openclaw.json and requires subagents to use sandbox="inherit" in spawn calls, which means the spawned sessions may inherit the main agent's registration environment/skill context. While not an explicit credential request, this can expose the same runtime environment to subagents — the behavior is explainable by the tool's purpose but worth noting.
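For concreteness, the spawn parameters the review is concerned about would look roughly like this. Only `sandbox="inherit"` and `cleanup="keep"` are confirmed by the skill's instructions; the `task` field and overall call shape are illustrative assumptions.

```python
# Hypothetical parameter set for a sessions_spawn call as SKILL.md
# describes it. Only sandbox/cleanup values are confirmed; the rest
# is an assumed shape for illustration.
spawn_params = {
    "task": "Run trigger evaluation for the skill under test",  # assumed field
    "sandbox": "inherit",  # subagent inherits the main agent's environment
    "cleanup": "keep",     # session history persists after the run
}
```

The combination of these two flags is what makes the spawned sessions both environment-sharing and persistent, which drives the privacy notes below.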
Persistence and permissions
Workflows require cleanup="keep" and saving full_history.json / raw transcripts to eval-workspace/<skill>/iter-N/. Persisting full session histories (tool_use + tool_result) can retain sensitive data (API keys, tokens, user-provided secrets) if any eval touches them. Combined with sandbox="inherit", retained histories may contain environment-derived data. This is expected for an evaluation tool but represents a real privacy/storage risk that users must manage.
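One practical mitigation for the retention risk above is to scan persisted histories for secret-like strings before archiving or sharing eval-workspace/ output. A minimal sketch, with illustrative (not exhaustive) patterns:

```python
# Minimal sketch: scan a persisted full_history.json for strings that
# look like credentials. The regexes are illustrative examples only.
import json
import re

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),               # OpenAI-style keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS access key IDs
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]{20,}"),  # bearer tokens
]

def find_suspect_strings(history_path: str) -> list[str]:
    """Return substrings in the history file matching secret-like patterns."""
    with open(history_path) as f:
        text = json.dumps(json.load(f))
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits
```

Any non-empty result is a signal to redact the iteration directory before it leaves your machine.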
- scripts/analyze_latency.py:219 - Dynamic code execution detected.
- scripts/analyze_model_compare.py:330 - Dynamic code execution detected.
- scripts/analyze_quality.py:210 - Dynamic code execution detected.
- scripts/analyze_triggers.py:243 - Dynamic code execution detected.
- scripts/build_evals_with_context.py:89 - Dynamic code execution detected.
- scripts/legacy/run_compare.py:91 - Dynamic code execution detected.
- scripts/legacy/run_diagnostics.py:605 - Dynamic code execution detected.
- scripts/legacy/run_latency_profile.py:495 - Dynamic code execution detected.
Security findings are tiered; review the code before running.
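A quick way to review the flagged call sites yourself is to list every dynamic-execution call in the bundled scripts. This sketch is an auditing aid written for this review, not part of the skill:

```python
# Audit sketch: report (file, line, source) for each eval/exec/compile
# call site under a directory, so each can be reviewed before running.
import re
from pathlib import Path

DYNAMIC_EXEC = re.compile(r"\b(eval|exec|compile)\s*\(")

def audit_scripts(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line_number, line) for each dynamic-execution call site."""
    findings = []
    for path in sorted(Path(root).rglob("*.py")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if DYNAMIC_EXEC.search(line):
                findings.append((str(path), lineno, line.strip()))
    return findings
```

Running this over the skill's scripts/ directory should reproduce the file:line pairs listed above, and the surrounding source tells you what each dynamic call actually does.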

Runtime dependencies

No special dependencies

Versions

latest: v1.1.1 (2026/3/19)

Security: Add runtime actions disclosure, change fake-tool setup to manual (no auto gateway restart or skill install).

Suspicious

Install command

Official: npx clawhub@latest install openclaw-skill-eval
Mirror (CN): npx clawhub@latest install openclaw-skill-eval --registry https://cn.longxiaskill.com
Data source: ClawHub · Chinese localization: 龙虾技能库 (Longxia Skill Library)