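📦 Ab Test Agent Workflow - A/B Testing Workflow
v1.0.0 Test Agent Workflow 1.1.0 A multi-agent, double-blind A/B testing workflow. It runs multi-round, double-blind controlled tests across multiple AI models/agents. Core roles: Coordinator, Contestant A/B, Judge. Trigger phrases: "A/B 测试" (A/B test), "双盲测试" (double-blind test), "比较 AI 模型" (compare AI models), "模型评测" (model evaluation), "测试工作流" (test workflow), "compare...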
Last updated
2026/4/19
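Security scan
OpenClaw
Safe
High confidence: This skill is an internally consistent A/B (double-blind) multi-agent testing workflow. Its instructions, the helper scripts it includes, and the resources it requests all match its stated purpose, and it requests no credentials or external installs.
Assessment recommendations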
This skill appears to do what it says: coordinate A/B blind comparisons using subagents, anonymize outputs, and have a judge score them. Before installing or running: 1) review the included scripts locally (anonymizer.py, judge_prompts.py, runner.py) and do not run them unexamined on sensitive data; 2) be cautious about executing any code that contestants produce (the prompts encourage contestants to output runnable code); run such code only in a secure sandbox; 3) the provided runner.py in the pa...
Detailed analysis
✓ Purpose and capabilities
Name/description (multi‑agent double‑blind A/B testing) matches the included artifacts: SKILL.md describes coordinator/contestant/judge roles and the repo contains runner.py, anonymizer.py and judge_prompts.py which implement that workflow. No unrelated credentials, binaries, or config paths are requested.
ℹ Instruction scope
SKILL.md directs the agent to spawn subagents (sessions_spawn) and to use the included scripts or inline prompts to run the workflow; it does not instruct reading arbitrary system files, harvesting env vars, or posting results to third‑party endpoints. Note: the workflow includes running/collecting model outputs and optionally running code-generation tasks — you should not automatically execute untrusted generated code without sandboxing.
✓ Install mechanism
No install spec is present (instruction-only + local scripts included). No downloads, package installs, or external installers are requested.
✓ Credential requirements
The skill requests no environment variables or credentials. The code uses only standard libs and in‑memory data structures; there are no hidden credential accesses in the provided files.
✓ Persistence and privileges
always is false and the skill does not request persistent platform privileges or alter other skills' configuration. The included anonymizer stores mapping in memory and exposes it via APIs/CLI for report revelation — expected for the stated purpose.
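To make the in-memory mapping idea concrete, here is a minimal sketch of how an anonymizer of this kind could work. It is not the skill's actual anonymizer.py: the class name BlindLabeler, its methods, and the "Response N" label format are all assumptions chosen for illustration.

```python
import random
from typing import Dict, Optional


class BlindLabeler:
    """Hypothetical in-memory anonymizer; NOT the skill's real anonymizer.py."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        # The label -> contestant mapping lives only in memory.
        self._label_to_name: Dict[str, str] = {}

    def anonymize(self, outputs: Dict[str, str]) -> Dict[str, str]:
        """Turn {contestant_name: text} into {blind_label: text} in shuffled order."""
        names = list(outputs)
        self._rng.shuffle(names)
        labels = [f"Response {i + 1}" for i in range(len(names))]
        self._label_to_name = dict(zip(labels, names))
        return {label: outputs[name] for label, name in self._label_to_name.items()}

    def reveal(self) -> Dict[str, str]:
        """Return a copy of the label -> contestant mapping for the final report."""
        return dict(self._label_to_name)


labeler = BlindLabeler(seed=0)
blinded = labeler.anonymize(
    {"Contestant A": "answer from A", "Contestant B": "answer from B"}
)
mapping = labeler.reveal()
```

Only `blinded` would be shown to the judge; `mapping` stays with the coordinator until scores are final and is then used to attribute results in the report.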
Security is layered; review the code before running it.
Runtime dependencies
No special dependencies
Versions
latest v1.0.0 2026/4/19
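ab-test-agent-workflow v1.1.0 introduces a structured multi-agent, double-blind A/B testing workflow for model comparison.
- Adds multi-round double-blind evaluation, in which three roles (coordinator, contestants, judge) assess two models/agents.
- Presents the full workflow architecture, including role definitions and the communication flow.
- Provides detailed prompt templates for each role and for several task types (general, code generation) to keep output standardized.
- Supports two execution modes: fully automatic (skill mode) and script-driven.
- Ships with a sample report format, a scoring quick-reference table, and a troubleshooting guide covering anonymization, timeouts, and parse fallback.
- Adds scripts for the runner, prompt construction/parsing, and output anonymization.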
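The multi-round, double-blind flow described in the changelog can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the skill's runner.py: the function run_blind_rounds, the verdict strings "first"/"second"/"tie", and the stub contestants and judge are all hypothetical stand-ins for real subagent calls.

```python
import random
from collections import Counter
from typing import Callable, Dict

Contestant = Callable[[str], str]        # task prompt -> answer
Judge = Callable[[str, str, str], str]   # (task, first, second) -> "first" | "second" | "tie"


def run_blind_rounds(task: str, contestants: Dict[str, Contestant],
                     judge: Judge, rounds: int = 3, seed: int = 0) -> Counter:
    """Coordinator loop: collect answers, hide identities, ask the judge, tally wins."""
    if len(contestants) != 2:
        raise ValueError("this sketch compares exactly two contestants")
    rng = random.Random(seed)
    tally: Counter = Counter()
    for _ in range(rounds):
        order = list(contestants)
        rng.shuffle(order)  # the judge never learns which contestant came first
        first, second = (contestants[name](task) for name in order)
        verdict = judge(task, first, second)
        if verdict == "first":
            tally[order[0]] += 1
        elif verdict == "second":
            tally[order[1]] += 1
        else:
            tally["tie"] += 1
    return tally


# Stub contestants and judge (assumptions), standing in for real subagents.
def contestant_a(task: str) -> str:
    return f"{task}: a short answer"


def contestant_b(task: str) -> str:
    return f"{task}: a much longer and more detailed answer"


def longest_wins(task: str, first: str, second: str) -> str:
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


result = run_blind_rounds(
    "Explain recursion", {"A": contestant_a, "B": contestant_b}, longest_wins, rounds=5
)
```

Shuffling the presentation order each round keeps the judge from associating a position with a contestant, and the coordinator maps each verdict back to the real name only when tallying.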
● Benign
Install command
Official: npx clawhub@latest install ab-test-agent-workflow-1-1-0
Mirror: npx clawhub@latest install ab-test-agent-workflow-1-1-0 --registry https://cn.longxiaskill.com