📦 Ab Test Agent Workflow - A/B Testing Workflow

v1.0.0

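Test Agent Workflow 1.1.0: a multi-agent, double-blind A/B testing workflow. It runs multi-round, double-blind controlled tests across multiple AI models/agents. Core roles: Coordinator, Contestant A/B, Judge. Trigger phrases: "A/B 测试" (A/B test), "双盲测试" (double-blind test), "比较 AI 模型" (compare AI models), "模型评测" (model evaluation), "测试工作流" (test workflow), "compare...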
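The Coordinator / Contestant / Judge split described above can be sketched as a small loop. This is only an illustrative sketch, not the skill's runner.py: the function name `run_blind_rounds`, the labels "Response 1" / "Response 2", and the callable signatures are all assumptions made for the example.

```python
import random
from typing import Callable, Dict, List

# Hypothetical stand-ins: a contestant maps a task to an answer; a judge maps
# (task, {neutral_label: answer}) to {neutral_label: score}.
Contestant = Callable[[str], str]
Judge = Callable[[str, Dict[str, str]], Dict[str, float]]


def run_blind_rounds(
    tasks: List[str],
    contestant_a: Contestant,
    contestant_b: Contestant,
    judge: Judge,
    seed: int = 0,
) -> Dict[str, float]:
    """Coordinator loop: one round per task, identities hidden from the judge."""
    rng = random.Random(seed)
    totals = {"A": 0.0, "B": 0.0}
    for task in tasks:
        # Each contestant answers independently and never sees the other.
        answers = {"A": contestant_a(task), "B": contestant_b(task)}
        # Re-shuffle which contestant sits behind which neutral label each round.
        ids = ["A", "B"]
        rng.shuffle(ids)
        label_to_id = dict(zip(["Response 1", "Response 2"], ids))
        blinded = {label: answers[cid] for label, cid in label_to_id.items()}
        # The judge only ever sees the neutral labels.
        scores = judge(task, blinded)
        # Only the coordinator maps scores back to the real contestants.
        for label, score in scores.items():
            totals[label_to_id[label]] += score
    return totals
```

In a real run the two contestants and the judge would be separate agent sessions rather than local callables; the sketch only shows where the blinding and un-blinding happen.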

by @johnsmithfan (JohnSmithfan)
Last updated: 2026/4/19
Security scan
VirusTotal: Harmless
OpenClaw: Safe (high confidence)
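This skill is an internally consistent A/B (double-blind) multi-agent testing workflow: its instructions, the bundled helper scripts, and the resources it requests all match its stated purpose, and it requests no credentials or external installs.
Assessment recommendations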
This skill appears to do what it says: coordinate A/B blind comparisons using subagents, anonymize outputs, and have a judge score them. Before installing or running: 1) review the included scripts locally (anonymizer.py, judge_prompts.py, runner.py) — do not run them unexamined on sensitive data; 2) be cautious about executing any code that contestants produce (the prompts encourage contestants to output runnable code); run such code only in a secure sandbox; 3) the provided runner.py in the pa...
Detailed analysis
Purpose and capabilities
Name/description (multi‑agent double‑blind A/B testing) matches the included artifacts: SKILL.md describes coordinator/contestant/judge roles and the repo contains runner.py, anonymizer.py and judge_prompts.py which implement that workflow. No unrelated credentials, binaries, or config paths are requested.
Instruction scope
SKILL.md directs the agent to spawn subagents (sessions_spawn) and to use the included scripts or inline prompts to run the workflow; it does not instruct reading arbitrary system files, harvesting env vars, or posting results to third‑party endpoints. Note: the workflow includes running/collecting model outputs and optionally running code-generation tasks — you should not automatically execute untrusted generated code without sandboxing.
Install mechanism
No install spec is present (instruction-only + local scripts included). No downloads, package installs, or external installers are requested.
Credential requirements
The skill requests no environment variables or credentials. The code uses only standard libs and in‑memory data structures; there are no hidden credential accesses in the provided files.
Persistence and privileges
always is false and the skill does not request persistent platform privileges or alter other skills' configuration. The included anonymizer stores mapping in memory and exposes it via APIs/CLI for report revelation — expected for the stated purpose.
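The in-memory mapping mentioned above can be illustrated with a minimal sketch. It is not the API of the bundled anonymizer.py; the class name `InMemoryAnonymizer` and its methods `alias_for`, `scrub`, and `reveal` are hypothetical names chosen for this example.

```python
import secrets
from typing import Dict


class InMemoryAnonymizer:
    """Hypothetical sketch: alias <-> real-name mapping held only in memory."""

    def __init__(self) -> None:
        self._alias_to_real: Dict[str, str] = {}
        self._real_to_alias: Dict[str, str] = {}

    def alias_for(self, real_name: str) -> str:
        """Return a stable random alias for a contestant, creating it if needed."""
        if real_name not in self._real_to_alias:
            alias = f"contestant-{secrets.token_hex(4)}"
            while alias in self._alias_to_real:  # retry on the rare collision
                alias = f"contestant-{secrets.token_hex(4)}"
            self._real_to_alias[real_name] = alias
            self._alias_to_real[alias] = real_name
        return self._real_to_alias[real_name]

    def scrub(self, text: str) -> str:
        """Replace every known real name in the text with its alias."""
        for real, alias in self._real_to_alias.items():
            text = text.replace(real, alias)
        return text

    def reveal(self) -> Dict[str, str]:
        """Return a copy of the alias -> real-name mapping for the final report."""
        return dict(self._alias_to_real)
```

Because nothing is written to disk, the mapping disappears when the process exits, so `reveal()` would have to be called before the run ends if the report needs real names.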
Security is layered; review the code before running it.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.0 · 2026/4/19

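ab-test-agent-workflow v1.1.0 introduces a structured multi-agent, double-blind A/B testing workflow for model comparison.
- Adds multi-round double-blind evaluation, in which three roles (coordinator, contestants, judge) evaluate two models/agents.
- Presents the full workflow architecture, including role definitions and communication flow.
- Provides detailed prompt templates for each role and for several task types (general, code generation) to standardize outputs.
- Supports two execution modes: fully automatic (skill mode) and script-driven.
- Includes a sample report format, a scoring cheat sheet, and a troubleshooting guide for anonymization, timeouts, and parse fallback.
- Adds scripts for the runner, prompt construction/parsing, and output anonymization.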
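The changelog entry above mentions a parse fallback without describing it. One plausible shape, shown here purely as a sketch, is to try structured JSON first and fall back to line-based matching when the judge replies in free text. The function name `parse_judge_scores` and the "Response N: score" line format are assumptions, not taken from judge_prompts.py.

```python
import json
import re
from typing import Dict, Optional

# Assumed fallback format: one line per answer, e.g. "Response 1: 8.5".
_SCORE_LINE = re.compile(
    r"(?im)^\s*(response\s*\d+)\s*[:=]\s*(\d+(?:\.\d+)?)\s*$"
)


def parse_judge_scores(raw: str) -> Optional[Dict[str, float]]:
    """Extract {label: score} from a judge reply, or None if nothing parses."""
    # First attempt: a JSON object embedded anywhere in the reply.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
            if isinstance(data, dict):
                return {str(k): float(v) for k, v in data.items()}
        except (ValueError, TypeError):
            pass  # malformed JSON or non-numeric values: use the fallback
    # Fallback: scan for "Response N: score" lines.
    found = {
        label.strip(): float(score)
        for label, score in _SCORE_LINE.findall(raw)
    }
    return found or None
```

Returning `None` rather than raising lets a coordinator decide whether to re-prompt the judge or to record the round as unscored.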

Harmless

Install command

Official: npx clawhub@latest install ab-test-agent-workflow-1-1-0
Mirror (accelerated): npx clawhub@latest install ab-test-agent-workflow-1-1-0 --registry https://cn.longxiaskill.com
Data source: ClawHub · Chinese localization: 龙虾技能库