Ab Test Agent Workflow - A/B Testing Workflow

v1.0.0

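Test Agent Workflow 1.1.0: a multi-agent double-blind A/B testing workflow. It runs multi-round, double-blind controlled tests across multiple AI models/agents. Core roles: Coordinator, Contestant A/B, Judge. Trigger phrases: "A/B 测试" (A/B test), "双盲测试" (double-blind test), "比较 AI 模型" (compare AI models), "模型评测" (model evaluation), "测试工作流" (test workflow), "compare...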


by @johnsmithfan (JohnSmithfan)

Automation · Testing tools · Agent workflow


Last updated

2026/4/19

Security scan

VirusTotal

Harmless


OpenClaw

Safe

high confidence

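This skill is an internally consistent A/B (double-blind) multi-agent testing workflow: its instructions, the bundled helper scripts, and the resources it requests all match its stated purpose, and it requests no credentials or external installs.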

Assessment recommendations

This skill appears to do what it says: coordinate A/B blind comparisons using subagents, anonymize outputs, and have a judge score them. Before installing or running: 1) review the included scripts locally (anonymizer.py, judge_prompts.py, runner.py) — do not run them unexamined on sensitive data; 2) be cautious about executing any code that contestants produce (the prompts encourage contestants to output runnable code); run such code only in a secure sandbox; 3) the provided runner.py in the pa...

Detailed analysis

✓ Purpose and capabilities

Name/description (multi‑agent double‑blind A/B testing) matches the included artifacts: SKILL.md describes coordinator/contestant/judge roles and the repo contains runner.py, anonymizer.py and judge_prompts.py which implement that workflow. No unrelated credentials, binaries, or config paths are requested.

ℹ Instruction scope

SKILL.md directs the agent to spawn subagents (sessions_spawn) and to use the included scripts or inline prompts to run the workflow; it does not instruct reading arbitrary system files, harvesting env vars, or posting results to third‑party endpoints. Note: the workflow includes running/collecting model outputs and optionally running code-generation tasks — you should not automatically execute untrusted generated code without sandboxing.

✓ Install mechanism

No install spec is present (instruction-only + local scripts included). No downloads, package installs, or external installers are requested.

✓ Credential requirements

The skill requests no environment variables or credentials. The code uses only standard libs and in‑memory data structures; there are no hidden credential accesses in the provided files.

✓ Persistence and privileges

The skill's "always" setting is false, and it does not request persistent platform privileges or alter other skills' configuration. The included anonymizer stores its label mapping in memory and exposes it via APIs/CLI so that identities can be revealed in the report, which is expected for the stated purpose.
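The in-memory mapping described above can be illustrated with a minimal sketch. This is not the code of the bundled anonymizer.py: the class name `BlindAnonymizer`, its methods, and the "Response N" labels are hypothetical and chosen only for illustration.

```python
import random


class BlindAnonymizer:
    """Hypothetical in-memory anonymizer (not the bundled anonymizer.py API)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._mapping = {}  # blind label -> real contestant name, memory only

    def anonymize(self, outputs):
        """Take {contestant_name: text} and return {blind_label: text}."""
        names = list(outputs)
        self._rng.shuffle(names)  # random order so labels carry no identity hint
        labels = [f"Response {i + 1}" for i in range(len(names))]
        self._mapping = dict(zip(labels, names))
        return {label: outputs[name] for label, name in self._mapping.items()}

    def reveal(self):
        """Return a copy of the label -> contestant mapping for the final report."""
        return dict(self._mapping)


raw_outputs = {"model-a": "answer from A", "model-b": "answer from B"}
anonymizer = BlindAnonymizer(seed=7)
blinded = anonymizer.anonymize(raw_outputs)  # what a judge would see
mapping = anonymizer.reveal()                # used only when writing the report
```

Because the mapping never leaves process memory in this sketch, nothing is persisted between runs; a report step would call `reveal()` once scoring is finished.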

Security is layered; review the code before running it.

Runtime dependencies

No special dependencies

Version

latest · v1.0.0 · 2026/4/19

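ab-test-agent-workflow v1.1.0 introduces a structured multi-agent double-blind A/B testing workflow for model comparison.
- Adds multi-round double-blind evaluation, in which three roles (coordinator, contestants, judge) assess two models/agents.
- Presents the full workflow architecture, including role definitions and the communication flow.
- Provides detailed prompt templates for each role and for several task types (general, code generation) to standardize output.
- Supports two execution modes: fully automatic (skill mode) and script-driven.
- Includes a sample report format, a scoring cheat sheet, and a troubleshooting guide covering anonymization, timeouts, and parsing fallbacks.
- Adds scripts for the runner, prompt construction/parsing, and output anonymization.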
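As a rough illustration of the coordinator, contestant, and judge roles, the sketch below runs a few blind rounds with stub functions in place of real model calls. It is not the bundled runner.py: `run_blind_rounds`, the stub contestants, the "longest answer wins" judge, and the "Response N" labels are all assumptions made for this example.

```python
import random
from collections import Counter


def run_blind_rounds(task, contestants, judge, rounds=3, seed=0):
    """Coordinator loop: collect outputs, blind them, ask the judge, then unblind.

    contestants: {name: callable(task) -> str}
    judge: callable(task, {label: text}) -> winning label
    """
    rng = random.Random(seed)
    wins = Counter({name: 0 for name in contestants})
    for _ in range(rounds):
        names = list(contestants)
        rng.shuffle(names)  # re-randomize the label assignment every round
        mapping = {f"Response {i + 1}": name for i, name in enumerate(names)}
        blinded = {label: contestants[name](task) for label, name in mapping.items()}
        winner_label = judge(task, blinded)  # the judge only ever sees blind labels
        wins[mapping[winner_label]] += 1     # coordinator unblinds only to tally
    return dict(wins)


# Stub contestants and judge stand in for real model/agent calls.
contestants = {
    "model-a": lambda task: "short",
    "model-b": lambda task: "a longer, more detailed answer",
}


def longest_wins(task, blinded):
    """Toy judge: pick the label whose text is longest."""
    return max(blinded, key=lambda label: len(blinded[label]))


result = run_blind_rounds("Explain A/B testing", contestants, longest_wins, rounds=3)
```

Reshuffling the labels each round keeps the judge from learning that one label always belongs to the same contestant; a real judge would return structured scores rather than a single winning label.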

● Harmless

Install commands


Official: npx clawhub@latest install ab-test-agent-workflow-1-1-0

Mirror: npx clawhub@latest install ab-test-agent-workflow-1-1-0 --registry https://cn.longxiaskill.com (mirror available)
