Adversary Review — 对抗审查

Name: Adversary Review — 对抗审查
Rating: 1 (2 reviews)
Author: KeaneYan

v1.2.0

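After each reply is drafted, the skill automatically calls a second AI instance to challenge and review the draft, helping to ensure output quality and safety. It supports local models to protect sensitive data.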

by @keaneyan (KeaneYan)·MIT-0

Security · AI model access · Agents · Automation · Testing tools

License

MIT-0

Last updated

2026/4/2

Security scan

VirusTotal: Benign

OpenClaw: Safe (medium confidence)

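The skill's instructions are consistent with its stated purpose (running a secondary adversarial reviewer), but it sends draft text to the currently configured model provider and makes privacy and persistence assumptions that are neither enforced nor fully explained.

Assessment recommendations

The skill works as described, but before installing: (1) confirm how the platform's sessions_spawn behaves, in particular whether it sends only the draft or attaches additional context; (2) if you handle sensitive content, prefer configuring a local review model (as SKILL.md suggests) or make sure the model provider's data retention policy meets your needs; (3) define explicit detection rules for "sensitive drafts" so that vague judgment calls do not cause accidental leaks; (4) expect every response to incur added latency and token cost; (5) test in a low-risk environment first to verify that the skill neither leaks conversation history nor persists review exchanges. ...

Detailed analysis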

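✓ Purpose and capability

The name, description, and runtime instructions all focus on having a second AI review drafts. The skill requests no unrelated binaries, environment variables, or installs. The resources it uses (a model API) are proportionate to its goal.

ℹ Instruction scope

The instructions explicitly tell the agent to spawn a review sub-agent and send the draft to the configured model API. The logic is consistent, but SKILL.md claims that "only the draft text (not full conversation history) is shared", that "review exchanges are not persisted", and that "sensitive drafts are skipped automatically" without explaining how any of this is implemented. Detecting "sensitive data" is left to the agent's own judgment, which is vague and could be misapplied in ways that expose data. Depending on platform behavior, the sessions_spawn mechanism may attach more context, and the skill does not document how to enforce minimal context.

✓ Install mechanism

Instruction-only, with no install spec or code files. It writes nothing to disk and needs no external downloads, which lowers install-time risk.

ℹ Credential requirements

The skill declares no required environment variables or credentials, as expected. It does, however, implicitly use the model provider credentials configured for the agent (when calling sessions_spawn / the provider API). That implicit use is normal for a reviewer, but users should know that it generates additional API calls and transmits draft text to the provider.

✓ Persistence and privileges

The skill is not set to always-on, does not request elevated or persistent system privileges, and does not modify other skills' configuration. Autonomous invocation is allowed (the platform default), which is not excessive here.

Security is layered; review the code before running.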

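License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Version

latest · v1.2.0 · 2026/4/2

Fixed security scan findings: removed the false "local" claim, honestly disclosed the data flow, and added a local model option for privacy-conscious users.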

Install command

Official: npx clawhub@latest install adversary-review

Mirror: npx clawhub@latest install adversary-review --registry https://cn.clawhub-mirror.com

Skill documentation

Every response to the user must pass adversarial review before delivery.

When to Use

EVERY response to the user, without exception. This is a mandatory quality gate.

How It Works

After drafting your response (but before sending it), spawn a sub-agent with a critical reviewer persona. The sub-agent examines your draft for problems, then you revise based on its feedback.

Step 1: Draft Your Response

Write your response as you normally would. Do not send it yet.

Step 2: Spawn Reviewer Sub-Agent

Use sessions_spawn with mode="run" and a cheaper model (e.g. qwen3.5-plus, gpt-4o-mini) to minimize cost. The reviewer does not need the full tool suite — it only reads text and outputs critique.

Note: This sends your draft text to the configured model provider's API. If you use cloud-based models, data will be sent to the provider's servers. Check your provider's data policy if handling sensitive content. For truly local review, configure a local model (e.g. ollama, LM Studio).

Spawn with this prompt template:

You are an adversarial reviewer. Examine the following draft response critically.

Draft:
"""
[YOUR DRAFT HERE]
"""

Check for:
1. Factual errors or unsubstantiated claims
2. Missing important caveats or edge cases
3. Logical contradictions
4. Tone issues (too apologetic, too confident, dismissive, etc.)
5. Missing follow-up suggestions that would be valuable
6. Overly verbose sections that could be trimmed
7. Any advice that could backfire or cause problems

Respond with exactly one of:
- PASS + one-line reason why it's fine
- List of specific issues, each with:
  - Where the problem is
  - Why it's a problem
  - Suggested fix

Be harsh. Be picky. Better to over-catch than to miss. You are the quality gate.
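
As a rough illustration, the template could be filled in programmatically before it is handed to the reviewer. This is a minimal sketch, not part of the skill: the names REVIEW_TEMPLATE and build_review_prompt are hypothetical, and the only thing taken from the skill is the template text itself.

```python
# Reviewer prompt from the skill documentation, kept verbatim.
# REVIEW_TEMPLATE and build_review_prompt are hypothetical names for this sketch.
REVIEW_TEMPLATE = '''You are an adversarial reviewer. Examine the following draft response critically.

Draft:
"""
[YOUR DRAFT HERE]
"""

Check for:
1. Factual errors or unsubstantiated claims
2. Missing important caveats or edge cases
3. Logical contradictions
4. Tone issues (too apologetic, too confident, dismissive, etc.)
5. Missing follow-up suggestions that would be valuable
6. Overly verbose sections that could be trimmed
7. Any advice that could backfire or cause problems

Respond with exactly one of:
- PASS + one-line reason why it's fine
- List of specific issues, each with:
  - Where the problem is
  - Why it's a problem
  - Suggested fix

Be harsh. Be picky. Better to over-catch than to miss. You are the quality gate.
'''


def build_review_prompt(draft: str) -> str:
    """Substitute the draft into the template's placeholder."""
    if not draft.strip():
        raise ValueError("draft is empty; nothing to review")
    return REVIEW_TEMPLATE.replace("[YOUR DRAFT HERE]", draft.strip())
```

Only the draft is substituted in, which matches the stated intent of sharing the draft text rather than the full conversation history. Whether the platform attaches further context when spawning the sub-agent is outside what this sketch can control.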

Step 3: Apply Feedback

Sub-agent says PASS → deliver your draft as-is
Sub-agent raises valid points → revise your draft, then deliver the improved version
Sub-agent is clearly wrong → trust your own judgment, deliver your version
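
One possible way to read the reviewer's reply is sketched below. It assumes the reviewer follows the template and starts a passing reply with the literal word PASS. The Verdict, Review, and parse_review names are hypothetical, and deciding whether a raised issue is valid or clearly wrong still rests with the agent, not with this parser.

```python
import re
from dataclasses import dataclass
from enum import Enum

# Matches "PASS" as a whole word at the start of the first line, then an
# optional separator ("+", ":", "-") and the one-line reason.
_PASS_LINE = re.compile(r"^PASS\b[\s+:\-]*(.*)$")


class Verdict(Enum):
    PASS = "pass"
    ISSUES = "issues"


@dataclass
class Review:
    verdict: Verdict
    detail: str  # reason for a PASS, or the full list of issues


def parse_review(reply: str) -> Review:
    """Classify a reviewer reply as PASS or as a list of issues."""
    text = reply.strip()
    if not text:
        raise ValueError("reviewer returned an empty reply")
    match = _PASS_LINE.match(text.splitlines()[0])
    if match:
        return Review(Verdict.PASS, match.group(1).strip())
    # Anything else is treated as issues for the agent to weigh.
    return Review(Verdict.ISSUES, text)
```

The word-boundary check keeps a critique that happens to begin with a word such as "PASSWORD" from being mistaken for a pass.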

Step 4: Deliver

When the review leads to substantive changes, briefly note the improvement (e.g. "Review caught X, fixed Y"). Minor edits need no mention. Focus on delivering the best result.
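
Taken together, the four steps might be wired up roughly as follows. This is a sketch under stated assumptions: reviewer is a hypothetical callable standing in for whatever spawns the review sub-agent (the skill uses sessions_spawn with mode="run", whose exact signature is not documented here), and revise is a hypothetical stand-in for the agent's own revision step.

```python
import re
from typing import Callable

# Hypothetical stand-ins: `Reviewer` wraps the call that spawns the review
# sub-agent and returns its reply; `Reviser` is the agent's own revision step.
Reviewer = Callable[[str], str]
Reviser = Callable[[str, str], str]


def deliver_with_review(draft: str, reviewer: Reviewer, revise: Reviser) -> str:
    """Run one review round and return the text that should be delivered."""
    critique = reviewer(draft).strip()

    # Step 3, first case: the reviewer answered PASS, so the draft goes out as-is.
    if re.match(r"PASS\b", critique):
        return draft

    # Step 3, remaining cases: the agent weighs the critique. `revise` returns
    # an improved draft for valid points, or the original draft unchanged when
    # it judges the reviewer to be clearly wrong.
    return revise(draft, critique)
```

Passing the reviewer in as a callable keeps the flow independent of the model behind it, so a cloud model and a local one can be swapped without touching the review logic.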

Privacy & Safety

The draft text is sent to a second AI model instance via the configured model API. If you use cloud-based models (e.g. qwen3.5-plus, gpt-4o-mini), this will send data to the provider's servers. For local-only review, use a local model provider (e.g. ollama, LM Studio).
Only the draft text (not full conversation history) is shared with the reviewer.
If the draft contains sensitive data (PII, credentials, etc.), the agent should skip the review step automatically.
Review exchanges are not persisted beyond the current agent session.
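
The skill does not say how sensitive drafts are detected. One way to make that rule explicit is a small set of patterns checked before any text leaves the agent. The patterns below are illustrative assumptions, not the skill's actual logic, and they will miss many kinds of PII; the names SENSITIVE_PATTERNS, sensitive_hits, and should_skip_review are hypothetical.

```python
import re

# Illustrative patterns only (assumptions for this sketch, far from exhaustive).
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Things like `API_KEY=...`, `password: ...`, `token = ...`
    "secret_assignment": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*\S+"
    ),
}


def sensitive_hits(draft: str) -> list[str]:
    """Return the names of all patterns that match somewhere in the draft."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(draft)]


def should_skip_review(draft: str) -> bool:
    """True when the draft should not be sent to the reviewer model."""
    return bool(sensitive_hits(draft))
```

A false positive here only means a draft goes out without review, so erring toward over-matching is the privacy-safe direction.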

Exceptions

These situations do NOT need review:

HEARTBEAT_OK
System-level acks (tool results, NO_REPLY)
Purely mechanical confirmations with zero opinion content
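
The first two exceptions can be checked mechanically, as in the sketch below. The is_tool_result flag is a hypothetical input that the caller would have to supply, and the third exception (purely mechanical confirmations) is a judgment call that this sketch leaves to the agent.

```python
# Literal messages the skill lists as exempt from review.
EXEMPT_MESSAGES = {"HEARTBEAT_OK", "NO_REPLY"}


def needs_review(message: str, is_tool_result: bool = False) -> bool:
    """Return False for system-level acks, True for everything else."""
    if is_tool_result:  # hypothetical flag set by the caller
        return False
    return message.strip() not in EXEMPT_MESSAGES
```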

Why This Matters

LLM outputs can contain subtle errors, missing context, or tone issues that are easy to miss from the creator's perspective. A second "pair of eyes" that is explicitly adversarial catches problems before they reach the user. This is the agent equivalent of code review.

Note: This review step adds latency and token usage per response.

Technical Details

No special configuration needed. To disable review, uninstall this skill.

Data source: ClawHub · Chinese localization: 龙虾技能库
