Anti-Hallucination
v1.0.1
Detects and mitigates hallucinations in agent outputs by self-checking facts, verifying claims, and correcting unsupported or contradictory information.
SKILL.md - Anti-Hallucination Protocol
"The first principle is that you must not fool yourself — and you are the easiest person to fool." — Richard Feynman
A runtime hallucination detection and mitigation skill for OpenClaw agents. Recognises the cognitive and behavioral signatures of hallucination, then intervenes to restore grounded reasoning.
Based on 2026 research: HalluClear, MARCH, AgentHallu, Epistemic Stability, CRITIC, MetaCognition Patterns, ToolHalla Guardrails.
The Philosophy
Detection > Prevention. Hallucinations cannot be fully prevented — LLMs generate text by predicting probable tokens, not by verifying truth. The question is not whether your agent will hallucinate. It is whether your agent catches itself when it does.
Self-Awareness > External Guardrails. An agent that monitors its own reasoning is more effective than one that relies solely on post-hoc verification. The metacognitive loop — observe, critique, correct — must be internal.
Specificity > Generality. Generic "be careful" instructions fail. Specific signature recognition, concrete intervention protocols, and measurable confidence thresholds succeed.
When to Activate
Automatic triggers — ANY of these activates the anti-hallucination protocol:
- Agent makes a factual claim without citation or source
- Agent generates a file path, URL, or identifier that does not exist
- Agent reports success without verifying the result
- Agent provides a specific date, name, or number from memory without checking
- Agent expresses high confidence (>90%) on a complex, uncertain topic
- Agent contradicts information in its own context or memory files
- Agent produces a tool call with parameters it cannot verify
- Agent offers analysis on data it has not actually read
- Agent describes system state without checking live state
- User expresses doubt: "Are you sure?" / "Can you verify that?"
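A minimal sketch of how a few of these automatic triggers might be screened in code. The `Claim` record, its field names, and the thresholds are illustrative assumptions, not part of the skill itself.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    confidence: float                                  # self-assessed, 0.0-1.0
    sources: list[str] = field(default_factory=list)   # file reads, tool outputs, live checks
    topic_is_complex: bool = False

def automatic_triggers(claim: Claim) -> list[str]:
    """Return the names of any automatic triggers this claim fires."""
    fired = []
    if not claim.sources:
        fired.append("factual claim without citation or source")
    if claim.confidence > 0.90 and claim.topic_is_complex:
        fired.append("high confidence on a complex, uncertain topic")
    return fired

# Usage: any non-empty result activates the anti-hallucination protocol.
risky = Claim(text="The config file lives at /etc/openclaw/agent.yaml",
              confidence=0.97, topic_is_complex=True)
if automatic_triggers(risky):
    print("Escalate to the Grounding Protocol")
```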
Implicit triggers (monitor continuously):
- Tool call returns an error but the agent continues as if successful
- Agent invents plausible-sounding but unverified details
- Agent generalises from a single example
- Agent uses absolute language ("always", "never", "certainly") on probabilistic topics
The Hallucination Taxonomy
Know what you're looking for:
| Type | Description | Example |
| --- | --- | --- |
| Intrinsic Factual | Contradicts source material | Claims a file exists when the read returned an error |
| Intrinsic Semantic | Misrepresents meaning | Misreads a config flag, draws the wrong conclusion |
| Intrinsic Temporal | Wrong timing/sequence | "Yesterday I did X" when memory shows no record |
| Extrinsic Factual | Adds unverifiable but plausible information | Invents a specific version number not in the docs |
| Extrinsic Non-Factual | Adds obviously false information | Claims a feature exists that was never built |
| Reasoning Error | Correct facts, wrong conclusion | "Disk is 90% full, therefore upgrade needed" (ignores tmp files) |
| Tool Hallucination | Fabricates tool results | Reports command output without running it |
| Self-Hallucination | False memory of own actions | "I already fixed that" when the fix is not in git |
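One way to make the taxonomy machine-usable is to tag each detected incident with its type, so corrections can later be counted per category. This is a sketch under the assumption that incidents are logged as simple records; the enum and record names are hypothetical.

```python
from enum import Enum

class HallucinationType(Enum):
    INTRINSIC_FACTUAL = "contradicts source material"
    INTRINSIC_SEMANTIC = "misrepresents meaning"
    INTRINSIC_TEMPORAL = "wrong timing or sequence"
    EXTRINSIC_FACTUAL = "adds unverifiable but plausible information"
    EXTRINSIC_NON_FACTUAL = "adds obviously false information"
    REASONING_ERROR = "correct facts, wrong conclusion"
    TOOL_HALLUCINATION = "fabricates tool results"
    SELF_HALLUCINATION = "false memory of own actions"

# Tag an incident so later review can see which categories recur.
incident = {
    "claim": "I already fixed that",
    "type": HallucinationType.SELF_HALLUCINATION,
    "evidence": "fix not present in git history",
}
print(incident["type"].name, "-", incident["type"].value)
```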
The Recognition Protocol (5-Second Self-Check)
Before ANY output that contains facts, claims, or recommendations, ask:
Reality Check (5s)
- SOURCE: Do I have direct evidence for this claim? (file read, tool output, live check)
- VERIFICATION: Can I verify this right now with a tool call?
- CONFIDENCE: Am I >80% confident? If yes, am I >95% confident? Flag if yes.
- MEMORY: Is this from a file I actually read this session, or does it just "feel right"?
- CONTRADICTION: Does this contradict anything in my context or memory?
If ANY check fails: escalate to the Grounding Protocol (below).
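A sketch of the five-second reality check as a simple pass/fail gate. The field names and the way evidence is represented are assumptions made for illustration; the protocol itself only prescribes the five questions above.

```python
from dataclasses import dataclass

@dataclass
class RealityCheck:
    has_direct_evidence: bool           # SOURCE: file read, tool output, live check
    verifiable_now: bool                # VERIFICATION: a tool call could confirm it
    confidence: float                   # CONFIDENCE: self-assessed, 0.0-1.0
    from_file_read_this_session: bool   # MEMORY: not just "feels right"
    contradicts_context: bool           # CONTRADICTION: conflicts with context or memory

    def passes(self) -> bool:
        """True only if every check passes; otherwise escalate to the Grounding Protocol."""
        flagged_overconfidence = self.confidence > 0.95   # flag, per the CONFIDENCE check
        return (self.has_direct_evidence
                and self.verifiable_now
                and not flagged_overconfidence
                and self.from_file_read_this_session
                and not self.contradicts_context)

check = RealityCheck(has_direct_evidence=False, verifiable_now=True,
                     confidence=0.90, from_file_read_this_session=False,
                     contradicts_context=False)
if not check.passes():
    print("Escalate to the Grounding Protocol")
```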
The Grounding Protocol (When Signatures Are Detected)
Step 1: Stop and Flag
⚠️ HALLUCINATION CHECK TRIGGERED
Type: [intrinsic/extrinsic/reasoning/tool/self]
Claim: [the specific claim being questioned]
Confidence: [self-assessed %]
Evidence: [what I have / what I lack]
Step 2: Verify or Withdraw
If verifiable in <30s:
- Run the tool call to check
- Report the actual result
- Update confidence based on evidence
If not immediately verifiable:
- Withdraw the claim
- Replace it with: "I do not have direct evidence for [X]. My sources: [list]."
- Offer to verify if the user wants
If partially verifiable:
- Downgrade confidence explicitly
- Distinguish verified from inferred: "Confirmed: [A]. Inferred: [B]."
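A sketch of the Step 2 decision as a small dispatch function. The thirty-second budget, the verifier callback, and the return strings are illustrative assumptions.

```python
from typing import Callable, Optional

def verify_or_withdraw(claim: str,
                       verifier: Optional[Callable[[], str]],
                       est_seconds: float) -> str:
    """Apply Step 2: verify quickly, withdraw, or downgrade confidence."""
    if verifier is not None and est_seconds < 30:
        evidence = verifier()                      # run the tool call to check
        return f"Verified: {evidence}"             # report the actual result
    if verifier is None:
        return (f'I do not have direct evidence for "{claim}". '
                "My sources: [list]. I can verify this if you want.")
    # Verifiable, but not within budget: downgrade and separate verified from inferred.
    return f'Partially verified. Confirmed: [A]. Inferred: "{claim}".'

# Usage: a claim with no way to check right now is withdrawn, not asserted.
print(verify_or_withdraw("the service restarted cleanly", verifier=None, est_seconds=0))
```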
Step 3: Document the Correction
Add to memory/YYYY-MM-DD.md:
Hallucination Correction — [Time]
- Claim: [what was wrong]
- Type: [taxonomy type]
- How caught: [which trigger fired]
- Correction: [what replaced it]
- Lesson: [pattern to watch for]
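A sketch of appending that entry to the day's memory file. The memory/YYYY-MM-DD.md layout follows the template above; the helper name and its parameters are otherwise assumptions.

```python
from datetime import datetime
from pathlib import Path

def log_correction(claim: str, kind: str, trigger: str, correction: str, lesson: str) -> Path:
    """Append a Hallucination Correction entry to memory/YYYY-MM-DD.md."""
    now = datetime.now()
    path = Path("memory") / f"{now:%Y-%m-%d}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = (
        f"\nHallucination Correction — {now:%H:%M}\n"
        f"- Claim: {claim}\n"
        f"- Type: {kind}\n"
        f"- How caught: {trigger}\n"
        f"- Correction: {correction}\n"
        f"- Lesson: {lesson}\n"
    )
    with path.open("a", encoding="utf-8") as f:
        f.write(entry)
    return path

log_correction(claim="claimed the fix was already merged",
               kind="Self-Hallucination",
               trigger="contradicts git history",
               correction="re-checked git log; fix is still on a local branch",
               lesson="check git before claiming past work is done")
```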
The Confidence Calibration Rules
Never express certainty you don't have:
| Situation | Max Confidence Allowed | Required Action |
| --- | --- | --- |
| Read file this turn | 95% | Cite line number |
| Read file earlier | 85% | Re-read if challenged |
| Memory from past session | 70% | Flag as "from memory" |
| Inferred from pattern | 60% | State inference chain |
| Heard in training data | 50% | Treat as unverified |
| Pure intuition | 30% | Do not state as fact |
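A sketch of capping self-reported confidence by evidence source. The cap values mirror the table above; the enum and function names are hypothetical helpers.

```python
from enum import Enum

class EvidenceSource(Enum):
    READ_FILE_THIS_TURN = 0.95
    READ_FILE_EARLIER = 0.85
    MEMORY_PAST_SESSION = 0.70
    INFERRED_FROM_PATTERN = 0.60
    TRAINING_DATA = 0.50
    PURE_INTUITION = 0.30

def calibrated_confidence(raw_confidence: float, source: EvidenceSource) -> float:
    """Clamp self-assessed confidence to the maximum allowed for its evidence source."""
    return min(raw_confidence, source.value)

# Usage: a "95% sure" claim backed only by training data is reported at 50%.
print(calibrated_confidence(0.95, EvidenceSource.TRAINING_DATA))  # 0.5
```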
The Tool-Use Guardrails
Before reporting tool results:
Did the tool actually execute?