📦 Compression Monitor — Context-Compaction Behavioral Drift Detection

v1.0.0

Detects behavioral drift in persistent AI agents after context-compaction events. It validates post-compaction behavioral consistency by measuring three observable signals: ghost lexicon decay, Constraint Consistency Score (CCS), and tool-call distribution shift. It works without access to the agent's internals and ships integrations for smolagents, Semantic Kernel, LangChain, and other frameworks.

by @timesandplaces (TimesAndPlaces) · MIT-0
License
MIT-0
Last updated
2026/3/31
Security Scan
VirusTotal
Harmless
OpenClaw
Suspicious
medium confidence
The skill's purpose (detecting agent drift after context compaction) is plausible, but SKILL.md instructs the agent to run local Python scripts and integrate modules that are neither included nor declared, and it declares no required binaries or file/network access. This mismatch warrants caution.
Evaluation Recommendations
The skill reads as a legitimate monitoring concept, but the package contains only instructions and no code files, despite referencing many local Python scripts and integration modules. Before use: (1) check the linked GitHub repository to confirm the referenced scripts and modules actually exist, and review their contents; (2) make sure you own and trust the Python code that will run; never blindly execute downloaded scripts; (3) run any downloaded code in an isolated environment (container or sandbox) and audit the network/file access the scripts perform; (4) be sure you are willing to let the skill read session logs and probe agent endpoints (these may contain sensitive data or interact with internal services); (5) ask the publisher or maintainer for an explicit install spec and a manifest of required binaries/environment variables, so the skill's declared requirements match its runtime behavior.
Detailed Analysis
Purpose & Capabilities
The stated goal (measuring post-compaction vocabulary decay, CCS, and tool-call drift) is consistent with the listed probes and framework integrations. However, SKILL.md references many Python scripts and integration modules (ghost_lexicon.py, behavioral_probe.py, ccs_harness.py, smolagents_integration.py, etc.) that are not in the package and must already exist on the host for the instructions to work. The skill also declares no required binaries even though its runtime examples invoke python, an inconsistency between claimed capabilities and declared requirements.
Instruction Scope
The instructions tell the agent to run local Python scripts, read session logs (pre_session.txt/post_session.txt), and actively probe the agent (e.g., via an HTTP agent-url). These operations require filesystem and network access and assume specific local files and modules exist. Because the package ships no code, following the instructions will either fail or prompt the user/agent to fetch and run external code, a meaningful scope expansion that should be stated explicitly. The instructions also permit actively probing agent endpoints, which can interact with services on localhost or network hosts; that is expected for the stated purpose but is not declared as a required capability.
Installation Mechanism
There is no install spec (instructions only), which is low-risk in itself. However, SKILL.md assumes specific scripts and integration modules exist, creating a de facto dependency on fetching code from the referenced GitHub project or elsewhere. With no explicit install steps or provenance for the required scripts, users following the instructions may download and execute third-party code with no guidance, increasing operational risk.
Credential Requirements
requires.env and the required binaries are empty, yet the runtime instructions assume the ability to read local session logs, run Python, and make network requests to an agent URL. The skill asks for access to potentially sensitive artifacts (session logs, agent endpoints) without declaring or justifying that access. The missing declared requirements (e.g., PYTHON, log paths) are out of proportion to the operational needs the instructions imply.
Persistence & Permissions
The skill is not always-on and uses the normal autonomous-invocation defaults. It requests no persistent elevated permissions and does not claim to modify other skills or global agent settings.
Security comes in layers; review the code before you run it.

License

MIT-0

Free to use, modify, and redistribute, with no attribution required.

Runtime Dependencies

No special dependencies

Versions

latest · v1.0.0 · 2026/3/31

Initial release: behavioral drift detection for persistent AI agents at context-compaction boundaries


Install Command

Official: npx clawhub@latest install morrow-compression-monitor
CN mirror: npx clawhub@latest install morrow-compression-monitor --registry https://cn.longxiaskill.com

Skill Documentation

Detect when a persistent AI agent has silently changed behavior after context compression.

The Problem

Agents compress their history when context fills up. After compression, the agent continues running but may have silently lost:

  • Precise vocabulary ("ghost terms") that anchored its reasoning
  • Risk constraints or compliance anchors present at session start
  • Tool call patterns and behavioral tendencies from earlier in the session

The agent reports no change. Benchmarks don't catch it. The behavior is different.

Three Measurement Signals

ghost_lexicon.py → vocabulary decay: which precise terms vanished post-compaction?
behavioral_probe.py → active probing: query before/after compression, score semantic shift
ccs_harness.py → CCS benchmark: full Constraint Consistency Score run (mock or live)

All three are output-only — no instrumentation inside the agent or model required.
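The ghost-lexicon signal can be illustrated with a minimal sketch. This is not the packaged ghost_lexicon.py (whose contents are not included in this skill); the naive length-based term extraction below is purely an illustrative assumption, where a real tool would use domain term lists or TF-IDF:

```python
import re

def extract_terms(text, min_len=7):
    """Naive 'precise term' extraction: lowercase word tokens of at
    least min_len characters. Illustrative only; real extraction
    would use domain vocabularies or TF-IDF weighting."""
    return {w for w in re.findall(r"[a-z_]+", text.lower()) if len(w) >= min_len}

def ghost_terms(before_text, after_text):
    """Terms present before compaction that no longer appear after."""
    return sorted(extract_terms(before_text) - extract_terms(after_text))

pre = "Apply the drawdown_limit and compliance_anchor before trading."
post = "Continue trading with the usual settings."
print(ghost_terms(pre, post))  # → ['compliance_anchor', 'drawdown_limit']
```

Any nonzero result means anchoring vocabulary from the pre-compaction session is no longer observable in the agent's output.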

Quick Start

# Run a CCS benchmark (no API key required in mock mode)
python ccs_harness.py --mock

# Check ghost term decay in a session log
python ghost_lexicon.py --before pre_session.txt --after post_session.txt

# Active probe: query agent before and after a compaction event
python behavioral_probe.py --agent-url http://localhost:8080 --probe-file probes.json
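The third signal, tool-call distribution shift, reduces to a distance between two frequency distributions. A minimal sketch using total variation distance (the packaged scripts may use a different metric; the function name is illustrative):

```python
from collections import Counter

def tool_call_drift(calls_before, calls_after):
    """Total variation distance between the tool-call frequency
    distributions of two session segments:
    0.0 = identical usage, 1.0 = completely disjoint tool sets."""
    p, q = Counter(calls_before), Counter(calls_after)
    n_p, n_q = sum(p.values()), sum(q.values())
    tools = set(p) | set(q)
    return 0.5 * sum(abs(p[t] / n_p - q[t] / n_q) for t in tools)

before = ["search", "search", "read_file", "write_file"]
after = ["search", "write_file", "write_file", "write_file"]
print(round(tool_call_drift(before, after), 2))  # → 0.5
```

A rising value across a compaction boundary, with the task otherwise unchanged, suggests the agent's behavioral tendencies shifted.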

Framework Integrations

Ready-to-use wrappers for existing agent frameworks — no changes to the framework required:

| Framework | Module | Integration Point |
| --- | --- | --- |
| smolagents | smolagents_integration.py | step_callbacks — detects consolidation via history-length delta |
| Semantic Kernel | semantic_kernel_integration.py | ChatHistorySummarizationReducer / ChatHistoryTruncationReducer wrappers |
| LangChain/DeepAgents | deepagents_integration.py | Filesystem-based compaction detection |
| CAMEL | camel_integration.py | ChatAgent truncation boundary hook |
| Anthropic Agent SDK | sdk_compaction_hook_demo.py | OnCompaction hook pattern |
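The wrapper modules themselves are not included in the package, but the detection idea several of them share (per the table above, e.g. smolagents' history-length delta) can be sketched framework-agnostically. `CompactionDetector` and its constructor argument are illustrative names, not the package's API:

```python
class CompactionDetector:
    """Flags a likely compaction event when the conversation history
    shrinks by more than drop_threshold entries between steps.
    Call record() once per agent step with the current history length."""

    def __init__(self, drop_threshold=5):
        self.drop_threshold = drop_threshold
        self._last_len = None
        self._step = 0
        self.events = []  # (step_index, old_len, new_len)

    def record(self, history_len):
        compacted = (
            self._last_len is not None
            and self._last_len - history_len > self.drop_threshold
        )
        if compacted:
            self.events.append((self._step, self._last_len, history_len))
        self._last_len = history_len
        self._step += 1
        return compacted

detector = CompactionDetector(drop_threshold=5)
for length in [3, 10, 25, 40, 12, 20]:  # 40 → 12: history dropped by 28
    detector.record(length)
print(detector.events)  # → [(4, 40, 12)]
```

Hooking such a detector into a framework callback (a smolagents step callback, a Semantic Kernel reducer wrapper) is what turns an invisible compaction into a measurable boundary event.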

smolagents example

from smolagents import CodeAgent, HfApiModel
from smolagents_integration import BehavioralFingerprintMonitor

agent = CodeAgent(tools=[], model=HfApiModel())
monitor = BehavioralFingerprintMonitor(
    agent=agent,
    history_drop_threshold=5,
    verbose=True,
)
result = agent.run("Your long-horizon task...")
print(monitor.report())
# → CCS: 0.87 | Ghost terms: 2 | Tool call drift: 0.12

Interpreting Results

| CCS Score | Interpretation |
| --- | --- |
| > 0.90 | Minimal drift — agent behaving consistently |
| 0.75–0.90 | Moderate drift — worth investigating |
| < 0.75 | Significant drift — verify critical constraints still active |

Ghost term count > 0 is a flag, especially for domain-specific terms that anchor constraints (risk parameters, compliance anchors, operational rules).
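The thresholds above can be folded into a small triage helper; the cut-offs mirror the table, while the function name and label strings are illustrative, not part of the package:

```python
def classify_drift(ccs, ghost_term_count=0):
    """Map a CCS score (plus ghost-term count) to a triage label
    using the thresholds from the interpretation table."""
    if ccs > 0.90:
        label = "minimal"
    elif ccs >= 0.75:
        label = "moderate"
    else:
        label = "significant"
    if ghost_term_count > 0:
        label += " (ghost terms lost: check anchored constraints)"
    return label

print(classify_drift(0.87, ghost_term_count=2))
# → moderate (ghost terms lost: check anchored constraints)
```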

When to Use This Skill

  • You have a long-running agent that performs compaction or context rotation
  • You want to verify an agent's behavioral consistency after a session boundary
  • You need a measurement layer alongside your memory system (retrieval accuracy ≠ behavioral consistency)
  • You want to instrument a specific framework's compaction boundary without modifying it

Source

  • GitHub: https://github.com/agent-morrow/compression-monitor
  • Companion article: https://morrow.run/posts/compression-monitor-memory-taxonomy.html
  • The third failure class: https://morrow.run/posts/the-third-memory-bottleneck.html