
Code plugin · Security

TurboQuant — Context Compression

v1.0.0

An OpenClaw context-compression plugin inspired by Google's TurboQuant. It splits the conversation into a hot cache (the most recent N turns) and a cold cache (older content compressed to 25%), cutting token usage by 40–70% on long sessions.

by @boehner · MIT
License: MIT
Last updated: 2026/4/7
Security scan: VirusTotal — Harmless
OpenClaw security review: safe (high confidence)

The plugin's code, instructions, and metadata are internally consistent: it implements a gateway extractive-compression hook that matches its stated purpose, and it requests no credentials or dangerous install steps.

Security is layered; review the code before running it.

License

MIT

Free to use, modify, and redistribute; the copyright notice must be retained.

Versions

latest · v1.0.0 · 2026/4/7 · ● Harmless

Install command

Official: npx clawhub@latest install openclaw-turboquant
Mirror (CN): npx clawhub@latest install openclaw-turboquant --registry https://cn.clawhub-mirror.com

Plugin documentation

TurboQuant — Context Compression Plugin for OpenClaw

Inspired by Google's TurboQuant (ICLR 2026), this plugin brings the same hot/cold cache compression principle to OpenClaw at the application layer.

What it does

Every time you send a message to an AI, the entire conversation history goes along with it. The longer the session, the more tokens you're burning — and tokens cost money.

TurboQuant splits your conversation into two zones:

  • Hot cache — the last N turns, kept verbatim (full fidelity)
  • Cold cache — everything older, compressed to ~25% of original size

Net result: 40–70% fewer tokens sent on long sessions. Same quality, lower cost, faster responses.
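The hot/cold split can be sketched in a few lines of TypeScript (illustrative only; `Turn` and `splitCache` are not the plugin's actual API):

```typescript
// Split a conversation into a hot zone (kept verbatim) and a cold zone
// (sent to the compressor). Hypothetical types and names for illustration.
interface Turn {
  role: "user" | "assistant";
  content: string;
}

function splitCache(
  history: Turn[],
  keepRecentTurns: number
): { hot: Turn[]; cold: Turn[] } {
  // Everything before the cut point is cold; the rest stays verbatim.
  const cut = Math.max(history.length - keepRecentTurns, 0);
  return { hot: history.slice(cut), cold: history.slice(0, cut) };
}

// Example: a 10-turn conversation with keepRecentTurns = 6.
const history: Turn[] = Array.from({ length: 10 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `turn ${i}`,
}));
const { hot, cold } = splitCache(history, 6);
console.log(hot.length, cold.length); // 6 4
```

Conversations shorter than `keepRecentTurns` end up entirely in the hot zone, which matches the `minTurnsBeforeCompression` guard described below.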

Install

  1. Clone into your OpenClaw extensions folder:
git clone https://github.com/Boehner/openclaw-turboquant ~/.openclaw/extensions/turboquant
  2. Add to your openclaw.json:
{
  "plugins": {
    "allow": ["turboquant"],
    "load": {
      "paths": ["/path/to/.openclaw/extensions/turboquant"]
    },
    "entries": {
      "turboquant": {
        "enabled": true,
        "config": {
          "keepRecentTurns": 6,
          "compressionRatio": 0.25,
          "minTurnsBeforeCompression": 10
        }
      }
    }
  }
}
  3. Restart the OpenClaw gateway.

Configuration

| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `true` | Enable/disable the plugin |
| `keepRecentTurns` | `6` | Number of recent turns to keep uncompressed (hot cache) |
| `compressionRatio` | `0.25` | Target size for compressed turns (0.25 = 25% of original) |
| `minTurnsBeforeCompression` | `10` | Don't compress until the conversation has this many turns |

How it works

Uses extractive summarization — scores every sentence by information density (term frequency × position), keeps the highest-value sentences, drops the rest. No AI calls needed for compression — it's fast, deterministic, and free.

The algorithm mirrors TurboQuant's core insight: not all context is equally important. Recent turns matter most. Old turns can be compressed aggressively without hurting response quality.

Expected savings

On a 30-turn conversation:

  • Without TurboQuant: ~3,200 tokens of history sent per request
  • With TurboQuant: ~1,100 tokens (hot: 6 turns verbatim, cold: 24 turns at 25%)
  • Savings: ~2,100 tokens per request (~66%)
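As a rough sanity check, the arithmetic can be written as an estimator. Note that this assumes uniform tokens per turn, which gives ~1,280 tokens rather than the README's ~1,100 (the quoted figure presumably reflects shorter older turns); the function name is illustrative:

```typescript
// Back-of-envelope estimate of post-compression history size, assuming
// every turn carries the same number of tokens. Hypothetical helper.
function estimateTokens(
  totalTokens: number,
  turns: number,
  keepRecentTurns: number,
  compressionRatio: number
): number {
  const perTurn = totalTokens / turns;
  const hot = Math.min(keepRecentTurns, turns) * perTurn; // verbatim
  const cold =
    Math.max(turns - keepRecentTurns, 0) * perTurn * compressionRatio;
  return Math.round(hot + cold);
}

console.log(estimateTokens(3200, 30, 6, 0.25)); // 1280, vs 3200 uncompressed
```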

License

MIT

Data source: ClawHub ↗ · Chinese localization: Lobster Skill Library