TurboQuant: Context Compression

Name: TurboQuant (Context Compression)
Author: boehner

v1.0.0

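An OpenClaw context-compression plugin inspired by Google's TurboQuant. It splits the conversation into a hot cache (the most recent N turns) and a cold cache (older content compressed to 25%), cutting tokens by 40-70% on long sessions.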

by @boehner·MIT

Tags: AI model access, developer tools, automation

License

MIT

Last updated

2026/4/7

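Security scan

VirusTotal: benign

OpenClaw: safe (high confidence)

The plugin's code, instructions, and metadata are internally consistent: it implements an extractive-compression hook at the gateway that matches its stated purpose, and it requests no credentials or dangerous installs.

Security is layered; review the code before running it.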

Version

latest: v1.0.0 (2026/4/7), scan result: benign

Install command

Official: npx clawhub@latest install openclaw-turboquant

Mirror: npx clawhub@latest install openclaw-turboquant --registry https://cn.clawhub-mirror.com

Plugin documentation

TurboQuant — Context Compression Plugin for OpenClaw

Inspired by Google's TurboQuant (ICLR 2026), this plugin brings the same hot/cold cache compression principle to OpenClaw at the application layer.

What it does

Every time you send a message to an AI, the entire conversation history goes along with it. The longer the session, the more tokens you're burning — and tokens cost money.

TurboQuant splits your conversation into two zones:

Hot cache — the last N turns, kept verbatim (full fidelity)
Cold cache — everything older, compressed to ~25% of original size

Net result: 40–70% fewer tokens sent on long sessions. Same quality, lower cost, faster responses.
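
The hot/cold split described above can be sketched in a few lines. This is only an illustration, not the plugin's actual code: the function name `split_hot_cold` is hypothetical, and the parameter names are Python-style renderings of the `keepRecentTurns` and `minTurnsBeforeCompression` options.

```python
from typing import List, Tuple


def split_hot_cold(
    turns: List[str],
    keep_recent_turns: int = 6,
    min_turns_before_compression: int = 10,
) -> Tuple[List[str], List[str]]:
    """Return (cold, hot): older turns to compress, recent turns kept verbatim."""
    # Short conversations are left alone: every turn stays in the hot cache.
    if len(turns) < min_turns_before_compression:
        return [], list(turns)
    # Index where the hot cache starts; clamp so it never goes negative.
    cut = max(len(turns) - keep_recent_turns, 0)
    return list(turns[:cut]), list(turns[cut:])


turns = [f"turn {i}" for i in range(30)]
cold, hot = split_hot_cold(turns)
print(len(cold), len(hot))
```

With the defaults, a 30-turn conversation yields 24 cold turns and 6 hot turns, while a 9-turn conversation is passed through untouched.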

Install

Clone into your OpenClaw extensions folder:

git clone https://github.com/Boehner/openclaw-turboquant ~/.openclaw/extensions/turboquant

Add to your openclaw.json:

{
  "plugins": {
    "allow": ["turboquant"],
    "load": {
      "paths": ["/path/to/.openclaw/extensions/turboquant"]
    },
    "entries": {
      "turboquant": {
        "enabled": true,
        "config": {
          "keepRecentTurns": 6,
          "compressionRatio": 0.25,
          "minTurnsBeforeCompression": 10
        }
      }
    }
  }
}

Restart the OpenClaw gateway.

Configuration

| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `true` | Enable/disable the plugin |
| `keepRecentTurns` | `6` | Number of recent turns to keep uncompressed (hot cache) |
| `compressionRatio` | `0.25` | Target size for compressed turns (0.25 = 25% of original) |
| `minTurnsBeforeCompression` | `10` | Don't compress until conversation has this many turns |
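
To make the defaults concrete, here is a minimal sketch of how user-supplied options could be merged with the defaults from the table. The `TurboQuantConfig` class, the `load_config` helper, and the range checks are all assumptions made for illustration; the plugin itself may validate differently or not at all.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class TurboQuantConfig:
    # Defaults copied from the options table.
    enabled: bool = True
    keep_recent_turns: int = 6
    compression_ratio: float = 0.25
    min_turns_before_compression: int = 10


# Maps the camelCase option names to the field names above.
_KEY_MAP = {
    "enabled": "enabled",
    "keepRecentTurns": "keep_recent_turns",
    "compressionRatio": "compression_ratio",
    "minTurnsBeforeCompression": "min_turns_before_compression",
}


def load_config(raw: Dict[str, Any]) -> TurboQuantConfig:
    """Overlay user options on the defaults and reject obviously bad values."""
    unknown = set(raw) - set(_KEY_MAP)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    cfg = TurboQuantConfig(**{_KEY_MAP[k]: v for k, v in raw.items()})
    # Assumed sanity checks: a ratio in (0, 1] and non-negative turn counts.
    if not 0.0 < cfg.compression_ratio <= 1.0:
        raise ValueError("compressionRatio must be in (0, 1]")
    if cfg.keep_recent_turns < 0 or cfg.min_turns_before_compression < 0:
        raise ValueError("turn counts must be non-negative")
    return cfg


print(load_config({"keepRecentTurns": 8}))
```

Any option that is omitted falls back to its default, so an empty object behaves the same as the configuration shown in the install section.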

How it works

The plugin uses extractive summarization: it scores every sentence by information density (term frequency × position), keeps the highest-scoring sentences, and drops the rest. Compression needs no AI calls, so it is fast, deterministic, and free.
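
As a rough illustration of that scoring step, the sketch below ranks sentences by mean term frequency times a position weight and keeps the best ones within a size budget. The details are assumptions, since the README does not spell them out: the naive sentence splitter, the position weight that favors earlier sentences, the character budget standing in for tokens, and the function names are all hypothetical.

```python
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break on whitespace that follows ., !, or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def compress(text: str, ratio: float = 0.25) -> str:
    sentences = split_sentences(text)
    if len(sentences) <= 1:
        return text  # nothing to select from
    tokenized = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]
    tf = Counter(w for words in tokenized for w in words)

    def score(i: int) -> float:
        words = tokenized[i]
        if not words:
            return 0.0
        density = sum(tf[w] for w in words) / len(words)  # mean term frequency
        # Assumed weighting: 1.0 for the first sentence down to 0.5 for the last.
        position = 1.0 - 0.5 * i / (len(sentences) - 1)
        return density * position

    budget = max(1, int(len(text) * ratio))  # characters as a proxy for tokens
    # Highest score first; ties broken by original order so output is deterministic.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(i), i))
    kept: List[int] = []
    used = 0
    for i in ranked:
        cost = len(sentences[i]) + 1
        # Always keep the top sentence; after that, skip anything over budget.
        if kept and used + cost > budget:
            continue
        kept.append(i)
        used += cost
    # Emit the survivors in their original order.
    return " ".join(sentences[i] for i in sorted(kept))


sample = (
    "The cache stores recent turns. We talked about lunch. "
    "The cache compresses old turns. The cache keeps turns small. Bye now."
)
print(compress(sample))
```

Because the selection uses only counting and sorting, the same input always produces the same output, which is the property the README relies on when it calls the compression deterministic.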

The algorithm mirrors TurboQuant's core insight: not all context is equally important. Recent turns matter most. Old turns can be compressed aggressively without hurting response quality.

Expected savings

On a 30-turn conversation:

Without TurboQuant: ~3,200 tokens of history sent per request
With TurboQuant: ~1,100 tokens (hot: 6 turns verbatim, cold: 24 turns at 25%)
Savings: ~2,100 tokens per request (~66%)
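
The figures above can be sanity-checked with a back-of-the-envelope model. The sketch assumes every turn has the same length and that cold turns land exactly on the 25% target; both are simplifications, and the helper names are hypothetical.

```python
def kept_fraction(total_turns: int, keep_recent: int = 6, ratio: float = 0.25) -> float:
    """Fraction of history tokens still sent, assuming equal-length turns."""
    hot = min(keep_recent, total_turns)
    cold = total_turns - hot
    return (hot + ratio * cold) / total_turns


def estimate(history_tokens: int, total_turns: int) -> int:
    """Estimated tokens sent after compression under the same assumption."""
    return round(history_tokens * kept_fraction(total_turns))


print(kept_fraction(30), estimate(3200, 30))
```

Under these assumptions a 30-turn, 3,200-token history shrinks to about 1,280 tokens, a 60% saving. The ~1,100 figure quoted above is lower, which would be consistent with the six most recent turns being shorter than average (roughly 400 tokens in total). The same model gives about 40% savings at 13 turns and 70% at 90 turns, which brackets the 40–70% range claimed for long sessions.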

License

MIT

Data source: ClawHub. Chinese localization: 龙虾技能库 (Lobster Skill Library).
