TurboQuant+ KV Cache Compression: 6.4x KV cache compression

v1.0.0

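TurboQuant+ losslessly compresses the llama.cpp KV cache by up to 6.4x on Apple Silicon. GPU memory use drops sharply, so larger models and longer contexts fit, with almost no loss of inference speed.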
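As a rough illustration of what a 6.4x ratio means in practice, the sketch below estimates KV cache size for a given context length. It assumes a 16-bit (f16) baseline cache and uses illustrative model dimensions (32 layers, 8 KV heads, head dimension 128, 32,768 tokens) that are not taken from this page; the function name is hypothetical. Under those assumptions the baseline cache is 4 GiB, 6.4x brings it to about 0.625 GiB, and the effective storage is 16 / 6.4 = 2.5 bits per cached value.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_value=16.0):
    """Estimate KV cache size: one K and one V vector per layer, KV head, and token."""
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = keys + values
    return values * bits_per_value / 8


# Illustrative dimensions (assumed, not from this page).
baseline = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)
compressed = baseline / 6.4  # compression ratio claimed in the description

print(f"16-bit cache: {baseline / 2**30:.3f} GiB")
print(f"at 6.4x:      {compressed / 2**30:.3f} GiB")
print(f"effective bits per value: {16 / 6.4:.2f}")
```

The estimate covers only the KV cache; model weights and compute buffers come on top of it.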


by @wukai8289

Tags: AI model access


Last updated: 2026/4/4

Security scan

VirusTotal: suspicious

OpenClaw: safe (high confidence)

The skill's instructions, required actions, and lack of credentials align with its stated purpose of configuring TurboQuant+ for llama.cpp on Apple Silicon; it does recommend building third‑party code and making a privileged sysctl change, which are expected for this use but carry normal operational risk.

Assessment recommendations

This skill appears coherent for configuring TurboQuant+ with llama.cpp, but follow these precautions before proceeding: 1) Verify the external GitHub fork (TheTom/llama-cpp-turboquant) is the intended project and review its source/commit history before building. 2) Build and run the code in an isolated or trusted environment (container, dedicated machine) if possible. 3) Be cautious with the suggested sudo sysctl change (iogpu.wired_limit_mb): it requires elevated privileges and changes system G...

Detailed analysis

✓ Purpose and capabilities

Name/description claim KV cache compression for llama.cpp on Apple Silicon; the SKILL.md and README exclusively describe using a TurboQuant llama.cpp fork, relevant CLI flags, and platform-specific tuning. No unrelated credentials, binaries, or services are requested.

ℹ Instruction scope

Instructions stay on-topic (clone/build the turboquant fork, run llama-server with cache-type flags). They also recommend a system-level change (sudo sysctl iogpu.wired_limit_mb) to raise GPU memory caps for large contexts — this is relevant to the stated goal but requires elevated privileges and modifies system state. No instructions collect or transmit user data to unexpected endpoints.
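The run step might look like the following sketch, which only assembles the command line and does not execute it. `--cache-type-k` and `--cache-type-v` are llama-server's KV cache type flags. The TurboQuant+ type names accepted by the fork are not listed on this page, so the cache type is left as a parameter. The helper name, the binary path, the model path, and the `q8_0` value are placeholders.

```python
import shlex


def llama_server_command(model_path, cache_type, n_ctx=8192,
                         binary="./build/bin/llama-server"):
    """Build (but do not run) a llama-server command with a quantized KV cache."""
    return [
        binary,
        "-m", model_path,              # GGUF model file
        "-c", str(n_ctx),              # context length in tokens
        "--cache-type-k", cache_type,  # storage type for cached keys
        "--cache-type-v", cache_type,  # storage type for cached values
    ]


# Placeholder values: substitute the cache type documented by the fork.
cmd = llama_server_command("models/model.gguf", cache_type="q8_0")
print(shlex.join(cmd))
```

Returning an argument list rather than a string keeps the command safe to pass to `subprocess.run` once the values have been checked.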

ℹ Installation mechanism

The skill is instruction-only (no install spec), but its README instructs cloning and building a GitHub repository (TheTom/llama-cpp-turboquant). Downloading and compiling third-party code from GitHub is common for this domain but is a moderate operational risk if the repository is untrusted or has malicious contents. The skill itself does not provide an automated installer or opaque download URLs.
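One way to act on this is to pin the build to a commit that has been reviewed. The sketch below only lists the commands; nothing is executed. It assumes the fork builds with the same CMake steps as upstream llama.cpp, which this page does not confirm. The URL is derived from the repository name above, and the helper name, working directory, and commit placeholder are hypothetical.

```python
import shlex

REPO_URL = "https://github.com/TheTom/llama-cpp-turboquant"  # fork named above


def build_steps(commit, workdir="llama-cpp-turboquant"):
    """Return the commands to clone, inspect, pin, and build; nothing is executed."""
    return [
        ["git", "clone", REPO_URL, workdir],
        ["git", "-C", workdir, "log", "--oneline", "-n", "20"],  # review recent history
        ["git", "-C", workdir, "checkout", commit],              # pin the reviewed commit
        ["cmake", "-S", workdir, "-B", f"{workdir}/build"],      # configure (assumed upstream-style)
        ["cmake", "--build", f"{workdir}/build", "--config", "Release"],
    ]


for step in build_steps(commit="<reviewed-commit-sha>"):
    print(shlex.join(step))
```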

✓ Credential requirements

No environment variables, credentials, or config paths are requested. The requested actions (build/run a local server, sysctl) are proportionate to compressing KV caches for local inference.

✓ Persistence and privileges

Skill does not request persistent inclusion (always: false) and does not attempt to modify other skills or agent-wide configs. It does recommend a one-off privileged sysctl change (requires sudo) which alters system GPU memory limits until reboot; this is a legitimate but privileged action and not an automatic persistent installation by the skill.
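To make the sysctl change concrete, the sketch below computes a limit in megabytes and prints the command rather than running it. The `iogpu.wired_limit_mb` key comes from the analysis above. The helper name and the 8 GB reserve are illustrative assumptions, not values recommended by the skill.

```python
def wired_limit_command(total_ram_gb, reserve_gb=8):
    """Build the sysctl command that raises the GPU wired-memory cap.

    reserve_gb is memory left for macOS and other apps (an assumed default).
    """
    if reserve_gb <= 0 or reserve_gb >= total_ram_gb:
        raise ValueError("reserve_gb must be positive and below total_ram_gb")
    limit_mb = int((total_ram_gb - reserve_gb) * 1024)
    return f"sudo sysctl iogpu.wired_limit_mb={limit_mb}"


print(wired_limit_command(total_ram_gb=64))
# sudo sysctl iogpu.wired_limit_mb=57344
```

Because the setting lasts only until reboot, it has to be reapplied after each restart if large contexts are still needed.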

Security is layered; review the code before running it.

Runtime dependencies

No special dependencies

Versions

latest: v1.0.0 (2026/4/4)

v1.0: TurboQuant+ KV cache compression guide, supporting local LLM inference on Apple Silicon

● Suspicious

Install command

Official: npx clawhub@latest install turboquant-plus

Mirror: npx clawhub@latest install turboquant-plus --registry https://cn.longxiaskill.com (mirror available)

Localization notes

To install TurboQuant+ KV Cache Compression, run: npx clawhub@latest install turboquant-plus. The skill requests no credentials, environment variables, or API keys.
