Claude Code API Optimizer Skill
v2 — Reduce LLM API token consumption by 20-35% through pre-send estimation, memory extraction, and context compression.
Token Optimizer
Reduce your LLM API costs by 20-35% with three proven mechanisms: pre-send token estimation, structured memory extraction, and context compression. Model-agnostic, zero dependencies.
Mechanism 1 — Pre-Send Token Estimation
Estimate the token count before sending a request. If the payload exceeds a threshold, compress or truncate it. Never pay for tokens you could have avoided.
Rules
Estimate before every API call. Use these formulas:
Plain text: tokens ≈ character_count / 4
JSON / structured data: tokens ≈ character_count / 2
Code (mixed): tokens ≈ character_count / 3.5
Images / PDFs: tokens ≈ 2,000 (flat per asset, regardless of size)
Set a token budget per request. Default threshold: 8,000 tokens. Adjust per use case.
If the estimated tokens exceed the budget:
Summarize or truncate the longest sections first. Strip intermediate reasoning; keep conclusions only. For JSON: remove null/empty fields, and shorten keys if feeding a model that doesn't need human-readable keys. For code: send only the relevant function/class, not the full file.
Log the estimate vs. actual usage (from the API response) to calibrate over time.
Example
Input: 24,000 characters of plain text. Estimated tokens: 24000 / 4 = 6,000 → under budget, send as-is.
Input: 40,000 characters of JSON. Estimated tokens: 40000 / 2 = 20,000 → over budget. Action: strip null fields, remove redundant nested objects → 14,000 chars → 7,000 tokens → send.
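The estimation rules above can be sketched as a small helper. A minimal sketch: the divisors and the 8,000-token default budget come from this document, while the function names and the `kind` labels are our own.

```python
# Rough pre-send token estimation using the per-format divisors
# described above. The constants mirror the document's formulas.
CHARS_PER_TOKEN = {"text": 4, "json": 2, "code": 3.5}
FLAT_ASSET_TOKENS = 2000   # images / PDFs: flat cost regardless of size
DEFAULT_BUDGET = 8000      # default per-request token budget

def estimate_tokens(payload: str, kind: str = "text") -> int:
    """Estimate token count for a payload of the given kind."""
    return int(len(payload) / CHARS_PER_TOKEN[kind])

def within_budget(payload: str, kind: str = "text",
                  budget: int = DEFAULT_BUDGET) -> bool:
    """True if the payload can be sent as-is under the budget."""
    return estimate_tokens(payload, kind) <= budget
```

This reproduces the worked example: 24,000 characters of plain text estimate to 6,000 tokens (under budget), while 40,000 characters of JSON estimate to 20,000 tokens (over budget, so compress first).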
Reference
See references/token-formula.md for the full formula breakdown with worked examples.
Mechanism 2 — Memory Extraction
Instead of re-reading the entire conversation history every turn, extract and persist key information into structured memory files. On subsequent turns, load only the memory index — not the raw history.
Rules
Use a lightweight secondary model (Haiku, GPT-4o-mini, Gemini Flash) as the memory extraction agent. Never burn expensive-model tokens on bookkeeping.
Maintain a session cursor. Track which messages have already been processed. On each extraction pass, read only the new messages since the last cursor position.
Limit extraction to 5 rounds max per session. Each round processes a batch of new messages. Stop early if no new information is found.
Parallelize I/O within rounds:
Round 1: all reads in parallel (gather raw content). Round 2: all writes in parallel (persist extracted memories).
Structure memory as index + detail files:
MEMORY.md — index file, max 200 lines. Contains only pointers: - topic-name — one-line description. memory/topic-name.md — full content for each topic, with frontmatter (name, description, type).
Memory types (categorize each entry):
user — who the user is, their preferences, expertise level. feedback — corrections and confirmed approaches (what to do / not do). project — current goals, deadlines, decisions, constraints. reference — pointers to external resources (URLs, dashboards, issue trackers).
Do not store what can be derived. No code snippets, no git history, no file paths — these are always available from the source. Store only non-obvious context.
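The cursor and bounded-rounds rules above can be sketched as a simple loop. This is a hypothetical sketch: `extract_batch` stands in for a call to the cheap secondary model, and `BATCH_SIZE` is an assumption (the document does not fix a batch size).

```python
MAX_ROUNDS = 5   # hard cap on extraction rounds per session
BATCH_SIZE = 10  # assumption: messages processed per round

def run_extraction(messages, cursor, extract_batch):
    """Process only messages past `cursor`, in at most MAX_ROUNDS batches.

    Returns the collected memories and the advanced cursor, so the next
    session pass resumes where this one stopped.
    """
    memories = []
    for _ in range(MAX_ROUNDS):
        batch = messages[cursor:cursor + BATCH_SIZE]
        if not batch:               # nothing new since last cursor position
            break
        found = extract_batch(batch)  # cheap secondary-model call (stub)
        cursor += len(batch)
        if not found:               # stop early: no new information
            break
        memories.extend(found)
    return memories, cursor
```

Persisting the returned cursor alongside MEMORY.md is what lets each pass skip already-processed messages entirely.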
Example — Extraction Prompt
You are a memory extraction agent. Read the following new messages (since cursor position {cursor}).
For each piece of non-obvious information, output a JSON object: { "topic": "short-kebab-case-name", "type": "user | feedback | project | reference", "description": "one-line summary for the index", "content": "full memory content, structured with Why and How-to-apply" }
Rules:
- Max 5 memories per pass.
- Skip anything derivable from code, git, or existing memory.
- Convert relative dates to absolute (today is {date}).
- If a memory already exists for this topic, output an update, not a duplicate.
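Before persisting anything, it helps to validate the extractor's output against the contract the prompt defines. A minimal sketch: the field names and memory types come from this document, but the validation code itself is our own.

```python
import json

# Schema from the extraction prompt above.
REQUIRED = {"topic", "type", "description", "content"}
VALID_TYPES = {"user", "feedback", "project", "reference"}

def parse_memories(raw: str, max_memories: int = 5):
    """Parse and validate the extractor's JSON output.

    Raises ValueError if the batch breaks the prompt's rules.
    """
    items = json.loads(raw)
    if len(items) > max_memories:
        raise ValueError(f"max {max_memories} memories per pass")
    for item in items:
        if not REQUIRED <= item.keys():
            raise ValueError(f"missing fields: {REQUIRED - item.keys()}")
        if item["type"] not in VALID_TYPES:
            raise ValueError(f"unknown memory type: {item['type']}")
    return items
```

Rejecting malformed batches here keeps garbage out of MEMORY.md instead of discovering it on a later read.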
Reference
See references/memory-extraction-pattern.md for the full pattern with prompt templates.
Mechanism 3 — Context Compression
As conversations grow, compress older exchanges into dense summaries. Keep only the last N messages in full fidelity. This prevents the context window from filling with stale reasoning.
Rules
Keep the last 6 messages uncompressed (3 user + 3 assistant). These are "fresh" — they contain active context.
Summarize everything older into a single <compressed-context> block at the top of the conversation. Format:
<compressed-context>
Decisions Made
- Chose PostgreSQL over MongoDB for the user table (reason: relational queries).
- API rate limit set to 100 req/min per user.
Current Status
- Auth module: complete, merged to main.
- Payment integration: in progress, blocked on Stripe webhook config.
Key Constraints
- Must ship by 2026-04-15.
- No breaking changes to public API v2.
</compressed-context>
What to keep in summaries:
Decisions and their rationale. Current status of work (done / in progress / blocked). Constraints and deadlines. User preferences and corrections.
What to discard:
Intermediate reasoning ("I considered X but..."). Exploratory questions that were already answered.