Claude Code API Optimizer Skill
v2 — Reduce LLM API token consumption by 20-35% through pre-send estimation, memory extraction, and context compression.
Token Optimizer
Reduce your LLM API costs by 20-35% with three proven mechanisms: pre-send token estimation, structured memory extraction, and context compression. Model-agnostic, zero dependencies.
Mechanism 1 — Pre-Send Token Estimation
Estimate the token count before sending a request. If the payload exceeds a threshold, compress or truncate it. Never pay for tokens you could have avoided.
Rules
Estimate before every API call. Use these formulas:
Plain text: tokens ≈ character_count / 4
JSON / structured data: tokens ≈ character_count / 2
Code (mixed): tokens ≈ character_count / 3.5
Images / PDFs: tokens ≈ 2,000 (flat per asset, regardless of size)
Set a token budget per request. Default threshold: 8,000 tokens. Adjust per use case.
If the estimated tokens exceed the budget:
Summarize or truncate the longest sections first. Strip intermediate reasoning; keep conclusions only. For JSON: remove null/empty fields, and shorten keys if feeding a model that doesn't need human-readable keys. For code: send only the relevant function/class, not the full file.
Log the estimate vs. actual usage (from the API response) to calibrate over time.
Example
Input: 24,000 characters of plain text. Estimated tokens: 24000 / 4 = 6,000 → under budget, send as-is.
Input: 40,000 characters of JSON. Estimated tokens: 40000 / 2 = 20,000 → over budget. Action: strip null fields, remove redundant nested objects → 14,000 chars → 7,000 tokens → send.
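The estimation rules above can be sketched as a small helper. A minimal sketch: the divisors and the 8,000-token default budget come from this document, while the function names and the `kind` labels are our own.

```python
# Rough pre-send token estimation using the per-format divisors
# described above. The constants mirror the document's formulas.
CHARS_PER_TOKEN = {"text": 4, "json": 2, "code": 3.5}
FLAT_ASSET_TOKENS = 2000   # images / PDFs: flat cost regardless of size
DEFAULT_BUDGET = 8000      # default per-request token budget

def estimate_tokens(payload: str, kind: str = "text") -> int:
    """Estimate token count for a payload of the given kind."""
    return int(len(payload) / CHARS_PER_TOKEN[kind])

def within_budget(payload: str, kind: str = "text",
                  budget: int = DEFAULT_BUDGET) -> bool:
    """True if the payload can be sent as-is under the budget."""
    return estimate_tokens(payload, kind) <= budget
```

This reproduces the worked example: 24,000 characters of plain text estimate to 6,000 tokens (under budget), while 40,000 characters of JSON estimate to 20,000 tokens (over budget, so compress first).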
Reference
See references/token-formula.md for the full formula breakdown with worked examples.
Mechanism 2 — Memory Extraction
Instead of re-reading the entire conversation history every turn, extract and persist key information into structured memory files. On subsequent turns, load only the memory index — not the raw history.
Rules
Use a lightweight secondary model (Haiku, GPT-4o-mini, Gemini Flash) as the memory extraction agent. Never burn expensive-model tokens on bookkeeping.
Maintain a session cursor. Track which messages have already been processed. On each extraction pass, read only the new messages since the last cursor position.
Limit extraction to 5 rounds max per session. Each round processes a batch of new messages. Stop early if no new information is found.
Parallelize I/O within rounds:
Round 1: all reads in parallel (gather raw content). Round 2: all writes in parallel (persist extracted memories).
Structure memory as index + detail files:
MEMORY.md — index file, max 200 lines. Contains only pointers: - topic-name — one-line description. memory/topic-name.md — full content for each topic, with frontmatter (name, description, type).
Memory types (categorize each entry):
user — who the user is, their preferences, expertise level. feedback — corrections and confirmed approaches (what to do / not do). project — current goals, deadlines, decisions, constraints. reference — pointers to external resources (URLs, dashboards, issue trackers).
Do not store what can be derived. No code snippets, no git history, no file paths — these are always available from the source. Store only non-obvious context.
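The cursor and bounded-rounds rules above can be sketched as a simple loop. This is a hypothetical sketch: `extract_batch` stands in for a call to the cheap secondary model, and `BATCH_SIZE` is an assumption (the document does not fix a batch size).

```python
MAX_ROUNDS = 5   # hard cap on extraction rounds per session
BATCH_SIZE = 10  # assumption: messages processed per round

def run_extraction(messages, cursor, extract_batch):
    """Process only messages past `cursor`, in at most MAX_ROUNDS batches.

    Returns the collected memories and the advanced cursor, so the next
    session pass resumes where this one stopped.
    """
    memories = []
    for _ in range(MAX_ROUNDS):
        batch = messages[cursor:cursor + BATCH_SIZE]
        if not batch:               # nothing new since last cursor position
            break
        found = extract_batch(batch)  # cheap secondary-model call (stub)
        cursor += len(batch)
        if not found:               # stop early: no new information
            break
        memories.extend(found)
    return memories, cursor
```

Persisting the returned cursor alongside MEMORY.md is what lets each pass skip already-processed messages entirely.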
Example — Extraction Prompt
You are a memory extraction agent. Read the following new messages (since cursor position {cursor}).
For each piece of non-obvious information, output a JSON object: { "topic": "short-kebab-case-name", "type": "user | feedback | project | reference", "description": "one-line summary for the index", "content": "full memory content, structured with Why and How-to-apply" }
Rules:
- Max 5 memories per pass.
- Skip anything derivable from code, git, or existing memory.
- Convert relative dates to absolute (today is {date}).
- If a memory already exists for this topic, output an update, not a duplicate.
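Before persisting anything, it helps to validate the extractor's output against the contract the prompt defines. A minimal sketch: the field names and memory types come from this document, but the validation code itself is our own.

```python
import json

# Schema from the extraction prompt above.
REQUIRED = {"topic", "type", "description", "content"}
VALID_TYPES = {"user", "feedback", "project", "reference"}

def parse_memories(raw: str, max_memories: int = 5):
    """Parse and validate the extractor's JSON output.

    Raises ValueError if the batch breaks the prompt's rules.
    """
    items = json.loads(raw)
    if len(items) > max_memories:
        raise ValueError(f"max {max_memories} memories per pass")
    for item in items:
        if not REQUIRED <= item.keys():
            raise ValueError(f"missing fields: {REQUIRED - item.keys()}")
        if item["type"] not in VALID_TYPES:
            raise ValueError(f"unknown memory type: {item['type']}")
    return items
```

Rejecting malformed batches here keeps garbage out of MEMORY.md instead of discovering it on a later read.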
Reference
See references/memory-extraction-pattern.md for the full pattern with prompt templates.
Mechanism 3 — Context Compression
As conversations grow, compress older exchanges into dense summaries. Keep only the last N messages in full fidelity. This prevents the context window from filling with stale reasoning.
Rules
Keep the last 6 messages uncompressed (3 user + 3 assistant). These are "fresh" — they contain active context.
Summarize everything older into a single <compressed-context> block at the top of the conversation. Format:
<compressed-context>
Decisions Made
- Chose PostgreSQL over MongoDB for the user table (reason: relational queries).
- API rate limit set to 100 req/min per user.
Current Status
- Auth module: complete, merged to main.
- Payment integration: in progress, blocked on Stripe webhook config.
Key Constraints
- Must ship by 2026-04-15.
- No breaking changes to public API v2.
</compressed-context>
What to keep in summaries:
Decisions and their rationale. Current status of work (done / in progress / blocked). Constraints and deadlines. User preferences and corrections.
What to discard:
Intermediate reasoning ("I considered X but..."). Exploratory questions that were already answered.