Xiaguang Harness [DEPRECATED → use trinity-harness]
v1.0.1
Production-grade agent harness combining execution discipline (Superpower), knowledge compounding (CE), and product thinking (Gstack) into a single adaptive workflow. Use when: (1) building features or fixing bugs with AI agents, (2) the user says 'build', 'plan', 'spec', 'review', 'ship', or 'debug', (3) managing multi-step development tasks, (4) you need a structured engineering workflow with quality gates. Provides: task complexity auto-grading (simple/medium/complex), anti-rationalization guards, concurrent subagent scheduling, verification protocols, experience compounding, and product-level requirement validation.
Agent Harness
A unified engineering harness that combines execution discipline, knowledge compounding, and product thinking. Born from 450,000 characters of real-world AI textbook writing plus 9 production incidents.
Core Philosophy
Agent = Model + Harness. The model provides capability; the harness provides discipline.
Three layers, one workflow:
- Challenge: Is this the right thing to build? (from Gstack)
- Execute: Build it with engineering rigor (from Superpower)
- Compound: Learn from what happened (from CE)
Task Complexity Auto-Grading
Before starting any task, assess its complexity. The grade determines which workflow steps to run.
🟢 Simple (bug fix, config change, small tweak)
Skip spec/plan → Direct edit → Verify → Done
Example: "fix the typo in line 42", "update the API endpoint"
🟡 Medium (new feature, module, integration)
Plan → Build incrementally → Test → Review → Done
Example: "add user authentication", "integrate payment API"
🔴 Complex (architecture change, multi-module, new system)
Full pipeline: Challenge → Spec → Plan → Build → Test → Review → Ship
Example: "redesign the database schema", "build a multi-agent orchestrator"
When unsure, start at 🟡. Upgrade to 🔴 if you discover hidden complexity. Never downgrade mid-task.
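The grading rules above can be encoded as data plus two small helpers. This is a minimal sketch, not part of the harness itself: the names `Grade`, `WORKFLOW`, `initial_grade`, and `regrade` are hypothetical, and the assessment of complexity is still a judgment the agent makes before calling them.

```python
from enum import Enum


class Grade(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"


# Workflow steps per grade, copied from the three tiers described above.
WORKFLOW = {
    Grade.SIMPLE: ["Direct edit", "Verify"],
    Grade.MEDIUM: ["Plan", "Build", "Test", "Review"],
    Grade.COMPLEX: ["Challenge", "Spec", "Plan", "Build", "Test", "Review", "Ship"],
}

# Ordering used to compare grades (hypothetical helper, lowest first).
_ORDER = [Grade.SIMPLE, Grade.MEDIUM, Grade.COMPLEX]


def initial_grade(assessed=None):
    """Return the assessed grade, or MEDIUM when unsure (assessed is None)."""
    return assessed if assessed is not None else Grade.MEDIUM


def regrade(current, proposed):
    """Allow upgrades only: a task is never downgraded mid-task."""
    return max(current, proposed, key=_ORDER.index)
```

With this shape, `regrade(Grade.COMPLEX, Grade.SIMPLE)` keeps the task at `Grade.COMPLEX`, which is the "never downgrade" rule expressed as code.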
Layer 1: Challenge (🔴 Complex tasks only)
Before writing any code, answer these questions. If any answer is "no" or uncertain, pause and discuss with the user.
- Problem validity: Is the user solving a real problem, or building a solution looking for a problem?
- Simplest approach: Is there a simpler way that doesn't require building this?
- Scope clarity: Can you explain what "done" looks like in one sentence?
- Risk assessment: What's the worst thing that happens if this goes wrong?
Output: A one-paragraph problem statement that the user confirms before proceeding.
Layer 2: Execute
Spec (🟡🔴 only)
Define what you're building before you build it:
- Goal: One sentence describing the outcome
- Interface: Inputs, outputs, API contracts
- Constraints: What you will NOT do (equally important as what you will do)
- Acceptance criteria: How to verify it works (must be testable)
Plan (🟡🔴 only)
Break the spec into atomic tasks:
- Each task modifies ≤3 files
- Each task has a clear verification step
- Tasks are ordered by dependency (independent tasks can parallelize)
- Estimate: simple tasks ~5 min, medium ~15 min, complex ~30 min
Build
Execute tasks incrementally. After each task:
- Verify the task works (run it, test it, check the output)
- Commit or checkpoint the progress
- Only then move to the next task
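The planning rules and the build loop above can be sketched together: order tasks by dependency, run each one, and refuse to continue until its verification passes. This is an illustrative sketch only. The task dictionary layout (`files`, `deps`, `run`, `verify`) and the function name `run_plan` are assumptions, and appending to `completed` stands in for a real commit or checkpoint.

```python
from graphlib import TopologicalSorter

MAX_FILES_PER_TASK = 3  # "Each task modifies ≤3 files"


def run_plan(tasks):
    """Run tasks in dependency order, verifying each before moving on.

    `tasks` maps a task name to a dict with hypothetical keys:
    "files" (list of paths), "deps" (list of task names),
    "run" and "verify" (zero-argument callables).
    """
    # Reject plans whose tasks are not atomic enough.
    for name, task in tasks.items():
        if len(task["files"]) > MAX_FILES_PER_TASK:
            raise ValueError(f"{name} touches more than {MAX_FILES_PER_TASK} files; split it")

    # TopologicalSorter takes a mapping of node -> predecessors.
    order = TopologicalSorter({name: task["deps"] for name, task in tasks.items()})
    completed = []
    for name in order.static_order():
        task = tasks[name]
        task["run"]()
        if not task["verify"]():
            # Report the failure honestly instead of continuing.
            raise RuntimeError(f"verification failed for {name}")
        completed.append(name)  # stand-in for a commit or checkpoint
    return completed
```

This version runs tasks serially. Independent tasks could be parallelized with the sorter's `prepare()`, `get_ready()`, and `done()` methods, at the cost of a more involved loop.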
Critical rules:
- Never modify code you haven't read first
- Don't add features beyond what was asked
- Don't refactor "while you're at it"
- If tests fail, report honestly; don't claim success
Verify
Every deliverable must have evidence, not just "looks good":
Deliverable type | Required evidence
Code change | Tests pass (show output)
Config change | Restart + verify (show status)
File generation | wc -l + grep key content
API integration | Show actual response
Documentation | Spot-check 3 claims for accuracy
Review (🟡🔴 only)
Self-review along 5 dimensions:
- Correctness: Does it do what was asked?
- Edge cases: What happens with empty input, huge input, concurrent access?
- Security: Any injection points, leaked secrets, missing auth?
- Performance: Will it work at 10x scale?
- Maintainability: Will someone understand this code in 6 months?
Ship (🔴 only)
Pre-ship checklist:
- All tests pass
- Rollback plan exists (can you undo this in <5 min?)
- Feature flag or gradual rollout if risky
- Monitoring/alerting covers the new code path
Layer 3: Compound
After completing any task (regardless of complexity), spend 30 seconds on:
- What broke? Any errors, retries, unexpected behavior? → Record the specific lesson
- What was slow? Any step that took longer than expected? → Note the bottleneck
- What would you do differently? With hindsight, was there a better approach?
Only record specific, actionable lessons, not generic advice like "be more careful".
Good: "Bedrock throttles at >2 concurrent requests to the same model. Use model rotation or serial execution."
Bad: "Remember to handle API limits properly."
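One way to make lessons compound is to store each as a structured record that pairs an observation with an action. The sketch below is an assumption about storage, not something the harness prescribes: the `Lesson` fields, the `record_lesson` function, and the JSON Lines file format are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Lesson:
    question: str     # "what broke", "what was slow", or "do differently"
    observation: str  # the specific thing that happened
    action: str       # what to do next time


def record_lesson(lesson, path):
    """Append one lesson as a JSON line; reject entries missing either half."""
    if not lesson.observation.strip() or not lesson.action.strip():
        raise ValueError("a lesson needs both a specific observation and an action")
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(lesson), ensure_ascii=False) + "\n")
```

The "Good" example above would be stored with the throttling behavior as `observation` and "Use model rotation or serial execution" as `action`. The emptiness check only catches missing fields; judging whether a lesson is specific enough remains a human or agent decision.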
Anti-Rationalization Table
When you catch yourself thinking any of these, stop and follow the rebuttal:
Your excuse | Why it's wrong | Do this instead
"Too simple to need tests" | 40% of P0 incidents come from "too simple" code | Write the test. It takes 2 minutes.
"I already checked, looks fine" | Reading ≠ verifying | Run it. ls, wc -l, grep, actual execution.
"I'll write tests after the feature is complete" | You won't. Test debt only grows. | Write the test NOW, before moving on.
"This old code looks unused, I'll delete it" | Chesterton's Fence: understand before removing | git blame first. Ask why it exists.
"It should work" | "Should" is not evidence | Provide logs, output, or data.
"Let me refactor this while I'm here" | Scope creep. You weren't asked to refactor. | Do only what was requested. File a separate TODO for the refactor.
"I'l