Rate limits balance fairness, availability, and abuse prevention. Design explicitly: who is throttled, what resource is limited, and how clients should back off.
When to Offer This Workflow
Trigger conditions:
- Protecting public APIs, auth endpoints, or expensive operations
- Multi-tenant “noisy neighbor” isolation
- Retry storms after incidents causing cascading 429/502
Initial offer:
Use six stages: (1) threat & fairness model, (2) dimensions & keys, (3) algorithms & config, (4) distributed enforcement, (5) client protocol & UX, (6) observability & tuning. Confirm the enforcement layer (API gateway vs app middleware vs edge).
Stage 1: Threat & Fairness Model
Goal: Distinguish legitimate bursts (batch jobs, mobile retries) from abuse; align limits with product tiers and SLAs.
Exit condition: Written policy: free vs paid limits, partner caps, burst allowances.
Stage 2: Dimensions & Keys
Goal: Choose stable limit keys, in order of preference: authenticated user ID > API key > IP (with shared-NAT caveats, since many users can sit behind one address).
Practices
- Per-tenant and global limits; separate expensive routes (exports, search)
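The key preference order above can be sketched as a small helper. This is a minimal illustration, not a prescribed scheme: the function name `limit_key`, the `rl:` prefix, and the `route_class` argument are all assumptions made for the example.

```python
from typing import Optional


def limit_key(route_class: str,
              user_id: Optional[str] = None,
              api_key: Optional[str] = None,
              ip: Optional[str] = None) -> str:
    """Build a rate-limit key from the most specific identity available.

    Preference order: authenticated user ID, then API key, then client IP.
    route_class (hypothetical, e.g. "default" or "export") keeps expensive
    routes in their own counters.
    """
    if user_id:
        identity = f"user:{user_id}"
    elif api_key:
        identity = f"key:{api_key}"
    elif ip:
        # Coarse fallback: many clients may share one IP behind a NAT.
        identity = f"ip:{ip}"
    else:
        raise ValueError("no identity available for rate limiting")
    return f"rl:{route_class}:{identity}"
```

For example, `limit_key("export", user_id="u1", ip="203.0.113.7")` yields `rl:export:user:u1`, so the same user is counted consistently even when their IP changes.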
Stage 3: Algorithms & Config
Goal: Token bucket to allow bounded bursts, or leaky bucket to smooth them into a steady rate; sliding window for strict per-minute caps; treat concurrency limits separately from request rate.
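As a rough illustration of the token bucket, the sketch below refills tokens at a steady rate and allows bursts up to the bucket capacity. The class name, parameter names, and the injectable `clock` are assumptions for the example; it is single-process and not thread-safe.

```python
import time
from typing import Callable


class TokenBucket:
    """Single-process token bucket: sustained `rate` per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The `cost` argument lets an expensive operation consume more than one token, which is one simple way to weight routes differently within a single bucket.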
Stage 4: Distributed Enforcement
Goal: Central store (Redis, etc.) with atomic increments; handle multi-region (sticky routing vs shared counters); mind clock skew.
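A fixed-window counter built on an atomic increment might look like the sketch below. `InMemoryCounterStore` is a hypothetical stand-in for a shared store; in a real deployment its `incr` would be replaced by the store's own atomic increment, with an expiry on each window key. The timestamp is passed in explicitly, on the assumption that all nodes agree on one time source, which is where clock skew matters.

```python
import threading
from typing import Dict


class InMemoryCounterStore:
    """Stand-in for a shared counter store (assumption for this example).

    The lock makes incr atomic within one process; unlike a real store,
    old window keys are never expired here.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


def allow_fixed_window(store: InMemoryCounterStore, key: str,
                       limit: int, window_s: int, now_s: float) -> bool:
    """Increment the counter for the current window and compare to the limit.

    Increment-then-check avoids the read-modify-write race that a separate
    get followed by set would have under concurrent requests.
    """
    window_id = int(now_s // window_s)
    count = store.incr(f"{key}:{window_id}")
    return count <= limit
```

Fixed windows allow up to twice the limit across a window boundary; where that matters, a sliding window or a token bucket kept in the shared store is the usual alternative.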
Stage 5: Client Protocol & UX
Goal: Consistent 429 responses with Retry-After; document exponential backoff + jitter; optional X-RateLimit-* headers for transparency.
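Both sides of the client protocol can be sketched briefly: the server builds consistent throttle headers, and the client computes an exponential backoff delay with jitter that never undercuts the server's Retry-After. The function names, default values, and the specific `X-RateLimit-Limit` / `X-RateLimit-Remaining` header names are assumptions here; the X-RateLimit-* family is a convention rather than a fixed standard.

```python
import random
from typing import Dict, Optional


def throttle_headers(limit: int, remaining: int, retry_after_s: int) -> Dict[str, str]:
    """Headers to attach to a 429 response (Retry-After in whole seconds)."""
    return {
        "Retry-After": str(retry_after_s),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0,
                  retry_after_s: Optional[float] = None,
                  rng: Optional[random.Random] = None) -> float:
    """Delay before retry number `attempt` (0-based), using full jitter.

    The delay is drawn uniformly from [0, min(cap_s, base_s * 2**attempt)],
    then raised to retry_after_s if the server supplied one.
    """
    source = rng or random
    ceiling = min(cap_s, base_s * (2 ** attempt))
    delay = source.uniform(0.0, ceiling)
    if retry_after_s is not None:
        delay = max(delay, retry_after_s)
    return delay
```

Jitter spreads retries out so that clients throttled at the same moment do not all return at once, which is the retry-storm pattern named in the trigger conditions.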
Stage 6: Observability & Tuning
Goal: Metrics on throttles by route and actor class; alerts on abnormal deny spikes (attack vs misconfigured client).
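One possible shape for these metrics is a pair of counters keyed by route and actor class, with a simple deny-ratio check for alerting. The class name, the threshold, and the minimum-volume guard are illustrative assumptions; a production setup would more likely emit these as labeled counters to an existing metrics system.

```python
from collections import Counter
from typing import List, Tuple

Key = Tuple[str, str]  # (route, actor_class)


class ThrottleMetrics:
    """In-memory allow/deny counts per (route, actor_class)."""

    def __init__(self) -> None:
        self.allowed: Counter = Counter()
        self.denied: Counter = Counter()

    def record(self, route: str, actor_class: str, allowed: bool) -> None:
        bucket = self.allowed if allowed else self.denied
        bucket[(route, actor_class)] += 1

    def deny_ratio(self, route: str, actor_class: str) -> float:
        key = (route, actor_class)
        total = self.allowed[key] + self.denied[key]
        return self.denied[key] / total if total else 0.0

    def spikes(self, threshold: float, min_requests: int) -> List[Key]:
        """Keys whose deny ratio exceeds threshold, ignoring low-volume keys."""
        keys = set(self.allowed) | set(self.denied)
        return sorted(
            k for k in keys
            if self.allowed[k] + self.denied[k] >= min_requests
            and self.deny_ratio(*k) > threshold
        )
```

Splitting by actor class helps with the attack-versus-misconfiguration question: a spike concentrated in one paid tenant on one route looks more like a broken client than broad abuse.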
Final Review Checklist
- [ ] Policy matches tiers and fairness goals
- [ ] Limit keys stable and hard to spoof
- [ ] Algorithm matches burst vs sustained semantics
- [ ] Distributed correctness considered
- [ ] Client-facing 429 behavior documented
- [ ] Metrics and tuning loop defined
Tips for Effective Guidance
- Coordinate with authentication—anonymous IP limits are coarse.
- Don’t throttle health checks in ways that break monitors.
- GraphQL: consider query cost / depth limits, not only HTTP count.
- WebSockets: separate connection caps from message rate limits.
Handling Deviations
- Edge/CDN: limits may differ from origin—document both layers.