📦 Capacity Planner

v1.0.0

Forecast infrastructure capacity needs using historical metrics, growth projections, and cost modeling. Identify bottlenecks before they cause outages and ri...


Runtime Dependencies

No special dependencies

Install Commands

Official: `npx clawhub@latest install capacity-planner`

Mirror (CN): `npx clawhub@latest install capacity-planner --registry https://cn.longxiaskill.com`

Skill Documentation

Capacity Planner

Forecast when your infrastructure will hit limits. Analyze historical metrics (CPU, memory, disk, network, database connections), project growth curves, identify approaching bottlenecks, and recommend right-sizing — so you scale proactively instead of reactively.

Use when: "when will we run out of space", "capacity forecast", "right-size our instances", "are we over-provisioned", "plan for traffic growth", "infrastructure scaling plan", "when do we need to upgrade", or before budget planning.

Commands

  • forecast — Project Resource Exhaustion

Step 1: Collect Historical Metrics

```bash
# Prometheus — CPU utilization over last 30 days
curl -s "$PROMETHEUS_URL/api/v1/query_range" \
  --data-urlencode 'query=avg(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)' \
  --data-urlencode "start=$(date -d '30 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1h' | python3 -c "
import json, sys
data = json.load(sys.stdin)
for result in data['data']['result']:
    instance = result['metric']['instance']
    values = [float(v[1]) for v in result['values']]
    avg = sum(values) / len(values)
    peak = max(values)
    trend = (values[-1] - values[0]) / len(values)  # slope per hour
    print(f'{instance}: avg={avg:.1%} peak={peak:.1%} trend={trend:+.4%}/hr')
"
```

```bash
# Disk usage over time
df -h / /data /var 2>/dev/null

# Historical disk growth (if monitoring available)
curl -s "$PROMETHEUS_URL/api/v1/query_range" \
  --data-urlencode 'query=node_filesystem_avail_bytes{mountpoint="/"}' \
  --data-urlencode "start=$(date -d '30 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1d'
```

```bash
# Memory usage
free -h

# Database connections
curl -s "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode 'query=pg_stat_activity_count / pg_settings_max_connections'
```

If Prometheus is unavailable, use CloudWatch, Datadog, or system tools:

```bash
# Last 30 days of CloudWatch CPU
aws cloudwatch get-metric-statistics --namespace AWS/EC2 \
  --metric-name CPUUtilization --statistics Average Maximum \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --start-time $(date -d '30 days ago' -u +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) --period 86400
```

Step 2: Fit Growth Models

For each resource, determine growth pattern:

  • Linear: constant rate of increase (disk filling at 2 GB/day)
  • Exponential: accelerating growth (user base doubling quarterly)
  • Seasonal: cyclical patterns (weekend dips, end-of-month spikes)
  • Flat: no growth (stable, well-bounded workload)

Calculate days until exhaustion:

days_remaining = (capacity - current_usage) / daily_growth_rate

For exponential: use doubling time to project.
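The exhaustion formula and the exponential case can be sketched in Python. This is a minimal illustration with made-up sample data, not part of the skill itself — a simple least-squares fit for the linear model and a doubling-time helper for the exponential one:

```python
import math

def days_until_exhaustion(usage, capacity):
    """Fit a least-squares line to daily usage samples and project
    how many days remain before the resource hits capacity."""
    n = len(usage)
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(usage) / n
    # Slope of the fitted line = daily growth rate
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage)) \
            / sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return math.inf  # flat or shrinking: no projected exhaustion
    return (capacity - usage[-1]) / slope

def doubling_time(growth_rate_per_day):
    """For exponential growth: days until usage doubles."""
    return math.log(2) / math.log(1 + growth_rate_per_day)

# Illustrative: disk climbing ~180 MB/day toward a 50 GB volume
samples = [40.0 + 0.18 * d for d in range(30)]  # GB used, one sample/day
print(f"~{days_until_exhaustion(samples, 50.0):.0f} days remaining")
```

In practice the `usage` list would come from the Prometheus query-range results collected in Step 1.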

Step 3: Generate Forecast Report

# Capacity Forecast Report — [date]

Critical (exhaustion < 30 days)

| Resource | Current | Capacity | Growth/day | Exhaustion | Action |
|---|---|---|---|---|---|
| Disk (/) | 45 GB | 50 GB | 180 MB/day | ~28 days | Expand volume or add cleanup cron |
| DB connections | 85/100 | 100 | +2/week | ~5 weeks | Increase max_connections or add pgbouncer |

Warning (exhaustion 30-90 days)

| Resource | Current | Capacity | Growth/day | Exhaustion |
|---|---|---|---|---|
| Memory | 12/16 GB | 16 GB | 50 MB/day | ~82 days |

Healthy (>90 days or no growth)

  • CPU: avg 35%, peak 72%, flat trend — no action needed
  • Network: avg 200 Mbps of 1 Gbps — no concern

Over-Provisioned (wasting money)

| Resource | Used | Provisioned | Utilization | Savings |
|---|---|---|---|---|
| worker-pool-3 | 2 vCPU avg | 8 vCPU | 25% | Downsize to 4 vCPU, save ~$150/mo |
| Redis cluster | 512 MB | 8 GB | 6% | Downsize to 2 GB, save ~$80/mo |

  • rightsize — Recommend Instance Sizes

Given current utilization and growth projections:

  • Map workload to optimal instance family (compute, memory, storage-optimized)
  • Factor in reserved instance / savings plan pricing
  • Account for headroom (recommend 60-70% target utilization, not 95%)
  • Compare across cloud providers if multi-cloud
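The headroom arithmetic can be sketched as follows. The vCPU-doubling size ladder and the 65% target are illustrative assumptions, not fixed by the skill:

```python
SIZES_VCPU = [2, 4, 8, 16, 32, 64]  # hypothetical instance family steps
TARGET_UTILIZATION = 0.65           # aim for 60-70%, not 95%

def rightsize(peak_vcpu_used):
    """Smallest instance size that keeps the observed peak
    at or below the target utilization."""
    needed = peak_vcpu_used / TARGET_UTILIZATION
    for size in SIZES_VCPU:
        if size >= needed:
            return size
    return SIZES_VCPU[-1]  # already at the largest size

# worker-pool-3 from the report: ~2 vCPU used on an 8 vCPU node
print(rightsize(2.0))
```

The same pattern applies to memory-bound or storage-bound workloads; only the size ladder and the measured dimension change.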

  • cost-model — Project Infrastructure Costs

Given the capacity forecast:

  • Calculate current monthly spend
  • Project spend at 3, 6, 12 months based on growth
  • Identify the biggest cost drivers
  • Suggest cost optimization levers (spot instances, reserved pricing, auto-scaling, compression, archival)
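The projection step reduces to compound growth. The $12,000/month baseline and 5% monthly growth rate below are illustrative assumptions; in practice the rate comes from the fitted model in Step 2:

```python
def project_spend(current_monthly, monthly_growth_rate, months):
    """Project monthly spend assuming compound growth."""
    return current_monthly * (1 + monthly_growth_rate) ** months

current = 12_000  # USD/month, illustrative
for horizon in (3, 6, 12):
    projected = project_spend(current, 0.05, horizon)
    print(f"{horizon:>2} months: ${projected:,.0f}/mo")
```

For linear growth (e.g. disk-driven storage costs), replace the compound factor with a flat per-month increment.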

  • bottleneck — Identify Scaling Bottlenecks

Analyze the system for the component that will fail first under load:

  • Database (connections, IOPS, lock contention)
  • Application (CPU-bound, memory-bound, thread pool exhaustion)
  • Network (bandwidth, DNS resolution, TLS handshake overhead)
  • External dependencies (rate limits, API quotas, third-party SLAs)

Rank bottlenecks by "time to impact" and recommend mitigation order.
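The ranking can be sketched as: project time-to-impact for each component at its current growth rate, then sort ascending. The component names and numbers below mirror the example forecast report and are purely illustrative:

```python
def time_to_impact(current, limit, growth_per_day):
    """Days until a component hits its limit at the current growth rate."""
    if growth_per_day <= 0:
        return float("inf")  # not growing: never hits the limit
    return (limit - current) / growth_per_day

# (name, current, limit, growth/day) — units noted per row
components = [
    ("disk /", 45_000, 50_000, 180),      # MB used, MB limit, MB/day
    ("db connections", 85, 100, 2 / 7),   # +2 per week
    ("memory", 12_288, 16_384, 50),       # MB
]

ranked = sorted(components, key=lambda c: time_to_impact(c[1], c[2], c[3]))
for name, cur, lim, growth in ranked:
    print(f"{name}: ~{time_to_impact(cur, lim, growth):.0f} days to impact")
```

Components with infinite time-to-impact sort last, matching the "Healthy" bucket in the report.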

Source: ClawHub ↗ · Chinese localization: 龙虾技能库