📦 Prom — Prometheus

v1.0.0

Prometheus monitoring tool.

by @ivangdavila (Iván)·MIT-0
Last updated
2026/2/26
Security scan
VirusTotal: harmless
OpenClaw: safe (high confidence)
The skill is an instruction-only Prometheus best-practices guide and its requirements and actions are consistent with that purpose.
Assessment recommendations
This skill is a documentation-style Prometheus best-practices guide and appears internally consistent. Because it is instruction-only and has no installs or credential requests, it poses low structural risk. However: 1) treat any concrete commands in the document (for example, the curl DELETE to remove Pushgateway metrics) as potentially destructive — do not execute them in production without understanding the impact; 2) verify the skill’s origin before trusting it in automated workflows — the h...
Detailed analysis
Purpose and capabilities
Name, description, and content are aligned: the SKILL.md is purely guidance about Prometheus (cardinality, PromQL, alerting, scrape config, Pushgateway, etc.). It requests no binaries, env vars, or installs, which is proportionate for a documentation-style skill.
Instruction scope
SKILL.md contains only static operational guidance and examples. It does not instruct the agent to read local files, access unrelated environment variables, or exfiltrate data. One actionable example shows a curl DELETE to a Pushgateway endpoint (curl -X DELETE http://pushgateway/metrics/job/myjob) — this is a potentially destructive operation if executed blindly, so users/agents should not treat examples as safe to run without review.
Install mechanism
No install spec and no code files (instruction-only). This is low-risk: nothing will be written to disk or downloaded by the skill itself.
Credential requirements
The skill requires no environment variables, credentials, or config paths. There is no disproportionate credential access relative to the stated purpose.
Persistence and privileges
always is false and the skill does not request persistent presence or modify other skills or system settings. It does not request elevated privileges.
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.0 · 2026/2/10

Initial release

Install command

Official: npx clawhub@latest install prom
Mirror (accelerated): npx clawhub@latest install prom --registry https://cn.longxiaskill.com

Skill documentation

Cardinality Explosions

  • Every unique label combination creates a new time series; using user_id as a label kills Prometheus
  • Avoid high-cardinality labels: user IDs, email addresses, request IDs, timestamps, UUIDs
  • Check cardinality with the prometheus_tsdb_head_series metric; above 1M series needs attention
  • Use histograms for latency, not per-request labels; buckets have fixed cardinality
  • Relabeling can drop dangerous labels before ingestion: labeldrop in the scrape config
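
To make the cardinality arithmetic concrete, here is a minimal Python sketch. It assumes the worst case, where every combination of label values actually occurs; the label names and counts are made up for illustration.

```python
from math import prod

def estimated_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on series for one metric: the product of distinct values per label."""
    return prod(label_cardinalities.values())

# Hypothetical label sets: bounded labels only, then the same with user_id added.
bounded = estimated_series({"method": 5, "status": 10, "instance": 20})
unbounded = estimated_series(
    {"method": 5, "status": 10, "instance": 20, "user_id": 100_000}
)

print(bounded)    # 1000
print(unbounded)  # 100000000
```

One unbounded label multiplies the series count of every other label combination, which is why a single user_id label is enough to pass the 1M series mark.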

Histogram vs Summary

  • Histograms: use for SLOs; aggregatable across instances; buckets defined upfront
  • Summaries: use when you need exact percentiles; cannot be aggregated across instances
  • Histogram bucket boundaries must be defined before data arrives; wrong buckets = wrong percentiles
  • Default buckets (.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10) assume HTTP latency; adjust for your use case
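
The "wrong buckets = wrong percentiles" point can be seen in a simplified Python sketch of bucket-based quantile estimation. It linearly interpolates inside the bucket that contains the target rank, in the spirit of histogram_quantile(); it is not Prometheus's actual implementation, and the bucket counts are invented.

```python
import math

def bucket_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with a +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # Nothing to interpolate against: fall back to the last known bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound

# Hypothetical latency histogram whose largest finite bucket is 1 second.
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p95 = bucket_quantile(0.95, buckets)    # interpolated between 0.5 and 1.0
p999 = bucket_quantile(0.999, buckets)  # lands in +Inf, capped at 1.0
```

The p99.9 estimate is capped at the highest finite bound, so a request that really took 30 seconds is reported as 1 second. Buckets need to bracket the latencies you care about.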

Rate and Increase

  • rate() needs a range selector of at least 4x the scrape interval; rate(metric[1m]) with a 30s scrape interval can return no data when a single scrape is missed
  • rate() is per-second; increase() is the total over the range. Don't confuse them
  • Counters reset on restart; rate() handles resets, a raw delta doesn't
  • irate() uses only the last two samples; too spiky for alerting, so use rate() for alerts
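
The difference between reset-aware counting and a raw delta can be sketched in Python. This is a simplification: the real rate() also extrapolates to the edges of the range window, which is left out here, and the sample values are invented.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase across samples, treating any drop as a reset to zero."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            total += cur  # counter restarted from 0, so cur is all new increase
    return total

def per_second_rate(samples: list[float], window_seconds: float) -> float:
    """Per-second rate over the window (no boundary extrapolation)."""
    return counter_increase(samples) / window_seconds

# Five samples 30s apart; the process restarts between the 3rd and 4th.
samples = [100.0, 130.0, 160.0, 10.0, 40.0]
raw_delta = samples[-1] - samples[0]       # -60.0, nonsense for a counter
increase = counter_increase(samples)       # 100.0
rate = per_second_rate(samples, 120.0)     # increase divided by seconds
```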

Alerting Mistakes

  • Alert on symptoms, not causes: "high latency", not "high CPU"
  • The for clause prevents flapping: for: 5m means the condition must hold for 5 minutes before firing
  • A missing for clause fires immediately on the first match, which is noisy
  • Alerts need a runbook_url annotation; on-call needs to know what to do, not just that something is wrong
  • Test alerts with promtool check rules; syntax errors discovered at 3am are bad
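
How the for clause suppresses flapping can be illustrated with a small Python simulation. It is a sketch of the pending/firing idea under a fixed evaluation interval, not Prometheus's rule engine; the function name and inputs are hypothetical.

```python
def alert_states(condition_per_eval: list[bool], eval_interval_s: int, for_s: int) -> list[str]:
    """Return the alert state at each rule evaluation."""
    states = []
    active_since = None  # evaluation time at which the condition became true
    for i, condition in enumerate(condition_per_eval):
        now = i * eval_interval_s
        if not condition:
            active_since = None  # any gap resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states

# Condition flaps once, then holds. Evaluated every 60s.
flapping = [True, True, False, True, True, True, True, True, True]
with_for = alert_states(flapping, eval_interval_s=60, for_s=300)
without_for = alert_states(flapping, eval_interval_s=60, for_s=0)
```

With for_s=300 the alert stays pending through the early flap and fires only at the last evaluation; with for_s=0 it fires on the very first match.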

PromQL Traps

  • and is intersection by labels, not boolean AND; results must have matching label sets
  • or fills in missing series; it doesn't do a boolean OR on values
  • {} without a metric name is expensive: it scans all metrics
  • offset goes back in time: metric offset 1h is the value from 1 hour ago
  • Comparison operators filter series: http_requests > 100 drops series at or below 100 and doesn't return a boolean
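
The set-like behavior of and, or, and comparison filters can be modeled in Python with dictionaries keyed by label set. This is a toy model that assumes label sets exclude the metric name and ignores on/ignoring modifiers; the helper names are made up.

```python
def vec(*series):
    """Build a toy instant vector: {frozenset of (label, value) pairs: sample value}."""
    return {frozenset(labels.items()): value for labels, value in series}

def op_and(lhs, rhs):
    """Keep left-hand series whose label set also exists on the right."""
    return {k: v for k, v in lhs.items() if k in rhs}

def op_or(lhs, rhs):
    """All left-hand series, plus right-hand series with label sets not on the left."""
    out = dict(lhs)
    for k, v in rhs.items():
        out.setdefault(k, v)
    return out

def greater_than(vector, threshold):
    """Comparison as a filter: series that fail the test are dropped."""
    return {k: v for k, v in vector.items() if v > threshold}

a = vec(({"job": "api"}, 150.0), ({"job": "web"}, 80.0))
b = vec(({"job": "api"}, 1.0), ({"job": "db"}, 1.0))

api = frozenset({("job", "api")})
web = frozenset({("job", "web")})
db = frozenset({("job", "db")})
```

In this model op_and(a, b) keeps only the api series with its left-hand value, op_or(a, b) returns api, web, and db, and greater_than(a, 100) returns a one-element vector rather than true/false.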

Scrape Configuration

  • honor_labels: true trusts source labels; use only when the source is authoritative (e.g., Pushgateway)
  • scrape_timeout must not exceed scrape_interval; otherwise scrapes would overlap
  • Static configs are only re-read on a config reload or restart; use file_sd or service discovery for dynamic targets
  • Disabled TLS verification (insecure_skip_verify) should be temporary, never permanent
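
For dynamic targets, file_sd reads target groups from JSON or YAML files and picks up changes without touching the main config. The Python sketch below renders such a file as a JSON list of objects with "targets" and "labels" keys; the host names and the environment label are hypothetical.

```python
import json

def file_sd_document(targets_by_env: dict[str, list[str]]) -> str:
    """Render target groups in the JSON shape file_sd expects."""
    groups = [
        {"targets": sorted(targets), "labels": {"environment": env}}
        for env, targets in sorted(targets_by_env.items())
    ]
    return json.dumps(groups, indent=2)

# Hypothetical inventory; in practice this would come from a CMDB or deploy tool.
document = file_sd_document({
    "prod": ["b.example:9100", "a.example:9100"],
    "dev": ["c.example:9100"],
})
print(document)
```

Writing the output to a path matched by the files list of a file_sd_configs entry is left out; that path depends on your setup.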

Pushgateway Pitfalls

  • Pushgateway is for batch jobs, not services; services should expose /metrics
  • Metrics persist until deleted; stale metrics from dead jobs confuse dashboards
  • Add job and instance labels to distinguish sources; default grouping hides failures
  • Delete metrics when the job completes: curl -X DELETE http://pushgateway/metrics/job/myjob
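
The same cleanup call can be built from a batch job written in Python. The sketch below only constructs the request and does not send it, since the delete is destructive; it assumes a job name that needs no URL escaping, and the base URL is the placeholder from the curl example.

```python
from urllib.request import Request

def pushgateway_delete_request(base_url: str, job: str) -> Request:
    """Build (but do not send) a DELETE for all metrics grouped under a job."""
    url = f"{base_url.rstrip('/')}/metrics/job/{job}"
    return Request(url, method="DELETE")

req = pushgateway_delete_request("http://pushgateway", "myjob")
print(req.get_method(), req.full_url)  # DELETE http://pushgateway/metrics/job/myjob

# Sending it would look like urllib.request.urlopen(req). Review the target
# first: this removes every metric in that group.
```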

Recording Rules

  • Pre-compute expensive queries: record: job:request_duration_seconds:rate5m
  • Naming convention: level:metric:operations; helps identify what rules produce
  • Recording rules update every evaluation interval; not instant, so plan for slight delay
  • Reduce cardinality with recording rules: aggregate away labels you don't need for alerting
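
What "aggregate away labels" does to the series count can be shown with a toy Python version of a sum by (...) aggregation. It only models the label arithmetic a recording rule would perform; the label names and values are invented.

```python
from collections import defaultdict

def sum_by(vector: dict, keep: set[str]) -> dict:
    """Sum sample values, keeping only the labels named in `keep`."""
    out = defaultdict(float)
    for labels, value in vector.items():
        key = frozenset((name, val) for name, val in labels if name in keep)
        out[key] += value
    return dict(out)

# Three per-instance series collapse to two per-job series.
per_instance = {
    frozenset({("job", "api"), ("instance", "a")}): 1.5,
    frozenset({("job", "api"), ("instance", "b")}): 2.5,
    frozenset({("job", "web"), ("instance", "c")}): 4.0,
}
per_job = sum_by(per_instance, keep={"job"})
```

Alerts that query the per-job result touch two series instead of three here; with hundreds of instances per job the saving is proportionally larger.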

Federation and Remote Write

  • Federation is for pulling from other Prometheus servers; use sparingly, it adds latency
  • Remote write is for long-term storage; Prometheus local storage is not durable
  • Remote write can buffer during outages, but the buffer is finite: data is lost on extended outages
  • Prometheus is not highly available by default; run two instances scraping the same targets

Common Operational Issues

  • TSDB corruption on unclean shutdown: use --storage.tsdb.wal-compression and monitor disk space
  • Memory grows with series count; each series costs ~3KB RAM
  • Compaction pauses during high load; leave 40% disk headroom
  • Scrape targets stuck "Unknown": check the network, the firewall, and that the target is actually exposing /metrics
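
The two sizing rules of thumb above reduce to simple arithmetic. The Python sketch below takes the ~3KB figure as 3,000 bytes per series and the 40% headroom at face value; both are the guide's estimates, not measured values, and the inputs are examples.

```python
BYTES_PER_SERIES = 3_000   # assumption: "~3KB" read as 3,000 bytes
DISK_HEADROOM_PERCENT = 40  # from the compaction guidance above

def estimated_head_memory_bytes(series_count: int) -> int:
    """Rough RAM needed for in-memory series."""
    return series_count * BYTES_PER_SERIES

def usable_disk_gb(capacity_gb: int) -> int:
    """Disk you can plan to fill while leaving compaction headroom."""
    return capacity_gb * (100 - DISK_HEADROOM_PERCENT) // 100

print(estimated_head_memory_bytes(1_000_000))  # 3000000000
print(usable_disk_gb(500))                     # 300
```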

Label Best Practices

  • Use labels for dimensions you'll filter/aggregate by: environment, service, instance
  • Keep label values low-cardinality: tens or hundreds, not thousands
  • Consistent naming: snake_case, prefixed with the domain: http_requests_total, node_cpu_seconds_total
  • The le label is reserved for histogram buckets; don't use it for other purposes
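
These naming rules are easy to check mechanically. The Python sketch below is a hypothetical lint helper, not part of any Prometheus tooling; its snake_case pattern is stricter than what Prometheus accepts and would reject recording-rule names that contain colons.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
RESERVED_LABELS = {"le"}  # reserved for histogram buckets

def lint_labels(metric_name: str, label_names: list[str], is_histogram: bool = False) -> list[str]:
    """Return a list of naming problems; an empty list means the names pass."""
    problems = []
    if not SNAKE_CASE.match(metric_name):
        problems.append(f"metric name {metric_name!r} is not snake_case")
    for name in label_names:
        if not SNAKE_CASE.match(name):
            problems.append(f"label {name!r} is not snake_case")
        if name in RESERVED_LABELS and not is_histogram:
            problems.append(f"label {name!r} is reserved for histogram buckets")
    return problems

ok = lint_labels("http_requests_total", ["method", "status"])
bad = lint_labels("HttpRequests", ["le"])
```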
Data source: ClawHub · Chinese localization: 龙虾技能库