Version
Major update: Adds Alloy pipeline management, new recipes, and greatly expands data collection capabilities.
- Introduced Alloy pipeline management with the new `alloy_pipeline` tool and support for recipes, creation, status, diagnosis, and deletion actions.
- Added dozens of Alloy pipeline recipes and helpers for setting up metrics, logs, traces, exporters, and agent integrations.
- Expanded documentation and quickstart references for Alloy, pipeline composition, and common data-collection use cases.
- Extended the SKILL to include scenarios for managing data collection pipelines, collecting logs and metrics from multiple sources, and handling credentials securely.
- Updated and refined limits, troubleshooting, and best-practices guidance in user instructions.
Skill Documentation
You have full native Grafana access — query data, create dashboards, set alerts, receive alert notifications, annotate events, explore datasources, push custom data, and deliver visualizations inline. Works with ANY data in Grafana, not just agent metrics.
Musts
- Always call `grafana_explore_datasources` first when you need a datasource UID: never guess UIDs.
- Always call `grafana_search` before creating a dashboard, to avoid duplicates.
- Always call `grafana_get_dashboard` before `grafana_share_dashboard`: you need exact panel IDs.
- Always call `grafana_get_dashboard` before `grafana_update_dashboard`: you need panel IDs and the current structure.
- Prefer `grafana_query` for direct answers over creating dashboards: "What's my cost?" needs a number, not a URL.
- Prefer `grafana_query` over `grafana_create_dashboard` + `grafana_share_dashboard` for simple data questions: a number is faster than a chart.
- Use `grafana_query_logs` for log searches: LogQL for logs, PromQL for metrics, TraceQL for traces. Never use `grafana_query` for Loki datasources.
- Use `grafana_query_traces` for trace searches: TraceQL for traces, PromQL for metrics, LogQL for logs. Never use `grafana_query` or `grafana_query_logs` for Tempo datasources.
- All tools work with any Prometheus datasource, not just `openclaw_lens_` metrics.
- When you see "GRAFANA ALERTS" in the prompt context, investigate immediately with `grafana_check_alerts`: use the `suggestedInvestigation` field to go directly to querying (it provides the tool, query, and datasource).
- Run `grafana_check_alerts` with action `setup` once before alert notifications can reach the agent: it creates the webhook contact point.
- Push data before querying or dashboarding: data is pushed via OTLP and is available immediately.
- Prefer `grafana_explain_metric` for "What is this metric?" questions over a manual `grafana_query`: it returns the current value, trend, stats, and metadata in one call.
- Use `queryNames` from the push response for PromQL queries: don't guess metric names (counters get a `_total` suffix).
- Use the `openclaw_ext_` prefix for custom metrics: `grafana_push_metrics` auto-prepends it if missing.
- Follow a statistics-first discipline for log investigation: always run count/rate LogQL before reading individual entries. Use `grafana_query_logs` with metric-over-logs queries (`count_over_time`, `rate`, `topk`) before switching to raw log entries.
- Silence alerts during investigation: use `grafana_check_alerts` with action `silence` to prevent repeat notifications while investigating.
- Use `list_rules` for complete alert health: `grafana_check_alerts` with action `list_rules` returns all rules with live eval state (normal/firing/pending/nodata/error), health, and `lastEvaluation`, so there is no need to cross-reference with the `list` action.
- Use `dashboardUid` + `panelId` to re-run panel queries: don't manually extract PromQL/LogQL from `get_dashboard` output. Both `grafana_query` and `grafana_query_logs` accept these params to auto-resolve the panel's query expression and datasource. The tool handles template variable replacement and datasource routing automatically.
- Confirm with the user before deleting dashboards or alert rules: `grafana_update_dashboard` with operation `delete` and `grafana_check_alerts` with action `delete_rule` are permanent and cannot be undone.
- Always use `alloy_pipeline` action `recipes` first when unsure which pipeline recipe fits the user's request, because recipes provide validation, credential handling, and sample queries that raw config does not.
- Always call `alloy_pipeline` action `status` after creating a pipeline, because data takes 15-20s to flow through the pipeline, and components may fail silently after a reload.
- Never guess Alloy component names: use recipes for known patterns, or raw `config` only when the user explicitly provides Alloy syntax.
- Prefer recipes over raw config when a recipe exists: recipes provide validation, sample queries, credential handling, dashboard templates, and automatic export target wiring.
- Never write credentials into raw `config`: when the user provides a connection string, DSN, password, or API key, ALWAYS use the matching recipe (which routes credentials through `sys.env()`, keeping secrets off disk). If you must use raw config, wrap sensitive values in `sys.env("MY_VAR_NAME")` and tell the user to set the env var where Alloy runs.
- Read `envVarsRequired` from every pipeline create response: credential recipes may return a `pending_credentials` status when env vars aren't set yet. Tell the user the exact var names and that they must set them where Alloy runs, then verify with action `status`.
- Warn users before creating credential-required pipelines: Alloy config reload is atomic, so if a credential recipe's env vars aren't set, the reload failure blocks all managed pipelines (not just the new one) until the env vars are set or the pipeline is deleted. Always ask: "Do you have credentials ready to set as env vars on the Alloy host?"
- Chain pipeline creation into existing tools: after a pipeline is active, use `grafana_list_metrics` or `grafana_query_logs` to discover data, `grafana_create_dashboard` to visualize, and `grafana_create_alert` to monitor.
- Use `alloy_pipeline` action `diagnose` as the first step when the user reports pipeline issues, because it checks Alloy connectivity, all pipeline health, config file drift, and limits in one call.
- Confirm with the user before deleting pipelines: `alloy_pipeline` with action `delete` removes the config and data stops flowing.
- All log recipes accept processing params: don't create separate "processing" pipelines. Add `jsonExpressions`, `labelFields`, `structuredMetadata`, `tenantValue`, `matchRoutes`, etc. directly to any log recipe (docker-logs, file-logs, syslog, etc.).
- Use `samplingPolicies` for multi-policy tail sampling: don't create raw config when `application-traces` can handle it. Use `sampleRate` for simple probabilistic sampling and `samplingPolicies` for intelligent multi-policy sampling (keep errors, keep slow, sample the rest).
- Use log processing params for multi-tenant routing: `tenantValue`/`tenantSource`/`matchRoutes` work on all log recipes. Don't create separate "routing" pipelines.
- Read `references/alloy-components.md` before composing raw config: it has copy-pasteable snippets for all common Alloy components.
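The credential rules above can be sketched as a small check on a pipeline create response. This is only an illustration, not the tool's implementation: the response is assumed to be a plain dict whose `status` and `envVarsRequired` fields follow the names used in this document, while `missing_env_vars` and the `POSTGRES_DSN` variable name are hypothetical.

```python
import os


def missing_env_vars(create_response: dict, environ=None) -> list:
    """Return the env var names a pipeline still needs.

    Assumes (hypothetically) that the create response is a dict with an
    optional "envVarsRequired" list, as named in this document.
    """
    env = os.environ if environ is None else environ
    required = create_response.get("envVarsRequired", [])
    return [name for name in required if not env.get(name)]


# Hypothetical response for a credential recipe whose env var is not set yet.
response = {"status": "pending_credentials", "envVarsRequired": ["POSTGRES_DSN"]}

missing = missing_env_vars(response, environ={})
if response["status"] == "pending_credentials" or missing:
    print("Set these where Alloy runs, then re-check with action status:", missing)
```

Checking the local environment is only meaningful when this code runs on the Alloy host; otherwise, relay the variable names to the user and verify with action `status`.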
Quick Decision Tree
- "What is [metric]?" / "Why did it spike?" → `grafana_explain_metric`
- "What's the current value of X?" / complex PromQL → `grafana_query`
- "Find error logs" / "Search logs for..." → `grafana_query_logs`
- "Find slow traces" / "Show trace for session X" / "Debug distributed spans" → `grafana_query_traces`
- "Debug session" / "Why did it fail?" / "What went wrong?" → `grafana_query_traces` (search error/slow) → `grafana_query_traces` (get → follow `correlationHint`) → `grafana_query_logs` → `grafana_query` → `grafana_annotate`
- "Show me a chart" / "Visualize..." → `grafana_search` → `grafana_get_dashboard` → `grafana_share_dashboard`
- "Create a dashboard for..." → `grafana_search` (check duplicates) → `grafana_create_dashboard`
- "Add a panel to my dashboard" → `grafana_get_dashboard` → `grafana_update_dashboard`
- "Delete dashboard" → `grafana_update_dashboard` with operation `delete` (confirm with the user first)
- "Alert me when..." → `grafana_check_alerts` (setup) → `grafana_create_alert`
- "List my alert rules" / "What alerts do I have?" → `grafana_check_alerts` with action `list_rules`
- "Delete alert rule X" → `grafana_check_alerts` with action `list_rules` → `delete_rule` with `ruleUid`
- "Track my [custom data]" / "Record my [past data]" → `grafana_push_metrics` (with optional `timestamp` for historical data, auto-registers, returns `queryNames`) → `grafana_query` with `queryNames`
- "What data sources do I have?" → `grafana_explore_datasources`
- "What metrics are available?" → `grafana_list_metrics`
- "Set up monitoring" / "Monitor my agent" / "What dashboards should I have?" → `grafana_search` (check existing) → `grafana_create_dashboard` with `llm-command-center` → follow the `suggestedNext` chain through the remaining templates
- "GenAI observability" / "OTel gen_ai metrics" / "Standard AI monitoring" → `grafana_create_dashboard` with the `genai-observability` template
- "What happened in session X?" / "Debug session" → `grafana_create_dashboard` with the `session-explorer` template → paste the session ID
- "Show me LLM traces" / "Show agent logs" → `grafana_create_dashboard` with the `llm-command-center` template (Loki + Tempo)
- "How much am I spending?" / "Cost analysis" → `grafana_create_dashboard` with the `cost-intelligence` template
- "Which tools are slow?" / "Tool errors" → `grafana_create_dashboard` with the `tool-performance` template
- "Queue health" / "Webhook issues" / "Stuck sessions" → `grafana_create_dashboard` with the `sre-operations` template
- "System health check" / "Status report" / "Review all dashboards" → `grafana_explore_datasources` → `grafana_check_alerts` (list + list_rules) → `grafana_search` → `grafana_get_dashboard` (audit=true for each) → summarize
- "Audit my dashboard" / "Which panels are broken?" → `grafana_get_dashboard` (audit=true) → review `auditSummary` + per-panel `health`
- "Am I being attacked?" / "Security check" / "Security status" → `grafana_security_check`
- "Set up security monitoring" → `grafana_check_alerts` (setup) → `grafana_create_dashboard` (security-overview) → `grafana_create_alert` (webhook error burst, cost spike, tool loops, injection signals)
- "Investigate security alert" → `grafana_security_check` → `grafana_query_logs` (correlate) → `grafana_annotate` (mark investigation) → `grafana_check_alerts` (silence)
- "Investigate alert" / "Why is X broken?" / "Debug issue" / "Triage" / "Root cause" → `grafana_investigate` (multi-signal triage) → follow `suggestedHypotheses.testWith` for deep-dives
- "Is this metric normal?" / "Is there an anomaly?" → `grafana_explain_metric` (returns `anomaly` z-score + `seasonality` vs 1d/7d ago for a 24h period)
- "RED analysis" / "What's the error rate?" / "Service health" → RED method queries (see sre-investigation.md §2)
- "Alert fatigue" / "Which alerts are noisy?" / "Alert health" → `grafana_check_alerts` with action `analyze` (fatigue report)
- "Postmortem" / "Incident summary" / "What happened?" → `grafana_investigate` → 5-Phase methodology → postmortem template (see sre-investigation.md §9)
- "Compare before/after deployment" → `grafana_annotate` (list, tags: ["deploy"]) → `grafana_explain_metric` (compareWith: "previous")
Data Collection Pipelines (Alloy)
- "Monitor a service/database/app" → `alloy_pipeline` action `recipes` (filter by category) → select recipe → `create` → `status` → query → dashboard → alert
- "Scrape metrics from [endpoint]" / "My app exposes /metrics" → `alloy_pipeline` with recipe `scrape-endpoint` + params `{ url }`
- "Monitor PostgreSQL/MySQL/Redis/MongoDB/Memcached" → `alloy_pipeline` with recipe `[db]-exporter` + params `{ connectionString }`
- "Collect and parse logs with JSON extraction" → `alloy_pipeline` (log recipe + processing params: `jsonExpressions`, `labelFields`, `structuredMetadata`)
- "Collect Docker logs" / "See container logs in Grafana" → `alloy_pipeline` with recipe `docker-logs`
- "Tail log files" / "Collect app logs from /var/log" → `alloy_pipeline` with recipe `file-logs` + params `{ paths }`
- "Accept logs via HTTP push API" / "Centralized log gateway" → `alloy_pipeline` with recipe `loki-push-api`
- "Consume logs from Kafka" → `alloy_pipeline` with recipe `kafka-logs` + params `{ brokers, topics }`
- "Set up syslog collection" → `alloy_pipeline` with recipe `syslog`
- "Monitor endpoint availability" / "Synthetic probing" / "HTTP health checks" → `alloy_pipeline` with recipe `blackbox-exporter` + params `{ targets }`
- "Kubernetes monitoring" / "Monitor my K8s cluster" → `alloy_pipeline` with recipes `kubernetes-pods` + `kubernetes-services` + `kubernetes-logs` (3 pipelines)
- "Receive OTLP data" / "Set up trace collection" → `alloy_pipeline` with recipe `otlp-receiver`
- "Generate RED metrics from traces" / "Span metrics" → `alloy_pipeline` with recipe `span-metrics`
- "Service dependency graph from traces" → `alloy_pipeline` with recipe `service-graph`
- "Monitor Alloy itself" / "Self-monitoring" → `alloy_pipeline` with recipe `self-monitoring`
- "Redact secrets from logs" / "Compliance logging" → `alloy_pipeline` with recipe `secret-filter-logs` + params `{ paths }`
- "Monitor Elasticsearch/Kafka" → `alloy_pipeline` with recipe `elasticsearch-exporter`/`kafka-exporter`
- "System metrics" / "Node monitoring" / "CPU/memory/disk" → `alloy_pipeline` with recipe `node-exporter`
- "Docker container metrics" / "Container resource usage" → `alloy_pipeline` with recipe `docker-metrics`
- "Reduce trace costs" / "Keep only error traces" / "Smart trace sampling" / "Tail sampling" → `alloy_pipeline` with recipe `application-traces` + `samplingPolicies` array (keep errors, keep slow, filter health checks, sample the rest)
- "Multi-tenant Loki" / "Route logs by tenant" / "Different tenants for different apps" → any log recipe + `tenantValue` or `matchRoutes` processing param
- "Profile my app" / "CPU profiling" / "Memory profiling" / "Continuous profiling" / "Go pprof" → `alloy_pipeline` with recipe `continuous-profiling` + `targets`
- "Frontend observability" / "Browser RUM" / "Web vitals" / "Faro SDK" → `alloy_pipeline` with recipe `faro-frontend`
- "GELF logs" / "Graylog" / "Docker GELF driver" → `alloy_pipeline` with recipe `gelf-logs`
- "Custom Alloy pattern" / "Advanced pipeline" → read `references/alloy-components.md` → `alloy_pipeline` with raw `config` + optional `sampleQueries`
- "What data collection recipes are available?" → `alloy_pipeline` with action `recipes`
- "What pipelines do I have?" / "Pipeline list" → `alloy_pipeline` with action `list`
- "Is my pipeline working?" / "Pipeline health" → `alloy_pipeline` with action `status` + name
- "Pipeline problems" / "Why isn't data showing up?" → `alloy_pipeline` with action `diagnose` → follow the remediation
- "Delete pipeline" / "Remove monitoring for..." → `alloy_pipeline` with action `delete` + name (confirm with the user first)
Working with Multiple Grafana Instances
When several Grafana environments are configured (dev, staging, prod), every tool accepts an optional instance parameter. grafana_explore_datasources returns availableInstances — use the name values from that list.
Why it matters: Users often need to query production metrics, create dashboards in dev, or compare environments side by side. Each tool call targets one instance.
Smart defaults: Omitting instance always targets the configured default, which is safe and invisible for single-environment setups. Only specify instance when the user explicitly names a non-default environment.
Cross-environment workflows: Each call is independent. Query prod, then create a dashboard in dev: just set instance differently on each call. No context switching is needed.
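The per-call behaviour described above can be sketched as a helper that adds `instance` only when an environment is named. The helper name `with_instance` is hypothetical, and the payloads are plain dicts standing in for tool-call parameters; how they are actually dispatched is outside this sketch.

```python
from typing import Optional


def with_instance(params: dict, instance: Optional[str] = None) -> dict:
    """Return a copy of the tool params, adding "instance" only when named."""
    payload = dict(params)
    if instance is not None:
        payload["instance"] = instance
    return payload


# Query prod, then create the dashboard in dev; each call is independent.
query_call = with_instance({"datasourceUid": "prom1", "expr": "up"}, "prod")
dashboard_call = with_instance({"template": "node-exporter"}, "dev")

# Omitting the instance leaves the payload untouched, so the default is used.
default_call = with_instance({"datasourceUid": "prom1", "expr": "up"})
```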
Tool Inventory
| Tool | What It Does |
|---|---|
| grafana_explore_datasources | Discover configured datasources (UIDs, types, query routing); tells you which tool + query language to use for each datasource |
| grafana_list_metrics | Discover available metrics or label values from a datasource. Use compact: true with metadata: true for minimal fields in multi-tool chains |
| grafana_query | Run PromQL instant/range queries and get numbers directly |
| grafana_query_logs | Run LogQL queries against Loki to search and filter logs |
| grafana_query_traces | Run TraceQL queries against Tempo to search traces or get a full trace by ID |
| grafana_create_dashboard | Create dashboards from templates or custom JSON |
| grafana_update_dashboard | Add/remove/update panels, change dashboard metadata, or delete a dashboard |
| grafana_get_dashboard | Get a dashboard summary (panels, queries). Use compact: true for overview scans, audit: true to health-check all panels in one call |
| grafana_search | Search existing dashboards by title, tags, or starred status |
| grafana_share_dashboard | Render a panel as an image and deliver it inline via messaging |
| grafana_create_alert | Create Grafana-native alert rules on any metric |
| grafana_annotate | Create or list annotations (events) on dashboards |
| grafana_check_alerts | Check, acknowledge, list/delete rules, silence/unsilence, or set up Grafana alert webhook notifications. Use compact: true with list_rules for minimal fields |
| grafana_push_metrics | Push custom data (calendar, git, fitness, finance) via OTLP |
| grafana_explain_metric | Get metric context: current value, trend, stats, metadata, drill-down queries; the agent interprets |
| grafana_security_check | Run 6 parallel security checks and return a threat-level assessment (green/yellow/red): "Am I being attacked?" |
| grafana_investigate | Multi-signal investigation triage: gathers metrics, logs, traces, and context in parallel, generates hypotheses with specific tool+params for follow-up |
| alloy_pipeline | Create and manage Alloy data collection pipelines: 29 recipes for metrics, logs, traces, profiles from any infrastructure (databases, K8s, Docker, apps, profiling, frontend RUM) |
Tool Details
grafana_explore_datasources
When: First step when the user mentions data, metrics, or monitoring. Gets the datasource UIDs needed by grafana_query, grafana_query_logs, grafana_query_traces, grafana_list_metrics, grafana_create_alert, and grafana_explain_metric.
Params: instance (optional: target Grafana instance, omit for default).
Example: {}
Example (multi-instance): { "instance": "prod" }
Returns: List of datasources with uid, name, type, isDefault, plus routing hints: queryTool (which agent tool to use, e.g. "grafana_query", "grafana_query_logs", or "grafana_query_traces"), queryLanguage (e.g. "PromQL", "LogQL", "TraceQL"), and supported (boolean: whether an agent tool can query this datasource). Use queryTool to pick the right tool for each datasource. When multiple Grafana instances are configured, also returns instance (which instance was queried) and availableInstances (list of { name, url, isDefault } for all configured instances).
grafana_list_metrics
When: The user asks "What metrics are available?" or you need to discover metrics before querying or composing dashboards. Also when grouping metrics by function: metadata mode adds a category to each openclaw_ metric. Use purpose when the user asks about a specific concern (e.g., "performance metrics", "cost metrics").
Params: datasourceUid (required), prefix (filter by prefix), search (targeted discovery: server-side regex, only matching metrics returned), purpose ("performance" | "cost" | "reliability" | "capacity": pre-filter by intent, composable with prefix and search), label (list label values instead), metadata (boolean: enriched results with type/help/category), compact (boolean: with metadata, returns only name/type/category, ~60% smaller).
Example names: { "datasourceUid": "prom1", "prefix": "openclaw_lens_" }
Example search: { "datasourceUid": "prom1", "search": "steps" }
Example purpose: { "datasourceUid": "prom1", "purpose": "performance", "metadata": true }
Example combined: { "datasourceUid": "prom1", "prefix": "openclaw_ext_", "search": "fitness" }
Example metadata: { "datasourceUid": "prom1", "metadata": true, "prefix": "openclaw_" }
Example compact: { "datasourceUid": "prom1", "metadata": true, "compact": true }
Returns names: { metrics: ["metric1", "metric2", ...] }. Truncated at 200.
Returns metadata: { metadataSource, categorySummary: { cost: 3, usage: 4, session: 5, ... }, metrics: [{ name, type, help, category?, source? }, ...] }. Use before composing custom dashboards: type tells you counter vs gauge vs histogram, and category groups openclaw_ metrics by function. search also matches help text. Categories: cost, usage, session, queue, messaging, webhook, tools, agent, custom. categorySummary gives counts per category for a quick overview (omitted when there are no openclaw_ metrics). Purpose maps: performance → session + tools, cost → cost + usage, reliability → webhook + messaging + agent, capacity → queue + session. metadataSource: "prometheus" when the Prometheus metadata endpoint has data, "synthetic" when OTLP-only (metadata synthesized from the known metric registry: histogram sub-metrics deduplicated, type/help from Grafana Lens definitions). On OTLP stacks, includes a hint explaining why the metadata is synthetic. source: "synthetic" on individual entries from the registry; source: "custom" on entries from the custom metrics store.
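The purpose-to-category mapping listed above can be written down as a lookup table. This sketch filters metadata entries client-side purely for illustration; the tool applies the purpose filter itself, and the helper `filter_by_purpose` and the sample metric names are hypothetical.

```python
# Purpose-to-category mapping as listed in the Returns metadata description.
PURPOSE_CATEGORIES = {
    "performance": {"session", "tools"},
    "cost": {"cost", "usage"},
    "reliability": {"webhook", "messaging", "agent"},
    "capacity": {"queue", "session"},
}


def filter_by_purpose(metrics: list, purpose: str) -> list:
    """Keep metadata entries whose category belongs to the given purpose."""
    wanted = PURPOSE_CATEGORIES[purpose]
    return [m for m in metrics if m.get("category") in wanted]


# Hypothetical metadata entries; the metric names are made up for illustration.
sample = [
    {"name": "openclaw_example_cost", "type": "counter", "category": "cost"},
    {"name": "openclaw_example_queue_depth", "type": "gauge", "category": "queue"},
    {"name": "openclaw_example_tool_calls", "type": "counter", "category": "tools"},
]
capacity_names = [m["name"] for m in filter_by_purpose(sample, "capacity")]
print(capacity_names)
```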
Returns compact: { metadataSource, categorySummary: {...}, metrics: [{ name, type, category? }, ...] }. Same as metadata but drops help, source, labelNames. Use it in multi-tool chains where you need metric names and types but not full descriptions.
Example label: { "datasourceUid": "prom1", "label": "job" }
Returns label: { label, count, totalCount, values: ["value1", "value2", ...] }. Truncated at 200.
grafana_query
When: The user asks a data question that needs a direct answer, not a dashboard. Also for re-running an existing dashboard panel's query with different time ranges.
Params: datasourceUid, expr (PromQL), queryType (instant/range), start (range only, required), end (range only, default "now"), step (range only, optional: auto-calculated from the time range if omitted, targeting ~300 datapoints), dashboardUid (optional: resolve the query from a panel), panelId (optional: use with dashboardUid).
Example instant: { "datasourceUid": "prom1", "expr": "sum(increase(openclaw_lens_cost_by_model_total[1d])) or vector(0)" }
Example range (auto-step): { "datasourceUid": "prom1", "expr": "rate(openclaw_tokens_total[5m])", "queryType": "range", "start": "now-30d" }
Example range (explicit step): { "datasourceUid": "prom1", "expr": "rate(openclaw_tokens_total[5m])", "queryType": "range", "start": "now-1h", "end": "now", "step": "60" }
Example panel re-run: { "dashboardUid": "openclaw-command-center", "panelId": 10, "queryType": "range", "start": "now-7d" }
Tip: start/end accept Unix seconds or relative expressions like "now-1h", "now-7d". For range queries, just set start: end defaults to "now" and step is auto-calculated. Override step only when you need a specific resolution.
Tip (panel re-run): Set dashboardUid + panelId to re-run a panel's query without manually extracting PromQL. The tool auto-resolves expr and datasourceUid from the panel definition. Template variables are replaced with wildcards. You can still override expr or datasourceUid explicitly if needed. Get panel IDs from grafana_get_dashboard.
Returns instant: { metrics: [{ metric: {...}, value: "1.23", timestamp: "...", healthContext?: { status, thresholds, description, direction } }], datasourceUid, resultCount, warnings?, hint? }. healthContext is included for well-known openclaw_lens_ gauge metrics, providing an SRE-grade health assessment: status ("healthy"/"warning"/"critical"), thresholds (warning/critical values), description (what the metric means), direction ("higher_is_worse"/"lower_is_worse"). Omitted for unknown metrics. Capped at 50 results; when exceeded, includes truncated: true, totalResults, and truncationHint advising you to narrow the query.
Returns range: { series: [{ metric: {...}, values: [{ time, value }...] }], datasourceUid, resultCount, warnings?, hint? }, truncated to 20 points per series and 50 series max. When series are truncated, includes truncated: true, totalSeries, and truncationHint. When step is auto-calculated, includes step: { value: "288s", display: "5m", auto: true }.
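As a rough illustration of the "~300 datapoints" target, the step can be approximated by dividing the range by the target point count. This is a sketch under assumptions: the function name is hypothetical, the tool's exact rounding and its display value are not reproduced, and the one-day range is only assumed to be what yields the "288s" shown above.

```python
def auto_step_seconds(start_s: int, end_s: int, target_points: int = 300) -> int:
    """Pick a step so the range yields roughly target_points datapoints."""
    span = end_s - start_s
    if span <= 0:
        raise ValueError("end must be after start")
    # Never go below one second, even for very short ranges.
    return max(1, round(span / target_points))


one_day = 24 * 60 * 60
print(auto_step_seconds(0, one_day))       # 288
print(auto_step_seconds(0, 30 * one_day))  # 8640
```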
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, templateVarsReplaced alongside the normal query results. If the panel uses a Loki datasource, returns an error directing you to use grafana_query_logs instead.
Returns (warnings): When Prometheus flags a non-fatal issue (e.g., rate() on a gauge), warnings: [{ cause, suggestion, example? }] is included. Example: rate() on a gauge → cause says "rate() applied to 'metric' which appears to be a gauge", suggestion says "use delta() or deriv() instead", example shows the corrected query.
Returns (hint): When a query returns zero results, hint: { cause, suggestion } explains why (the metric may not exist, label filters may not match) and suggests using grafana_list_metrics to verify.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common PromQL mistakes: unclosed parenthesis, missing range selector, timeout, auth failure, rate on gauge, etc. Omitted when the error is unrecognized.
Tip (chaining): Both instant and range responses include datasourceUid. Pass it directly to grafana_create_alert or other tools without re-calling grafana_explore_datasources. This enables zero-friction query→alert chains.
grafana_query_logs
When: The user asks about logs, errors, or needs to investigate issues by searching log data. Also for session debugging, OTel log investigation, and re-running existing log panel queries.
Params: datasourceUid, expr (LogQL), queryType (instant/range, default range), start/end (default now-1h/now), step (metric queries only), limit (default 100), direction (backward/forward), lineLimit (max chars per log line, default 500, max 2000), extractFields (boolean, default false: extract structured OTel attributes into a clean fields object), dashboardUid (optional: resolve the query from a panel), panelId (optional: use with dashboardUid).
Example log search: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |= \"error\"" }
Example with filters: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |~ \"timeout|refused\"", "limit": 50, "direction": "forward" }
Example full stack traces: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |= \"Exception\"", "lineLimit": 2000 }
Example session debugging: { "datasourceUid": "loki1", "expr": "{service_name=\"openclaw\"} | json | component=\"lifecycle\"", "extractFields": true }
Example metric query: { "datasourceUid": "loki1", "expr": "rate({job=\"api\"}[5m])", "queryType": "range", "start": "now-6h", "end": "now", "step": "60" }
Example panel re-run: { "dashboardUid": "openclaw-command-center", "panelId": 18, "start": "now-24h", "extractFields": true }
Returns streams: { entries: [{ labels: {...}, timestamp: "...", line: "..." }], datasourceUid, totalEntries, truncated }, capped at 100 entries, with lines capped at 500 chars (set lineLimit: 2000 for full stack traces).
Returns streams (extractFields): { entries: [{ labels: {...cleaned...}, timestamp: "...", line: "...", fields: { component, event_name, session_id, trace_id, model, duration_s, ... } }], datasourceUid }. Infrastructure noise labels are removed, the openclaw_ prefix is stripped from field keys, and numeric values are auto-converted. Also parses JSON log bodies if present.
Returns streams (traceCorrelation): When extractFields: true and entries contain trace_id, includes traceCorrelation: { traceIds: [...], tool: "grafana_query_traces", tip } with up to 5 unique trace IDs ready for grafana_query_traces with queryType: "get".
Returns metric: Same shape as grafana_query range/instant results (matrix capped at 50 series, vector capped at 50 results; includes datasourceUid, truncated, totalSeries/totalResults, and truncationHint when exceeded).
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, templateVarsReplaced alongside the normal results. If the panel uses a Prometheus datasource, returns an error directing you to use grafana_query instead.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common LogQL mistakes: bare text without a stream selector, empty {}, unclosed braces, missing label matchers, auth failure, timeout. Omitted when the error is unrecognized.
Tip: LogQL: {label="value"} selects streams, |= is a substring filter, |~ is a regex filter, != excludes. Metric wrappers: rate(), count_over_time(), bytes_rate(). Use extractFields: true when investigating OTel/lifecycle logs: it surfaces trace_id, session_id, event_name, model, and other attributes as first-class fields instead of leaving them buried in raw labels.
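The statistics-first discipline can be sketched as two grafana_query_logs payloads built in order: a count query first, raw entries second. The helper names are hypothetical and the parameter values are only examples; the LogQL itself uses the standard `count_over_time` wrapper named above.

```python
def count_query(selector: str, needle: str, window: str = "5m") -> str:
    """Metric-over-logs query: number of matching lines per window."""
    return f'sum(count_over_time({selector} |= "{needle}" [{window}]))'


def stats_then_raw(datasource_uid: str, selector: str, needle: str) -> list:
    """Build two payloads: counts first, raw log entries second."""
    stats = {
        "datasourceUid": datasource_uid,
        "expr": count_query(selector, needle),
        "queryType": "range",
        "start": "now-6h",
        "step": "60",
    }
    raw = {
        "datasourceUid": datasource_uid,
        "expr": f'{selector} |= "{needle}"',
        "limit": 50,
    }
    return [stats, raw]


calls = stats_then_raw("loki1", '{job="api"}', "error")
for call in calls:
    print(call["expr"])
```

Running the count query first shows whether errors are a trickle or a flood before any individual lines are read, which keeps the raw query narrow.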
Tip (panel re-run): Same as grafana_query: set dashboardUid + panelId to auto-resolve the LogQL and datasource. The tool routes Prometheus panels to grafana_query with a helpful error.
grafana_query_traces
When: The user asks about traces, distributed tracing, slow spans, session trace hierarchies, or needs to debug request flows across services.
Params: datasourceUid, query (TraceQL expression or trace ID), queryType (search/get, default search), start/end (default now-1h/now), limit (default 20, max 50), minDuration/maxDuration (e.g., "1s", "10s"), dashboardUid (optional: resolve the query from a panel), panelId (optional: use with dashboardUid).
Example search: { "datasourceUid": "tempo1", "query": "{ resource.service.name = \"openclaw\" }" }
Example search slow: { "datasourceUid": "tempo1", "query": "{ resource.service.name = \"openclaw\" }", "minDuration": "5s" }
Example search with time: { "datasourceUid": "tempo1", "query": "{ span.gen_ai.system = \"anthropic\" }", "start": "now-24h", "limit": 50 }
Example get: { "datasourceUid": "tempo1", "query": "abc123def456789...", "queryType": "get" }
Example panel re-run: { "dashboardUid": "openclaw-session-explorer", "panelId": 12, "start": "now-24h" }
Returns search: { traces: [{ traceId, rootServiceName, rootTraceName, startTime, durationMs, spanCount? }], datasourceUid, totalTraces, truncated?, correlationHint? }, capped at 50 traces. When exceeded, includes truncated: true and truncationHint. When traces are found, includes correlationHint: { logQuery, tool, tip } with a ready-to-use LogQL expression for grafana_query_logs.
Returns get: { traceId, spans: [{ traceId, spanId, parentSpanId?, operationName, serviceName, startTime, durationMs, status, kind?, attributes: {...} }], datasourceUid, totalSpans, truncated? }. These are flattened OTLP spans with resolved attributes (string/number/boolean). Capped at 200 spans. Sorted by start time (earliest first).
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, templateVarsReplaced alongside the normal results. If the panel uses a Prometheus or Loki datasource, returns an error directing you to use the correct tool.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common TraceQL mistakes: syntax errors, invalid attributes, auth failure, timeout, not-found, invalid trace ID. Omitted when the error is unrecognized.
Returns (no results): When a search returns zero traces, includes hint: { cause, suggestion } suggesting that you broaden the query or check the datasource.
Tip: TraceQL: { } matches all traces, resource.service.name filters by service, span.http.status_code targets HTTP spans, name is the operation name, duration is the span duration, status filters error/ok. Use minDuration/maxDuration to find performance outliers. Trace-to-Log: search and get results include correlationHint.logQuery; pass it directly to grafana_query_logs to find correlated logs. Log-to-Trace: grafana_query_logs results (with extractFields: true) include traceCorrelation.traceIds; pass any ID to grafana_query_traces with queryType: "get".
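The two correlation directions in the tip above can be sketched as small helpers that turn one tool's result into the next tool's payload. The result shapes follow the field names in this document; the helper names, datasource UIDs, and sample IDs are hypothetical.

```python
def trace_get_calls(log_result: dict, tempo_uid: str) -> list:
    """Log-to-Trace: turn traceCorrelation.traceIds into "get" payloads."""
    correlation = log_result.get("traceCorrelation") or {}
    return [
        {"datasourceUid": tempo_uid, "query": trace_id, "queryType": "get"}
        for trace_id in correlation.get("traceIds", [])
    ]


def log_call_from_trace(trace_result: dict, loki_uid: str):
    """Trace-to-Log: turn correlationHint.logQuery into a log query payload."""
    hint = trace_result.get("correlationHint")
    if not hint:
        return None  # No traces were found, so there is nothing to correlate.
    return {"datasourceUid": loki_uid, "expr": hint["logQuery"], "extractFields": True}


# Hypothetical results, shaped like the Returns descriptions in this document.
log_result = {"traceCorrelation": {"traceIds": ["aaa111", "bbb222"]}}
trace_result = {"correlationHint": {"logQuery": '{service_name="openclaw"}'}}

get_calls = trace_get_calls(log_result, "tempo1")
log_call = log_call_from_trace(trace_result, "loki1")
```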
Tip (panel re-run): Same as grafana_query: set dashboardUid + panelId to auto-resolve the TraceQL and datasource. The tool routes Prometheus/Loki panels to the correct tool with a helpful error.
grafana_create_dashboard
When: The user wants a persistent dashboard for ongoing monitoring.
Params: template or dashboard (custom JSON), one of which is required. Optional: title (overrides the template default), folderUid (target folder), overwrite (default true).
Returns: { uid, url, status, message, suggestedNext?: [{ template, reason }], validation?: DashboardValidation }. For template-based dashboards, suggestedNext lists complementary templates to deploy next. For custom JSON dashboards, validation dry-runs each panel's PromQL and reports per-panel health; check validation.panelsError for broken queries.
Choose the right template (3-tier SRE drill-down hierarchy):
Tier 1 → System: Start here for overall health. Tier 2 → Session: Click a session from Tier 1 to investigate. Tier 3 → Deep Dive: Cost, tool, or SRE details.
| Template | Tier | Domain | Variables | Use When |
|---|---|---|---|---|
| llm-command-center | Tier 1 | System overview | $prometheus, $loki, $tempo, $provider, $model, $channel | Golden signals, session table with click-to-drill-down, cost, cache, live feeds |
| session-explorer | Tier 2 | Session debug | $prometheus, $loki, $tempo, $session (textbox) | Per-session trace hierarchy, LLM calls, tool calls, conversation flow |
| cost-intelligence | Tier 3a | Cost analysis | $prometheus, $loki, $provider, $model | Spending trends, model attribution, cache savings, per-session cost table |
| tool-performance | Tier 3b | Tool analytics | $prometheus, $loki, $tempo, $tool | Tool leaderboard, latency ranking, error rates, tool traces |
| sre-operations | Tier 3c | SRE operations | $prometheus, $loki | Queue health, webhooks, stuck sessions, tool loops |
| genai-observability | - | OTel gen_ai standard | $prometheus, $loki, $tempo, $model, $provider | Industry-standard AI monitoring: token analytics, LLM performance, traces, logs, cache efficiency. Works with any gen_ai data. |
| node-exporter | - | System/DevOps | $datasource, $instance | Server CPU, memory, disk, network |
| http-service | - | Web/DevOps | $datasource, $job | HTTP request rate, errors, latency (RED signals) |
| metric-explorer | - | Any domain | $datasource, $metric | Deep-dive into any single metric from a dropdown |
| multi-kpi | - | Any domain | $datasource, $metric1..$metric4 | 4-metric KPI overview (business, fitness, finance, IoT) |
| weekly-review | - | Any domain | $datasource, $metric1, $metric2 | Weekly overview of 2 external metrics with trends + all openclaw_ext_ table |
Example AI health: { "template": "llm-command-center", "title": "My AI Dashboard" }
Example session debug: { "template": "session-explorer", "title": "Session Debug" }
Example cost analysis: { "template": "cost-intelligence", "title": "My AI Costs" }
Example tool analytics: { "template": "tool-performance", "title": "Tool Health" }
Example SRE ops: { "template": "sre-operations", "title": "SRE Health" }
Example GenAI observability: { "template": "genai-observability", "title": "GenAI Observability" }
Example system: { "template": "node-exporter", "title": "Server Health" }
Example generic: { "template": "metric-explorer", "title": "Explore My Data" }
Example multi-KPI: { "template": "multi-kpi", "title": "Business KPIs" }
Example weekly review: { "template": "weekly-review", "title": "My Weekly Review" }
Example custom with validation: { "dashboard": { "title": "Model Comparison", "panels": [{ "id": 1, "title": "Cost by Model", "type": "timeseries", "targets": [{ "refId": "A", "expr": "sum by (model) (rate(openclaw_lens_cost_by_token_type[1h]))", "datasource": { "uid": "prometheus" } }] }] } }
Custom dashboard validation (returned only for the dashboard param, not templates):
validation: { panelsTotal: 3, panelsValid: 1, panelsNoData: 1, panelsError: 1, panelsSkipped: 0, details: [{ panelId: 1, title: "Cost by Model", status: "ok", queries: [{ refId: "A", expr: "...", valid: true, sampleValue: 0.42 }] }, { panelId: 2, title: "Latency", status: "nodata" }, { panelId: 3, title: "Bad Query", status: "error", error: "parse error at char 5" }] }
Panel statuses: ok (query returned data), nodata (valid query, no results — metric may not exist yet), error (PromQL syntax error or datasource issue), skipped (no datasource UID found). Dashboard is always created regardless — validation is informational.
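Since validation is informational, the caller decides what to do with it. A minimal sketch of picking out the panels that need attention, assuming the response has been parsed into a Python dict shaped like the example above (the helper name is hypothetical):

```python
# Hypothetical parsed validation object, shaped like the example in this section.
validation = {
    "panelsTotal": 3,
    "details": [
        {"panelId": 1, "title": "Cost by Model", "status": "ok"},
        {"panelId": 2, "title": "Latency", "status": "nodata"},
        {"panelId": 3, "title": "Bad Query", "status": "error",
         "error": "parse error at char 5"},
    ],
}


def panels_needing_attention(validation):
    """Return (panelId, title, reason) for every panel whose status is not "ok"."""
    problems = []
    for detail in validation.get("details", []):
        status = detail.get("status")
        if status != "ok":
            # Prefer the error text when present, otherwise report the status itself.
            reason = detail.get("error", status)
            problems.append((detail["panelId"], detail["title"], reason))
    return problems


for panel_id, title, reason in panels_needing_attention(validation):
    print(f"panel {panel_id} ({title}): {reason}")
```

A "nodata" panel is reported here alongside errors, but it often only means the metric has not been emitted yet.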
grafana_update_dashboard
When: User wants to add a panel, remove a panel, change a query, update dashboard settings, or delete a dashboard.
Params: uid (required), operation (required: add_panel, remove_panel, update_panel, update_metadata, delete).
add_panel params: panel (object with title, type, targets). Auto-layouts below existing panels.
remove_panel / update_panel params: panelId (preferred) or panelTitle (case-insensitive substring fallback). updates (object) for update_panel.
update_metadata params: title, description, tags, time (e.g., { "from": "now-7d", "to": "now" }), refresh (e.g., "1m").
delete params: none besides uid. Permanently removes the dashboard. Always confirm with the user first.
Example add: { "uid": "abc123", "operation": "add_panel", "panel": { "title": "Error Rate", "type": "timeseries", "targets": [{ "refId": "A", "expr": "rate(errors_total[5m])", "datasource": { "uid": "prom1" } }] } }
Example add (no datasource): { "uid": "abc123", "operation": "add_panel", "panel": { "title": "Latency", "type": "timeseries", "targets": [{ "refId": "A", "expr": "histogram_quantile(0.99, rate(http_duration_bucket[5m]))" }] } }. Validation is skipped if no datasource UID is found; the panel is still saved.
Example remove: { "uid": "abc123", "operation": "remove_panel", "panelId": 3 }
Example update panel: { "uid": "abc123", "operation": "update_panel", "panelId": 1, "updates": { "title": "New Title", "targets": [{ "refId": "A", "expr": "new_query" }] } }
Example update metadata: { "uid": "abc123", "operation": "update_metadata", "title": "My Dashboard v2", "time": { "from": "now-7d", "to": "now" }, "refresh": "5m" }
Example delete: { "uid": "abc123", "operation": "delete" }
Returns update: { status: "updated", uid, url, version, operation, panelCount, affectedPanel?: { id, title }, changedFields?: [...], queryValidation?: { validated, results, datasourceUid?, skippedReason? } }.
Returns queryValidation: For add_panel and update_panel (when targets change), PromQL queries are dry-run against Grafana. Each result: { refId, expr, valid: boolean, error?: string, sampleValue?: number }. The panel is always saved; validation is informational. If valid: false, check the error field for PromQL syntax issues. If skippedReason is set, no datasource UID was found; include datasource: { uid: "..." } on targets to enable validation.
Returns delete: { status: "deleted", uid, title, message }.
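Because update_panel replaces the targets array as a whole, changing one query means resending all of them. A rough sketch of building such a request from a full grafana_get_dashboard response, assuming each entry in queries maps one-to-one onto a target and that the caller supplies the datasource UID (the helper name and sample data are hypothetical):

```python
def build_update_panel_request(dashboard, panel_id, ref_id, new_expr, datasource_uid=None):
    """Build an update_panel request that resends every target of the panel."""
    panel = next(p for p in dashboard["panels"] if p["id"] == panel_id)
    targets = []
    for query in panel["queries"]:
        # Only the query with the matching refId gets the new expression.
        expr = new_expr if query["refId"] == ref_id else query["expr"]
        target = {"refId": query["refId"], "expr": expr}
        if datasource_uid is not None:
            # A datasource UID on each target enables query validation feedback.
            target["datasource"] = {"uid": datasource_uid}
        targets.append(target)
    return {
        "uid": dashboard["uid"],
        "operation": "update_panel",
        "panelId": panel_id,
        "updates": {"targets": targets},
    }


# Hypothetical dashboard as returned in full mode.
dashboard = {
    "uid": "abc123",
    "panels": [{
        "id": 1,
        "title": "Cost",
        "type": "timeseries",
        "queries": [
            {"refId": "A", "expr": "old_query"},
            {"refId": "B", "expr": "other_query"},
        ],
    }],
}
request = build_update_panel_request(dashboard, 1, "A", "new_query", "prom1")
```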
Tip: targets in update_panel replaces the list entirely, so include all targets, not just the changed ones. Include datasource.uid on targets for query validation feedback.
grafana_get_dashboard
When: You need to inspect a dashboard's panels: find panel IDs for sharing, verify structure, scan multiple dashboards for an overview, or audit which panels are returning data.
Params: uid (required). Optional: compact (boolean, default false) returns panel titles and types only, with no queries or metadata (~70% smaller). audit (boolean, default false) dry-runs each panel's query and adds a health status.
Example (full): { "uid": "abc123" }
Example (compact overview): { "uid": "abc123", "compact": true }
Example (audit): { "uid": "abc123", "audit": true }
Returns (full): { uid, title, description?, url, tags, time?, refresh?, panelCount, panels: [{ id, title, type, queries: [{ refId, expr }] }], folderUid, created?, updated? }.
Returns (compact): { uid, title, url, tags, panelCount, panels: [{ id, title, type }] }.
Returns (audit): Same as full, plus each panel gets health: { status: "ok"|"nodata"|"error"|"skipped", error?, sampleValue? } and the response includes auditSummary: { ok, nodata, error, skipped }. Resolves template variable datasources ($prometheus, $loki) and replaces expression template vars with wildcards.
Tip: Use audit: true when the user asks "which panels are broken?" or "audit my dashboard"; it replaces N separate grafana_query calls with one tool call. Use compact: true for lightweight overview scans. Omit both when you need query details (before update or share).
grafana_search
When: User mentions a dashboard by name, before creating one (check duplicates), or for reporting/audit workflows.
Params: query (required). Optional: tags (array, filter by tags), starred (boolean, only starred), sort ("alpha-asc"/"alpha-desc"), limit (number, default 100), enrich (boolean, adds updatedAt + panelCount per result, default false).
Example: { "query": "cost" }
Example with tags: { "query": "", "tags": ["production"] }
Example starred: { "query": "", "starred": true, "limit": 10 }
Example enriched: { "query": "", "enrich": true }
Returns: { count, enriched, dashboards: [{ uid, title, url, tags, folderTitle?, folderUid?, updatedAt?, panelCount? }] }. folderTitle/folderUid are always included when the dashboard is in a folder. updatedAt (ISO 8601) and panelCount are only present when enrich: true; they enable staleness detection and reporting without per-dashboard get_dashboard calls.
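Staleness detection on an enriched result can be done client-side. The sketch below assumes the response is parsed into a dict and that updatedAt is an ISO 8601 string, possibly ending in "Z"; the helper name and the 14-day cutoff are illustrative choices, not part of the tool:

```python
from datetime import datetime, timedelta, timezone


def stale_dashboards(search_result, max_age_days, now=None):
    """Return UIDs of dashboards whose updatedAt is older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for dash in search_result.get("dashboards", []):
        updated_at = dash.get("updatedAt")
        if updated_at is None:
            continue  # result was not enriched, nothing to compare
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        ts = datetime.fromisoformat(updated_at.replace("Z", "+00:00"))
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
        if ts < cutoff:
            stale.append(dash["uid"])
    return stale


# Hypothetical enriched search result.
result = {
    "count": 3,
    "enriched": True,
    "dashboards": [
        {"uid": "old1", "title": "Old", "updatedAt": "2025-01-01T00:00:00Z"},
        {"uid": "new1", "title": "New", "updatedAt": "2025-01-30T12:00:00Z"},
        {"uid": "raw1", "title": "No timestamp"},
    ],
}
fixed_now = datetime(2025, 2, 1, tzinfo=timezone.utc)
print(stale_dashboards(result, max_age_days=14, now=fixed_now))
```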
Tip: Use enrich: true for reporting workflows ("which dashboards are stale?", "give me a summary of all dashboards"). Skip enrichment for simple lookups. After finding a dashboard, use grafana_get_dashboard to inspect panels, grafana_share_dashboard to render a chart, or grafana_update_dashboard to modify it.
grafana_share_dashboard
When: User says "show me" or "send me" a chart/dashboard.
Params: dashboardUid, panelId (required). Optional: from (default "now-6h"), to (default "now"), width (default 1000), height (default 500), theme ("light"/"dark", default "dark").
Example: { "dashboardUid": "abc123", "panelId": 2, "from": "now-6h", "to": "now" }
Returns: Image rendered inline (tier 1), or snapshot URL (tier 2), or deep link (tier 3). Always delivers something. Includes deliveryTier ("image" | "snapshot" | "link"), rendererAvailable (boolean, false when the Image Renderer plugin is missing), renderFailureReason (why image rendering failed), and remediation (how to fix it). Tier 3 also includes snapshotFailureReason.
Tip: Use grafana_get_dashboard first to find panel IDs. If rendererAvailable is false, tell the user to install the grafana-image-renderer plugin.
grafana_create_alert
When: User wants notifications when a metric crosses a threshold.
Params: title, datasourceUid, expr (PromQL), threshold (all required). Optional: evaluation ("instant"/"rate"/"increase", default "instant"), evaluationWindow (default "5m", used with rate/increase), condition (gt/lt/gte/lte, default gt), for (duration, default 5m), folderUid, labels (e.g., { "severity": "warning" }), annotations (e.g., { "summary": "Cost too high" }), noDataState (NoData/Alerting/OK, default NoData).
IMPORTANT: For counter metrics (_total), always use evaluation: "rate" (per-second rate) or evaluation: "increase" (total change over the window). Raw counter values only ever increase, so they will breach any fixed threshold. Use "instant" (default) only for gauges.
Example gauge alert: { "title": "High Cost Alert", "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "threshold": 5, "condition": "gt" }
Example rate alert: { "title": "High Error Rate", "datasourceUid": "prom1", "expr": "openclaw_lens_webhook_error_total", "threshold": 0.1, "evaluation": "rate" }
Example increase alert: { "title": "Token Burst", "datasourceUid": "prom1", "expr": "openclaw_lens_tokens_total", "threshold": 10000, "evaluation": "increase", "evaluationWindow": "1h" }
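The counter rule above can be applied mechanically when building a request. This is only a sketch: it treats a plain metric name ending in _total as a counter, which is a naming heuristic rather than a guarantee, and the helper name is hypothetical:

```python
def alert_request(title, datasource_uid, expr, threshold, window="5m"):
    """Build grafana_create_alert params, using "rate" for counter-looking metrics."""
    request = {
        "title": title,
        "datasourceUid": datasource_uid,
        "expr": expr,
        "threshold": threshold,
    }
    if expr.endswith("_total"):
        # Heuristic: a _total suffix suggests a counter, so alert on its rate.
        request["evaluation"] = "rate"
        request["evaluationWindow"] = window
    # Otherwise leave evaluation unset so the default ("instant") applies.
    return request


counter_alert = alert_request("High Error Rate", "prom1",
                              "openclaw_lens_webhook_error_total", 0.1)
gauge_alert = alert_request("High Cost Alert", "prom1",
                            "openclaw_lens_daily_cost_usd", 5)
```

When the total change over a window matters more than the per-second rate, set evaluation to "increase" explicitly instead.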
Returns: { uid, title, status: "created", datasourceUid, url, evaluation?: { mode, window, evaluatedExpr }, metricValidation: { valid, error?, sampleValue? }, message }. datasourceUid echoes back which datasource the rule targets (verify correctness). metricValidation dry-runs the expression before creation: valid: true + sampleValue confirms data exists; valid: false + error warns of typos/missing metrics. The alert is always created regardless (the metric may not have data yet). When evaluation is "rate" or "increase", validation runs the wrapped expression.
Note: Auto-creates a "Grafana Lens Alerts" folder if no folderUid is specified.
grafana_annotate
When: User deploys, changes config, or wants to mark an event for correlation.
Params: action ("create" default, or "list").
Create params: text (required), tags, dashboardUid, panelId, time (epoch ms or relative like "now-2h", default now), timeEnd (epoch ms or relative).
List params: from, to (epoch ms or relative like "now-7d", "now-24h", "now"), tags, limit (default 20).
Time formats: All time params accept epoch ms (e.g., 1700000000000) or Grafana-style relative strings ("now", "now-1h", "now-7d", "now-30m"). Prefer relative strings: they're simpler and avoid arithmetic errors.
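To make the two accepted formats concrete, here is a small sketch of what the relative strings mean in epoch milliseconds. It is not the tool's parser: it handles only "now" and "now-<n><unit>" with the units s, m, h, and d, and the function name is hypothetical:

```python
import re
import time

_UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_RELATIVE = re.compile(r"^now(?:-(\d+)([smhd]))?$")


def to_epoch_ms(value, now_ms=None):
    """Resolve an epoch-ms integer or a simple relative string to epoch ms."""
    if isinstance(value, int):
        return value  # already epoch milliseconds
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    match = _RELATIVE.match(value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = match.groups()
    if amount is None:
        return now_ms  # plain "now"
    return now_ms - int(amount) * _UNIT_MS[unit]


reference = 1700000000000
print(to_epoch_ms("now-2h", now_ms=reference))
```

Passing the relative string straight to the tool remains the simpler option; the conversion is shown only to illustrate the arithmetic it saves.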
Example create: { "text": "Deployed v2.1.0", "tags": ["deploy", "production"] }
Example create past: { "text": "Incident started", "time": "now-2h", "timeEnd": "now-30m", "tags": ["incident"] }
Example list recent: { "action": "list", "from": "now-7d", "to": "now", "tags": ["deploy"] }
Example list: { "action": "list", "tags": ["deploy"], "limit": 10 }
Returns create: { status: "created", id, message, time, comparisonHint: { beforeWindow: { from, to }, afterWindow: { from, to }, suggestion } }. comparisonHint provides ready-to-use ISO 8601 time ranges (30-min windows) for before/after comparison via grafana_query, so no manual time math is needed. For region annotations (with timeEnd), afterWindow starts at timeEnd.
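For intuition, the windows in comparisonHint can be pictured as follows. The 30-minute length and the rule that afterWindow starts at timeEnd come from the description above; that beforeWindow ends exactly at the annotation time is an assumption of this sketch, as is the function name:

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=30)


def comparison_windows(time_ms, time_end_ms=None):
    """Compute 30-minute before/after windows around an annotation (illustrative)."""
    start = datetime.fromtimestamp(time_ms / 1000, tz=timezone.utc)
    # For region annotations the "after" window begins where the region ends.
    end = start if time_end_ms is None else datetime.fromtimestamp(
        time_end_ms / 1000, tz=timezone.utc)
    return {
        # Assumption: the "before" window ends at the annotation start time.
        "beforeWindow": {"from": (start - WINDOW).isoformat(), "to": start.isoformat()},
        "afterWindow": {"from": end.isoformat(), "to": (end + WINDOW).isoformat()},
    }


point = comparison_windows(1700000000000)
region = comparison_windows(1700000000000, 1700003600000)
```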
Returns list: { annotations: [{ id, text, tags, time, timeEnd?, dashboardUID?, panelId? }] }.
grafana_check_alerts
When: Prompt context shows "GRAFANA ALERTS", or you need to manage alert rules (list/delete), set up the alert webhook, silence alerts during an investigation, or acknowledge an investigated alert.
Params: action ("list" default, "acknowledge", "list_rules", "delete_rule", "silence", "unsilence", "setup").
List params: none. Returns all pending (unacknowledged) alerts. Instances are capped at 5 per alert.
Acknowledge params: alertId (required). Marks the alert as investigated.
List rules params: compact (boolean, default false; when true, returns only uid/title/state/condition). Full mode returns all configured alert rules from Grafana with UID, title, condition (PromQL), folder, labels, annotations, live evaluation state (normal/firing/pending/nodata/error), health, and lastEvaluation. One call gives a complete alert health picture.
Delete rule params: ruleUid (required). Permanently deletes the alert rule. Get UIDs from list_rules.
Silence params: matchers (required: array of { name, value, isRegex? } from the alert's commonLabels), duration (default "2h"), comment (optional).
Unsilence params: silenceId (required). Removes the silence so alerts resume notifying.
Setup params: webhookUrl (optional, auto-detected). Creates a webhook contact point and notification policy route in Grafana.
Example list: {}
Example acknowledge: { "action": "acknowledge", "alertId": "alert-1" }
Example list rules: { "action": "list_rules" }
Example list rules compact: { "action": "list_rules", "compact": true }
Example delete rule: { "action": "delete_rule", "ruleUid": "abc123-def456" }
Example silence: { "action": "silence", "matchers": [{ "name": "alertname", "value": "HighCost" }], "duration": "2h", "comment": "Investigating cost spike" }
Example unsilence: { "action": "unsilence", "silenceId": "silence-uuid-123" }
Example setup: { "action": "setup" }
Returns list: { status: "success", alertCount, alerts: [{ id, status, title, message, receivedAt, commonLabels, totalInstances, truncated?, suggestedInvestigation?: { datasourceUid, condition, tool, queryLanguage, hint }, instances: [{ status, labels, annotations, startsAt, values }] }] }. suggestedInvestigation is auto-enriched by matching the alert to its rule: it provides the PromQL/LogQL expression, datasource, and tool to use for immediate investigation (eliminates the need for separate list_rules + explore_datasources calls).
Returns acknowledge: { status: "acknowledged", alertId }.
Returns list_rules: { status: "success", ruleCount, rules: [{ uid, title, folder, ruleGroup, state, health, lastEvaluation, for, labels, annotations, condition, updated }] }. state is the live evaluation state: "normal" (not firing), "firing", "pending" (within the for duration), "nodata", or "error". It falls back to "unknown" if the eval state API is unavailable. health is "ok", "nodata", "error", or "unknown". condition is the PromQL expression extracted from the rule's data queries.
Returns list_rules (compact): { status: "success", ruleCount, rules: [{ uid, title, state, condition }] }. Minimal fields for multi-tool chains; use it when you need a quick overview of all rules without details.
Returns delete_rule: { status: "deleted", ruleUid, message }.
Returns silence: { status: "silenced", silenceId, duration, matchers, message }.
Returns unsilence: { status: "unsilenced", silenceId, message }.
Returns setup: { status: "created", contactPointUid, webhookUrl } or { status: "already_exists", contactPointUid }.
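Silence matchers are taken from an alert's commonLabels, so a silence request can be derived from an entry of the list response. A minimal sketch, assuming the alert is a parsed dict and that matching on alertname alone is acceptable (the helper name and default label choice are hypothetical):

```python
def silence_request(alert, duration="2h", comment=None, label_names=("alertname",)):
    """Build a silence request from selected commonLabels of a pending alert."""
    labels = alert.get("commonLabels", {})
    matchers = [
        {"name": name, "value": labels[name]}
        for name in label_names
        if name in labels
    ]
    if not matchers:
        # Refuse to build a silence that would match nothing specific.
        raise ValueError("none of the requested labels are present on the alert")
    request = {"action": "silence", "matchers": matchers, "duration": duration}
    if comment:
        request["comment"] = comment
    return request


# Hypothetical alert entry from a "list" response.
alert = {
    "id": "alert-1",
    "title": "High Cost Alert",
    "commonLabels": {"alertname": "HighCost", "managed_by": "openclaw"},
}
request = silence_request(alert, comment="Investigating cost spike")
```

Matching on more labels narrows the silence; pass them through label_names when a single alertname covers several unrelated instances.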
Note: Setup is idempotent and safe to call multiple times. Only alerts with the managed_by=openclaw label route to the webhook (auto-added by grafana_create_alert). Use list_rules → delete_rule for full alert lifecycle management (create via grafana_create_alert, list/delete via grafana_check_alerts).
grafana_push_metrics
When: User wants to track custom data (calendar events, git commits, fitness stats, financial data) in Grafana.
Params: action ("push" default, "register", "list", "delete").
Push params: metrics (required array), each: { name, value, labels?, type?, help?, timestamp? }. Names automatically get the openclaw_ext_ prefix. timestamp is an optional ISO 8601 value for historical data (gauge only).
Register params: name (required), type ("gauge"/"counter", default "gauge"), help, labelNames (array), ttlDays.
List params: none. Returns all custom metric definitions.
Delete params: name (required). Removes the custom metric.
Example push: { "metrics": [{ "name": "steps_today", "value": 8000 }, { "name": "meetings", "value": 3, "labels": { "type": "standup" } }] }
Example backfill: { "metrics": [{ "name": "steps", "value": 8000, "timestamp": "2025-01-15" }, { "name": "steps", "value": 10500, "timestamp": "2025-01-16" }] }
Example mixed: { "metrics": [{ "name": "steps", "value": 9000, "timestamp": "2025-01-17" }, { "name": "heart_rate", "value": 72 }] }
Example register: { "action": "register", "name": "weight_kg", "type": "gauge", "help": "Body weight", "labelNames": ["person"], "ttlDays": 90 }
Example list: { "action": "list" }
Example delete: { "action": "delete", "name": "old_metric" }
Returns push: { status: "ok", accepted: 2, queryNames: { "openclaw_ext_steps": "openclaw_ext_steps", "openclaw_ext_events": "openclaw_ext_events_total" }, suggestedWorkflow: [{ tool, action, example }], message: "..." }. suggestedWorkflow contains concrete next-step examples using the actual pushed metric names: verify (grafana_query), visualize (grafana_create_dashboard with the metric-explorer template), and alert (grafana_create_alert, single-metric only). Partial success is supported. Timestamped and real-time points in the same batch are both accepted.
Returns register: { status: "registered", metric: { name, type, help, labelNames, ttlMs }, queryName: "openclaw_ext_events_total", suggestedWorkflow: [{ tool, action, example }] }. suggestedWorkflow shows how to push data and query the registered metric (with rate() wrapping for counters).
Returns list: { count, metrics: [{ name, type, queryName, help, labelNames, createdAt, updatedAt }] }.
Returns delete: { status: "deleted", name }.
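The naming and timestamp rules can be checked before pushing. The sketch below infers the query-name mapping from the example response above (prefix added, counters queried with a _total suffix); treat that mapping as an assumption and rely on the queryNames field of the real response. Both helper names are hypothetical:

```python
PREFIX = "openclaw_ext_"


def expected_query_name(name, metric_type="gauge"):
    """Guess the PromQL name for a pushed metric (inferred, not authoritative)."""
    full = name if name.startswith(PREFIX) else PREFIX + name
    if metric_type == "counter" and not full.endswith("_total"):
        full += "_total"
    return full


def split_batch(metrics):
    """Separate points the tool should accept from timestamped counter points."""
    accepted, rejected = [], []
    for point in metrics:
        is_counter = point.get("type", "gauge") == "counter"
        if is_counter and "timestamp" in point:
            rejected.append(point)  # timestamped pushes are gauge-only
        else:
            accepted.append(point)
    return accepted, rejected


batch = [
    {"name": "steps", "value": 9000, "timestamp": "2025-01-17"},
    {"name": "heart_rate", "value": 72},
    {"name": "events", "value": 1, "type": "counter", "timestamp": "2025-01-17"},
]
accepted, rejected = split_batch(batch)
```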
Note: Push auto-registers unknown metrics. The response includes queryNames with exact PromQL names and suggestedWorkflow with concrete next steps. Follow suggestedWorkflow to complete the push→visualize pipeline. Timestamped pushes are gauge-only; counters with timestamps are rejected. See external-data.md for naming conventions and backfill patterns.
grafana_explain_metric
When: User asks "what does this metric mean?", "why did it spike?", "is it normal?", or "show me the trend".
Params: datasourceUid (required), expr (PromQL or plain metric name, required), period (24h/7d/30d, default 24h), compareWith ("previous": compare the current period with the same-length window immediately before it).
Example: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd" }
Example counter: { "datasourceUid": "prom1", "expr": "openclaw_lens_tokens_total" }
Example 7d: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "period": "7d" }
Example comparison: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "period": "7d", "compareWith": "previous" }
Example PromQL: { "datasourceUid": "prom1", "expr": "rate(http_requests_total[5m])", "period": "24h" }
Returns: { metricType?, trendQuery?, current: { value, timestamp }, healthContext?: { status, thresholds, description, direction }, trend: { changePercent, direction, first, last }, stats: { min, max, avg, samples }, comparison?: { previousPeriod: { from, to, avg, min, max, samples }, change: { absolute, percentage, direction } }, metadata: { type, help, unit }, suggestedQueries?: [{ query, description }], suggestedBreakdowns?: string[] }. Sections are omitted when data is unavailable. changePercent is null when the first value is zero. healthContext is included for well-known openclaw_lens_ gauge metrics, the same as in grafana_query.
Counter-aware: Auto-detects counter metrics (via metadata type or _total suffix) and wraps the trend query in rate(expr[5m]). The current value stays raw (cumulative total), but trend and stats show the rate of change. The metricType field tells you the detected type (counter/gauge/histogram). trendQuery shows the actual PromQL used for the trend (only present when different from expr).
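The detection rule just described can be sketched as follows. This mirrors only the two signals named above (metadata type or a _total suffix) and the 5m rate window; the function name and the fallback to "gauge" for unknown metadata are assumptions of the sketch:

```python
def trend_query(expr, metadata_type=None, window="5m"):
    """Return (metricType, trendQuery) using the counter heuristics described above."""
    is_counter = metadata_type == "counter" or expr.endswith("_total")
    if is_counter:
        # Counters are trended by rate of change, not by raw cumulative value.
        return "counter", f"rate({expr}[{window}])"
    # Assumption: without metadata, anything else is treated as a gauge.
    return metadata_type or "gauge", expr


print(trend_query("openclaw_lens_tokens_total"))
print(trend_query("openclaw_lens_daily_cost_usd"))
```

An expression that is already wrapped, such as rate(http_requests_total[5m]), does not end in _total and is left unchanged by this heuristic.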
Drill-down: For multi-dimensional metrics (metrics with labels like model, token_type, provider), the response includes suggestedQueries: ready-to-use PromQL queries for grafana_query that break down the metric by each label. Counter metrics get rate() wrapping automatically. Use these to investigate cost attribution, identify top contributors, or decompose aggregates.
Breakdowns: suggestedBreakdowns provides label names for decomposition. It is always available for known OpenClaw metrics (cost, session, queue, webhook families), even when the metric has no data yet. For unknown metrics, it falls back to labels discovered from an instant query. Use these labels with grafana_query to build sum by (label) (...) queries for root-cause analysis.
Period comparison: Use compareWith: "previous" for period-over-period analysis (e.g., this week vs. last week). Returns a comparison object with the previous period's stats and the change (absolute, percentage, direction). Works with counters too (compares rates). Eliminates the need for manual multi-query workflows.
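The change fields in a period comparison amount to simple arithmetic on the two period averages. A sketch under stated assumptions: the direction labels "up", "down", and "flat" are illustrative (the tool's actual values are not listed here), and the percentage is left as None when the previous average is zero, by analogy with changePercent:

```python
def period_change(current_avg, previous_avg):
    """Compute absolute and percentage change between two period averages."""
    absolute = current_avg - previous_avg
    # A percentage is undefined when the baseline is zero.
    percentage = None if previous_avg == 0 else absolute * 100 / previous_avg
    if absolute > 0:
        direction = "up"
    elif absolute < 0:
        direction = "down"
    else:
        direction = "flat"
    return {"absolute": absolute, "percentage": percentage, "direction": direction}


print(period_change(12.0, 10.0))
print(period_change(3.0, 0.0))
```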
Tip: For simple trend context, call with just period. For "did things improve?" questions, add compareWith: "previous". Metadata is only available for plain metric names (not complex PromQL). There is no need to manually wrap counters in rate(); the tool does it automatically.
grafana_security_check
When: User asks "am I being attacked?", "security status", "security audit", "security check", or wants co