Grafana Lens

Name: Grafana Lens
Rating: 1 (1 review)
Author: awsome-o



Version: v0.5.0

Grafana Lens tool.


by @awsome-o·MIT-0

Tags: data analytics, developer tools, automation, security


License: MIT-0

Last updated: 2026/4/13

Security scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)

The skill's claimed Grafana/Alloy capabilities match the code and docs, but the bundle is very large and contains lifecycle/telemetry components, recipe patterns that reference host-level sources (Docker socket, file tails, syslog, listening ports) and prompt-injection tokens in SKILL.md — these factors warrant careful review before installing and giving it access to your Grafana API key or host resources.

Recommendations

Before installing or enabling Grafana Lens, do the following: - Review the bundled source (especially lifecycle-telemetry*, alloy-client, and any code that sends telemetry or posts to external endpoints) to confirm where data is sent and whether any telemetry leaves your environment. - Apply least privilege: create a Grafana API key scoped only to the operations you want (e.g., read-only for queries; narrower write scope if you allow dashboard creation). Avoid giving a single high-privilege key...

Detailed analysis

ℹ Purpose and capabilities

Name/description (Grafana visualization, alerts, Alloy pipeline management) align with the code and docs: grafana client, 18 agent tools, Alloy pipeline recipes, OTLP push providers and many dashboard templates. The required config keys (grafana.url, grafana.apiKey) are appropriate. However the skill's scope is very broad (collecting logs from Docker/socket, tailing files, binding 0.0.0.0 ports, creating exporters for DBs, managing pipelines) — these host-level capabilities are consistent with pipeline management but significantly expand what an agent can ask the runtime to do, so they deserve extra scrutiny.

⚠ Instruction scope

SKILL.md contains detailed runtime instructions and 'musts' that keep operations within the Grafana/Alloy domain, but it also includes instructions and recipe examples that reference local files, Docker socket, systemd journal, binding network ports, and credentials via env vars. The SKILL.md itself also contains prompt-injection patterns (e.g., ignore-previous-instructions, system-prompt-override) — likely because the skill implements prompt-injection detection, but presence of those tokens still raises risk if an agent's tooling incorrectly interprets or forwards them. Overall the instructions can cause an agent to: create pipeline configs that access host resources, ask for credentials or env values at runtime, and push potentially sensitive telemetry (LLM inputs/outputs) unless redaction is correctly enforced.

✓ Install mechanism

No install spec is provided (instruction-only install path), which lowers supply-chain risk from remote downloads. The registry bundle includes a large codebase (many TS files) but there is no external installer or archive download URL to inspect. That said, the plugin will install code into the agent environment via the platform's normal mechanism — review the bundled source before enabling.

⚠ Credential requirements

Declared requirements are limited to grafana.url and grafana.apiKey, which are proportional. However recipe docs and code reference using environment variables (sys.env()) for database connection strings, Kafka credentials, and other pipeline secrets; recipes may instruct the operator to provide connection strings or env vars. The skill also claims to capture session-scoped LLM inputs and lifecycle telemetry (with redaction claimed). Because the skill can be asked to create pipelines that reference host secrets or to accept credentials as parameters, there is real risk of exposing additional environment variables or secrets if the operator is not careful. Requesting a Grafana API key is expected, but ensure it is least-privilege.

ℹ Persistence and privileges

The skill is not marked always:true and uses the normal autonomous-invocation default. It includes lifecycle-telemetry and many long-running pipeline recipes (OTLP receivers, HTTP listeners) in its design; creating those pipelines on a host can lead to long-lived processes and listening sockets. This is consistent with the stated purpose but increases blast radius if misconfigured. The skill does not declare that it will modify other skills' config, but review hooks that perform hot-reload or pipeline status checks before enabling.

Security works in layers: review the code before running it.

License: MIT-0. Free to use, modify, and redistribute, with no attribution required.

Runtime dependencies: none

Versions

latest: v0.5.0 (2026/3/7)

Major update: Adds Alloy pipeline management, new recipes, and greatly expands data collection capabilities.
- Introduced Alloy pipeline management with the new `alloy_pipeline` tool and support for recipes, creation, status, diagnosis, and deletion actions.
- Added dozens of Alloy pipeline recipes and helpers for setting up metrics, logs, traces, exporters, and agent integrations.
- Expanded documentation and quickstart references for Alloy, pipeline composition, and common data-collection use cases.
- Extended the SKILL to include scenarios for managing data collection pipelines, collecting logs and metrics from multiple sources, and handling credentials securely.
- Updated and refined limits, troubleshooting, and best-practices guidance in user instructions.

● Benign

Install command

Official: npx clawhub@latest install grafana-lens

Mirror (accelerated): npx clawhub@latest install grafana-lens --registry https://cn.clawhub-mirror.com

Skill documentation

You have full native Grafana access — query data, create dashboards, set alerts, receive alert notifications, annotate events, explore datasources, push custom data, and deliver visualizations inline. Works with ANY data in Grafana, not just agent metrics.

Musts

Always call grafana_explore_datasources first when you need a datasource UID; never guess UIDs.
Always call grafana_search before creating a dashboard, to avoid duplicates.
Always call grafana_get_dashboard before grafana_share_dashboard; you need the exact panel IDs.
Always call grafana_get_dashboard before grafana_update_dashboard; you need the panel IDs and the current structure.
Prefer grafana_query over creating dashboards for direct answers: "What's my cost?" needs a number, not a URL.
Prefer grafana_query over grafana_create_dashboard + grafana_share_dashboard for simple data questions; a number is faster than a chart.
Use grafana_query_logs for log searches: LogQL for logs, PromQL for metrics, TraceQL for traces. Never use grafana_query for Loki datasources.
Use grafana_query_traces for trace searches: TraceQL for traces, PromQL for metrics, LogQL for logs. Never use grafana_query or grafana_query_logs for Tempo datasources.
All tools work with any Prometheus datasource, not just openclaw_lens_ metrics.
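The "call X before Y" rules above can be checked mechanically before a multi-step plan runs. The sketch below is a hypothetical helper (findOrderViolations and ORDER_RULES are illustrative names, not part of Grafana Lens) that encodes three of those rules and reports every call whose prerequisite has not run yet.

```typescript
// Hypothetical helper, not part of Grafana Lens: checks a planned sequence of
// tool calls against the "call X before Y" rules listed above.
interface OrderRule {
  tool: string;   // the call that has a prerequisite
  before: string; // the call that must appear earlier in the plan
}

const ORDER_RULES: OrderRule[] = [
  { tool: "grafana_create_dashboard", before: "grafana_search" },
  { tool: "grafana_share_dashboard", before: "grafana_get_dashboard" },
  { tool: "grafana_update_dashboard", before: "grafana_get_dashboard" },
];

// Returns one message per call whose prerequisite has not run earlier in the plan.
function findOrderViolations(plan: string[]): string[] {
  const violations: string[] = [];
  plan.forEach((tool, i) => {
    const earlier = plan.slice(0, i);
    for (const rule of ORDER_RULES) {
      if (rule.tool === tool && earlier.indexOf(rule.before) === -1) {
        violations.push(`step ${i}: ${tool} requires ${rule.before} first`);
      }
    }
  });
  return violations;
}

// Sharing without fetching the dashboard first is flagged once (step 1).
console.log(findOrderViolations(["grafana_search", "grafana_share_dashboard"]));
```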

When you see "GRAFANA ALERTS" in the prompt context, investigate immediately with grafana_check_alerts. Use the suggestedInvestigation field to go directly to querying (it provides the tool, query, and datasource).

Run grafana_check_alerts with action setup once before alert notifications can reach the agent; this creates the webhook contact point.

Push data before querying or dashboarding it; data pushed via OTLP is available immediately.

Prefer grafana_explain_metric over a manual grafana_query for "What is [metric]?" questions; it returns the current value, trend, stats, and metadata in one call.

Use queryNames from the push response for PromQL queries; don't guess metric names (counters get a _total suffix).

Use the openclaw_ext_ prefix for custom metrics; grafana_push_metrics auto-prepends it if missing.
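To see why guessing names fails, here is a minimal sketch of the two naming rules stated above, assuming a plain string transformation: openclaw_ext_ is prepended when missing, and counters gain a _total suffix. The helper names are hypothetical, and Prometheus's OTLP translation may also add unit suffixes, so the queryNames returned by grafana_push_metrics remain authoritative.

```typescript
// Hypothetical helpers mirroring the naming rules above. The names that
// grafana_push_metrics returns in queryNames are the ones to actually query.
const CUSTOM_PREFIX = "openclaw_ext_";
const COUNTER_SUFFIX = "_total";

// Prepend openclaw_ext_ unless it is already there.
function withCustomPrefix(name: string): string {
  return name.indexOf(CUSTOM_PREFIX) === 0 ? name : CUSTOM_PREFIX + name;
}

// Prometheus-side name: counters carry a _total suffix, gauges do not.
function expectedPromName(name: string, kind: "counter" | "gauge"): string {
  const prefixed = withCustomPrefix(name);
  const hasSuffix = prefixed.slice(-COUNTER_SUFFIX.length) === COUNTER_SUFFIX;
  return kind === "counter" && !hasSuffix ? prefixed + COUNTER_SUFFIX : prefixed;
}

console.log(expectedPromName("steps", "counter")); // prints openclaw_ext_steps_total
```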

Follow a statistics-first discipline for log investigation: always run count/rate LogQL before reading individual entries. Use grafana_query_logs with metric queries over logs (count_over_time, rate, topk) before switching to raw log entries.

Silence alerts during investigation: use grafana_check_alerts with action silence to prevent repeat notifications while investigating.

Use list_rules for complete alert health: grafana_check_alerts with action list_rules returns all rules with live eval state (normal/firing/pending/nodata/error), health, and lastEvaluation, so there is no need to cross-reference with the list action.

Use dashboardUid + panelId to re-run panel queries; don't manually extract PromQL/LogQL from get_dashboard output. Both grafana_query and grafana_query_logs accept these params to auto-resolve the panel's query expression and datasource. The tool handles template variable replacement and datasource routing automatically.

Confirm with the user before deleting dashboards or alert rules: grafana_update_dashboard with operation delete and grafana_check_alerts with action delete_rule are permanent and cannot be undone.

Always use alloy_pipeline action recipes first when unsure which pipeline recipe fits the user's request, because recipes provide validation, credential handling, and sample queries that raw config does not.

Always call alloy_pipeline action status after creating a pipeline, because data takes 15-20s to flow through the pipeline, and components may fail silently after a reload.

Never guess Alloy component names. Use recipes for known patterns, or raw config only when the user explicitly provides Alloy syntax.

Prefer recipes over raw config when a recipe exists: recipes provide validation, sample queries, credential handling, dashboard templates, and automatic export target wiring.

Never write credentials into raw config. When the user provides a connection string, DSN, password, or API key, ALWAYS use the matching recipe (which routes credentials through sys.env(), keeping secrets off disk). If you must use raw config, wrap sensitive values in sys.env("MY_VAR_NAME") and tell the user to set the env var where Alloy runs.

Read envVarsRequired from every pipeline create response: credential recipes may return pending_credentials status when env vars aren't set yet. Tell the user the exact var names and that they must set them where Alloy runs, then verify with action status.

Warn users before creating credential-required pipelines. Alloy config reload is atomic: if a credential recipe's env vars aren't set, the reload failure blocks all managed pipelines (not just the new one) until the env vars are set or the pipeline is deleted. Always ask: "Do you have the credentials ready to set as env vars on the Alloy host?"
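The credential rules above come down to keeping the secret out of the config text and reporting which variables Alloy still needs. A minimal sketch, assuming the real Alloy component prometheus.exporter.postgres (with its data_source_names argument) and the Alloy standard-library function sys.env; the helper names and the PG_DSN variable are hypothetical, and the recipes themselves may render config differently.

```typescript
// Hypothetical sketch, not the recipe implementation.
interface RenderedPipeline {
  config: string;            // Alloy config text, with no secret in it
  envVarsRequired: string[]; // variables that must exist where Alloy runs
}

// Render a postgres exporter whose DSN Alloy reads from the environment at load time.
function renderPostgresExporter(label: string, dsnEnvVar: string): RenderedPipeline {
  const config = [
    `prometheus.exporter.postgres "${label}" {`,
    `  data_source_names = [sys.env("${dsnEnvVar}")]`,
    `}`,
  ].join("\n");
  return { config, envVarsRequired: [dsnEnvVar] };
}

// Required variables still unset on the Alloy host (the pending_credentials case).
function missingEnvVars(required: string[], alloyEnv: { [name: string]: string | undefined }): string[] {
  return required.filter((name) => !alloyEnv[name]);
}

const rendered = renderPostgresExporter("app_db", "PG_DSN");
const missing = missingEnvVars(rendered.envVarsRequired, {});
if (missing.length > 0) {
  console.log(`Set ${missing.join(", ")} where Alloy runs, then re-check with action status.`);
}
```

Because the reload is atomic, checking missingEnvVars before creating the pipeline is the cheap way to avoid blocking every other managed pipeline.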

Chain pipeline creation into existing tools. After a pipeline is active: grafana_list_metrics or grafana_query_logs to discover data, grafana_create_dashboard to visualize, grafana_create_alert to monitor.

Use alloy_pipeline action diagnose as the first step when the user reports pipeline issues, because it checks Alloy connectivity, the health of all pipelines, config file drift, and limits in one call.

Confirm with the user before deleting pipelines: alloy_pipeline with action delete removes the config, and data stops flowing.

All log recipes accept processing params; don't create separate "processing" pipelines. Add jsonExpressions, labelFields, structuredMetadata, tenantValue, matchRoutes, etc. directly to any log recipe (docker-logs, file-logs, syslog, etc.).

Use samplingPolicies for multi-policy tail sampling; don't create raw config when application-traces can handle it. Use sampleRate for simple probabilistic sampling and samplingPolicies for intelligent multi-policy sampling (keep errors, keep slow traces, sample the rest).
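Tail sampling decides per trace after all of its spans have arrived, which is what makes "keep errors, keep slow, sample the rest" possible. The sketch below illustrates that decision logic only. The SamplingPolicy shape is hypothetical and not the actual samplingPolicies schema of the application-traces recipe, and bucketOf is a simple deterministic stand-in for trace-ID hashing. A trace is kept if any policy matches.

```typescript
// Hypothetical policy shape, for illustration only.
type SamplingPolicy =
  | { kind: "keepErrors" }
  | { kind: "keepSlow"; thresholdMs: number }
  | { kind: "probabilistic"; percent: number };

interface TraceSummary {
  traceId: string;
  hasError: boolean;  // any span with status ERROR
  durationMs: number; // whole-trace duration
}

// Deterministic bucket 0..99 derived from the trace ID, so repeated decisions agree.
function bucketOf(traceId: string): number {
  let h = 0;
  for (let i = 0; i < traceId.length; i++) {
    h = (h * 31 + traceId.charCodeAt(i)) % 1000003;
  }
  return h % 100;
}

function matches(policy: SamplingPolicy, trace: TraceSummary): boolean {
  if (policy.kind === "keepErrors") return trace.hasError;
  if (policy.kind === "keepSlow") return trace.durationMs >= policy.thresholdMs;
  return bucketOf(trace.traceId) < policy.percent; // probabilistic
}

// Keep the trace if any policy votes to keep it.
function shouldKeep(trace: TraceSummary, policies: SamplingPolicy[]): boolean {
  return policies.some((p) => matches(p, trace));
}

const policies: SamplingPolicy[] = [
  { kind: "keepErrors" },
  { kind: "keepSlow", thresholdMs: 5000 },
  { kind: "probabilistic", percent: 10 },
];
```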

Use log processing params for multi-tenant routing: tenantValue/tenantSource/matchRoutes work on all log recipes. Don't create separate "routing" pipelines.

Read references/alloy-components.md before composing raw config; it has copy-pasteable snippets for all common Alloy components.

Quick Decision Tree

"What is [metric]?" / "Why did it spike?" → grafana_explain_metric

"What's the current value of X?" / complex PromQL → grafana_query

"Find error logs" / "Search logs for..." → grafana_query_logs

"Find slow traces" / "Show the trace for session X" / "Debug distributed spans" → grafana_query_traces

"Debug a session" / "Why did it fail?" / "What went wrong?" → grafana_query_traces (search errors/slow) → grafana_query_traces (get → follow correlationHint) → grafana_query_logs → grafana_query → grafana_annotate

"Show me a chart" / "Visualize..." → grafana_search → grafana_get_dashboard → grafana_share_dashboard

"Create a dashboard for..." → grafana_search (check duplicates) → grafana_create_dashboard

"Add a panel to my dashboard" → grafana_get_dashboard → grafana_update_dashboard

"Delete a dashboard" → grafana_update_dashboard with operation delete (confirm with the user first)

"Alert me when..." → grafana_check_alerts (setup) → grafana_create_alert

"List my alert rules" / "What alerts do I have?" → grafana_check_alerts with action list_rules

"Delete alert rule X" → grafana_check_alerts with action list_rules → delete_rule with ruleUid

"Track my [custom data]" / "Record my [past data]" → grafana_push_metrics (with an optional timestamp for historical data; auto-registers and returns queryNames) → grafana_query with queryNames

"What data sources do I have?" → grafana_explore_datasources

"What metrics are available?" → grafana_list_metrics

"Set up monitoring" / "Monitor my agent" / "What dashboards should I have?" → grafana_search (check existing) → grafana_create_dashboard with llm-command-center → follow the suggestedNext chain through the remaining templates

"GenAI observability" / "OTel gen_ai metrics" / "Standard AI monitoring" → grafana_create_dashboard with the genai-observability template

"What happened in session X?" / "Debug a session" → grafana_create_dashboard with the session-explorer template → paste the session ID

"Show me LLM traces" / "Show agent logs" → grafana_create_dashboard with the llm-command-center template (Loki + Tempo)

"How much am I spending?" / "Cost analysis" → grafana_create_dashboard with the cost-intelligence template

"Which tools are slow?" / "Tool errors" → grafana_create_dashboard with the tool-performance template

"Queue health" / "Webhook issues" / "Stuck sessions" → grafana_create_dashboard with the sre-operations template

"System health check" / "Status report" / "Review all dashboards" → grafana_explore_datasources → grafana_check_alerts (list + list_rules) → grafana_search → grafana_get_dashboard (audit=true for each) → summarize

"Audit my dashboard" / "Which panels are broken?" → grafana_get_dashboard (audit=true) → review auditSummary + per-panel health

"Am I being attacked?" / "Security check" / "Security status" → grafana_security_check

"Set up security monitoring" → grafana_check_alerts (setup) → grafana_create_dashboard (security-overview) → grafana_create_alert (webhook error burst, cost spike, tool loops, injection signals)

"Investigate a security alert" → grafana_security_check → grafana_query_logs (correlate) → grafana_annotate (mark the investigation) → grafana_check_alerts (silence)

"Investigate an alert" / "Why is X broken?" / "Debug an issue" / "Triage" / "Root cause" → grafana_investigate (multi-signal triage) → follow suggestedHypotheses.testWith for deep-dives

"Is this metric normal?" / "Is there an anomaly?" → grafana_explain_metric (returns an anomaly z-score plus seasonality vs 1d/7d ago for the 24h period)

"RED analysis" / "What's the error rate?" / "Service health" → RED method queries (see sre-investigation.md §2)

"Alert fatigue" / "Which alerts are noisy?" / "Alert health" → grafana_check_alerts with action analyze (fatigue report)

"Postmortem" / "Incident summary" / "What happened?" → grafana_investigate → 5-phase methodology → postmortem template (see sre-investigation.md §9)

"Compare before/after a deployment" → grafana_annotate (list, tags: ["deploy"]) → grafana_explain_metric (compareWith: "previous")

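For the "Is this metric normal?" route, grafana_explain_metric reports an anomaly z-score. The window and baseline the tool actually uses are not documented here, so the following is only the textbook formula, z = (latest - mean) / standard deviation over a history window, as a sketch for interpreting the number. zScore and isAnomalous are hypothetical names, and the threshold of 3 is a common rule of thumb rather than a Grafana Lens setting.

```typescript
// Textbook z-score of the latest value against a history window, using the
// population standard deviation. Not Grafana Lens's exact implementation.
function zScore(history: number[], latest: number): number {
  if (history.length < 2) return 0; // too little history to judge
  const mean = history.reduce((sum, v) => sum + v, 0) / history.length;
  const variance = history.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / history.length;
  const std = Math.sqrt(variance);
  if (std === 0) {
    // Flat history: any change at all is an outlier.
    return latest === mean ? 0 : latest > mean ? Infinity : -Infinity;
  }
  return (latest - mean) / std;
}

// Flag values more than `threshold` standard deviations from the mean.
function isAnomalous(history: number[], latest: number, threshold = 3): boolean {
  return Math.abs(zScore(history, latest)) > threshold;
}
```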
Data Collection Pipelines (Alloy)

"Monitor a service/database/app" → alloy_pipeline action recipes (filter by category) → select a recipe → create → status → query → dashboard → alert

"Scrape metrics from [endpoint]" / "My app exposes /metrics" → alloy_pipeline with recipe scrape-endpoint + params { url }

"Monitor PostgreSQL/MySQL/Redis/MongoDB/Memcached" → alloy_pipeline with recipe [db]-exporter + params { connectionString }

"Collect and parse logs with JSON extraction" → alloy_pipeline (log recipe + processing params: jsonExpressions, labelFields, structuredMetadata)

"Collect Docker logs" / "See container logs in Grafana" → alloy_pipeline with recipe docker-logs

"Tail log files" / "Collect app logs from /var/log" → alloy_pipeline with recipe file-logs + params { paths }

"Accept logs via an HTTP push API" / "Centralized log gateway" → alloy_pipeline with recipe loki-push-api

"Consume logs from Kafka" → alloy_pipeline with recipe kafka-logs + params { brokers, topics }

"Set up syslog collection" → alloy_pipeline with recipe syslog

"Monitor endpoint availability" / "Synthetic probing" / "HTTP health checks" → alloy_pipeline with recipe blackbox-exporter + params { targets }

"Kubernetes monitoring" / "Monitor my K8s cluster" → alloy_pipeline with recipes kubernetes-pods + kubernetes-services + kubernetes-logs (3 pipelines)

"Receive OTLP data" / "Set up trace collection" → alloy_pipeline with recipe otlp-receiver

"Generate RED metrics from traces" / "Span metrics" → alloy_pipeline with recipe span-metrics

"Service dependency graph from traces" → alloy_pipeline with recipe service-graph

"Monitor Alloy itself" / "Self-monitoring" → alloy_pipeline with recipe self-monitoring

"Redact secrets from logs" / "Compliance logging" → alloy_pipeline with recipe secret-filter-logs + params { paths }

"Monitor Elasticsearch/Kafka" → alloy_pipeline with recipe elasticsearch-exporter / kafka-exporter

"System metrics" / "Node monitoring" / "CPU/memory/disk" → alloy_pipeline with recipe node-exporter

"Docker container metrics" / "Container resource usage" → alloy_pipeline with recipe docker-metrics

"Reduce trace costs" / "Keep only error traces" / "Smart trace sampling" / "Tail sampling" → alloy_pipeline with recipe application-traces + a samplingPolicies array (keep errors, keep slow, filter health checks, sample the rest)

"Multi-tenant Loki" / "Route logs by tenant" / "Different tenants for different apps" → any log recipe + the tenantValue or matchRoutes processing param

"Profile my app" / "CPU profiling" / "Memory profiling" / "Continuous profiling" / "Go pprof" → alloy_pipeline with recipe continuous-profiling + targets

"Frontend observability" / "Browser RUM" / "Web vitals" / "Faro SDK" → alloy_pipeline with recipe faro-frontend

"GELF logs" / "Graylog" / "Docker GELF driver" → alloy_pipeline with recipe gelf-logs

"Custom Alloy pattern" / "Advanced pipeline" → read references/alloy-components.md → alloy_pipeline with raw config + optional sampleQueries

"What data collection recipes are available?" → alloy_pipeline with action recipes

"What pipelines do I have?" / "Pipeline list" → alloy_pipeline with action list

"Is my pipeline working?" / "Pipeline health" → alloy_pipeline with action status + name

"Pipeline problems" / "Why isn't data showing up?" → alloy_pipeline with action diagnose → follow the remediation

"Delete a pipeline" / "Remove monitoring for..." → alloy_pipeline with action delete + name (confirm with the user first)

Working with Multiple Grafana Instances

When several Grafana environments are configured (dev, staging, prod), every tool accepts an optional instance parameter. grafana_explore_datasources returns availableInstances — use the name values from that list.

Why it matters: Users often need to query production metrics, create dashboards in dev, or compare environments side by side. Each tool call targets one instance.

Smart defaults: Omitting instance always targets the configured default, which is safe and invisible for single-environment setups. Only specify instance when the user explicitly names a non-default environment.

Cross-environment workflows: Each call is independent. To query prod and create a dashboard in dev, just set instance differently on each call. No context switching is needed.
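A cross-environment workflow is simply a list of independent calls, each carrying its own instance (or none, for the default). In this sketch only the optional instance parameter comes from the text above; the call helper, the ToolCall shape, the "prom1" UID, and the instance names are illustrative placeholders.

```typescript
type Params = { [key: string]: unknown };

interface ToolCall {
  tool: string;
  params: Params;
}

// Hypothetical helper: attach `instance` only when the user named a
// non-default environment; leaving it out targets the configured default.
function call(tool: string, params: Params, instance?: string): ToolCall {
  const withInstance: Params = { ...params };
  if (instance !== undefined) withInstance.instance = instance;
  return { tool, params: withInstance };
}

// Query prod, build the dashboard in dev: every call is independent.
const plan: ToolCall[] = [
  call("grafana_explore_datasources", {}, "prod"),
  call("grafana_query", { datasourceUid: "prom1", expr: "up" }, "prod"), // UID from the prod discovery call
  call("grafana_explore_datasources", {}, "dev"),
  call("grafana_create_dashboard", { template: "llm-command-center" }, "dev"),
  call("grafana_search", {}), // no instance: configured default
];
```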

Tool Inventory

Tool	What It Does
`grafana_explore_datasources`	Discover configured datasources (UIDs, types, query routing) — tells you which tool + query language to use for each datasource
`grafana_list_metrics`	Discover available metrics or label values from a datasource. Use `compact: true` with `metadata: true` for minimal fields in multi-tool chains
`grafana_query`	Run PromQL instant/range queries — get numbers directly
`grafana_query_logs`	Run LogQL queries against Loki — search and filter logs
`grafana_query_traces`	Run TraceQL queries against Tempo — search traces or get full trace by ID
`grafana_create_dashboard`	Create dashboards from templates or custom JSON
`grafana_update_dashboard`	Add/remove/update panels, change dashboard metadata, or delete dashboard
`grafana_get_dashboard`	Get dashboard summary (panels, queries). Use `compact: true` for overview scans, `audit: true` to health-check all panels in one call
`grafana_search`	Search existing dashboards by title, tags, or starred status
`grafana_share_dashboard`	Render panel as image and deliver inline via messaging
`grafana_create_alert`	Create Grafana-native alert rules on any metric
`grafana_annotate`	Create or list annotations (events) on dashboards
`grafana_check_alerts`	Check, acknowledge, list/delete rules, silence/unsilence, or set up Grafana alert webhook notifications. Use `compact: true` with `list_rules` for minimal fields
`grafana_push_metrics`	Push custom data (calendar, git, fitness, finance) via OTLP
`grafana_explain_metric`	Get metric context: current value, trend, stats, metadata, drill-down queries — agent interprets
`grafana_security_check`	Run 6 parallel security checks and return threat-level assessment (green/yellow/red) — "Am I being attacked?"
`grafana_investigate`	Multi-signal investigation triage — gathers metrics, logs, traces, and context in parallel, generates hypotheses with specific tool+params for follow-up
`alloy_pipeline`	Create and manage Alloy data collection pipelines — 29 recipes for metrics, logs, traces, profiles from any infrastructure (databases, K8s, Docker, apps, profiling, frontend RUM)

Tool Details
grafana_explore_datasources
When: First step when the user mentions data, metrics, or monitoring. Gets the datasource UIDs needed by grafana_query, grafana_query_logs, grafana_query_traces, grafana_list_metrics, grafana_create_alert, and grafana_explain_metric.
Params: instance (optional; target Grafana instance, omit for the default).
Example: {}
Example (multi-instance): { "instance": "prod" }
Returns: a list of datasources with uid, name, type, isDefault, plus routing hints: queryTool (which agent tool to use, e.g. "grafana_query", "grafana_query_logs", or "grafana_query_traces"), queryLanguage (e.g. "PromQL", "LogQL", "TraceQL"), and supported (boolean: whether the agent tools can query this datasource). Use queryTool to pick the right tool for each datasource. When multiple Grafana instances are configured, the response also includes instance (which instance was queried) and availableInstances (a list of { name, url, isDefault } for all configured instances).
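The routing hints turn tool selection into a lookup rather than a guess. A minimal sketch, assuming the response fields listed above; DatasourceInfo and toolFor are hypothetical names, and the sample response reuses the example UIDs from elsewhere in this document.

```typescript
// Hypothetical sketch: the interface mirrors the documented response fields.
interface DatasourceInfo {
  uid: string;
  name: string;
  type: string;
  isDefault: boolean;
  queryTool?: string;     // e.g. "grafana_query_logs"
  queryLanguage?: string; // e.g. "LogQL"
  supported: boolean;
}

// Choose the agent tool for a datasource from its routing hints.
function toolFor(datasources: DatasourceInfo[], uid: string): string {
  const ds = datasources.filter((d) => d.uid === uid)[0];
  if (ds === undefined) {
    throw new Error(`unknown datasource UID "${uid}"; call grafana_explore_datasources first`);
  }
  if (!ds.supported || ds.queryTool === undefined) {
    throw new Error(`datasource "${ds.name}" (${ds.type}) is not supported by the agent tools`);
  }
  return ds.queryTool;
}

// Illustrative response, not real output.
const discovered: DatasourceInfo[] = [
  { uid: "prom1", name: "Prometheus", type: "prometheus", isDefault: true, queryTool: "grafana_query", queryLanguage: "PromQL", supported: true },
  { uid: "loki1", name: "Loki", type: "loki", isDefault: false, queryTool: "grafana_query_logs", queryLanguage: "LogQL", supported: true },
  { uid: "tempo1", name: "Tempo", type: "tempo", isDefault: false, queryTool: "grafana_query_traces", queryLanguage: "TraceQL", supported: true },
];
```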
grafana_list_metrics
When: The user asks "What metrics are available?" or you need to discover metrics before querying or composing dashboards. Also when grouping metrics by function: metadata mode adds a category to each openclaw_ metric. Use purpose when the user asks about a specific concern (e.g., "performance metrics", "cost metrics").
Params: datasourceUid (required), prefix (filter by prefix), search (targeted discovery: server-side regex, only matching metrics are returned), purpose ("performance" | "cost" | "reliability" | "capacity"; pre-filters by intent, composable with prefix and search), label (list label values instead), metadata (boolean: enriched results with type/help/category), compact (boolean: with metadata, returns only name/type/category, ~60% smaller).
Example names: { "datasourceUid": "prom1", "prefix": "openclaw_lens_" }
Example search: { "datasourceUid": "prom1", "search": "steps" }
Example purpose: { "datasourceUid": "prom1", "purpose": "performance", "metadata": true }
Example combined: { "datasourceUid": "prom1", "prefix": "openclaw_ext_", "search": "fitness" }
Example metadata: { "datasourceUid": "prom1", "metadata": true, "prefix": "openclaw_" }
Example compact: { "datasourceUid": "prom1", "metadata": true, "compact": true }
Returns names: { metrics: ["metric1", "metric2", ...] }. Truncated at 200. Returns metadata:

{ metadataSource, categorySummary: { cost: 3, usage: 4, session: 5, ... }, metrics: [{ name, type, help, category?, source? }, ...] }

Use metadata before composing custom dashboards: type tells you counter vs gauge vs histogram, and category groups openclaw_ metrics by function. Search also matches help text. Categories: cost, usage, session, queue, messaging, webhook, tools, agent, custom. categorySummary gives counts per category for a quick overview (omitted when there are no openclaw_ metrics). Purpose maps: performance → session + tools, cost → cost + usage, reliability → webhook + messaging + agent, capacity → queue + session.
metadataSource: "prometheus" when the Prometheus metadata endpoint has data, "synthetic" when the stack is OTLP-only (metadata is synthesized from the known metric registry; histogram sub-metrics are deduplicated, and type/help come from Grafana Lens definitions). On OTLP stacks, the response includes a hint explaining why metadata is synthetic. source: "synthetic" marks individual entries from the registry; source: "custom" marks entries from the custom metrics store.
Returns compact: { metadataSource, categorySummary: {...}, metrics: [{ name, type, category? }, ...] }. Same as metadata but drops help, source, and labelNames; use it in multi-tool chains where you need metric names and types but not full descriptions.
Example label: { "datasourceUid": "prom1", "label": "job" }
Returns label: { label, count, totalCount, values: ["value1", "value2", ...] }. Truncated at 200.
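The purpose pre-filter and categorySummary can be reproduced from the documented purpose map, which helps when reading compact results. This is an illustration only: filterByPurpose and categorySummary are hypothetical helpers, the metric names in any input are placeholders, and the tool applies the purpose filter itself.

```typescript
type Purpose = "performance" | "cost" | "reliability" | "capacity";

// Purpose → category map, as documented above.
const PURPOSE_CATEGORIES: { [P in Purpose]: string[] } = {
  performance: ["session", "tools"],
  cost: ["cost", "usage"],
  reliability: ["webhook", "messaging", "agent"],
  capacity: ["queue", "session"],
};

interface MetricMeta {
  name: string;
  type: string;
  category?: string;
}

// Keep only metrics whose category belongs to the requested purpose.
function filterByPurpose(metrics: MetricMeta[], purpose: Purpose): MetricMeta[] {
  const wanted = PURPOSE_CATEGORIES[purpose];
  return metrics.filter((m) => m.category !== undefined && wanted.indexOf(m.category) !== -1);
}

// Count metrics per category, in the style of categorySummary.
function categorySummary(metrics: MetricMeta[]): { [category: string]: number } {
  const counts: { [category: string]: number } = {};
  for (const m of metrics) {
    if (m.category !== undefined) {
      counts[m.category] = (counts[m.category] || 0) + 1;
    }
  }
  return counts;
}
```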

`grafana_query`

When: The user asks a data question that needs a direct answer, not a dashboard. Also for re-running an existing dashboard panel's query with a different time range.
Params: datasourceUid, expr (PromQL), queryType (instant/range), start (range only, required), end (range only, default "now"), step (range only, optional; auto-calculated from the time range if omitted, targeting ~300 datapoints), dashboardUid (optional; resolve the query from a panel), panelId (optional; use with dashboardUid).
Example instant: { "datasourceUid": "prom1", "expr": "sum(increase(openclaw_lens_cost_by_model_total[1d])) or vector(0)" }
Example range (auto-step): { "datasourceUid": "prom1", "expr": "rate(openclaw_tokens_total[5m])", "queryType": "range", "start": "now-30d" }
Example range (explicit step):

{ "datasourceUid": "prom1", "expr": "rate(openclaw_tokens_total[5m])", "queryType": "range", "start": "now-1h", "end": "now", "step": "60" }

Example panel re-run: { "dashboardUid": "openclaw-command-center", "panelId": 10, "queryType": "range", "start": "now-7d" }
Tip: start/end accept Unix seconds or relative expressions like "now-1h" and "now-7d". For range queries, just set start; end defaults to "now" and step is auto-calculated. Override step only when you need a specific resolution.
Tip (panel re-run): Set dashboardUid + panelId to re-run a panel's query without manually extracting PromQL. The tool auto-resolves expr and datasourceUid from the panel definition, and template variables are replaced with wildcards. You can still override expr or datasourceUid explicitly if needed. Get panel IDs from grafana_get_dashboard.
Returns instant:

{ metrics: [{ metric: {...}, value: "1.23", timestamp: "...", healthContext?: { status, thresholds, description, direction } }], datasourceUid, resultCount, warnings?, hint? }

healthContext is included for well-known openclaw_lens_ gauge metrics and provides an SRE-grade health assessment: status ("healthy"/"warning"/"critical"), thresholds (warning/critical values), description (what the metric means), and direction ("higher_is_worse"/"lower_is_worse"). It is omitted for unknown metrics. Results are capped at 50; when exceeded, the response includes truncated: true, totalResults, and a truncationHint advising you to narrow the query.
Returns range: { series: [{ metric: {...}, values: [{ time, value }...] }], datasourceUid, resultCount, warnings?, hint? }, truncated to 20 points per series and 50 series max. When series are truncated, includes truncated: true, totalSeries, and truncationHint. When step is auto-calculated, includes step: { value: "288s", display: "5m", auto: true }.
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, and templateVarsReplaced alongside the normal query results. If the panel uses a Loki datasource, returns an error directing you to use grafana_query_logs instead.
Returns (warnings): When Prometheus flags a non-fatal issue (e.g., rate() on a gauge), warnings: [{ cause, suggestion, example? }] is included. Example: for rate() on a gauge, cause says "rate() applied to 'metric' which appears to be a gauge", suggestion says "Use delta() or deriv() instead", and example shows the corrected query.
Returns (hint): When the query returns zero results, hint: { cause, suggestion } explains why (the metric may not exist, or label filters may not match) and suggests verifying with grafana_list_metrics.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common PromQL mistakes: unclosed parenthesis, missing range selector, timeout, auth failure, rate on a gauge, etc. Omitted when the error is unrecognized.
Tip (chaining): Both instant and range responses include datasourceUid; pass it directly to grafana_create_alert or other tools without re-calling grafana_explore_datasources. This enables zero-friction query→alert chains.
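The auto-step rule divides the query range by roughly 300 target points. For a 24-hour range that gives 86400 / 300 = 288 seconds, which matches the "288s" step in the example response. A minimal sketch, assuming only "now", "now-<n><unit>" with s/m/h/d units, and Unix-seconds strings as inputs. toUnixSeconds and autoStepSeconds are hypothetical names, and the tool's rounding for the display field is not reproduced.

```typescript
const UNIT_SECONDS: { [unit: string]: number } = { s: 1, m: 60, h: 3600, d: 86400 };

// Resolve "now", "now-<n><unit>", or a Unix-seconds string to Unix seconds.
function toUnixSeconds(expr: string, nowSec: number): number {
  if (expr === "now") return nowSec;
  const rel = /^now-(\d+)([smhd])$/.exec(expr);
  if (rel !== null) return nowSec - Number(rel[1]) * UNIT_SECONDS[rel[2]];
  const abs = Number(expr);
  if (expr.trim() === "" || !isFinite(abs)) {
    throw new Error(`unsupported time expression: ${expr}`);
  }
  return abs;
}

// Step giving about `targetPoints` samples over [start, end], never below 1 s.
function autoStepSeconds(start: string, end: string, nowSec: number, targetPoints = 300): number {
  const rangeSec = toUnixSeconds(end, nowSec) - toUnixSeconds(start, nowSec);
  return Math.max(1, Math.ceil(rangeSec / targetPoints));
}

console.log(autoStepSeconds("now-24h", "now", 1700000000)); // prints 288
```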
grafana_query_logs
When: The user asks about logs or errors, or needs to investigate issues by searching log data. Also for session debugging, OTel log investigation, and re-running existing log panel queries.
Params: datasourceUid, expr (LogQL), queryType (instant/range, default range), start/end (default now-1h/now), step (metric queries only), limit (default 100), direction (backward/forward), lineLimit (max chars per log line, default 500, max 2000), extractFields (boolean, default false; extracts structured OTel attributes into a clean fields object), dashboardUid (optional; resolve the query from a panel), panelId (optional; use with dashboardUid).
Example log search: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |= \"error\"" }
Example with filters: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |~ \"timeout|refused\"", "limit": 50, "direction": "forward" }
Example full stack traces: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |= \"Exception\"", "lineLimit": 2000 }
Example session debugging: { "datasourceUid": "loki1", "expr": "{service_name=\"openclaw\"} | json | component=\"lifecycle\"", "extractFields": true }
Example metric query: { "datasourceUid": "loki1", "expr": "rate({job=\"api\"}[5m])", "queryType": "range", "start": "now-6h", "end": "now", "step": "60" }
Example panel re-run: { "dashboardUid": "openclaw-command-center", "panelId": 18, "start": "now-24h", "extractFields": true }
Returns streams: { entries: [{ labels: {...}, timestamp: "...", line: "..." }], datasourceUid, totalEntries, truncated }, capped at 100 entries, with lines capped at 500 chars (set lineLimit: 2000 for full stack traces).
Returns streams (extractFields): { entries: [{ labels: {...cleaned...}, timestamp: "...", line: "...", fields: { component, event_name, session_id, trace_id, model, duration_s, ... } }], datasourceUid }. Infrastructure noise labels are removed, the openclaw_ prefix is stripped from field keys, and numeric values are auto-converted. JSON log bodies are also parsed if present.
Returns streams (traceCorrelation): When extractFields: true and entries contain trace_id, includes traceCorrelation: { traceIds: [...], tool: "grafana_query_traces", tip }, with up to 5 unique trace IDs ready for grafana_query_traces with queryType: "get".
Returns metric: Same shape as grafana_query range/instant results (matrix capped at 50 series, vector capped at 50 results; includes datasourceUid, truncated, totalSeries/totalResults, and truncationHint when exceeded).
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, and templateVarsReplaced alongside the normal results. If the panel uses a Prometheus datasource, returns an error directing you to use grafana_query instead.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common LogQL mistakes: bare text without a stream selector, empty {}, unclosed braces, missing label matchers, auth failure, timeout. Omitted when the error is unrecognized.
Tip: In LogQL, {label="value"} selects streams, |= is a substring filter, |~ is a regex filter, and != excludes. Metric wrappers: rate(), count_over_time(), bytes_rate(). Use extractFields: true when investigating OTel/lifecycle logs; it surfaces trace_id, session_id, event_name, model, and other attributes as first-class fields instead of leaving them buried in raw labels.
Tip (panel re-run): Same as grafana_query: set dashboardUid + panelId to auto-resolve the LogQL and datasource. The tool routes Prometheus panels to grafana_query with a helpful error.
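The extractFields clean-up and the traceCorrelation list can be sketched as plain object transforms. Assumptions are labeled: the NOISE_LABELS entries and the choice to keep ID fields as strings are illustrative guesses, not the tool's actual lists, and cleanFields and traceIdsFrom are hypothetical names.

```typescript
// Assumed examples of infrastructure labels; the real list is not documented here.
const NOISE_LABELS = ["detected_level", "service_namespace", "telemetry_sdk_language"];
// Keep identifiers as strings even when they look numeric (assumption).
const STRING_KEYS = ["trace_id", "span_id", "session_id"];
const PREFIX = "openclaw_";

type FieldValue = string | number;

// Drop noise labels, strip the openclaw_ prefix, and convert numeric strings.
function cleanFields(raw: { [key: string]: string }): { [key: string]: FieldValue } {
  const out: { [key: string]: FieldValue } = {};
  for (const key of Object.keys(raw)) {
    if (NOISE_LABELS.indexOf(key) !== -1) continue;
    const name = key.indexOf(PREFIX) === 0 ? key.slice(PREFIX.length) : key;
    const value = raw[key];
    const asNumber = Number(value);
    const numeric = value.trim() !== "" && isFinite(asNumber);
    out[name] = numeric && STRING_KEYS.indexOf(name) === -1 ? asNumber : value;
  }
  return out;
}

// Collect up to `max` unique trace IDs, in first-seen order, for grafana_query_traces.
function traceIdsFrom(entries: Array<{ fields: { [key: string]: FieldValue } }>, max = 5): string[] {
  const ids: string[] = [];
  for (const entry of entries) {
    const id = entry.fields["trace_id"];
    if (typeof id === "string" && ids.indexOf(id) === -1 && ids.length < max) {
      ids.push(id);
    }
  }
  return ids;
}
```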
grafana_query_traces
当...时: 用户 asks 关于 traces, distributed tracing, slow spans, 会话 trace hierarchies, 或 needs 到 debug 请求 flows 穿过 services. Params: datasourceUid, 查询 (TraceQL 表达式或 trace ID), queryType (搜索/获取, 默认 搜索), 开始/end (默认 现在-1h/现在), limit (默认 20, max 50), minDuration/maxDuration (e.g., "1s", "10s"), dashboardUid (可选 — resolve 查询从 panel), panelId (可选 — 使用带有 dashboardUid). 示例搜索: { "datasourceUid": "tempo1", "查询": "{ resource.服务.name = \"openclaw\" }" } 示例搜索 slow: { "datasourceUid": "tempo1", "查询": "{ resource.服务.name = \"openclaw\" }", "minDuration": "5s" } 示例搜索带有时间: { "datasourceUid": "tempo1", "查询": "{ span.gen_ai.system = \"anthropic\" }", "开始": "现在-24h", "limit": 50 } 示例获取: { "datasourceUid": "tempo1", "查询": "abc123def456789...", "queryType": "获取" } 示例 panel re-run: { "dashboardUid": "openclaw-会话-explorer", "panelId": 12, "开始": "现在-24h" } Returns 搜索: { traces: [{ traceId, rootServiceName, rootTraceName, startTime, durationMs, spanCount? }], datasourceUid, totalTraces, truncated?, correlationHint? } — capped 在 50 traces. 当...时 exceeded includes truncated: 真 和 truncationHint. 当...时 traces found, includes correlationHint: { logQuery, tool, tip } 带有就绪-到-使用 LogQL 表达式对于 grafana_query_logs. Returns 获取: { traceId, spans: [{ traceId, spanId, parentSpanId?, operationName, serviceName, startTime, durationMs, status, kind?, attributes: {...} }], datasourceUid, totalSpans, truncated? } — flattened OTLP spans 带有 resolved attributes (字符串/数字/布尔值). Capped 在 200 spans. Sorted 由开始时间 (earliest 第一个). Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, templateVarsReplaced alongside normal results. 如果 panel uses Prometheus 或 Loki datasource, returns 错误 directing 您到使用正确 tool. Returns (错误带有 guidance): 在...上查询 failure, includes guidance: { cause, suggestion, 示例? } alongside raw 错误. Pattern-matched 对于 common TraceQL mistakes: syntax errors, 无效 attributes, auth failure, 超时, 不-found, 无效 trace ID. Omitted 当...时错误 unrecognized. 
Returns (no results): when a search returns zero traces, includes hint: { cause, suggestion } suggesting you broaden the query or check the datasource.
Tip: in TraceQL, { } matches all traces; use resource.service.name for service filtering, span.http.status_code for HTTP spans, name for the operation name, duration for span duration, and status for error/ok filtering. Use minDuration/maxDuration to find performance outliers.
Trace-to-log: search and get results include correlationHint.logQuery; pass it directly to grafana_query_logs to find correlated logs. Log-to-trace: grafana_query_logs results (with extractFields: true) include traceCorrelation.traceIds; pass any ID to grafana_query_traces with queryType: "get".
Tip (panel re-run): same as grafana_query; set dashboardUid + panelId to auto-resolve the TraceQL and datasource. The tool routes Prometheus/Loki panels to the correct tool with a helpful error.
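The get-result shaping described above (earliest span first, capped at 200) can be sketched in Python. This is an illustrative assumption, not the tool's code: `shape_trace_response` and `root_spans` are hypothetical helpers that operate on spans already flattened into the documented fields (`startTime`, `parentSpanId`).

```python
from typing import Any

MAX_SPANS = 200  # span cap documented for queryType "get"

# Hypothetical sketch: shape an already-flattened list of OTLP spans the way the
# documented "get" result describes (earliest first, capped, with a total count).
def shape_trace_response(trace_id: str, spans: list[dict[str, Any]]) -> dict[str, Any]:
    ordered = sorted(spans, key=lambda s: s["startTime"])
    result: dict[str, Any] = {
        "traceId": trace_id,
        "spans": ordered[:MAX_SPANS],
        "totalSpans": len(ordered),
    }
    if len(ordered) > MAX_SPANS:
        result["truncated"] = True  # only set when spans were dropped
    return result

# Root spans (no parentSpanId) are the entry points of the session hierarchy.
def root_spans(spans: list[dict[str, Any]]) -> list[dict[str, Any]]:
    return [s for s in spans if not s.get("parentSpanId")]
```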
grafana_create_dashboard
When: the user wants a persistent dashboard for ongoing monitoring. Params: template or dashboard (custom JSON); one is required. Optional: title (overrides the template default), folderUid (target folder), overwrite (default true). Returns: { uid, url, status, message, suggestedNext?: [{ template, reason }], validation?: DashboardValidation }. For template-based dashboards, suggestedNext lists complementary templates to deploy next. For custom JSON dashboards, validation dry-runs each panel's PromQL and reports per-panel health; check validation.panelsError for broken queries.
Choosing the right template (3-tier SRE drill-down hierarchy):
Tier 1 → System: start here for overall health. Tier 2 → Session: click a session from Tier 1 to investigate. Tier 3 → Deep Dive: cost, tool, or SRE details.
Template | Tier | Domain | Variables | Use When
llm-command-center | Tier 1 | System overview | $prometheus, $loki, $tempo, $provider, $model, $channel | Golden signals, session table with click-to-drill-down, cost, cache, live feeds
session-explorer | Tier 2 | Session debug | $prometheus, $loki, $tempo, $session (textbox) | Per-session trace hierarchy, LLM calls, tool calls, conversation flow
cost-intelligence | Tier 3a | Cost analysis | $prometheus, $loki, $provider, $model | Spending trends, model attribution, cache savings, per-session cost table
tool-performance | Tier 3b | Tool analytics | $prometheus, $loki, $tempo, $tool | Tool leaderboard, latency ranking, error rates, tool traces
sre-operations | Tier 3c | SRE operations | $prometheus, $loki | Queue health, webhooks, stuck sessions, tool loops
genai-observability | - | OTel gen_ai standard | $prometheus, $loki, $tempo, $model, $provider | Industry-standard AI monitoring: token analytics, LLM performance, traces, logs, cache efficiency. Works with any gen_ai data.
node-exporter | - | System/DevOps | $datasource, $instance | Server CPU, memory, disk, network
http-service | - | Web/DevOps | $datasource, $job | HTTP request rate, errors, latency (RED signals)
metric-explorer | - | Any domain | $datasource, $metric | Deep-dive into any single metric from a dropdown
multi-kpi | - | Any domain | $datasource, $metric1..$metric4 | 4-metric KPI overview (business, fitness, finance, IoT)
weekly-review | - | Any domain | $datasource, $metric1, $metric2 | Weekly overview of 2 external metrics with trends, plus a table of all openclaw_ext_ metrics
All AI templates have Loki log-to-trace correlation via Tempo + stable UIDs for cross-dashboard navigation.
Example AI health: { "template": "llm-command-center", "title": "My AI Dashboard" }
Example session debug: { "template": "session-explorer", "title": "Session Debug" }
Example cost analysis: { "template": "cost-intelligence", "title": "My AI Costs" }
Example tool analytics: { "template": "tool-performance", "title": "Tool Health" }
Example SRE ops: { "template": "sre-operations", "title": "SRE Health" }
Example GenAI observability: { "template": "genai-observability", "title": "GenAI Observability" }
Example system: { "template": "node-exporter", "title": "Server Health" }
Example generic: { "template": "metric-explorer", "title": "Explore My Data" }
Example multi-KPI: { "template": "multi-kpi", "title": "Business KPIs" }
Example weekly review: { "template": "weekly-review", "title": "My Weekly Review" }
Example custom with validation: { "dashboard": { "title": "Model Comparison", "panels": [{ "id": 1, "title": "Cost by Model", "type": "timeseries", "targets": [{ "refId": "A", "expr": "sum by (model) (rate(openclaw_lens_cost_by_token_type[1h]))", "datasource": { "uid": "prometheus" } }] }] } }
Custom dashboard validation (returned only for the dashboard param, not for templates):
validation: { panelsTotal: 3, panelsValid: 1, panelsNoData: 1, panelsError: 1, panelsSkipped: 0, details: [{ panelId: 1, title: "Cost by Model", status: "ok", queries: [{ refId: "A", expr: "...", valid: true, sampleValue: 0.42 }] }, { panelId: 2, title: "Latency", status: "nodata" }, { panelId: 3, title: "Bad Query", status: "error", error: "parse error at char 5" }] }
Panel statuses: ok (query returned data), nodata (valid query, no results; the metric may not exist yet), error (PromQL syntax error or datasource issue), skipped (no datasource UID found). The dashboard is always created regardless; validation is informational.
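Custom-dashboard validation can be pictured as a per-panel status followed by a tally. In this hypothetical Python sketch the status precedence (skipped, then error, then ok, then nodata) is an assumption inferred from the status definitions above, and the helper names are illustrative only.

```python
from collections import Counter
from typing import Optional

# Hypothetical per-panel rule (an assumption, not the tool's code): no datasource
# UID -> skipped; any query error -> error; any sample returned -> ok; else nodata.
def panel_status(datasource_uid: Optional[str], queries: list[dict]) -> str:
    if not datasource_uid:
        return "skipped"
    if any(q.get("error") for q in queries):
        return "error"
    if any(q.get("sampleValue") is not None for q in queries):
        return "ok"
    return "nodata"

# Tally per-panel statuses into the summary fields used by the validation object.
def summarize_validation(details: list[dict]) -> dict:
    counts = Counter(d["status"] for d in details)
    return {
        "panelsTotal": len(details),
        "panelsValid": counts["ok"],
        "panelsNoData": counts["nodata"],
        "panelsError": counts["error"],
        "panelsSkipped": counts["skipped"],
        "details": details,
    }
```

Fed the three panel statuses from the example (ok, nodata, error), the tally reproduces 3 / 1 / 1 / 1 / 0.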
grafana_update_dashboard
When: the user wants to add a panel, remove a panel, change a query, update dashboard settings, or delete a dashboard. Params: uid (required), operation (required: add_panel, remove_panel, update_panel, update_metadata, delete). add_panel params: panel (object with title, type, targets); the new panel is auto-laid-out below existing panels. remove_panel / update_panel params: panelId (preferred) or panelTitle (case-insensitive substring fallback), plus updates (object) for update_panel. update_metadata params: title, description, tags, time (e.g., { "from": "now-7d", "to": "now" }), refresh (e.g., "1m"). delete params: none besides uid; permanently removes the dashboard. Always confirm with the user first.
Example add: { "uid": "abc123", "operation": "add_panel", "panel": { "title": "Error Rate", "type": "timeseries", "targets": [{ "refId": "A", "expr": "rate(errors_total[5m])", "datasource": { "uid": "prom1" } }] } }
Example add (no datasource): { "uid": "abc123", "operation": "add_panel", "panel": { "title": "Latency", "type": "timeseries", "targets": [{ "refId": "A", "expr": "histogram_quantile(0.99, rate(http_duration_bucket[5m]))" }] } } (validation is skipped if no datasource UID is found; the panel is still saved)
Example remove: { "uid": "abc123", "operation": "remove_panel", "panelId": 3 }
Example update panel: { "uid": "abc123", "operation": "update_panel", "panelId": 1, "updates": { "title": "New Title", "targets": [{ "refId": "A", "expr": "new_query" }] } }
Example update metadata: { "uid": "abc123", "operation": "update_metadata", "title": "My Dashboard v2", "time": { "from": "now-7d", "to": "now" }, "refresh": "5m" }
Example delete: { "uid": "abc123", "operation": "delete" }
Returns (update): { status: "updated", uid, url, version, operation, panelCount, affectedPanel?: { id, title }, changedFields?: [...], queryValidation?: { validated, results, datasourceUid?, skippedReason? } }.
Returns queryValidation: for add_panel and update_panel (when targets change), PromQL queries are dry-run against Grafana. Each result: { refId, expr, valid: boolean, error?: string, sampleValue?: number }. The panel is always saved; validation is informational. If valid: false, check the error field for PromQL syntax issues.
If skippedReason is set, no datasource UID was found; include datasource: { uid: "..." } on targets to enable validation. Returns (delete): { status: "deleted", uid, title, message }. Tip: targets in update_panel are replaced entirely, so include all targets, not just the changed ones. Include datasource.uid on targets to get query validation feedback.
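Two behaviours of grafana_update_dashboard, panel lookup and auto-layout, can be sketched as follows. This is a hypothetical Python illustration under stated assumptions: an ID that is supplied but not found does not fall back to the title, and new panels get a 12x8 size in the first column. Grafana's own `gridPos` (x, y, w, h on a 24-column grid) is the only real structure used.

```python
from typing import Optional

# Hypothetical sketch of panel lookup for remove_panel/update_panel:
# panelId wins; otherwise fall back to a case-insensitive title substring match.
def find_panel(panels: list[dict], panel_id: Optional[int] = None,
               panel_title: Optional[str] = None) -> Optional[dict]:
    if panel_id is not None:
        return next((p for p in panels if p.get("id") == panel_id), None)
    if panel_title:
        needle = panel_title.lower()
        return next((p for p in panels if needle in p.get("title", "").lower()), None)
    return None

# Auto-layout for add_panel: start a new row below the lowest existing panel.
# Grafana's gridPos uses a 24-column grid; the 12x8 default size is an assumption.
def next_grid_pos(panels: list[dict], w: int = 12, h: int = 8) -> dict:
    bottom = max((p["gridPos"]["y"] + p["gridPos"]["h"] for p in panels if "gridPos" in p),
                 default=0)
    return {"x": 0, "y": bottom, "w": w, "h": h}

# Hypothetical ID assignment: one past the highest existing panel ID.
def next_panel_id(panels: list[dict]) -> int:
    return max((p.get("id", 0) for p in panels), default=0) + 1
```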
grafana_get_dashboard
When: you need to inspect a dashboard's panels: find panel IDs for sharing, verify structure, scan multiple dashboards for an overview, or audit which panels are returning data. Params: uid (required). Optional: compact (boolean, default false) returns panel titles and types only, with no queries or metadata (~70% smaller); audit (boolean, default false) dry-runs each panel's query and adds a health status.
Example (full): { "uid": "abc123" }
Example (compact overview): { "uid": "abc123", "compact": true }
Example (audit): { "uid": "abc123", "audit": true }
Returns (full): { uid, title, description?, url, tags, time?, refresh?, panelCount, panels: [{ id, title, type, queries: [{ refId, expr }] }], folderUid, created?, updated? }.
Returns (compact): { uid, title, url, tags, panelCount, panels: [{ id, title, type }] }.
Returns (audit): same as full, plus each panel gets health: { status: "ok"|"nodata"|"error"|"skipped", error?, sampleValue? } and the response includes auditSummary: { ok, nodata, error, skipped }. Resolves template-variable datasources ($prometheus, $loki) and replaces template variables in expressions with wildcards.
Tip: use audit: true when the user asks "which panels are broken?" or "audit my dashboard"; it replaces N separate grafana_query calls with one tool call. Use compact: true for lightweight overview scans. Omit both when you need query details (before updating or sharing).
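The audit mode's wildcard substitution for template variables might look like the Python sketch below. It is an assumption about the behaviour, not the tool's code: it rewrites equality and regex matchers that reference a variable into `=~".*"`, and it replaces Grafana's `$__interval` / `$__rate_interval` macros with a fixed `5m`, which the source does not specify.

```python
import re

# label="$var", label=~"$var", label="${var}" or "${var:format}" -> label=~".*"
_MATCHER_VAR = re.compile(r'(\w+)\s*=~?\s*"\$\{?\w+(?::\w+)?\}?"')
# $__interval / $__rate_interval; substituting a fixed 5m window is an assumption.
_INTERVAL_VAR = re.compile(r"\$__(?:rate_)?interval\b")

def wildcard_template_vars(expr: str) -> str:
    """Hypothetical: make a templated panel expression runnable without a UI selection."""
    expr = _MATCHER_VAR.sub(r'\1=~".*"', expr)
    return _INTERVAL_VAR.sub("5m", expr)
```

Expressions without template variables, such as `up{job="api"}`, pass through unchanged.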
grafana_search
When: the user mentions a dashboard by name, before creating one (check for duplicates), or for reporting/audit workflows. Params: query (required). Optional: tags (array; filter by tags), starred (boolean; starred only), sort ("alpha-asc"/"alpha-desc"), limit (number, default 100), enrich (boolean; adds updatedAt + panelCount per result, default false).
Example: { "query": "cost" }
Example with tags: { "query": "", "tags": ["production"] }
Example starred: { "query": "", "starred": true, "limit": 10 }
Example enriched: { "query": "", "enrich": true }
Returns: { count, enriched, dashboards: [{ uid, title, url, tags, folderTitle?, folderUid?, updatedAt?, panelCount? }] }. folderTitle/folderUid are always included when the dashboard is in a folder. updatedAt (ISO 8601) and panelCount are present only when enrich: true; this enables staleness detection and reporting without per-dashboard get_dashboard calls.
Tip: use enrich: true for reporting workflows ("which dashboards are stale?", "give me a summary of all dashboards"). Skip enrichment for simple lookups. After finding a dashboard, use grafana_get_dashboard to inspect panels, grafana_share_dashboard to render a chart, or grafana_update_dashboard to modify it.
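The staleness detection that enrich: true enables can be done client-side over the search result. A hypothetical Python sketch; the 30-day threshold and the helper name `stale_dashboards` are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical staleness report over grafana_search results fetched with
# enrich: true. The 30-day threshold is an assumption.
def stale_dashboards(dashboards: list[dict], max_age_days: int = 30,
                     now: Optional[datetime] = None) -> list[str]:
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for d in dashboards:
        updated = d.get("updatedAt")
        if updated is None:  # updatedAt is only present when enrich: true
            continue
        ts = datetime.fromisoformat(updated.replace("Z", "+00:00"))
        if ts.tzinfo is None:  # treat naive timestamps as UTC
            ts = ts.replace(tzinfo=timezone.utc)
        if ts < cutoff:
            stale.append(d["title"])
    return stale
```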
grafana_share_dashboard
When: the user says "show me" or "send me" a chart/dashboard. Params: dashboardUid, panelId (required). Optional: from (default "now-6h"), to (default "now"), width (default 1000), height (default 500), theme ("light"/"dark", default "dark").
Example: { "dashboardUid": "abc123", "panelId": 2, "from": "now-6h", "to": "now" }
Returns: an image rendered inline (tier 1), a snapshot URL (tier 2), or a deep link (tier 3); it always delivers something. Includes deliveryTier ("image" | "snapshot" | "link"), rendererAvailable (boolean; false when the Image Renderer plugin is missing), renderFailureReason (why image rendering failed), and remediation (how to fix it). Tier 3 also includes snapshotFailureReason.
Tip: use grafana_get_dashboard first to find panel IDs. If rendererAvailable is false, tell the user to install the grafana-image-renderer plugin.
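The three-tier delivery guarantee is essentially a fallback chain. This hypothetical Python sketch passes the rendering and snapshot steps in as callbacks (assumptions standing in for the real Grafana calls) and records the documented failure fields as each tier falls through; the `image`, `snapshotUrl`, and `url` payload keys are also assumptions.

```python
from typing import Callable

class RendererMissing(Exception):
    """Hypothetical error a renderer callback raises when the Image Renderer plugin is absent."""

# Hypothetical sketch of the image -> snapshot -> link fallback. The other keys
# mirror the documented response fields.
def deliver_panel(render_image: Callable[[], bytes],
                  create_snapshot: Callable[[], str],
                  deep_link: str) -> dict:
    result: dict = {"rendererAvailable": True}
    try:
        return {**result, "deliveryTier": "image", "image": render_image()}
    except RendererMissing as exc:
        result.update(rendererAvailable=False, renderFailureReason=str(exc),
                      remediation="Install the grafana-image-renderer plugin.")
    except Exception as exc:  # renderer present but rendering failed
        result["renderFailureReason"] = str(exc)
    try:
        return {**result, "deliveryTier": "snapshot", "snapshotUrl": create_snapshot()}
    except Exception as exc:
        result["snapshotFailureReason"] = str(exc)
    # Tier 3 needs no server-side work, so something is always delivered.
    return {**result, "deliveryTier": "link", "url": deep_link}
```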
grafana_create_alert
When: the user wants notifications when a metric crosses a threshold. Params: title, datasourceUid, expr (PromQL), threshold (all required). Optional: evaluation ("instant"/"rate"/"increase", default "instant"), evaluationWindow (default "5m", used with rate/increase), condition (gt/lt/gte/lte, default gt), for (duration, default 5m), folderUid, labels (e.g., { "severity": "warning" }), annotations (e.g., { "summary": "Cost too high" }), noDataState (NoData/Alerting/OK, default NoData).
IMPORTANT: for counter metrics (_total), always use evaluation: "rate" (per-second rate) or evaluation: "increase" (total change over the window). Raw counter values are cumulative, so they keep growing and will breach almost any threshold. Use "instant" (the default) only for gauges.
Example gauge alert: { "title": "High Cost Alert", "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "threshold": 5, "condition": "gt" }
Example rate alert: { "title": "High Error Rate", "datasourceUid": "prom1", "expr": "openclaw_lens_webhook_error_total", "threshold": 0.1, "evaluation": "rate" }
Example increase alert: { "title": "Token Burst", "datasourceUid": "prom1", "expr": "openclaw_lens_tokens_total", "threshold": 10000, "evaluation": "increase", "evaluationWindow": "1h" }
Returns: { uid, title, status: "created", datasourceUid, url, evaluation?: { mode, window, evaluatedExpr }, metricValidation: { valid, error?, sampleValue? }, message }. datasourceUid echoes back which datasource the rule targets (verify correctness). metricValidation dry-runs the expression before creation: valid: true + sampleValue confirms data exists; valid: false + error warns of typos or missing metrics. The alert is always created regardless (the metric may not have data yet). When evaluation is "rate" or "increase", validation runs the wrapped expression. Note: auto-creates a "Grafana Lens Alerts" folder if no folderUid is specified.
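How the evaluation mode changes the alerted expression can be sketched in Python. This is a hypothetical illustration, consistent with the note that validation runs the wrapped expression, but the exact wrapping the tool performs is an assumption; it also assumes expr is a plain metric selector.

```python
COUNTER_SUFFIX = "_total"

# Hypothetical sketch of how the evaluation mode could wrap the expression.
# Assumes expr is a plain metric selector; wrapping an aggregation such as
# sum(x) in rate(...[5m]) would need a PromQL subquery instead.
def evaluated_expr(expr: str, evaluation: str = "instant", window: str = "5m") -> str:
    if evaluation == "instant":
        return expr
    if evaluation in ("rate", "increase"):
        return f"{evaluation}({expr}[{window}])"
    raise ValueError(f"unknown evaluation mode: {evaluation!r}")

# Flags the mistake the IMPORTANT note warns about: thresholding a raw counter.
def needs_rate_or_increase(expr: str, evaluation: str) -> bool:
    return evaluation == "instant" and expr.strip().endswith(COUNTER_SUFFIX)
```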
grafana_annotate
When: the user deploys, changes config, or wants to mark an event for correlation. Params: action ("create", the default, or "list"). create params: text (required), tags, dashboardUid, panelId, time (epoch ms or relative like "now-2h", default now), timeEnd (epoch ms or relative). list params: from, to (epoch ms or relative like "now-7d", "now-24h", "now"), tags, limit (default 20). Time formats: all time params accept epoch ms (e.g., 1700000000000) or Grafana-style relative strings ("now", "now-1h", "now-7d", "now-30m"). Prefer relative strings; they're simpler and avoid arithmetic errors.
Example create: { "text": "Deployed v2.1.0", "tags": ["deploy", "production"] }
Example create (past): { "text": "Incident started", "time": "now-2h", "timeEnd": "now-30m", "tags": ["incident"] }
Example list recent: { "action": "list", "from": "now-7d", "to": "now", "tags": ["deploy"] }
Example list: { "action": "list", "tags": ["deploy"], "limit": 10 }
Returns (create): { status: "created", id, message, time, comparisonHint: { beforeWindow: { from, to }, afterWindow: { from, to }, suggestion } }. comparisonHint provides ready-to-use ISO 8601 time ranges (30-minute windows) for before/after comparison via grafana_query, so no manual time math is needed. For region annotations (with timeEnd), afterWindow starts at timeEnd.
Returns (list): { annotations: [{ id, text, tags, time, timeEnd?, dashboardUID?, panelId? }] }.
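The comparisonHint windows, together with a subset of the relative time syntax, can be sketched in Python. This is a hypothetical illustration: only `now` and `now-<n><unit>` are parsed, and the 30-minute windows and the timeEnd rule follow the description above; the helper names are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Union

WINDOW = timedelta(minutes=30)  # documented window size for comparisonHint
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}

# Resolve epoch ms or a Grafana-style "now" / "now-<n><unit>" string to UTC.
# Only this subset of the relative syntax is handled (an assumption).
def resolve_time(value: Union[int, str], now: datetime) -> datetime:
    if isinstance(value, int):
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    m = re.fullmatch(r"now(?:-(\d+)([smhdw]))?", value)
    if m is None:
        raise ValueError(f"unsupported time: {value!r}")
    if m.group(1) is None:
        return now
    return now - timedelta(**{_UNITS[m.group(2)]: int(m.group(1))})

def _iso(dt: datetime) -> str:
    return dt.isoformat().replace("+00:00", "Z")

# The before window ends at the annotation time; the after window starts at
# timeEnd for region annotations, otherwise at the annotation time itself.
def comparison_hint(at: datetime, at_end: Optional[datetime] = None) -> dict:
    after_start = at_end or at
    return {
        "beforeWindow": {"from": _iso(at - WINDOW), "to": _iso(at)},
        "afterWindow": {"from": _iso(after_start), "to": _iso(after_start + WINDOW)},
    }
```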
grafana_check_alerts
When: prompt context shows "GRAFANA ALERTS", or you need to manage alert rules (list/delete), set up the alert webhook, silence alerts during an investigation, or acknowledge an investigated alert. Params: action ("list" (default), "acknowledge", "list_rules", "delete_rule", "silence", "unsilence", "setup"). list params: none; returns all pending (unacknowledged) alerts, with instances capped at 5 per alert. acknowledge params: alertId (required); marks the alert as investigated. list_rules params: compact (boolean, default false; returns only uid/title/state/condition). Full mode returns all configured alert rules from Grafana with UID, title, condition (PromQL), folder, labels, annotations, live evaluation state (normal/firing/pending/nodata/error), health, and lastEvaluation; one call gives the complete alert health picture. delete_rule params: ruleUid (required); permanently deletes the alert rule. Get UIDs from list_rules. silence params: matchers (required; an array of { name, value, isRegex? } taken from the alert's commonLabels), duration (default "2h"), comment (optional). unsilence params: silenceId (required); removes the silence so alerts resume notifying. setup params: webhookUrl (optional, auto-detected); creates the webhook contact point and notification policy routing in Grafana.
Example list: {}
Example acknowledge: { "action": "acknowledge", "alertId": "alert-1" }
Example list rules: { "action": "list_rules" }
Example list rules (compact): { "action": "list_rules", "compact": true }
Example delete rule: { "action": "delete_rule", "ruleUid": "abc123-def456" }
Example silence: { "action": "silence", "matchers": [{ "name": "alertname", "value": "HighCost" }], "duration": "2h", "comment": "Investigating cost spike" }
Example unsilence: { "action": "unsilence", "silenceId": "silence-uuid-123" }
Example setup: { "action": "setup" }
Returns (list): { status: "success", alertCount, alerts: [{ id, status, title, message, receivedAt, commonLabels, totalInstances, truncated?, suggestedInvestigation?: { datasourceUid, condition, tool, queryLanguage, hint }, instances: [{ status, labels, annotations, startsAt, values }] }] }.
suggestedInvestigation is auto-enriched by matching the alert to its rule; it provides the PromQL/LogQL expression, datasource, and tool to use for immediate investigation (eliminating the need for separate list_rules + explore_datasources calls).
Returns (acknowledge): { status: "acknowledged", alertId }.
Returns (list_rules): { status: "success", ruleCount, rules: [{ uid, title, folder, ruleGroup, state, health, lastEvaluation, for, labels, annotations, condition, updated }] }. state is the live evaluation state: "normal" (not firing), "firing", "pending" (within the for duration), "nodata", or "error"; it falls back to "unknown" if the eval state API is unavailable. health is "ok", "nodata", "error", or "unknown". condition is the PromQL expression extracted from the rule's data queries.
Returns (list_rules, compact): { status: "success", ruleCount, rules: [{ uid, title, state, condition }] }: minimal fields for multi-tool chains; use it when you need a quick overview of all rules without details.
Returns (delete_rule): { status: "deleted", ruleUid, message }.
Returns (silence): { status: "silenced", silenceId, duration, matchers, message }.
Returns (unsilence): { status: "unsilenced", silenceId, message }.
Returns (setup): { status: "created", contactPointUid, webhookUrl } or { status: "already_exists", contactPointUid }.
Note: setup is idempotent, so it is safe to call multiple times. Only alerts with the managed_by=openclaw label are routed to the webhook (the label is auto-added by grafana_create_alert). Use list_rules → delete_rule for full alert lifecycle management (create via grafana_create_alert, list/delete via grafana_check_alerts).
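Building silence matchers from an alert's commonLabels can be sketched as below. This hypothetical Python helper produces an Alertmanager-style silence body (matchers, startsAt, endsAt, createdBy, comment); which labels to match on and the createdBy value are assumptions, and the tool's actual request may differ.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(text: str) -> timedelta:
    """Parse simple durations such as "30m" or "2h"."""
    m = re.fullmatch(r"(\d+)([smhd])", text)
    if m is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=int(m.group(1)) * _SECONDS[m.group(2)])

def _iso(dt: datetime) -> str:
    return dt.isoformat().replace("+00:00", "Z")

# Hypothetical sketch: turn an alert's commonLabels into a silence body.
# Matching on alertname by default and createdBy="openclaw" are assumptions.
def build_silence(common_labels: dict, keys: Iterable[str] = ("alertname",),
                  duration: str = "2h", comment: str = "",
                  now: Optional[datetime] = None) -> dict:
    now = now or datetime.now(timezone.utc)
    matchers = [{"name": k, "value": common_labels[k], "isRegex": False}
                for k in keys if k in common_labels]
    if not matchers:  # never build a silence that is not scoped to this alert
        raise ValueError("refusing to build a silence with no matchers")
    return {
        "matchers": matchers,
        "startsAt": _iso(now),
        "endsAt": _iso(now + parse_duration(duration)),
        "createdBy": "openclaw",
        "comment": comment,
    }
```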
grafana_push_metrics
When: the user wants to track custom data (calendar events, git commits, fitness stats, financial data) in Grafana. Params: action ("push" (default), "register", "list", "delete"). push params: metrics (required array); each item: { name, value, labels?, type?, help?, timestamp? }. Names automatically get the openclaw_ext_ prefix. timestamp is an optional ISO 8601 value for historical data (gauges only). register params: name (required), type ("gauge"/"counter", default "gauge"), help, labelNames (array), ttlDays. list params: none; returns all custom metric definitions. delete params: name (required); removes the custom metric.
Example push: { "metrics": [{ "name": "steps_today", "value": 8000 }, { "name": "meetings", "value": 3, "labels": { "type": "standup" } }] }
Example backfill: { "metrics": [{ "name": "steps", "value": 8000, "timestamp": "2025-01-15" }, { "name": "steps", "value": 10500, "timestamp": "2025-01-16" }] }
Example mixed: { "metrics": [{ "name": "steps", "value": 9000, "timestamp": "2025-01-17" }, { "name": "heart_rate", "value": 72 }] }
Example register: { "action": "register", "name": "weight_kg", "type": "gauge", "help": "Body weight", "labelNames": ["person"], "ttlDays": 90 }
Example list: { "action": "list" }
Example delete: { "action": "delete", "name": "old_metric" }
Returns (push): { status: "ok", accepted: 2, queryNames: { "openclaw_ext_steps": "openclaw_ext_steps", "openclaw_ext_events": "openclaw_ext_events_total" }, suggestedWorkflow: [{ tool, action, example }], message: "..." }. suggestedWorkflow contains concrete next-step examples using the actual pushed metric names: verify (grafana_query), visualize (grafana_create_dashboard with the metric-explorer template), and alert (grafana_create_alert, single-metric only). Partial success is supported. Timestamped and real-time points in the same batch are both accepted.
Returns (register): { status: "registered", metric: { name, type, help, labelNames, ttlMs }, queryName: "openclaw_ext_events_total", suggestedWorkflow: [{ tool, action, example }] }. suggestedWorkflow shows how to push data to and query the registered metric (with rate() wrapping for counters).
Returns (list): { count, metrics: [{ name, type, queryName, help, labelNames, createdAt, updatedAt }] }.
Returns (delete): { status: "deleted", name }.
Note: push auto-registers unknown metrics. The response includes queryNames with the exact PromQL names and suggestedWorkflow with concrete next steps; follow suggestedWorkflow to complete the push → visualize pipeline. Timestamped pushes are gauge-only; counters with timestamps are rejected. See external-data.md for naming conventions and backfill patterns.
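The naming rules behind queryNames can be illustrated with a small Python sketch. It is hypothetical and only restates the documented rules (openclaw_ext_ prefix, _total for counters, gauge-only timestamps); in practice, read queryNames from the push response rather than recomputing them.

```python
from typing import Optional

PREFIX = "openclaw_ext_"

def stored_name(name: str) -> str:
    """Prepend the openclaw_ext_ prefix unless it is already present."""
    return name if name.startswith(PREFIX) else PREFIX + name

def query_name(name: str, metric_type: str = "gauge") -> str:
    """PromQL name: counters gain a _total suffix, gauges keep the stored name."""
    base = stored_name(name)
    if metric_type == "counter" and not base.endswith("_total"):
        return base + "_total"
    return base

def point_error(point: dict, metric_type: str) -> Optional[str]:
    """Timestamped (backfill) points are accepted only for gauges."""
    if point.get("timestamp") is not None and metric_type == "counter":
        return "timestamped pushes are gauge-only"
    return None
```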
grafana_explain_metric
When: the user asks "What does this metric mean?", "Why did it spike?", "Is this normal?", or "Show me the trend". Params: datasourceUid (required), expr (PromQL or plain metric name, required), period (24h/7d/30d, default 24h), compareWith ("previous": compares the current period with the same-length window immediately before it).
Example: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd" }
Example counter: { "datasourceUid": "prom1", "expr": "openclaw_lens_tokens_total" }
Example 7d: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "period": "7d" }
Example comparison: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "period": "7d", "compareWith": "previous" }
Example PromQL: { "datasourceUid": "prom1", "expr": "rate(http_requests_total[5m])", "period": "24h" }
Returns: { metricType?, trendQuery?, current: { value, timestamp }, healthContext?: { status, thresholds, description, direction }, trend: { changePercent, direction, first, last }, stats: { min, max, avg, samples }, comparison?: { previousPeriod: { from, to, avg, min, max, samples }, change: { absolute, percentage, direction } }, metadata: { type, help, unit }, suggestedQueries?: [{ query, description }], suggestedBreakdowns?: string[] }. Sections are omitted when data is unavailable. changePercent is null when the first value is zero. healthContext is included for well-known openclaw_lens_ gauge metrics (same as grafana_query).
Counter-aware: auto-detects counter metrics (via metadata type or a _total suffix) and wraps the trend query in rate(expr[5m]). The current value stays raw (cumulative total), but trend and stats show the rate of change. The metricType field tells you the detected type (counter/gauge/histogram). trendQuery shows the actual PromQL used for the trend (present only when it differs from expr).
Drill-down: for multi-dimensional metrics (metrics with labels like model, token_type, provider), the response includes suggestedQueries: ready-to-use PromQL queries for grafana_query that break the metric down by each label. Counter metrics get rate() wrapping automatically. Use these to investigate cost attribution, identify top contributors, or decompose aggregates.
Breakdowns: suggestedBreakdowns provides label names for decomposition; it is always available for known OpenClaw metrics (cost, session, queue, webhook families), even when the metric has no data yet. For unknown metrics, it falls back to labels discovered from an instant query. Use these labels with grafana_query to build sum by (label) (...) queries for root-cause analysis.
Period comparison: use compareWith: "previous" for period-over-period analysis (e.g., this week vs. last week). Returns a comparison object with the previous period's stats and the change (absolute, percentage, direction). Works with counters too (compares rates), eliminating the need for manual multi-query workflows.
Tip: for simple trend context, call with just period. For "did things improve?" questions, add compareWith: "previous". Metadata is available only for plain metric names (not complex PromQL). There is no need to manually wrap counters in rate(); the tool does it automatically.
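The counter-aware trend logic and the period comparison can be sketched in Python. This is a hypothetical illustration: the rate(expr[5m]) wrapping and the null changePercent follow the description above, while the "up"/"down"/"flat" direction labels and comparing period averages are assumptions.

```python
from typing import Optional

def is_counter(expr: str, metadata_type: Optional[str] = None) -> bool:
    """Counter if metadata says so or the name ends in _total."""
    return metadata_type == "counter" or expr.endswith("_total")

def trend_query(expr: str, metadata_type: Optional[str] = None) -> str:
    """Wrap counters in rate(expr[5m]); leave gauges and complex PromQL as-is."""
    return f"rate({expr}[5m])" if is_counter(expr, metadata_type) else expr

def _direction(a: float, b: float) -> str:
    return "up" if b > a else "down" if b < a else "flat"

def trend_stats(values: list[float]) -> dict:
    first, last = values[0], values[-1]
    return {
        "trend": {"changePercent": None if first == 0 else (last - first) / first * 100,
                  "direction": _direction(first, last), "first": first, "last": last},
        "stats": {"min": min(values), "max": max(values),
                  "avg": sum(values) / len(values), "samples": len(values)},
    }

# Hypothetical period-over-period change, computed on the two periods' averages.
def compare_periods(current: list[float], previous: list[float]) -> dict:
    cur = sum(current) / len(current)
    prev = sum(previous) / len(previous)
    return {"absolute": cur - prev,
            "percentage": None if prev == 0 else (cur - prev) / prev * 100,
            "direction": _direction(prev, cur)}
```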
grafana_security_check
When: the user asks "am I being attacked?", "security status", "security audit", "security check", or wants co

You have full native Grafana access — query data, create dashboards, set alerts, receive alert notifications, annotate events, explore datasources, push custom data, and deliver visualizations inline. Works with ANY data in Grafana, not just agent metrics.

Musts

Always call grafana_explore_datasources first when you need a datasource UID — never guess UIDs

Always call grafana_search before creating a dashboard — avoid duplicates

Always call grafana_get_dashboard before grafana_share_dashboard — you need exact panel IDs

Always call grafana_get_dashboard before grafana_update_dashboard — you need panel IDs and current structure

Prefer grafana_query for direct answers over creating dashboards — "what's my cost?" needs a number, not a URL

Prefer grafana_query over grafana_create_dashboard + grafana_share_dashboard for simple data questions — a number is faster than a chart

Use grafana_query_logs for log searches — LogQL for logs, PromQL for metrics, TraceQL for traces. Never use grafana_query for Loki datasources

Use grafana_query_traces for trace searches — TraceQL for traces, PromQL for metrics, LogQL for logs. Never use grafana_query or grafana_query_logs for Tempo datasources

All tools work with ANY Prometheus datasource — not just openclaw_lens_ metrics

When you see "GRAFANA ALERTS" in prompt context, investigate immediately with grafana_check_alerts — use the suggestedInvestigation field to go directly to querying (it provides the tool, query, and datasource)

Run grafana_check_alerts with action setup once before alert notifications can reach the agent — this creates the webhook contact point

Push data before querying or dashboarding it — data is pushed via OTLP and available immediately

Prefer grafana_explain_metric for "what is this metric?" questions over manual grafana_query — it returns current value, trend, stats, and metadata in one call

Use queryNames from push response for PromQL queries — don't guess metric names (counters get _total suffix)

Use openclaw_ext_ prefix for custom metrics — grafana_push_metrics auto-prepends it if missing

Follow statistics-first discipline for log investigation — always run count/rate LogQL before reading individual entries. Use grafana_query_logs with metric-over-logs queries (count_over_time, rate, topk) before switching to raw log entries

Silence alerts during investigation — use grafana_check_alerts with action silence to prevent repeat notifications while investigating

Use list_rules for complete alert health — grafana_check_alerts with action list_rules returns all rules with live eval state (normal/firing/pending/nodata/error), health, and lastEvaluation — no need to cross-reference with list action

Use dashboardUid + panelId to re-run panel queries — don't manually extract PromQL/LogQL from get_dashboard output. Both grafana_query and grafana_query_logs accept these params to auto-resolve the panel's query expression and datasource. The tool handles template variable replacement and datasource routing automatically

Confirm with user before deleting dashboards or alert rules — grafana_update_dashboard with operation delete and grafana_check_alerts with action delete_rule are permanent and cannot be undone

Always use alloy_pipeline action recipes first when unsure which pipeline recipe fits the user's request — because recipes provide validation, credential handling, and sample queries that raw config does not

Always call alloy_pipeline action status after creating a pipeline — because data takes 15-20s to flow through the pipeline, and components may fail silently after reload

Never guess Alloy component names — use recipes for known patterns, or raw config only when the user explicitly provides Alloy syntax

Prefer recipes over raw config when a recipe exists — recipes provide validation, sample queries, credential handling, dashboard templates, and automatic export target wiring

Never write credentials into raw config — when the user provides a connection string, DSN, password, or API key, ALWAYS use the matching recipe (which routes credentials through sys.env(), keeping secrets off disk). If you must use raw config, wrap sensitive values in sys.env("MY_VAR_NAME") and tell the user to set that env var where Alloy runs

Read envVarsRequired from every pipeline create response — credential recipes may return pending_credentials status when env vars aren't set yet. Tell the user the exact var names and that they must set them where Alloy runs, then verify with action status

Warn users before creating credential-required pipelines — Alloy config reload is atomic: if a credential recipe's env vars aren't set, the reload failure blocks ALL managed pipelines (not just the new one) until the env vars are set or the pipeline is deleted. Always ask: "Do you have the credentials ready to set as env vars on the Alloy host?"

Chain pipeline creation into existing tools — after pipeline is active: grafana_list_metrics or grafana_query_logs to discover data, grafana_create_dashboard to visualize, grafana_create_alert to monitor

Use alloy_pipeline action diagnose as first step when user reports pipeline issues — because it checks Alloy connectivity, all pipeline health, config file drift, and limits in one call

Confirm with user before deleting pipelines — alloy_pipeline with action delete removes the config and data stops flowing

All log recipes accept processing params — don't create separate "processing" pipelines. Add jsonExpressions, labelFields, structuredMetadata, tenantValue, matchRoutes, etc. directly to any log recipe (docker-logs, file-logs, syslog, etc.)

Use samplingPolicies for multi-policy tail sampling — don't create raw config when application-traces can handle it. sampleRate is for simple probabilistic, samplingPolicies is for intelligent multi-policy (keep errors, keep slow, sample rest)

Use log processing params for multi-tenant routing — tenantValue/tenantSource/matchRoutes work on ALL log recipes. Don't create separate "routing" pipelines

Read references/alloy-components.md before composing raw config — it has copy-pasteable snippets for all common Alloy components
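The credentials rule above can be spot-checked before falling back to raw config. Below is a hypothetical Python heuristic, not part of Grafana Lens or Alloy: the attribute-name list in `SENSITIVE` is an assumption, and it only inspects single-line attributes. It flags sensitive attributes assigned from a string literal rather than `sys.env(...)`.

```python
import re

# Hypothetical heuristic: attribute names that usually carry secrets.
SENSITIVE = ("password", "secret", "token", "api_key", "data_source_name", "connection_string")
_ATTR = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*$")

def inline_secrets(config: str) -> list[tuple[int, str]]:
    """Return (line number, attribute) for sensitive attributes set from a string
    literal instead of sys.env(...). Single-line attributes only."""
    hits = []
    for lineno, line in enumerate(config.splitlines(), start=1):
        m = _ATTR.match(line)
        if m is None:
            continue
        name, value = m.group(1), m.group(2)
        if any(s in name.lower() for s in SENSITIVE) and '"' in value and "sys.env(" not in value:
            hits.append((lineno, name))
    return hits
```

Run against a config where a Postgres DSN is inlined but the Loki password uses `sys.env("LOKI_PASSWORD")`, only the DSN line is reported.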

Quick Decision Tree

"What is [metric]?" / "Why did it spike?" → grafana_explain_metric

"What's the current value of X?" / complex PromQL → grafana_query

"Find error logs" / "Search logs for..." → grafana_query_logs

"Find slow traces" / "Show trace for session X" / "Debug distributed spans" → grafana_query_traces

"Debug this session" / "Why did it fail?" / "What went wrong?" → grafana_query_traces (search error/slow) → grafana_query_traces (get → follow correlationHint) → grafana_query_logs → grafana_query → grafana_annotate

"Show me a chart" / "Visualize..." → grafana_search → grafana_get_dashboard → grafana_share_dashboard

"Create a dashboard for..." → grafana_search (check duplicates) → grafana_create_dashboard

"Add a panel to my dashboard" → grafana_get_dashboard → grafana_update_dashboard

"Delete this dashboard" → grafana_update_dashboard with operation delete (confirm with user first)

"Alert me when..." → grafana_check_alerts (setup) → grafana_create_alert

"List my alert rules" / "What alerts do I have?" → grafana_check_alerts with action list_rules

"Delete alert rule X" → grafana_check_alerts with action list_rules → delete_rule with ruleUid

"Track my [custom data]" / "Record my [past data]" → grafana_push_metrics (with optional timestamp for historical data, auto-registers, returns queryNames) → grafana_query with queryNames

"What data sources do I have?" → grafana_explore_datasources

"What metrics are available?" → grafana_list_metrics

"Set up monitoring" / "Monitor my agent" / "What dashboards should I have?" → grafana_search (check existing) → grafana_create_dashboard with llm-command-center → follow suggestedNext chain through remaining templates

"GenAI observability" / "OTel gen_ai metrics" / "Standard AI monitoring" → grafana_create_dashboard with genai-observability template

"What happened in session X?" / "Debug this session" → grafana_create_dashboard with session-explorer template → paste session ID

"Show me LLM traces" / "Show agent logs" → grafana_create_dashboard with llm-command-center template (Loki + Tempo)

"How much am I spending?" / "Cost analysis" → grafana_create_dashboard with cost-intelligence template

"Which tools are slow?" / "Tool errors" → grafana_create_dashboard with tool-performance template

"Queue health" / "Webhook issues" / "Stuck sessions" → grafana_create_dashboard with sre-operations template

"System health check" / "Status report" / "Review all dashboards" → grafana_explore_datasources → grafana_check_alerts (list + list_rules) → grafana_search → grafana_get_dashboard (audit=true for each) → summarize

"Audit my dashboard" / "Which panels are broken?" → grafana_get_dashboard (audit=true) → review auditSummary + per-panel health

"Am I being attacked?" / "Security check" / "Security status" → grafana_security_check

"Set up security monitoring" → grafana_check_alerts (setup) → grafana_create_dashboard (security-overview) → grafana_create_alert (webhook error burst, cost spike, tool loops, injection signals)

"Investigate security alert" → grafana_security_check → grafana_query_logs (correlate) → grafana_annotate (mark investigation) → grafana_check_alerts (silence)

"Investigate this alert" / "Why is X broken?" / "Debug this issue" / "Triage" / "Root cause" → grafana_investigate (multi-signal triage) → follow suggestedHypotheses.testWith for deep-dives

"Is this metric normal?" / "Is there an anomaly?" → grafana_explain_metric (returns anomaly z-score + seasonality vs 1d/7d ago for 24h period)

"RED analysis" / "What's the error rate?" / "Service health" → RED Method queries (see sre-investigation.md §2)

"Alert fatigue" / "Which alerts are noisy?" / "Alert health" → grafana_check_alerts with action analyze — fatigue report

"Postmortem" / "Incident summary" / "What happened?" → grafana_investigate → 5-Phase methodology → postmortem template (see sre-investigation.md §9)

"Compare before/after deployment" → grafana_annotate (list, tags: ["deploy"]) → grafana_explain_metric (compareWith: "previous")

Data Collection Pipelines (Alloy)

"Monitor a service/database/app" → alloy_pipeline action recipes (filter by category) → select recipe → create → status → query → dashboard → alert

"Scrape metrics from [endpoint]" / "My app exposes /metrics" → alloy_pipeline with recipe scrape-endpoint + params { url }

"Monitor PostgreSQL/MySQL/Redis/MongoDB/Memcached" → alloy_pipeline with recipe [db]-exporter + params { connectionString }

"Collect and parse logs with JSON extraction" → alloy_pipeline (log recipe + processing params: jsonExpressions, labelFields, structuredMetadata)

"Collect Docker logs" / "See container logs in Grafana" → alloy_pipeline with recipe docker-logs

"Tail log files" / "Collect app logs from /var/log" → alloy_pipeline with recipe file-logs + params { paths }

"Accept logs via HTTP push API" / "Centralized log gateway" → alloy_pipeline with recipe loki-push-api

"Consume logs from Kafka" → alloy_pipeline with recipe kafka-logs + params { brokers, topics }

"Set up syslog collection" → alloy_pipeline with recipe syslog

"Monitor endpoint availability" / "Synthetic probing" / "HTTP health checks" → alloy_pipeline with recipe blackbox-exporter + params { targets }

"Kubernetes monitoring" / "Monitor my K8s cluster" → alloy_pipeline with recipes kubernetes-pods, kubernetes-services, and kubernetes-logs (three separate pipelines)

"Receive OTLP data" / "Set up trace collection" → alloy_pipeline with recipe otlp-receiver

"Generate RED metrics from traces" / "Span metrics" → alloy_pipeline with recipe span-metrics

"Service dependency graph from traces" → alloy_pipeline with recipe service-graph

"Monitor Alloy itself" / "Self-monitoring" → alloy_pipeline with recipe self-monitoring

"Redact secrets from logs" / "Compliance logging" → alloy_pipeline with recipe secret-filter-logs + params { paths }

"Monitor Elasticsearch/Kafka" → alloy_pipeline with recipe elasticsearch-exporter / kafka-exporter

"System metrics" / "Node monitoring" / "CPU/memory/disk" → alloy_pipeline with recipe node-exporter

"Docker container metrics" / "Container resource usage" → alloy_pipeline with recipe docker-metrics

"Reduce trace costs" / "Keep only error traces" / "Smart trace sampling" / "Tail sampling" → alloy_pipeline with recipe application-traces + samplingPolicies array (keep errors, keep slow, filter health checks, sample rest)

"Multi-tenant Loki" / "Route logs by tenant" / "Different tenants for different apps" → any log recipe + tenantValue or matchRoutes processing param

"Profile my app" / "CPU profiling" / "Memory profiling" / "Continuous profiling" / "Go pprof" → alloy_pipeline with recipe continuous-profiling + targets

"Frontend observability" / "Browser RUM" / "Web vitals" / "Faro SDK" → alloy_pipeline with recipe faro-frontend

"GELF logs" / "Graylog" / "Docker GELF driver" → alloy_pipeline with recipe gelf-logs

"Custom Alloy pattern" / "Advanced pipeline" → Read references/alloy-components.md → alloy_pipeline with raw config + optional sampleQueries

"What data collection recipes are available?" → alloy_pipeline with action recipes

"What pipelines do I have?" / "Pipeline list" → alloy_pipeline with action list

"Is my pipeline working?" / "Pipeline health" → alloy_pipeline with action status + name

"Pipeline problems" / "Why isn't data showing up?" → alloy_pipeline with action diagnose → follow remediation

"Delete pipeline" / "Remove monitoring for..." → alloy_pipeline with action delete + name (confirm with user first)
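The "Monitor a service" chain above can be sketched as an ordered list of tool calls. This is a minimal TypeScript sketch under stated assumptions. The action names, the scrape-endpoint recipe, and its url param come from the routing notes. The AlloyPipelineCall shape, the helper names, and passing recipe/name/params as top-level fields on "create" are hypothetical.

```typescript
// Hypothetical call shape. Action names and the scrape-endpoint recipe come
// from the routing notes; the field layout of "create" is an assumption.
interface AlloyPipelineCall {
  tool: "alloy_pipeline";
  action: "recipes" | "create" | "status" | "diagnose" | "list" | "delete";
  recipe?: string;
  name?: string;
  params?: Record<string, string>;
}

// "Monitor a service" chain: list recipes, create the pipeline, check status.
// The query -> dashboard -> alert steps use the grafana_* tools afterwards.
function monitorEndpointPlan(pipelineName: string, url: string): AlloyPipelineCall[] {
  return [
    { tool: "alloy_pipeline", action: "recipes" },
    { tool: "alloy_pipeline", action: "create", recipe: "scrape-endpoint", name: pipelineName, params: { url } },
    { tool: "alloy_pipeline", action: "status", name: pipelineName },
  ];
}

// If status shows no data, the routing notes point to "diagnose" next.
function diagnoseCall(pipelineName: string): AlloyPipelineCall {
  return { tool: "alloy_pipeline", action: "diagnose", name: pipelineName };
}

const plan = monitorEndpointPlan("my-app-metrics", "http://localhost:8080/metrics");
for (const call of plan) console.log(JSON.stringify(call));
```

Deletion is deliberately left out of the plan, because the routing notes require confirming with the user first.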

Working with Multiple Grafana Instances

When several Grafana environments are configured (dev, staging, prod), every tool accepts an optional instance parameter. grafana_explore_datasources returns availableInstances — use the name values from that list.

Why this matters: Users often need to query production metrics, create dashboards in dev, or compare environments side by side. Each tool call targets one instance.

Smart defaults: Omitting instance always targets the configured default — safe and invisible for single-environment setups. Only specify instance when the user explicitly names a non-default environment.

Cross-environment workflows: Each call is independent. Query prod, create dashboard in dev — just set instance differently on each call. No context switching needed.
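The smart-default rule can be expressed as a small parameter helper. This is a minimal sketch: withInstance is a hypothetical helper, and the datasource UID "prom-prod" is made up. The only behaviour taken from the text is that instance is omitted for the default and set per call otherwise.

```typescript
type ToolParams = Record<string, unknown>;

// Adds `instance` only when a non-default environment is named, so
// single-environment setups keep sending exactly the same params.
function withInstance(params: ToolParams, instance?: string, defaultInstance?: string): ToolParams {
  if (instance === undefined || instance === defaultInstance) {
    return { ...params }; // no `instance`: the configured default is targeted
  }
  return { ...params, instance };
}

// Cross-environment workflow: each call chooses its instance independently.
// Datasource UIDs can differ per instance; "prom-prod" is a made-up example.
const prodQuery = withInstance({ datasourceUid: "prom-prod", expr: "up" }, "prod", "dev");
const devDashboard = withInstance({ template: "llm-command-center", title: "My AI Dashboard" }, "dev", "dev");
console.log(JSON.stringify(prodQuery), JSON.stringify(devDashboard));
```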

Tool Inventory

Tool	What It Does
`grafana_explore_datasources`	Discover configured datasources (UIDs, types, query routing) — tells you which tool + query language to use for each datasource
`grafana_list_metrics`	Discover available metrics or label values from a datasource. Use `compact: true` with `metadata: true` for minimal fields in multi-tool chains
`grafana_query`	Run PromQL instant/range queries — get numbers directly
`grafana_query_logs`	Run LogQL queries against Loki — search and filter logs
`grafana_query_traces`	Run TraceQL queries against Tempo — search traces or get full trace by ID
`grafana_create_dashboard`	Create dashboards from templates or custom JSON
`grafana_update_dashboard`	Add/remove/update panels, change dashboard metadata, or delete dashboard
`grafana_get_dashboard`	Get dashboard summary (panels, queries). Use `compact: true` for overview scans, `audit: true` to health-check all panels in one call
`grafana_search`	Search existing dashboards by title, tags, or starred status
`grafana_share_dashboard`	Render panel as image and deliver inline via messaging
`grafana_create_alert`	Create Grafana-native alert rules on any metric
`grafana_annotate`	Create or list annotations (events) on dashboards
`grafana_check_alerts`	Check, acknowledge, list/delete rules, silence/unsilence, or set up Grafana alert webhook notifications. Use `compact: true` with `list_rules` for minimal fields
`grafana_push_metrics`	Push custom data (calendar, git, fitness, finance) via OTLP
`grafana_explain_metric`	Get metric context: current value, trend, stats, metadata, drill-down queries — agent interprets
`grafana_security_check`	Run 6 parallel security checks and return threat-level assessment (green/yellow/red) — "Am I being attacked?"
`grafana_investigate`	Multi-signal investigation triage — gathers metrics, logs, traces, and context in parallel, generates hypotheses with specific tool+params for follow-up
`alloy_pipeline`	Create and manage Alloy data collection pipelines — 29 recipes for metrics, logs, traces, profiles from any infrastructure (databases, K8s, Docker, apps, profiling, frontend RUM)

Tool Details
grafana_explore_datasources
When: First step when the user mentions data, metrics, or monitoring. Gets the datasource UIDs needed by grafana_query, grafana_query_logs, grafana_query_traces, grafana_list_metrics, grafana_create_alert, and grafana_explain_metric.
Params: instance (optional; the target Grafana instance, omit for the default).
Example: {}
Example (multi-instance): { "instance": "prod" }
Returns: A list of datasources with uid, name, type, and isDefault, plus routing hints: queryTool (which agent tool to use, e.g. "grafana_query", "grafana_query_logs", or "grafana_query_traces"), queryLanguage (e.g. "PromQL", "LogQL", "TraceQL"), and supported (boolean; whether an agent tool can query this datasource). Use queryTool to pick the right tool for each datasource. When multiple Grafana instances are configured, the response also includes instance (which instance was queried) and availableInstances (a list of { name, url, isDefault } for every configured instance).
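Routing on queryTool and supported can be sketched as a lookup over the returned list. This is a minimal sketch: the DatasourceInfo interface restates the documented fields, pickDatasource is a hypothetical helper, the preference for the default datasource is an assumption, and the sample response is made up.

```typescript
// Fields as listed under Returns for grafana_explore_datasources.
interface DatasourceInfo {
  uid: string;
  name: string;
  type: string;
  isDefault: boolean;
  queryTool?: string;
  queryLanguage?: string;
  supported: boolean;
}

// Picks a queryable datasource for a language, preferring the default one.
function pickDatasource(list: DatasourceInfo[], language: string): { uid: string; tool: string } | undefined {
  const candidates = list.filter((d) => d.supported && d.queryLanguage === language && d.queryTool !== undefined);
  const chosen = candidates.find((d) => d.isDefault) ?? candidates[0];
  return chosen ? { uid: chosen.uid, tool: chosen.queryTool ?? "" } : undefined;
}

// Made-up sample response for illustration.
const datasources: DatasourceInfo[] = [
  { uid: "prom1", name: "Prometheus", type: "prometheus", isDefault: true, queryTool: "grafana_query", queryLanguage: "PromQL", supported: true },
  { uid: "loki1", name: "Loki", type: "loki", isDefault: false, queryTool: "grafana_query_logs", queryLanguage: "LogQL", supported: true },
  { uid: "tempo1", name: "Tempo", type: "tempo", isDefault: false, queryTool: "grafana_query_traces", queryLanguage: "TraceQL", supported: true },
  { uid: "pg1", name: "Postgres", type: "postgres", isDefault: false, supported: false },
];

console.log(pickDatasource(datasources, "LogQL"));
```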
grafana_list_metrics
When: The user asks "what metrics are available?", or you need to discover metrics before querying or composing dashboards. Also use it to group metrics by function: metadata mode adds a category to each openclaw_ metric. Use purpose when the user asks about a specific concern (e.g., "performance metrics", "cost metrics").
Params: datasourceUid (required), prefix (filter by prefix), search (targeted discovery via server-side regex; only matching metrics are returned), purpose ("performance" | "cost" | "reliability" | "capacity"; pre-filters by intent and composes with prefix and search), label (list that label's values instead of metric names), metadata (boolean; enriched results with type/help/category), compact (boolean; with metadata, returns only name/type/category, ~60% smaller).
Example names: { "datasourceUid": "prom1", "prefix": "openclaw_lens_" }
Example search: { "datasourceUid": "prom1", "search": "steps" }
Example purpose: { "datasourceUid": "prom1", "purpose": "performance", "metadata": true }
Example combined: { "datasourceUid": "prom1", "prefix": "openclaw_ext_", "search": "fitness" }
Example metadata: { "datasourceUid": "prom1", "metadata": true, "prefix": "openclaw_" }
Example compact: { "datasourceUid": "prom1", "metadata": true, "compact": true }
Example label: { "datasourceUid": "prom1", "label": "job" }
Returns names: { metrics: ["metric1", "metric2", ...] }, truncated at 200.
Returns metadata: { metadataSource, categorySummary: { cost: 3, usage: 4, session: 5, ... }, metrics: [{ name, type, help, category?, source? }, ...] }. Use this before composing custom dashboards: type tells you counter vs. gauge vs. histogram, and category groups openclaw_ metrics by function. Search also matches help text. Categories: cost, usage, session, queue, messaging, webhook, tools, agent, custom. categorySummary gives per-category counts for a quick overview (omitted when there are no openclaw_ metrics). Purpose mapping: performance → session + tools, cost → cost + usage, reliability → webhook + messaging + agent, capacity → queue + session.
metadataSource: "prometheus" when the Prometheus metadata endpoint has data; "synthetic" on OTLP-only stacks, where metadata is synthesized from the known metric registry (histogram sub-metrics deduplicated, type/help taken from Grafana Lens definitions). On OTLP stacks the response includes a hint explaining why metadata is synthetic. Individual entries carry source: "synthetic" when they come from the registry and source: "custom" when they come from the custom metrics store.
Returns compact: { metadataSource, categorySummary: {...}, metrics: [{ name, type, category? }, ...] }. Same as metadata but drops help, source, and labelNames; use it in multi-tool chains where you need metric names and types but not full descriptions.
Returns label: { label, count, totalCount, values: ["value1", "value2", ...] }, truncated at 200.
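The purpose mapping and categorySummary can be reproduced client-side on metadata results already in hand. This is a minimal sketch. The mapping table is copied from the text. The helper names are hypothetical, and so are the sample metric names and their category assignments. The tool itself filters server-side, so this only illustrates what the mapping means.

```typescript
type Purpose = "performance" | "cost" | "reliability" | "capacity";

// Purpose-to-category mapping copied from the description above.
const PURPOSE_CATEGORIES: Record<Purpose, string[]> = {
  performance: ["session", "tools"],
  cost: ["cost", "usage"],
  reliability: ["webhook", "messaging", "agent"],
  capacity: ["queue", "session"],
};

interface MetricMeta {
  name: string;
  type: string;
  category?: string;
}

// Keeps only metrics whose category belongs to the requested purpose.
function filterByPurpose(metrics: MetricMeta[], purpose: Purpose): MetricMeta[] {
  const wanted = new Set(PURPOSE_CATEGORIES[purpose]);
  return metrics.filter((m) => m.category !== undefined && wanted.has(m.category));
}

// Rebuilds a categorySummary-style count map; uncategorized metrics are skipped.
function summarizeCategories(metrics: MetricMeta[]): Record<string, number> {
  const summary: Record<string, number> = {};
  for (const m of metrics) {
    if (m.category) summary[m.category] = (summary[m.category] ?? 0) + 1;
  }
  return summary;
}

// Hypothetical metric names, for illustration only.
const sampleMetrics: MetricMeta[] = [
  { name: "openclaw_example_cost_usd", type: "gauge", category: "cost" },
  { name: "openclaw_example_tokens_total", type: "counter", category: "usage" },
  { name: "openclaw_example_session_seconds", type: "histogram", category: "session" },
  { name: "node_cpu_seconds_total", type: "counter" },
];
console.log(filterByPurpose(sampleMetrics, "cost").map((m) => m.name));
```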
grafana_query
When: The user asks a data question that needs a direct answer, not a dashboard. Also for re-running an existing dashboard panel's query over a different time range.
Params: datasourceUid, expr (PromQL), queryType (instant/range), start (range only, required), end (range only, default "now"), step (range only, optional; auto-calculated from the time range when omitted, targeting ~300 datapoints), dashboardUid (optional; resolve the query from a panel), panelId (optional; use with dashboardUid).
Example instant: { "datasourceUid": "prom1", "expr": "sum(increase(openclaw_lens_cost_by_model_total[1d])) or vector(0)" }
Example range (auto-step): { "datasourceUid": "prom1", "expr": "rate(openclaw_tokens_total[5m])", "queryType": "range", "start": "now-30d" }
Example range (explicit step): { "datasourceUid": "prom1", "expr": "rate(openclaw_tokens_total[5m])", "queryType": "range", "start": "now-1h", "end": "now", "step": "60" }
Example panel re-run: { "dashboardUid": "openclaw-command-center", "panelId": 10, "queryType": "range", "start": "now-7d" }
Tip: start/end accept Unix seconds or relative expressions such as "now-1h" and "now-7d". For range queries, setting start is enough: end defaults to "now" and step is auto-calculated. Override step only when you need a specific resolution.
Tip (panel re-run): Set dashboardUid + panelId to re-run a panel's query without extracting the PromQL by hand. The tool auto-resolves expr and datasourceUid from the panel definition and replaces template variables with wildcards. You can still override expr or datasourceUid explicitly. Get panel IDs from grafana_get_dashboard.
Returns instant: { metrics: [{ metric: {...}, value: "1.23", timestamp: "...", healthContext?: { status, thresholds, description, direction } }], datasourceUid, resultCount, warnings?, hint? }. healthContext is included for well-known openclaw_lens_ gauge metrics and gives an SRE-grade health assessment: status ("healthy"/"warning"/"critical"), thresholds (warning/critical values), description (what the metric means), and direction ("higher_is_worse"/"lower_is_worse"). It is omitted for unknown metrics. Results are capped at 50; beyond that the response includes truncated: true, totalResults, and a truncationHint advising you to narrow the query.
Returns range: { series: [{ metric: {...}, values: [{ time, value }...] }], datasourceUid, resultCount, warnings?, hint? }, truncated to 20 points per series and 50 series max. When series are truncated, the response includes truncated: true, totalSeries, and truncationHint. When step is auto-calculated, it includes step: { value: "288s", display: "5m", auto: true }.
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, and templateVarsReplaced alongside the normal query results. If the panel uses a Loki datasource, the tool returns an error directing you to grafana_query_logs instead.
Returns (warnings): When Prometheus flags a non-fatal issue (e.g., rate() on a gauge), the response includes warnings: [{ cause, suggestion, example? }]. For rate() on a gauge, cause reads "rate() applied to 'metric' which appears to be a gauge", suggestion reads "use delta() or deriv() instead", and example shows the corrected query.
Returns (hint): When the query returns zero results, hint: { cause, suggestion } explains why (the metric may not exist, or the label filters may not match) and suggests verifying with grafana_list_metrics.
Returns (error with guidance): On query failure, the response includes guidance: { cause, suggestion, example? } alongside the raw error. It is pattern-matched for common PromQL mistakes (unclosed parenthesis, missing range selector, timeout, auth failure, rate on gauge, etc.) and omitted when the error is unrecognized.
Tip (chaining): Both instant and range responses include datasourceUid; pass it directly to grafana_create_alert or other tools without re-calling grafana_explore_datasources. This enables zero-friction query → alert chains.
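The auto-step rule ("targeting ~300 datapoints") reduces to dividing the range length by 300. This is a minimal sketch, assuming a simple ceiling division. The tool's actual rounding is not specified here; for example, its display value "5m" for 288s suggests extra rounding for display. The helpers handle only "now", "now-<n><unit>", and plain Unix seconds. For a 24-hour range the division gives 86400 / 300 = 288 seconds.

```typescript
// Parses "now", "now-<n><unit>" (units s, m, h, d, w) or plain Unix seconds.
// Only this subset of Grafana's relative time syntax is handled.
function toUnixSeconds(expr: string, nowSec: number): number {
  if (expr === "now") return nowSec;
  const m = /^now-(\d+)([smhdw])$/.exec(expr);
  if (m) {
    const unit: Record<string, number> = { s: 1, m: 60, h: 3600, d: 86400, w: 604800 };
    return nowSec - Number(m[1]) * unit[m[2]];
  }
  const n = Number(expr);
  if (Number.isFinite(n)) return n;
  throw new Error(`unsupported time expression: ${expr}`);
}

// Step (in seconds) that yields roughly `target` datapoints over [start, end].
function autoStepSeconds(start: string, end = "now", nowSec = Math.floor(Date.now() / 1000), target = 300): number {
  const span = toUnixSeconds(end, nowSec) - toUnixSeconds(start, nowSec);
  return Math.max(1, Math.ceil(span / target));
}

console.log(autoStepSeconds("now-30d")); // 8640 (30 days / 300)
```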
grafana_query_logs
When: The user asks about logs or errors, or needs to investigate an issue by searching log data. Also for session debugging, OTel log investigation, and re-running existing log panel queries.
Params: datasourceUid, expr (LogQL), queryType (instant/range, default range), start/end (default now-1h/now), step (metric queries only), limit (default 100), direction (backward/forward), lineLimit (max chars per log line, default 500, max 2000), extractFields (boolean, default false; extracts structured OTel attributes into a clean fields object), dashboardUid (optional; resolve the query from a panel), panelId (optional; use with dashboardUid).
Example log search: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |= \"error\"" }
Example with filters: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |~ \"timeout|refused\"", "limit": 50, "direction": "forward" }
Example full stack traces: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |= \"Exception\"", "lineLimit": 2000 }
Example session debugging: { "datasourceUid": "loki1", "expr": "{service_name=\"openclaw\"} | json | component=\"lifecycle\"", "extractFields": true }
Example metric query: { "datasourceUid": "loki1", "expr": "rate({job=\"api\"}[5m])", "queryType": "range", "start": "now-6h", "end": "now", "step": "60" }
Example panel re-run: { "dashboardUid": "openclaw-command-center", "panelId": 18, "start": "now-24h", "extractFields": true }
Returns streams: { entries: [{ labels: {...}, timestamp: "...", line: "..." }], datasourceUid, totalEntries, truncated }, capped at 100 entries, with lines cut at 500 chars (set lineLimit: 2000 for full stack traces).
Returns streams (extractFields): { entries: [{ labels: {...cleaned...}, timestamp: "...", line: "...", fields: { component, event_name, session_id, trace_id, model, duration_s, ... } }], datasourceUid }. Infrastructure noise labels are removed, the openclaw_ prefix is stripped from field keys, and numeric values are auto-converted. JSON log bodies are also parsed when present.
Returns streams (traceCorrelation): When extractFields: true and entries contain trace_id, the response includes traceCorrelation: { traceIds: [...], tool: "grafana_query_traces", tip } with up to 5 unique trace IDs, ready for grafana_query_traces with queryType: "get".
Returns metric: Same shape as grafana_query range/instant results (matrix capped at 50 series, vector capped at 50 results; includes datasourceUid, truncated, totalSeries/totalResults, and truncationHint when a cap is exceeded).
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, and templateVarsReplaced alongside the normal results. If the panel uses a Prometheus datasource, the tool returns an error directing you to grafana_query instead.
Returns (error with guidance): On query failure, the response includes guidance: { cause, suggestion, example? } alongside the raw error. It is pattern-matched for common LogQL mistakes (bare text without a stream selector, empty {}, unclosed braces, missing label matchers, auth failure, timeout) and omitted when the error is unrecognized.
Tip: In LogQL, {label="value"} selects streams, |= is a substring filter, |~ is a regex filter, and != excludes matching lines. Metric wrappers: rate(), count_over_time(), bytes_rate(). Use extractFields: true when investigating OTel/lifecycle logs; it surfaces trace_id, session_id, event_name, model, and other attributes as first-class fields instead of leaving them buried in raw labels.
Tip (panel re-run): Same as grafana_query: set dashboardUid + panelId to auto-resolve the LogQL and datasource. For a Prometheus panel, the tool returns an error pointing you to grafana_query.
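The log-to-trace hand-off can be sketched as deduplicating trace_id values from extractFields entries. This is a minimal sketch that mirrors the documented "up to 5 unique trace IDs" behaviour, assuming first-seen order. The LogEntry interface is trimmed to the fields used, and both helper names are hypothetical.

```typescript
// Trimmed shape of an extractFields entry (see "Returns streams (extractFields)").
interface LogEntry {
  timestamp: string;
  line: string;
  fields?: Record<string, unknown>;
}

// Collects up to `max` unique, non-empty trace IDs in first-seen order.
function collectTraceIds(entries: LogEntry[], max = 5): string[] {
  const seen = new Set<string>();
  for (const e of entries) {
    const id = e.fields?.["trace_id"];
    if (typeof id === "string" && id.length > 0 && !seen.has(id)) {
      seen.add(id);
      if (seen.size >= max) break;
    }
  }
  return Array.from(seen);
}

// One grafana_query_traces "get" call per trace ID.
function traceFetchParams(datasourceUid: string, traceIds: string[]) {
  return traceIds.map((id) => ({ datasourceUid, query: id, queryType: "get" as const }));
}
```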
grafana_query_traces
When: The user asks about traces, distributed tracing, slow spans, or session trace hierarchies, or needs to debug request flows across services.
Params: datasourceUid, query (TraceQL expression or trace ID), queryType (search/get, default search), start/end (default now-1h/now), limit (default 20, max 50), minDuration/maxDuration (e.g., "1s", "10s"), dashboardUid (optional; resolve the query from a panel), panelId (optional; use with dashboardUid).
Example search: { "datasourceUid": "tempo1", "query": "{ resource.service.name = \"openclaw\" }" }
Example search slow: { "datasourceUid": "tempo1", "query": "{ resource.service.name = \"openclaw\" }", "minDuration": "5s" }
Example search with time: { "datasourceUid": "tempo1", "query": "{ span.gen_ai.system = \"anthropic\" }", "start": "now-24h", "limit": 50 }
Example get: { "datasourceUid": "tempo1", "query": "abc123def456789...", "queryType": "get" }
Example panel re-run: { "dashboardUid": "openclaw-session-explorer", "panelId": 12, "start": "now-24h" }
Returns search: { traces: [{ traceId, rootServiceName, rootTraceName, startTime, durationMs, spanCount? }], datasourceUid, totalTraces, truncated?, correlationHint? }, capped at 50 traces. When the cap is exceeded, the response includes truncated: true and truncationHint. When traces are found, it includes correlationHint: { logQuery, tool, tip } with a ready-to-use LogQL expression for grafana_query_logs.
Returns get: { traceId, spans: [{ traceId, spanId, parentSpanId?, operationName, serviceName, startTime, durationMs, status, kind?, attributes: {...} }], datasourceUid, totalSpans, truncated? }. Spans are flattened OTLP spans with resolved attributes (string/number/boolean), capped at 200 and sorted by start time (earliest first).
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, and templateVarsReplaced alongside the normal results. If the panel uses a Prometheus or Loki datasource, the tool returns an error directing you to the correct tool.
Returns (error with guidance): On query failure, the response includes guidance: { cause, suggestion, example? } alongside the raw error. It is pattern-matched for common TraceQL mistakes (syntax errors, invalid attributes, auth failure, timeout, not-found, invalid trace ID) and omitted when the error is unrecognized.
Returns (no results): When a search returns zero traces, the response includes hint: { cause, suggestion }, suggesting you broaden the query or check the datasource.
Tip: In TraceQL, { } matches all traces. Filter by service with resource.service.name, HTTP spans with span.http.status_code, operation with name, span duration with duration, and error/ok with status. Use minDuration/maxDuration to find performance outliers.
Trace-to-Log: Search and get results include correlationHint.logQuery; pass it directly to grafana_query_logs to find correlated logs.
Log-to-Trace: grafana_query_logs results (with extractFields: true) include traceCorrelation.traceIds; pass any ID to grafana_query_traces with queryType: "get".
Tip (panel re-run): Same as grafana_query: set dashboardUid + panelId to auto-resolve the TraceQL and datasource. For a Prometheus or Loki panel, the tool returns an error naming the correct tool.
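The TraceQL filters in the tip can be combined into a single span selector with &&. This is a minimal sketch, and buildTraceQL is a hypothetical helper. A duration condition inside the selector applies to spans, so it is not necessarily equivalent to the tool's minDuration parameter. Service names are quoted with JSON.stringify, on the assumption that TraceQL string escaping matches JSON for ordinary names.

```typescript
// Builds a TraceQL span selector such as
// { resource.service.name = "openclaw" && duration > 5s }.
function buildTraceQL(opts: { service?: string; minDuration?: string; errorsOnly?: boolean }): string {
  const conds: string[] = [];
  if (opts.service) conds.push(`resource.service.name = ${JSON.stringify(opts.service)}`);
  if (opts.minDuration) conds.push(`duration > ${opts.minDuration}`); // span duration
  if (opts.errorsOnly) conds.push("status = error");
  return conds.length > 0 ? `{ ${conds.join(" && ")} }` : "{ }";
}

const slowOpenclaw = { datasourceUid: "tempo1", query: buildTraceQL({ service: "openclaw", minDuration: "5s" }) };
console.log(JSON.stringify(slowOpenclaw));
```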
grafana_create_dashboard
When: The user wants a persistent dashboard for ongoing monitoring.
Params: template or dashboard (custom JSON); exactly one of the two is required. Optional: title (overrides the template default), folderUid (target folder), overwrite (default true).
Returns: { uid, url, status, message, suggestedNext?: [{ template, reason }], validation?: DashboardValidation }. For template-based dashboards, suggestedNext lists complementary templates to deploy next. For custom JSON dashboards, validation dry-runs each panel's PromQL and reports per-panel health. A non-zero validation.panelsError means at least one query is broken, and validation.details shows which one.
Choose the right template (3-tier SRE drill-down hierarchy):
Tier 1 → System: start here for overall health.
Tier 2 → Session: click a session from Tier 1 to investigate.
Tier 3 → Deep Dive: cost, tool, or SRE details.
Template	Tier	Domain	Variables	Use When
`llm-command-center`	Tier 1	System overview	$prometheus, $loki, $tempo, $provider, $model, $channel	Golden signals, session table with click-to-drill-down, cost, cache, live feeds
`session-explorer`	Tier 2	Session debug	$prometheus, $loki, $tempo, $session (textbox)	Per-session trace hierarchy, LLM calls, tool calls, conversation flow
`cost-intelligence`	Tier 3a	Cost analysis	$prometheus, $loki, $provider, $model	Spending trends, model attribution, cache savings, per-session cost table
`tool-performance`	Tier 3b	Tool analytics	$prometheus, $loki, $tempo, $tool	Tool leaderboard, latency ranking, error rates, tool traces
`sre-operations`	Tier 3c	SRE operations	$prometheus, $loki	Queue health, webhooks, stuck sessions, tool loops
`genai-observability`	n/a	OTel gen_ai standard	$prometheus, $loki, $tempo, $model, $provider	Industry-standard AI monitoring: token analytics, LLM performance, traces, logs, cache efficiency. Works with any gen_ai data.
`node-exporter`	n/a	System/DevOps	$datasource, $instance	Server CPU, memory, disk, network
`http-service`	n/a	Web/DevOps	$datasource, $job	HTTP request rate, errors, latency (RED signals)
`metric-explorer`	n/a	Any domain	$datasource, $metric	Deep dive into any single metric chosen from a dropdown
`multi-kpi`	n/a	Any domain	$datasource, $metric1..$metric4	4-metric KPI overview (business, fitness, finance, IoT)
`weekly-review`	n/a	Any domain	$datasource, $metric1, $metric2	Weekly overview of 2 external metrics with trends, plus a table of all openclaw_ext_ metrics
All AI templates include Loki log-to-trace correlation via Tempo and use stable UIDs for cross-dashboard navigation.
Example AI health: { "template": "llm-command-center", "title": "My AI Dashboard" }
Example session debug: { "template": "session-explorer", "title": "Session Debug" }
Example cost analysis: { "template": "cost-intelligence", "title": "My AI Costs" }
Example tool analytics: { "template": "tool-performance", "title": "Tool Health" }
Example SRE ops: { "template": "sre-operations", "title": "SRE Health" }
Example GenAI observability: { "template": "genai-observability", "title": "GenAI Observability" }
Example system: { "template": "node-exporter", "title": "Server Health" }
Example generic: { "template": "metric-explorer", "title": "Explore My Data" }
Example multi-KPI: { "template": "multi-kpi", "title": "Business KPIs" }
Example weekly review: { "template": "weekly-review", "title": "My Weekly Review" }
Example custom with validation: { "dashboard": { "title": "Model Comparison", "panels": [{ "id": 1, "title": "Cost by Model", "type": "timeseries", "targets": [{ "refId": "A", "expr": "sum by (model) (rate(openclaw_lens_cost_by_token_type[1h]))", "datasource": { "uid": "prometheus" } }] }] } }
Custom dashboard validation (returned only for the dashboard param, not for templates): validation: { panelsTotal: 3, panelsValid: 1, panelsNoData: 1, panelsError: 1, panelsSkipped: 0, details: [{ panelId: 1, title: "Cost by Model", status: "ok", queries: [{ refId: "A", expr: "...", valid: true, sampleValue: 0.42 }] }, { panelId: 2, title: "Latency", status: "nodata" }, { panelId: 3, title: "Bad Query", status: "error", error: "parse error at char 5" }] }
Panel statuses: ok (query returned data), nodata (valid query but no results; the metric may not exist yet), error (PromQL syntax error or datasource issue), skipped (no datasource UID found). The dashboard is always created regardless; validation is informational.
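The summary counters relate directly to the per-panel statuses, as the sketch below shows using the example response. This is a minimal sketch: summarizeValidation and the broken list are hypothetical conveniences, and the PanelResult type is trimmed to the fields used.

```typescript
type PanelStatus = "ok" | "nodata" | "error" | "skipped";

// Trimmed per-panel entry from validation.details.
interface PanelResult {
  panelId: number;
  title: string;
  status: PanelStatus;
  error?: string;
}

// Recomputes the counters and lists the broken panels with their errors.
function summarizeValidation(details: PanelResult[]) {
  const count = (s: PanelStatus) => details.filter((d) => d.status === s).length;
  return {
    panelsTotal: details.length,
    panelsValid: count("ok"),
    panelsNoData: count("nodata"),
    panelsError: count("error"),
    panelsSkipped: count("skipped"),
    broken: details
      .filter((d) => d.status === "error")
      .map((d) => `${d.panelId} ${d.title}: ${d.error ?? "unknown error"}`),
  };
}

// Details from the example validation response above.
const exampleDetails: PanelResult[] = [
  { panelId: 1, title: "Cost by Model", status: "ok" },
  { panelId: 2, title: "Latency", status: "nodata" },
  { panelId: 3, title: "Bad Query", status: "error", error: "parse error at char 5" },
];
console.log(summarizeValidation(exampleDetails));
```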
grafana_update_dashboard
When: The user wants to add or remove a panel, change a query, update dashboard settings, or delete a dashboard.
Params: uid (required), operation (required: add_panel, remove_panel, update_panel, update_metadata, delete).
add_panel params: panel (object with title, type, targets). The new panel is auto-laid out below existing panels.
remove_panel / update_panel params: panelId (preferred) or panelTitle (case-insensitive substring fallback); update_panel also takes updates (object).
update_metadata params: title, description, tags, time (e.g., { "from": "now-7d", "to": "now" }), refresh (e.g., "1m").
delete params: none besides uid. Permanently removes the dashboard, so always confirm with the user first.
Example add: { "uid": "abc123", "operation": "add_panel", "panel": { "title": "Error Rate", "type": "timeseries", "targets": [{ "refId": "A", "expr": "rate(errors_total[5m])", "datasource": { "uid": "prom1" } }] } }
Example add (no datasource): { "uid": "abc123", "operation": "add_panel", "panel": { "title": "Latency", "type": "timeseries", "targets": [{ "refId": "A", "expr": "histogram_quantile(0.99, rate(http_duration_bucket[5m]))" }] } }. Validation is skipped when no datasource UID is found, but the panel is still saved.
Example remove: { "uid": "abc123", "operation": "remove_panel", "panelId": 3 }
Example update panel: { "uid": "abc123", "operation": "update_panel", "panelId": 1, "updates": { "title": "New Title", "targets": [{ "refId": "A", "expr": "new_query" }] } }
Example update metadata: { "uid": "abc123", "operation": "update_metadata", "title": "My Dashboard v2", "time": { "from": "now-7d", "to": "now" }, "refresh": "5m" }
Example delete: { "uid": "abc123", "operation": "delete" }
Returns update: { status: "updated", uid, url, version, operation, panelCount, affectedPanel?: { id, title }, changedFields?: [...], queryValidation?: { validated, results, datasourceUid?, skippedReason? } }.
Returns queryValidation: For add_panel and update_panel (when targets change), PromQL queries are dry-run against Grafana. Each result: { refId, expr, valid: boolean, error?: string, sampleValue?: number }. The panel is always saved; validation is informational. If valid: false, check the error field for PromQL syntax issues. If skippedReason is set, no datasource UID was found; include datasource: { uid: "..." } on targets to enable validation.
Returns delete: { status: "deleted", uid, title, message }.
Tip: targets in update_panel replaces the whole list, so include all targets, not just the changed ones. Include datasource.uid on targets to get query validation feedback.
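Because update_panel replaces targets wholesale, a safe update starts from the panel's current targets and swaps in the changed one by refId. This is a minimal sketch, and mergeTargets and updatePanelParams are hypothetical helpers. The current targets would come from grafana_get_dashboard, whose queries list carries refId and expr but not datasource, so re-add datasource.uid if you want validation feedback.

```typescript
interface Target {
  refId: string;
  expr: string;
  datasource?: { uid: string };
}

// Replaces the target with the same refId (keeping its other fields) or
// appends it if new. The input array is not mutated.
function mergeTargets(current: Target[], changed: Target): Target[] {
  const merged = current.map((t) => (t.refId === changed.refId ? { ...t, ...changed } : t));
  if (!current.some((t) => t.refId === changed.refId)) merged.push(changed);
  return merged;
}

// Full update_panel params with the complete target list.
function updatePanelParams(uid: string, panelId: number, current: Target[], changed: Target) {
  return { uid, operation: "update_panel", panelId, updates: { targets: mergeTargets(current, changed) } };
}

const currentTargets: Target[] = [
  { refId: "A", expr: "rate(errors_total[5m])", datasource: { uid: "prom1" } },
  { refId: "B", expr: "old_query", datasource: { uid: "prom1" } },
];
console.log(JSON.stringify(updatePanelParams("abc123", 1, currentTargets, { refId: "B", expr: "new_query" })));
```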
grafana_get_dashboard
When: You need to inspect a dashboard's panels: find panel IDs for sharing, verify structure, scan multiple dashboards for an overview, or audit which panels are returning data.
Params: uid (required). Optional: compact (boolean, default false; returns panel titles and types only, no queries or metadata, ~70% smaller), audit (boolean, default false; dry-runs each panel's query and adds a health status).
Example (full): { "uid": "abc123" }
Example (compact overview): { "uid": "abc123", "compact": true }
Example (audit): { "uid": "abc123", "audit": true }
Returns (full): { uid, title, description?, url, tags, time?, refresh?, panelCount, panels: [{ id, title, type, queries: [{ refId, expr }] }], folderUid, created?, updated? }.
Returns (compact): { uid, title, url, tags, panelCount, panels: [{ id, title, type }] }.
Returns (audit): Same as full, plus each panel gets health: { status: "ok"|"nodata"|"error"|"skipped", error?, sampleValue? }, and the response includes auditSummary: { ok, nodata, error, skipped }. Template-variable datasources ($prometheus, $loki) are resolved, and template variables inside expressions are replaced with wildcards.
Tip: Use audit: true when the user asks "which panels are broken?" or "audit my dashboard"; one call replaces N separate grafana_query calls. Use compact: true for lightweight overview scans. Omit both when you need query details (before an update or share).
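For the "Which panels are broken?" intent, the audit response can be reduced to a short report. This is a minimal sketch. panelsNeedingAttention is a hypothetical helper, and the error-first ordering is a presentation choice rather than tool behaviour. Panels without a health field are ignored.

```typescript
// Trimmed panel shape from grafana_get_dashboard with audit: true.
interface AuditedPanel {
  id: number;
  title: string;
  type: string;
  health?: { status: "ok" | "nodata" | "error" | "skipped"; error?: string; sampleValue?: number };
}

// Lists non-ok panels, errors first, then nodata, then skipped.
function panelsNeedingAttention(panels: AuditedPanel[]): string[] {
  const rank: Record<string, number> = { error: 0, nodata: 1, skipped: 2 };
  return panels
    .filter((p) => p.health !== undefined && p.health.status !== "ok")
    .sort((a, b) => rank[a.health!.status] - rank[b.health!.status])
    .map((p) => `#${p.id} ${p.title} [${p.health!.status}]${p.health!.error ? `: ${p.health!.error}` : ""}`);
}
```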
grafana_search
When: The user mentions a dashboard by name, before creating one (to check for duplicates), or for reporting/audit workflows.
Params: query (required). Optional: tags (array; filter by tags), starred (boolean; only starred dashboards), sort ("alpha-asc"/"alpha-desc"), limit (number, default 100), enrich (boolean, default false; adds updatedAt + panelCount to each result).
Example: { "query": "cost" }
Example with tags: { "query": "", "tags": ["production"] }
Example starred: { "query": "", "starred": true, "limit": 10 }
Example enriched: { "query": "", "enrich": true }
Returns: { count, enriched, dashboards: [{ uid, title, url, tags, folderTitle?, folderUid?, updatedAt?, panelCount? }] }. folderTitle/folderUid are always included when the dashboard is in a folder. updatedAt (ISO 8601) and panelCount are present only when enrich: true, which enables staleness detection and reporting without per-dashboard grafana_get_dashboard calls.
Tip: Use enrich: true for reporting workflows ("which dashboards are stale?", "give me a summary of all dashboards"); skip it for simple lookups. After finding a dashboard, use grafana_get_dashboard to inspect panels, grafana_share_dashboard to render a chart, or grafana_update_dashboard to modify it.
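Staleness detection on enriched results amounts to a cutoff comparison on updatedAt. This is a minimal sketch, and staleDashboards and the 30-day threshold in the test are hypothetical. Results without updatedAt (i.e. not enriched) are skipped rather than treated as stale.

```typescript
// Trimmed result shape from grafana_search with enrich: true.
interface SearchHit {
  uid: string;
  title: string;
  updatedAt?: string; // ISO 8601, present only when enrich: true
  panelCount?: number;
}

// Dashboards not updated within `maxAgeDays`, oldest first.
function staleDashboards(hits: SearchHit[], maxAgeDays: number, now: Date = new Date()): SearchHit[] {
  const cutoff = now.getTime() - maxAgeDays * 86400000;
  return hits
    .filter((h) => h.updatedAt !== undefined && Date.parse(h.updatedAt) < cutoff)
    .sort((a, b) => Date.parse(a.updatedAt!) - Date.parse(b.updatedAt!));
}
```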
grafana_share_dashboard
When: The user says "show me" or "send me" a chart or dashboard.
Params: dashboardUid, panelId (both required). Optional: from (default "now-6h"), to (default "now"), width (default 1000), height (default 500), theme ("light"/"dark", default "dark").
Example: { "dashboardUid": "abc123", "panelId": 2, "from": "now-6h", "to": "now" }
Returns: An image rendered inline (tier 1), a snapshot URL (tier 2), or a deep link (tier 3), so something is always delivered. The response includes deliveryTier ("image" | "snapshot" | "link"), rendererAvailable (boolean; false when the Image Renderer plugin is missing), renderFailureReason (why image rendering failed), and remediation (how to fix it). Tier 3 also includes snapshotFailureReason.
Tip: Use grafana_get_dashboard first to find panel IDs. If rendererAvailable is false, tell the user to install the grafana-image-renderer plugin.
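The three delivery tiers form a simple fallback chain. This is a minimal sketch of that order only. The DeliveryAttempt inputs and both helpers are hypothetical, and the tool's internal checks may differ.

```typescript
type DeliveryTier = "image" | "snapshot" | "link";

// Hypothetical flags describing which delivery mechanisms succeeded.
interface DeliveryAttempt {
  rendererAvailable: boolean;
  renderOk: boolean;
  snapshotOk: boolean;
}

// Documented order: inline image, then snapshot URL, then a deep link,
// which needs nothing beyond the dashboard URL and so always works.
function chooseTier(a: DeliveryAttempt): DeliveryTier {
  if (a.rendererAvailable && a.renderOk) return "image";
  if (a.snapshotOk) return "snapshot";
  return "link";
}

// Suggested user-facing note when the renderer plugin is missing.
function rendererNote(rendererAvailable: boolean): string | undefined {
  return rendererAvailable ? undefined : "Install the grafana-image-renderer plugin to get inline images.";
}
```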
grafana_create_alert
When: The user wants to be notified when a metric crosses a threshold.
Params: title, datasourceUid, expr (PromQL), threshold (all required). Optional: evaluation ("instant"/"rate"/"increase", default "instant"), evaluationWindow (default "5m", used with rate/increase), condition (gt/lt/gte/lte, default gt), for (duration, default 5m), folderUid, labels (e.g., { "severity": "warning" }), annotations (e.g., { "summary": "Cost too high" }), noDataState (NoData/Alerting/OK, default NoData).
IMPORTANT: For counter metrics (_total), always use evaluation: "rate" (per-second rate) or evaluation: "increase" (total change over the window). A raw counter only grows (apart from resets), so it will eventually breach any fixed threshold and stay above it, often from the very first evaluation. Use "instant" (the default) only for gauges.
Example gauge alert: { "title": "High Cost Alert", "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "threshold": 5, "condition": "gt" }
Example rate alert: { "title": "High Error Rate", "datasourceUid": "prom1", "expr": "openclaw_lens_webhook_error_total", "threshold": 0.1, "evaluation": "rate" }
Example increase alert: { "title": "Token Burst", "datasourceUid": "prom1", "expr": "openclaw_lens_tokens_total", "threshold": 10000, "evaluation": "increase", "evaluationWindow": "1h" }
Returns: { uid, title, status: "created", datasourceUid, url, evaluation?: { mode, window, evaluatedExpr }, metricValidation: { valid, error?, sampleValue? }, message }. datasourceUid echoes back which datasource the rule targets, so you can verify it. metricValidation dry-runs the expression before creation: valid: true with a sampleValue confirms data exists, while valid: false with an error warns of typos or missing metrics. The alert is always created regardless, because the metric may not have data yet. When evaluation is "rate" or "increase", validation runs the wrapped expression.
Note: A "Grafana Lens Alerts" folder is auto-created if no folderUid is specified.
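The evaluation modes can be pictured as wrapping expr in the matching PromQL function over evaluationWindow. This is a minimal sketch under an assumption: the exact evaluatedExpr the tool produces is not spelled out here. The wrapping is only valid for a plain selector such as a bare metric name, not for an already-aggregated expression. The counter check is a hypothetical guard that flags the mistake called out in the IMPORTANT note.

```typescript
type Evaluation = "instant" | "rate" | "increase";

// Assumed wrapping: rate/increase over the window; instant leaves expr as-is.
// Only valid when expr is a plain selector such as a bare metric name.
function evaluatedExpr(expr: string, evaluation: Evaluation = "instant", window = "5m"): string {
  switch (evaluation) {
    case "rate":
      return `rate(${expr}[${window}])`;
    case "increase":
      return `increase(${expr}[${window}])`;
    default:
      return expr;
  }
}

// Flags an instant evaluation on something that looks like a counter.
function counterWarning(expr: string, evaluation: Evaluation): string | undefined {
  if (evaluation === "instant" && /_total\b/.test(expr)) {
    return `"${expr}" looks like a counter; use evaluation "rate" or "increase"`;
  }
  return undefined;
}

console.log(evaluatedExpr("openclaw_lens_tokens_total", "increase", "1h"));
```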
grafana_annotate
When: User deploys, changes config, or wants to mark an event for correlation.
Params: action ("create" default, or "list"). Create params: text (required), tags, dashboardUid, panelId, time (epoch ms or relative like "now-2h", default now), timeEnd (epoch ms or relative). List params: from, to (epoch ms or relative like "now-7d", "now-24h", "now"), tags, limit (default 20).
Time formats: All time params accept epoch ms (e.g., 1700000000000) OR Grafana-style relative strings ("now", "now-1h", "now-7d", "now-30m"). Prefer relative strings: they are simpler and avoid arithmetic errors.
Example create: { "text": "Deployed v2.1.0", "tags": ["deploy", "production"] }
Example create past: { "text": "Incident started", "time": "now-2h", "timeEnd": "now-30m", "tags": ["incident"] }
Example list recent: { "action": "list", "from": "now-7d", "to": "now", "tags": ["deploy"] }
Example list: { "action": "list", "tags": ["deploy"], "limit": 10 }
Returns create: { status: "created", id, message, time, comparisonHint: { beforeWindow: { from, to }, afterWindow: { from, to }, suggestion } }. The comparisonHint provides ready-to-use ISO 8601 time ranges (30-min windows) for before/after comparison via grafana_query, so no manual time math is needed. For region annotations (with timeEnd), afterWindow starts at timeEnd.
Returns list: { annotations: [{ id, text, tags, time, timeEnd?, dashboardUID?, panelId? }] }.
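To show what comparisonHint saves the caller from doing, the TypeScript sketch below resolves the relative time strings listed above and derives 30-minute before/after windows. It assumes the before window ends at the annotation time and the after window starts at timeEnd (or at time for point annotations); only m/h/d units are handled, and resolveTime and comparisonWindows are hypothetical names, not the plugin's code.

```typescript
// Minimal resolver for epoch ms, "now", and "now-<N><m|h|d>".
// Assumption: only these three units are handled in this sketch.
const UNIT_MS: Record<string, number> = { m: 60000, h: 3600000, d: 86400000 };

function resolveTime(t: number | string, nowMs: number): number {
  if (typeof t === "number") return t; // already epoch ms
  if (t === "now") return nowMs;
  const match = /^now-(\d+)([mhd])$/.exec(t);
  if (match === null) throw new Error(`unsupported time string: ${t}`);
  return nowMs - Number(match[1]) * UNIT_MS[match[2]];
}

interface TimeWindow {
  from: string;
  to: string;
}

const WINDOW_MS = 30 * 60000; // 30-minute comparison windows

// Hypothetical approximation of comparisonHint: "before" ends at time,
// "after" starts at timeEnd for region annotations (or at time otherwise).
function comparisonWindows(
  time: number | string,
  timeEnd: number | string | undefined,
  nowMs: number,
): { beforeWindow: TimeWindow; afterWindow: TimeWindow } {
  const start = resolveTime(time, nowMs);
  const end = timeEnd === undefined ? start : resolveTime(timeEnd, nowMs);
  const iso = (ms: number): string => new Date(ms).toISOString();
  return {
    beforeWindow: { from: iso(start - WINDOW_MS), to: iso(start) },
    afterWindow: { from: iso(end), to: iso(end + WINDOW_MS) },
  };
}
```

For the "Incident started" example above, with the clock at 12:00 UTC the before window is 09:30 to 10:00 and the after window is 11:30 to 12:00.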
grafana_check_alerts
When: Prompt context shows "GRAFANA ALERTS", or you need to manage alert rules (list/delete), set up the alert webhook, silence alerts during an investigation, or acknowledge an investigated alert.
Params: action ("list" default, "acknowledge", "list_rules", "delete_rule", "silence", "unsilence", "setup").
List params: None; returns all pending (unacknowledged) alerts. Instances are capped at 5 per alert.
Acknowledge params: alertId (required); marks an alert as investigated.
List rules params: compact (boolean, default false; when true, returns only uid/title/state/condition). Full mode returns all configured alert rules from Grafana with UID, title, condition (PromQL), folder, labels, annotations, AND live evaluation state (normal/firing/pending/nodata/error), health, and lastEvaluation. One call gives the complete alert health picture.
Delete rule params: ruleUid (required); permanently deletes an alert rule. Get UIDs from list_rules.
Silence params: matchers (required; an array of { name, value, isRegex? } taken from the alert's commonLabels), duration (default "2h"), comment (optional).
Unsilence params: silenceId (required); removes a silence so alerts resume notifying.
Setup params: webhookUrl (optional, auto-detected); creates a webhook contact point and a notification policy route in Grafana.
Example list: {}
Example acknowledge: { "action": "acknowledge", "alertId": "alert-1" }
Example list rules: { "action": "list_rules" }
Example list rules compact: { "action": "list_rules", "compact": true }
Example delete rule: { "action": "delete_rule", "ruleUid": "abc123-def456" }
Example silence: { "action": "silence", "matchers": [{ "name": "alertname", "value": "HighCost" }], "duration": "2h", "comment": "Investigating cost spike" }
Example unsilence: { "action": "unsilence", "silenceId": "silence-uuid-123" }
Example setup: { "action": "setup" }
Returns list: { status: "success", alertCount, alerts: [{ id, status, title, message, receivedAt, commonLabels, totalInstances, truncated?, suggestedInvestigation?: { datasourceUid, condition, tool, queryLanguage, hint }, instances: [{ status, labels, annotations, startsAt, values }] }] }. suggestedInvestigation is auto-enriched by matching the alert to its rule: it provides the PromQL/LogQL expression, datasource, and tool to use for immediate investigation, which removes the need for separate list_rules and explore_datasources calls.
Returns acknowledge: { status: "acknowledged", alertId }.
Returns list_rules: { status: "success", ruleCount, rules: [{ uid, title, folder, ruleGroup, state, health, lastEvaluation, for, labels, annotations, condition, updated }] }. state is the live evaluation state: "normal" (not firing), "firing", "pending" (within the for duration), "nodata", or "error". It falls back to "unknown" if the eval state API is unavailable. health is "ok", "nodata", "error", or "unknown". condition is the PromQL expression extracted from the rule's data queries.
Returns list_rules (compact): { status: "success", ruleCount, rules: [{ uid, title, state, condition }] }. Minimal fields for multi-tool chains; use it when you need a quick overview of all rules without details.
Returns delete_rule: { status: "deleted", ruleUid, message }.
Returns silence: { status: "silenced", silenceId, duration, matchers, message }.
Returns unsilence: { status: "unsilenced", silenceId, message }.
Returns setup: { status: "created", contactPointUid, webhookUrl } or { status: "already_exists", contactPointUid }.
Note: Setup is idempotent, so it is safe to call multiple times. Only alerts with the managed_by=openclaw label route to the webhook (grafana_create_alert adds this label automatically). Use list_rules → delete_rule for full alert lifecycle management (create via grafana_create_alert, list/delete via grafana_check_alerts).
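Silence matchers are meant to come from an alert's commonLabels. The TypeScript sketch below builds a "silence" request from chosen label keys, using exact-match matchers only and skipping keys the alert does not carry; silenceFromLabels is a hypothetical helper, and refusing an empty matcher list is a design choice of this sketch rather than documented tool behavior.

```typescript
// Matcher shape documented for the "silence" action.
interface Matcher {
  name: string;
  value: string;
  isRegex?: boolean;
}

interface SilenceParams {
  action: "silence";
  matchers: Matcher[];
  duration: string;
  comment?: string;
}

// Hypothetical helper: builds exact-match matchers from chosen commonLabels keys.
// Keys absent from the labels are skipped rather than guessed.
function silenceFromLabels(
  commonLabels: Record<string, string>,
  keys: string[],
  duration = "2h",
  comment?: string,
): SilenceParams {
  const matchers: Matcher[] = keys
    .filter((k) => Object.prototype.hasOwnProperty.call(commonLabels, k))
    .map((k) => ({ name: k, value: commonLabels[k] }));
  if (matchers.length === 0) {
    // A silence without matchers would not target a specific alert, so fail early.
    throw new Error("none of the requested labels are present on the alert");
  }
  const params: SilenceParams = { action: "silence", matchers, duration };
  if (comment !== undefined) params.comment = comment;
  return params;
}
```

Keep the silenceId from the "silenced" response; it is the value unsilence needs later.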
grafana_push_metrics
When: User wants to track custom data (calendar events, git commits, fitness stats, financial data) in Grafana.
Params: action ("push" default, "register", "list", "delete").
Push params: metrics (required array); each entry is { name, value, labels?, type?, help?, timestamp? }. Names are automatically prefixed with openclaw_ext_. timestamp is an optional ISO 8601 string for historical data (gauges only).
Register params: name (required), type ("gauge"/"counter", default "gauge"), help, labelNames (array), ttlDays.
List params: None; returns all custom metric definitions.
Delete params: name (required); removes a custom metric.
Example push: { "metrics": [{ "name": "steps_today", "value": 8000 }, { "name": "meetings", "value": 3, "labels": { "type": "standup" } }] }
Example backfill: { "metrics": [{ "name": "steps", "value": 8000, "timestamp": "2025-01-15" }, { "name": "steps", "value": 10500, "timestamp": "2025-01-16" }] }
Example mixed: { "metrics": [{ "name": "steps", "value": 9000, "timestamp": "2025-01-17" }, { "name": "heart_rate", "value": 72 }] }
Example register: { "action": "register", "name": "weight_kg", "type": "gauge", "help": "Body weight", "labelNames": ["person"], "ttlDays": 90 }
Example list: { "action": "list" }
Example delete: { "action": "delete", "name": "old_metric" }
Returns push: { status: "ok", accepted: 2, queryNames: { "openclaw_ext_steps": "openclaw_ext_steps", "openclaw_ext_events": "openclaw_ext_events_total" }, suggestedWorkflow: [{ tool, action, example }], message: "..." }. suggestedWorkflow contains concrete next-step examples using the actual pushed metric names: verify (grafana_query), visualize (grafana_create_dashboard with the metric-explorer template), and alert (grafana_create_alert, single-metric only). Partial success is supported. Timestamped and real-time points in the same batch are both accepted.
Returns register: { status: "registered", metric: { name, type, help, labelNames, ttlMs }, queryName: "openclaw_ext_events_total", suggestedWorkflow: [{ tool, action, example }] }. suggestedWorkflow shows how to push data to and query the registered metric (with rate() wrapping for counters).
Returns list: { count, metrics: [{ name, type, queryName, help, labelNames, createdAt, updatedAt }] }.
Returns delete: { status: "deleted", name }.
Note: Push auto-registers unknown metrics. The response includes queryNames with the exact PromQL names and suggestedWorkflow with concrete next steps; follow suggestedWorkflow to complete the push→visualize pipeline. Timestamped pushes are gauge-only: counters with timestamps are rejected. See external-data.md for naming conventions and backfill patterns.
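The queryNames example above implies a naming convention: stored names get the openclaw_ext_ prefix, and counters are queried with a _total suffix. The TypeScript sketch below reproduces that convention and pre-checks a batch against the rule that timestamped counter points are rejected. It assumes an already-prefixed name is not prefixed twice, that a point without type is a gauge, and that an unparseable timestamp should also be held back; all function names are hypothetical.

```typescript
type MetricType = "gauge" | "counter";

// Push entry shape documented above.
interface MetricPoint {
  name: string;
  value: number;
  labels?: Record<string, string>;
  type?: MetricType;
  help?: string;
  timestamp?: string; // ISO 8601, gauges only
}

const PREFIX = "openclaw_ext_";

// Assumption: a name that already carries the prefix is not prefixed again.
function storedName(name: string): string {
  return name.startsWith(PREFIX) ? name : PREFIX + name;
}

// Counters are queried with a _total suffix; gauges use the stored name as-is.
function queryName(name: string, type: MetricType = "gauge"): string {
  const base = storedName(name);
  return type === "counter" && !base.endsWith("_total") ? `${base}_total` : base;
}

// Hypothetical client-side pre-check before calling the push action.
function splitBatch(points: MetricPoint[]): { ok: MetricPoint[]; rejected: MetricPoint[] } {
  const ok: MetricPoint[] = [];
  const rejected: MetricPoint[] = [];
  for (const p of points) {
    const isCounter = (p.type ?? "gauge") === "counter"; // assumed default: gauge
    const hasTimestamp = p.timestamp !== undefined;
    const badTimestamp = hasTimestamp && Number.isNaN(Date.parse(p.timestamp as string));
    if ((isCounter && hasTimestamp) || badTimestamp) rejected.push(p);
    else ok.push(p);
  }
  return { ok, rejected };
}
```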
grafana_explain_metric
When: User asks "what does this metric mean?", "why did it spike?", "is this normal?", or "show me the trend".
Params: datasourceUid (required), expr (PromQL or plain metric name, required), period (24h/7d/30d, default 24h), compareWith ("previous": compare the current period with the same-length window immediately before it).
Example: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd" }
Example counter: { "datasourceUid": "prom1", "expr": "openclaw_lens_tokens_total" }
Example 7d: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "period": "7d" }
Example comparison: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "period": "7d", "compareWith": "previous" }
Example PromQL: { "datasourceUid": "prom1", "expr": "rate(http_requests_total[5m])", "period": "24h" }
Returns: { metricType?, trendQuery?, current: { value, timestamp }, healthContext?: { status, thresholds, description, direction }, trend: { changePercent, direction, first, last }, stats: { min, max, avg, samples }, comparison?: { previousPeriod: { from, to, avg, min, max, samples }, change: { absolute, percentage, direction } }, metadata: { type, help, unit }, suggestedQueries?: [{ query, description }], suggestedBreakdowns?: string[] }. Sections are omitted when data is unavailable. changePercent is null when the first value is zero. healthContext is included for well-known openclaw_lens_ gauge metrics, as in grafana_query.
Counter-aware: Auto-detects counter metrics (via the metadata type or the _total suffix) and wraps the trend query in rate(expr[5m]). The current value stays raw (the cumulative total), but trend and stats show the rate of change. The metricType field reports the detected type (counter/gauge/histogram). trendQuery shows the actual PromQL used for the trend (present only when it differs from expr).
Drill-down: For multi-dimensional metrics (metrics with labels like model, token_type, provider), the response includes suggestedQueries: ready-to-use PromQL queries for grafana_query that break the metric down by each label. Counter metrics get rate() wrapping automatically. Use these to investigate cost attribution, identify top contributors, or decompose aggregates.
Breakdowns: suggestedBreakdowns provides label names for decomposition. It is always available for known OpenClaw metrics (cost, session, queue, webhook families), even when the metric has no data yet. For unknown metrics, it falls back to labels discovered from the instant query. Use these labels with grafana_query to build sum by (label) (...) queries for root-cause analysis.
Period comparison: Use compareWith: "previous" for period-over-period analysis (e.g., this week vs. last week). It returns a comparison object with the previous period's stats and the change (absolute, percentage, direction). Works with counters too (compares rates). This replaces manual multi-query workflows.
Tip: For simple trend context, call with just period. For "did things improve?" questions, add compareWith: "previous". Metadata is only available for plain metric names (not complex PromQL). There is no need to wrap counters in rate() manually; the tool does it automatically.
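The counter handling, the null changePercent rule, and the sum by (label) breakdowns described above can be sketched in a few lines of TypeScript. This re-creates only the documented behavior under assumptions: the "up"/"down"/"flat" direction values are illustrative, percent change is measured against the absolute first value, and all function names are hypothetical.

```typescript
// A metric counts as a counter if metadata says so or its name ends in _total.
function detectCounter(expr: string, metadataType?: string): boolean {
  return metadataType === "counter" || /_total$/.test(expr.trim());
}

// Trend query: counters are wrapped in rate(expr[5m]); other expressions are unchanged.
function trendQuery(expr: string, metadataType?: string): string {
  return detectCounter(expr, metadataType) ? `rate(${expr}[5m])` : expr;
}

type Direction = "up" | "down" | "flat"; // illustrative values, not the tool's

// changePercent is null when the first value is zero, as documented above.
function trend(first: number, last: number): { changePercent: number | null; direction: Direction } {
  const direction: Direction = last > first ? "up" : last < first ? "down" : "flat";
  const changePercent = first === 0 ? null : ((last - first) / Math.abs(first)) * 100;
  return { changePercent, direction };
}

// Breakdown query for one label taken from suggestedBreakdowns.
function breakdownQuery(expr: string, label: string, metadataType?: string): string {
  return `sum by (${label}) (${trendQuery(expr, metadataType)})`;
}
```

For example, breakdownQuery("openclaw_lens_tokens_total", "model") yields sum by (model) (rate(openclaw_lens_tokens_total[5m])), a query that can go straight to grafana_query.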
grafana_security_check
When: User asks "am I being attacked?", "security status", "security audit", "security check", or wants a co

Source: ClawHub · Chinese localization: 龙虾技能库


| Template | Tier | Domain | Variables | Use when |
|---|---|---|---|---|
| `llm-command-center` | Tier 1 | System overview | `$prometheus`, `$loki`, `$tempo`, `$provider`, `$model`, `$channel` | Golden signals, session table with click-to-drill-down, cost, cache, live feeds |
| `session-explorer` | Tier 2 | Session debug | `$prometheus`, `$loki`, `$tempo`, `$session` (textbox) | Per-session trace hierarchy, LLM calls, tool calls, conversation flow |
| `cost-intelligence` | Tier 3a | Cost analysis | `$prometheus`, `$loki`, `$provider`, `$model` | Spending trends, model attribution, cache savings, per-session cost table |
| `tool-performance` | Tier 3b | Tool analytics | `$prometheus`, `$loki`, `$tempo`, `$tool` | Tool leaderboard, latency ranking, error rates, tool traces |
| `sre-operations` | Tier 3c | SRE operations | `$prometheus`, `$loki` | Queue health, webhooks, stuck sessions, tool loops |
| `genai-observability` | - | OTel gen_ai standard | `$prometheus`, `$loki`, `$tempo`, `$model`, `$provider` | Industry-standard AI monitoring: token analytics, LLM performance, traces, logs, cache efficiency. Works with any gen_ai data. |
| `node-exporter` | - | System/DevOps | `$datasource`, `$instance` | Server CPU, memory, disk, network |
| `http-service` | - | Web/DevOps | `$datasource`, `$job` | HTTP request rate, errors, latency (RED signals) |
| `metric-explorer` | - | Any domain | `$datasource`, `$metric` | Deep-dive into any single metric selected from a dropdown |
| `multi-kpi` | - | Any domain | `$datasource`, `$metric1`..`$metric4` | 4-metric KPI overview (business, fitness, finance, IoT) |
| `weekly-review` | - | Any domain | `$datasource`, `$metric1`, `$metric2` | Weekly overview of two external metrics with trends, plus a table of all openclaw_ext_ metrics |
