📦 Runbook Automator
v1.0.0
Convert manual incident runbooks into automated, executable playbooks. Parse existing runbooks, generate scripts for each step, add health checks, rollback p...
Runbook Automator
Convert manual runbooks into automated, executable playbooks. Parse existing documentation and generate step-by-step scripts with health checks, decision points, rollback procedures, and notification hooks, so incidents get resolved faster with less human intervention.
Use when: "automate this runbook", "convert runbook to script", "make this playbook executable", "incident automation", "turn this wiki page into a script", or when building on-call automation.
Commands
- convert: Parse Runbook and Generate Automation
Read the input runbook (Markdown, Confluence wiki, Google Doc, plain text) and extract:
  - Title and scope: which incident the runbook addresses
  - Prerequisites: access, tools, and permissions needed
  - Steps: ordered actions (distinguish manual from automatable)
  - Decision points: if/then branches
  - Verification steps: how to confirm each step worked
  - Rollback steps: how to undo if things go wrong
  - Escalation criteria: when to page someone

Step 2: Classify Each Step
For each step in the runbook, classify it as one of:
| Type | Example | Automation |
|---|---|---|
| Command | "Run kubectl rollout restart" | Direct script execution |
| Check | "Verify pods are running" | Script with assertion |
| Decision | "If error rate > 5%, proceed to step 4" | Conditional branch |
| Manual | "Call the database team" | Notification + pause |
| Observation | "Watch the dashboard for 10 minutes" | Timed wait + metric check |

Step 3: Generate Executable Playbook

```bash
#!/usr/bin/env bash
# -E (errtrace) lets the ERR trap fire for failures inside step functions.
set -Eeuo pipefail

# ============================================
# Automated Runbook: [Title]
# Generated from: [source document]
# Last updated: [date]
# ============================================

SLACK_WEBHOOK="${SLACK_WEBHOOK:-}"
PAGERDUTY_KEY="${PAGERDUTY_KEY:-}"
DRY_RUN="${DRY_RUN:-false}"

# Timestamped log line.
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }

# Log a message and, if a webhook is configured, post it to Slack.
notify() {
  log "NOTIFY: $1"
  if [[ -n "$SLACK_WEBHOOK" ]]; then
    curl -s -X POST "$SLACK_WEBHOOK" -H 'Content-Type: application/json' \
      -d "{\"text\": \"🔧 Runbook: $1\"}" > /dev/null
  fi
}

# Return non-zero (rather than exit) so the ERR trap below runs the rollback.
fail() { notify "❌ FAILED at step $1: $2"; return 1; }

# --- Step 1: [Name] ---
step_1() {
  log "Step 1: [description]"
  if [[ "$DRY_RUN" == "true" ]]; then
    log "DRY RUN: would execute [command]"
    return 0
  fi
  # [actual command]
  [command] || fail 1 "[error description]"
  # Verify
  [verification command] || fail 1 "Verification failed"
  log "Step 1: ✅ Complete"
}

# --- Step 2: [Decision Point] ---
step_2() {
  log "Step 2: Checking [condition]"
  local metric
  metric=$([check command])
  if (( $(echo "$metric > 5" | bc -l) )); then
    log "Threshold exceeded ($metric > 5): escalating"
    step_2a  # escalation path
  else
    log "Within bounds ($metric <= 5): continuing"
    step_3
  fi
}

# --- Rollback ---
rollback() {
  trap - ERR  # avoid re-entering the trap if a rollback command fails
  notify "🔄 Rolling back..."
  log "ROLLBACK: [undo commands]"
  [rollback command 1]
  [rollback command 2]
  notify "Rollback complete"
}

trap 'rollback' ERR

# --- Execute ---
notify "Starting runbook: [Title]"
step_1
step_2
# ... remaining steps
notify "✅ Runbook complete"
```
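The generated playbook covers Command, Check, and Decision steps. The two remaining types from the classification table, Manual (notification + pause) and Observation (timed wait + metric check), might be handled along the following lines. This is a sketch under assumptions: the helper names `manual_step` and `observe_metric`, their arguments, and the idea of passing the metric source as a command are all hypothetical, and `log`/`notify` here are simplified stand-ins for the playbook's own functions.

```bash
#!/usr/bin/env bash
# Sketch only: helper names, arguments, and thresholds are assumptions.

log()    { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
notify() { log "NOTIFY: $*"; }  # stand-in for the playbook's Slack notifier

# Manual step: notify a human, then pause until they confirm.
# Under DRY_RUN=true the pause is skipped so a dry run never blocks.
manual_step() {
  local instruction="$1"
  notify "Manual action required: $instruction"
  if [[ "${DRY_RUN:-false}" == "true" ]]; then
    log "DRY RUN: would wait for operator confirmation"
    return 0
  fi
  local answer
  read -r -p "Type 'done' when complete: " answer
  [[ "$answer" == "done" ]]
}

# Observation step: sample a metric for a fixed duration and fail as soon
# as one sample exceeds the threshold.
# Usage: observe_metric <duration_s> <interval_s> <threshold> <command...>
observe_metric() {
  local duration="$1" interval="$2" threshold="$3"
  shift 3
  local elapsed=0 value
  while (( elapsed < duration )); do
    value="$("$@")" || return 1
    # awk handles the floating-point comparison; exit 0 means "exceeded".
    if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v > t) }'; then
      log "Metric $value exceeded threshold $threshold after ${elapsed}s"
      return 1
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  log "Metric stayed at or below $threshold for ${duration}s"
}
```

For the table's "Watch the dashboard for 10 minutes" example, a call could look like `observe_metric 600 30 5 get_error_rate`, where `get_error_rate` is a placeholder for whatever command prints the current metric.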
- analyze: Audit Existing Runbooks for Gaps
Read all runbooks in a directory and flag gaps. First, find runbook-like documents:

```bash
# Find runbook-like documents (null-delimited so paths with spaces survive)
find . -maxdepth 3 -type f \( -name "*.md" -o -name "*.txt" -o -name "*.adoc" \) -print0 | \
  xargs -0 grep -liE "runbook|playbook|incident|on-call|troubleshoot" 2>/dev/null
```

For each runbook, check:
  - Missing rollback steps: what happens if step 3 fails?
  - No verification: steps that say "do X" but never check whether X worked
  - Stale commands: references to deprecated tools, old hostnames, removed services
  - Missing decision criteria: "if it's bad, escalate" (how bad? what metric?)
  - No estimated time: SLA-critical runbooks need time bounds per step
  - Missing prerequisites: assumed access or tools not listed
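
Some of these checks can be roughly approximated with keyword searches. The sketch below assumes plain-text or Markdown runbooks; the function name `audit_runbook` and the keyword lists are hypothetical and would need tuning. Keyword matching cannot detect stale commands or vague decision criteria, which still need a human or a deeper analysis.

```bash
# Hypothetical helper: prints one line per gap that a keyword search can detect.
audit_runbook() {
  local file="$1"
  grep -qiE 'roll ?back|revert|undo' "$file" \
    || echo "$file: missing rollback steps"
  grep -qiE 'verify|confirm|check that' "$file" \
    || echo "$file: no verification steps"
  grep -qiE 'prerequisite|requires|access' "$file" \
    || echo "$file: missing prerequisites"
  # Looks for a number followed by a time unit, e.g. "10 minutes".
  grep -qiE '[0-9]+ ?(min|sec|hour)' "$file" \
    || echo "$file: no estimated time"
  return 0
}
```

The function could be fed from the `find` pipeline above, for example by looping over the matching file names and calling `audit_runbook` on each.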
Output a coverage report:
# Runbook Audit Report
| Runbook | Steps | Automated | Rollback | Verified | Gaps |
|---|---|---|---|---|---|
| DB Failover | 8 | 3/8 (38%) | ✅ | 5/8 | Stale hostname in step 4 |
| API Scale-Up | 5 | 5/5 (100%) | ❌ Missing | 4/5 | No rollback procedure |
| Cache Flush | 3 | 2/3 (67%) | ✅ | 3/3 | Step 2 references removed tool |
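
The "Automated" column pairs a fraction with a rounded percentage. A minimal sketch of how such rows might be produced follows; the function names `coverage` and `report_row` are hypothetical, and rounding half up is an assumption that happens to match the figures in the sample report.

```bash
# Hypothetical helper: formats "automated/total (pct%)" using integer math.
coverage() {
  local automated="$1" total="$2"
  if (( total == 0 )); then
    echo "0/0 (n/a)"
    return 0
  fi
  # Adding total/2 before dividing rounds to the nearest whole percent.
  local pct=$(( (100 * automated + total / 2) / total ))
  echo "${automated}/${total} (${pct}%)"
}

# Hypothetical helper: emits one Markdown table row of the audit report.
# Usage: report_row <name> <steps> <automated> <rollback> <verified> <gaps>
report_row() {
  local name="$1" steps="$2" automated="$3" rollback="$4" verified="$5" gaps="$6"
  printf '| %s | %s | %s | %s | %s/%s | %s |\n' \
    "$name" "$steps" "$(coverage "$automated" "$steps")" \
    "$rollback" "$verified" "$steps" "$gaps"
}
```

For example, `coverage 3 8` prints `3/8 (38%)`, and `report_row "DB Failover" 8 3 "✅" 5 "Stale hostname in step 4"` reproduces the first row of the sample report.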
- test: Dry-Run a Generated Playbook
Execute the generated script with DRY_RUN=true:
  - Verify all commands exist in PATH
  - Check prerequisite access (can reach hosts, have credentials)
  - Verify notification hooks work (send a test message)
  - Estimate execution time based on sleep/wait steps
  - Flag any steps that would require manual intervention
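
Two of these checks, command availability and the execution-time estimate, lend themselves to small helpers. The following is a sketch under assumptions: `require_commands` and `estimate_wait_seconds` are hypothetical names, and the estimate only counts literal `sleep N` calls with whole-second arguments, so it is a lower bound rather than a real runtime prediction.

```bash
# Hypothetical preflight: report every required command missing from PATH.
# Returns non-zero if at least one command is missing.
require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical estimator: sums the literal "sleep N" durations in a script.
estimate_wait_seconds() {
  local script="$1"
  # "|| true" keeps the pipeline alive when the script contains no sleeps.
  { grep -oE '(^|[[:space:];])sleep[[:space:]]+[0-9]+' "$script" || true; } \
    | awk '{ total += $NF } END { print total + 0 }'
}
```

A dry run could start with something like `require_commands kubectl curl bc || exit 1`, where the command list is whatever the generated playbook actually invokes.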
- template: Generate Runbook Template
Given an incident type (database, network, application, security), generate a structured template with:
  - Standard sections (scope, impact, prerequisites, steps, rollback, escalation)
  - Common steps for that incident type pre-filled
  - Placeholder verification commands
  - Notification hooks
  - Post-incident review checklist
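
As an illustration of the skeleton such a command might emit, here is a minimal sketch. The function name `generate_template`, the Markdown layout, and the bracketed placeholders are assumptions; pre-filling common steps per incident type is deliberately left out, since those steps depend on the environment.

```bash
# Hypothetical generator: prints a runbook skeleton for one incident type.
generate_template() {
  local incident_type="$1"
  case "$incident_type" in
    database|network|application|security) ;;
    *) echo "unknown incident type: $incident_type" >&2; return 1 ;;
  esac
  cat <<EOF
# Runbook: [Title] ($incident_type incident)

## Scope
## Impact
## Prerequisites
## Steps
1. [action]
   Verify: [verification command]
   Notify: [notification hook]
## Rollback
## Escalation
## Post-Incident Review Checklist
- [ ] [review item]
EOF
}
```

Redirecting the output, for example `generate_template database > db-incident.md`, would give a file that can then be filled in and passed back to `convert`.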