Services Watchdog
v1.0.0

Set up a systemd-based watchdog that keeps long-running Node.js services (Telegram bots, Express dashboards, etc.) alive across shell exits, ssh disconnects, agent-runtime restarts, and server reboots. Use whenever the user reports "the bot died again", "service is down after restart", or after any event that may have killed child processes. Provides a 2-minute user-systemd timer that detects and auto-restarts services with their `.env` correctly loaded.
Problem
Long-running Node services launched from a parent shell (or as children of an agent runtime) die when the parent exits. Runtime restarts are especially aggressive: they tend to take down everything they spawned as collateral damage. Manual nohup/setsid rituals survive an ssh disconnect but not a reboot.
Architecture

    my-watchdog.timer      (systemd --user; OnUnitActiveSec=2min)
        ↓
    my-watchdog.service    (Type=oneshot; KillMode=process)
        ↓
    services-watchdog.sh
        ↓
    for each service: check → if down → systemd-run --user --scope → exec node

Two non-obvious details make this actually work:
1. KillMode=process + systemd-run --user --scope. Without this, systemd kills the children of a Type=oneshot service as soon as the service exits. The combination puts each restarted service in its own transient scope, outside the watchdog's cgroup.
2. `.env` is loaded INSIDE the new scope. The watchdog wraps the start command in `bash -c 'cd && set -a && . ./.env; set +a && exec node '`. This propagates every env var without the watchdog having to know which ones the service needs (TELEGRAM_BOT_TOKEN, OPENAI_API_KEY, …).

Files

- scripts/services-watchdog.sh: the script. Customize the per-service `check_` / `restart_` blocks.
- scripts/sahi-watchdog.service: systemd unit template.
- scripts/sahi-watchdog.timer: runs every 2 minutes.

Rename the unit files to match your own prefix (e.g. mybot-watchdog.) when adopting.
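To make the `.env` handling from the second detail above concrete, here is a minimal sketch of the `set -a` pattern. It is an illustration, not the watchdog's actual code: `DEMO_DIR`, `DEMO_TOKEN`, `DEMO_PORT`, and `load_env_and_exec` are hypothetical names, and `env` stands in for `node` so the result can be inspected.

```shell
# Sketch: `set -a` marks every variable assigned while it is active for export,
# so anything sourced from .env reaches the exec'd child.
# DEMO_DIR, DEMO_TOKEN and DEMO_PORT are hypothetical and exist only for this demo.
DEMO_DIR="$(mktemp -d)"
printf 'DEMO_TOKEN=abc123\nDEMO_PORT=4321\n' > "$DEMO_DIR/.env"

# Same shape as the watchdog's start command, with `env` replacing `node`.
load_env_and_exec() {
  bash -c 'cd "$1" && set -a && [ -f .env ] && . ./.env; set +a && exec env' _ "$1"
}

# Show only the variables that came from the demo .env file.
load_env_and_exec "$DEMO_DIR" | grep '^DEMO_'
```

Because the variables are exported rather than passed one by one, adding a new key to `.env` needs no change to the wrapper.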
Install

    WORKSPACE="$HOME/.openclaw/workspace"   # or wherever your projects live
    mkdir -p "$WORKSPACE/scripts" "$WORKSPACE/logs" ~/.config/systemd/user

    cp scripts/services-watchdog.sh   "$WORKSPACE/scripts/"
    cp scripts/sahi-watchdog.service  ~/.config/systemd/user/
    cp scripts/sahi-watchdog.timer    ~/.config/systemd/user/
    chmod +x "$WORKSPACE/scripts/services-watchdog.sh"

    systemctl --user daemon-reload
    systemctl --user enable --now sahi-watchdog.timer
    loginctl enable-linger "$USER"   # keeps the timer running when not logged in

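For orientation, the unit pair being copied might look roughly like the files this sketch generates. Only `OnUnitActiveSec=2min`, `Type=oneshot`, and `KillMode=process` come from this document; `UNIT_DIR`, `PREFIX`, the `OnBootSec` value, the `ExecStart` path, and the `WantedBy` target are assumptions, and the shipped templates may differ.

```shell
# Sketch: write a timer/service pair into UNIT_DIR.
# Assumptions (not from the shipped templates): UNIT_DIR, PREFIX, OnBootSec,
# the ExecStart path and WantedBy=timers.target.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"            # use ~/.config/systemd/user for real installs
PREFIX="${PREFIX:-mybot}"
WORKSPACE="${WORKSPACE:-/path/to/workspace}"    # placeholder; set it as in the install step

cat > "$UNIT_DIR/$PREFIX-watchdog.service" <<EOF
[Unit]
Description=Services watchdog (single pass)

[Service]
Type=oneshot
KillMode=process
ExecStart=$WORKSPACE/scripts/services-watchdog.sh
EOF

cat > "$UNIT_DIR/$PREFIX-watchdog.timer" <<EOF
[Unit]
Description=Run the services watchdog every 2 minutes

[Timer]
OnBootSec=1min
OnUnitActiveSec=2min

[Install]
WantedBy=timers.target
EOF

ls "$UNIT_DIR"
```

A timer activates the service with the same base name by default, which is why the two files share a prefix.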
Verify

    # Status after most recent run:
    cat ~/.openclaw/workspace/memory/watchdog-status.json
    # Recent recoveries / failures:
    tail ~/.openclaw/workspace/logs/watchdog.log
    # Schedule:
    systemctl --user list-timers sahi-watchdog.timer --no-pager

End-to-end test (replace 4321 with the port your service listens on):

    PID=$(ss -tlnp 2>/dev/null | awk '/:4321 /{print $NF}' | grep -oP 'pid=\K[0-9]+' | head -1)
    kill "$PID"
    systemctl --user start sahi-watchdog.service   # don't wait 2 min
    ss -tln | grep 4321                            # should be listening again

(Do NOT use pkill -f "myservice/server.js" to kill the test target: your own exec shell often matches the same regex and gets SIGTERM'd.)
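If nothing is listening on the port, the test above hands an empty string to `kill`. A guarded variant could look like the sketch below; `pid_from_ss_line` and `kill_listener_on_port` are hypothetical helper names, not part of the shipped script, and the sample line in the usage note assumes the usual `users:(("node",pid=…,fd=…))` process column of `ss -p`.

```shell
# Sketch: kill whatever listens on a port, but only if a PID was actually found.
# Both function names are hypothetical helpers for this test, not part of the watchdog.

# Print the first pid=NNN value found on one line of `ss -tlnp` output, or nothing.
pid_from_ss_line() {
  local line="$1"
  if [[ "$line" =~ pid=([0-9]+) ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

kill_listener_on_port() {
  local port="$1" line pid
  # Column 4 of `ss -tlnp` is the local address:port; take the first match.
  line="$(ss -tlnp 2>/dev/null | awk -v p=":$port" '$4 ~ p"$" {print; exit}')"
  pid="$(pid_from_ss_line "$line")"
  [ -n "$pid" ] || { echo "nothing listening on port $port" >&2; return 1; }
  kill "$pid"
}
```

Usage: `kill_listener_on_port 4321`, then start the watchdog service by hand as shown above. The parser can be tried on its own with `pid_from_ss_line 'LISTEN 0 511 *:4321 *:* users:(("node",pid=12345,fd=18))'`.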
Adapt to a New Service
In services-watchdog.sh, add three things and append the service name to the SERVICES=() array:
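
    # 1. Health check: put a unique marker for the process between the quotes (see Gotchas).
    check_myservice() {
      pgrep -f "" >/dev/null 2>&1
    }

    # 2. Restart: launch in its own transient scope, load .env inside it, then re-check.
    restart_myservice() {
      cd "$WORKSPACE/projects/myservice" || return 1
      systemd-run --user --scope --quiet --unit="myservice-$(date +%s%N)" \
        --setenv=PATH="$PATH" --setenv=HOME="$HOME" \
        bash -c 'cd '"$WORKSPACE"'/projects/myservice && set -a && [ -f .env ] && . ./.env; set +a && exec nohup node src/index.js >> logs/svc.log 2>&1 < /dev/null' &
      disown 2>/dev/null || true
      sleep 3
      check_myservice
    }

    # 3. Human-readable label.
    labels_myservice="My Service"
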
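The naming convention above suggests a loop that looks up each service's functions by name. The following is a sketch of how such a dispatch might work, not the shipped services-watchdog.sh: the array name `SERVICES`, the stub services `alpha` and `beta`, and `run_watchdog_pass` are all hypothetical.

```shell
# Sketch of a name-based dispatch loop; the real services-watchdog.sh may differ.
# SERVICES, alpha, beta and run_watchdog_pass are hypothetical names for this demo.
SERVICES=(alpha beta)

labels_alpha="Alpha bot"
labels_beta="Beta dashboard"

check_alpha()   { return 0; }              # stub: alpha is always up
restart_alpha() { check_alpha; }
check_beta()    { [ -n "$BETA_UP" ]; }     # stub: beta is up only once BETA_UP is set
restart_beta()  { BETA_UP=1; check_beta; } # stub: "restart" by setting the flag

run_watchdog_pass() {
  local svc label_var label
  for svc in "${SERVICES[@]}"; do
    label_var="labels_$svc"
    label="${!label_var:-$svc}"            # fall back to the raw name if no label is set
    if "check_$svc"; then
      echo "ok: $label"
    elif "restart_$svc"; then
      echo "recovered: $label"
    else
      echo "failed: $label"
    fi
  done
}

run_watchdog_pass
```

With this shape, adding a service means defining the three names and appending one array entry; the loop itself stays untouched.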
Gotchas (Learned the Hard Way)

- Don't use Type=simple for the systemd service. That keeps the watchdog itself alive long after it should have exited, and it re-enters every 2 minutes.
- PATH inside systemd-run --user --scope is minimal. Always pass --setenv=PATH="$PATH" if a child relies on ~/.npm-global/bin or similar, or call binaries by absolute path.
- pgrep -f matches the watchdog shell itself. Use a unique marker (file path) when defining `check_`, e.g. pgrep -f "myservice/src/index", not just pgrep -f "node src/index.js", which can collide with other projects.
- Type=oneshot with default KillMode=control-group kills the children you just spawned. Always set KillMode=process AND launch via systemd-run --user --scope so the new process lives outside the watchdog's cgroup.

See Also

A taskflow or cron skill for one-shot scheduled tasks. The watchdog is for "always-on" services, not periodic jobs.