📦 database-replication-advisor

v1.0.0

Analyze database replication topology, detect lag, and recommend a replication strategy based on CAP tradeoffs.


Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install database-replication-advisor
Mirror (accelerated): npx clawhub@latest install database-replication-advisor --registry https://cn.longxiaskill.com

Skill documentation

Database Replication Advisor

Analyze the health and design of database replication setups. This skill teaches an AI agent to inspect replication lag, evaluate topology choices (single-leader, multi-leader, leaderless), assess failover readiness, and recommend replication strategies grounded in CAP theorem tradeoffs and real operational constraints.

Use when: "check replication lag", "replication health", "failover readiness", "replication topology", "CAP tradeoffs", "design replication", "replica drift", "split-brain risk", "failover drill"

Commands

  • assess -- Check current replication health

Inspect the running replication state, measure lag, detect divergence, and flag risks.

Step 1: Identify the database engine and topology

# PostgreSQL: check whether this node is a primary or a standby
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SELECT pg_is_in_recovery();"

# PostgreSQL: list replication slots and whether a consumer is attached to each
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "
  SELECT slot_name, slot_type, active, restart_lsn
  FROM pg_replication_slots;
"

# MySQL: check replication status on a replica
mysql -h "$REPLICA_HOST" -u "$DB_USER" -p"$DB_PASS" -e "SHOW REPLICA STATUS\G"

# Redis: check replication info
redis-cli -h "$REDIS_HOST" info replication
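The Redis command in Step 1 returns plain `key:value` text. As a minimal sketch of how an agent might turn that text into a role decision, the hypothetical helper `parse_redis_replication_info` below parses it into a dictionary. The sample string is illustrative, not captured from a real server.

```python
def parse_redis_replication_info(text: str) -> dict:
    """Parse `redis-cli info replication` output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


# Illustrative sample only; real output contains more fields.
sample = """# Replication
role:master
connected_slaves:2
master_repl_offset:1048576
"""

info = parse_redis_replication_info(sample)
print(info["role"], info["connected_slaves"])  # prints: master 2
```

A `role` of `master` with a non-zero `connected_slaves` suggests a single-leader topology, but confirming that still requires checking each replica.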

Step 2: Measure replication lag

Lag is the most critical replication health metric. Measure it from both the database internals and application-level probes.

# PostgreSQL: lag in bytes and seconds for each standby
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "
  SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
         pg_wal_lsn_diff(sent_lsn, replay_lsn) AS byte_lag,
         replay_lag
  FROM pg_stat_replication;
"
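To make the `byte_lag` column concrete: a PostgreSQL LSN such as `0/3000060` is a 64-bit WAL position written as two hexadecimal halves, and `pg_wal_lsn_diff` subtracts two such positions. The sketch below reproduces that arithmetic client-side, which can help when only the raw LSN strings are available. The function names are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN like '0/3000060' to its 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff_bytes(newer: str, older: str) -> int:
    """Byte distance between two LSNs (same idea as pg_wal_lsn_diff)."""
    return lsn_to_int(newer) - lsn_to_int(older)


print(lsn_diff_bytes("0/3000060", "0/3000000"))  # prints: 96
```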

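# MySQL: per-worker apply timestamps (compare original commit time with end-apply time to derive lag)
# Note: SOURCE_UUID is exposed by performance_schema.replication_connection_status, not by this table
mysql -h "$REPLICA_HOST" -u "$DB_USER" -p"$DB_PASS" -e "
  SELECT CHANNEL_NAME, WORKER_ID,
         LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP,
         APPLYING_TRANSACTION,
         LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP
  FROM performance_schema.replication_applier_status_by_worker;
"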
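The MySQL query returns timestamps rather than a lag figure. One way to derive lag is to subtract the original commit timestamp from the end-apply timestamp of the last applied transaction. The sketch below assumes the timestamps arrive as `YYYY-MM-DD HH:MM:SS.ffffff` strings and that the primary and replica clocks are synchronized. The helper name is hypothetical.

```python
from datetime import datetime

# Assumed textual format of the performance_schema timestamp columns.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def apply_lag_seconds(original_commit: str, end_apply: str) -> float:
    """Seconds between commit on the source and apply on the replica."""
    committed = datetime.strptime(original_commit, TIMESTAMP_FORMAT)
    applied = datetime.strptime(end_apply, TIMESTAMP_FORMAT)
    # Clock skew can produce a small negative value; clamp it to zero.
    return max((applied - committed).total_seconds(), 0.0)


print(apply_lag_seconds("2024-01-01 12:00:00.000000",
                        "2024-01-01 12:00:02.500000"))  # prints: 2.5
```

This measures the lag of the most recently applied transaction only. An idle replica will show a small value even if it later falls behind.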

# Application-level heartbeat probe (write a timestamp to the primary, read it back from the replica)
python3 -c "
import time, psycopg2

primary = psycopg2.connect(host='$PRIMARY_HOST', dbname='$DB_NAME', user='$DB_USER')
replica = psycopg2.connect(host='$REPLICA_HOST', dbname='$DB_NAME', user='$DB_USER')

# Write heartbeat to primary
with primary.cursor() as cur:
    cur.execute('CREATE TABLE IF NOT EXISTS _repl_heartbeat (id int PRIMARY KEY, ts timestamptz)')
    cur.execute('INSERT INTO _repl_heartbeat VALUES (1, now()) ON CONFLICT (id) DO UPDATE SET ts = now()')
    primary.commit()
    cur.execute('SELECT ts FROM _repl_heartbeat WHERE id = 1')
    write_ts = cur.fetchone()[0]

time.sleep(0.5)

# Read heartbeat from replica
with replica.cursor() as cur:
    cur.execute('SELECT ts FROM _repl_heartbeat WHERE id = 1')
    read_ts = cur.fetchone()[0]

# If the replica has not yet replayed this heartbeat, read_ts is the previous
# heartbeat, so the value reflects heartbeat staleness rather than exact lag.
lag = (write_ts - read_ts).total_seconds() if write_ts > read_ts else 0
print(f'Application-level replication lag: {lag:.3f}s')
print(f'Assessment: {\"HEALTHY\" if lag < 1 else \"WARNING\" if lag < 10 else \"CRITICAL\"}')
"
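The probe's assessment uses two thresholds: under 1 second is healthy, under 10 seconds is a warning, anything else is critical. Pulling that rule into a small function lets the same verdict logic apply to lag measured by any of the methods in this step. The function name and the idea of making the thresholds configurable are assumptions for illustration.

```python
def classify_lag(lag_seconds: float,
                 warn_at: float = 1.0,
                 critical_at: float = 10.0) -> str:
    """Map a lag measurement to the verdicts used by the heartbeat probe."""
    if lag_seconds < warn_at:
        return "HEALTHY"
    if lag_seconds < critical_at:
        return "WARNING"
    return "CRITICAL"


for lag in (0.3, 8.2, 42.0):
    print(f"{lag:>5.1f}s -> {classify_lag(lag)}")
```

Workloads that tolerate more staleness (for example, analytics replicas) may warrant looser thresholds than the defaults shown here.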

Step 3: Check for replication conflicts and errors

# PostgreSQL: check for replication conflicts (queries cancelled on the standby)
psql -h "$REPLICA_HOST" -U "$DB_USER" -d "$DB_NAME" -c "
  SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
         confl_bufferpin, confl_deadlock
  FROM pg_stat_database_conflicts
  WHERE datname = '$DB_NAME';
"

# PostgreSQL: check WAL archiving health on the primary
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "
  SELECT archived_count, failed_count, last_archived_wal,
         last_archived_time, last_failed_time
  FROM pg_stat_archiver;
"

# MySQL: check for replication errors
mysql -h "$REPLICA_HOST" -u "$DB_USER" -p"$DB_PASS" -e "
  SELECT LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
  FROM performance_schema.replication_applier_status_by_worker
  WHERE LAST_ERROR_NUMBER != 0;
"
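The PostgreSQL conflict query returns one counter per conflict type. A single total plus the dominant type is usually easier to report. The sketch below assumes the row has already been fetched into a dictionary keyed by column name. The helper name is hypothetical.

```python
from typing import Optional, Tuple

CONFLICT_COLUMNS = (
    "confl_tablespace",
    "confl_lock",
    "confl_snapshot",
    "confl_bufferpin",
    "confl_deadlock",
)


def summarize_conflicts(row: dict) -> Tuple[int, Optional[str]]:
    """Return (total conflicts, most frequent conflict column or None)."""
    counts = {col: int(row.get(col) or 0) for col in CONFLICT_COLUMNS}
    total = sum(counts.values())
    dominant = max(counts, key=counts.get) if total else None
    return total, dominant


# Illustrative row, not real measurements.
print(summarize_conflicts({"confl_snapshot": 10, "confl_lock": 2}))
```

The dominant column indicates which kind of standby query cancellation is most frequent, which narrows down where to look next.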

Step 4: Evaluate network and disk bottlenecks

# PostgreSQL: total WAL generated on the primary and WAL pending replay per standby
# (run twice and compare total_wal_gb to derive a generation rate)
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "
  SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') / (1024*1024*1024) AS total_wal_gb,
         pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) / (1024*1024) AS pending_mb
  FROM pg_stat_replication;
"

# Check disk I/O on the replica
iostat -x 1 3 | tail -20

# Check network latency between primary and replica
ping -c 5 "$REPLICA_HOST" | tail -1
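The last line of `ping` output is a summary such as `rtt min/avg/max/mdev = ...` on Linux or `round-trip min/avg/max/stddev = ...` on macOS and BSD. A minimal sketch for extracting the numbers so they can go into a report is shown below. The helper name and the returned keys are hypothetical, and other platforms may format the line differently.

```python
import re
from typing import Optional

# Matches the Linux (rtt ... mdev) and BSD/macOS (round-trip ... stddev) forms.
RTT_PATTERN = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(line: str) -> Optional[dict]:
    """Extract round-trip statistics in milliseconds, or None if absent."""
    match = RTT_PATTERN.search(line)
    if match is None:
        return None
    minimum, average, maximum, deviation = map(float, match.groups())
    return {"min_ms": minimum, "avg_ms": average,
            "max_ms": maximum, "dev_ms": deviation}


# Illustrative summary line, not a real measurement.
print(parse_ping_summary("rtt min/avg/max/mdev = 0.412/0.530/0.701/0.102 ms"))
```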

Report template

Replication Health Assessment

Date: YYYY-MM-DD
Engine: PostgreSQL 16 / MySQL 8 / Redis 7
Topology: Single-leader with 2 async standbys

Replication Status

| Replica   | Status    | Byte Lag | Time Lag | Conflicts | Verdict |
| replica-1 | streaming | 1.2 MB   | 0.3s     | 0         | HEALTHY |
| replica-2 | streaming | 45 MB    | 8.2s     | 12        | WARNING |
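The report's data-loss window (RPO) is taken from the worst replica lag, so it can be derived directly from the table rows. The sketch below uses the two example rows from the template. The field names and the helper are hypothetical.

```python
def worst_lag(rows: list) -> dict:
    """Return the row with the largest time lag (the RPO-defining replica)."""
    return max(rows, key=lambda row: row["time_lag_s"])


# Example rows mirroring the report template above.
rows = [
    {"replica": "replica-1", "time_lag_s": 0.3},
    {"replica": "replica-2", "time_lag_s": 8.2},
]

worst = worst_lag(rows)
print(f"RPO ~{worst['time_lag_s']:.0f}s ({worst['replica']})")  # prints: RPO ~8s (replica-2)
```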

Risk Assessment

  • Data loss window (RPO): ~8s (worst replica lag)
  • Failover time estimate (RTO):
Data source: ClawHub · Chinese localization: 龙虾技能库