
At 1:40 that morning I was standing on the 7th floor of the Kwai Chung data center in Hong Kong, the air conditioning flapping my sleeves. On a video call, the cross-border team was pushing me to run the final switchover drill: migrating the block-indexing workload from a temporary database onto the new highly available PostgreSQL cluster. With the UPS beeping and the NVMe LEDs blinking in the dark, I felt calm: I had spent three weeks hammering on this architecture in a sandbox. Now it was time to run it for real.
1. Goals and Constraints (as seen on site)
Business goals
- Writes: sustained 3–5k rows/s (blocks, transactions and logs landing in separate partitioned tables), with peaks of 10k rows/s tolerable
- Queries: P95 < 80 ms for block and transaction lookups over the last 24 h
- RPO = 0 (no data loss), RTO < 30 s (automatic primary/standby failover)
Site constraints
- The OS must be CentOS 7 (to match the customer's existing estate)
- The room provides only two ToR switches; L2 connectivity is available across racks
- Backups must land on on-campus object storage (our self-hosted MinIO), with an additional off-site copy every night
- Maintenance windows are short; prolonged downtime is not acceptable
2. Topology and Actual Hardware (what I ran, not hand-waving)
2.1 Logical topology
Layout:
+------------------------+ +------------------------+
| LB01 | <VRRP> | LB02 |
| keepalived + HAProxy | | keepalived + HAProxy |
| pgbouncer (pool) | | pgbouncer (pool) |
+-----------+------------+ +-----------+------------+
| VIP: 10.10.10.100 |
+---------------------+------------+
|
+--------------+-------------------+
|          PostgreSQL cluster          |
| (repmgr primary/standby + sync replication) |
+--------------+-------------------+
|
+------------------+------------------+------------------+
| | | |
pg01 (primary)      pg02 (sync standby)      pg03 (async standby)      minio (backups)
2.2 Physical specs (real, purchasable hardware)
| Role | Model/CPU | RAM | OS disks | Data disks | Network | Notes |
|---|---|---|---|---|---|---|
| pg01/pg02/pg03 | AMD EPYC 7313P (16C/32T) | 128 GB | 2×480 GB SATA SSD (RAID1, /) | 2×3.84 TB U.2 NVMe (RAID1, /pgdata) + 1×960 GB NVMe (dedicated WAL disk, /pgwal) | 2×10 GbE (bond0, LACP) | CentOS 7.9 |
| LB01/LB02 | Xeon Silver 4210 | 32 GB | 240 GB SSD | - | 2×10 GbE | Runs HAProxy + pgbouncer + keepalived |
| MinIO | any x86_64 | 64 GB | 4×8 TB HDD (RAID10) | 512 GB SSD (metadata) | 10 GbE | S3-compatible object storage |
Notes: a dedicated NVMe for WAL noticeably steadies write latency; RAID1 on the data disks makes read-only recovery and single-disk replacement straightforward. The network bonds the two 10G ports with LACP to avoid single-link flapping. (The RAID array and bond are created first; see the sketch below.)
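A minimal sketch of how the /dev/md/pgdata array and bond0 referenced later could be built. Device and interface names (nvme0n1/nvme2n1, em1/em2) and the address are assumptions for this environment; check lsblk and nmcli dev before copying.
# RAID1 over the two 3.84TB U.2 NVMe drives, exposed as /dev/md/pgdata (device names are assumptions)
mdadm --create /dev/md/pgdata --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme2n1
mdadm --detail --scan >> /etc/mdadm.conf          # persist the array definition across reboots
# LACP bond over the two 10GbE ports (interface names are assumptions)
nmcli con add type bond ifname bond0 con-name bond0 bond.options "mode=802.3ad,miimon=100,lacp_rate=fast"
nmcli con add type bond-slave ifname em1 master bond0
nmcli con add type bond-slave ifname em2 master bond0
nmcli con mod bond0 ipv4.method manual ipv4.addresses 10.10.10.11/24   # placeholder per-node address
nmcli con up bond0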
3. OS Initialization and Kernel/IO Tuning (CentOS 7)
3.1 Basics
# base packages & EPEL
yum install -y epel-release
yum install -y htop iotop numactl jq chrony lvm2 mdadm \
net-tools policycoreutils-python firewalld
systemctl enable --now chronyd firewalld
timedatectl set-timezone Asia/Hong_Kong
3.2 Disable THP (a must-do for any database)
cat >/etc/systemd/system/disable-thp.service <<'EOF'
[Unit]
Description=Disable Transparent Huge Pages
After=network.target
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/bash -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'
[Install]
WantedBy=multi-user.target
EOF
systemctl enable --now disable-thp
3.3 I/O scheduler (none for NVMe)
echo 'ACTION=="add|change", KERNEL=="nvme*n*", ATTR{queue/scheduler}="none"' \
> /etc/udev/rules.d/60-nvme-io.rules
udevadm control --reload-rules && udevadm trigger
3.4 Filesystem (XFS; "disabling CoW" is a Btrfs concept — for XFS, the defaults plus noatime are my recommendation)
mkfs.xfs -f /dev/md/pgdata   # or /dev/nvme0n1, depending on whether you built the mdadm RAID1
mkfs.xfs -f /dev/nvme1n1     # WAL disk
mkdir -p /pgdata /pgwal
echo '/dev/md/pgdata /pgdata xfs noatime,nodiratime,logbufs=8,logbsize=256k 0 0' >> /etc/fstab
echo '/dev/nvme1n1 /pgwal xfs noatime,nodiratime 0 0' >> /etc/fstab
mount -a
3.5 sysctl & limits
cat >> /etc/sysctl.d/99-pg.conf <<EOF
vm.swappiness=1
vm.dirty_background_ratio=5
vm.dirty_ratio=20
vm.max_map_count=262144
fs.file-max=2000000
net.core.somaxconn=1024
net.core.netdev_max_backlog=250000
EOF
sysctl --system
cat >> /etc/security/limits.d/99-pg.conf <<EOF
postgres soft nofile 1024000
postgres hard nofile 1024000
EOF
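Before touching PostgreSQL it is worth confirming the tuning above actually stuck, especially after a reboot; a quick check sketch:
cat /sys/kernel/mm/transparent_hugepage/enabled      # expect: always madvise [never]
cat /sys/block/nvme*n1/queue/scheduler               # expect: [none]
findmnt /pgdata /pgwal -o TARGET,SOURCE,FSTYPE,OPTIONS
sysctl vm.swappiness vm.dirty_background_ratio vm.dirty_ratio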
4. Install PostgreSQL 14 (PGDG) and Move It onto the Dedicated Disks
CentOS 7 is EOL, but PGDG still publishes an EL7 repo. I strongly recommend mirroring the repo packages into an internal cache.
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y postgresql14-server postgresql14-contrib
/usr/pgsql-14/bin/postgresql-14-setup initdb
systemctl stop postgresql-14
mv /var/lib/pgsql/14/data /pgdata/data
# relocate pg_wal onto the dedicated WAL NVMe -- this is what the /pgwal mount is for
mv /pgdata/data/pg_wal /pgwal/pg_wal && ln -s /pgwal/pg_wal /pgdata/data/pg_wal
chown -R postgres:postgres /pgdata /pgwal
mkdir -p /pgwal/archive && chown -R postgres:postgres /pgwal/archive
4.1 postgresql.conf (the "conservatively fast" baseline I have run in production)
/pgdata/data/postgresql.conf
data_directory = '/pgdata/data'
hba_file = '/pgdata/data/pg_hba.conf'
ident_file = '/pgdata/data/pg_ident.conf'
listen_addresses = '*'
port = 5432
max_connections = 300
shared_buffers = 32GB                 # ~1/4 of RAM
effective_cache_size = 96GB           # ~3/4 of RAM
work_mem = 64MB                       # pair with concurrency limits
maintenance_work_mem = 2GB
wal_level = replica                   # switch to logical if you need logical decoding
synchronous_commit = remote_write     # the latency/safety balance the sync standby can tolerate
max_wal_senders = 16
max_replication_slots = 16
wal_compression = on
wal_writer_delay = 20ms
checkpoint_timeout = 15min
max_wal_size = 48GB
min_wal_size = 4GB
hot_standby = on
effective_io_concurrency = 256        # NVMe
random_page_cost = 1.1
seq_page_cost = 1.0
autovacuum = on
autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.02
log_autovacuum_min_duration = 0
log_checkpoints = on
log_lock_waits = on
log_min_duration_statement = 1000
log_line_prefix = '%m [%p] %u@%d %r '
4.2 pg_hba.conf (replication + application subnets)
/pgdata/data/pg_hba.conf
# application traffic (VIP subnet)
host all appuser 10.10.10.0/24 md5
# replication
host replication replicator 10.10.10.0/24 md5
host replication replicator 10.10.20.0/24 md5
4.3 Security policy and firewall
semanage port -a -t postgresql_port_t -p tcp 5432 || true
firewall-cmd --permanent --add-port=5432/tcp
firewall-cmd --reload
systemd: I keep the PostgreSQL service from starting at boot and let repmgr manage it:
systemctl disable postgresql-14
5. Building "Automatic Primary/Standby + Synchronous Replication + Automatic Takeover" with repmgr
Why not Patroni? CentOS 7 ships a Python that is too old — not impossible to work around, just a hassle (I have run Patroni in Docker on another project). This customer wants as few containers as possible, so I went with repmgr: stable and pleasant to operate.
5.1 Installation and in-database users
yum install -y repmgr_14
sudo -iu postgres psql -c "CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'Repl@123';"
sudo -iu postgres psql -c "CREATE USER repmgr SUPERUSER ENCRYPTED PASSWORD 'Mgr@123';"
sudo -iu postgres psql -c "CREATE DATABASE repmgr OWNER repmgr;"
5.2 repmgr.conf on the primary (pg01)
/etc/repmgr/14/repmgr.conf
node_id=1
node_name=pg01
conninfo='host=pg01 dbname=repmgr user=repmgr password=Mgr@123 port=5432'
data_directory='/pgdata/data'
use_replication_slots=yes
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/14/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/14/repmgr.conf --log-to-file'
log_file='/var/log/repmgr/repmgr.log'
Pin synchronous replication (we write this into the primary's postgresql.auto.conf):
sudo -iu postgres psql -c "ALTER SYSTEM SET synchronous_standby_names='FIRST 1 (pg02)';"
sudo -iu postgres psql -c "ALTER SYSTEM SET primary_conninfo='user=replicator password=Repl@123 application_name=pg01';"
Register:
sudo -iu postgres repmgr -f /etc/repmgr/14/repmgr.conf primary register
5.3 Cloning and registering the standbys (pg02/pg03)
/etc/repmgr/14/repmgr.conf (note that node_id/node_name differ per host)
node_id=2 # pg02 -> 2, pg03 -> 3
node_name=pg02
conninfo='host=pg02 dbname=repmgr user=repmgr password=Mgr@123 port=5432'
data_directory='/pgdata/data'
use_replication_slots=yes
failover=automatic
follow_command='repmgr standby follow -f /etc/repmgr/14/repmgr.conf --log-to-file'
log_file='/var/log/repmgr/repmgr.log'
Clone (pg02 as the example):
sudo -iu postgres repmgr -h pg01 -U repmgr -d repmgr -f /etc/repmgr/14/repmgr.conf standby clone --fast-checkpoint
# start
sudo -iu postgres pg_ctl -D /pgdata/data -w start
# register
sudo -iu postgres repmgr -f /etc/repmgr/14/repmgr.conf standby register
Repeat the same steps on pg03 (async standby; it does not go into synchronous_standby_names).
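Before enabling the daemons, a quick way to verify the topology looks the way it should:
sudo -iu postgres repmgr -f /etc/repmgr/14/repmgr.conf cluster show
# on pg01: confirm pg02 is attached and reported as the synchronous standby
sudo -iu postgres psql -xc "SELECT application_name, state, sync_state FROM pg_stat_replication;"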
5.4 Enable the repmgrd daemon and self-healing
systemctl enable --now repmgr14
# on some package versions the unit is named repmgrd.service
In the drill I pulled pg01's network cable: pg02 took over automatically within 30 seconds, HAProxy behind the VIP routed writes to the new primary, and the application noticed nothing and needed no configuration change.
6. Connection Layer: HAProxy + keepalived + pgbouncer
6.1 keepalived (LB01/LB02)
/etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
    weight -20
}
vrrp_instance VI_1 {
    state BACKUP
    interface bond0
    virtual_router_id 51
    priority 110          # LB01=110, LB02=100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 7c1b2d
    }
    virtual_ipaddress {
        10.10.10.100/24 dev bond0
    }
    track_script {
        chk_haproxy
    }
}
systemctl enable --now keepalived
6.2 HAProxy (routes to the primary or the read-only pool)
/etc/haproxy/haproxy.cfg
global
    daemon
    maxconn 100000
    log 127.0.0.1 local0
    stats socket /var/run/haproxy.sock mode 600 level admin   # admin socket, used by the drill in section 13
defaults
    mode tcp
    timeout client 30s
    timeout server 30s
    timeout connect 5s
# writes go straight to the primary
frontend ft_pg_write
    bind *:5432
    default_backend bk_pg_master
backend bk_pg_master
    option tcp-check
    tcp-check connect port 5432
    server pg01 pg01:5432 check inter 1000 rise 2 fall 2
    server pg02 pg02:5432 check inter 1000 rise 2 fall 2 backup
    server pg03 pg03:5432 check inter 1000 rise 2 fall 2 backup
# read-only pool for queries
frontend ft_pg_read
    bind *:15432
    default_backend bk_pg_replicas
backend bk_pg_replicas
    balance roundrobin
    option tcp-check
    server pg02 pg02:5432 check
    server pg03 pg03:5432 check
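One weak spot of the plain tcp-check above: if pg01 later rejoins as a standby, bk_pg_master will happily send writes to it again. A common hardening (not part of the config above, so treat this as an optional sketch) is to expose a small HTTP health endpoint on each PG node that answers 200 only on the primary, and switch the backend to option httpchk with "check port 23267"; the script name and port are assumptions here. Wired up via xinetd or a systemd socket unit, the check script could look like:
#!/bin/bash
# /usr/local/bin/pgprimarycheck.sh -- answer HTTP 200 only if this node is the primary
in_recovery=$(sudo -iu postgres psql -tAc "SELECT pg_is_in_recovery();" 2>/dev/null)
if [ "$in_recovery" = "f" ]; then
    printf "HTTP/1.1 200 OK\r\nContent-Length: 7\r\n\r\nprimary"
else
    printf "HTTP/1.1 503 Service Unavailable\r\nContent-Length: 7\r\n\r\nstandby"
fi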
6.3 pgbouncer (connection pooling; I recommend running it on the LBs)
/etc/pgbouncer/pgbouncer.ini
[databases]
chain = host=127.0.0.1 port=5432 dbname=chain_db pool_size=200 reserve_pool=50
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
server_reset_query = DISCARD ALL
ignore_startup_parameters = extra_float_digits
max_client_conn = 5000
default_pool_size = 200
reserve_pool_size = 50
server_idle_timeout = 60
userlist.txt:
"appuser" "md5xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Firewall:
firewall-cmd --permanent --add-port=6432/tcp && firewall-cmd --reload
7. Backups and Archiving: pgBackRest + MinIO (the key piece of the RPO=0 story)
7.1 Installation and repository configuration
yum install -y pgbackrest
mkdir -p /etc/pgbackrest /var/log/pgbackrest
chown -R postgres:postgres /var/log/pgbackrest
/etc/pgbackrest/pgbackrest.conf (identical on every PG node)
[global]
repo1-type=s3
repo1-s3-endpoint=minio.local
repo1-s3-bucket=pg-backup
repo1-s3-region=us-east-1      # pgBackRest requires a region; MinIO accepts any value
repo1-s3-uri-style=path        # path-style URIs, the usual choice for MinIO
repo1-s3-key=minio-access
repo1-s3-key-secret=minio-secret
repo1-s3-verify-tls=n
repo1-path=/pgbackrest
repo1-retention-full=7
start-fast=y
process-max=8
log-level-console=info
log-level-file=info
[pg14]
pg1-path=/pgdata/data
pg1-port=5432
7.2 Enable WAL archiving
Additions to postgresql.conf:
archive_mode = on
archive_command = 'pgbackrest --stanza=pg14 archive-push %p'
Initialize & run the first full backup:
sudo -iu postgres pgbackrest --stanza=pg14 --log-level-console=info stanza-create
sudo -iu postgres pgbackrest --type=full --stanza=pg14 backup
During the drill I deliberately deleted the primary's data directory and completed a point-in-time recovery (PITR) from the latest full backup plus archived WAL within 10 minutes; the data matched, and I could finally relax.
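For reference, a restore of that shape would look roughly like this (the timestamp is a placeholder; PostgreSQL must be stopped on the node being restored):
sudo -iu postgres pgbackrest --stanza=pg14 --delta \
    --type=time "--target=2024-05-20 01:30:00+08" --target-action=promote restore
sudo -iu postgres pg_ctl -D /pgdata/data -w start   # replays WAL up to the target, then promotes
sudo -iu postgres pgbackrest --stanza=pg14 info     # confirm backup/archive status afterwards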
8. "On-chain Data" Schema and Partitioning Strategy (PostgreSQL 14 native partitioning)
8.1 Base schema
CREATE SCHEMA IF NOT EXISTS chain;
-- partitioned parent table: blocks
CREATE TABLE chain.blocks (
block_number BIGINT NOT NULL,
block_hash BYTEA NOT NULL,
parent_hash BYTEA NOT NULL,
miner BYTEA NULL,
tx_count INT NOT NULL,
ts TIMESTAMPTZ NOT NULL,
CONSTRAINT pk_blocks PRIMARY KEY (block_number)
) PARTITION BY RANGE (block_number);
-- transactions
CREATE TABLE chain.txs (
block_number BIGINT NOT NULL,
tx_index INT NOT NULL,
tx_hash BYTEA NOT NULL,
from_addr BYTEA NOT NULL,
to_addr BYTEA NULL,
value_wei NUMERIC(38,0) NOT NULL,
gas_price_wei NUMERIC(38,0) NOT NULL,
gas_used INT NOT NULL,
status SMALLINT NOT NULL,
input BYTEA NULL,
ts TIMESTAMPTZ NOT NULL,
CONSTRAINT pk_txs PRIMARY KEY (block_number, tx_index)
) PARTITION BY RANGE (block_number);
-- logs (events)
CREATE TABLE chain.logs (
block_number BIGINT NOT NULL,
tx_index INT NOT NULL,
log_index INT NOT NULL,
address BYTEA NOT NULL,
topic0 BYTEA NULL,
topic1 BYTEA NULL,
topic2 BYTEA NULL,
topic3 BYTEA NULL,
data BYTEA NULL,
ts TIMESTAMPTZ NOT NULL,
CONSTRAINT pk_logs PRIMARY KEY (block_number, tx_index, log_index)
) PARTITION BY RANGE (block_number);
Type choices: I store hashes and addresses as BYTEA (insert hex via decode($hex,'hex')), which is both smaller and faster than TEXT. Index the columns you query most.
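A point lookup by transaction hash then wraps the hex literal in decode(), hitting the tx_hash index created in 8.2 (the hash below is a made-up placeholder):
sudo -iu postgres psql -d chain_db -c \
  "SELECT block_number, tx_index, status FROM chain.txs
    WHERE tx_hash = decode('e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'hex');"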
8.2 Partitions and indexes
I partition at a granularity of 1,000,000 blocks per child table (convenient for rolling archival):
DO $$
DECLARE
    start_block BIGINT := 0;        -- adjust to the chain height you start from
    end_block   BIGINT := 20000000;
    step        BIGINT := 1000000;
    t           BIGINT;
BEGIN
    t := start_block;
    WHILE t < end_block LOOP
        EXECUTE format('CREATE TABLE IF NOT EXISTS chain.blocks_%s PARTITION OF chain.blocks FOR VALUES FROM (%s) TO (%s);', t, t, t+step);
        EXECUTE format('CREATE TABLE IF NOT EXISTS chain.txs_%s PARTITION OF chain.txs FOR VALUES FROM (%s) TO (%s);', t, t, t+step);
        EXECUTE format('CREATE TABLE IF NOT EXISTS chain.logs_%s PARTITION OF chain.logs FOR VALUES FROM (%s) TO (%s);', t, t, t+step);
        t := t + step;
    END LOOP;
END $$;
-- common indexes (examples)
CREATE INDEX ON chain.txs (tx_hash);
CREATE INDEX ON chain.txs (from_addr);
CREATE INDEX ON chain.txs (to_addr);
CREATE INDEX ON chain.logs (address);
CREATE INDEX ON chain.blocks (ts);
-- on large tables, BRIN suits monotonically increasing columns (block_number, ts): fast range scans, tiny footprint
CREATE INDEX ON chain.blocks USING brin (block_number);
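Section 11 below mentions catching "future" blocks with a DEFAULT partition instead of pre-creating ranges forever; a minimal sketch of the idea (note that attaching a new range partition later requires first moving the matching rows out of the DEFAULT partition):
sudo -iu postgres psql -d chain_db <<'SQL'
-- catch-all child tables for blocks beyond the pre-created ranges
CREATE TABLE IF NOT EXISTS chain.blocks_default PARTITION OF chain.blocks DEFAULT;
CREATE TABLE IF NOT EXISTS chain.txs_default    PARTITION OF chain.txs    DEFAULT;
CREATE TABLE IF NOT EXISTS chain.logs_default   PARTITION OF chain.logs   DEFAULT;
SQL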
9. Ingest Example: Python + web3 + Streaming COPY-style Writes
In production I run an asynchronous pipeline written in Go; here is the idea in Python, convenient for testing and PoC work.
# ingest_eth.py -- PoC only; the production pipeline is written in Go
import os
import time
import binascii
from datetime import datetime, timezone

import psycopg2
from psycopg2.extras import execute_values
from web3 import Web3

PG_DSN = os.getenv("PG_DSN")   # 'host=10.10.10.100 port=5432 dbname=chain_db user=appuser password=...'
W3 = Web3(Web3.HTTPProvider(os.getenv("ETH_RPC")))  # local geth/erigon or a hosted RPC endpoint

def hx(value):
    """'0x...' hex string or HexBytes -> bytes (None stays None)."""
    if value is None:
        return None
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    return binascii.unhexlify(value[2:] if value.startswith("0x") else value)

def ts_utc(unix_ts):
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc)

def upsert_blocks(blocks):
    with psycopg2.connect(PG_DSN) as conn, conn.cursor() as cur:
        sql = """INSERT INTO chain.blocks (block_number, block_hash, parent_hash, miner, tx_count, ts)
                 VALUES %s ON CONFLICT (block_number) DO NOTHING"""
        vals = [(blk.number, hx(blk.hash), hx(blk.parentHash), hx(blk.miner),
                 len(blk.transactions), ts_utc(blk.timestamp))
                for blk in blocks]
        execute_values(cur, sql, vals, page_size=1000)

def upsert_txs(block, txs):
    with psycopg2.connect(PG_DSN) as conn, conn.cursor() as cur:
        sql = """INSERT INTO chain.txs
                 (block_number, tx_index, tx_hash, from_addr, to_addr, value_wei, gas_price_wei, gas_used, status, input, ts)
                 VALUES %s ON CONFLICT DO NOTHING"""
        vals = []
        for i, tx in enumerate(txs):
            receipt = W3.eth.get_transaction_receipt(tx.hash)
            gas_price = tx.get("gasPrice") or receipt.get("effectiveGasPrice", 0)
            tx_input = hx(tx["input"])
            vals.append((block.number, i, hx(tx.hash),
                         hx(tx["from"]), hx(tx["to"]),
                         tx["value"], gas_price, receipt["gasUsed"],
                         1 if receipt["status"] == 1 else 0,
                         tx_input if tx_input else None,
                         ts_utc(block.timestamp)))
        execute_values(cur, sql, vals, page_size=1000)

def loop():
    head = W3.eth.block_number
    cur = head - 1000
    while True:
        if cur > head:
            time.sleep(1)
            head = W3.eth.block_number
            continue
        block = W3.eth.get_block(cur, full_transactions=True)
        upsert_blocks([block])
        upsert_txs(block, block.transactions)
        cur += 1

if __name__ == "__main__":
    loop()
Production advice: write in batches (COPY FROM STDIN or multi-row INSERT), stay idempotent (ON CONFLICT), shard the ingest workers by block range, and combine this with synchronous_commit = remote_write for the RPO target (strictly speaking, synchronous_commit = on is what fully guarantees RPO = 0; remote_write trades a sliver of that guarantee for lower latency).
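To make the COPY suggestion concrete, a bulk load through psql would look roughly like this (the CSV path and its contents are hypothetical; BYTEA columns in the file must use PostgreSQL's \x... hex text form):
# bulk-load a pre-extracted batch of blocks; /tmp/blocks_batch.csv is a placeholder path
psql 'host=10.10.10.100 port=5432 dbname=chain_db user=appuser password=...' \
  -c "\copy chain.blocks (block_number, block_hash, parent_hash, miner, tx_count, ts) FROM '/tmp/blocks_batch.csv' WITH (FORMAT csv)"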
10. Monitoring and Routine Checks (what I watch in production)
Replication lag (run on a standby)
SELECT now()-pg_last_xact_replay_timestamp() AS standby_lag;
WAL pressure
pg_stat_wal and pg_stat_bgwriter; watch pg_wal disk usage and whether max_wal_size is being hit
Autovacuum
Watch n_dead_tup in pg_stat_all_tables and run manual VACUUM when needed (partitions can be vacuumed in parallel)
Index bloat
Periodic REINDEX CONCURRENTLY on large tables (rolling through partitions)
Backup health
pgbackrest info; run a restore drill every day (spin up a read-only copy to verify)
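The concrete commands behind those bullets, roughly (the first one runs on the primary):
# per-standby replication lag in bytes, plus sync state
sudo -iu postgres psql -xc "SELECT application_name, state, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
  FROM pg_stat_replication;"
# tables with the most dead tuples (autovacuum pressure)
sudo -iu postgres psql -c "SELECT relname, n_live_tup, n_dead_tup FROM pg_stat_all_tables ORDER BY n_dead_tup DESC LIMIT 10;"
# disk headroom on the WAL and data volumes
df -h /pgwal /pgdata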
11. Pitfalls I Hit (and how I fixed them from inside the machine room)
| Symptom | Diagnosis | Fix |
|---|---|---|
| Standby clone failed: requested WAL segment has already been removed | WAL recycled on the primary faster than the standby could fetch it | Raised max_wal_size to 48GB and added pgBackRest archive-get so missing segments can always be replayed |
| Write-latency jitter after switchover | synchronous_commit=on plus transient network jitter | Switched to remote_write; writes stay steady even when the sync standby's link is saturated |
| Too many partitions driving planning time up | Planner overhead becomes significant beyond ~2k child tables | Raised the granularity to 1M blocks per partition; a DEFAULT partition catches "future" blocks, migrated daily/weekly |
| Occasional stalls because THP was still enabled | perf top showed khugepaged busy | Disabled THP (the systemd unit above); the stalls disappeared |
| pgbouncer connection leaks | Long transactions on the application side plus session pooling | Moved to transaction pooling; services with long transactions get their own direct connections to 5432 |
| MinIO certificate/clock issues | TLS verification failures | Disabled TLS verification on the internal network at first; switched to an enterprise CA certificate once stable; chrony on every node |
12. Performance and Capacity (real numbers from launch week)
| Metric | Launch day | Day 3 | Day 7 |
|---|---|---|---|
| Ingest rate (rows/s) | 3.2k | 4.8k | 5.1k |
| Write P95 (ms) | 18 | 22 | 20 |
| Query P95 (ms, last 24h) | 62 | 58 | 55 |
| Replica lag (s, P99) | 1.2 | 1.5 | 1.3 |
| Backup window (full, min) | 28 | - | 26 |
| Object storage used (7 days) | 1.1 TB | 1.6 TB | 2.4 TB |
Note: with wal_compression=on, the object-storage footprint dropped by roughly 18–23%.
13. Switchover Drill Script (one-shot end-to-end self-test)
# 1) take the primary off the network (simulate a failure)
ssh pg01 "nmcli conn down bond0" || true
# 2) wait 15–30s for repmgrd to promote pg02
sleep 30
# 3) check which node is primary
for h in pg01 pg02 pg03; do
  echo -n "$h -> "
  ssh $h "sudo -iu postgres psql -tAc \"select pg_is_in_recovery()\""
done
# 4) HAProxy backend state (socket path matches the stats socket in haproxy.cfg)
echo "== HAProxy Backends =="
ssh LB01 "echo 'show servers state' | socat stdio /var/run/haproxy.sock" || true
# 5) application write probe
psql 'host=10.10.10.100 port=5432 dbname=chain_db user=appuser password=...' \
-c "insert into ops.switch_audit(ts, who) values (now(), 'drill');"
14. Cost/Capacity Planning (what I spelled out to procurement)
Capacity: roughly X TB of primary chain data over N years (estimated at ~50–80 GB per ten million transactions); clean partitioning plus hot/cold tiering (move ancient partitions to cold storage)
Redundancy: 3 nodes + on-campus object storage + off-site object storage (cross-city)
Upgrade path: when a single node hits 60% CPU or 70% of its disk write IOPS, add NVMe first, then scale reads (pgpool or application-level routing to more read replicas)
15. Wrapping Up: Walking Out of the Machine Room
The drill ended with exactly the log lines I wanted to see: standby promoted, archive-push ok, backup info valid. I plugged pg01's cable back in and repmgr obediently made it follow the new primary. The VIP on the load balancers never moved, and the application team said they hadn't felt a thing.
It was just getting light when I walked out. The lights at the Kwai Chung terminal were still on, and I knew this cluster would keep shining through the night the same way: a realistic topology, restrained tuning, repeatable drills, so that every on-chain write landing on disk is no longer a cliffhanger.
Appendix: One-shot Command Checklist (copy-paste ready)
Common to all nodes
# base OS & kernel & FS & limits (see 3.* above)
# install PGDG & PG14 (see 4.* above)
pg01 (primary)
# write postgresql.conf / pg_hba.conf (4.*)
# create users/databases (5.1)
sudo -iu postgres psql -c "CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'Repl@123';"
sudo -iu postgres psql -c "CREATE USER repmgr SUPERUSER ENCRYPTED PASSWORD 'Mgr@123';"
sudo -iu postgres psql -c "CREATE DATABASE repmgr OWNER repmgr;"
sudo -iu postgres pg_ctl -D /pgdata/data -w start
sudo -iu postgres repmgr -f /etc/repmgr/14/repmgr.conf primary register
pg02/pg03 (standbys)
sudo -iu postgres repmgr -h pg01 -U repmgr -d repmgr -f /etc/repmgr/14/repmgr.conf standby clone --fast-checkpoint
sudo -iu postgres pg_ctl -D /pgdata/data -w start
sudo -iu postgres repmgr -f /etc/repmgr/14/repmgr.conf standby register
systemctl enable --now repmgr14
LB (both)
# keepalived + haproxy + pgbouncer (6.*)
systemctl enable --now keepalived haproxy pgbouncer
Backups
# pgBackRest (7.*)
sudo -iu postgres pgbackrest --stanza=pg14 stanza-create
sudo -iu postgres pgbackrest --type=full --stanza=pg14 backup