Switchover and Switchback of CloudNativePG Replica Clusters in a Distributed Topology (K8s) – Part 2

Swapnil Suryawanshi · May 26, 2026

This runbook details the operational procedure for performing a controlled switchover and switchback between two EDB Postgres Advanced 18 clusters managed by CloudNativePG (CNPG). It is Part 2 of a series. If you have not yet set up the distributed topology, start with Part 1: Deploying CloudNativePG PostgreSQL Replica Clusters with the Barman Cloud Plugin.


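Objective

The goal is to safely rotate the primary role between the primary cluster (cluster-primary) and the replica cluster (cluster-replica) within a distributed topology. The procedure relies on the CNPG-native promotion and demotion workflow so that no data is lost during the role change.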


Environment Details

  • Operator version: 1.27.0
  • Database: EDB Postgres Advanced 18
  • Backup: Barman Cloud Plugin
  • Primary cluster: cluster-primary (namespace: primary)
  • Replica cluster: cluster-replica (namespace: replica)

Phase 1: Initial Switchover (Primary → Replica)

Step 1: Check the Initial State of Both Clusters

Before starting, check the health and LSN of both the primary and the replica cluster.

Primary:

user% kubectl cnp status cluster-primary -n primary
Cluster Summary
Name                     primary/cluster-primary
System ID:                7611448666720448538
PostgreSQL Image:        docker.enterprisedb.com/k8s/edb-postgres-advanced:18-standard-ubi9
Primary instance:        cluster-primary-1
Primary promotion time:  2026-02-27 07:48:28 +0000 UTC (310h54m54s)
Status:                  Cluster in healthy state
Instances:               3
Ready instances:         3
Size:                    185M
Current Write LSN:       0/9000060 (Timeline: 1 - WAL File: 000000010000000000000009)

Continuous Backup status (Barman Cloud Plugin)
ObjectStore / Server name:      s3-store/cluster-primary
First Point of Recoverability:  2026-02-27 13:20:08 IST
Last Successful Backup:          2026-02-27 13:20:08 IST
Last Failed Backup:              -
Working WAL archiving:          OK
WALs waiting to be archived:    0
Last Archived WAL:              000000010000000000000008   @   2026-02-27T07:54:14.417091Z
Last Failed WAL:                -

Streaming Replication status
Replication Slots Enabled
Name                Sent LSN   Write LSN  Flush LSN  Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority  Replication Slot
----                --------   ---------  ---------  ----------  ---------  ---------  ----------  -----      ----------  -------------  ----------------
cluster-primary-2  0/9000060  0/9000060  0/9000060  0/9000060   00:00:00   00:00:00   00:00:00    streaming  async       0              active
cluster-primary-3  0/9000060  0/9000060  0/9000060  0/9000060   00:00:00   00:00:00   00:00:00    streaming  async       0              active

Instances status
Name                Current LSN  Replication role  Status  QoS          Manager Version  Node
----                -----------  ----------------  ------  ---          ---------------  ----
cluster-primary-1  0/9000060    Primary            OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-primary-2  0/9000060    Standby (async)    OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-primary-3  0/9000060    Standby (async)    OK      BestEffort  1.27.0           cnp-1.27.0-control-plane

Plugins status
Name                            Version  Status  Reported Operator Capabilities
----                            -------  ------  ------------------------------
barman-cloud.cloudnative-pg.io  0.11.0   N/A     Reconciler Hooks, Lifecycle Service

Replica:

user% kubectl cnp status cluster-replica -n replica
Replica Cluster Summary
Name                     replica/cluster-replica
System ID:                7611448666720448538
PostgreSQL Image:        docker.enterprisedb.com/k8s/edb-postgres-advanced:18-standard-ubi9
Designated primary:      cluster-replica-1
Source cluster:          cluster-primary
Primary promotion time:  2026-02-27 07:52:58 +0000 UTC (310h50m34s)
Status:                  Cluster in healthy state
Instances:               3
Ready instances:         3
Size:                    104M

Continuous Backup status (Barman Cloud Plugin)
ObjectStore / Server name:      s3-store/cluster-replica
First Point of Recoverability:  -
Last Successful Backup:          -
Last Failed Backup:              -
Working WAL archiving:          OK
WALs waiting to be archived:    0
Last Archived WAL:              000000010000000000000008   @   2026-02-27T07:54:22.112507Z
Last Failed WAL:                -

Instances status
Name                Current LSN  Replication role              Status  QoS          Manager Version  Node
----                -----------  ----------------              ------  ---          ---------------  ----
cluster-replica-1  0/9000000    Designated primary            OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-replica-2  0/9000000    Standby (in Replica Cluster)  OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-replica-3  0/9000000    Standby (in Replica Cluster)  OK      BestEffort  1.27.0           cnp-1.27.0-control-plane

Plugins status
Name                            Version  Status  Reported Operator Capabilities
----                            -------  ------  ------------------------------
barman-cloud.cloudnative-pg.io  0.11.0   N/A     Reconciler Hooks, Lifecycle Service
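
The LSN comparison in this step can be done by eye, but a small helper makes the gap explicit. The following is a minimal Python sketch and not part of the original runbook; the function names parse_lsn and lsn_gap_bytes are illustrative, and the LSN values are copied from the two status outputs above.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/9000060' into an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lsn_gap_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Return how many bytes of WAL the replica is behind the primary."""
    return parse_lsn(primary_lsn) - parse_lsn(replica_lsn)


# Current Write LSN of cluster-primary vs. Current LSN of cluster-replica-1.
gap = lsn_gap_bytes("0/9000060", "0/9000000")
print(f"replica cluster is {gap} bytes behind")
```

With the values shown, the gap is 96 bytes: the replica has replayed everything up to the start of segment 9, which is consistent with both clusters reporting 000000010000000000000008 as the last archived WAL file.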

Step 2: Demote the Primary Cluster

Change the replica section of the primary cluster, then retrieve the demotion token that the operator generates.

Primary cluster configuration change:

Before:

replica:
  primary: cluster-primary
  source: cluster-primary

After:

replica:
  primary: cluster-replica
  source: cluster-replica

Retrieve the token and check the status:

user% kubectl get cluster cluster-primary -n primary \
  -o jsonpath='{.status.demotionToken}'
eyJsYXRlc3RDaGVja3BvaW50VGltZWxpbmVJRCI6IjEiLCJyZWRvV2FsRmlsZSI6IjAwMDAwMDAxMDAwMDAwMDAwMDAwMDAwQSIsImRhdGFiYXNlU3lzdGVtSWRlbnRpZmllciI6Ijc2MTE0NDg2NjY3MjA0NDg1MzgiLCJsYXRlc3RDaGVja3BvaW50UkVET0xvY2F0aW9uIjoiMC9BMDAwMDI4IiwidGltZU9mTGF0ZXN0Q2hlY2twb2ludCI6IlRodSBNYXIgMTIgMDY6NTA6NTEgMjAyNiIsIm9wZXJhdG9yVmVyc2lvbiI6IjEuMjcuMCJ9

user% kubectl cnp status cluster-primary -n primary
Replica Cluster Summary
Name                     primary/cluster-primary
System ID:                7611448666720448538
PostgreSQL Image:        docker.enterprisedb.com/k8s/edb-postgres-advanced:18-standard-ubi9
Designated primary:      cluster-primary-1
Source cluster:          cluster-replica
Primary promotion time:  2026-03-12 06:50:52 +0000 UTC (1m25s)
Status:                  Cluster in healthy state
Instances:               3
Ready instances:         3
Size:                    200M

Demotion token
Token                               eyJsYXRlc3RDaGVja3BvaW50VGltZWxpbmVJRCI6IjEiLCJyZWRvV2FsRmlsZSI6IjAwMDAwMDAxMDAwMDAwMDAwMDAwMDAwQSIsImRhdGFiYXNlU3lzdGVtSWRlbnRpZmllciI6Ijc2MTE0NDg2NjY3MjA0NDg1MzgiLCJsYXRlc3RDaGVja3BvaW50UkVET0xvY2F0aW9uIjoiMC9BMDAwMDI4IiwidGltZU9mTGF0ZXN0Q2hlY2twb2ludCI6IlRodSBNYXIgMTIgMDY6NTA6NTEgMjAyNiIsIm9wZXJhdG9yVmVyc2lvbiI6IjEuMjcuMCJ9
Validity                            valid
Latest checkpoint's TimeLineID     1
Latest checkpoint's REDO WAL file  00000001000000000000000A
Latest checkpoint's REDO location  0/A000028
Database system identifier          7611448666720448538 (ok)
Time of latest checkpoint          Thu Mar 12 06:50:51 2026
Version of the operator            1.27.0

Continuous Backup status (Barman Cloud Plugin)
ObjectStore / Server name:      s3-store/cluster-primary
First Point of Recoverability:  2026-02-27 13:20:08 IST
Last Successful Backup:          2026-02-27 13:20:08 IST
Last Failed Backup:              -
Working WAL archiving:          OK
WALs waiting to be archived:    0
Last Archived WAL:              00000001000000000000000A   @   2026-03-12T06:52:15.544285Z
Last Failed WAL:                -

Instances status
Name                Current LSN  Replication role              Status  QoS          Manager Version  Node
----                -----------  ----------------              ------  ---          ---------------  ----
cluster-primary-1  0/A0000A0    Designated primary            OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-primary-2  0/A0000A0    Standby (in Replica Cluster)  OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-primary-3  0/A0000A0    Standby (in Replica Cluster)  OK      BestEffort  1.27.0           cnp-1.27.0-control-plane

Plugins status
Name                            Version  Status  Reported Operator Capabilities
----                            -------  ------  ------------------------------
barman-cloud.cloudnative-pg.io  0.11.0   N/A     Reconciler Hooks, Lifecycle Service
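
The token looks like base64-encoded JSON, and the fields listed under "Demotion token" in the status output appear to mirror its contents. Under that assumption, the sketch below decodes a token so it can be inspected before it is pasted into another cluster's configuration. The helper name decode_token is illustrative, and the sample token is built locally from two of the field names rather than copied from the cluster.

```python
import base64
import json


def decode_token(token: str) -> dict:
    """Decode a demotion/promotion token, assuming it is base64-encoded JSON."""
    # Drop surrounding whitespace and a trailing '%' that some shells (e.g. zsh)
    # print when command output does not end with a newline.
    cleaned = token.strip().rstrip("%")
    # Restore any '=' padding that may have been lost while copying.
    cleaned += "=" * (-len(cleaned) % 4)
    return json.loads(base64.b64decode(cleaned))


# Build a small sample token so the example is self-contained.
sample_fields = {
    "latestCheckpointTimelineID": "1",
    "databaseSystemIdentifier": "7611448666720448538",
}
sample_token = base64.b64encode(json.dumps(sample_fields).encode()).decode()

fields = decode_token(sample_token + "%")
print(fields["databaseSystemIdentifier"])
```

Comparing the decoded system identifier with the System ID reported by the target cluster is a cheap sanity check before promotion.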

Step 3: Promote the Replica Cluster

Set promotionToken in the replica cluster's configuration to the demotion token obtained in Step 2.

Replica cluster configuration change:

Before:

replica:
  primary: cluster-primary
  source: cluster-primary

After:

replica:
  primary: cluster-replica
  promotionToken: eyJsYXRlc3RDaGVja3BvaW50VGltZWxpbmVJRCI6IjEiLCJyZWRvV2FsRmlsZSI6IjAwMDAwMDAxMDAwMDAwMDAwMDAwMDAwQSIsImRhdGFiYXNlU3lzdGVtSWRlbnRpZmllciI6Ijc2MTE0NDg2NjY3MjA0NDg1MzgiLCJsYXRlc3RDaGVja3BvaW50UkVET0xvY2F0aW9uIjoiMC9BMDAwMDI4IiwidGltZU9mTGF0ZXN0Q2hlY2twb2ludCI6IlRodSBNYXIgMTIgMDY6NTA6NTEgMjAyNiIsIm9wZXJhdG9yVmVyc2lvbiI6IjEuMjcuMCJ9
  source: cluster-primary
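
The runbook does not say how the configuration change is applied. One option, shown here only as a sketch, is to generate a JSON merge patch for the replica stanza and hand it to a tool of your choice (for example kubectl patch with --type merge). The function name replica_patch is illustrative, and the code assumes the stanza lives under spec.replica in the Cluster resource.

```python
import json
from typing import Optional


def replica_patch(primary: str, source: str,
                  promotion_token: Optional[str] = None) -> str:
    """Build a JSON merge patch for the `replica` stanza of a Cluster.

    Assumption: the stanza is located at spec.replica. In a JSON merge patch
    a null value removes the key, so omitting the token clears a stale
    promotionToken left over from an earlier promotion.
    """
    replica = {
        "primary": primary,
        "source": source,
        "promotionToken": promotion_token,
    }
    return json.dumps({"spec": {"replica": replica}})


# Promotion of cluster-replica (token placeholder, not a real value).
print(replica_patch("cluster-replica", "cluster-primary",
                    "<demotion token from Step 2>"))

# Demotion of cluster-primary: no token, so promotionToken becomes null.
print(replica_patch("cluster-replica", "cluster-replica"))
```

Generating the patch instead of editing the manifest by hand keeps the token from being truncated or wrapped, which matters because the operator validates it.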

Step 4: Verify the Status After the First Switchover

Confirm that the roles have been swapped correctly.

A] The replica cluster has been promoted to primary:

user% kubectl cnp status cluster-replica -n replica
Cluster Summary
Name                     replica/cluster-replica
System ID:                7611448666720448538
PostgreSQL Image:        docker.enterprisedb.com/k8s/edb-postgres-advanced:18-standard-ubi9
Primary instance:        cluster-replica-1
Primary promotion time:  2026-02-27 07:52:58 +0000 UTC (311h6m25s)
Status:                  Cluster in healthy state
Instances:               3
Ready instances:         3
Size:                    169M
Current Write LSN:       0/B006E90 (Timeline: 2 - WAL File: 00000002000000000000000B)

Continuous Backup status (Barman Cloud Plugin)
ObjectStore / Server name:      s3-store/cluster-replica
First Point of Recoverability:  -
Last Successful Backup:          -
Last Failed Backup:              -
Working WAL archiving:          OK
WALs waiting to be archived:    0
Last Archived WAL:              00000002000000000000000A   @   2026-03-12T06:57:33.229487Z
Last Failed WAL:                -

Streaming Replication status
Replication Slots Enabled
Name                Sent LSN   Write LSN  Flush LSN  Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority  Replication Slot
----                --------   ---------  ---------  ----------  ---------  ---------  ----------  -----      ----------  -------------  ----------------
cluster-replica-2  0/B006E90  0/B006E90  0/B006E90  0/B006E90   00:00:00   00:00:00   00:00:00    streaming  async       0              active
cluster-replica-3  0/B006E90  0/B006E90  0/B006E90  0/B006E90   00:00:00   00:00:00   00:00:00    streaming  async       0              active

Instances status
Name                Current LSN  Replication role  Status  QoS          Manager Version  Node
----                -----------  ----------------  ------  ---          ---------------  ----
cluster-replica-1  0/B006E90    Primary            OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-replica-2  0/B006E90    Standby (async)    OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-replica-3  0/B006E90    Standby (async)    OK      BestEffort  1.27.0           cnp-1.27.0-control-plane

Plugins status
Name                            Version  Status  Reported Operator Capabilities
----                            -------  ------  ------------------------------
barman-cloud.cloudnative-pg.io  0.11.0   N/A     Reconciler Hooks, Lifecycle Service

B] The primary cluster has been demoted to replica:

user% kubectl cnp status cluster-primary -n primary
Replica Cluster Summary
Name                     primary/cluster-primary
System ID:                7611448666720448538
PostgreSQL Image:        docker.enterprisedb.com/k8s/edb-postgres-advanced:18-standard-ubi9
Designated primary:      cluster-primary-1
Source cluster:          cluster-replica
Primary promotion time:  2026-03-12 06:50:52 +0000 UTC (9m32s)
Status:                  Cluster in healthy state
Instances:               3
Ready instances:         3
Size:                    216M

Demotion token
Token                               eyJsYXRlc3RDaGVja3BvaW50VGltZWxpbmVJRCI6IjEiLCJyZWRvV2FsRmlsZSI6IjAwMDAwMDAxMDAwMDAwMDAwMDAwMDAwQSIsImRhdGFiYXNlU3lzdGVtSWRlbnRpZmllciI6Ijc2MTE0NDg2NjY3MjA0NDg1MzgiLCJsYXRlc3RDaGVja3BvaW50UkVET0xvY2F0aW9uIjoiMC9BMDAwMDI4IiwidGltZU9mTGF0ZXN0Q2hlY2twb2ludCI6IlRodSBNYXIgMTIgMDY6NTA6NTEgMjAyNiIsIm9wZXJhdG9yVmVyc2lvbiI6IjEuMjcuMCJ9
Validity                            valid
Latest checkpoint's TimeLineID     1
Latest checkpoint's REDO WAL file  00000001000000000000000A
Latest checkpoint's REDO location  0/A000028
Database system identifier          7611448666720448538 (ok)
Time of latest checkpoint          Thu Mar 12 06:50:51 2026
Version of the operator            1.27.0

Continuous Backup status (Barman Cloud Plugin)
ObjectStore / Server name:      s3-store/cluster-primary
First Point of Recoverability:  2026-02-27 13:20:08 IST
Last Successful Backup:          2026-02-27 13:20:08 IST
Last Failed Backup:              -
Working WAL archiving:          OK
WALs waiting to be archived:    0
Last Archived WAL:              00000002000000000000000A   @   2026-03-12T06:57:39.660756Z
Last Failed WAL:                -

Instances status
Name                Current LSN  Replication role              Status  QoS          Manager Version  Node
----                -----------  ----------------              ------  ---          ---------------  ----
cluster-primary-1  0/B000000    Designated primary            OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-primary-2  0/B000000    Standby (in Replica Cluster)  OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-primary-3  0/B000000    Standby (in Replica Cluster)  OK      BestEffort  1.27.0           cnp-1.27.0-control-plane

Plugins status
Name                            Version  Status  Reported Operator Capabilities
----                            -------  ------  ------------------------------
barman-cloud.cloudnative-pg.io  0.11.0   N/A     Reconciler Hooks, Lifecycle Service
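
After the promotion, the status output reports timeline 2 and WAL file 00000002000000000000000B for LSN 0/B006E90. The relationship between timeline, LSN, and WAL file name can be reproduced with a few lines of Python. This is a sketch that assumes the default 16 MB WAL segment size; the function name wal_file_name is illustrative.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # PostgreSQL default; assumed unchanged here


def wal_file_name(timeline: int, lsn: str,
                  segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the WAL segment file name that contains the given LSN."""
    high, low = lsn.split("/")
    position = (int(high, 16) << 32) | int(low, 16)
    # Absolute segment number, then split into the "log" and "segment" parts.
    segment_no = position // segment_size
    segments_per_log = 0x100000000 // segment_size
    return (f"{timeline:08X}"
            f"{segment_no // segments_per_log:08X}"
            f"{segment_no % segments_per_log:08X}")


# Values taken from the status outputs in this runbook.
print(wal_file_name(1, "0/9000060"))  # before the switchover
print(wal_file_name(2, "0/B006E90"))  # after cluster-replica was promoted
```

The first eight hex digits carry the timeline, so a bump from 00000001 to 00000002 in the file name is a quick visual confirmation that a promotion took place.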

Step 5: Verify Replication

Create a table on the new primary, then check for it on the replica.

Promoted primary (cluster-replica):

user% kubectl cnp psql cluster-replica -n replica
psql (18.1.0)
Type "help" for help.

postgres=# \dt
Did not find any tables.
postgres=# create table test (id int);
CREATE TABLE
postgres=# insert into test values (1);
INSERT 0 1
postgres=# select * from test;
 id
----
  1
(1 row)

Demoted replica (cluster-primary), where the new table is not yet visible at this point:

user% kubectl cnp psql cluster-primary -n primary
psql (18.1.0)
Type "help" for help.

postgres=# \dt
Did not find any tables.

Step 6: Take a Fresh Backup of the Current Primary

user% kubectl cnp backup cluster-replica -n replica --method=plugin --plugin-name=barman-cloud.cloudnative-pg.io
backup/cluster-replica-20260312133305 created

user% kubectl get backup -n replica
NAME                               AGE   CLUSTER           METHOD   PHASE       ERROR
cluster-replica-20260312133305    58s   cluster-replica   plugin   completed  

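Step 7: Verify Replication After the Backup

Once the backup has completed, the table created in Step 5 is visible on the replica. Most likely the backup closed and archived the open WAL segment, which the replica cluster could then replay.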

user% kubectl cnp psql cluster-primary -n primary
psql (18.1.0)
Type "help" for help.

postgres=# \dt
          List of tables
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | test | table | postgres
(1 row)

postgres=# select * from test;
 id
----
  1
(1 row)

Phase 2: Switchback (Replica → Primary)

Step 8: Demote the Current Primary (cluster-replica) to Replica

Change the configuration and retrieve the new demotion token.

Replica cluster configuration change:

Before:

replica:
  primary: cluster-replica
  promotionToken: eyJsYXRlc3RDaGVja3BvaW50VGltZWxpbmVJRCI6IjEiLCJyZWRvV2FsRmlsZSI6IjAwMDAwMDAxMDAwMDAwMDAwMDAwMDAwQSIsImRhdGFiYXNlU3lzdGVtSWRlbnRpZmllciI6Ijc2MTYyNzEyOTAzMDUyODIwNzMiLCJsYXRlc3RDaGVja3BvaW50UkVET0xvY2F0aW9uIjoiMC9BMDAwMDI4IiwidGltZU9mTGF0ZXN0Q2hlY2twb2ludCI6IlRodSBNYXIgMTIgMDc6NTE6MjMgMjAyNiIsIm9wZXJhdG9yVmVyc2lvbiI6IjEuMjcuMCJ9
  source: cluster-primary

After:

replica:
  primary: cluster-primary
  source: cluster-primary

Retrieve the token and check the status:

user% kubectl get cluster cluster-replica -n replica -o jsonpath='{.status.demotionToken}'
eyJsYXRlc3RDaGVja3BvaW50VGltZWxpbmVJRCI6IjIiLCJyZWRvV2FsRmlsZSI6IjAwMDAwMDAyMDAwMDAwMDAwMDAwMDAxMCIsImRhdGFiYXNlU3lzdGVtSWRlbnRpZmllciI6Ijc2MTYyNzEyOTAzMDUyODIwNzMiLCJsYXRlc3RDaGVja3BvaW50UkVET0xvY2F0aW9uIjoiMC8xMDAwMDAyOCIsInRpbWVPZkxhdGVzdENoZWNrcG9pbnQiOiJUaHUgTWFyIDEyIDA4OjEwOjA4IDIwMjYiLCJvcGVyYXRvclZlcnNpb24iOiIxLjI3LjAifQ==

user% kubectl cnp status cluster-replica -n replica
Replica Cluster Summary
Name                     replica/cluster-replica
System ID:                7616271290305282073
PostgreSQL Image:        docker.enterprisedb.com/k8s/edb-postgres-advanced:18-standard-ubi9
Designated primary:      cluster-replica-1
Source cluster:          cluster-primary
Primary promotion time:  2026-03-12 08:10:09 +0000 UTC (14s)
Status:                  Cluster in healthy state
Instances:               3
Ready instances:         3
Size:                    248M

Demotion token
Token                               eyJsYXRlc3RDaGVja3BvaW50VGltZWxpbmVJRCI6IjIiLCJyZWRvV2FsRmlsZSI6IjAwMDAwMDAyMDAwMDAwMDAwMDAwMDAxMCIsImRhdGFiYXNlU3lzdGVtSWRlbnRpZmllciI6Ijc2MTYyNzEyOTAzMDUyODIwNzMiLCJsYXRlc3RDaGVja3BvaW50UkVET0xvY2F0aW9uIjoiMC8xMDAwMDAyOCIsInRpbWVPZkxhdGVzdENoZWNrcG9pbnQiOiJUaHUgTWFyIDEyIDA4OjEwOjA4IDIwMjYiLCJvcGVyYXRvclZlcnNpb24iOiIxLjI3LjAifQ==
Validity                            valid
Latest checkpoint's TimeLineID     2
Latest checkpoint's REDO WAL file  000000020000000000000010
Latest checkpoint's REDO location  0/10000028
Database system identifier          7616271290305282073 (ok)
Time of latest checkpoint          Thu Mar 12 08:10:08 2026
Version of the operator            1.27.0

Continuous Backup status (Barman Cloud Plugin)
ObjectStore / Server name:      s3-store/cluster-replica
First Point of Recoverability:  2026-03-12 13:33:10 IST
Last Successful Backup:          2026-03-12 13:33:10 IST
Last Failed Backup:              -
Working WAL archiving:          OK
WALs waiting to be archived:    0
Last Archived WAL:              00000002.history   @   2026-03-12T08:10:15.149343Z
Last Failed WAL:                -

Instances status
Name                Current LSN  Replication role              Status  QoS          Manager Version  Node
----                -----------  ----------------              ------  ---          ---------------  ----
cluster-replica-1  0/100000A0    Designated primary            OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-replica-2  0/100000A0    Standby (in Replica Cluster)  OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-replica-3  0/100000A0    Standby (in Replica Cluster)  OK      BestEffort  1.27.0           cnp-1.27.0-control-plane

Plugins status
Name                            Version  Status  Reported Operator Capabilities
----                            -------  ------  ------------------------------
barman-cloud.cloudnative-pg.io  0.11.0   N/A     Reconciler Hooks, Lifecycle Service

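Step 9: Promote the Current Replica (cluster-primary) to Primary

Apply the demotion token from Step 8 as the promotionToken on the original primary.

Primary cluster configuration change:

Before: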

replica:
  primary: cluster-replica
  source: cluster-replica

After:

replica:
  primary: cluster-primary
  promotionToken: eyJsYXRlc3RDaGVja3BvaW50VGltZWxpbmVJRCI6IjIiLCJyZWRvV2FsRmlsZSI6IjAwMDAwMDAyMDAwMDAwMDAwMDAwMDAxMCIsImRhdGFiYXNlU3lzdGVtSWRlbnRpZmllciI6Ijc2MTYyNzEyOTAzMDUyODIwNzMiLCJsYXRlc3RDaGVja3BvaW50UkVET0xvY2F0aW9uIjoiMC8xMDAwMDAyOCIsInRpbWVPZkxhdGVzdENoZWNrcG9pbnQiOiJUaHUgTWFyIDEyIDA4OjEwOjA4IDIwMjYiLCJvcGVyYXRvclZlcnNpb24iOiIxLjI3LjAifQ==
  source: cluster-replica

Step 10: Verify the Status of Both Clusters

Validate the final state after the switchback.

A] cluster-primary has been promoted back to primary:

user% kubectl cnp status cluster-primary -n primary
Cluster Summary
Name                     primary/cluster-primary
System ID:                7616271290305282073
PostgreSQL Image:        docker.enterprisedb.com/k8s/edb-postgres-advanced:18-standard-ubi9
Primary instance:        cluster-primary-1
Primary promotion time:  2026-03-12 07:51:23 +0000 UTC (25m20s)
Status:                  Cluster in healthy state
Instances:               3
Ready instances:         3
Size:                    344M
Current Write LSN:       0/11001F18 (Timeline: 3 - WAL File: 000000030000000000000011)

Continuous Backup status (Barman Cloud Plugin)
ObjectStore / Server name:      s3-store/cluster-primary
First Point of Recoverability:  2026-03-12 13:15:19 IST
Last Successful Backup:          2026-03-12 13:15:19 IST
Last Failed Backup:              -
Working WAL archiving:          OK
WALs waiting to be archived:    0
Last Archived WAL:              000000030000000000000010   @   2026-03-12T08:16:05.880203Z
Last Failed WAL:                -

Streaming Replication status
Replication Slots Enabled
Name                Sent LSN     Write LSN    Flush LSN    Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority  Replication Slot
----                --------     ---------    ---------    ----------  ---------  ---------  ----------  -----      ----------  -------------  ----------------
cluster-primary-2  0/11001F18  0/11001F18  0/11001F18  0/11001F18  00:00:00   00:00:00   00:00:00    streaming  async       0              active
cluster-primary-3  0/11001F18  0/11001F18  0/11001F18  0/11001F18  00:00:00   00:00:00   00:00:00    streaming  async       0              active

Instances status
Name                Current LSN  Replication role  Status  QoS          Manager Version  Node
----                -----------  ----------------  ------  ---          ---------------  ----
cluster-primary-1  0/11001F18    Primary            OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-primary-2  0/11001F18    Standby (async)    OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-primary-3  0/11001F18    Standby (async)    OK      BestEffort  1.27.0           cnp-1.27.0-control-plane

Plugins status
Name                            Version  Status  Reported Operator Capabilities
----                            -------  ------  ------------------------------
barman-cloud.cloudnative-pg.io  0.11.0   N/A     Reconciler Hooks, Lifecycle Service

B] cluster-replica has been demoted back to replica:

user% kubectl cnp status cluster-replica -n replica
Replica Cluster Summary
Name                     replica/cluster-replica
System ID:                7616271290305282073
PostgreSQL Image:        docker.enterprisedb.com/k8s/edb-postgres-advanced:18-standard-ubi9
Designated primary:      cluster-replica-1
Source cluster:          cluster-primary
Primary promotion time:  2026-03-12 08:10:09 +0000 UTC (8m7s)
Status:                  Cluster in healthy state
Instances:               3
Ready instances:         3
Size:                    264M

Demotion token
Token                               eyJsYXRlc3RDaGVja3BvaW50VGltZWxpbmVJRCI6IjIiLCJyZWRvV2FsRmlsZSI6IjAwMDAwMDAyMDAwMDAwMDAwMDAwMDAxMCIsImRhdGFiYXNlU3lzdGVtSWRlbnRpZmllciI6Ijc2MTYyNzEyOTAzMDUyODIwNzMiLCJsYXRlc3RDaGVja3BvaW50UkVET0xvY2F0aW9uIjoiMC8xMDAwMDAyOCIsInRpbWVPZkxhdGVzdENoZWNrcG9pbnQiOiJUaHUgTWFyIDEyIDA4OjEwOjA4IDIwMjYiLCJvcGVyYXRvclZlcnNpb24iOiIxLjI3LjAifQ==
Validity                            valid
Latest checkpoint's TimeLineID     2
Latest checkpoint's REDO WAL file  000000020000000000000010
Latest checkpoint's REDO location  0/10000028
Database system identifier          7616271290305282073 (ok)
Time of latest checkpoint          Thu Mar 12 08:10:08 2026
Version of the operator            1.27.0

Continuous Backup status (Barman Cloud Plugin)
ObjectStore / Server name:      s3-store/cluster-replica
First Point of Recoverability:  2026-03-12 13:33:10 IST
Last Successful Backup:          2026-03-12 13:33:10 IST
Last Failed Backup:              -
Working WAL archiving:          OK
WALs waiting to be archived:    0
Last Archived WAL:              000000030000000000000010   @   2026-03-12T08:16:08.35668Z
Last Failed WAL:                -

Instances status
Name                Current LSN  Replication role              Status  QoS          Manager Version  Node
----                -----------  ----------------              ------  ---          ---------------  ----
cluster-replica-1  0/11000000    Designated primary            OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-replica-2  0/11000000    Standby (in Replica Cluster)  OK      BestEffort  1.27.0           cnp-1.27.0-control-plane
cluster-replica-3  0/11000000    Standby (in Replica Cluster)  OK      BestEffort  1.27.0           cnp-1.27.0-control-plane

Plugins status
Name                            Version  Status  Reported Operator Capabilities
----                            -------  ------  ------------------------------
barman-cloud.cloudnative-pg.io  0.11.0   N/A     Reconciler Hooks, Lifecycle Service
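
The "Last Archived WAL" values reported throughout the runbook give a compact view of the timeline history. The sketch below parses those names, including timeline history files such as 00000002.history, and checks that the timeline advanced by one with each promotion. The names ArchivedWal and parse_archived_wal are illustrative; the file names are copied from the status outputs of Steps 1, 4, and 10.

```python
import re
from typing import NamedTuple, Optional


class ArchivedWal(NamedTuple):
    timeline: int
    segment: Optional[str]  # None for timeline history files


_SEGMENT = re.compile(r"^([0-9A-F]{8})([0-9A-F]{16})$")
_HISTORY = re.compile(r"^([0-9A-F]{8})\.history$")


def parse_archived_wal(name: str) -> ArchivedWal:
    """Split an archived WAL or history file name into timeline and segment."""
    match = _SEGMENT.match(name)
    if match:
        return ArchivedWal(int(match.group(1), 16), match.group(2))
    match = _HISTORY.match(name)
    if match:
        return ArchivedWal(int(match.group(1), 16), None)
    raise ValueError(f"unrecognized WAL file name: {name}")


# Last archived WAL before the switchover, after it, and after the switchback.
observed = [
    "000000010000000000000008",
    "00000002000000000000000A",
    "000000030000000000000010",
]
timelines = [parse_archived_wal(name).timeline for name in observed]
print(timelines)
```

In Step 10 both clusters report the same last archived file on timeline 3, which is what one would expect once the replica has caught up with the re-promoted primary.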

Step 11: Validate the Final Replication State

Confirm data consistency, then test writes and WAL archiving on the original primary.

Primary (cluster-primary):

user% kubectl cnp psql cluster-primary -n primary
psql (18.1.0)
Type "help" for help.

postgres=# \dt
          List of tables
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | test | table | postgres
(1 row)

postgres=# select * from test;
 id
----
  1
  2
(2 rows)

postgres=# insert into test values (3);
INSERT 0 1

postgres=# checkpoint;
CHECKPOINT

postgres=# select * from pg_switch_wal();
 pg_switch_wal
---------------
 0/11002180
(1 row)

Replica (cluster-replica):

user% kubectl cnp psql cluster-replica -n replica
psql (18.1.0)
Type "help" for help.

postgres=# \dt
          List of tables
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | test | table | postgres
(1 row)

postgres=# select * from test;
 id
----
  1
  2
  3
(3 rows)

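Conclusion

This runbook demonstrated a complete zero-data-loss switchover and switchback cycle between two CloudNativePG clusters, using the CNPG-native demotion token and promotion token workflow.

The structured two-phase approach, rotating the primary role first to cluster-replica and then back to cluster-primary, confirmed the following:

  • The demotion token safely fences the old primary before promotion begins.
  • The promotion token ensures the new primary picks up exactly where the old primary left off, maintaining WAL continuity across timeline transitions.
  • All data written during each phase was replicated to the standby side and verified there.

This pattern is fully compatible with the Barman Cloud Plugin for WAL archiving and backups, which makes it suitable for production distributed topologies on Kubernetes.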


Original: Switchover and Switchback of CloudNativePG Replica Clusters in a Distributed Topology (K8s) – Part 2 (EDB Blog)
