Speeding Up Percona XtraDB Cluster State Transfers with Kubernetes Volume Snapshots

When using the Percona Operator for MySQL based on Percona XtraDB Cluster (PXC), it’s common to encounter scenarios where cluster nodes request a full State Snapshot Transfer (SST) when rejoining the cluster.

One typical scenario where a State Snapshot Transfer (SST) is required is when a node has been offline long enough that the GCache no longer contains the necessary write sets for an Incremental State Transfer (IST). Unlike SST, which involves a full data copy from another node, IST is a much lighter process that replays the missing write sets from the donor’s GCache, avoiding the need for a complete data transfer.

Another situation that triggers SST is scaling up the cluster by adding new nodes. Each joiner node will require a full SST to synchronize with the cluster.

Additionally, when adding multiple nodes at once, the cluster must perform a separate backup for each joiner. This results in repeated reads from the donor and multiple data transfers over the network, which can quickly become a bottleneck.

In PXC, SST is performed by default using Percona XtraBackup, a physical backup tool. The process involves reading the donor’s data files and streaming them to the joiner node. While the backup operation can be optimized by increasing parallelism and enabling compression, the data must still be read and transferred over the network.

This process can be time-consuming in environments with large database sizes, as it involves transferring a full backup from an existing node to a new one.

SST based on K8s Volume Snapshots:

These scenarios are ideal for K8s volume snapshots, which operate at the storage layer via the Container Storage Interface (CSI). This almost immediate process doesn’t involve compressing, sending data over the network, or even reading the full dataset.

The PXC Operator supports creating a new cluster from a volume snapshot, a useful feature for cloning or disaster recovery scenarios. In this blog post, however, we’ll explore how volume snapshots can also be used to add new nodes to an existing cluster, significantly reducing the time and resource cost, especially when dealing with large datasets.

Disclaimer:

The procedure described in this post involves directly manipulating PersistentVolumeClaims (PVCs), including deletion and restoration operations. These actions can lead to data loss or cluster instability if not performed carefully.

Ensure you have proper backups and fully understand the implications before proceeding in a production environment. Always test in a staging setup first.

For this test, I used Google Kubernetes Engine (GKE) with the Percona XtraDB Cluster Operator v1.16.1, running Percona Cluster 8.0.39 images. The PersistentVolumeClaims (PVCs) were 1 TiB in size, hosting a database dataset of approximately 500 GiB.

Prerequisites:

K8s relies on the CSI (Container Storage Interface) to manage volume operations, including snapshots. To use snapshots, your StorageClass must be associated with a CSI driver that supports the VolumeSnapshot feature.

The Volume Snapshot Class should be created first, as below:

$ cat snapshot-class.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: snapshot-class
driver: pd.csi.storage.gke.io
deletionPolicy: Delete

$ kubectl apply -f snapshot-class.yaml
volumesnapshotclass.snapshot.storage.k8s.io/snapshot-class created

$ cat snapshot-class.yaml

apiVersion: snapshot.storage.k8s.io/v1

kind: VolumeSnapshotClass

metadata:

driver: pd.csi.storage.gke.io

deletionPolicy: Delete

$ kubectl apply -f snapshot-class.yaml

volumesnapshotclass.snapshot.storage.k8s.io/snapshot-class created

There are two approaches to perform the PVC restore procedure: online, it allows to add nodes while the cluster is still running, and offline, which involves scaling down the cluster and performing the restore while all pods are stopped.

Online:

In this example, we’re re-joining existing nodes that would request SST using volume snapshots. The process for joining cluster members without downtime involves the following:

Taking a snapshot of the volume from a healthy, running node.
Scaling down the cluster temporarily to prepare for volume restoration.
Deleting the PVCs associated with Joiner nodes.
Restoring each Joiner PVC from the snapshot, so the nodes start with fully populated data volumes.

One important caveat is that the healthy pod used for the snapshot must be the number zero PXC member, pxc-0. This ensures that when the cluster is scaled down, the joiner nodes (e.g., pxc-1, pxc-2, etc.) are terminated, allowing their PVCs to be safely deleted and recreated from the snapshot. This is typically the most common scenario, as the pxc-0 pod is often the most up-to-date node in the cluster since, by default, the HAProxy service routes traffic to this member when it’s available, making it a reliable source for snapshotting.

Snapshots are crash-consistent by nature, they capture the state of the filesystem at a specific point in time without coordinating with the database to flush in-memory data to disk. When restored, InnoDB performs crash recovery to bring the database to a consistent state. To minimize the risk of data corruption or recovery failure, it’s critical to ensure that the instance used for the snapshot is fully ACID-compliant at the moment of capture.

This behavior is controlled by the innodb_flush_log_at_trx_commit variable. When set to 1, InnoDB writes and flushes the redo log to disk at every transaction commit, ensuring durability and reducing the chance of data loss during recovery.

By default, the PXC Operator sets innodb_flush_log_at_trx_commit to 2 to optimize performance. In terms of durability, a transaction on a PXC node is only considered committed after it has been replicated and certified by the cluster. While it may not yet be applied on the remote nodes, it has already been safely propagated, ensuring consistency across the cluster. This makes it generally safe to use the value 2, as you would need to lose all nodes simultaneously to lose up to one second of transactions.

We confirm the innodb_flush_log_at_trx_commit variable value by running the following command:

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "select @@innodb_flush_log_at_trx_commit;"'
+----------------------------------+
| @@innodb_flush_log_at_trx_commit |
+----------------------------------+
|                            	2  |
+----------------------------------+

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "select @@innodb_flush_log_at_trx_commit;"'

+----------------------------------+

| @@innodb_flush_log_at_trx_commit |

+----------------------------------+

| 2 |

+----------------------------------+

We’ll need to enforce stricter ACID compliance to take the snapshot, which may impact database performance due to an increased number of fsync operations:

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global innodb_flush_log_at_trx_commit=1; select @@innodb_flush_log_at_trx_commit;"'
+----------------------------------+
| @@innodb_flush_log_at_trx_commit |
+----------------------------------+
|                            	1  |
+----------------------------------+

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global innodb_flush_log_at_trx_commit=1; select @@innodb_flush_log_at_trx_commit;"'

+----------------------------------+

| @@innodb_flush_log_at_trx_commit |

+----------------------------------+

| 1 |

+----------------------------------+

Next, ensure the Donor instance retains the required write-sets to serve an Incremental State Transfer (IST) to the Joiners after the snapshot is restored. This is done by freezing the Gcache with the following command:

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global wsrep_provider_options="gcache.freeze_purge_at_seqno = now""'

1	$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global wsrep_provider_options="gcache.freeze_purge_at_seqno = now""'

In write-intensive workloads, you may want to increase the pxc.livenessProbes.initialDelaySeconds from its default value of 300 seconds. This allows the instance more time to apply IST write sets before the liveness probe checks kick in, reducing the risk of premature pod restarts during recovery. Please note that this change will trigger a restart of the PXC pods, which is not the intended outcome of this procedure. This applies to both snapshot and regular XtraBackup SST. So, if you’ve previously handled SST under a heavy workload on this cluster, the pxc.livenessProbes.initialDelaySeconds setting should already be adjusted accordingly.

The next step is to create the sleep-forever file inside the pxc-0 data directory. This ensures the file is included in the snapshot and will be present on the Joiner nodes after restore. This will prevent the MySQL process from automatically running, giving the chance to adjust the node before it joins the cluster.

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'touch /var/lib/mysql/sleep-forever; sync;'

1	$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'touch /var/lib/mysql/sleep-forever; sync;'

The following step is taking the snapshot from the pxc-0 pod PVC:

$ cat snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pxc-pvc-snapshot
spec:
  volumeSnapshotClassName: snapshot-class
  source:
	persistentVolumeClaimName: datadir-cluster1-pxc-0

$ kubectl apply -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/pxc-pvc-snapshot created

$ cat snapshot.yaml

apiVersion: snapshot.storage.k8s.io/v1

kind: VolumeSnapshot

metadata:

spec:

volumeSnapshotClassName: snapshot-class

source:

persistentVolumeClaimName: datadir-cluster1-pxc-0

$ kubectl apply -f snapshot.yaml

volumesnapshot.snapshot.storage.k8s.io/pxc-pvc-snapshot created

You can check the snapshot’s status. Once READYTOUSE changes to true, it will be ready.

$ kubectl get VolumeSnapshot -w
NAME           	READYTOUSE   SOURCEPVC            	SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS	SNAPSHOTCONTENT                                	CREATIONTIME   AGE
pxc-pvc-snapshot   false    	datadir-cluster1-pxc-0                       	1Ti       	snapshot-class   snapcontent-ded52414-2974-445f-b8a9-69610ee30da9   15s        	16s
pxc-pvc-snapshot   true     	datadir-cluster1-pxc-0                       	1Ti       	snapshot-class   snapcontent-ded52414-2974-445f-b8a9-69610ee30da9   29s        	30s

$ kubectl get VolumeSnapshot -w

NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME AGE

pxc-pvc-snapshot false datadir-cluster1-pxc-0 1Ti snapshot-class snapcontent-ded52414-2974-445f-b8a9-69610ee30da9 15s 16s

pxc-pvc-snapshot true datadir-cluster1-pxc-0 1Ti snapshot-class snapcontent-ded52414-2974-445f-b8a9-69610ee30da9 29s 30s

Then, we must scale down the cluster to restore the snapshot. We will first need to set the spec.unsafeFlags.pxcSize to true to allow the cluster to scale down.

$ kubectl patch pxc cluster1 --type=merge -p '{"spec":{"unsafeFlags":{"pxcSize": true}}}'
perconaxtradbcluster.pxc.percona.com/cluster1 patched

1 2	$ kubectl patch pxc cluster1 --type=merge -p '{"spec":{"unsafeFlags":{"pxcSize": true}}}' perconaxtradbcluster.pxc.percona.com/cluster1 patched

Once done, we can set only one replica for the PXC cluster:

$ kubectl scale --replicas=1 pxc/cluster1
perconaxtradbcluster.pxc.percona.com/cluster1 scaled

1 2	$ kubectl scale --replicas=1 pxc/cluster1 perconaxtradbcluster.pxc.percona.com/cluster1 scaled

We’ll see that only the pxc-0 pod is running:

$ kubectl get pods
NAME                                           	READY   STATUS	RESTARTS    	AGE
cluster1-haproxy-0                             	2/2 	Running   6 (3h26m ago)   3h44m
cluster1-haproxy-1                             	2/2 	Running   0           	3h23m
cluster1-haproxy-2                             	2/2 	Running   0           	3h22m
cluster1-pxc-0                                 	3/3 	Running   1 (3h24m ago)   3h44m

$ kubectl get pods

NAME READY STATUS RESTARTS AGE

cluster1-haproxy-0 2/2 Running 6 (3h26m ago) 3h44m

cluster1-haproxy-1 2/2 Running 0 3h23m

cluster1-haproxy-2 2/2 Running 0 3h22m

cluster1-pxc-0 3/3 Running 1 (3h24m ago) 3h44m

Now we can delete the cluster1-pxc-1 PVC:

$ kubectl delete pvc datadir-cluster1-pxc-1
persistentvolumeclaim "datadir-cluster1-pxc-1" deleted

1 2	$ kubectl delete pvc datadir-cluster1-pxc-1 persistentvolumeclaim "datadir-cluster1-pxc-1" deleted

We can also restore the snapshot to a new PVC with the same name. The target PVC size should be at least the same as the original:

$ cat restore.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-cluster1-pxc-1
spec:
  accessModes:
	- ReadWriteOnce
  resources:
	requests:
  	storage: 1024Gi
  dataSource:
	name: pxc-pvc-snapshot
	kind: VolumeSnapshot
	apiGroup: snapshot.storage.k8s.io

$ kubectl apply -f restore.yaml
persistentvolumeclaim/datadir-cluster1-pxc-1 created

$ cat restore.yaml

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

spec:

accessModes:

- ReadWriteOnce

resources:

requests:

storage: 1024Gi

dataSource:

kind: VolumeSnapshot

apiGroup: snapshot.storage.k8s.io

$ kubectl apply -f restore.yaml

persistentvolumeclaim/datadir-cluster1-pxc-1 created

In this case, the restored snapshot PVC shows as Pending since the storage class volumeBindingMode is WaitForFirstConsumer. This means that it will wait until the pod starts to bind the PVC:

$ kubectl get pvc
NAME                 	STATUS	VOLUME                                 	CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
datadir-cluster1-pxc-0   Bound 	pvc-1885ffe5-d99e-40c9-a05e-3c2f1443d504   1Ti    	RWO        	standard-rwo   <unset>             	9d
datadir-cluster1-pxc-1   Pending                                                                    	standard-rwo   <unset>             	14s
datadir-cluster1-pxc-2   Bound 	pvc-24d3aa1a-53fa-42b1-b075-dfc9af358979   1Ti    	RWO        	standard-rwo   <unset>             	3h49m

$ kubectl get pvc

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE

datadir-cluster1-pxc-0 Bound pvc-1885ffe5-d99e-40c9-a05e-3c2f1443d504 1Ti RWO standard-rwo <unset> 9d

datadir-cluster1-pxc-1 Pending standard-rwo <unset> 14s

datadir-cluster1-pxc-2 Bound pvc-24d3aa1a-53fa-42b1-b075-dfc9af358979 1Ti RWO standard-rwo <unset> 3h49m

Now we can scale up the cluster to start the pod pxc-1:

$ kubectl scale --replicas=2 pxc/cluster1
perconaxtradbcluster.pxc.percona.com/cluster1 scaled

1 2	$ kubectl scale --replicas=2 pxc/cluster1 perconaxtradbcluster.pxc.percona.com/cluster1 scaled

We see the datadir-cluster1-pxc-1 is bound:

$ kubectl get pvc
NAME                 	STATUS   VOLUME                                 	CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
datadir-cluster1-pxc-0   Bound	pvc-1885ffe5-d99e-40c9-a05e-3c2f1443d504   1Ti    	RWO        	standard-rwo   <unset>             	9d
datadir-cluster1-pxc-1   Bound	pvc-d1cf2ed5-6491-4c6e-a1a2-92d41d899808   1Ti    	RWO        	standard-rwo   <unset>             	5m27s
datadir-cluster1-pxc-2   Bound	pvc-24d3aa1a-53fa-42b1-b075-dfc9af358979   1Ti    	RWO        	standard-rwo   <unset>             	3h54m

$ kubectl get pvc

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE

datadir-cluster1-pxc-0 Bound pvc-1885ffe5-d99e-40c9-a05e-3c2f1443d504 1Ti RWO standard-rwo <unset> 9d

datadir-cluster1-pxc-1 Bound pvc-d1cf2ed5-6491-4c6e-a1a2-92d41d899808 1Ti RWO standard-rwo <unset> 5m27s

datadir-cluster1-pxc-2 Bound pvc-24d3aa1a-53fa-42b1-b075-dfc9af358979 1Ti RWO standard-rwo <unset> 3h54m

And the pod state shows as running:

$ kubectl get pods cluster1-pxc-1
NAME         	READY   STATUS	RESTARTS   AGE
cluster1-pxc-1   3/3 	Running   0      	2m2s

$ kubectl get pods cluster1-pxc-1

NAME READY STATUS RESTARTS AGE

cluster1-pxc-1 3/3 Running 0 2m2s

Note that since we added the sleep-forever file, the MySQL process is not running.

We’ll need to delete the auto.cnf file, as it contains the pxc-0 MySQL server_uuid. Additionally, we must remove the gvwstate.dat file, which stores the Galera Primary Component information and the Galera node’s UUID, also inherited from pxc-0. Finally, we delete the sleep-forever file to allow the container to start the MySQL process:

$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'rm /var/lib/mysql/auto.cnf; rm /var/lib/mysql/gvwstate.dat; rm /var/lib/mysql/sleep-forever;'

1	$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'rm /var/lib/mysql/auto.cnf; rm /var/lib/mysql/gvwstate.dat; rm /var/lib/mysql/sleep-forever;'

We check the pods state:

$ kubectl get pods cluster1-pxc-1
NAME         	READY   STATUS	RESTARTS  	AGE
cluster1-pxc-1   3/3 	Running   1 (60s ago)   9m25s

$ kubectl get pods cluster1-pxc-1

NAME READY STATUS RESTARTS AGE

cluster1-pxc-1 3/3 Running 1 (60s ago) 9m25s

You can check the wsrep_cluster_status status variable to confirm the node is now part of the Primary Component:

$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like "wsrep_cluster_status";"'
+----------------------+---------+
| Variable_name    	| Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+

$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like "wsrep_cluster_status";"'

+----------------------+---------+

| Variable_name | Value |

+----------------------+---------+

| wsrep_cluster_status | Primary |

+----------------------+---------+

When using XtraBackup for SST, the process took approximately 75 minutes per instance to complete. In contrast, using volume snapshots, a node with a 500 GiB database was fully synced to the cluster in just 10 minutes.

You can reuse the same snapshot to repeat the process and add more Joiner nodes if necessary. This allows for efficient scaling without the overhead of creating new backups for each node.

Once the procedure is complete, we need to revert all the changes made to the cluster and to the pod pxc-0:

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global innodb_flush_log_at_trx_commit=2; select @@innodb_flush_log_at_trx_commit;"'
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global wsrep_provider_options="gcache.freeze_purge_at_seqno = -1""'
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'
$ kubectl patch pxc cluster1 --type=merge -p '{"spec":{"unsafeFlags":{"pxcSize": false}}}'
perconaxtradbcluster.pxc.percona.com/cluster1 patched

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global innodb_flush_log_at_trx_commit=2; select @@innodb_flush_log_at_trx_commit;"'

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global wsrep_provider_options="gcache.freeze_purge_at_seqno = -1""'

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'

$ kubectl patch pxc cluster1 --type=merge -p '{"spec":{"unsafeFlags":{"pxcSize": false}}}'

perconaxtradbcluster.pxc.percona.com/cluster1 patched

Finally, we delete the volume snapshot:

$ kubectl delete VolumeSnapshot pxc-pvc-snapshot
volumesnapshot.snapshot.storage.k8s.io "pxc-pvc-snapshot" deleted

1 2	$ kubectl delete VolumeSnapshot pxc-pvc-snapshot volumesnapshot.snapshot.storage.k8s.io "pxc-pvc-snapshot" deleted

Offline:

In case there is a remaining node in the cluster other than pxc-0 or in case we want to perform a safer procedure without taking into account durability or IST, we can perform the offline method, which has the following steps:

Scale down the cluster so that no PXC pods are running (0 replicas).
Take a snapshot of the healthy pod’s PVC.
Delete the PVCs associated with the Joiner nodes you want to recreate.
Restore each Joiner PVC from the snapshot, ensuring the new volumes are fully populated before scaling the cluster back up.

In this scenario, let’s assume that pxc-2 is the only node currently part of the Primary Component, while pxc-0 and pxc-1 require SST to rejoin the cluster.

Similar to the online method, we’ll need to create the sleep-forever file inside the healthy node’s data directory. This file will be present in the restored PVCs, allowing us to pause the Joiner nodes on startup and perform any necessary adjustments before they attempt to join the cluster.

$ kubectl exec -it cluster1-pxc-2 -c pxc -- sh -c 'touch /var/lib/mysql/sleep-forever; sync;'

1	$ kubectl exec -it cluster1-pxc-2 -c pxc -- sh -c 'touch /var/lib/mysql/sleep-forever; sync;'

We will need to set the spec.unsafeFlags.pxcSize to true to allow the cluster to scale down.

$ kubectl patch pxc cluster1 --type=merge -p '{"spec":{"unsafeFlags":{"pxcSize": true}}}'
perconaxtradbcluster.pxc.percona.com/cluster1 patched

1 2	$ kubectl patch pxc cluster1 --type=merge -p '{"spec":{"unsafeFlags":{"pxcSize": true}}}' perconaxtradbcluster.pxc.percona.com/cluster1 patched

Once done, we scaled down the replicas to 0 for the PXC cluster:

$ kubectl scale --replicas=0 pxc/cluster1
perconaxtradbcluster.pxc.percona.com/cluster1 scaled

1 2	$ kubectl scale --replicas=0 pxc/cluster1 perconaxtradbcluster.pxc.percona.com/cluster1 scaled

We check that all PXC pods are stopped:

$ kubectl get pods
NAME                                           	READY   STATUS	RESTARTS   	AGE
cluster1-haproxy-0                             	1/2 	Running   22 (16m ago)   85m
cluster1-haproxy-1                             	1/2 	Running   16 (16m ago)   70m
cluster1-haproxy-2                             	1/2 	Running   12 (16m ago)   70m
percona-xtradb-cluster-operator-549d44ddf7-dqzd5   1/1 	Running   0          	3d2h

$ kubectl get pods

NAME READY STATUS RESTARTS AGE

cluster1-haproxy-0 1/2 Running 22 (16m ago) 85m

cluster1-haproxy-1 1/2 Running 16 (16m ago) 70m

cluster1-haproxy-2 1/2 Running 12 (16m ago) 70m

percona-xtradb-cluster-operator-549d44ddf7-dqzd5 1/1 Running 0 3d2h

Then, we take a snapshot from the healthy pod PVC. Since the database is stopped, this will be a database-consistent snapshot:

$ cat snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pxc-pvc-snapshot
spec:
  volumeSnapshotClassName: snapshot-class
  source:
	persistentVolumeClaimName: datadir-cluster1-pxc-2

$ kubectl apply -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/pxc-pvc-snapshot created

$ cat snapshot.yaml

apiVersion: snapshot.storage.k8s.io/v1

kind: VolumeSnapshot

metadata:

spec:

volumeSnapshotClassName: snapshot-class

source:

persistentVolumeClaimName: datadir-cluster1-pxc-2

$ kubectl apply -f snapshot.yaml

volumesnapshot.snapshot.storage.k8s.io/pxc-pvc-snapshot created

We’ll wait until the snapshot is ready to restore:

$ kubectl get VolumeSnapshot -w
NAME           	READYTOUSE   SOURCEPVC            	SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS	SNAPSHOTCONTENT                                	CREATIONTIME   AGE
pxc-pvc-snapshot   false    	datadir-cluster1-pxc-2                       	1Ti       	snapshot-class   snapcontent-76c31d76-6845-4238-ae33-a86f3fb0a61b   7s         	8s
pxc-pvc-snapshot   true     	datadir-cluster1-pxc-2                       	1Ti       	snapshot-class   snapcontent-76c31d76-6845-4238-ae33-a86f3fb0a61b   3m34s      	3m35s

$ kubectl get VolumeSnapshot -w

NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME AGE

pxc-pvc-snapshot false datadir-cluster1-pxc-2 1Ti snapshot-class snapcontent-76c31d76-6845-4238-ae33-a86f3fb0a61b 7s 8s

pxc-pvc-snapshot true datadir-cluster1-pxc-2 1Ti snapshot-class snapcontent-76c31d76-6845-4238-ae33-a86f3fb0a61b 3m34s 3m35s

We check the PVC status:

$ kubectl get pvc
NAME                 	STATUS   VOLUME                                 	CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
datadir-cluster1-pxc-0   Bound	pvc-2ef7b661-0f19-47cf-ad32-23d2faa19da1   1Ti    	RWO        	standard-rwo   <unset>             	32m
datadir-cluster1-pxc-1   Bound	pvc-002bc256-4093-4888-aabb-93d31e9e3d52   1Ti    	RWO        	standard-rwo   <unset>             	32m
datadir-cluster1-pxc-2   Bound	pvc-b06c01ec-8fd6-4ab3-97da-d4990f3a75df   1Ti    	RWO        	standard-rwo   <unset>             	32m

$ kubectl get pvc

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE

datadir-cluster1-pxc-0 Bound pvc-2ef7b661-0f19-47cf-ad32-23d2faa19da1 1Ti RWO standard-rwo <unset> 32m

datadir-cluster1-pxc-1 Bound pvc-002bc256-4093-4888-aabb-93d31e9e3d52 1Ti RWO standard-rwo <unset> 32m

datadir-cluster1-pxc-2 Bound pvc-b06c01ec-8fd6-4ab3-97da-d4990f3a75df 1Ti RWO standard-rwo <unset> 32m

We delete the PVCs from the Joiner pods, in this case, pxc-0 and pxc-1:

$ kubectl delete pvc datadir-cluster1-pxc-0 datadir-cluster1-pxc-1
persistentvolumeclaim "datadir-cluster1-pxc-0" deleted
persistentvolumeclaim "datadir-cluster1-pxc-1" deleted

$ kubectl delete pvc datadir-cluster1-pxc-0 datadir-cluster1-pxc-1

persistentvolumeclaim "datadir-cluster1-pxc-0" deleted

persistentvolumeclaim "datadir-cluster1-pxc-1" deleted

We restore the snapshot into pxc-0 and pxc-1 PVCs:

$ cat restore0.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-cluster1-pxc-0
spec:
  accessModes:
	- ReadWriteOnce
  resources:
	requests:
  	storage: 1024Gi
  dataSource:
	name: pxc-pvc-snapshot
	kind: VolumeSnapshot
	apiGroup: snapshot.storage.k8s.io

$ kubectl apply -f restore0.yaml
persistentvolumeclaim/datadir-cluster1-pxc-0 created

$ cat restore1.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-cluster1-pxc-1
spec:
  accessModes:
	- ReadWriteOnce
  resources:
	requests:
  	storage: 1024Gi
  dataSource:
	name: pxc-pvc-snapshot
	kind: VolumeSnapshot
	apiGroup: snapshot.storage.k8s.io

$ kubectl apply -f restore1.yaml
persistentvolumeclaim/datadir-cluster1-pxc-1 created

$ cat restore0.yaml

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

spec:

accessModes:

- ReadWriteOnce

resources:

requests:

storage: 1024Gi

dataSource:

kind: VolumeSnapshot

apiGroup: snapshot.storage.k8s.io

$ kubectl apply -f restore0.yaml

persistentvolumeclaim/datadir-cluster1-pxc-0 created

$ cat restore1.yaml

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

spec:

accessModes:

- ReadWriteOnce

resources:

requests:

storage: 1024Gi

dataSource:

kind: VolumeSnapshot

apiGroup: snapshot.storage.k8s.io

$ kubectl apply -f restore1.yaml

persistentvolumeclaim/datadir-cluster1-pxc-1 created

We check that the PVCs are created but pending binding:

$ kubectl get pvc
NAME                 	STATUS	VOLUME                                 	CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
datadir-cluster1-pxc-0   Pending                                                                    	standard-rwo   <unset>             	11s
datadir-cluster1-pxc-1   Pending                                                                    	standard-rwo   <unset>             	8s
datadir-cluster1-pxc-2   Bound 	pvc-b06c01ec-8fd6-4ab3-97da-d4990f3a75df   1Ti    	RWO        	standard-rwo   <unset>             	32m

$ kubectl get pvc

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE

datadir-cluster1-pxc-0 Pending standard-rwo <unset> 11s

datadir-cluster1-pxc-1 Pending standard-rwo <unset> 8s

datadir-cluster1-pxc-2 Bound pvc-b06c01ec-8fd6-4ab3-97da-d4990f3a75df 1Ti RWO standard-rwo <unset> 32m

We scale up the cluster to start all PXC pods:

$ kubectl scale --replicas=3 pxc/cluster1
perconaxtradbcluster.pxc.percona.com/cluster1 scaled

1 2	$ kubectl scale --replicas=3 pxc/cluster1 perconaxtradbcluster.pxc.percona.com/cluster1 scaled

We wait until all PVCs are bound:

$ kubectl get pvc
NAME                 	STATUS   VOLUME                                 	CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
datadir-cluster1-pxc-0   Bound	pvc-63b25d46-983f-49b9-bffa-d10dff3ed861   1Ti    	RWO        	standard-rwo   <unset>             	5m50s
datadir-cluster1-pxc-1   Bound	pvc-5aeea987-3a25-44d8-8776-723414dc1ee3   1Ti    	RWO        	standard-rwo   <unset>             	5m47s
datadir-cluster1-pxc-2   Bound	pvc-b06c01ec-8fd6-4ab3-97da-d4990f3a75df   1Ti    	RWO        	standard-rwo   <unset>             	38m

$ kubectl get pvc

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE

datadir-cluster1-pxc-0 Bound pvc-63b25d46-983f-49b9-bffa-d10dff3ed861 1Ti RWO standard-rwo <unset> 5m50s

datadir-cluster1-pxc-1 Bound pvc-5aeea987-3a25-44d8-8776-723414dc1ee3 1Ti RWO standard-rwo <unset> 5m47s

datadir-cluster1-pxc-2 Bound pvc-b06c01ec-8fd6-4ab3-97da-d4990f3a75df 1Ti RWO standard-rwo <unset> 38m

And wait until all PXC pods state is Running:

$ kubectl get pods
NAME                                           	READY   STATUS	RESTARTS     	AGE
cluster1-haproxy-0                             	1/2 	Running   26 (2m55s ago)   98m
cluster1-haproxy-1                             	1/2 	Running   21 (109s ago)	83m
cluster1-haproxy-2                             	1/2 	Running   16 (2m53s ago)   83m
cluster1-pxc-0                                 	3/3 	Running   0            	7m5s
cluster1-pxc-1                                 	3/3 	Running   0            	3m43s
cluster1-pxc-2                                 	3/3 	Running   0            	60s
percona-xtradb-cluster-operator-549d44ddf7-dqzd5   1/1 	Running   0            	3d3h

$ kubectl get pods

NAME READY STATUS RESTARTS AGE

cluster1-haproxy-0 1/2 Running 26 (2m55s ago) 98m

cluster1-haproxy-1 1/2 Running 21 (109s ago) 83m

cluster1-haproxy-2 1/2 Running 16 (2m53s ago) 83m

cluster1-pxc-0 3/3 Running 0 7m5s

cluster1-pxc-1 3/3 Running 0 3m43s

cluster1-pxc-2 3/3 Running 0 60s

percona-xtradb-cluster-operator-549d44ddf7-dqzd5 1/1 Running 0 3d3h

Since we added the sleep-forever file, the pods did not start the MySQL process.

Next, we need to check the grastate.dat to know if it flags it as safe to bootstrap.

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'cat /var/lib/mysql/grastate.dat'
# GALERA saved state
version: 2.1
uuid:	b28cd6f8-0f65-11f0-a4bc-aaaa4d86036e
seqno:   2422856
safe_to_bootstrap: 0

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'cat /var/lib/mysql/grastate.dat'

# GALERA saved state

version: 2.1

uuid: b28cd6f8-0f65-11f0-a4bc-aaaa4d86036e

seqno: 2422856

safe_to_bootstrap: 0

In this case, the safe_to_bootstrap value is set to 0, because when scaling down, the pxc-2 pod, the last active member of the Primary Component, was the first to be stopped. Meanwhile, the other pods (pxc-0 and pxc-1) were still connected and requesting SST, which prevents the cluster from marking any node as safe to bootstrap.

We’ll set safe_to_bootstrap to 1 on the pxc-0 node. This ensures that when the pod starts, it will bootstrap the cluster and become the primary member, allowing the cluster to initiate faster without waiting for other nodes. We’re also removing the auto.cnf file, since it contains the same server_uuid as the original pxc-2 node, from which the snapshot was taken.

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'rm /var/lib/mysql/auto.cnf; sed -i "s/safe_to_bootstrap: 0/safe_to_bootstrap: 1/g" /var/lib/mysql/grastate.dat;'

1	$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'rm /var/lib/mysql/auto.cnf; sed -i "s/safe_to_bootstrap: 0/safe_to_bootstrap: 1/g" /var/lib/mysql/grastate.dat;'

On pxc-1, we only remove the auto.cnf file:

$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'rm /var/lib/mysql/auto.cnf;'

1	$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'rm /var/lib/mysql/auto.cnf;'

As for pxc-2, we don’t need to modify anything.

We remove the sleep-forever file in all pods to restart them:

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'
$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'
$ kubectl exec -it cluster1-pxc-2 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'

$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'

$ kubectl exec -it cluster1-pxc-2 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'

We check the pods until all are in Running state:

$ kubectl get pods
NAME                                           	READY   STATUS	RESTARTS     	AGE
cluster1-haproxy-0                             	2/2 	Running   28 (4m9s ago)	105m
cluster1-haproxy-1                             	2/2 	Running   22 (6m33s ago)   91m
cluster1-haproxy-2                             	2/2 	Running   18 (4m7s ago)	90m
cluster1-pxc-0                                 	3/3 	Running   1 (2m42s ago)	14m
cluster1-pxc-1                                 	3/3 	Running   2 (65s ago)  	10m
cluster1-pxc-2                                 	3/3 	Running   2 (84s ago)  	8m14s
percona-xtradb-cluster-operator-549d44ddf7-dqzd5   1/1 	Running   0            	3d3h

$ kubectl get pods

NAME READY STATUS RESTARTS AGE

cluster1-haproxy-0 2/2 Running 28 (4m9s ago) 105m

cluster1-haproxy-1 2/2 Running 22 (6m33s ago) 91m

cluster1-haproxy-2 2/2 Running 18 (4m7s ago) 90m

cluster1-pxc-0 3/3 Running 1 (2m42s ago) 14m

cluster1-pxc-1 3/3 Running 2 (65s ago) 10m

cluster1-pxc-2 3/3 Running 2 (84s ago) 8m14s

percona-xtradb-cluster-operator-549d44ddf7-dqzd5 1/1 Running 0 3d3h

We can also verify that each node has successfully joined the cluster by checking the wsrep_cluster_status status variable. A value of “Primary” confirms that the node is part of the Primary Component.

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like "wsrep_cluster_status";"'
+----------------------+---------+
| Variable_name    	| Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+

$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like "wsrep_cluster_status";"'
+----------------------+---------+
| Variable_name    	| Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+

$ kubectl exec -it cluster1-pxc-2 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like "wsrep_cluster_status";"'
+----------------------+---------+
| Variable_name    	| Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+

$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like "wsrep_cluster_status";"'

+----------------------+---------+

| Variable_name | Value |

+----------------------+---------+

| wsrep_cluster_status | Primary |

+----------------------+---------+

$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like "wsrep_cluster_status";"'

+----------------------+---------+

| Variable_name | Value |

+----------------------+---------+

| wsrep_cluster_status | Primary |

+----------------------+---------+

$ kubectl exec -it cluster1-pxc-2 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like "wsrep_cluster_status";"'

+----------------------+---------+

| Variable_name | Value |

+----------------------+---------+

| wsrep_cluster_status | Primary |

+----------------------+---------+

Finally, once all Joiner nodes are up and part of the Primary Component, we can safely delete the VolumeSnapshot to free up resources and avoid unnecessary storage costs.

$ kubectl delete VolumeSnapshot pxc-pvc-snapshot
volumesnapshot.snapshot.storage.k8s.io "pxc-pvc-snapshot" deleted

1 2	$ kubectl delete VolumeSnapshot pxc-pvc-snapshot volumesnapshot.snapshot.storage.k8s.io "pxc-pvc-snapshot" deleted

In just 15 minutes, we successfully joined two nodes, each with a 500 GiB dataset, using the snapshot-based restore procedure. In contrast, performing the same operation with XtraBackup SST would take several hours, due to the time required to create, transfer, and apply a full physical backup.

Conclusion:

This approach significantly reduces the time and resources required to scale the Percona Operator for MySQL based on Percona XtraDB Cluster in Kubernetes environments. By leveraging VolumeSnapshots, we eliminate the overhead of full backup and restore cycles, reduce network traffic, and accelerate adding nodes to the cluster. It’s a powerful alternative to traditional SST, especially in cloud-native deployments where time, cost, and efficiency matter.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Why Percona for MongoDB?

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Speeding Up Percona XtraDB Cluster State Transfers with Kubernetes Volume Snapshots

SST based on K8s Volume Snapshots:

Disclaimer:

Prerequisites:

Online:

Offline:

Conclusion:

Related

Related Blog Articles

RECOMMENDED ARTICLES

pgvector: The Critical PostgreSQL Component for Your Enterprise AI Strategy