Update NVIDIA GPU Clusters
Upgrading a node pool involves draining the existing nodes in the node pool and replacing them with new nodes. To minimize downtime and keep critical application workloads highly available during the upgrade, we recommend deploying a Pod Disruption Budget (PDB) for each of your critical applications.
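A Pod Disruption Budget limits how many replicas of an application can be evicted at the same time during voluntary disruptions such as a node drain. This reduces the impact on critical applications of misconfiguration or failures during the upgrade process.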
Deploy Pod Disruption Budget (PDB)
Konvoy Image Builder (KIB)
1. Deploy a Pod Disruption Budget for your critical applications.
If your application can tolerate only one unavailable replica at a time, set the Pod Disruption Budget as shown in the example manifest in this step. The example is for NVIDIA GPU node pools, but the process is the same for all node pools.
Repeat this step for each additional node pool.
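To make the effect of a budget of one unavailable replica concrete, the following is a minimal sketch of the arithmetic behind an integer maxUnavailable value. The helper name allowed_disruptions and the replica counts are hypothetical and used only for illustration; the sketch ignores percentage values and other settings that Kubernetes also supports.

```shell
# Hypothetical helper (not part of kubectl or Kubernetes):
#   allowed_disruptions REPLICAS HEALTHY MAX_UNAVAILABLE
# Prints how many pods could be evicted right now under the budget.
allowed_disruptions() {
  replicas=$1
  healthy=$2
  max_unavailable=$3

  # Pods that must stay healthy to honor the budget.
  desired_healthy=$((replicas - max_unavailable))

  # Evictions are allowed only while healthy pods exceed that floor.
  allowed=$((healthy - desired_healthy))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

allowed_disruptions 3 3 1   # prints 1: one pod may be evicted
allowed_disruptions 3 2 1   # prints 0: the drain waits for a replica to recover
```

With three replicas and maxUnavailable set to 1, a node drain can evict one pod, and further evictions are refused until the replacement pod is healthy.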
Create the file pod-disruption-budget-nvidia.yaml with the following content:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nvidia-critical-app
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: nvidia-critical-app
Apply the YAML file above using the following command:
kubectl create -f pod-disruption-budget-nvidia.yaml
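Before you start the upgrade, it can help to confirm that the budget exists and matches your pods. The following is a minimal sketch, assuming the PDB was created in the current namespace of your kubectl context (the same namespace as the application pods) and that the name matches the manifest above. The PDB_NAME variable and the 10-second timeout are illustrative choices, not requirements.

```shell
# Name taken from the example manifest; change it for your application.
PDB_NAME="nvidia-critical-app"

if command -v kubectl >/dev/null 2>&1; then
  # The ALLOWED DISRUPTIONS column shows how many matching pods
  # can be evicted right now without violating the budget.
  kubectl get pdb "$PDB_NAME" --request-timeout=10s \
    || echo "Could not read PDB $PDB_NAME in the current namespace" >&2
else
  echo "kubectl is not on PATH; skipping the PDB check" >&2
fi
```

If ALLOWED DISRUPTIONS shows 0 while all replicas are running, check that the selector label (app: nvidia-critical-app in the example) matches the labels on your application pods.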
2. Prepare the OS image for your node pool using Konvoy Image Builder.