AWS Cluster Autoscaler

This page explains how to configure the Cluster Autoscaler for node pools. The Cluster Autoscaler automatically scales the number of worker nodes in a cluster up or down, based on the number of pending pods waiting to be scheduled. Running the Cluster Autoscaler is optional.

Unlike the Horizontal Pod Autoscaler, the Cluster Autoscaler does not depend on a Metrics Server and does not need Prometheus or any other metrics source.

The Cluster Autoscaler looks at the following annotations on a MachineDeployment to determine its scale-up and scale-down ranges:

CODE

cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size
cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size
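
To check which range is currently set on a node pool, the two annotations can be read back with a JSONPath query. The following is a minimal sketch, assuming CLUSTER_NAME is set as in the other commands on this page. get_autoscaler_range is a hypothetical helper name, not a DKP or kubectl command.

```shell
# Print the autoscaler size range annotated on a MachineDeployment.
# Assumes CLUSTER_NAME is set; get_autoscaler_range is a hypothetical helper.
get_autoscaler_range() (
  nodepool="$1"
  # Dots inside the annotation key must be escaped in a kubectl JSONPath expression.
  prefix='cluster\.x-k8s\.io/cluster-api-autoscaler-node-group'
  min=$(kubectl --kubeconfig="${CLUSTER_NAME}.conf" get machinedeployment "$nodepool" \
    -o jsonpath="{.metadata.annotations.${prefix}-min-size}")
  max=$(kubectl --kubeconfig="${CLUSTER_NAME}.conf" get machinedeployment "$nodepool" \
    -o jsonpath="{.metadata.annotations.${prefix}-max-size}")
  # An empty value means the annotation is not present on the MachineDeployment.
  echo "min=${min:-unset} max=${max:-unset}"
)
```

For example, get_autoscaler_range "${NODEPOOL_NAME}" prints both values on one line, or "unset" for an annotation that has not been applied yet.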

The full list of command-line arguments to the Cluster Autoscaler controller is available in the Kubernetes public GitHub repository.

For more information about how Cluster Autoscaler works, see these documents:

Cluster Autoscaler Prerequisites

Before you begin, you must have:

Run Cluster Autoscaler on the Management Cluster

For a DKP Enterprise Management Cluster or an Essential cluster, complete the following steps to enable Cluster Autoscaler on the self-managed cluster:

Ensure the Cluster Autoscaler controller is up and running, with no restarts and no errors in the logs:

CODE

kubectl --kubeconfig=${CLUSTER_NAME}.conf logs deployments/cluster-autoscaler cluster-autoscaler -n kube-system -f
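
The "no restarts" part of this check can also be verified without reading the logs. The following is a minimal sketch that sums the container restart counts reported for the autoscaler pods. The default label selector app=cluster-autoscaler is an assumption; pass whichever selector matches the pods of your deployment. autoscaler_restart_total is a hypothetical helper name.

```shell
# Sum the restart counts of all Cluster Autoscaler containers in kube-system.
# The default label selector is an assumption; override it with the first argument.
autoscaler_restart_total() (
  selector="${1:-app=cluster-autoscaler}"
  kubectl --kubeconfig="${CLUSTER_NAME}.conf" get pods -n kube-system -l "$selector" \
    -o jsonpath='{.items[*].status.containerStatuses[*].restartCount}' |
    tr ' ' '\n' |
    awk '{ total += $1 } END { print total + 0 }'   # prints 0 when no pods match
)
```

A result of 0 is consistent with a controller that has not restarted. A result of 0 is also printed when the selector matches no pods, so confirm that the selector is correct before relying on it.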

Enable Cluster Autoscaler by setting the minimum and maximum size annotations on the node pool's MachineDeployment:

CODE

kubectl --kubeconfig=${CLUSTER_NAME}.conf annotate machinedeployment ${NODEPOOL_NAME} cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size=2
kubectl --kubeconfig=${CLUSTER_NAME}.conf annotate machinedeployment ${NODEPOOL_NAME} cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size=6
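
The two annotate commands can be combined into one call that also rejects an inverted range before anything is applied. The following is a minimal sketch, assuming CLUSTER_NAME is set. set_autoscaler_range is a hypothetical helper name, and the use of --overwrite (so that the helper can be rerun to change an existing range) is a choice made for this sketch.

```shell
# Apply both autoscaler size annotations to a MachineDeployment in one call.
# set_autoscaler_range is a hypothetical helper, not part of DKP or kubectl.
set_autoscaler_range() (
  nodepool="$1"; min="$2"; max="$3"
  # Refuse an inverted range before touching the cluster.
  if [ "$min" -gt "$max" ]; then
    echo "min size ($min) must not exceed max size ($max)" >&2
    exit 1
  fi
  prefix='cluster.x-k8s.io/cluster-api-autoscaler-node-group'
  # --overwrite replaces existing values instead of failing if they are already set.
  kubectl --kubeconfig="${CLUSTER_NAME}.conf" annotate machinedeployment "$nodepool" \
    "${prefix}-min-size=${min}" "${prefix}-max-size=${max}" --overwrite
)
```

For example, set_autoscaler_range "${NODEPOOL_NAME}" 2 6 applies the same range as the two commands above.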

The Cluster Autoscaler logs will show that the worker nodes are associated with node-groups and that pending pods are being watched.

To confirm that autoscaling works, create a deployment large enough to leave some pods pending. This example was sized for AWS m5.2xlarge worker nodes; if your worker nodes are larger, increase the number of replicas accordingly.

CODE

cat <<EOF | kubectl --kubeconfig=${CLUSTER_NAME}.conf apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-deployment
  labels:
    app: busybox
spec:
  replicas: 600
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox:latest
        command:
          - sleep
          - "3600"
        imagePullPolicy: IfNotPresent
      restartPolicy: Always
EOF

Cluster Autoscaler scales up the number of worker nodes until there are no pending pods.
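
One way to follow the scale-up is to count the pods that are still pending. The following is a minimal sketch, assuming CLUSTER_NAME is set and that the busybox deployment was created in the current namespace, as in the manifest above. pending_pod_count is a hypothetical helper name.

```shell
# Count pods in the current namespace that are still in the Pending phase.
# pending_pod_count is a hypothetical helper, not part of DKP or kubectl.
pending_pod_count() (
  kubectl --kubeconfig="${CLUSTER_NAME}.conf" get pods \
    --field-selector=status.phase=Pending --no-headers 2>/dev/null |
    awk 'NF { n++ } END { print n + 0 }'   # count non-empty lines; 0 if none
)
```

Running the helper repeatedly should show the count falling toward 0 as new worker nodes join the cluster.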

Scale down the number of replicas for busybox-deployment:

CODE

kubectl --kubeconfig=${CLUSTER_NAME}.conf scale --replicas=30 deployment/busybox-deployment

Cluster Autoscaler starts to scale down the number of worker nodes after the default timeout of 10 minutes.
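
Because the scale-down is delayed, it can be convenient to poll the node count instead of checking by hand. The following is a minimal sketch, assuming CLUSTER_NAME is set. node_count and wait_for_scale_down are hypothetical helper names, and the default interval and attempt count are illustrative values only. Note that the count includes control plane nodes as well as worker nodes.

```shell
# Count all nodes in the cluster (control plane nodes included).
node_count() (
  kubectl --kubeconfig="${CLUSTER_NAME}.conf" get nodes --no-headers 2>/dev/null |
    awk 'NF { n++ } END { print n + 0 }'
)

# Poll until the node count is at most $1.
# $2 = seconds between checks (default 30), $3 = maximum checks (default 40).
wait_for_scale_down() (
  target="$1"; interval="${2:-30}"; attempts="${3:-40}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    current=$(node_count)
    echo "nodes: $current"
    if [ "$current" -le "$target" ]; then
      exit 0   # reached the target
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  exit 1       # gave up before reaching the target
)
```

With the defaults, wait_for_scale_down 5 checks every 30 seconds for up to 20 minutes and returns a non-zero status if the cluster still has more than 5 nodes at the end.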

Run Cluster Autoscaler on a Managed (Workload) Cluster

Unlike the management (self-managed) cluster instructions above, running the autoscaler for a managed cluster requires an additional autoscaler instance. This instance runs on the management cluster but must be pointed at the managed cluster. The dkp create cluster command that builds the managed cluster must be run against the management cluster, so that the ClusterResourceSet for that cluster’s autoscaler is modified to deploy the autoscaler on the management cluster itself. The flags for cluster-autoscaler change as well.

Create a secret in the managed cluster that contains a kubeconfig file for the master cluster. The kubeconfig user should have limited permissions, sufficient only to modify resources for the given cluster.
Mount the secret into the cluster-autoscaler deployment.
Add the following flag to the cluster-autoscaler command, where /mnt//masterconfig/value is the path at which the master cluster’s kubeconfig is loaded from the mounted secret:

CODE

--cloud-config=/mnt//masterconfig/value
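
The first step, creating the secret, can be sketched as follows. This is a hedged illustration only: create_kubeconfig_secret is a hypothetical helper name, the default secret name masterconfig is an assumption, and kube-system is used because that is the namespace of the cluster-autoscaler deployment shown earlier on this page. The secret key is named value so that, once the secret is mounted, the file name matches the last component of the path passed to --cloud-config.

```shell
# Store a limited-permission kubeconfig in a secret under the key "value".
# $1 = kubeconfig of the cluster in which the secret is created
# $2 = path to the limited-permission kubeconfig file to store
# $3 = secret name (hypothetical default: masterconfig)
create_kubeconfig_secret() (
  target_kubeconfig="$1"
  source_kubeconfig="$2"
  secret_name="${3:-masterconfig}"
  # A mounted secret exposes each key as a file, so this key becomes ".../value".
  kubectl --kubeconfig="$target_kubeconfig" create secret generic "$secret_name" \
    -n kube-system --from-file=value="$source_kubeconfig"
)
```

The directory at which the secret is mounted in the cluster-autoscaler deployment must then match the directory portion of the path given to the flag above.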
