AWS Air-gapped GPU: Create the Management Cluster

Use this procedure to create a self-managed air-gapped AWS GPU Management cluster with DKP. A self-managed cluster refers to one in which the CAPI resources and controllers that describe and manage it are running on the same cluster they are managing. To create a cluster in an AWS Air-gapped environment using GPUs, execute the following:

To increase Docker Hub's rate limit, use your Docker Hub credentials when creating the cluster by setting the following flags on the dkp create cluster command: --registry-mirror-url=https://registry-1.docker.io --registry-mirror-username=<username> --registry-mirror-password=<password>

  1. Give your cluster a unique name suitable for your environment.

    In AWS it is critical that the name is unique, as no two clusters in the same AWS account can have the same name.

  2. Set the environment variable to the name you assigned this cluster:

    CODE
    export CLUSTER_NAME=<aws-example>

    (info) NOTE: The cluster name may only contain the following characters: a-z, 0-9, ., and -. Cluster creation will fail if the name has capital letters. See Kubernetes for more naming information.
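    A quick local check can catch an invalid name before cluster creation fails. The following is a minimal sketch based only on the character rule stated in the note above. The function name validate_cluster_name is hypothetical, and Kubernetes may impose further constraints (such as length limits) that are not checked here.

    ```shell
    # Hypothetical helper: succeeds only if the name is non-empty and
    # contains nothing but a-z, 0-9, '.', and '-'.
    validate_cluster_name() {
      # LC_ALL=C keeps the a-z range from matching capital letters in some locales.
      printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9.-]+$'
    }

    # Example usage:
    #   validate_cluster_name "${CLUSTER_NAME}" || echo "invalid cluster name" >&2
    ```

    Running the check right after the export keeps the failure close to its cause instead of surfacing later during cluster creation.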

  3. Export variables for the existing infrastructure details:

    CODE
    export AWS_VPC_ID=<vpc-...>
    export AWS_SUBNET_IDS=<subnet-...,subnet-...,subnet-...>
    export AWS_ADDITIONAL_SECURITY_GROUPS=<sg-...>
    export AWS_AMI_ID=<ami-...>
    • AWS_VPC_ID: the VPC ID where the cluster will be created. The VPC requires the following AWS VPC Endpoints to be already present:

      • ec2 - com.amazonaws.{region}.ec2

      • elasticloadbalancing - com.amazonaws.{region}.elasticloadbalancing

      • secretsmanager - com.amazonaws.{region}.secretsmanager

      • autoscaling - com.amazonaws.{region}.autoscaling

      • ecr - com.amazonaws.{region}.ecr.api - (authentication)

      • ecr - com.amazonaws.{region}.ecr.dkr - (data transfer)

      For more details, see the AWS documentation on accessing an AWS service using an interface VPC endpoint and the AWS VPC endpoints list.

    • AWS_SUBNET_IDS: a comma-separated list of one or more private Subnet IDs with each one in a different Availability Zone. The cluster control-plane and worker nodes will automatically be spread across these Subnets.

    • AWS_ADDITIONAL_SECURITY_GROUPS: a comma-separated list of one or more Security Group IDs to use in addition to the ones automatically created by CAPA.

    • AWS_AMI_ID: the AMI ID to use for control-plane and worker nodes. The AMI must be created by the konvoy-image-builder.
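    Before creating the cluster, it can help to confirm that the VPC endpoints listed under AWS_VPC_ID above are present. The following is a minimal sketch, not part of DKP: the function names required_vpc_endpoints and missing_vpc_endpoints are hypothetical, and the commented usage assumes the AWS CLI is installed with credentials and that an AWS_REGION variable (not defined in this procedure) holds your region.

    ```shell
    # Print the endpoint service names from the list above for one region.
    required_vpc_endpoints() {
      for svc in ec2 elasticloadbalancing secretsmanager autoscaling ecr.api ecr.dkr; do
        printf 'com.amazonaws.%s.%s\n' "$1" "$svc"
      done
    }

    # Print each required endpoint that is missing from $2, a whitespace-separated
    # list of service names already present in the VPC.
    missing_vpc_endpoints() {
      # Normalize newlines and tabs to spaces so every name is space-delimited.
      present=" $(printf '%s' "$2" | tr '\n\t' '  ') "
      for name in $(required_vpc_endpoints "$1"); do
        case "$present" in
          *" $name "*) ;;                 # endpoint found
          *) printf '%s\n' "$name" ;;     # endpoint missing
        esac
      done
    }

    # Example usage (AWS_REGION is an assumed variable):
    #   present=$(aws ec2 describe-vpc-endpoints \
    #     --filters "Name=vpc-id,Values=${AWS_VPC_ID}" \
    #     --query 'VpcEndpoints[].ServiceName' --output text)
    #   missing_vpc_endpoints "${AWS_REGION}" "$present"
    ```

    An empty result from missing_vpc_endpoints means every listed endpoint was found in the VPC.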

In previous DKP releases, AMI images provided by the upstream CAPA project were used if you did not specify an AMI. However, the upstream images are not recommended for production and may not always be available. Therefore, DKP now requires you to specify an AMI when creating a cluster. To create an AMI, use Konvoy Image Builder.

There are two approaches to supplying your AMI when creating your cluster. Option One is to provide the ID of the AMI directly. Option Two is to provide a way for DKP to discover the AMI using location, format, and OS information supplied through flags. Examples of both options are shown in the dkp create cluster aws code snippets below.

⚠️ IMPORTANT: You must tag the subnets as described below to allow Kubernetes to create External Load Balancers (ELBs) for services of type LoadBalancer in those subnets. If the subnets are not tagged, they will not receive an ELB and the following error displays: Error syncing load balancer, failed to ensure load balancer; could not find any suitable subnets for creating the ELB.

The tags should be set as follows, where <CLUSTER_NAME> corresponds to the name set in the CLUSTER_NAME environment variable:

CODE
kubernetes.io/cluster = <CLUSTER_NAME>
kubernetes.io/cluster/<CLUSTER_NAME> = owned
kubernetes.io/role/internal-elb = 1
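
One way to apply the tags above to every subnet in AWS_SUBNET_IDS is with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and your credentials may tag the subnets. The function name print_subnet_tag_commands is hypothetical, and it only prints the aws ec2 create-tags commands so that you can review them before running anything.

```shell
# Hypothetical helper: print one `aws ec2 create-tags` command per subnet.
#   $1 = cluster name, $2 = comma-separated subnet IDs (same format as AWS_SUBNET_IDS)
print_subnet_tag_commands() {
  printf '%s\n' "$2" | tr ',' '\n' | while IFS= read -r subnet; do
    [ -n "$subnet" ] || continue    # skip empty entries such as a trailing comma
    printf 'aws ec2 create-tags --resources %s --tags Key=kubernetes.io/cluster,Value=%s Key=kubernetes.io/cluster/%s,Value=owned Key=kubernetes.io/role/internal-elb,Value=1\n' \
      "$subnet" "$1" "$1"
  done
}

# Example usage: review the output first, then pipe it to sh to apply the tags.
#   print_subnet_tag_commands "${CLUSTER_NAME}" "${AWS_SUBNET_IDS}"
#   print_subnet_tag_commands "${CLUSTER_NAME}" "${AWS_SUBNET_IDS}" | sh
```

Printing the commands instead of executing them directly keeps the sketch safe to run in an account where you are unsure of your tagging permissions.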
  4. (Optional) Configure your cluster to use an existing container registry as a mirror when attempting to pull images. The example below is for AWS ECR:

    ⚠️ If you do not already have a local registry set up, refer to the Local Registry Tools page for more information.

    ⚠️ IMPORTANT: The AMI must be created by the konvoy-image-builder project in order to use the registry mirror feature.

    CODE
    export REGISTRY_MIRROR_URL=<your_registry_url>
    • REGISTRY_MIRROR_URL: the address of an existing registry, accessible in the VPC, that the new cluster nodes will be configured to use as a mirror registry when pulling images.

    • NOTE: Other local registries may use the options below:

      • JFrog - REGISTRY_CA: (optional) the path on the bastion machine to the registry CA. This value is only needed if the registry is using a self-signed certificate and the AMIs are not already configured to trust this CA.

      • REGISTRY_USERNAME: optional, set to a user that has pull access to this registry.

      • REGISTRY_PASSWORD: optional if username is not set.
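
    Because the CA, username, and password are optional, you may prefer to pass only the --registry-mirror-* flags whose variables are actually set. The following is a minimal sketch of that idea. The function name registry_mirror_flags is hypothetical, and the sketch assumes that none of the values contain whitespace, since the output is meant to be expanded unquoted on a command line.

    ```shell
    # Hypothetical helper: print one --registry-mirror-* flag per line,
    # only for the variables that are set and non-empty.
    registry_mirror_flags() {
      # Without a mirror URL, none of the other flags are meaningful.
      [ -n "${REGISTRY_MIRROR_URL:-}" ] || return 0
      printf '%s\n' "--registry-mirror-url=${REGISTRY_MIRROR_URL}"
      if [ -n "${REGISTRY_CA:-}" ]; then
        printf '%s\n' "--registry-mirror-cacert=${REGISTRY_CA}"
      fi
      if [ -n "${REGISTRY_USERNAME:-}" ]; then
        printf '%s\n' "--registry-mirror-username=${REGISTRY_USERNAME}"
        printf '%s\n' "--registry-mirror-password=${REGISTRY_PASSWORD:-}"
      fi
    }

    # Example usage (unquoted on purpose so each flag becomes its own argument):
    #   dkp create cluster aws --cluster-name=${CLUSTER_NAME} $(registry_mirror_flags) ...
    ```

    If a value such as the password contains spaces or shell metacharacters, pass that flag explicitly and quoted instead of relying on this helper.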

  5. Create a Kubernetes cluster. The following example shows a common configuration. See the dkp create cluster aws reference for the full list of cluster creation options:

DKP uses AWS CSI as the default storage provider. You can use a Kubernetes CSI compatible storage solution that is suitable for production. See the Kubernetes documentation called Changing the Default Storage Class for more information.

  6. Run the Option One command as explained above to create a cluster with a GPU AMI and the --self-managed flag:

    CODE
    dkp create cluster aws \
    --cluster-name=${CLUSTER_NAME} \
    --additional-tags=owner=$(whoami) \
    --with-aws-bootstrap-credentials=true \
    --vpc-id=${AWS_VPC_ID} \
    --ami=${AWS_AMI_ID} \
    --subnet-ids=${AWS_SUBNET_IDS} \
    --internal-load-balancer=true \
    --additional-security-group-ids=${AWS_ADDITIONAL_SECURITY_GROUPS} \
    --registry-mirror-url=${REGISTRY_MIRROR_URL} \
    --registry-mirror-cacert=${REGISTRY_CA} \
    --registry-mirror-username=${REGISTRY_USERNAME} \
    --registry-mirror-password=${REGISTRY_PASSWORD} \
    --self-managed

    OR
    Run the Option Two command as explained above to create a cluster with a GPU AMI providing the location, format and base OS:

    CODE
    dkp create cluster aws \
    --cluster-name=${CLUSTER_NAME} \
    --additional-tags=owner=$(whoami) \
    --with-aws-bootstrap-credentials=true \
    --vpc-id=${AWS_VPC_ID} \
    --ami-owner AWS_ACCOUNT_ID \
    --ami-base-os ubuntu-20.04 \
    --ami-format 'example-{{.BaseOS}}-?{{.K8sVersion}}-*' \
    --subnet-ids=${AWS_SUBNET_IDS} \
    --internal-load-balancer=true \
    --additional-security-group-ids=${AWS_ADDITIONAL_SECURITY_GROUPS} \
    --registry-mirror-url=${REGISTRY_MIRROR_URL} \
    --registry-mirror-cacert=${REGISTRY_CA} \
    --registry-mirror-username=${REGISTRY_USERNAME} \
    --registry-mirror-password=${REGISTRY_PASSWORD} \
    --self-managed

If your environment uses HTTP/HTTPS proxies, you must include the flags --http-proxy, --https-proxy, and --no-proxy and their related values in this command for it to be successful. More information is available in Configuring an HTTP/HTTPS Proxy.

  7. After cluster creation, create the node pool:

    CODE
    dkp create nodepool aws -c ${CLUSTER_NAME} \
    --instance-type p2.xlarge \
    --ami-id=${AMI_ID_FROM_KIB} \
    --replicas=1 ${NODEPOOL_NAME} \
    --kubeconfig=${CLUSTER_NAME}.conf
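
    Once the node pool is created, one way to confirm that the GPU workers joined the cluster is to list nodes by instance type. The following is a minimal sketch, assuming kubectl is available on the machine where you ran dkp and that the nodes carry the well-known node.kubernetes.io/instance-type label. The function name list_nodes_by_instance_type is hypothetical, and the kubeconfig path follows the ${CLUSTER_NAME}.conf convention used in the command above.

    ```shell
    # Hypothetical helper: list nodes of a given instance type.
    #   $1 = cluster name (kubeconfig is expected at <cluster name>.conf)
    #   $2 = instance type, for example p2.xlarge
    list_nodes_by_instance_type() {
      kubectl --kubeconfig="$1.conf" get nodes \
        -l "node.kubernetes.io/instance-type=$2" \
        -o wide
    }

    # Example usage:
    #   list_nodes_by_instance_type "${CLUSTER_NAME}" p2.xlarge
    ```

    New nodes can take several minutes to register, so an empty list shortly after creating the node pool does not necessarily indicate a failure.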

To understand how this process works step by step, see the customizable Create a New AWS Cluster procedure under Custom Installation and Additional Infrastructure Tools.

Cluster Verification

If you want to monitor or verify the installation of your clusters, refer to:

Verify your Cluster and DKP Installation.

Next Step:

AWS Air-gapped GPU: Install Kommander
