Instructions on enabling the NVIDIA platform application on attached or managed clusters

Enable NVIDIA Platform Application on Attached or Managed Clusters

If you intend to run applications that use GPUs on attached or managed clusters, you must enable the nvidia-gpu-operator platform application in the workspace.

To enable the application through the UI, refer to the Customize a workspace’s applications section of the Platform Applications page.

To use the CLI, refer to the Deploy Platform Applications via CLI page.

If only some of the attached or managed clusters in the workspace use GPUs, refer to Enable an Application per Cluster to learn how to enable the nvidia-gpu-operator only on specific clusters.

After you have enabled the nvidia-gpu-operator app in the workspace on the necessary clusters, proceed to the next section.

Select the Correct Toolkit Version for your NVIDIA GPU Operator

The NVIDIA Container Toolkit allows users to run GPU-accelerated containers. It includes a container runtime library and utilities that automatically configure containers to use NVIDIA GPUs. The toolkit version must match the base operating system of your GPU nodes.
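Because the correct toolkit image tag depends on each GPU node's base OS, it can help to derive it on the node itself. The following is a minimal sketch, not part of the product: a hypothetical shell function, toolkit_version_for, that maps the ID and VERSION_ID fields of /etc/os-release to the toolkit.version values listed in the steps of this section. It assumes CentOS 7 reports VERSION_ID as 7 (without the minor release) and SLES 15 SP3 reports 15.3; verify these values on your own nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map /etc/os-release ID and VERSION_ID to the
# toolkit.version override values documented on this page.
# ASSUMPTION: CentOS 7 reports VERSION_ID="7", SLES 15 SP3 reports "15.3".
toolkit_version_for() {
  local id="$1" version_id="$2"
  case "${id}:${version_id}" in
    centos:7*|rhel:7.9)           echo "v1.10.0-centos7" ;;
    rhel:8.4|rhel:8.6|sles:15.3)  echo "v1.10.0-ubi8" ;;
    ubuntu:18.04|ubuntu:20.04)    echo "v1.11.0-ubuntu20.04" ;;
    *)
      # Any OS not listed on this page is treated as unsupported.
      echo "unsupported OS: ${id} ${version_id}" >&2
      return 1
      ;;
  esac
}
```

On a GPU node, after defining the function, running `( . /etc/os-release && toolkit_version_for "$ID" "$VERSION_ID" )` prints the tag to use; the subshell keeps the os-release variables out of your current shell.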

Workspace (Attached and Managed clusters) Customization

Refer to the AppDeployment resources page to learn how to use the CLI to customize the platform application in a workspace.

If specific attached or managed clusters in the workspace require different configurations, refer to Customize an Application per Cluster.

  1. Select the correct toolkit version for your OS and create a ConfigMap with the corresponding configuration override values:

    CentOS 7.9/RHEL 7.9
    If you’re using CentOS 7.9 or RHEL 7.9 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in the ConfigMap override values as follows:

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: ${WORKSPACE_NAMESPACE}
      name: nvidia-gpu-operator-overrides-attached
    data:
      values.yaml: |
        toolkit:
          version: v1.10.0-centos7
    EOF

    RHEL 8.4/8.6 and SLES 15 SP3
    If you’re using RHEL 8.4/8.6 or SLES 15 SP3 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in the ConfigMap override values as follows:

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: ${WORKSPACE_NAMESPACE}
      name: nvidia-gpu-operator-overrides-attached
    data:
      values.yaml: |
        toolkit:
          version: v1.10.0-ubi8
    EOF

    Ubuntu 18.04 and 20.04
    If you’re using Ubuntu 18.04 or 20.04 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in the ConfigMap override values as follows:

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: ${WORKSPACE_NAMESPACE}
      name: nvidia-gpu-operator-overrides-attached
    data:
      values.yaml: |
        toolkit:
          version: v1.11.0-ubuntu20.04
    EOF
  2. Note the name of this ConfigMap (nvidia-gpu-operator-overrides-attached) and use it to set the required nvidia-gpu-operator AppDeployment spec fields, depending on the scope of the override. Alternatively, you can use the UI to pass in the configuration overrides for the app per workspace or per cluster.
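For a workspace-scoped override, the nvidia-gpu-operator AppDeployment references the ConfigMap by name. The sketch below is a hypothetical helper, render_gpu_appdeployment, that prints such a manifest so you can review it before applying it. The apiVersion (apps.kommander.d2iq.io/v1alpha3), the appRef and configOverrides field layout, and the ClusterApp kind are assumptions based on typical AppDeployment manifests; confirm them against the AppDeployment resources page for your release. The ClusterApp name includes a version that this page does not specify, so it is passed in as a parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an AppDeployment that points nvidia-gpu-operator
# at the override ConfigMap from step 1 (workspace scope only).
# ASSUMPTIONS: apiVersion, appRef/configOverrides layout, and ClusterApp kind
# must be checked against the AppDeployment reference for your release.
render_gpu_appdeployment() {
  local namespace="$1"   # workspace namespace, e.g. "${WORKSPACE_NAMESPACE}"
  local app_ref="$2"     # ClusterApp name, e.g. nvidia-gpu-operator-<version>
  local overrides="${3:-nvidia-gpu-operator-overrides-attached}"
  if [ -z "$namespace" ] || [ -z "$app_ref" ]; then
    echo "usage: render_gpu_appdeployment NAMESPACE APP_REF [CONFIGMAP]" >&2
    return 1
  fi
  # Unquoted EOF so the shell variables above are expanded into the YAML.
  cat <<EOF
apiVersion: apps.kommander.d2iq.io/v1alpha3
kind: AppDeployment
metadata:
  name: nvidia-gpu-operator
  namespace: ${namespace}
spec:
  appRef:
    name: ${app_ref}
    kind: ClusterApp
  configOverrides:
    name: ${overrides}
EOF
}
```

After replacing `<version>` with the version deployed in your environment, you could preview the result with `render_gpu_appdeployment "${WORKSPACE_NAMESPACE}" "nvidia-gpu-operator-<version>" | kubectl apply --dry-run=client -f -` and then drop `--dry-run=client` to apply it. Keeping rendering separate from applying makes the manifest easy to inspect first. Per-cluster overrides use different AppDeployment fields (see Customize an Application per Cluster) and are not covered by this sketch.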