
NVIDIA Platform Application Attached or Managed Cluster

Instructions on enabling the NVIDIA platform application on attached or managed clusters

Enable NVIDIA Platform Application on Attached or Managed Clusters

If you intend to run applications that use GPUs on Attached or Managed clusters, you must enable the nvidia-gpu-operator platform application in the workspace.

To use the UI to enable the application, refer to the Platform Applications | Customize a workspace’s applications page.

To use the CLI, refer to the Deploy Platform Applications via CLI page.

If only a subset of the attached or managed clusters in the workspace uses GPUs, refer to Enable an Application per Cluster for how to enable the nvidia-gpu-operator only on specific clusters.

After you have enabled the nvidia-gpu-operator app in the workspace on the necessary clusters, proceed to the next section.

Select the Correct Toolkit Version for your NVIDIA GPU Operator

The NVIDIA Container Toolkit allows users to run GPU-accelerated containers. The toolkit includes a container runtime library and utilities that automatically configure containers to use NVIDIA GPUs. It must be configured to match the base operating system of your GPU-enabled nodes.
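As a rough illustration of what "according to your base operating system" means here, the following shell sketch maps an operating system to the toolkit version listed for it on this page and prints the matching override ConfigMap. The function names and the two-argument "id version" convention (with SLES 15 SP3 written as `sles 15.3`) are assumptions made for this example and are not part of the platform. Only the version tags and the ConfigMap name come from this page.

```shell
# Sketch only: both function names and the "id version" argument convention
# are hypothetical. SLES 15 SP3 is written as "sles 15.3" in this example.
toolkit_version_for_os() {
  case "$1:$2" in
    centos:7.9|rhel:7.9)         echo "v1.10.0-centos7" ;;
    rhel:8.4|rhel:8.6|sles:15.3) echo "v1.10.0-ubi8" ;;
    ubuntu:18.04|ubuntu:20.04)   echo "v1.11.0-ubuntu20.04" ;;
    *) echo "no toolkit version listed for: $1 $2" >&2; return 1 ;;
  esac
}

# Print the override ConfigMap for a namespace ($1) and a toolkit version ($2).
# Nothing is applied to a cluster; the YAML only goes to stdout.
render_toolkit_override() {
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: $1
  name: nvidia-gpu-operator-overrides-attached
data:
  values.yaml: |
    toolkit:
      version: $2
EOF
}

# Example (requires kubectl access and WORKSPACE_NAMESPACE to be set):
#   render_toolkit_override "${WORKSPACE_NAMESPACE}" \
#     "$(toolkit_version_for_os ubuntu 20.04)" | kubectl apply -f -
```

Keeping the rendering separate from `kubectl apply` lets you inspect the generated YAML before it reaches the cluster.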

Workspace (Attached and Managed clusters) Customization

Refer to AppDeployment resources for how to use the CLI to customize the platform application on a workspace.

If specific attached/managed clusters in the workspace require different configurations, refer to Customize an Application per Cluster for how to do this.

  1. Select the correct Toolkit version based on your OS and create a ConfigMap with these configuration override values:

    CentOS 7.9/RHEL 7.9
    If you’re using CentOS 7.9 or RHEL 7.9 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your install.yaml to the following:

    CODE
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: ${WORKSPACE_NAMESPACE}
      name: nvidia-gpu-operator-overrides-attached
    data:
      values.yaml: |
        toolkit:
          version: v1.10.0-centos7
    EOF

    RHEL 8.4/8.6 and SLES 15 SP3
    If you’re using RHEL 8.4/8.6 or SLES 15 SP3 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your install.yaml to the following:

    CODE
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: ${WORKSPACE_NAMESPACE}
      name: nvidia-gpu-operator-overrides-attached
    data:
      values.yaml: |
        toolkit:
          version: v1.10.0-ubi8
    EOF

    Ubuntu 18.04 and 20.04
    If you’re using Ubuntu 18.04 or 20.04 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your install.yaml to the following:

    CODE
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: ${WORKSPACE_NAMESPACE}
      name: nvidia-gpu-operator-overrides-attached
    data:
      values.yaml: |
        toolkit:
          version: v1.11.0-ubuntu20.04
    EOF

  2. Note the name of this ConfigMap (nvidia-gpu-operator-overrides-attached) and use it to set the nvidia-gpu-operator AppDeployment spec fields that match the scope of the override (the whole workspace or specific clusters). Alternatively, use the UI to pass in the configuration overrides for the app per workspace or per cluster.
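
For a workspace-wide override, step 2 might look roughly like the sketch below, which prints an AppDeployment that references the ConfigMap by name. This is an assumption-laden example: the `spec.configOverrides.name` and `spec.appRef` layout follows the pattern described on the AppDeployment resources page, while the function name, the API version, and the app reference name are placeholders that you must supply for your release. Confirm all of them against that page before applying anything.

```shell
# Sketch only: render_gpu_operator_appdeployment is a hypothetical helper.
#   $1 = workspace namespace
#   $2 = AppDeployment apiVersion for your release (not stated on this page)
#   $3 = app reference name, including its version (not stated on this page)
render_gpu_operator_appdeployment() {
  cat <<EOF
apiVersion: $2
kind: AppDeployment
metadata:
  name: nvidia-gpu-operator
  namespace: $1
spec:
  appRef:
    kind: ClusterApp
    name: $3
  configOverrides:
    name: nvidia-gpu-operator-overrides-attached
EOF
}

# Example (requires kubectl access; the last two arguments are placeholders):
#   render_gpu_operator_appdeployment "${WORKSPACE_NAMESPACE}" \
#     "<appdeployment-api-version>" "nvidia-gpu-operator-<version>" | kubectl apply -f -
```

Per-cluster overrides use different spec fields. See Customize an Application per Cluster for those.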
