Pre-provisioned GPU: Install Kommander
Prerequisites
Ensure you have reviewed all Prerequisites for Install.
Ensure you have a default StorageClass.
Note the name of the cluster where you want to install Kommander. If you do not know it, run kubectl get clusters -A to display it.
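The default StorageClass prerequisite can be checked from the command line. The sketch below relies on kubectl marking the default class with "(default)" in the output of kubectl get storageclass. The helper name has_default_storageclass is hypothetical and only inspects text passed to it on standard input.

```shell
# Sketch: succeed if `kubectl get storageclass` output marks a default class.
# has_default_storageclass is a hypothetical helper, not part of DKP or kubectl.
has_default_storageclass() {
  # Reads the command output from stdin and looks for the "(default)" marker.
  grep -q '(default)'
}

# Example usage (requires kubectl and access to the cluster):
#   kubectl get storageclass | has_default_storageclass \
#     && echo "default StorageClass found" \
#     || echo "no default StorageClass"
```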
Create your Kommander Installer Configuration File
Set the environment variable for your cluster:
    export CLUSTER_NAME=<your-management-cluster-name>
Copy the kubeconfig file of your Management cluster to your local directory:
    dkp get kubeconfig -c ${CLUSTER_NAME} > ${CLUSTER_NAME}.conf
Create a configuration file for the deployment:
    dkp install kommander --init > kommander.yaml
Edit the installer file to include configuration overrides for rook-ceph-cluster. DKP’s default configuration ships Ceph with PVC-based storage, which requires your CSI provider to support PVCs with volumeMode: Block. As this is not possible with the default local static provisioner, you can install Ceph in host storage mode.
You can choose whether Ceph’s object storage daemon (OSD) pods should consume all or just some of the devices on your nodes. Include one of the following overrides:
To automatically assign all raw storage devices on all nodes to the Ceph cluster:
    rook-ceph-cluster:
      enabled: true
      values: |
        cephClusterSpec:
          storage:
            storageClassDeviceSets: []
            useAllDevices: true
            useAllNodes: true
            deviceFilter: "<<value>>"
To assign specific storage devices on all nodes to the Ceph cluster:
    rook-ceph-cluster:
      enabled: true
      values: |
        cephClusterSpec:
          storage:
            storageClassDeviceSets: []
            useAllNodes: true
            useAllDevices: false
            deviceFilter: "^sdb."
Note: If you want to assign specific devices to specific nodes using the deviceFilter option, refer to Specific Nodes and Devices. For general information on the deviceFilter value, refer to Storage Selection Settings.
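Because deviceFilter is a regular expression, it can help to preview a pattern against the device names on a node before adding it to the override. The sketch below uses grep -E as a stand-in for the matching Rook performs itself, so treat it as an approximation. The helper name preview_device_filter and the sample device names are hypothetical.

```shell
# Sketch: list which of the given device names match a deviceFilter-style regex.
# preview_device_filter is a hypothetical helper; grep -E only approximates
# the regex engine that Rook uses.
preview_device_filter() {
  pattern=$1
  shift
  # Print one device name per line and keep only those matching the pattern.
  printf '%s\n' "$@" | grep -E -- "$pattern" || true
}

# Under standard regex semantics the trailing "." requires one more character
# after "sdb", so this prints sdb1 and sdb2 but not the bare name sdb.
preview_device_filter '^sdb.' sda sdb sdb1 sdb2 sdc
```

On a real node, the candidate names could come from a tool such as lsblk instead of a hand-written list.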
If required, customize your kommander.yaml.
See Kommander Customizations for customization options. These include Custom Domains and Certificates, HTTP proxy, External Load Balancer, and GPU utilization.
Enable GPU Resources
In the same kommander.yaml file, enable NVIDIA platform services:
    apps:
      nvidia-gpu-operator:
        enabled: true
Append the correct Toolkit version based on your OS:
The NVIDIA Container Toolkit allows users to run GPU-accelerated containers. The toolkit includes a container runtime library and utilities that automatically configure containers to use NVIDIA GPUs. It must be configured according to your base operating system.
CentOS 7.9/RHEL 7.9
If you’re using CentOS 7.9 or RHEL 7.9 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your Kommander Installer Configuration file (kommander.yaml) to the following:
    kind: Installation
    apps:
      nvidia-gpu-operator:
        enabled: true
        values: |
          toolkit:
            version: v1.13.1-centos7
RHEL 8.4/8.6 and SLES 15 SP3
If you’re using RHEL 8.4/8.6 or SLES 15 SP3 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your Kommander Installer Configuration file (kommander.yaml) to the following:
    kind: Installation
    apps:
      nvidia-gpu-operator:
        enabled: true
        values: |
          toolkit:
            version: v1.13.1-ubi8
Ubuntu 18.04 and 20.04
If you’re using Ubuntu 18.04 or 20.04 as the base operating system for your GPU-enabled nodes, set the toolkit.version parameter in your Kommander Installer Configuration file (kommander.yaml) to the following:
    kind: Installation
    apps:
      nvidia-gpu-operator:
        enabled: true
        values: |
          toolkit:
            version: v1.13.1-ubuntu20.04
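The operating system to toolkit version mapping above can be summarized in a small lookup, which may be convenient when scripting the configuration for several node pools. This is only a sketch: the function name toolkit_version_for and the OS labels (centos, rhel, sles, ubuntu) are hypothetical, SLES 15 SP3 is written as version 15.3 by assumption, and the version tags are the ones listed in this section.

```shell
# Sketch: map a base OS to the toolkit.version value listed in this section.
# toolkit_version_for and its OS labels are hypothetical, not part of DKP.
toolkit_version_for() {
  # $1: OS label (centos, rhel, sles, ubuntu); $2: OS version, for example 7.9
  case "$1:$2" in
    centos:7.9|rhel:7.9)
      echo "v1.13.1-centos7" ;;
    rhel:8.4|rhel:8.6|sles:15.3)   # assumption: SLES 15 SP3 written as 15.3
      echo "v1.13.1-ubi8" ;;
    ubuntu:18.04|ubuntu:20.04)
      echo "v1.13.1-ubuntu20.04" ;;
    *)
      echo "no toolkit version listed for: $1 $2" >&2
      return 1 ;;
  esac
}

toolkit_version_for rhel 8.6
```

The printed value is what goes under toolkit.version in the nvidia-gpu-operator values block.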
Enable DKP Catalog Applications and Install Kommander
If you want to enable DKP Catalog applications after installing DKP, see Enable DKP Catalog Applications after Installing DKP.
In the same kommander.yaml as in the previous section, add these values for dkp-catalog-applications:
    apiVersion: config.kommander.mesosphere.io/v1alpha1
    kind: Installation
    catalog:
      repositories:
        - name: dkp-catalog-applications
          labels:
            kommander.d2iq.io/project-default-catalog-repository: "true"
            kommander.d2iq.io/workspace-default-catalog-repository: "true"
            kommander.d2iq.io/gitapps-gitrepository-type: "dkp"
          gitRepositorySpec:
            url: https://github.com/mesosphere/dkp-catalog-applications
            ref:
              tag: v2.6.1
If you only want to enable catalog applications in an existing configuration, add these values to the existing installer configuration file to maintain your Management cluster’s settings.
Use the customized kommander.yaml to install DKP:
    dkp install kommander --installer-config kommander.yaml --kubeconfig=${CLUSTER_NAME}.conf
Tips and recommendations
The --kubeconfig=${CLUSTER_NAME}.conf flag ensures that you install Kommander on the correct cluster. For alternatives, see Provide Context for Commands with a kubeconfig File.
Applications can take longer to deploy than expected, causing the installation to time out. Add the --wait-timeout <time to wait> flag and specify a period of time (for example, 1h) to allocate more time to the deployment of applications.
If the Kommander installation fails, or you wish to reconfigure applications, rerun the install command to retry.
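The tips above can be combined into a single command line. The sketch below only assembles and prints the command, using the flags shown in this section, so that it can be reviewed before it is run. The helper name kommander_install_cmd and the cluster name my-cluster are hypothetical.

```shell
# Sketch: build the install command with an optional wait timeout.
# kommander_install_cmd is a hypothetical helper; it prints the command
# rather than running it.
kommander_install_cmd() {
  # $1: cluster name; $2: optional timeout such as 1h
  cmd="dkp install kommander --installer-config kommander.yaml --kubeconfig=$1.conf"
  if [ -n "${2:-}" ]; then
    cmd="$cmd --wait-timeout $2"
  fi
  echo "$cmd"
}

# Prints:
# dkp install kommander --installer-config kommander.yaml --kubeconfig=my-cluster.conf --wait-timeout 1h
kommander_install_cmd my-cluster 1h
```

Rerunning the printed command after a failed installation corresponds to the retry described in the last tip.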