D2iQ® Kaptain® version 2.1.0 was released on August 31, 2022.

To get started with Kaptain, download and install the latest version of Kaptain.

Release Summary

Kaptain 2.1 includes new features for model development, including integration with NVIDIA’s catalog of pre-trained, GPU-optimized models (NGC) and experiment tracking with MLflow, as well as security enhancements and bug fixes. The release focuses on building great models: the NGC catalog contains many best-of-breed models, and MLflow tracking lets developers log experiment metadata for comparing models without leaving their notebooks. With great models in hand, you can get them into production more quickly. Kaptain 2.1 supports Kubeflow 1.5 and new versions of the Kaptain SDK, 1.1.x and 1.2.x. For more information on the SDK, see the Kaptain SDK documentation.

New features and capabilities

NVIDIA’s GPU Container Catalog (NGC) support

NVIDIA’s NGC is a catalog of containers, charts, pre-trained models, toolkits, and more, all optimized for GPU deployment. Kaptain supports the use of the NGC catalog in networked environments. For more information on how to use NGC resources with Kaptain, refer to the NGC section of the Kaptain documentation and to the NGC documentation site.

MLflow support

You can use the MLflow platform with Kaptain to collect data and manage the lifecycle of your machine learning and artificial intelligence models and experiments. You can now log hyperparameters and metrics directly from your notebooks to the MLflow instance that is bundled with Kaptain. For more information on how MLflow is integrated with Kaptain and how to use it, refer to the MLflow documentation.

Restrict access to Kaptain by establishing a list of allowed groups

You can authenticate users and user groups and give them access to Kaptain by linking Kaptain’s Dex instance to an identity provider of your choice. Starting with this release, you can further restrict access to your Kaptain instance by adding groups defined in your identity provider to the AllowList, or by removing them from it.

Support for Kubernetes 1.23

Kaptain 2.1 also supports Kubernetes 1.23, which contains upstream improvements such as security enhancements.

Improved Documentation Site

Kaptain 2.1 comes with a new D2iQ Help Center, which provides improved search functionality and allows us to add multimedia content in the future. Explore the new D2iQ Help Center at the same site as our previous documentation.

Previous versions of the documentation remain available on our Archived Docs site.

Fixes and Improvements

  • UX improvements and bug fixes

  • KServe and Kubeflow Pipelines upgrades

Software updates

This version of Kaptain includes the following software versions:

Package Name                  Current Version
Argo Workflows                3.3.8
CUDA                          11.4
Katib                         0.13.0
KServe                        0.9.0
Kubeflow Pipelines            1.8.4
Kubeflow                      1.5
MLflow                        1.25.1
MXNet                         1.9.0
Percona Kubernetes Operator   1.10.0
PyTorch                       1.11.0
PyTorch Model Archiver        0.4.2
Spark Operator                1.1.17
TensorFlow                    2.9.1
Training Operator             1.5.0

Known issues

cert-manager workaround for Kaptain

Some Kommander versions do not properly handle certificate renewal for the Cluster CA and certificates that are created for Kommander applications, which also affects Kaptain. While the effects can vary, the most common failure is the inability to launch Kaptain notebooks in Jupyter.
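
To judge whether a cluster is affected, it can help to look at how long the certificate stored in a secret remains valid. The following is a minimal sketch, not an official D2iQ procedure. The helper name `secret_cert_valid_for` is hypothetical, and the sketch assumes that the secret stores a PEM certificate under the `tls.crt` key (the usual layout for cert-manager secrets) and that `openssl` and `base64` are available on your workstation.

```shell
# Hypothetical helper (not part of Kaptain or DKP).
# Succeeds if the certificate in the given secret stays valid for at least
# <seconds> more seconds; fails if it expires sooner or cannot be read.
# Assumption: the certificate is stored under the "tls.crt" key.
secret_cert_valid_for() {
  local namespace="$1" secret="$2" seconds="${3:-0}"
  kubectl get secret "$secret" -n "$namespace" \
      -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -checkend "$seconds" >/dev/null
}
```

For example, `secret_cert_valid_for kaptain-ingress kubeflow-gateway-certs 604800` would report, through its exit status, whether that certificate remains valid for another week, provided the secret follows the assumed layout.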

Regenerate the secrets in DKP

A permanent fix for the issue requires upgrading to Kommander 2.2.1 or later. If you are running other versions of DKP, refer to the cert-manager expiration workaround documentation for DKP 2.1.0, 2.1.1, or 2.1.2 to run a Docker container that extends the validity of the Cluster CA to 10 years and fixes the certificate reload issue.

Once this is done, you can fix the issue on Kaptain’s side.

Regain access to Kaptain

This gives you back the capability of launching notebooks in Jupyter:

  1. Kaptain has two certificates to handle: one that you delete to force a refresh, and one for Istio that you update manually in the steps that follow. Delete the first one:

    kubectl delete secrets kubeflow-gateway-certs -n kaptain-ingress --force
  2. Obtain the CA from one of the other recreated certs:

    kubectl get secret kommander-traefik-certificate -n kommander -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
  3. Use this CA and apply it to the Istio CA:

    kubectl delete secret kubeflow-oidc-ca-bundle -n kaptain-ingress --force
    kubectl -n kaptain-ingress create secret generic kubeflow-oidc-ca-bundle --from-file=oidcCABundle\.crt=ca.crt

Running these commands reloads the pod automatically. Wait a few minutes before you attempt to log in to DKP and Kaptain again.

Test by logging in to both and launching a new notebook in Jupyter.
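
If login still fails, two quick checks can narrow things down. Values under `.data` in a Kubernetes secret are base64-encoded, so `ca.crt` must hold the decoded PEM text before it is used with `--from-file`, and both secrets touched in the procedure should exist again once the refresh has completed. The following is a minimal sketch of those checks. The helper names are hypothetical and not part of Kaptain or DKP.

```shell
# Hypothetical helpers for checking the result of the steps above.

# Succeeds if the file contains at least one PEM certificate header.
is_pem_cert() {
  grep -q -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Succeeds if the named secret exists in the given namespace.
secret_exists() {
  kubectl get secret "$2" -n "$1" >/dev/null 2>&1
}

# Prints a one-line status for each secret touched in the procedure.
check_kaptain_secrets() {
  local name
  for name in kubeflow-gateway-certs kubeflow-oidc-ca-bundle; do
    if secret_exists kaptain-ingress "$name"; then
      echo "$name: present"
    else
      echo "$name: missing"
    fi
  done
}
```

Running `is_pem_cert ca.crt` before step 3 and `check_kaptain_secrets` a few minutes after it gives a rough indication of whether the procedure took effect. Neither check replaces the login test.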