Upgrade Kaptain

Upgrade Kaptain on your cluster

Learn how to upgrade an existing Kaptain installation to a newer version.

Prerequisites

  • Kaptain 1.2.0-1.1.0 is installed on a Konvoy cluster.
  • The existing cluster meets the criteria listed in the installation documentation.

Upgrading Kaptain

  • Ensure that the following base addons, which Kaptain requires, are enabled in your Konvoy cluster.yaml:

    - configRepository: https://github.com/mesosphere/kubernetes-base-addons
      configVersion: stable-1.20-4.1.0
      addonsList:
        - name: istio
          enabled: true
        - name: dex
          enabled: true
        - name: cert-manager
          enabled: true
        - name: prometheus
          enabled: true
    
  • Ensure the Kaptain addon repository is present in your Konvoy cluster.yaml:

    - configRepository: https://github.com/mesosphere/kubeaddons-kaptain
      configVersion: stable-1.20-1.3.0
      addonsList:
        - name: knative
          enabled: true
    
  • Download the kubeflow-1.3.0_1.2.0.tgz tarball.

  • Upgrade Kaptain:

    kubectl kudo upgrade --instance kaptain --namespace kubeflow ./kubeflow-1.3.0_1.2.0.tgz
    
  • Monitor the upgrade process by running:

    kubectl kudo plan status --instance kaptain --namespace kubeflow
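
If you prefer to wait for the upgrade in a script rather than re-running the status command by hand, the following is a minimal sketch of a polling loop. It assumes that the plan status output marks plans, phases, and steps with bracketed states such as [COMPLETE], [IN_PROGRESS], [PENDING], [ERROR], and [FATAL_ERROR]; check the output of your KUDO version and adjust the patterns if it differs. The function names plan_state and wait_for_plan, and the default of 60 polls at 10-second intervals, are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: poll `kubectl kudo plan status` until the plan finishes.
# ASSUMPTION: states appear in the output as [COMPLETE], [IN_PROGRESS],
# [PENDING], [ERROR], or [FATAL_ERROR].

# Hypothetical helper: read plan-status text on stdin and print
# FAILED, RUNNING, COMPLETE, or UNKNOWN.
plan_state() {
  _out=$(cat)
  if printf '%s\n' "$_out" | grep -Eq '\[(ERROR|FATAL_ERROR)\]'; then
    echo FAILED
  elif printf '%s\n' "$_out" | grep -Eq '\[(IN_PROGRESS|PENDING)\]'; then
    echo RUNNING
  elif printf '%s\n' "$_out" | grep -q '\[COMPLETE\]'; then
    echo COMPLETE
  else
    echo UNKNOWN
  fi
}

# Hypothetical helper: wait_for_plan INSTANCE NAMESPACE [TRIES] [DELAY_SECONDS]
# Returns 0 on completion, 1 on a failed plan, 2 on timeout.
wait_for_plan() {
  _instance=$1
  _namespace=$2
  _tries=${3:-60}   # illustrative default, not a documented value
  _delay=${4:-10}   # illustrative default, not a documented value
  _i=0
  while [ "$_i" -lt "$_tries" ]; do
    _state=$(kubectl kudo plan status --instance "$_instance" --namespace "$_namespace" | plan_state)
    case "$_state" in
      COMPLETE) echo "upgrade plan complete"; return 0 ;;
      FAILED)   echo "upgrade plan failed" >&2; return 1 ;;
    esac
    _i=$((_i + 1))
    sleep "$_delay"
  done
  echo "timed out waiting for the upgrade plan" >&2
  return 2
}
```

After sourcing the functions, call wait_for_plan kaptain kubeflow. Because a previously completed plan can still be reported as [COMPLETE] for a moment after the upgrade command is issued, it is worth confirming the final state with the plan status command shown above.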
    

Once the upgrade plan completes, you can log in to Kaptain:

  • Discover the cluster endpoint and copy it to the clipboard. If you are running Kaptain on-premises, use this command:
    kf_uri=$(kubectl get svc kubeflow-ingressgateway --namespace kubeflow -o jsonpath="{.status.loadBalancer.ingress[*].ip}") && echo "https://${kf_uri}"
    
    If you are running Kaptain on AWS, use this command instead:
    kf_uri=$(kubectl get svc kubeflow-ingressgateway --namespace kubeflow -o jsonpath="{.status.loadBalancer.ingress[*].hostname}") && echo "https://${kf_uri}"
    
  • Get the login credentials from Konvoy to authenticate:
    konvoy get ops-portal
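
The two endpoint commands above differ only in the load balancer field they read (ip on-premises, hostname on AWS). As a rough sketch, a single helper can query both fields and use whichever one is populated. The function names kubeflow_url and kubeflow_endpoint are hypothetical, and the sketch assumes that kubectl prints an empty string for a field that is not set on the service.

```shell
#!/bin/sh
# Sketch only: print the Kaptain URL on either on-premises or AWS clusters.

# Hypothetical helper: print https://<address> for the first non-empty argument.
kubeflow_url() {
  for _addr in "$@"; do
    if [ -n "$_addr" ]; then
      echo "https://${_addr}"
      return 0
    fi
  done
  echo "no load balancer address found yet" >&2
  return 1
}

# Hypothetical helper: read both candidate fields from the ingress gateway
# service (same service, namespace, and JSONPath expressions as above).
kubeflow_endpoint() {
  _ip=$(kubectl get svc kubeflow-ingressgateway --namespace kubeflow \
    -o jsonpath="{.status.loadBalancer.ingress[*].ip}")
  _host=$(kubectl get svc kubeflow-ingressgateway --namespace kubeflow \
    -o jsonpath="{.status.loadBalancer.ingress[*].hostname}")
  kubeflow_url "$_ip" "$_host"
}
```

After sourcing the functions, run kubeflow_endpoint. If neither field is populated yet, the helper prints a message to standard error and returns a non-zero status, so it can be retried.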
    

Workload behavior during upgrades

  • When upgrading from Kaptain version 1.2.0-1.1.0 to 1.2.0, the following workloads do not need to be stopped and can continue running without interruption during the upgrade: Jupyter Notebooks, Training Jobs (TFJob, PyTorchJob, MXNetJob), Katib Experiments and Trials, SparkApplications, and Kubeflow Pipelines.
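
To confirm that these workloads kept running, one option is to list them before and after the upgrade and compare the results. The sketch below is an illustration, not part of the documented procedure. The resource names it passes to kubectl (notebooks, tfjobs, pytorchjobs, mxjobs, experiments, trials, sparkapplications) are assumptions about how the corresponding custom resources are registered on your cluster; verify them with kubectl api-resources and remove any that are not present. Kubeflow Pipelines runs are not covered here. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: list workloads that are expected to survive the upgrade.

# Hypothetical helper: comma-separated resource types to inspect.
# ASSUMPTION: these plural names match the CRDs installed on your cluster.
kaptain_workload_kinds() {
  echo "notebooks,tfjobs,pytorchjobs,mxjobs,experiments,trials,sparkapplications"
}

# Hypothetical helper: list those resources across all namespaces.
list_kaptain_workloads() {
  kubectl get "$(kaptain_workload_kinds)" --all-namespaces
}
```

After sourcing the functions, run list_kaptain_workloads once before starting the upgrade and again after the plan completes, saving each output to a file so the two can be compared.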