Kaptain offers several ways to train models (including distributed training), tune hyperparameters, and deploy optimized models that autoscale.
The Kaptain SDK is the best choice for a data science-friendly user experience. It is designed to be a great first experience with Kaptain.
If you prefer full control and are comfortable with the Kubeflow SDKs or with YAML specifications in Kubernetes, we suggest you consult the other tutorials.
Note that everything can be done from within notebooks, thanks to Kaptain’s notebooks-first approach to machine learning.
How to Navigate the User Interface
The Kubeflow central dashboard is the main entry point to Kaptain after logging in:
The central area shows recent pipelines, pipeline runs, and notebooks as well as links to documentation.
The namespace is shown at the top (demo in the image above).
The menu on the left has the following entries:
- Home, which is shown in the image
- Notebook Servers
These are discussed in more detail below.
Pipelines and runs with their logged artifacts are available from the Pipelines menu. Details on how to create pipelines are in the pipelines tutorial.
The Experiments menu lists pipeline runs along with their status, duration, and model performance metrics. As an example, the accuracy and loss are shown in the image below.
Pipeline Run Logs
After selecting a single run, logs for individual steps of a pipeline can be displayed:
This is particularly helpful when debugging pipeline steps.
Each step logs its inputs and outputs, which can be accessed via the Input/Output tab.
Inputs and outputs of steps, also known as artifacts, are stored in the Artifacts Store. These are available from the Artifacts menu. The lineage of pipeline artifacts is displayed in the Lineage Explorer tab:
Notebook servers can be set up from the Notebook Servers menu on the central dashboard. From there, users can choose a quick-start image for any of the supported deep learning frameworks: TensorFlow, PyTorch, and MXNet. Each quick-start image comes in two flavors: CPU and GPU. The GPU flavor includes all the drivers needed for training on GPUs. Custom images can also be provided.
Each notebook server allows secrets and volumes to be mounted.
Once a notebook server has been set up, a familiar Jupyter notebook environment is available:
The numbered sections are as follows:
1. Directory tree on the notebook server
2. Visual git module
3. Table of contents for the currently visible notebook
4. Notebook diff viewer
5. Notebook cells with embedded output
Additional details on the JupyterLab environment can be found in the JupyterLab documentation.
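When a notebook server was created from a GPU quick-start image, it can be useful to confirm from a notebook cell that a GPU is actually visible. The following is a minimal sketch using only the Python standard library. It assumes that the image ships the `nvidia-smi` utility on the PATH, which is typical for NVIDIA driver installations but is not confirmed here for Kaptain's images; the function name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is present and lists at least one GPU."""
    # Assumption: the GPU image provides the `nvidia-smi` utility on PATH.
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run(
            [exe, "-L"],  # -L lists the GPUs, one per line
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Each listed device is printed on a line starting with "GPU ".
    lines = result.stdout.splitlines()
    return result.returncode == 0 and any(l.startswith("GPU ") for l in lines)


print("GPU visible:", gpu_visible())
```

On a CPU image this is expected to report that no GPU is visible. Framework-specific checks (for example in TensorFlow or PyTorch) may be preferable once the framework is imported anyway.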
Katib is the hyperparameter tuner and neural architecture search module in Kaptain. To learn how to create hyperparameter tuning experiments, read the tutorial.
These experiments can be accessed through the HP → Monitor submenu:
For each experiment, a chart of the main objective and the different hyperparameter values is shown:
The View Experiment button shows the details of the experiment itself. The View Suggestion button yields the hyperparameters of the best trial in the experiment.
At the bottom of the chart is a list of all trials, their statuses, objective values, and hyperparameters.
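To make concrete what "best trial" means in the trial list, here is a small illustrative sketch in plain Python. It does not call Katib; the `Trial` class, the `best_trial` function, the status string "Succeeded", and the sample values are all assumptions made up for the example. It picks the completed trial with the best objective value, either maximizing (e.g. accuracy) or minimizing (e.g. loss).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Trial:
    # Hypothetical record mirroring the columns of the trial list.
    name: str
    status: str
    objective: Optional[float]
    hyperparameters: Dict[str, float]


def best_trial(trials: List[Trial], maximize: bool = True) -> Optional[Trial]:
    """Return the successful trial with the best objective, or None."""
    # Only trials that finished and reported an objective are candidates.
    completed = [
        t for t in trials if t.status == "Succeeded" and t.objective is not None
    ]
    if not completed:
        return None
    pick = max if maximize else min
    return pick(completed, key=lambda t: t.objective)


# Made-up sample data for illustration only.
trials = [
    Trial("trial-a", "Succeeded", 0.91, {"learning_rate": 0.01, "batch_size": 64}),
    Trial("trial-b", "Succeeded", 0.94, {"learning_rate": 0.005, "batch_size": 128}),
    Trial("trial-c", "Failed", None, {"learning_rate": 0.1, "batch_size": 32}),
]

best = best_trial(trials)
print(best.name, best.hyperparameters)
```

Whether the objective is maximized or minimized depends on how the experiment was defined, so the `maximize` flag here stands in for that setting.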
- Tutorial for Kaptain SDK
- Tutorials for model development and distributed training with TensorFlow, PyTorch, and MXNet
- Tutorial for Hyperparameter Tuning
- Tutorial for End-to-end Pipeline with KFServing
- Tutorial for Kubeflow Fairing