Kaptain is a cloud-native, enterprise-grade, end-to-end AI/ML platform. It bundles a set of open source products, including Kubeflow, with optimized configurations that support end-to-end machine learning workflows.
Kaptain enables data scientists and ML engineers to run and scale their entire ML stack with higher velocity on Kubernetes and the cloud-native ecosystem.
Kaptain natively integrates Horovod, an open source distributed training framework, to support distributed deep learning across multi-GPU and multi-node clusters. Horovod is compatible with the existing TensorFlow, PyTorch, and MXNet deep learning frameworks and makes distributed deep learning fast and easy to use.
Kaptain is also pre-configured with Apache Spark, providing the ability to tap into large pools of CPUs and GPUs, on demand.
Deploy and manage machine learning models with ease
Release notes for Kaptain 2.0
Learn how to download Kaptain
Install Kaptain on your cluster
Fresh install of Kaptain
Do a fresh install of Kaptain on your DKP cluster
Configuration settings for Kaptain
Operations and maintenance for Kaptain components
End-to-end tutorials for model development, distributed training, pipelines, and metadata management
Monitor ML workloads and resource utilization in Kaptain
Manage Users and Permissions
Manage Kubeflow users and permissions
Distribute sensitive configuration securely
Kaptain SDK documentation
Troubleshooting guide for Kaptain
Support and Services
Support and services for Kaptain
Version Support Policy
Kaptain's version support policy
List of third-party trademarks mentioned in the Kaptain documentation
Access Documentation Archives
Access older versions of Kaptain documentation