Kubeflow Fairing

Tutorial for Kubeflow Fairing

NOTE: All tutorials are available in Jupyter Notebook format. To download the tutorials run curl -L https://downloads.mesosphere.io/kudo-kubeflow/d2iq-tutorials-1.0.1-0.3.1.tar.gz | tar xz from a Jupyter Notebook Terminal running in your KUDO Kubeflow installation.

NOTE: Please note that these notebook tutorials have been built for and tested on D2iQ's KUDO for Kubeflow. Without the requisite Kubernetes operators and custom Docker images, these notebooks will likely not work.

Kubeflow Fairing: Build Docker Images from within Jupyter Notebooks

Introduction

Although you can build Docker images by downloading files to your local machine and subsequently pushing the images to a container registry, it is much faster to do so without leaving Jupyter! Kubeflow Fairing makes that possible.

What You’ll Learn

In this notebook we’ll go through the steps involved in building a Docker image from a base image (e.g. TensorFlow or PyTorch) and a custom trainer file that defines your machine learning model. This image can be used for distributed training or hyperparameter tuning. You can use the model you generated with %%writefile in MNIST with TensorFlow or MNIST with PyTorch, or a file of your own choosing.

The Docker image builder process stores (temporary) files in MinIO. MinIO, an open-source S3-compliant object storage tool, is already included with your Kubeflow installation.

What You’ll Need

  • An executable Python file (e.g. an mnist.py trainer; a minimal sketch follows below);
  • A container registry to which you have push access.
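
For reference, a trainer could look like the following sketch. It is hypothetical and far simpler than the mnist.py built in the MNIST tutorials; it only illustrates the expected shape: an executable script that accepts an --epochs argument.

import argparse

import tensorflow as tf


def main():
    # Hypothetical minimal trainer; assumes TensorFlow 2.x is available
    parser = argparse.ArgumentParser(description="Minimal MNIST trainer")
    parser.add_argument("--epochs", type=int, default=1)
    args = parser.parse_args()

    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(x_train, y_train, epochs=args.epochs)


if __name__ == "__main__":
    main()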

Please note that this notebook is interactive!

Prerequisites

Kubeflow Fairing must be installed, so let’s check that it is:

! pip show kubeflow-fairing
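
If the command above prints no package details, Kubeflow Fairing is not installed. In KUDO for Kubeflow's notebook images it should already be present; elsewhere it can usually be installed from PyPI:

! pip install kubeflow-fairing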

We’re ready to go!

How to Create a Docker Credentials File and Kubernetes ConfigMap

We shall also require getpass, so that you can provide your password interactively without it being immediately visible. It’s a standard Python library, so there is no need to install it. A simple import will suffice.

We do not recommend you store passwords directly in notebooks. Ideally, credentials are stored safely inside secrets management solutions or provided with service accounts. This notebook should be used for demonstration purposes only!

import json
import getpass

Please type in the container registry username by running the next cell:

docker_user = input()

Please enter the password for the container registry by executing the following cell:

docker_password = getpass.getpass()

With these details, we can base64-encode the username and password and create a Kubernetes ConfigMap, which the in-cluster image builder uses to authenticate against the container registry.

%%capture creds --no-stderr
! echo -n "$docker_user:$docker_password" | base64
docker_credentials = creds.stdout.rstrip()
js = {"auths": {"https://index.docker.io/v1/": {"auth": docker_credentials}}}

%store json.dumps(js) >config.json
! kubectl create configmap docker-config --from-file=config.json
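
As an aside, the same credentials string can be computed without the shell round trip. A sketch using only the Python standard library (docker_user and docker_password are defined above):

import base64

# Same base64-encoded "user:password" string the shell pipeline produces
docker_credentials = base64.b64encode(
    f"{docker_user}:{docker_password}".encode("utf-8")
).decode("utf-8")

with open("config.json", "w") as f:
    json.dump({"auths": {"https://index.docker.io/v1/": {"auth": docker_credentials}}}, f)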

How to Set up MinIO

from kubeflow.fairing import constants
from kubeflow.fairing.builders.cluster.minio_context import MinioContextSource

s3_endpoint = "minio-service.kubeflow:9000"
s3_endpoint_url = f"http://{s3_endpoint}"
s3_secret_id = "minio"
s3_secret_key = "minio123"
s3_region = "us-east-1"

# The default Kaniko version (0.14.0) does not work with Kubeflow Fairing
constants.constants.KANIKO_IMAGE = "gcr.io/kaniko-project/executor:v0.19.0"

minio_context_source = MinioContextSource(
    endpoint_url=s3_endpoint_url,
    minio_secret=s3_secret_id,
    minio_secret_key=s3_secret_key,
    region_name=s3_region,
)
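
If you want to double-check the endpoint, the service name and namespace used above (minio-service.kubeflow) can be verified with kubectl:

! kubectl get service minio-service --namespace kubeflow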

How to Build a Docker Image

If you have your own container registry, please set it in REGISTRY. The IMAGE_NAME contains the name of the image that will be built and pushed to the REGISTRY.

REGISTRY = "docker.io"
IMAGE_NAME = "mesosphere/kubeflow"

If your goal is to run a distributed training job immediately from a notebook, we recommend Option 1. With it, you build (and push) the image as part of a deployment (e.g. a distributed training job).

If your goal is to provide a Docker image that includes the code for distributed training or hyperparameter tuning, Option 2 is more appropriate. It does not run the job (with pre-defined arguments) but merely pushes the image to the container registry.

Both options automatically push the image to the registry specified.

The Kubeflow Fairing API does not set the Docker image's entrypoint or command. You can check that neither the ENTRYPOINT nor the CMD is set with docker inspect on your local machine. This means that you have to add the command you want to run to your Kubernetes resource specification (YAML)! Without this modification to the YAML your pods will fail to run their containerized workloads. You can do this by adding the key command above the args specification:

containers:
  - name:
    image:
    command:
      - python
      - -u
      - mnist.py
    args:
      - --epochs
      - "7"
...

Option 1: Build-Push-Run

Multiple input files (e.g. a trainer and utilities) can be provided in the input_files list. There can be only one executable file. In the command we must include all mandatory arguments (i.e. --epochs):

from kubeflow import fairing

fairing.config.set_preprocessor(
    "python",
    command=["python", "-u", "mnist.py", "--epochs=3"],
    input_files=["mnist.py"],
    executable="mnist.py",
)
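
If mnist.py imports helper code, list those files as well. A sketch with a hypothetical utils.py helper module:

# utils.py is a hypothetical helper module imported by mnist.py
fairing.config.set_preprocessor(
    "python",
    command=["python", "-u", "mnist.py", "--epochs=3"],
    input_files=["mnist.py", "utils.py"],
    executable="mnist.py",
)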

TensorFlow

If your mnist.py file contains a TensorFlow model, you must specify the appropriate base image for TensorFlow. In case you want to run the model on CPUs only, you ought to drop the -gpu suffix from the base image name. The primary configuration options are the chief and worker counts, but feel free to peruse all available parameters here.

If your model is based on PyTorch, please skip this section!

BASE_IMAGE = "mesosphere/kubeflow:1.0.1-0.3.1-tensorflow-2.2.0-gpu"

fairing.config.set_builder(
    name="cluster",
    registry=REGISTRY,
    base_image=BASE_IMAGE,
    image_name=IMAGE_NAME,
    context_source=minio_context_source,
)

fairing.config.set_deployer(name="tfjob", worker_count=2, chief_count=1)

fairing.config.run()

PyTorch

For a PyTorch-based mnist.py model, you must specify the appropriate base image for PyTorch. In case you want to run the model on CPUs and not GPUs, you simply leave off the -gpu suffix from the base image’s name. The main configuration options are the master and worker counts, but you can see all options here.

BASE_IMAGE = "mesosphere/kubeflow:1.0.1-0.3.1-pytorch-1.5.0-gpu"

fairing.config.set_builder(
    name="cluster",
    registry=REGISTRY,
    base_image=BASE_IMAGE,
    image_name=IMAGE_NAME,
    context_source=minio_context_source,
)

fairing.config.set_deployer(name="pytorchjob", worker_count=2, master_count=1)

fairing.config.run()

Option 2: Build-and-Push

You can ‘just’ build a Docker image, that is, build and push it without running it as part of a Kubeflow Fairing deployment, with the following snippet. Please choose the appropriate BASE_IMAGE based on whether your mnist.py file is a TensorFlow or PyTorch model.

from kubeflow.fairing.builders import cluster
from kubeflow.fairing.preprocessors import base as base_preprocessor

# Choose which base image your executable mnist.py file requires
BASE_IMAGE = "mesosphere/kubeflow:1.0.1-0.3.1-tensorflow-2.2.0-gpu"
# BASE_IMAGE = "mesosphere/kubeflow:1.0.1-0.3.1-pytorch-1.5.0-gpu"

preprocessor = base_preprocessor.BasePreProcessor(
    input_files=["mnist.py"], executable="mnist.py"
)

cluster_builder = cluster.cluster.ClusterBuilder(
    registry=REGISTRY,
    base_image=BASE_IMAGE,
    preprocessor=preprocessor,
    image_name=IMAGE_NAME,
    context_source=minio_context_source,
)

cluster_builder.build()
image_tag = cluster_builder.image_tag
print(f"Published Docker image with tag: {image_tag}")

Since the image is not run immediately, there is no need to specify a deployer: that is done with a YAML specification. Moreover, we also leave out the command in the preprocessor, since Kubeflow Fairing does not set the entrypoint or executable command in the Docker image anyway. We have to do that manually in the specification.
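
For illustration only, a heavily abbreviated TFJob specification along these lines would run the freshly built image. All names are placeholders, and the full schema (restart policies, a Chief replica, etc.) is omitted:

apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train                # hypothetical name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - name: tensorflow     # TFJob expects this container name
              image: <image_tag>   # the image reference printed by the build above
              command:
                - python
                - -u
                - mnist.py
              args:
                - --epochs
                - "7"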