Kubeflow Fairing: Build Docker Images from within Jupyter Notebooks
Introduction
Although you can build Docker images by downloading files to your local machine and subsequently pushing the images to a container registry, it is much faster to do so without leaving Jupyter! Kubeflow Fairing makes that possible.
What You’ll Learn
In this notebook we’ll go through the steps involved in building a Docker image from a base image (e.g. TensorFlow or PyTorch) and a custom trainer file that defines your machine learning model.
This image can be used for distributed training or hyperparameter tuning.
You can use the model code you generated with %%writefile in the MNIST with TensorFlow or MNIST with PyTorch tutorials, or a file of your own choosing.
The Docker image builder process stores (temporary) files in MinIO. MinIO, an open-source S3-compliant object storage tool, is already included with your Kubeflow installation.
What You’ll Need
- An executable Python file (e.g. an mnist.py trainer);
- A container registry to which you have push access.
Please note that this notebook is interactive!
Prerequisites
Kubeflow Fairing must be installed, so let’s check that it is:
! pip show kubeflow-fairing
Prepare the training code and datasets
The examples in this tutorial require a trainer code file mnist.py and a dataset to be present in the current folder.
The code and datasets are already available in the MNIST with TensorFlow or MNIST with PyTorch tutorials and can be reused here. Run one of the following shortcuts to copy the required files.
TensorFlow
! jq -j '.cells[] | select(.metadata.tags!= null) | select (.metadata.tags[] | contains("trainer_code")) | .source | .[]' ../training/tensorflow/MNIST\ with\ TensorFlow.ipynb | sed '1d' > mnist.py
! cp -R ../training/tensorflow/datasets .
PyTorch
! jq -j '.cells[] | select(.metadata.tags!= null) | select (.metadata.tags[] | contains("trainer_code")) | .source | .[]' ../training/pytorch/MNIST\ with\ PyTorch.ipynb | sed '1d' > mnist.py
! cp -R ../training/pytorch/datasets .
We’re ready to go!
How to Create a Docker Credentials File and Kubernetes ConfigMap
We shall also require getpass, so that you can provide your password interactively without it being immediately visible.
It's a standard Python library, so there is no need to install it. A simple import will suffice.
import json
import getpass
Please type in the container registry username by running the next cell:
docker_user = input()
Please enter the password for the container registry by executing the following cell:
docker_password = getpass.getpass()
With these details, we can base-64-encode the username and password and create a Kubernetes configmap with a name expected by the builder’s context source.
%%capture creds --no-stderr
! echo -n "$docker_user:$docker_password" | base64
docker_credentials = creds.stdout.rstrip()
js = {"auths": {"https://index.docker.io/v1/": {"auth": docker_credentials}}}
%store json.dumps(js) >config.json
! kubectl create configmap docker-config --from-file=config.json
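If you would like to double-check that the credentials were stored before kicking off a build, you can (optionally) list the configmap:
! kubectl get configmap docker-config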
How to Set up MinIO
from kubeflow.fairing import constants
from kubeflow.fairing.builders.cluster.minio_context import MinioContextSource

# The MinIO service bundled with Kubeflow and its default credentials
s3_endpoint = "minio-service.kubeflow:9000"
s3_endpoint_url = f"http://{s3_endpoint}"
s3_secret_id = "minio"
s3_secret_key = "minio123"
s3_region = "us-east-1"

# The default Kaniko version (0.14.0) does not work with Kubeflow Fairing
constants.constants.KANIKO_IMAGE = "gcr.io/kaniko-project/executor:v0.19.0"

# The context source tells the in-cluster builder where to stage the (temporary) build context
minio_context_source = MinioContextSource(
    endpoint_url=s3_endpoint_url,
    minio_secret=s3_secret_id,
    minio_secret_key=s3_secret_key,
    region_name=s3_region,
)
How to Build a Docker Image
If you have your own container registry, please set it in REGISTRY. IMAGE_NAME contains the name of the image that will be built and pushed to the REGISTRY.
REGISTRY = "mesosphere"
IMAGE_NAME = "kubeflow"
If your goal is to run a distributed training job immediately from a notebook, we recommend Option 1. With it, you build (and push) the image as part of a deployment (e.g. a distributed training job).
If your goal is to provide a Docker image that includes the code for distributed training or hyperparameter tuning, Option 2 is more appropriate. It does not run the job (with pre-defined arguments) but merely pushes the image to the container registry.
Both options automatically push the image to the registry specified.
Neither the ENTRYPOINT nor the CMD is set in the image, as you can verify with docker inspect on your local machine.
This means that you have to add the command you want to run to your Kubernetes resource specification (YAML)!
Without this modification to the YAML, your pods will fail to run their containerized workloads.
You can do this by adding the key command above the args specification:
containers:
  - name:
    image:
    command:
      - python
      - -u
      - mnist.py
    args:
      - --epochs
      - "7"
...
Option 1: Build-Push-Run
Multiple input files (e.g. a trainer and utilities) can be provided in the input_files list. There can be only one executable file. In the command we must include all the mandatory arguments (i.e. epochs):
from kubeflow import fairing
import glob
fairing.config.set_preprocessor(
    "python",
    command=["python", "-u", "mnist.py", "--epochs=3"],
    input_files=["mnist.py"] + glob.glob("datasets/**", recursive=True),
    path_prefix="/",
    executable="mnist.py",
)
TensorFlow
If your mnist.py file is based on TensorFlow, you must specify the appropriate base image for TensorFlow. If you want to run the model on CPUs only, drop the -gpu suffix from the base image name.
The primary configuration options are the chief and worker counts, but feel free to peruse all available parameters here.
If your model code is based on PyTorch, please skip this section!
from kubeflow.fairing.kubernetes import utils as k8s_utils
BASE_IMAGE = "mesosphere/kubeflow:1.0.1-0.5.0-tensorflow-2.2.0-gpu"
fairing.config.set_builder(
    name="cluster",
    registry=REGISTRY,
    base_image=BASE_IMAGE,
    image_name=IMAGE_NAME,
    context_source=minio_context_source,
)

fairing.config.set_deployer(
    name="tfjob",
    worker_count=2,
    chief_count=1,
    # remove this parameter if the cluster doesn't have GPUs
    pod_spec_mutators=[k8s_utils.get_resource_mutator(gpu=1)],
)
fairing.config.run()
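fairing.config.run() builds the image, pushes it to the registry, and submits the training job to the cluster. If you want to keep an eye on the submitted job from within the notebook, you can list the TFJob resources (for the PyTorch variant below, the resource type is pytorchjobs); the exact job name is generated by Kubeflow Fairing:
! kubectl get tfjobs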
PyTorch
For a PyTorch-based mnist.py model, you must specify the appropriate base image for PyTorch. If you want to run the model on CPUs and not GPUs, simply leave off the -gpu suffix from the base image's name.
The main configuration options are the master and worker counts, but you can see all options here.
from kubeflow.fairing.kubernetes import utils as k8s_utils
BASE_IMAGE = "mesosphere/kubeflow:1.0.1-0.5.0-pytorch-1.5.0-gpu"
fairing.config.set_builder(
    name="cluster",
    registry=REGISTRY,
    base_image=BASE_IMAGE,
    image_name=IMAGE_NAME,
    context_source=minio_context_source,
)

fairing.config.set_deployer(
    name="pytorchjob",
    worker_count=2,
    master_count=1,
    # remove this parameter if the cluster doesn't have GPUs
    pod_spec_mutators=[k8s_utils.get_resource_mutator(gpu=1)],
)
fairing.config.run()
Option 2: Build-and-Push
You can also 'just' build a Docker image, that is, build and push it without running it as part of a Kubeflow Fairing deployment, with the following snippet.
Please choose the appropriate BASE_IMAGE based on whether your mnist.py file is a TensorFlow or PyTorch model.
from kubeflow.fairing.builders import cluster
from kubeflow.fairing.preprocessors import base as base_preprocessor
import glob
# Choose which base image your executable mnist.py file requires
BASE_IMAGE = "mesosphere/kubeflow:1.0.1-0.5.0-tensorflow-2.2.0-gpu"
# BASE_IMAGE = "mesosphere/kubeflow:1.0.1-0.5.0-pytorch-1.5.0-gpu"
preprocessor = base_preprocessor.BasePreProcessor(
    input_files=["mnist.py"] + glob.glob("datasets/**", recursive=True),
    path_prefix="/",
    executable="mnist.py",
)

cluster_builder = cluster.cluster.ClusterBuilder(
    registry=REGISTRY,
    base_image=BASE_IMAGE,
    preprocessor=preprocessor,
    image_name=IMAGE_NAME,
    context_source=minio_context_source,
)
cluster_builder.build()
image_tag = cluster_builder.image_tag
print(f"Published Docker image with tag: {image_tag}")
Since the image is not run immediately, there is no need to specify a deployer; that is done with a YAML specification instead.
Moreover, we also leave out the command in the preprocessor, since Kubeflow Fairing does not set the entrypoint or executable command in the Docker image anyway. We have to do that manually in the specification.
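As a minimal sketch of what such a specification could look like (assuming a TFJob; the job name, replica count, and arguments are placeholders you should adapt, and the image must be the image_tag printed above):
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-training  # placeholder name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - name: tensorflow  # TFJob expects this container name
              image: <the image_tag printed above>
              command:
                - python
                - -u
                - mnist.py
              args:
                - --epochs
                - "7"
For a PyTorchJob, the kind, the replica spec key (pytorchReplicaSpecs), and the container name (pytorch) differ accordingly.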