Metadata SDK

Tutorial for Metadata SDK

NOTE: All tutorials in Jupyter Notebook format are available for download. You can either download them to a local computer and upload to the running Jupyter Notebook or run wget -O - | tar xz from a Jupyter Notebook Terminal running in your Kaptain installation.

NOTE: These notebook tutorials have been built for and tested on D2iQ's Kaptain. Without the requisite Kubernetes operators and custom Docker images, these notebooks will likely not work.

All information about executions, models, and data sets, as well as the files and objects that are part of a machine learning workflow, is referred to as metadata. The Metadata SDK allows you to manage all of these ML assets:

  • An Execution captures metadata of a single run of an ML workflow, which can be either a pipeline or a notebook. Any derived data that is used or produced in the context of a single execution is referred to as an artifact.
  • Metadata of a Model includes a URI to its location, a name and description, training framework (e.g. TensorFlow, PyTorch, MXNet), hyperparameters and their values, and so on.
  • Metrics collect the evaluation metrics of a model.
  • A DataSet describes the data that is either the input or output of a component within an ML workflow.

Behind the scenes, the Metadata SDK uses the gRPC service of MLMD, the ML Metadata library, which was originally designed for TFX (TensorFlow eXtended) and offers implementations for both SQLite and MySQL.
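For reference, MLMD chooses between these backends through its ConnectionConfig protocol buffer. A rough sketch of the SQLite variant (field names taken from the ml_metadata proto definitions; the file path is a hypothetical example):

```proto
# ml_metadata ConnectionConfig (sketch) -- selects the SQLite backend
sqlite {
  filename_uri: "/tmp/mlmd/metadata.db"  # hypothetical database location
  connection_mode: READWRITE_OPENCREATE  # create the database if it does not exist
}
```

For the MySQL backend, a mysql block with host, port, database, user, and password fields takes the place of the sqlite block. In Kaptain you do not configure this yourself; the metadata gRPC service used below wraps it.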

With the Metadata SDK you can also add so-called metadata watchers to monitor Kubernetes resource changes and save the related data in the metadata service.

What You Will Learn

In this notebook, you will learn how to use the Metadata SDK to display information about executions and interact with the metadata available within Kubeflow.

What You Need

Nothing except this notebook.

How to Create a Workspace

A workspace is a grouping of pipelines, notebooks, and their artifacts. A single workspace can hold multiple executions.

To define various objects (e.g. executions, runs, models) you therefore first need to create a workspace. Unless you define multiple workspaces within the same context, you do not have to specify it again after you have created it.

Import the metadata modules and store the default DNS for the host as well as the port for the metadata store in a couple of variables:

from kubeflow.metadata import metadata

METADATA_STORE_HOST = "metadata-grpc-service.kubeflow"
METADATA_STORE_PORT = 8080

ws = metadata.Workspace(
    store=metadata.Store(grpc_host=METADATA_STORE_HOST, grpc_port=METADATA_STORE_PORT),
    name="demo workspace",
    description="A workspace for the demo",
    labels={"some_key": "a-value"},
)

This creates a demo workspace with a custom label some_key that holds the value a-value. Labels are typically used to enable easier filtering. They are not (as of yet) exposed in the Kubeflow central dashboard, but they can be used to filter by means of the SDK.

How to Create a Run in a Workspace

The difference between runs and executions is subtle: an execution records the run of a component or step in a machine learning workflow (along with its runtime parameters).

A run is an instance of an executable step.

An execution therefore always refers to a run.

Here is a helper function:

from uuid import uuid4

def add_suffix(name: str) -> str:
    """Appends an underscore and a hexadecimal UUID to `name`.

    :param str name: String to be suffixed
    :return: Suffixed string
    :rtype: str
    """
    return f"{name}_{uuid4().hex}"

The run itself is then defined as follows:

run = metadata.Run(
    workspace=ws,
    name=add_suffix("run"),
    description="A run in the workspace",
)

How to Create an Execution of a Run

exec = metadata.Execution(
    name=add_suffix("execution"),
    workspace=ws,
    run=run,
    description="An execution of the run",
)

print(f"Execution ID: {exec.id}")
Execution ID: 19

How to Log Artifacts for an Execution

An execution can have both input and output artifacts. Artifacts that can be logged for executions are Model, DataSet, Metrics, or a custom artifact type.

You can see defined artifacts by navigating to the Kubeflow Central Dashboard’s Artifact Store.

How to Log a Data Set

A data set that is used by the model itself is an input artifact. It can be registered as follows:

data_set_version = add_suffix("ds")

data_set = exec.log_input(
    metadata.DataSet(
        name="mytable-dump",            # hypothetical name for the sample data
        uri="file://path/to/data_set",  # hypothetical location of the data
        description="Sample data",
        query="SELECT * FROM mytable",
        version=data_set_version,
    )
)

print(f"Data set ID:      {data_set.id}")
print(f"Data set version: {data_set.version}")
Data set ID:      25
Data set version: ds_2f9dd04cf4584b87bd670ad474879b85

The data itself is available at the specified uri. The query is optional and documents how the data was fetched from the source; it is not used to retrieve the data. After all, the data does not have to live in a relational database at all.

How to Log a Model

If a step of a machine learning workflow generates a model, it is logged as an output artifact:

model_version = add_suffix("model")

model = exec.log_output(
    metadata.Model(
        name="MNIST",
        uri="gcs://my-bucket/mnist",
        model_type="neural network",
        description="Model to recognize handwritten digits",
        training_framework={"name": "tensorflow", "version": "v1.0"},
        hyperparameters={
            "learning_rate": 0.5,
            "layers": [10, 3, 1],
            "early_stop": True,
        },
        version=model_version,
        labels={"a_label": "some-value"},
    )
)

print(f"Model ID:      {model.id}")
print(f"Model version: {model.version}")
Model ID:      26
Model version: model_fb6da2a4ea0c4bb5be7a42fc09221797

The reason it is an output artifact is simple yet perhaps not obvious: until a model has been run (i.e. trained) its weights and values are not yet computed. The trained model is therefore the output of a step in the workflow.

Please note that the model type is a free-form text field. Only uri and name are required, although version is highly recommended.

Models as Input Artifacts
You may wonder whether a model can ever be an input artifact.
The answer is: Yes!

Model serving is probably the most common case for a model to be listed as an input artifact to an execution, but there is another possibility. In transfer learning, the knowledge gained from a base model is 'transferred' to another problem that is related but different. Suppose you have to build an application to classify pictures of drinks into four categories: tea, coffee, soft drinks, and alcohol. The basic task of identifying cups, mugs, glasses, liquids, and so on requires a lot of data and hardware, so it makes sense to rely on a pre-trained network for feature extraction. Since the (dense) layers near the output of the model are more specific to the task at hand than the (convolutional) layers near the input, you cut the base network after the convolutional layers and add in your own dense layers to perform the necessary task-dependent classification. The problem of classifying drinks is related to image recognition, and the knowledge gained from the latter (i.e. features extracted that are needed to classify general images) is transferred to the former. If your own data set is huge, the recommendation is to train the base model. If, however, your own data set is small, it is advantageous to freeze the base model. The base model is then an input artifact to an execution.

Examples of classifiers based on pre-trained base models (e.g. cats vs dogs (in TensorFlow), ants vs bees (in PyTorch), various materials (in MXNet)) are available in case you want to know more.

How to Log the Evaluation of a Model

metrics = exec.log_output(
    metadata.Metrics(
        name="MNIST evaluation",
        description="Evaluation of the MNIST model",
        uri="gcs://my-bucket/mnist-eval.csv",  # hypothetical location of the evaluation results
        metrics_type=metadata.Metrics.VALIDATION,
        values={"accuracy": 0.95},
        labels={"mylabel": "l1"},
    )
)

print(f"Metrics ID: {metrics.id}")
Metrics ID: 27

Possible values for metrics_type:

  • metadata.Metrics.TRAINING
  • metadata.Metrics.VALIDATION
  • metadata.Metrics.TESTING
  • metadata.Metrics.PRODUCTION

If you are not familiar with the distinction between validation and training, please check out the notebook on hyperparameter tuning, which explains the difference and the need for an additional evaluation step.

How to Add Metadata for Serving the Model

Once you are satisfied with the model, you want to serve it. The model server is an execution with a model as input artifact:

app = metadata.Execution(
    name="Serving the MNIST model",
    workspace=ws,
    run=run,
    description="An execution to represent the model serving component",
)

served_model = metadata.Model(
    name="MNIST",
    uri="gcs://my-bucket/mnist",
    version=model.version,
)

m = app.log_input(served_model)

print(f"Serving model with ID:      {m.id}")
print(f"Serving model with version: {m.version}")
Serving model with ID:      26
Serving model with version: model_fb6da2a4ea0c4bb5be7a42fc09221797

The name, uri, and version identify the model. As stated before, only the first two are required, but it is a good practice to also include the version.

How to List All Models in a Workspace

The Artifact Store is a user interface that displays artifacts across all workspaces. Not all fields are shown, which means you cannot easily filter on, say, custom labels.

Fortunately, you can request all artifacts of a certain type: Model, Metrics, DataSet, or a custom artifact. Here’s how to list all models:

artifacts = ws.list(metadata.Model.ARTIFACT_TYPE_NAME)
artifacts
[{'id': 26,
  'workspace': 'demo workspace',
  'run': 'run_5faf20cfbba941c9a400fbfbe02cd654',
  'model_type': 'neural network',
  'create_time': '2020-04-20T13:21:12.320745Z',
  'version': 'model_fb6da2a4ea0c4bb5be7a42fc09221797',
  'owner': '',
  'description': 'Model to recognize handwritten digits',
  'name': 'MNIST',
  'uri': 'gcs://my-bucket/mnist',
  'training_framework': {'name': 'tensorflow', 'version': 'v1.0'},
  'hyperparameters': {'learning_rate': 0.5,
   'layers': [10, 3, 1],
   'early_stop': True},
  'labels': {'a_label': 'some-value'},
  'kwargs': {}}]

The output is not exactly fabulous for humans, so you can restructure it to make it easier to manipulate:

import pandas as pd

pd.DataFrame.from_dict(artifacts)
id workspace run model_type create_time version owner description name uri training_framework hyperparameters labels kwargs
0 26 demo workspace run_5faf20cfbba941c9a400fbfbe02cd654 neural network 2020-04-20T13:21:12.320745Z model_fb6da2a4ea0c4bb5be7a42fc09221797 Model to recognize handwritten digits MNIST gcs://my-bucket/mnist {'name': 'tensorflow', 'version': 'v1.0'} {'learning_rate': 0.5, 'layers': [10, 3, 1], '... {'a_label': 'some-value'} {}

You can see the output includes the labels. Labels are particularly helpful when monitoring many (versions of) models in production, with regard to both system and model performance, as both affect the overall user experience: a bad prediction (e.g. a poor recommendation) from a responsive service hurts the user experience, as does an unresponsive service with good predictions. Both model and system performance metrics therefore need to be tracked over time and across versions, and with (shared) labels it is possible to monitor both simultaneously.
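Because ws.list(...) returns plain Python dictionaries, filtering on a custom label needs nothing more than a list comprehension. A minimal sketch over that dictionary structure (the sample records are hypothetical and abbreviated to the fields needed for the illustration):

```python
# Each artifact returned by ws.list(...) is a plain dict; the "labels" key
# holds the custom labels attached when the artifact was logged.
sample_artifacts = [
    {"id": 26, "name": "MNIST", "labels": {"a_label": "some-value"}},
    {"id": 31, "name": "Fashion", "labels": {"a_label": "other-value"}},
    {"id": 35, "name": "CIFAR", "labels": None},  # artifacts may carry no labels
]

def filter_by_label(artifacts, key, value):
    """Keeps only the artifacts whose labels contain the given key/value pair."""
    return [a for a in artifacts if (a.get("labels") or {}).get(key) == value]

print([a["name"] for a in filter_by_label(sample_artifacts, "a_label", "some-value")])
# ['MNIST']
```

The same comprehension works unchanged on the real output of ws.list, since the filter only relies on the dictionary keys shown above.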

How to Track Lineage

The same can be done for the executions and artifacts that belong to a certain model:

model_events = ws.store.get_events_by_artifact_ids([model.id])

execution_ids = set(e.execution_id for e in model_events)
print(f"Executions related to the model: {execution_ids}")

trainer_events = ws.store.get_events_by_execution_ids([exec.id])
artifact_ids = set(e.artifact_id for e in trainer_events)
print(f"Artifacts related to the trainer: {artifact_ids}")
Executions related to the model: {19, 20}
Artifacts related to the trainer: {25, 26, 27}