API Documentation

Kaptain SDK Python API Documentation

kaptain.config

Config Objects

class Config()

__init__

 | __init__(docker_config_provider: ConfigurationProvider, storage_config_provider: ConfigurationProvider, docker_registry_url: Optional[str] = None, docker_registry_certificate_provider: Optional[ConfigurationProvider] = None, base_dir: str = os.getcwd(), base_model_storage_uri: str = "s3://kaptain/models")

Encapsulates platform-specific configuration such as access credentials or AWS endpoints. A Config is passed as an argument to the Model and is used to instantiate concrete implementations of lower-level components based on its properties. This gives users a configuration-based API for fine-tuning workloads.

Arguments:

  • docker_config_provider: the configuration provider for the Docker registry.
  • storage_config_provider: the configuration provider for blob storage access. Currently, only S3 and MinIO are supported.
  • docker_registry_url: URL of a private custom Docker registry to use with the provided TLS certificates.
  • docker_registry_certificate_provider: the configuration provider for the Docker registry certificate.
  • base_dir: base directory to use for referencing relative file paths of model files. Defaults to the current working directory.
  • base_model_storage_uri: URI of the location in the remote storage (MinIO or S3) where the model is stored. Defaults to ‘s3://kaptain/models’.
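
The default value of base_model_storage_uri combines a bucket with a key prefix. As a minimal sketch of how such a URI decomposes, the following uses only the standard library; the helper name split_storage_uri is hypothetical and is not part of the SDK:

```python
from typing import Tuple
from urllib.parse import urlparse


def split_storage_uri(uri: str) -> Tuple[str, str]:
    """Split a URI such as "s3://kaptain/models" into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    # The network location is the bucket; the path (minus the leading slash) is the prefix.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_storage_uri("s3://kaptain/models")
print(bucket, prefix)
```

Whether the SDK itself parses the URI this way is not stated in this reference.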

kaptain.envs

_M Objects

class _M(types.ModuleType)

Environment variables that the user can change at any time.

VERBOSE

 | @property
 | VERBOSE() -> bool

When set, this environment variable (KAPTAIN_SDK_VERBOSE) enables showing pod logs unless explicitly overridden.

VERBOSE

 | @VERBOSE.setter
 | VERBOSE(value: bool) -> None

When set, this environment variable (KAPTAIN_SDK_VERBOSE) enables showing pod logs unless explicitly overridden.

DEBUG

 | @property
 | DEBUG() -> bool

When set, this environment variable (KAPTAIN_SDK_DEBUG) shows the stack trace for uncaught exceptions.

LOG_TIMEFORMAT

 | @property
 | LOG_TIMEFORMAT() -> str

This environment variable (KAPTAIN_SDK_LOG_TIMEFORMAT) sets the time format shown in logs.
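
The properties above are declared on a types.ModuleType subclass, which lets module-level attributes be computed from the environment on each access. The sketch below illustrates that general pattern only. The class name EnvSketch, the accepted truthy spellings, and the fallback time format are assumptions, not the SDK's actual implementation:

```python
import os
import types

_TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed truthy spellings


class EnvSketch(types.ModuleType):
    """Hypothetical module-like object whose attributes mirror environment variables."""

    @property
    def VERBOSE(self) -> bool:
        # Read on every access, so a changed variable takes effect immediately.
        return os.environ.get("KAPTAIN_SDK_VERBOSE", "").strip().lower() in _TRUE_VALUES

    @VERBOSE.setter
    def VERBOSE(self, value: bool) -> None:
        os.environ["KAPTAIN_SDK_VERBOSE"] = "true" if value else "false"

    @property
    def LOG_TIMEFORMAT(self) -> str:
        # The fallback format is an assumption made for this sketch.
        return os.environ.get("KAPTAIN_SDK_LOG_TIMEFORMAT", "%Y-%m-%d %H:%M:%S")


envs = EnvSketch("envs_sketch")
envs.VERBOSE = True
print(envs.VERBOSE)
```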

kaptain.utilities

blank

blank(s: str) -> bool

Checks if a string is blank.

Arguments:

  • s: Input string

Returns:

True if blank, False otherwise

none_or_blank

none_or_blank(s: Optional[str]) -> bool

Checks if an optional string is either None or blank.

Arguments:

  • s: Input string

Returns:

True if None or blank, False otherwise
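
As a rough illustration of the two checks, the sketch below assumes that "blank" means empty or whitespace-only; the reference does not define the term precisely. The names is_blank and is_none_or_blank are hypothetical stand-ins for the SDK functions:

```python
from typing import Optional


def is_blank(s: str) -> bool:
    """Return True for an empty or whitespace-only string (assumed definition of blank)."""
    return not s.strip()


def is_none_or_blank(s: Optional[str]) -> bool:
    """Return True if the value is None or blank."""
    return s is None or is_blank(s)


for candidate in ("", "   ", "mnist", None):
    print(repr(candidate), is_none_or_blank(candidate))
```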

MyFormatter Objects

class MyFormatter(logging.Formatter)

Formatter for the logger that allows sub-second (millisecond) time formats via %f.
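
The stock logging.Formatter.formatTime builds on time.strftime, which works on a struct_time without sub-second information. One common way to support %f is to override formatTime and format a datetime instead. The following is a sketch of that technique under a hypothetical class name, not the SDK's MyFormatter; note that %f in datetime.strftime emits six-digit microseconds:

```python
import logging
from datetime import datetime


class MicrosecondFormatter(logging.Formatter):
    """Hypothetical formatter whose datefmt accepts the %f directive."""

    def formatTime(self, record: logging.LogRecord, datefmt=None) -> str:
        # datetime keeps the fractional part of record.created, so %f is meaningful here.
        created = datetime.fromtimestamp(record.created)
        if datefmt:
            return created.strftime(datefmt)
        return created.isoformat(sep=" ", timespec="milliseconds")


handler = logging.StreamHandler()
handler.setFormatter(
    MicrosecondFormatter("%(asctime)s %(levelname)s %(message)s", datefmt="%H:%M:%S.%f")
)
logger = logging.getLogger("formatter_sketch")
logger.addHandler(handler)
logger.warning("timestamp now carries sub-second digits")
```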

set_logging_format

set_logging_format(datefmt: str) -> None

Helper to change the logging format at startup or when the environment variable KAPTAIN_SDK_LOG_TIMEFORMAT changes.

kaptain.exceptions

InvalidModelProperty Objects

class InvalidModelProperty(Exception)

Raised when a model property is None or blank.

UndefinedModelProperty Objects

class UndefinedModelProperty(Exception)

Raised when a model property is not defined.

UnsupportedModelFrameworkException Objects

class UnsupportedModelFrameworkException(Exception)

Raised when a model framework is not supported.

UnsupportedAlgorithmException Objects

class UnsupportedAlgorithmException(Exception)

Raised when a hyperparameter tuning algorithm is not supported.

UnsupportedModelDeploymentException Objects

class UnsupportedModelDeploymentException(Exception)

Raised when a model deployment is not supported.

UnsupportedMetricsTypeException Objects

class UnsupportedMetricsTypeException(Exception)

Raised when a metric type is not supported.

ModelDeploymentException Objects

class ModelDeploymentException(Exception)

Raised in case of a model deployment failure.

ModelValidationException Objects

class ModelValidationException(Exception)

Raised in case the model configuration properties are missing or model is in a state that is unsuitable for the operation invoked on the model.

ImageBuildException Objects

class ImageBuildException(Exception)

Raised in case of an image build failure.

WorkloadDeploymentError Objects

class WorkloadDeploymentError(Exception)

Raised in case of a workload deployment failure, e.g. failed scheduling.

kaptain.model

kaptain.model.models

Model Objects

class Model()

__init__

 | __init__(id: str, name: str, description: str, version: str, framework: str, framework_version: str, main_file: str, image_name: str, base_image: str, extra_files: Optional[List[str]] = None, requirements: Optional[str] = None, labels: Optional[List[str]] = None, config: Optional[Config] = None)

A representation of a machine learning model.

When the model is created for the first time, its internal revision is set to a random UUID and its internal state is “untrained”. Once the model is trained or tuned, its state will be updated accordingly, hyperparameter values set, its revision refreshed, and it can be saved or deployed. Each action (train, tune, deploy) alters the revision and is stored in the model tracking database.

Arguments:

  • id: Unique identifier of model, e.g. “dev/mnist”. It is recommended to include the stage of the model (e.g. dev/prod) in the name to make it easier to filter models under active development and in production.
  • name: Short name of the model, e.g. “MNIST”. This name is visible in the model tracking database.
  • description: Description of the model, e.g. “Digit recognition for MNIST data set”. This description is visible in the model tracking database.
  • version: Model version, e.g. “4.5”
  • main_file: Main (Python) file that contains the executable model code, e.g. “trainer.py”.
  • image_name: Name of the repository to push the resulting image to, e.g. “kaptain/mnist”. Can also contain an image tag, e.g. “kaptain/mnist:0.0.1-tensorflow-2.2.0”.
  • extra_files: Auxiliary files, e.g. [“utils.py”, “data_loader.py”].
  • requirements: Additional pip requirements, e.g. [“numpy”, “nltk==3.5”]
  • framework: Machine learning library or framework used for the model, e.g. “tensorflow”.
  • framework_version: Machine learning library or framework version used by model, e.g. “2.3.2”
  • base_image: Base container image, e.g. “tensorflow-2.3.2”
  • labels: Custom labels for deployment-related metadata, e.g. “dev/mnist-tensorflow”
  • config: Configuration object used for configuring access to Docker registries and blob storage.

hyperparameters

 | @property
 | hyperparameters() -> Optional[Dict[str, Any]]

Hyperparameters of the model as defined through an action:

  • Train: uses the static values provided to the training procedure.
  • Tune: extracts the recommended values after running multiple experiments.

build

 | build(verbose: Optional[bool] = None) -> None

Builds a Docker image with the model training code and dependencies and publishes it to the registry specified in the configuration. A label with the checksum of the model’s content is included in the image. The image is rebuilt only if an image with the same name and checksum is not already present in the registry.

Arguments:

  • verbose: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).
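
The checksum-based rebuild rule described for build can be pictured as follows. This is a sketch under stated assumptions: the hash function (SHA-256), the way file names and contents are combined, and the helper names content_checksum and needs_rebuild are all hypothetical, and a plain dict stands in for the registry's image labels:

```python
import hashlib
from typing import Dict


def content_checksum(files: Dict[str, bytes]) -> str:
    """Hash file names and contents in a stable order (assumed scheme)."""
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator between name and content
        digest.update(files[name])
    return digest.hexdigest()


def needs_rebuild(files: Dict[str, bytes], registry_labels: Dict[str, str], image_name: str) -> bool:
    """Rebuild unless the registry already holds this image name with the same checksum."""
    return registry_labels.get(image_name) != content_checksum(files)


model_files = {"trainer.py": b"print('training')", "utils.py": b""}
labels = {"kaptain/mnist": content_checksum(model_files)}
print(needs_rebuild(model_files, labels, "kaptain/mnist"))
```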

train

 | train(*args: str, hyperparameters: Dict[str, Any], gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, workers: int = 2, verbose: Optional[bool] = None, **kwargs: str) -> bool

Train a model in a distributed manner.

Simple / advanced resource API

Resources may be specified via the ‘simple’ resource parameters:

model.train(workers=1, cpu=1, memory="2G", gpus=0)

In this case, the model training process will have both the request and the limit set for all resource parameters.

More fine-grained resource specification is possible via the ‘resources’ parameter:

model.train(workers=workers, resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))

It is illegal to specify both the ‘resources’ parameter and any of the ‘simple’ resource parameters (gpus, memory, cpu).

Arguments:

  • args: Arguments to be passed to the training function.
  • hyperparameters: Dictionary of hyperparameter values.
  • workers: Number of parallel workers to use (default: 2).
  • gpus: Number of GPUs to use (default: 0).
  • memory: Amount of memory for each worker (optional).
  • cpu: Number of CPUs to use for each worker (optional).
  • resources: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).
  • verbose: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).
  • kwargs: Keyword arguments to be passed to the training function.

Returns:

True if successful, otherwise False
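
The rule that the simple and advanced resource APIs are mutually exclusive, and that simple parameters set both request and limit, can be sketched as below. ResourcesSketch is a hypothetical stand-in for the SDK's Resources type: only cpu_request, memory_limit and gpu_limit appear in this reference, the remaining fields are assumptions, and the choice of ValueError is also an assumption:

```python
from dataclasses import dataclass
from typing import Optional, Union

Quantity = Union[int, str]


@dataclass
class ResourcesSketch:
    """Hypothetical stand-in for the SDK's Resources type."""
    cpu_request: Optional[Quantity] = None
    cpu_limit: Optional[Quantity] = None
    memory_request: Optional[Quantity] = None
    memory_limit: Optional[Quantity] = None
    gpu_request: Optional[int] = None
    gpu_limit: Optional[int] = None


def resolve_resources(gpus=None, cpu=None, memory=None, resources=None) -> ResourcesSketch:
    simple_used = any(value is not None for value in (gpus, cpu, memory))
    if resources is not None and simple_used:
        raise ValueError("specify either 'resources' or the simple parameters (gpus, cpu, memory), not both")
    if resources is not None:
        return resources
    # Simple API: the same value is used for both the request and the limit.
    return ResourcesSketch(
        cpu_request=cpu, cpu_limit=cpu,
        memory_request=memory, memory_limit=memory,
        gpu_request=gpus, gpu_limit=gpus,
    )


print(resolve_resources(cpu=1, memory="2G", gpus=0))
```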

tune

 | tune(*args: str, hyperparameters: Dict[str, Domain], objectives: List[str], objective_goal: Optional[float] = None, objective_type: str = "maximize", workers: int = 2, gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, trials: int = 16, parallel_trials: int = 2, failed_trials: int = 4, algorithm: Optional[str] = Algorithm.RANDOM.value, algorithm_setting: Optional[dict] = None, verbose: Optional[bool] = None, **kwargs: str) -> bool

Tunes a model with parallel trials and possibly distributed trials.

Simple / advanced resource API

Resources may be specified via the ‘simple’ resource parameters:

model.tune(hyperparameters=params, objectives=objectives, cpu=1, memory="2G", gpus=0)

In this case, the deployed tuning process will have both the request and the limit set for all resource parameters.

More fine-grained resource specification is possible via the ‘resources’ parameter:

model.tune(
  hyperparameters=params,
  objectives=objectives,
  resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))

It is illegal to specify both the ‘resources’ parameter and any of the ‘simple’ resource parameters (gpus, memory, cpu).

Arguments:

  • args: Arguments to be passed to the training/tuning function.
  • hyperparameters: Dictionary of hyperparameters and their specified domains.
  • objectives: List of metrics to track in order of importance. The first one listed is used in conjunction with the objective goal and type.
  • objective_goal: Main objective’s goal, which when reached causes the tuning to stop. The main objective is the first element in objectives. If None, the tuning will continue until the maximum number of trials has been reached.
  • objective_type: Whether to “maximize” or “minimize” the main objective’s value (default: maximize).
  • workers: Number of parallel workers to use for each trial (default: 2).
  • gpus: Number of GPUs to use (default: 0).
  • memory: Amount of memory for each worker (optional).
  • cpu: Number of CPUs to use for each worker (optional).
  • resources: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).
  • trials: Maximum number of trials (default: 16).
  • parallel_trials: Maximum number of trials to run in parallel (default: 2).
  • failed_trials: Maximum number of failed trials before hyperparameter tuning stops (default: 4).
  • algorithm: Algorithm to use for hyperparameter search (default: random).
  • algorithm_setting: Algorithm settings. Please see https://www.kubeflow.org/docs/components/hyperparameter-tuning/experiment/ for details.
  • verbose: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).
  • kwargs: Keyword arguments to be passed to the training/tuning function.

Returns:

True if successful, otherwise False
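
The interplay of objective_goal, objective_type, trials and failed_trials can be read as a set of stopping conditions. The sketch below is one possible reading, not the tuning backend's actual logic. It assumes "reached" means greater-or-equal for “maximize” and less-or-equal for “minimize”, and that tuning stops once the failed-trial count equals failed_trials. The function names are hypothetical:

```python
from typing import Optional


def goal_reached(value: float, goal: Optional[float], objective_type: str = "maximize") -> bool:
    """Check the main objective against its goal; no goal means never reached."""
    if goal is None:
        return False
    if objective_type == "maximize":
        return value >= goal
    if objective_type == "minimize":
        return value <= goal
    raise ValueError(f"unknown objective_type: {objective_type!r}")


def should_stop(best_value: Optional[float], completed: int, failed: int, *,
                objective_goal: Optional[float] = None, objective_type: str = "maximize",
                trials: int = 16, failed_trials: int = 4) -> bool:
    """Stop on exhausted trial budget, too many failures, or a reached goal."""
    if completed >= trials or failed >= failed_trials:
        return True
    return best_value is not None and goal_reached(best_value, objective_goal, objective_type)


print(should_stop(0.99, completed=3, failed=0, objective_goal=0.98))
```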

deploy

 | deploy(model_uri: Optional[str] = None, autoscale: int = 2, gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, replace: bool = False, **kwargs: str) -> bool

Deploys a model.

Simple / advanced resource API

Resources may be specified via the ‘simple’ resource parameters:

model.deploy(model_uri=uri, cpu=1, memory="2G", gpus=0)

In this case, the deployed model process will have both the request and the limit set for all resource parameters.

More fine-grained resource specification is possible via the ‘resources’ parameter:

model.deploy(model_uri=uri, resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))

It is illegal to specify both the ‘resources’ parameter and any of the ‘simple’ resource parameters (gpus, memory, cpu).

Arguments:

  • model_uri: URI of the saved model to be loaded. If None, the default location managed by Kaptain is chosen based on the most recent state of the model.
  • autoscale: Target concurrency (default: 2).
  • gpus: Number of GPUs to use (default: 0).
  • memory: Amount of memory for each worker (optional).
  • cpu: Number of CPUs to use for each worker (optional).
  • resources: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).
  • replace: Safety flag to avoid accidental redeployment of the model. If True, the previously deployed model will be replaced. If False, an error will be logged in case the model had been previously deployed.
  • kwargs: Keyword arguments for the deployment.

Returns:

True if successful, otherwise False

deploy_canary

 | deploy_canary(canary_traffic_percentage: int, model_uri: Optional[str] = None, **kwargs: str) -> None

Deploys a model in a canary with a pre-determined percentage of traffic. A canary deployment allows a model to be run in parallel with a baseline or previous model revision. This allows traffic to be split, so the latest revision can be checked for possible issues with model (e.g. compared to the baseline) or system (e.g. latency) performance. To deploy a model to the canary, a previously deployed model revision must exist.


To deploy canary with 30 percent traffic:

model.deploy_canary(canary_traffic_percentage=30)

To change the canary traffic percentage to 50 (half the traffic):

model.deploy_canary(canary_traffic_percentage=50)

To deploy canary with 30 percent traffic and specified saved model location:

model.deploy_canary(canary_traffic_percentage=30, model_uri=uri)

To change the canary traffic percentage to 50 (half the traffic) for a model deployed from a specified saved location:

model.deploy_canary(canary_traffic_percentage=50, model_uri=uri)

Arguments:

  • canary_traffic_percentage: the percentage of traffic to route to the canary model.
  • model_uri: URI of the saved model to be loaded. If None, the default location managed by Kaptain is chosen based on the most recent state of the model.
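
The effect of canary_traffic_percentage on the baseline can be expressed as simple arithmetic. The helper below is purely illustrative: its name, the returned dictionary keys, and the 0 to 100 range check are assumptions rather than documented SDK behavior:

```python
from typing import Dict


def traffic_split(canary_traffic_percentage: int) -> Dict[str, int]:
    """Return the share of traffic for the canary and for the baseline revision."""
    if not 0 <= canary_traffic_percentage <= 100:
        raise ValueError("canary_traffic_percentage must be between 0 and 100")
    return {
        "canary": canary_traffic_percentage,
        "baseline": 100 - canary_traffic_percentage,
    }


print(traffic_split(30))
```

Under this reading, a rollback corresponds to a split of 0 for the canary, and a promotion to a split of 100.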

rollback_canary

 | rollback_canary() -> None

Undeploy the model from canary and switch 100% of traffic to the previously deployed baseline model.

Raises:

ModelDeploymentException if the canary deployment doesn’t exist.

promote_canary

 | promote_canary() -> None

Promote the model from canary to serve 100% of traffic.

Raises:

ModelDeploymentException if the canary deployment doesn’t exist.

undeploy

 | undeploy() -> None

Removes existing deployment and canary deployment of a model.

Raises:

ModelDeploymentException if the model was not previously deployed.

log_data

 | log_data(name: str, uri: str, description: Optional[str] = None, features: Optional[List[str]] = None, version: Optional[str] = None) -> None

Logs an input data set to a model execution.

Arguments:

  • name: Name of the data set.
  • uri: URI of the data set.
  • description: Optional description.
  • features: List of features used.
  • version: Optional version of the data set.

log_metrics

 | log_metrics(metrics: dict, metrics_type: str, uri: Optional[str] = None) -> None

Logs model evaluation metrics to a model execution.

Arguments:

  • metrics: A dictionary of metric names and their values, e.g. {"accuracy": 0.95, "auc": 0.975}.
  • metrics_type: Evaluation type of the metric: training, testing, validation, or production (for deployed models).
  • uri: Optional URI to the metrics (e.g. log directory).
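
The argument constraints above can be checked before logging. The following sketch validates a metrics dictionary against the four evaluation types listed for metrics_type. The helper name, the use of ValueError and TypeError, and the requirement that values be numeric are assumptions; the SDK documents UnsupportedMetricsTypeException for unsupported types:

```python
from typing import Dict, Union

SUPPORTED_METRICS_TYPES = ("training", "testing", "validation", "production")


def check_metrics(metrics: Dict[str, Union[int, float]], metrics_type: str) -> None:
    """Raise if the metrics type is unknown or a metric is not a name/number pair."""
    if metrics_type not in SUPPORTED_METRICS_TYPES:
        raise ValueError(f"unsupported metrics type: {metrics_type!r}")
    for name, value in metrics.items():
        if not isinstance(name, str) or not isinstance(value, (int, float)):
            raise TypeError(f"metric {name!r} must map a string name to a number")


check_metrics({"accuracy": 0.95, "auc": 0.975}, "validation")
print("metrics accepted")
```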

kaptain.model.frameworks

ModelFramework Objects

class ModelFramework(Enum)

of

 | @staticmethod
 | of(framework: Optional[str]) -> Optional["ModelFramework"]

Converts a framework (string) to a ModelFramework enum.

Arguments:

  • framework: Model framework or library.

Returns:

ModelFramework enum if the framework is supported.
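
A string-to-enum converter of this shape can be sketched as follows. The enum members, the case-insensitive matching, and returning None for unsupported input are assumptions made for illustration; the Optional return type suggests None is possible, but the SDK may instead raise UnsupportedModelFrameworkException:

```python
from enum import Enum
from typing import Optional


class FrameworkSketch(Enum):
    """Hypothetical stand-in for ModelFramework; the members are assumed."""
    TENSORFLOW = "tensorflow"
    PYTORCH = "pytorch"

    @staticmethod
    def of(framework: Optional[str]) -> Optional["FrameworkSketch"]:
        if framework is None:
            return None
        normalized = framework.strip().lower()
        for member in FrameworkSketch:
            if member.value == normalized:
                return member
        return None


print(FrameworkSketch.of("tensorflow"))
```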

kaptain.model.states

kaptain.hyperparameter

kaptain.hyperparameter.algorithms

Algorithm Objects

class Algorithm(Enum)

of

 | @staticmethod
 | of(algorithm: Optional[str]) -> Optional["Algorithm"]

Converts a hyperparameter tuning algorithm (string) to an Algorithm enum.

Arguments:

  • algorithm: Name of the hyperparameter tuning algorithm.

Returns:

Algorithm enum if the algorithm is supported.

kaptain.hyperparameter.domains

Double Objects

class Double(Domain)

__init__

 | __init__(min: float, max: float)

Defines a floating-point (double) hyperparameter with domain [min, max]

Arguments:

  • min: Minimum value
  • max: Maximum value

Integer Objects

class Integer(Domain)

__init__

 | __init__(min: int, max: int)

Defines an integer (int) hyperparameter with domain [min, max]

Arguments:

  • min: Minimum value
  • max: Maximum value

Discrete Objects

class Discrete(Domain)

Defines a discrete hyperparameter with a list of possible floating-point values.

Arguments:

  • values: List of allowed floating-point values

Categorical Objects

class Categorical(Domain)

Defines a categorical hyperparameter with a list of possible string values.

Arguments:

  • values: List of allowed string values
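
To make the four domain types concrete, the sketch below draws one random configuration from a search space, which is roughly what a random search does for each trial. The dataclasses are hypothetical stand-ins for Double, Integer, Discrete and Categorical, and the sampling code is illustrative only; the actual search is performed by the tuning backend:

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class DoubleSketch:
    min: float
    max: float


@dataclass
class IntegerSketch:
    min: int
    max: int


@dataclass
class DiscreteSketch:
    values: List[float]


@dataclass
class CategoricalSketch:
    values: List[str]


def sample(domain: Any, rng: random.Random) -> Any:
    """Draw one value from a domain."""
    if isinstance(domain, DoubleSketch):
        return rng.uniform(domain.min, domain.max)
    if isinstance(domain, IntegerSketch):
        return rng.randint(domain.min, domain.max)  # inclusive on both ends, like [min, max]
    if isinstance(domain, (DiscreteSketch, CategoricalSketch)):
        return rng.choice(domain.values)
    raise TypeError(f"unsupported domain: {domain!r}")


def sample_trial(space: Dict[str, Any], rng: random.Random) -> Dict[str, Any]:
    return {name: sample(domain, rng) for name, domain in space.items()}


space = {
    "learning_rate": DoubleSketch(0.001, 0.1),
    "epochs": IntegerSketch(5, 20),
    "dropout": DiscreteSketch([0.1, 0.25, 0.5]),
    "optimizer": CategoricalSketch(["adam", "sgd"]),
}
print(sample_trial(space, random.Random(0)))
```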

kaptain.platform.config

kaptain.platform.config.provider

ConfigurationProvider Objects

class ConfigurationProvider(ABC)

The ConfigurationProvider interface defines high-level functions for translating user-provided credentials for a Docker registry or cloud buckets into Kubernetes Secrets required for distributed building, training, tuning, and serving components.

FileBasedConfigurationProvider Objects

class FileBasedConfigurationProvider(ConfigurationProvider)

The FileBasedConfigurationProvider defines a factory method for creating instances of ConfigurationProvider from a provided configuration file specific to the concrete implementation.

EnvironmentVariableConfigurationProvider Objects

class EnvironmentVariableConfigurationProvider(ConfigurationProvider)

The EnvironmentVariableConfigurationProvider defines a factory method for creating instances of ConfigurationProvider from environment variables specific to the concrete implementation.

kaptain.platform.config.certificates

DockerRegistryCertificateProvider Objects

class DockerRegistryCertificateProvider(FileBasedConfigurationProvider)

__init__

 | __init__(certificate_body: str, ceritifcate_path: Optional[str] = None)

Docker Registry Certificate Provider is a container for private Docker registries running with custom/self-signed TLS certificates which are required for pushing Docker images containing model training code.

Docker Registry Certificate Provider by default loads the configuration from $HOME/.tls/certificate.crt. It is also possible to specify a custom registry certificate.crt location using DockerRegistryCertificateProvider.from_file(path="/path/to/certificate.crt").

Docker Registry certificate.crt file can be created ad-hoc while using a notebook or mounted to the notebook from a Secret. To support mounting of a shared Docker certificate.crt as a volume, the system administrator must create the PodDefault resource with a certificate file to make it available for the user.

Arguments:

  • certificate_body: The configuration string in json format
  • certificate_path: Path to the certificate file (optional)

kaptain.platform.config.docker

DockerConfigurationProvider Objects

class DockerConfigurationProvider(FileBasedConfigurationProvider)

__init__

 | __init__(config_json: str)

Docker Configuration Provider is a container for the user’s Docker configuration, which is required for pulling and pushing images used in training and tuning jobs.

Docker Configuration Provider supports a standard Docker config.json file in the following format:

    {
        "auths": {
                "https://index.docker.io/v1/": {
                        "auth": "<username and password in base64>"
                }
        }
    }

The auth field is a base64-encoded string of the form “<username>:<password>”, where <username> and <password> are the actual username and password used to log in to the Docker registry. To generate the value for the auth field, use the following command: echo -n "<username>:<password>" | base64.
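
The same auth value can be produced in Python, which may be convenient when the config.json is created ad hoc in a notebook. This sketch uses only the standard library; the helper names and the placeholder credentials are hypothetical:

```python
import base64
import json


def docker_auth(username: str, password: str) -> str:
    """Base64-encode "<username>:<password>" as expected by the auth field."""
    return base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")


def docker_config_json(registry: str, username: str, password: str) -> str:
    """Render a config.json document with a single registry entry."""
    config = {"auths": {registry: {"auth": docker_auth(username, password)}}}
    return json.dumps(config, indent=4)


# Placeholder credentials for illustration only.
print(docker_config_json("https://index.docker.io/v1/", "user", "pass"))
```

The resulting string has the same structure as the config.json shown above.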

Docker Configuration Provider by default loads the configuration from $HOME/.docker/config.json. It is also possible to specify a custom config.json location using DockerConfigurationProvider.from_file(path="/path/to/config.json").

Docker config.json file can be created ad-hoc while using a notebook or mounted to the notebook from a Secret. To support mounting of a shared Docker config.json as a volume, the system administrator must create the PodDefault resource with a pre-populated file to make it available for the user.

Arguments:

  • config_json: The configuration string in JSON format.

kaptain.platform.config.defaults

kaptain.platform.config.s3

S3ConfigurationProvider Objects

class S3ConfigurationProvider(FileBasedConfigurationProvider, EnvironmentVariableConfigurationProvider)

__init__

 | __init__(aws_access_key_id: str, aws_secret_access_key: str, aws_session_token: Optional[str] = None, region_name: str = _DEFAULT_REGION, s3_endpoint: Optional[str] = None, s3_signature_version: Optional[str] = None, s3_force_path_style: bool = False)

S3-specific configuration provider which supports reading configuration from an AWS configuration file and from environment variables. The provider can be used as a configuration object, or as a convenient way to resolve the configuration both on the development side and in containers, where configuration is passed in the form of environment variables from Kubernetes Secrets.

Constructor arguments represent a subset of the boto3 configuration properties (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html) sufficient for kaptain.

Arguments:

  • aws_access_key_id: The access key to authenticate with S3.
  • aws_secret_access_key: The secret key to authenticate with S3.
  • aws_session_token: The session token to authenticate with S3.
  • region_name: The name of the AWS region.
  • s3_endpoint: The complete URL of the S3 endpoint. This parameter is required when working with non-standard, S3-compatible storage solutions such as MinIO. It should be set to a resolvable address of the running server.
  • s3_signature_version: The signature version to use when signing requests.
  • s3_force_path_style: When enabled, the clients will use path style instead of URL style for accessing buckets.

get_secret_body

 | get_secret_body() -> Dict[str, str]

Transforms the configuration properties into a dict of environment variables. The resulting dict is used for creating a Kubernetes Secret to securely share access credentials between containers.

Returns:

dict of environment variables with associated values
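
A transformation of this kind might look like the sketch below. The function name is hypothetical. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN and AWS_DEFAULT_REGION are the standard variables read by boto3, while S3_ENDPOINT is an assumed name chosen for illustration; the keys the SDK actually emits are not listed in this reference:

```python
from typing import Dict, Optional


def secret_body_sketch(aws_access_key_id: str, aws_secret_access_key: str, region_name: str,
                       aws_session_token: Optional[str] = None,
                       s3_endpoint: Optional[str] = None) -> Dict[str, str]:
    """Map configuration properties to environment variable names, skipping unset values."""
    candidates = {
        "AWS_ACCESS_KEY_ID": aws_access_key_id,
        "AWS_SECRET_ACCESS_KEY": aws_secret_access_key,
        "AWS_SESSION_TOKEN": aws_session_token,
        "AWS_DEFAULT_REGION": region_name,
        "S3_ENDPOINT": s3_endpoint,  # assumed variable name, not confirmed by the SDK
    }
    return {name: value for name, value in candidates.items() if value is not None}


# Placeholder credentials for illustration only.
body = secret_body_sketch("example-key-id", "example-secret", "us-west-2",
                          s3_endpoint="http://minio.example:9000")
print(sorted(body))
```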