kaptain.config

Config Objects

class Config()
CODE
__init__
 | __init__(docker_config_provider: ConfigurationProvider, storage_config_provider: ConfigurationProvider, docker_registry_url: Optional[str] = None, docker_registry_certificate_provider: Optional[ConfigurationProvider] = None, base_dir: str = os.getcwd(), base_model_storage_uri: str = "s3://kaptain/models")
CODE

Encapsulates platform-specific configuration such as access credentials or AWS endpoints. Config is provided as an argument to the Model and is used to instantiate concrete implementations of lower-level components based on its properties so that users work with a configuration-based API when it comes to fine-tuning the workloads.

Arguments:

  • docker_config_provider: the configuration provider for Docker registry.

  • storage_config_provider: the configuration provider for blob storage access. Currently, only S3 and MinIO are supported.

  • docker_registry_url: private custom Docker registry URL to use with provided TLS certificates.

  • docker_registry_certificate_provider: the configuration provider for Docker registry certificate.

  • base_dir: base directory to use for referencing relative file paths of model files. Defaults to current working directory.

  • base_model_storage_uri: name of a bucket in the remote storage (MinIO or S3) to store the model. Defaults to ‘s3://kaptain/models’

kaptain.envs

_M Objects

class _M(types.ModuleType)
CODE

the environment variables that can change anytime by the user

VERBOSE
 | @property
 | VERBOSE() -> bool
CODE

this environment variable (KAPTAIN_SDK_VERBOSE) will enable showing pod logs unless overridden to not

VERBOSE
 | @VERBOSE.setter
 | VERBOSE(value: bool) -> None
CODE

this environment variable (KAPTAIN_SDK_VERBOSE) will enable showing pod logs unless overridden to not

DEBUG
 | @property
 | DEBUG() -> bool
CODE

this environment variable (KAPTAIN_SDK_DEBUG) will show stacktrace for uncaught exceptions

LOG_TIMEFORMAT
 | @property
 | LOG_TIMEFORMAT() -> str
CODE

this environment variable (KAPTAIN_SDK_LOG_TIMEFORMAT) will set the time format to show in logs

DOCKER_BUILDER_CPU_LIMIT
 | @DOCKER_BUILDER_CPU_LIMIT.setter
 | DOCKER_BUILDER_CPU_LIMIT(cpu_limit: str) -> None
CODE

set cpu limit resource for image builder job

DOCKER_BUILDER_MEM_LIMIT
 | @DOCKER_BUILDER_MEM_LIMIT.setter
 | DOCKER_BUILDER_MEM_LIMIT(mem_limit: str) -> None
CODE

set memory limit resource for image builder job

DOCKER_BUILDER_CPU_REQUEST
 | @DOCKER_BUILDER_CPU_REQUEST.setter
 | DOCKER_BUILDER_CPU_REQUEST(cpu_request: str) -> None
CODE

set cpu request resource for image builder job

DOCKER_BUILDER_MEM_REQUEST
 | @DOCKER_BUILDER_MEM_REQUEST.setter
 | DOCKER_BUILDER_MEM_REQUEST(mem_request: str) -> None
CODE

set memory request resource for image builder job

KAPTAIN_SDK_DELETE_EXPERIMENT
 | @KAPTAIN_SDK_DELETE_EXPERIMENT.setter
 | KAPTAIN_SDK_DELETE_EXPERIMENT(value: bool) -> None
CODE

Delete the experiment resource upon the completion of the tuning step. Note: once the experiment is deleted, it won’t be available for viewing in the Katib UI

KAPTAIN_SDK_TTL_SECONDS_AFTER_FINISHED
 | @KAPTAIN_SDK_TTL_SECONDS_AFTER_FINISHED.setter
 | KAPTAIN_SDK_TTL_SECONDS_AFTER_FINISHED(ttl_seconds: int) -> None
CODE

Number of seconds after which a completed training job gets automatically deleted.

KAPTAIN_SDK_FORCE_CLEANUP
 | @KAPTAIN_SDK_FORCE_CLEANUP.setter
 | KAPTAIN_SDK_FORCE_CLEANUP(value: bool) -> None
CODE

If set to True, delete completed training jobs automatically ignoring the TTL.

kaptain.exceptions

InvalidModelProperty Objects

class InvalidModelProperty(Exception)
CODE

Raised when a model property is None or blank.

UndefinedModelProperty Objects

class UndefinedModelProperty(Exception)
CODE

Raised when a model property is not defined.

UnsupportedAlgorithmException Objects

class UnsupportedAlgorithmException(Exception)
CODE

Raised when a hyperparameter tuning algorithm is not supported.

UnsupportedModelDeploymentException Objects

class UnsupportedModelDeploymentException(Exception)
CODE

Raised when a model deployment is not supported.

UnsupportedMetricsTypeException Objects

class UnsupportedMetricsTypeException(Exception)
CODE

Raised when a metric type is not supported.

ModelDeploymentException Objects

class ModelDeploymentException(Exception)
CODE

Raised in case of a model deployment failure.

ModelValidationException Objects

class ModelValidationException(Exception)
CODE

Raised in case the model configuration properties are missing or model is in a state that is unsuitable for the operation invoked on the model.

ModelAlreadyExistsException Objects

class ModelAlreadyExistsException(Exception)
CODE

Raised in case the model configuration file already exists in the location.

ImageBuildException Objects

class ImageBuildException(Exception)
CODE

Raised in case of a image build failure.

WorkloadDeploymentError Objects

class WorkloadDeploymentError(Exception)
CODE

Raised in case of a workload deployment failure, e.g. failed scheduling

kaptain.utils

diagnose
diagnose() -> None
CODE

List all managed resources for the current namespace: TfJobs, PyTorchJobs, Experiments and Inference Services, Secrets, Pods, and Service Accounts

list_jobs
list_jobs() -> None
CODE

List all training jobs in current namespace.

delete_job
delete_job(name: str, kind: Optional[str] = None) -> None
CODE

Deletes a training job based on provided name and kind.

Arguments:

  • name: job name

  • kind: job kind (optional), e.g. “tfjob”, “pytorchjob” of “job”.

list_experiments
list_experiments() -> None
CODE

Lists Katib experiments.

delete_experiment
delete_experiment(name: str) -> None
CODE

Deletes Katib experiment.

Arguments:

  • name: Name of the experiment

list_inference_services
list_inference_services() -> None
CODE

Lists deployed inference services.

delete_inference_service
delete_inference_service(name: str) -> None
CODE

Deletes inference service.

Arguments:

  • name: Name of the inference service

delete_jobs
delete_jobs(force: bool = False) -> None
CODE

Deletes all training jobs created by the SDK.

Arguments:

  • force: If True, delete all (even running) jobs created by the SDK, otherwise, delete only completed jobs.

delete_experiments
delete_experiments(force: bool = False) -> None
CODE

Deletes all experiments created by the SDK.

Arguments:

  • force: If True, delete all (even running) experiments created by the SDK, otherwise, delete only completed experiments.

delete_inference_services
delete_inference_services(force: bool = False) -> None
CODE

Delete all inference services created by the SDK.

Arguments:

inference services with ‘NotReady’ status.

  • force: If True, delete all (even already deployed) inference services created by the SDK, otherwise, delete

Returns:

clean
clean(force: bool = False) -> None
CODE

Deletes stale resources (such as Secrets and ServiceAccounts which are not used by any workloads).

WARNING: Use with caution! To prevent data loss, please first run this method without any arguments set and check whether the resources proposed by the method can be safely deleted.

Arguments:

  • force: If False, method only prints unused resource names without actually removing them.

clean_all
clean_all(force: bool = False) -> None
CODE

Deletes all completed workloads and stale Kubernetes resources created by the SDK.

Arguments:

delete only completed workloads and prints stale resources (secrets and service accounts).

  • force: If True, delete all (even running) workloads and resources created by the SDK, otherwise

Returns:

list_all_resources
list_all_resources() -> None
CODE

Lists all deployed resources: TfJobs, PyTorchJobs, Experiments and Inference Services.

delete_resource
delete_resource(kind: str, name: str) -> None
CODE

Deletes inference service.

Arguments:

  • name: Name of the resource

  • kind: Kind of the resource - one of “tfjob”, “pytorchjob”, “experiment” or “inferenceservice”.

kaptain.model

kaptain.model.models

Model Objects

class Model()
CODE
__init__
 | __init__(id: str, name: str, description: str, version: str, framework: ModelFramework, framework_version: str, main_file: str, image_name: str, base_image: str, extra_files: Optional[List[str]] = None, requirements: Optional[str] = None, labels: Optional[List[str]] = None, config: Optional[Config] = None, serving_config: Optional[Dict[str, str]] = None)
CODE

A representation of a machine learning model.

When the model is created for the first time, its internal revision is set to a random UUID and its internal state is “untrained”. Once the model is trained or tuned, its state will be updated accordingly, hyperparameter values set, its revision refreshed, and it can be saved or deployed. Each action (train, tune, deploy) alters the revision and is stored in the model tracking database.

Arguments:

Details on the format can be found here: https://pip.pypa.io/en/stable/cli/pip_install/ requirements-file-format.

  • id: Unique identifier of model, e.g. “dev/mnist”. It is recommended to include the stage of the model (e.g. dev/prod) in the name to make it easier to filter models under active development and in production.

  • name: Short name of the model, e.g. “MNIST”. This name is visible in the model tracking database.

  • description: Description of the model, e.g. “Digit recognition for MNIST data set”. This description is visible in the model tracking database.

  • version: Model version, e.g. “4.5”

  • main_file: Main (Python) file that contains the executable model code, e.g. “trainer.py”.

  • image_name: Name of the repository to push the resulting image, e.g. ‘kaptain/mnist’ Can also contain image tag, e.g. “kaptain/mnist:0.0.1-tensorflow-2.2.0”.

  • extra_files: Auxiliary files, e.g. [“utils.py”, “data_loader.py”].

  • requirements: Path to the file with additional python packages to install into the image in pip compatible format (e.g. “requirements.txt”).

  • framework: Machine learning library or framework used for the model, e.g. “tensorflow”.

  • framework_version: Machine learning library or framework version used by model, e.g. “2.3.2”

  • base_image: Base container image, e.g. “tensorflow-2.3.2”

  • labels: Custom labels for deployment-related metadata, e.g. “dev/mnist-tensorflow”

  • config: Configuration object used for configuring access to Docker registries and blob storage.

  • serving_config: Configuration specific to model servers

hyperparameters
 | @property
 | hyperparameters() -> Optional[Dict[str, Any]]
CODE

Hyperparameters of the model as defined through an action:

  • Train: uses the static values provided to the training procedure.

  • Tune: extracts the recommended values after running multiple experiments.

metrics
 | @property
 | metrics() -> Optional[Dict[str, Dict[str, float]]]
CODE

Metrics of the model when tuning is run

build
 | build(verbose: Optional[bool] = None) -> bool
CODE

Builds a Docker image with the model training code and dependencies and publishes it to the registry specified in the configuration. Label with checksum of the model’s content will be included in the image. Image rebuilding is triggered only if an image with the same name and checksum is not already present in the registry.

Arguments:

  • verbose: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).

Returns:

True if successful, otherwise False

train
 | train(hyperparameters: Dict[str, Any], args: Optional[Dict[str, Any]] = None, gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, workers: int = 2, verbose: Optional[bool] = None, ttl_seconds_after_finished: Optional[int] = None, force_cleanup: Optional[bool] = None, timeout: Optional[int] = constants.DEFAULT_TIMEOUT_SECONDS) -> bool
CODE

Train a model in a distributed manner.

Simple / advanced resource API

Resources may be specified via the ‘simple’ resource parameters::

model.train(workers=1, cpu=1, memory="2G", gpus=0)
CODE

… the model training process will have both the request and limit set for all resource parameters.

More fine-grained resource specification is possible via the ‘resources’ parameter::

model.train(workers=workers, resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))
CODE

It is illegal to specify both the ‘resources’ parameter or any ‘simple’ resource parameters (gpus, memory, cpu).

Arguments:

Can be set via ‘KAPTAIN_SDK_TTL_SECONDS_AFTER_FINISHED’ environment variable.

  • args: Arguments to be passed to the training function.

  • hyperparameters: Dictionary of hyperparameter values.

  • workers: Number of parallel workers to use (default: 2).

  • gpus: Number of GPUs to use (default: 0).

  • memory: Amount of memory for each worker (optional),

  • cpu: Number of CPUs to use for each worker (optional).

  • resources: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).

  • verbose: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).

  • ttl_seconds_after_finished: Number of seconds after which a completed training job gets automatically deleted.

  • force_cleanup: If set to True, delete completed training jobs automatically ignoring the TTL (can also be set via ‘KAPTAIN_SDK_FORCE_CLEANUP’ environment variable).

  • timeout: Number of seconds to wait for the training job to complete before timing-out

Returns:

True if successful, otherwise False

tune
 | tune(hyperparameters: Dict[str, Domain], objectives: List[str], objective_goal: Optional[float] = None, objective_type: str = "maximize", workers: int = 2, gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, trials: int = 16, parallel_trials: int = 2, failed_trials: int = 4, algorithm: Optional[str] = Algorithm.RANDOM.value, algorithm_setting: Optional[dict] = None, args: Optional[Dict[str, Any]] = None, verbose: Optional[bool] = None, delete_experiment: Optional[bool] = None, ttl_seconds_after_finished: Optional[int] = None, timeout: Optional[int] = constants.DEFAULT_TIMEOUT_SECONDS) -> bool
CODE

Tunes a model with parallel trials and possibly distributed trials.

Simple / advanced resource API

Resources may be specified via the ‘simple’ resource parameters::

model.tune(hyperparameters=params, objectives=objectives, cpu=1, memory="2G", gpus=0)
CODE

… the deployed tuning process will have both the request and limit set for all resource parameters.

More fine-grained resource specification is possible via the ‘resources’ parameter::

model.tune(
  hyperparameters=params,
  objectives=objectives,
  resources=Resources(cpu_request=1, memory_limit="2G", gpu_limit=gpus))
CODE

It is illegal to specify both the ‘resources’ parameter or any ‘simple’ resource parameters (gpus, memory, cpu).

Arguments:

  • args: Arguments to be passed to the the experiment trial specification.

  • hyperparameters: Dictionary of hyperparameters and their specified domains.

  • objectives: List of metrics to track in order of importance. The first one listed is used in conjunction with the objective goal and type.

  • objective_goal: Main objective’s goal, which when reached causes the tuning to stop. The main objective is the first element in objectives. If None, the tuning will continue until the maximum number of trials has been reached.

  • objective_type: Whether to “maximize” or “minimize” the main objective’s value (default: maximize).

  • workers: Number of parallel workers to use for each trial (default: 2).

  • gpus: Number of GPUs to use (default: 0).

  • memory: Amount of memory for each worker (optional),

  • cpu: Number of CPUs to use for each worker (optional).

  • resources: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).

  • trials: Maximum number of trials (default: 16).

  • parallel_trials: Maximum number of trials to run in parallel (default: 2).

  • failed_trials: Maximum number of failed trials before hyperparameter tuning stops (default: 4).

  • algorithm: Algorithm to use for hyperparameter search (default: random).

  • algorithm_setting: Algorithm settings. Please see https://www.kubeflow.org/docs/components/katib/experiment/ for details.

  • verbose: Enable verbose output (can also be set via environment variable KAPTAIN_SDK_VERBOSE).

  • delete_experiment: Delete the experiment resource upon the completion of the tuning step. Can be set via ‘KAPTAIN_SDK_DELETE_EXPERIMENT’ environment variable. Note: once the experiment is deleted, it won’t be available for viewing in the Katib UI.

  • ttl_seconds_after_finished: Number of seconds after which a completed training job gets automatically deleted.

  • timeout: Number of seconds to wait for the experiment to complete before timing-out.

Returns:

True if successful, otherwise False

deploy
 | deploy(model_uri: Optional[str] = None, autoscale: int = 2, gpus: Optional[int] = None, cpu: Optional[str] = None, memory: Optional[str] = None, resources: Optional[Resources] = None, replace: bool = False, transformer: Optional[Transformer] = None, **kwargs: str, ,) -> bool
CODE

Deploys a model.

Simple / advanced resource API

Resources may be specified via the ‘simple’ resource parameters:

model.deploy(model_uri=uri, cpu=1, memory=“2G”, gpus=0)
CODE

… the deployed model process will have both the request and limit set for all resource parameters.

More fine-grained resource specification is possible via the ‘resources’ parameter:

model.deploy(model_uri=uri, resources=Resources(cpu_request=1, memory_limit=“2G”, gpu_limit=gpus))
CODE

It is illegal to specify both the ‘resources’ parameter or any ‘simple’ resource parameters (gpus, memory, cpu).

Arguments:

  • model_uri: URI of the saved model to be loaded. If None, the default location managed by Kaptain is chosen based on the most recent state of the model.

  • autoscale: Target concurrency (default: 2).

  • gpus: Number of GPUs to use (default: 0).

  • memory: Amount of memory for each worker (optional),

  • cpu: Number of CPUs to use for each worker (optional).

  • resources: Advanced API for resource specification. Do not use in tandem with the parameters gpus, memory and cpu (optional).

  • replace: Safety flag to avoid accidental redeployment of the model. If True, the previously deployed model will be replaced. If False, an error will be logged in case the model had been previously deployed.

  • transformer: Transformer is an component which does pre/post processing alongside with model inference. It usually takes raw input and transforms them to the input tensors model server expects.

  • kwargs: Keyword arguments for the deployment.

Returns:

True if successful, otherwise False

deploy_canary
 | deploy_canary(canary_traffic_percentage: int, model_uri: Optional[str] = None, **kwargs: str, ,) -> None
CODE

Deploys a model in a canary with a pre-determined percentage of traffic. A canary deployment allows a model to be run in parallel with a baseline or previous model revision. This allows traffic to be split, so the latest revision can be checked for possible issues with model (e.g. compared to the baseline) or system (e.g. latency) performance. To deploy a model to the canary, a previously deployed model revision must exist.


To deploy canary with 30 percent traffic:

model.deploy_canary(canary_traffic_percentage=30)

To change the canary traffic percentage to 50 (half the traffic):

model.deploy_canary(canary_traffic_percentage=50)

To deploy canary with 30 percent traffic and specified saved model location:

model.deploy_canary(canary_traffic_percentage=30, model_uri=uri)

To change the canary traffic percentage to 50 (half the traffic) for a model deployed from a specified saved location:

model.deploy_canary(canary_traffic_percentage=50, model_uri=uri)

Arguments:

  • canary_traffic_percentage: the percentage of traffic to route to the canary model.

  • model_uri: URI of the saved model to be loaded. If None, the default location managed by Kaptain is chosen based on the most recent state of the model.

rollback_canary
 | rollback_canary() -> None
CODE

Undeploy the model from canary and switch 100% of traffic to the previously deployed baseline model.

:raises: ModelDeploymentException if canary deployment doesn’t exist.

promote_canary
 | promote_canary() -> None
CODE

Promote the model from canary to server 100% of traffic.

:raises: ModelDeploymentException if canary deployment doesn’t exist.

undeploy
 | undeploy() -> None
CODE

Removes existing deployment and canary deployment of a model.

:raises: ModelDeploymentException in case the model was not previously deployed

log_data
 | log_data(name: str, uri: str, description: Optional[str] = None, features: Optional[List[str]] = None, version: Optional[str] = None) -> None
CODE

Logs an input data set to a model execution.

Arguments:

  • name: Name of the data set.

  • uri: URI of the data set.

  • description: Optional description.

  • features: List of features used.

  • version: Optional version of the data set.

log_metrics
 | log_metrics(metrics: dict, metrics_type: str, uri: Optional[str] = None) -> None
CODE

Logs model evaluation metrics to a model execution.

Arguments:

  • metrics: A dictionary of metrics names and their values, e.g. {“accuracy”, 0.95, “auc”: 0.975}.

  • metrics_type: Evaluation type of the metric: training, testing, validation, or production (for deployed models).

  • uri: Optional URI to the metrics (e.g. log directory).

meta
 | meta() -> ModelMeta
CODE

Creates an immutable snapshot of model properties.

Returns:

ModelMeta data class with a copy of all current model field values

save_as_json
 | save_as_json(path: str = "", overwrite: bool = False) -> None
CODE

Provides a way to save the model as JSON. Uses the storage_config_provider ConfigurationProvider from the model’s config to save to.

Arguments:

default location. will raise an exception if the model already exists.

  • path: Optional path to save the model to. If not provided, the model will be saved to the

  • overwrite: If True, will overwrite the model json file if it already exists. If False,

Returns:

None

load_from_json
 | @classmethod
 | load_from_json(cls, model_uri: str, config: Config) -> "Model"
CODE

Load the instance of the Model class from model_uri location.

Arguments:

  • model_uri: model location in the object storage, e.g. ‘s3://kaptain/models/mnist’.

  • config: model configuration, such as S3 credentials.

Returns:

new instance of the Model class.

kaptain.model.frameworks

kaptain.model.states

kaptain.hyperparameter

kaptain.hyperparameter.algorithms

Algorithm Objects

class Algorithm(Enum)
CODE
of
 | @staticmethod
 | of(algorithm: Optional[str]) -> Optional["Algorithm"]
CODE

Converts a hyperparameter tuning algorithm (string) to an Algorithm enum.

Arguments:

  • algorithm: Model framework or library.

Returns:

Algorithm enum if the algorithm is supported.

kaptain.hyperparameter.domains

Double Objects

class Double(Domain)
CODE
__init__
 | __init__(min: float, max: float)
CODE

Defines a floating-point (double) hyperparameter with domain [min, max]

Arguments:

  • min: Minimum value

  • max: Maximum value

Integer Objects

class Integer(Domain)
CODE
__init__
 | __init__(min: int, max: int)
CODE

Defines an integer (int) hyperparameter with domain [min, max]

Arguments:

  • min: Minimum value

  • max: Maximum value

Discrete Objects

class Discrete(Domain)
CODE

Defines an discrete hyperparameter with a list of possible values of floats

Arguments:

  • values: List of allowed floating-point values

Categorical Objects

class Categorical(Domain)
CODE

Defines an integer hyperparameter with a list of possible values of strings

Arguments:

  • values: List of allowed string values

kaptain.platform.config

kaptain.platform.config.provider

ConfigurationProvider Objects

class ConfigurationProvider(ABC)
CODE

The ConfigurationProvider interface defines high-level functions for translating user-provided credentials for a Docker registry or cloud buckets into Kubernetes Secrets required for distributed building, training, tuning, and serving components.

FileBasedConfigurationProvider Objects

class FileBasedConfigurationProvider(ConfigurationProvider)
CODE

The FileBasedConfigurationProvider defines a factory method for creating instances of ConfigurationProvider from provided configuration file specific for the concrete implementation.

EnvironmentVariableConfigurationProvider Objects

class EnvironmentVariableConfigurationProvider(ConfigurationProvider)
CODE

The EnvironmentVariableConfigurationProvider defines a factory method for creating instances of ConfigurationProvider from environment variables specific for the concrete implementation.

kaptain.platform.config.certificates

DockerRegistryCertificateProvider Objects

class DockerRegistryCertificateProvider(FileBasedConfigurationProvider)
CODE
__init__
 | __init__(certificate_body: str, certificate_path: Optional[str] = None)
CODE

Docker Registry Certificate Provider is a container for private Docker registries running with custom/self-signed TLS certificates which are required for pushing Docker images containing model training code.

Docker Registry Certificate Provider by default loads the configuration from $HOME/.tls/certificate.crt. It is also possible to specify a custom registry certificate.crt location using DockerRegistryCertificateProvider.from_file(path=/path/to/certificate.crt).

Docker Registry certificate.crt file can be created ad-hoc while using a notebook or mounted to the notebook from a Secret. To support mounting of a shared Docker certificate.crt as a volume, the system administrator must create the PodDefault resource with a certificate file to make it available for the user.

Arguments:

  • certificate_body: The configuration string in json format

  • certificate_path: Path to the certificate file (optional)

kaptain.platform.config.docker

DockerConfigurationProvider Objects

class DockerConfigurationProvider(FileBasedConfigurationProvider)
CODE
__init__
 | __init__(config_json: str)
CODE

Docker Configuration Provider is a container for user Docker configuration which are required for pulling and pushing images used in training and tuning jobs.

Docker Configuration Provider supports standard Docker config.json file of the following format:

    {
        "auths": {
                "https://index.docker.io/v1/": {
                        "auth": "<username and password in base64>"
                }
        }
    }
CODE

Copy

The auth field is a base64-encoded string of the form “:” where and are the actual username and password used to login to Docker registry. To generate value for auth field, use the following command: echo -n "<username>:<password>" | base64.

Docker Configuration Provider by default loads the configuration from $HOME/.docker/config.json. It is also possible to specify a custom config.json location using DockerConfigurationProvider.from_file(path=/path/to/config.json).

Docker config.json file can be created ad-hoc while using a notebook or mounted to the notebook from a Secret. To support mounting of a shared Docker config.json as a volume, the system administrator must create the PodDefault resource with a pre-populated file to make it available for the user.

Arguments:

  • config_json: The configuration string in json format

kaptain.platform.config.defaults

kaptain.platform.config.s3

S3ConfigurationProvider Objects

class S3ConfigurationProvider(FileBasedConfigurationProvider,  EnvironmentVariableConfigurationProvider)
CODE

__init__
 | __init__(aws_access_key_id: str, aws_secret_access_key: str, aws_session_token: Optional[str] = None, region_name: str = _DEFAULT_REGION, s3_endpoint: Optional[str] = None, s3_signature_version: Optional[str] = None, s3_force_path_style: bool = False)
CODE

S3-specific configuration provider which supports reading configuration from AWS configuration file and from environment variables. The provider can be used as a configuration object, or for convenience resolution of the configuration both on the development side and in containers when configuration is passed in form of environment variables from Kubernetes Secrets.

Constructor arguments represent a subset of [boto3 configuration properties] (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html ) sufficient for kaptain.

Arguments:

  • aws_access_key_id: The access key to authenticate with S3.

  • aws_secret_access_key: The secret key to authenticate with S3.

  • aws_session_token: The session token to authenticate with S3.

  • region_name: The name of AWS region.

  • s3_endpoint: The complete URL of S3 endpoint. This parameter is required when working with non-standard, S3-compatible storage solutions such as MinIO. It should be set to a resolvable address of the running server.

  • s3_signature_version: The signature version when signing requests

  • s3_force_path_style: When enabled, the clients will use path style instead of URL style for accessing buckets

get_secret_body
 | get_secret_body() -> Dict[str, str]
CODE

Transforms the configuration properties into a dict of environment variables. The resulting dict will be used for creating Kubernetes Secret to securely share access credentials between containers.

Returns:

dict of environment variables with associated values