Deploy Kaptain for model inferencing

Kubeflow provides various tools and operators that simplify machine learning workflows. All of these components require additional cluster resources to install and operate properly. In some cases, models are trained and tuned on one cluster (the training cluster) and then deployed on other clusters (deployment clusters). For example, you can deploy to a cluster that runs your business-specific applications, or to a cluster that stores data locally.

Alternatively, you can deploy a minimal installation of Kaptain in IoT/edge environments, where resources are limited.

Thanks to its highly flexible, modular architecture, Kaptain components can be disabled based on the target use case or environment. This makes it possible to deploy Kaptain in model inferencing mode by disabling the Kubeflow core components. To minimize the number of dependencies, KServe can be configured to run in RawDeployment mode, which backs InferenceService deployments with plain Kubernetes resources instead of using Knative for deploying models.
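
For example, the inference-only configuration used in the steps below can be kept in a values override file and supplied when installing Kaptain. This is only a sketch: the file name and the commented install command are illustrative, so use the installation method documented for your Kaptain release.

    # Save the inference-only overrides used later in this tutorial
    cat <<EOF > kaptain-inference-values.yaml
    core:
      enabled: false
    ingress:
      enabled: false
    kserve:
      controller:
        deploymentMode: RawDeployment
    EOF
    # Pass the file to your Kaptain installation command, for example
    # (hypothetical invocation; adjust to your installation method):
    #   helm install kaptain <kaptain-chart> -f kaptain-inference-values.yaml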

In this tutorial, you will learn how to install a lightweight version of Kaptain for model inference, deploy a model, and make a prediction using either an internal or an external ingress service.

Prerequisites

Before installing Kaptain, make sure you have the following applications installed on the target cluster (a quick verification sketch follows this list):

  • Istio

  • cert-manager
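
As a quick sanity check, you can confirm that both components are running before installing Kaptain. The sketch below assumes the conventional istio-system and cert-manager namespaces; adjust them if your installation uses different ones.

    kubectl get pods -n istio-system     # Istio control plane and ingress gateway
    kubectl get pods -n cert-manager     # cert-manager controller, webhook, and cainjector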

You can choose between two model inferencing methods: Model inferencing with a local cluster gateway if your model only needs to be accessible within the cluster, and Model inferencing via the external ingress if your model needs to be accessible both within the cluster and from outside the cluster.

Model inferencing with a local cluster gateway

Follow these steps if your model only needs to be accessible within the cluster via a local cluster gateway.

  1. Deploy Kaptain with a customized configuration that enables only KServe:

    core:
      enabled: false
    ingress:
      enabled: false
    kserve:
      controller:
        deploymentMode: RawDeployment
  2. Create a namespace and deploy the example (a readiness check is sketched after this procedure):

    kubectl create ns kserve-test 
    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF
  3. Run the inference from another pod:

    kubectl run curl -n kserve-test --image=curlimages/curl -i --tty -- sh
    # the following commands are run in the "curl" pod
    cat <<EOF > "/tmp/iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4], 
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
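    # POST the payload to the model's V1 predict endpoint; the short hostname
    # resolves because this pod runs in the same namespace as the InferenceService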
    curl -v http://sklearn-iris-predictor-default/v1/models/sklearn-iris:predict -d @/tmp/iris-input.json

    The output should look similar to this:

    *   Trying 10.109.166.118:80...
    * Connected to sklearn-iris-predictor-default (10.109.166.118) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-predictor-default
    > User-Agent: curl/7.85.0-DEV
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Server: TornadoServer/6.2
    < Content-Type: application/json; charset=UTF-8
    < Date: Thu, 20 Oct 2022 22:12:06 GMT
    < Content-Length: 23
    <
    * Connection #0 to host sklearn-iris-predictor-default left intact
    {"predictions": [1, 1]}
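
Before running the inference in step 3, you can confirm that the InferenceService has been reconciled and is ready to serve. The sketch below uses standard kubectl commands; in RawDeployment mode the predictor is backed by a plain Kubernetes Deployment and Service that you can inspect directly.

    # Wait until the InferenceService reports READY as True
    kubectl get inferenceservice sklearn-iris -n kserve-test
    # In RawDeployment mode the predictor is backed by regular Kubernetes objects
    kubectl get deployment,service -n kserve-test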

Model inferencing via the external ingress

Follow these steps if your model needs to be accessible both within the cluster via a local cluster gateway and from outside the cluster via an external load balancer.

  1. Deploy Kaptain with a customized configuration that enables only KServe:

    core:
      enabled: false
    ingress:
      enabled: false
    kserve:
      controller:
        deploymentMode: RawDeployment
        gateway:
          ingressClassName: istio
  2. Create an IngressClass resource. The name should match the ingressClassName set in the previous step:

    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: istio
    spec:
      controller: istio.io/ingress-controller
    EOF
  3. Create a namespace and deploy the example:

    kubectl create ns kserve-test 
    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF
  4. From your local machine, discover the ingress host and port:

    export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath="{.status.loadBalancer.ingress[*]['ip', 'hostname']}")
    export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
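    # If the istio-ingressgateway Service has no external LoadBalancer address
    # (for example, on a cluster without a cloud load balancer), you can fall
    # back to a node address and the gateway's NodePort instead, e.g.:
    #   export INGRESS_HOST=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
    #   export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')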
  5. Run the inference by setting the Host header in the request (if it fails, see the checks sketched after this procedure):

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4], 
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
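    # Extract the model's hostname from the InferenceService status URL;
    # KServe routes the request to the right model based on this Host header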
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./iris-input.json

    The output should look similar to this:

    *   Trying 54.148.92.116:80...
    * Connected to af70d19e9*************************-1815165949.us-west-2.elb.amazonaws.com (54.148.92.116) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-kserve-test.example.com
    > User-Agent: curl/7.79.1
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < server: istio-envoy
    < content-type: application/json; charset=UTF-8
    < date: Thu, 20 Oct 2022 22:09:04 GMT
    < content-length: 23
    < x-envoy-upstream-service-time: 2
    <
    * Connection #0 to host af70d19e9*************************-1815165949.us-west-2.elb.amazonaws.com left intact
    {"predictions": [1, 1]}%
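
If the request does not return a prediction, a few quick checks can help narrow down whether the problem is the Ingress object, the gateway address, or the Host header. The sketch below only uses resources created earlier in this procedure.

    # Confirm that KServe created an Ingress object for the InferenceService
    # (RawDeployment mode uses standard Kubernetes Ingress resources)
    kubectl get ingress -n kserve-test
    # Confirm the external address and port of the Istio ingress gateway
    kubectl -n istio-system get service istio-ingressgateway
    # Confirm the values used in the request
    echo "Sending requests to ${INGRESS_HOST}:${INGRESS_PORT} with Host: ${SERVICE_HOSTNAME}"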