Feature Store with Feast
Kaptain comes pre-configured with Feast, a feature store that bridges the gap between models and data and so simplifies the development of ML/AI models and features. Feast is compatible with all types of Kaptain clusters, regardless of license type, cluster type (managed, attached, Essential), or environment (networked, air-gapped, on-prem).
In Kaptain, Feast is available in your Jupyter notebooks. Among other things, it allows you to:
Consolidate and feed data from different data sources into your notebooks and production environments
Ensure consistency between training and serving data
Transform data and share features across teams
When using Feast as a data store, you are able to feed data from different source types to your models for training or serving.
You configure Feast as a data store directly in your notebook, and the configuration differs for each type of data source. For online or streaming data, set up and customize Redis as a key-value store. For offline or batch data, set up a data warehouse or object store such as BigQuery, an S3 bucket, or GCS. You can also use a scalable database-backed registry such as MySQL, which is available in Kaptain by default.
Set up Feast for production use in Kaptain
Prerequisites
You have deployed Kaptain into a cluster.
Depending on the type of data storage, ensure that the data is accessible from your notebook, that the notebook has read access to it, and that the environment variables have been set up correctly in your notebook.
You have defined entities for each one of your data sources. Refer to the Feast tutorial for more information on how to do this.
You have obtained the MySQL and Redis credentials from the administrator.
OR
The administrator has pre-configured the secrets required in the Inject the Feast configuration into a notebook server section.
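The environment-variable prerequisite can be checked from a notebook cell. The following is a minimal sketch, assuming the variable names used by the feast-conf Secret in the Inject the Feast configuration into a notebook server section; the helper name missing_feast_vars is hypothetical and not part of Kaptain or Feast.

```python
import os

# Variable names taken from the feast-conf Secret described on this page.
REQUIRED_VARS = (
    "MYSQL_HOST", "MYSQL_PORT", "MYSQL_USER", "MYSQL_PASSWORD",
    "REDIS_HOST", "REDIS_PORT", "REDIS_PASSWORD",
)


def missing_feast_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Hypothetical helper: report what is missing in the current notebook.
missing = missing_feast_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Feast connection variables are set.")
```

The check only reports variable names and never prints their values, so it is safe to run in a shared notebook.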
Set up a database-backed registry
Kaptain provides a consolidated MySQL cluster with primary-primary replication for storing the Pipelines execution history and artifacts, and Katib experiment results. It can also be used as a scalable registry for Feast projects. Kaptain notebook images come with pre-installed libraries that allow Feast to integrate with MySQL registries.
To enable the SQL registry for your Feast project, set the following configuration in the feature_store.yaml file:
project: my_project
registry:
  registry_type: sql
  path: mysql+pymysql://<user>:<password>@<mysql_host>:<mysql_port>/feast
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
entity_key_serialization_version: 2
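The registry path is a SQLAlchemy-style URL, so a user name or password containing characters such as @, : or / has to be percent-encoded before it is placed in the URL. The following is a minimal sketch of building the path; the helper name mysql_registry_path and the sample values are hypothetical.

```python
from urllib.parse import quote_plus


def mysql_registry_path(user, password, host, port, database="feast"):
    """Build the registry URL, percent-encoding the user name and password."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Sample values for illustration only; replace them with your own.
print(mysql_registry_path("feast_user", "p@ss:word/1", "mysql.example", 3306))
# prints mysql+pymysql://feast_user:p%40ss%3Aword%2F1@mysql.example:3306/feast
```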
Use Redis as the online store
Kaptain includes a fault-tolerant, distributed, highly available Redis cluster that can be used as an online store in Feast. The Redis cluster configuration is described on the Configuration page.
For security reasons, D2iQ recommends changing the default password of the Redis cluster during the installation of Kaptain.
To configure a Feast project to use Redis as an online store, set the following configuration in the feature_store.yaml file:
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: redis
  redis_type: redis_cluster
  connection_string: "<redis_host>:<redis_port>,password=<password>"
entity_key_serialization_version: 2
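The connection_string value follows the pattern host:port, followed by comma-separated options such as the password. The following is a minimal sketch of assembling it; the helper name redis_connection_string is hypothetical, and accepting a list of nodes is an assumption made for illustration (the configuration above uses a single host).

```python
def redis_connection_string(nodes, password=None):
    """Join (host, port) pairs and an optional password into one string."""
    parts = [f"{host}:{port}" for host, port in nodes]
    if password:
        parts.append(f"password={password}")
    return ",".join(parts)


# Sample values for illustration only.
print(redis_connection_string([("redis.kubeflow", 6379)], "example-password"))
# prints redis.kubeflow:6379,password=example-password
```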
Inject the Feast configuration into a notebook server
To connect Feast to your environment as a Feature Store, create Secrets for Redis and MySQL and distribute them to the respective user namespaces and Kubeflow profiles. To do so, you need admin rights or knowledge of the Secret's credentials.
For security and traceability reasons, it is best practice for the Kaptain administrator to be the only entity that generates user credentials for MySQL databases. For Redis, the administrator should be the only person who distributes the shared password to users.
Create a Secret containing the configuration properties for the MySQL and Redis clusters, and update the credentials, if necessary. Values under data must be base64-encoded, while values under stringData are plain text:
export USER_NAMESPACE=<user namespace>
cat << EOF | kubectl apply -n ${USER_NAMESPACE} -f -
apiVersion: v1
kind: Secret
metadata:
  name: feast-conf
type: Opaque
stringData:
  MYSQL_HOST: kaptain-mysql-store-haproxy.kubeflow
  MYSQL_PORT: "3306"
  REDIS_HOST: redis.kubeflow
  REDIS_PORT: "6379"
  FEAST_USAGE: "False"
data:
  MYSQL_USER: <MySQL user name>
  MYSQL_PASSWORD: <MySQL password>
  REDIS_PASSWORD: <Redis password>
EOF
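The data field of a Kubernetes Secret expects base64-encoded values, whereas stringData accepts plain text. The following is a minimal sketch of encoding the credentials before pasting them into the manifest; the helper name to_secret_data and the sample user name are hypothetical.

```python
import base64


def to_secret_data(values):
    """Base64-encode plain-text values for the data field of a Secret."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }


# Sample value for illustration only; do not commit real credentials.
encoded = to_secret_data({"MYSQL_USER": "admin"})
print(encoded["MYSQL_USER"])
# prints YWRtaW4=
```

Base64 is an encoding, not encryption, so treat the encoded values as sensitive.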
Create a Feast repository configuration file and store it in a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-store
data:
  feature_store.yaml: |
    project: wine
    registry:
      registry_type: sql
      path: mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@${MYSQL_HOST}:${MYSQL_PORT}/feast
    provider: local
    online_store:
      type: redis
      redis_type: redis_cluster
      connection_string: "${REDIS_HOST}:${REDIS_PORT},password=${REDIS_PASSWORD}"
    entity_key_serialization_version: 2
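The ${...} placeholders in this file are meant to be resolved from the environment variables injected into the notebook. The following is a minimal sketch that previews the result with Python's string.Template; it only mimics the substitution and is not Feast's own loader. The helper name preview_config and the sample values are hypothetical.

```python
from string import Template

# Shortened copy of the feature_store.yaml content from the ConfigMap.
FEATURE_STORE_TEMPLATE = """\
project: wine
registry:
  registry_type: sql
  path: mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@${MYSQL_HOST}:${MYSQL_PORT}/feast
online_store:
  type: redis
  connection_string: "${REDIS_HOST}:${REDIS_PORT},password=${REDIS_PASSWORD}"
"""


def preview_config(template, env):
    """Substitute ${NAME} placeholders; raises KeyError if a name is missing."""
    return Template(template).substitute(env)


# Sample values for illustration only; in a notebook you could pass os.environ.
sample_env = {
    "MYSQL_USER": "user", "MYSQL_PASSWORD": "secret",
    "MYSQL_HOST": "kaptain-mysql-store-haproxy.kubeflow", "MYSQL_PORT": "3306",
    "REDIS_HOST": "redis.kubeflow", "REDIS_PORT": "6379",
    "REDIS_PASSWORD": "secret",
}
print(preview_config(FEATURE_STORE_TEMPLATE, sample_env))
```

Because the rendered text contains passwords, avoid printing it with real credentials in a shared notebook.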
Create a PodDefault referencing the previously created Secret and ConfigMap:
cat << EOF | kubectl apply -n ${USER_NAMESPACE} -f -
apiVersion: "kubeflow.org/v1alpha1"
kind: PodDefault
metadata:
  name: feast-conf
spec:
  selector:
    matchLabels:
      feast-conf: "true"
  desc: "Inject Feast configuration"
  envFrom:
  - secretRef:
      name: feast-conf
  volumeMounts:
  - name: feature-store
    mountPath: <feast-project-dir, e.g. /home/kubeflow/wine>
  volumes:
  - name: feature-store
    configMap:
      name: feature-store
EOF
When you create a new notebook server, make the previous Feast configuration properties available via environment variables. To do so, select Configuration and then Inject Feast configuration in the Jupyter notebook UI.
The Feature Store is configured and ready to use in the notebook.
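Before running Feast commands, you may want to confirm that the MySQL and Redis endpoints are reachable from the notebook. The following is a minimal sketch that only tests whether a TCP connection can be opened; it does not authenticate or verify credentials. The helper name can_connect is hypothetical, and the variable names are those defined in the feast-conf Secret.

```python
import os
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


# Check the endpoints only if the injected variables are present.
for prefix in ("MYSQL", "REDIS"):
    host = os.environ.get(f"{prefix}_HOST")
    port = os.environ.get(f"{prefix}_PORT")
    if host and port:
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{prefix} endpoint {host}:{port} is {status}")
```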
More on Feast
For more information on how to use Feast, refer to the Feast documentation.
For an example of how to use Feast with Kaptain in your ML/AI environment, refer to the Feast tutorial.