Spark Operator in a Workspace
How to spin up your Spark Operator
The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing the status of Spark applications. For a complete reference of the custom resource definitions, refer to the API Definition. For details on its design, refer to the design documentation. It requires Spark 2.3 or later, which supports Kubernetes as a native scheduler backend.
The default installation is basic. Provide your own override ConfigMap to enable the Spark Operator features you need.
Install Spark Operator
You can find generic installation instructions for workspace catalog applications in the Application Deployment topic.
Only install the Spark operator once per workspace.
For details on custom configuration for the operator, refer to the Spark Operator Helm Chart documentation.
After you finish the installation, see the Spark Operator in a Project custom resource documentation for details on how to submit your Spark jobs.
Sample Override Configuration File
Ensure you configure the AppDeployment with the appropriate override ConfigMap.
Using UI
podLabels:
  owner: john
  team: operations
Using CLI
See Application Deployment for details.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ${WORKSPACE_NAMESPACE}
  name: spark-operator-overrides
data:
  values.yaml: |
    configInline:
      podLabels:
        owner: john
        team: operations
EOF
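The heredoc above expands `${WORKSPACE_NAMESPACE}` when the command runs, so an unset variable would silently produce a manifest with an empty namespace. The following is a minimal sketch of one way to guard against that, not part of the product tooling: the function name `render_overrides` and the placeholder namespace `example-workspace` are hypothetical, and the layout of `values.yaml` simply follows the example above.

```shell
#!/usr/bin/env bash
set -eu

# render_overrides NAMESPACE
# Prints the override ConfigMap manifest for the given workspace namespace.
# Exits with an error if the namespace is missing or empty, so the manifest
# is never rendered with a blank namespace.
# (Hypothetical helper for illustration only.)
render_overrides() {
  local namespace
  namespace="${1:?workspace namespace is required}"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ${namespace}
  name: spark-operator-overrides
data:
  values.yaml: |
    configInline:
      podLabels:
        owner: john
        team: operations
EOF
}

# Print the manifest for review. "example-workspace" is a placeholder that is
# used only when WORKSPACE_NAMESPACE is not set.
render_overrides "${WORKSPACE_NAMESPACE:-example-workspace}"
```

Once the printed manifest looks right, it can be piped to the cluster with `render_overrides "$WORKSPACE_NAMESPACE" | kubectl apply -f -`. Adding `--dry-run=client` to the `kubectl apply` command validates the manifest locally without creating the ConfigMap.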
Uninstall via the CLI
Uninstalling the Spark Operator does not affect existing SparkApplication and ScheduledSparkApplication custom resources. You need to manually remove any leftover custom resources and CRDs from the operator. Refer to deleting Spark Operator custom resources.
Follow these steps:
Uninstall the Spark Operator AppDeployment:
kubectl -n <your workspace namespace> delete AppDeployment <name of AppDeployment>
Remove the Spark Operator Service Account:
# <name of service account> is spark-operator-service-account if you didn't override the RBAC resources.
kubectl -n <your workspace namespace> delete serviceaccounts <name of service account>
Remove the Spark Operator CRDs:
NOTE: The CRDs are not finalized for deletion until you delete the associated custom resources.
kubectl delete crds scheduledsparkapplications.sparkoperator.k8s.io sparkapplications.sparkoperator.k8s.io
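Because the CRDs are not finalized until their custom resources are gone, it can help to list and remove leftover SparkApplication and ScheduledSparkApplication resources before deleting the CRDs. The following is a minimal sketch under stated assumptions, not the documented cleanup procedure: the function names and the `KUBECTL` variable are hypothetical helpers for this example, and the resource names are taken from the CRD names used above.

```shell
#!/usr/bin/env bash
set -eu

# KUBECTL is a hypothetical override used only by this sketch (kubectl itself
# does not read it). Setting KUBECTL=echo prints the arguments of each command
# instead of running it against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# list_spark_resources
# Lists SparkApplication and ScheduledSparkApplication resources in all
# namespaces, so you can see what is left behind.
list_spark_resources() {
  "$KUBECTL" get sparkapplications.sparkoperator.k8s.io,scheduledsparkapplications.sparkoperator.k8s.io --all-namespaces
}

# delete_spark_resources NAMESPACE
# Deletes every Spark custom resource of both kinds in a single namespace.
delete_spark_resources() {
  local namespace
  namespace="${1:?namespace is required}"
  "$KUBECTL" delete sparkapplications.sparkoperator.k8s.io --all -n "$namespace"
  "$KUBECTL" delete scheduledsparkapplications.sparkoperator.k8s.io --all -n "$namespace"
}
```

A typical sequence would be to call `list_spark_resources`, then `delete_spark_resources <namespace>` for each namespace that still holds resources, and only then delete the CRDs. Setting `KUBECTL=echo` first gives a preview of the commands without changing anything.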
Resources
Here are some resources to learn more about Spark Operator: