Spark Operator in a Workspace

How to spin up your Spark Operator

The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources to specify and run Spark applications and to surface their status. For a complete reference of the custom resource definitions, refer to the API Definition. For details on its design, refer to the design documentation. The operator requires Spark 2.3 or later, which supports Kubernetes as a native scheduler backend.

The default installation is basic. To enable the Spark Operator features you need, provide an override ConfigMap.

Install Spark Operator

You can find generic installation instructions for workspace catalog applications in the Application Deployment topic.

Only install the Spark operator once per workspace.

For details on custom configuration for the operator, refer to the Spark Operator Helm Chart documentation.

After you finish the installation, see the Spark Operator in a Project custom resource documentation for details on how to submit your Spark jobs.

Sample Override Configuration File

Ensure you configure the AppDeployment with the appropriate override configmap.

  • Using the UI

    CODE
    podLabels:
      owner: john
      team: operations
  • Using the CLI

    See Application Deployment for details.

    CODE
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: ${WORKSPACE_NAMESPACE}
      name: spark-operator-overrides
    data:
      values.yaml: |
        configInline:
          podLabels:
            owner: john
            team: operations
    EOF
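
After applying the ConfigMap, it can help to read it back and confirm that the override values were stored as intended. The following is a minimal sketch, assuming `WORKSPACE_NAMESPACE` is set as in the command above. The function name `show_spark_overrides` and the `KUBECTL` variable are hypothetical helpers introduced for illustration only.

```shell
# KUBECTL is a hypothetical override hook; it defaults to the real kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# Print the values.yaml payload stored in the spark-operator-overrides ConfigMap.
# The jsonpath escapes the dot in the key name "values.yaml".
show_spark_overrides() {
  "$KUBECTL" get configmap spark-operator-overrides \
    -n "${WORKSPACE_NAMESPACE:?set WORKSPACE_NAMESPACE first}" \
    -o jsonpath='{.data.values\.yaml}'
}
```

Calling `show_spark_overrides` should print the same YAML that was placed under `values.yaml`. An empty result or a "not found" error suggests the ConfigMap was created in a different namespace.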

Uninstall via the CLI

Uninstalling the Spark Operator does not remove existing SparkApplication and ScheduledSparkApplication custom resources. You must manually remove any custom resources and CRDs that the operator leaves behind. For details, refer to deleting Spark Operator custom resources.

Follow these steps:

  1. Uninstall the Spark Operator AppDeployment:

    CODE
    kubectl -n <your workspace namespace> delete AppDeployment <name of AppDeployment>
  2. Remove the Spark Operator Service Account:

    CODE
    # <name of service account> is spark-operator-service-account if you didn't override the RBAC resources.
    kubectl -n <your workspace namespace> delete serviceaccounts <name of service account>
  3. Remove the Spark Operator CRDs:

    NOTE: The CRDs are not finalized for deletion until you delete the associated custom resources.

    CODE
    kubectl delete crds scheduledsparkapplications.sparkoperator.k8s.io sparkapplications.sparkoperator.k8s.io
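
Because the CRD deletion in step 3 does not finalize while custom resources still exist, one way to clear them is to list and then delete every resource of both kinds before running step 3. This is a sketch under assumptions, not the documented procedure. The function names and the `KUBECTL` variable are hypothetical, and the commands act on all namespaces in the cluster, so narrow them with `-n <namespace>` if other workspaces still use these resources.

```shell
# KUBECTL is a hypothetical override hook; it defaults to the real kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# Show every remaining Spark Operator custom resource across all namespaces.
list_spark_custom_resources() {
  for kind in sparkapplications.sparkoperator.k8s.io \
              scheduledsparkapplications.sparkoperator.k8s.io; do
    "$KUBECTL" get "$kind" --all-namespaces
  done
}

# Delete every remaining Spark Operator custom resource across all namespaces.
# This is destructive: review the output of list_spark_custom_resources first.
delete_spark_custom_resources() {
  for kind in sparkapplications.sparkoperator.k8s.io \
              scheduledsparkapplications.sparkoperator.k8s.io; do
    "$KUBECTL" delete "$kind" --all --all-namespaces
  done
}
```

Run `list_spark_custom_resources` first, then `delete_spark_custom_resources`, and only then delete the CRDs as shown in step 3.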

Resources

Here are some resources to learn more about Spark Operator:
