Spark in a Project
Deploying Spark in a project
To run your Spark workloads with the Spark Operator, apply the operator's custom resources. The Spark Operator works with the following kinds of custom resources:
SparkApplication
ScheduledSparkApplication
See the Spark Operator API documentation for more details.
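For comparison with the SparkApplication walkthrough below, a ScheduledSparkApplication wraps an ordinary SparkApplication spec in a cron-style schedule. The following is a minimal sketch, assuming the PROJECT_NAMESPACE and SPARK_SERVICE_ACCOUNT environment variables set in the steps below; the name and schedule are illustrative:

kubectl apply -f - <<EOF
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: pyspark-pi-nightly        # illustrative name
  namespace: ${PROJECT_NAMESPACE}
spec:
  schedule: "@every 24h"          # standard cron expressions also work
  concurrencyPolicy: Forbid       # do not start a run while the previous one is still active
  template:
    # The template holds an ordinary SparkApplication spec,
    # like the full example later on this page.
    type: Python
    pythonVersion: "3"
    mode: cluster
    image: "gcr.io/spark-operator/spark-py:v3.1.1"
    mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
    sparkVersion: "3.1.1"
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: "512m"
      serviceAccount: ${SPARK_SERVICE_ACCOUNT}
    executor:
      cores: 1
      instances: 1
      memory: "512m"
EOF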
If you need to manage these custom resources and their RBAC resources across all clusters in a project, we recommend using Project Deployments, which lets you leverage GitOps to deploy the resources. Otherwise, you must create the resources manually in each cluster.
Prerequisites
Follow these steps:
Deploy the Spark Operator. See the Spark Operator documentation for more information.
Ensure that the RBAC resources referenced in your custom resources exist; otherwise, the custom resources fail. See the Spark Operator documentation for details.
The following commands create the RBAC resources needed in your project namespace:
export PROJECT_NAMESPACE=<project namespace>

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-service-account
  namespace: ${PROJECT_NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ${PROJECT_NAMESPACE}
  name: spark-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: ${PROJECT_NAMESPACE}
subjects:
- kind: ServiceAccount
  name: spark-service-account
  namespace: ${PROJECT_NAMESPACE}
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
EOF
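As an optional sanity check, you can confirm the RBAC resources were created and verify what the service account is allowed to do:

kubectl -n ${PROJECT_NAMESPACE} get serviceaccount spark-service-account
kubectl -n ${PROJECT_NAMESPACE} get role/spark-role rolebinding/spark-role-binding
kubectl -n ${PROJECT_NAMESPACE} auth can-i create pods \
  --as=system:serviceaccount:${PROJECT_NAMESPACE}:spark-service-account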
Deploy a Simple SparkApplication
Follow these steps:
Create your Project if you don’t already have one.
Set the PROJECT_NAMESPACE environment variable to the name of your project's namespace:

export PROJECT_NAMESPACE=<project namespace>
Set the SPARK_SERVICE_ACCOUNT environment variable to one of the following:

${PROJECT_NAMESPACE}, if you skipped the step in Prerequisites to create RBAC resources:

# This service account is automatically created when you create a project and has access to everything in the project namespace.
export SPARK_SERVICE_ACCOUNT=${PROJECT_NAMESPACE}

spark-service-account, if you created the RBAC resources in Prerequisites:

export SPARK_SERVICE_ACCOUNT=spark-service-account
Apply the SparkApplication custom resource in your project namespace:
kubectl apply -f - <<EOF
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-pi
  namespace: ${PROJECT_NAMESPACE}
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "gcr.io/spark-operator/spark-py:v3.1.1"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.1.1"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: ${SPARK_SERVICE_ACCOUNT}
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
EOF
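After you apply the resource, the operator submits the application and creates a driver pod. You can use the following commands to watch progress; the pyspark-pi-driver pod name assumes the operator's default <application name>-driver naming:

kubectl -n ${PROJECT_NAMESPACE} get sparkapp pyspark-pi
kubectl -n ${PROJECT_NAMESPACE} describe sparkapp pyspark-pi
kubectl -n ${PROJECT_NAMESPACE} logs pyspark-pi-driver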
Clean up
Follow these steps:
View SparkApplications in all namespaces:

kubectl get sparkapp -A
Delete a specific SparkApplication:

kubectl -n ${PROJECT_NAMESPACE} delete sparkapp <name of sparkapplication>
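To remove everything this page created, you can also delete all SparkApplications in the namespace along with the RBAC resources from Prerequisites; skip the last three commands if you used the project's default service account:

kubectl -n ${PROJECT_NAMESPACE} delete sparkapp --all
kubectl -n ${PROJECT_NAMESPACE} delete rolebinding spark-role-binding
kubectl -n ${PROJECT_NAMESPACE} delete role spark-role
kubectl -n ${PROJECT_NAMESPACE} delete serviceaccount spark-service-account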