Feature request: Passing Tolerations to executor pods #240

Open
simonhampe opened this issue May 5, 2023 · 3 comments

@simonhampe

Affected version

23.4.0

Current and expected behavior

We are deploying Stackable on Azure with AKS using Helm/Terraform. We have successfully run SparkApplications on the default node pool. However, we would like to be able to deploy executors in a second node pool containing only Spot instances.
In Azure, all Spot instance node pools automatically get the taint kubernetes.azure.com/scalesetpriority=spot:NoSchedule (even if we do not specify it in the Terraform file, this taint is apparently mandatory).
Now, I can specify nodeAffinity to match the spot instances' labels, but I haven't found a way to pass tolerations. The Helm chart for the Spark operator has a "tolerations" variable and I tried passing the right toleration there (as specified here), but it had no effect: the executors will not schedule, since their affinity does not match the default node pool and they have no toleration for the spot taint.
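For reference, the toleration the executor pods would need looks like this in a plain Kubernetes pod spec (the key and value are taken from the AKS taint described above):

  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"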

Is there a way to pass tolerations in a SparkApplication that I have simply overlooked? If not, I think this would be a fairly relevant feature for pod placement. Are there any plans to implement it?

@razvan
Member

razvan commented May 5, 2023

Hey, thanks for your report. You are correct that it's currently not possible to define tolerations on Spark executor (or driver) pods.

The spot use case sounds reasonable and we'll look into it.

@lfrancke
Member

lfrancke commented May 5, 2023

As Razvan said, it's currently not possible, but we are tracking this in stackabletech/issues#385. Another customer has asked for this as well, so we will try to get it into the next release.

@razvan
Member

razvan commented Aug 18, 2023

Hey,

starting with release 23.7 you can specify pod overrides for all SparkApplication pods.

Below is a simple example that adds a toleration for a "monitor" taint to the job, driver, and executor pods, so that they can be scheduled on nodes tainted with monitor=true:NoSchedule:

  # The same toleration must be set per role: the job (spark-submit) pod,
  # the driver pod, and the executor pods.
  job:
    podOverrides:
      spec:
        tolerations:
          - key: "monitor"
            value: "true"
            operator: "Equal"
            effect: "NoSchedule"
  driver:
    podOverrides:
      spec:
        tolerations:
          - key: "monitor"
            value: "true"
            operator: "Equal"
            effect: "NoSchedule"
  executor:
    podOverrides:
      spec:
        tolerations:
          - key: "monitor"
            value: "true"
            operator: "Equal"
            effect: "NoSchedule"
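Applied to the spot-instance use case from this issue, the executor override could be sketched like this. Since podOverrides takes a regular pod template, other pod spec fields such as nodeSelector should work as well (AKS applies the kubernetes.azure.com/scalesetpriority: spot label to spot pools, but verify this against your nodes):

  executor:
    podOverrides:
      spec:
        nodeSelector:
          kubernetes.azure.com/scalesetpriority: "spot"
        tolerations:
          - key: "kubernetes.azure.com/scalesetpriority"
            operator: "Equal"
            value: "spot"
            effect: "NoSchedule"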

I hope this helps.
