
Make the History Server storage configuration shareable/reusable/discoverable #415

Open

Jimvin opened this issue Jun 20, 2024 · 1 comment

Jimvin (Member) commented Jun 20, 2024

Spark jobs should be able to discover the location to store logs in the Spark History Server. Configuring this for each job is fragile and likely to lead to logs not being persisted after the job has finished. Jobs should be able to discover the location, access keys etc. for log storage.

sbernauer (Member) commented:

Just for the record: this is how you currently share the log configuration between two SparkApplications:

```yaml
kind: SparkApplication
metadata:
  name: spark-1
spec:
  logFileDirectory:
    s3:
      prefix: eventlogs/
      bucket: # S3BucketDef
        reference: eventlogs-bucket
---
kind: SparkApplication
metadata:
  name: spark-2
spec:
  logFileDirectory:
    s3:
      prefix: eventlogs/
      bucket: # S3BucketDef
        reference: eventlogs-bucket
```
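For context, `reference: eventlogs-bucket` points at a separately defined bucket resource that both jobs share. A minimal sketch of what that shared definition might look like (the `apiVersion`, field names, and all values here are assumptions for illustration, not taken from this issue):

```yaml
---
# Hypothetical sketch of the shared bucket definition that both
# SparkApplications above reference via `reference: eventlogs-bucket`.
# apiVersion, field names, and values are assumed, not confirmed by this issue.
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: eventlogs-bucket
spec:
  bucketName: spark-eventlogs          # actual bucket name in the S3 store
  connection:
    inline:
      host: minio.example.svc.cluster.local
      port: 9000
      credentials:
        secretClass: spark-eventlogs-credentials  # where access keys come from
```

Centralizing the bucket and credentials in one object is what makes the per-job configuration reduce to a single `reference`, which is the reusability this issue asks to extend to discovery as well.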

IIRC, in the ADR we decided against having a SparkApplication simply link to a SparkHistoryServer object, because that raises problems such as how the application gets credentials to write to the bucket (and probably others).
