
Make the History Server storage configuration shareable/reusable/discoverable #415

Open

Jimvin opened this issue Jun 20, 2024 · 1 comment

Jimvin (Member) commented Jun 20, 2024

Spark jobs should be able to discover the location to store logs in the Spark History Server. Configuring this for each job is fragile and likely to lead to logs not being persisted after the job has finished. Jobs should be able to discover the location, access keys etc. for log storage.

sbernauer (Member) commented:

Just for the record: this is how you currently share the log configuration between two SparkApplications:

```yaml
kind: SparkApplication
metadata:
  name: spark-1
spec:
  logFileDirectory:
    s3:
      prefix: eventlogs/
      bucket: # S3BucketDef
        reference: eventlogs-bucket
---
kind: SparkApplication
metadata:
  name: spark-2
spec:
  logFileDirectory:
    s3:
      prefix: eventlogs/
      bucket: # S3BucketDef
        reference: eventlogs-bucket
```
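For context, `reference: eventlogs-bucket` points at a separately defined bucket resource that both jobs share. A minimal sketch of what that shared definition might look like (the `apiVersion`, field names, and all values here are assumptions for illustration, not taken from this issue):

```yaml
---
# Hypothetical sketch of the shared bucket definition that both
# SparkApplications above reference via `reference: eventlogs-bucket`.
# apiVersion, field names, and values are assumed, not confirmed by this issue.
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: eventlogs-bucket
spec:
  bucketName: spark-eventlogs          # actual bucket name in the S3 store
  connection:
    inline:
      host: minio.example.svc.cluster.local
      port: 9000
      credentials:
        secretClass: spark-eventlogs-credentials  # where access keys come from
```

Centralizing the bucket and credentials in one object is what makes the per-job configuration reduce to a single `reference`, which is the reusability this issue asks to extend to discovery as well.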

IIRC, in the ADR we decided against having a SparkApplication simply link to a SparkHistoryServer object, because that raises problems such as how the application gets credentials to write to the bucket (and probably others).
