Spark UI

Launching the Spark History Server and Viewing the Spark UI Using Docker

If you prefer local access (rather than running an EC2 instance for the Apache Spark history server), you can use Docker to start the Apache Spark history server and view the Spark UI locally. The provided Dockerfile is a sample; modify it to meet your requirements.

Prerequisites

  • Install Docker
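After installing Docker, you can confirm the daemon is reachable before building the image:

    $ docker --version
    $ docker info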

Build the Docker image

  1. Download the Dockerfile and the pom file from the GitHub repository
  2. Run the command shown below
        $ docker build -t glue/sparkui:latest . 
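To confirm the image was built, an optional check (the image name matches the tag used above) is:

    $ docker images glue/sparkui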
    

Start the Spark history server

Using an AWS named profile

  1. Run the commands shown below
    • Set LOG_DIR by replacing s3a://path_to_eventlog with your event log directory
    • Set PROFILE_NAME to your AWS named profile
        $ LOG_DIR="s3a://path_to_eventlog/"
        $ PROFILE_NAME="profile_name"
        $ docker run -itd -v ~/.aws:/root/.aws -e AWS_PROFILE=$PROFILE_NAME -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR  -Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain" -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
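To confirm that the named profile resolves to valid credentials, assuming the AWS CLI is installed on the host, you can run:

    $ aws sts get-caller-identity --profile $PROFILE_NAME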
    

Using an AWS access key and secret key pair

  1. Run the commands shown below
    • Set LOG_DIR by replacing s3a://path_to_eventlog with your event log directory
    • Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to your valid AWS credentials
        $ LOG_DIR="s3a://path_to_eventlog/"
        $ AWS_ACCESS_KEY_ID="AKIAxxxxxxxxxxxx"
        $ AWS_SECRET_ACCESS_KEY="yyyyyyyyyyyyyyy"
        $ docker run -itd -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID -Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY" -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
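Note that keys embedded in SPARK_HISTORY_OPTS end up on the Java process command line inside the container. A variant, assuming you prefer to pass them as container environment variables instead, is to rely on com.amazonaws.auth.DefaultAWSCredentialsProviderChain (as in the named-profile example), which reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment:

    $ docker run -itd \
        -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
        -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
        -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain" \
        -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"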
    

Using AWS temporary credentials
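If you do not already have temporary credentials, one way to obtain them is with the AWS CLI, assuming the AWS CLI and jq are installed on the host:

    $ CREDS=$(aws sts get-session-token --duration-seconds 3600 --output json)
    $ AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.Credentials.AccessKeyId')
    $ AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.Credentials.SecretAccessKey')
    $ AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.Credentials.SessionToken')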

  1. Run the commands shown below
    • Set LOG_DIR by replacing s3a://path_to_eventlog with your event log directory
    • Set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN to your valid AWS credentials (one way to obtain them is sketched above)
        $ LOG_DIR="s3a://path_to_eventlog/"
        $ AWS_ACCESS_KEY_ID="ASIAxxxxxxxxxxxx"
        $ AWS_SECRET_ACCESS_KEY="yyyyyyyyyyyyyyy"
        $ AWS_SESSION_TOKEN="zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
        $ docker run -itd -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID -Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY -Dspark.hadoop.fs.s3a.session.token=$AWS_SESSION_TOKEN -Dspark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider" -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
    

These configuration parameters come from the Hadoop-AWS module. You may need to add configuration specific to your use case. For example, users in isolated regions need to configure spark.hadoop.fs.s3a.endpoint.

For the Beijing region, add the following configuration:

-Dspark.hadoop.fs.s3a.endpoint=s3.cn-north-1.amazonaws.com.cn

For the Ningxia region, add the following configuration:

-Dspark.hadoop.fs.s3a.endpoint=s3.cn-northwest-1.amazonaws.com.cn
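For instance, with the named-profile command from above, the endpoint option goes inside SPARK_HISTORY_OPTS, roughly as follows (Beijing region shown; substitute the endpoint for your region):

    $ docker run -itd -v ~/.aws:/root/.aws -e AWS_PROFILE=$PROFILE_NAME \
        -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain -Dspark.hadoop.fs.s3a.endpoint=s3.cn-north-1.amazonaws.com.cn" \
        -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"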

View the Spark UI using Docker

  1. Open http://localhost:18080 in your browser
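If the page does not load, you can check that the container is running and inspect its logs; the container ID comes from docker ps:

    $ docker ps --filter ancestor=glue/sparkui:latest   # confirm the history server container is up
    $ docker logs <container_id>                        # look for errors reading the event log from S3
    $ docker stop <container_id>                        # stop the history server when finished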