Upgrade to zeppelin-0.8.0; the old one is unavailable #5

Open · wants to merge 4 commits into master
Changes from all commits
41 changes: 15 additions & 26 deletions README.md
@@ -1,34 +1,23 @@
[![Gitter chat](https://badges.gitter.im/gitterHQ/gitter.png)](https://gitter.im/big-data-europe/Lobby)

A very simple project: just a docker-compose infrastructure plus a hello-world job for Spark.
# Docker Zeppelin

This repository contains an [Apache Zeppelin](https://zeppelin.apache.org/) Docker image, tuned to work with BDE clusters.

# Example Usage

For example usage see [docker-compose.yml](./docker-compose.yml) and [SANSA-Notebooks repository](https://github.com/SANSA-Stack/SANSA-Notebooks).

# Dev
Start Hadoop/Spark cluster with Zeppelin notebook:
```
make up
```
Tear down Hadoop/Spark cluster with Zeppelin notebook:
```
make down
```
Bash into Zeppelin container:
```
make bash
```
Build and run Zeppelin separately:
```
make up
docker stop dockerzeppelin_zeppelin_1 && docker rm dockerzeppelin_zeppelin_1
make run
```
Build Zeppelin:
```
make build
```
For more details see the Makefile.

Build the Spark job and submit it to the cluster:
```
docker-compose up sbt # will compile the scala project into the jar file /jars/sparkjob_2.11-0.1.jar
docker-compose run spark-master /spark/bin/spark-submit --verbose --master local /jars/sparkjob_2.11-0.1.jar
```

# Output
```
...
18/12/29 23:01:41 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3af37506{/metrics/json,null,AVAILABLE,@Spark}
############################
############################
########HELLO WORLD#########
############################
############################
18/12/29 23:01:41 INFO server.ServerConnector: Stopped Spark@759fad4{HTTP/1.1}{0.0.0.0:4040}
...
```
15 changes: 13 additions & 2 deletions docker-compose.yml
@@ -1,6 +1,12 @@
version: "2.1"

services:
sbt:
image: hseeberger/scala-sbt
volumes:
- ./scala/:/scala/
command: bash -c "cd /scala/sparkjob/ && sbt clean package"

namenode:
image: bde2020/hadoop-namenode:1.1.0-hadoop2.8-java8
container_name: namenode
Expand All @@ -12,6 +18,9 @@ services:
healthcheck:
interval: 5s
retries: 100
depends_on:
sbt:
condition: service_started
networks:
- spark-net
datanode:
@@ -33,7 +42,7 @@
    image: bde2020/spark-master:2.1.0-hadoop2.8-hive-java8
    container_name: spark-master
    ports:
      - "8080:8080"
      - "9080:8080"
      - "7077:7077"
    environment:
      - CORE_CONF_fs_defaultFS=hdfs://namenode:8020
@@ -42,6 +51,8 @@
        condition: service_healthy
      datanode:
        condition: service_healthy
    volumes:
      - ./scala/sparkjob/target/scala-2.11/:/jars/
    healthcheck:
      interval: 5s
      retries: 100
@@ -64,7 +75,7 @@
  zeppelin:
    build: ./zeppelin
    ports:
      - 80:8080
      - 81:8080
    volumes:
      - ./notebook:/opt/zeppelin/notebook
    environment:
1 change: 1 addition & 0 deletions scala/sparkjob/build.properties
@@ -0,0 +1 @@
sbt.version = 0.13.17
20 changes: 20 additions & 0 deletions scala/sparkjob/build.sbt
@@ -0,0 +1,20 @@
// resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
resolvers += Resolver.mavenLocal


name := "sparkjob"

version := "0.1"

scalaVersion := "2.11.12"
val sparkVersion = "2.4.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  // "org.apache.spark" %% "spark-mllib" % sparkVersion,
  // "org.apache.spark" %% "spark-streaming" % sparkVersion,
  // "org.apache.spark" %% "spark-hive" % sparkVersion,
  "mysql" % "mysql-connector-java" % "5.1.6"
)
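
Since the jar is launched with spark-submit against a cluster that already ships Spark (the compose file runs a 2.1.0 spark-master image, while build.sbt targets Spark 2.4.0), a common variant is to mark the Spark artifacts as "provided" so the project compiles against them without bundling them. A minimal sketch, not part of this PR:
```
// hypothetical build.sbt variant (not in this PR): compile against Spark,
// but let the cluster supply it at runtime
name := "sparkjob"
version := "0.1"
scalaVersion := "2.11.12"

val sparkVersion = "2.4.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided",
  "mysql" % "mysql-connector-java" % "5.1.6"
)
```
Note that `sbt package`, as the sbt service runs it, already builds a thin jar of only the project's classes; the "provided" scope mostly matters if the build later switches to sbt-assembly to produce a fat jar.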
1 change: 1 addition & 0 deletions scala/sparkjob/project/build.properties
@@ -0,0 +1 @@
sbt.version = 1.2.7
1 change: 1 addition & 0 deletions scala/sparkjob/src/build.properties
@@ -0,0 +1 @@
sbt.version = 0.13.17
26 changes: 26 additions & 0 deletions scala/sparkjob/src/main/scala/com/scals/arbuzov/SparkJob.scala
@@ -0,0 +1,26 @@
package com.scals.arbuzov
// import required spark classes
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SparkJob {

  def execute() {

    // initialise spark context
    val conf = new SparkConf().setAppName("HelloWorld")
    val sc = new SparkContext(conf)

    // print the hello-world banner
    println("############################")
    println("############################")
    println("########HELLO WORLD#########")
    println("############################")
    println("############################")

    // terminate spark context
    sc.stop()

  }
}
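
SparkJob only prints a banner. For reference, a minimal word count built on the same SparkConf/SparkContext pattern could look roughly like the sketch below; the object name and input path are illustrative assumptions and nothing here is wired into the PR:
```
// hypothetical companion job (not in this PR): a word count using the
// same SparkConf/SparkContext pattern as SparkJob
package com.scals.arbuzov

import org.apache.spark.{SparkConf, SparkContext}

object WordCountJob {

  def execute(inputPath: String): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    // split lines into words, count occurrences, print the ten most frequent
    sc.textFile(inputPath)
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .take(10)
      .foreach(println)

    sc.stop()
  }
}
```
Launching it would also need a corresponding main class, or a `--class` flag on spark-submit, which the PR does not include.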
17 changes: 17 additions & 0 deletions scala/sparkjob/src/main/scala/com/scals/arbuzov/SparkJobRunner.scala
@@ -0,0 +1,17 @@
package com.scals.arbuzov

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object SparkJobRunner {

  def main(args: Array[String]) {

    // Run the hello-world Spark job
    SparkJob.execute()

    // Exit with success
    System.exit(0)
  }

}
15 changes: 15 additions & 0 deletions scala/sparkjob/src/main/scala/com/scals/arbuzov/SparkSessionWrapper.scala
@@ -0,0 +1,15 @@
package com.scals.arbuzov

import org.apache.spark.sql.SparkSession

trait SparkSessionWrapper {

  lazy val spark: SparkSession = {
    SparkSession
      .builder()
      .master("local")
      .appName("sparkjob")
      .getOrCreate()
  }

}
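
SparkSessionWrapper is added here but nothing in the PR mixes it in yet; SparkJob builds its own SparkContext instead. A minimal sketch, with a hypothetical object name not part of the PR, of how a job could use the trait to get a SparkSession for DataFrame work:
```
// hypothetical usage of SparkSessionWrapper (not wired up in this PR)
package com.scals.arbuzov

object DataFrameJob extends SparkSessionWrapper {

  def execute(): Unit = {
    import spark.implicits._

    // tiny DataFrame just to show the shared session in action
    val df = Seq(("hello", 1), ("world", 2)).toDF("word", "n")
    df.show()

    spark.stop()
  }
}
```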
2 changes: 1 addition & 1 deletion zeppelin/Dockerfile
@@ -7,7 +7,7 @@ ENV ZEPPELIN_VERSION 0.7.2

RUN apt-get update && apt-get install wget
RUN set -x \
&& curl -fSL "http://www-eu.apache.org/dist/zeppelin/zeppelin-0.7.2/zeppelin-0.7.2-bin-all.tgz" -o /tmp/zeppelin.tgz \
&& curl -fSL "http://www-eu.apache.org/dist/zeppelin/zeppelin-0.8.0/zeppelin-0.8.0-bin-all.tgz" -o /tmp/zeppelin.tgz \
&& tar -xzvf /tmp/zeppelin.tgz -C /opt/ \
&& mv /opt/zeppelin-* /opt/zeppelin \
&& rm /tmp/zeppelin.tgz