Choose a Docker Host, either an on-premise server or a cloud VM.
E.g. my Docker Host is an Ubuntu 16.04 LTS VM on Microsoft Azure.
If your Docker Host is a headless server like mine, it doesn't have a browser. You will have to configure a public IP/DNS address to access any web URLs mapped to host ports from the Docker containers.
Make sure you open the ports 2181-2183, 2888-2890, 3888-3890, 8042, 8080, 8081, 8088, 8188, 8793, 9090, 9092-9094, 10002, 50070 and 50075 on the host firewall so those URLs are accessible.
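On Ubuntu the host firewall side could be handled with `ufw`; here is a dry-run sketch that only prints the rules for the ports listed above (remove the `echo` to actually apply them, and note that on Azure you must also open the same ports in the VM's Network Security Group):

```shell
# Dry-run sketch: print the ufw rules for the cluster ports listed above.
# Remove the leading `echo` to apply them for real (requires sudo and ufw).
PORTS="2181:2183 2888:2890 3888:3890 8042 8080 8081 8088 8188 8793 9090 9092:9094 10002 50070 50075"
for p in $PORTS; do
  echo "sudo ufw allow ${p}/tcp"
done
```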
- Install the following software:
  - Docker
  - The Docker volume plugin `local-persist`. This allows data to persist on the host even when the cluster containers are deleted.
- Create folders on the Host to map container volumes to. This is where the volume mountpoints in `docker-compose.yml` point:

  ```shell
  mkdir -p /data1/dockervolumes/postgres/pgdata
  mkdir -p /data1/dockervolumes/redis/data
  mkdir -p /data1/dockervolumes/airflow/dags
  mkdir -p /data1/dockervolumes/airflow/logs
  mkdir -p /data1/dockervolumes/hadoop/nn
  mkdir -p /data1/dockervolumes/spark/master/logs
  mkdir -p /data1/dockervolumes/yarn/hs
  mkdir -p /data2/dockervolumes/hadoop/dn1 /data2/dockervolumes/hadoop/dn2 /data2/dockervolumes/hadoop/dn3
  mkdir -p /data2/dockervolumes/spark/worker1/logs /data2/dockervolumes/spark/worker2/logs /data2/dockervolumes/spark/worker3/logs
  mkdir -p /data1/dockervolumes/zookeeper/zknode1/data /data1/dockervolumes/zookeeper/zknode2/data /data1/dockervolumes/zookeeper/zknode3/data
  mkdir -p /data1/dockervolumes/zookeeper/zknode1/log /data1/dockervolumes/zookeeper/zknode2/log /data1/dockervolumes/zookeeper/zknode3/log
  sudo chown -R root /data1/dockervolumes /data2/dockervolumes
  sudo chmod -R 777 /data1/dockervolumes /data2/dockervolumes
  ```

  The mounts `/data1` and `/data2` are filesystems on your Host.
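The directory creation above can also be scripted with a loop over the numbered nodes; a minimal sketch (the relative-path defaults are mine so the sketch runs without root; point `BASE1`/`BASE2` at `/data1` and `/data2` on the real host):

```shell
#!/bin/sh
# Sketch of the volume layout above. Defaults are relative paths so this
# runs without root; set BASE1=/data1 BASE2=/data2 on the real host.
BASE1="${BASE1:-./data1}"
BASE2="${BASE2:-./data2}"

# Singleton service directories.
mkdir -p "$BASE1/dockervolumes/postgres/pgdata" \
         "$BASE1/dockervolumes/redis/data" \
         "$BASE1/dockervolumes/airflow/dags" \
         "$BASE1/dockervolumes/airflow/logs" \
         "$BASE1/dockervolumes/hadoop/nn" \
         "$BASE1/dockervolumes/spark/master/logs" \
         "$BASE1/dockervolumes/yarn/hs"

# Three-node services: datanodes, Spark workers, ZooKeeper nodes.
for i in 1 2 3; do
  mkdir -p "$BASE2/dockervolumes/hadoop/dn$i" \
           "$BASE2/dockervolumes/spark/worker$i/logs" \
           "$BASE1/dockervolumes/zookeeper/zknode$i/data" \
           "$BASE1/dockervolumes/zookeeper/zknode$i/log"
done
```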
- Create an isolated virtual network for the cluster nodes:

  ```shell
  sudo docker network create bigdatacluster
  ```
`bigdata-cluster.env` has the required hostname/FQDN configuration for the cluster services to talk to each other. If you change the hostnames/network aliases in `docker-compose.yml`, make sure you update these configurations before launching the containers.
Sometimes `docker-compose.yml` itself has configuration set under the `environment:` section specific to that container; e.g. the `spark_worker` containers have the `spark_master` URI set.
During launch these configurations are loaded as environment variables, and the necessary configuration file properties are updated accordingly.
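As an illustration of that wiring (the service, image and alias names here are assumptions based on the text, not copied from the actual compose file):

```yaml
# Hypothetical docker-compose.yml fragment showing how the shared env file
# and per-container environment entries tie the services together.
services:
  spark_worker1:
    image: bigdata-sparkimg:0.2        # assumed image name
    env_file:
      - bigdata-cluster.env            # shared hostname/FQDN settings
    environment:
      - SPARK_MASTER=spark://spark_master:7077   # per-container override
    networks:
      bigdatacluster:
        aliases:
          - sparkworker1
```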
- Switch to the `/base` folder and build the Docker image:

  ```shell
  sudo docker build --label project=bigdata-cluster --tag bigdata-baseimg:0.2 .
  ```

  Note: You can customise this image if you would like to install some additional tools/packages. The next step uses this image as a starting point.
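For example, a customised layer on top of the base image might look like this (the package choices are purely illustrative):

```dockerfile
# Hypothetical customisation layered on top of the base image.
FROM bigdata-baseimg:0.2

# Add whatever extra tooling you need; these packages are just examples.
RUN apt-get update && \
    apt-get install -y --no-install-recommends vim jq tmux && \
    rm -rf /var/lib/apt/lists/*
```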
- Switch back to the top-level folder and build the service images:

  ```shell
  sudo docker-compose --file build-imgs.yml build
  ```
- To launch the cluster, run the below command from the top-level folder:

  ```shell
  sudo docker-compose --project-name=devcluster up -d
  ```
- To shut down the cluster, the command is:

  ```shell
  sudo docker-compose --project-name=devcluster down
  ```
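The two compose commands above could be wrapped in a small helper; a sketch (`DRY_RUN=1` is my addition so the helper can be exercised without Docker installed):

```shell
#!/bin/sh
# Hypothetical helper wrapping the compose up/down commands shown above.
# With DRY_RUN=1 it prints the command instead of executing it.
PROJECT=devcluster

cluster() {
  cmd="sudo docker-compose --project-name=$PROJECT"
  case "$1" in
    up)   cmd="$cmd up -d" ;;
    down) cmd="$cmd down" ;;
    *)    echo "usage: cluster {up|down}" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$cmd"; else $cmd; fi
}

# Dry-run examples:
DRY_RUN=1 cluster up    # prints: sudo docker-compose --project-name=devcluster up -d
DRY_RUN=1 cluster down  # prints: sudo docker-compose --project-name=devcluster down
```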
- My Host has 8 cores and 64 GB RAM. Memory constraints are defined in the `.env` file for each container role; the docker-compose file uses these while configuring the containers. You can modify the `.env` file to match your host's specifications.

  Note: Currently no resource constraints are imposed on CPU usage.
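As an illustration of the pattern (the key names here are assumptions; check the actual `.env` file for the real ones):

```
# .env -- hypothetical per-role memory limits
SPARK_WORKER_MEM_LIMIT=4g
DATANODE_MEM_LIMIT=4g
```

In the compose file such a value would typically be consumed as `mem_limit: ${SPARK_WORKER_MEM_LIMIT}` on the corresponding service.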
- If you need to reset the files in the mounted named volumes during a cluster redeployment, you can run the script `cleanup-volumes.sh`.
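A minimal sketch of what such a reset amounts to (this is my assumed behaviour, not the actual contents of `cleanup-volumes.sh`; the relative `BASE` default and the demo files exist only so the sketch is safe to try anywhere):

```shell
#!/bin/sh
# Hypothetical volume reset: empty each mounted directory without removing
# the mountpoint itself. The real cleanup-volumes.sh may differ.
BASE="${BASE:-./dockervolumes-demo}"

# Demo setup only: fabricate a couple of volume dirs with stale files.
mkdir -p "$BASE/hadoop/nn" "$BASE/spark/master/logs"
touch "$BASE/hadoop/nn/old-fsimage" "$BASE/spark/master/logs/old.log"

# The reset: delete the contents, keep the directories.
for d in "$BASE/hadoop/nn" "$BASE/spark/master/logs"; do
  find "$d" -mindepth 1 -delete
done
```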
Hadoop, YARN, Hive and Spark all offer out-of-the-box web interfaces to ease administration and monitoring of the cluster, so I have added an Nginx reverse-proxy container to centralise access to those web URLs on your Docker Host.
Let's say your Docker Host has a DNS name of `myhostmachine.com`; you can then access the above URLs as follows:
- Hadoop Namenode: http://myhostmachine.com:50070/dfshealth.html#tab-overview
- Hadoop Datanodes:
  - datanode1: http://myhostmachine.com:50075/datanode1/
  - datanode2: http://myhostmachine.com:50075/datanode2/
  - datanode3: http://myhostmachine.com:50075/datanode3/
- YARN Resourcemanager: http://myhostmachine.com:8088/cluster/
- YARN Historyserver: http://myhostmachine.com:8188/applicationhistory
- YARN Nodes:
  - nodemanager1: http://myhostmachine.com:8042/nodemanager1/node/
  - nodemanager2: http://myhostmachine.com:8042/nodemanager2/node/
  - nodemanager3: http://myhostmachine.com:8042/nodemanager3/node/
- Spark Master: http://myhostmachine.com:8080/
- Spark Workers:
  - sparkworker1: http://myhostmachine.com:8081/sparkworker1/
  - sparkworker2: http://myhostmachine.com:8081/sparkworker2/
  - sparkworker3: http://myhostmachine.com:8081/sparkworker3/
- Hive server: http://myhostmachine.com:10002/
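The path-based routing above (several containers sharing one host port) is typically done with Nginx location blocks along these lines; this is a hypothetical fragment, and the actual config in the reverse-proxy container may differ:

```nginx
# Hypothetical fragment: route /datanode1/ on host port 50075 to the
# datanode1 container's web UI inside the bigdatacluster network.
server {
    listen 50075;
    location /datanode1/ {
        proxy_pass http://datanode1:50075/;
        proxy_set_header Host $host;
    }
    # ...similar location blocks for datanode2 and datanode3
}
```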