This repository provides the outline of a standard Hadoop cloud computing cluster, as well as configuration scripts for bootstrapping the environment.
Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
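As a rough illustration of the MapReduce programming model mentioned above, the sketch below counts words in a single Python process. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would distribute the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count` on a list of strings returns a dictionary of word frequencies. The point of the split is that each `map_phase` call and each `reduce_phase` call is independent of the others, which is what lets a framework run them in parallel.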
HBase is an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java. It runs on top of HDFS, providing Bigtable-like capabilities for Hadoop.
ZooKeeper is a centralized service for distributed systems that offers a hierarchical key-value store, which is used to provide a distributed configuration service, synchronization service, and naming registry for large distributed systems.
(Source: Wikipedia & Apache)