This project demonstrates a real-time financial data processing pipeline using Apache Kafka, Apache Spark, MySQL, and Grafana, all orchestrated with Docker. The pipeline fetches stock data from the Financial Modeling Prep API, processes it using Spark, stores the processed data in MySQL, and visualises it using Grafana.
- Set up a real-time data ingestion system using Apache Kafka
- Process streaming data in real-time using Apache Spark
- Store processed data in a MySQL database
- Visualise the processed data using Grafana
- Orchestrate the entire pipeline using Docker
The project consists of the following components:
- Kafka Producer: A Python script that fetches real-time stock data from the Financial Modeling Prep API and publishes it to a Kafka topic (see the sketch after this list).
- Kafka: A distributed streaming platform that ingests real-time data from the Kafka Producer and makes it available for processing.
- Spark: A distributed computing system that consumes data from Kafka, processes it in real-time, and stores the processed data in a MySQL database.
- MySQL: A relational database management system used to store the processed stock data.
- Grafana: An open-source platform for data visualization and monitoring, used to create dashboards and visualise the processed stock data.
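To make the producer's role concrete, here is a minimal sketch of such a script. It is an illustration only, not the repository's actual `kafka/kafka_producer.py`: the environment variable names, topic name, symbols, and use of the `kafka-python` library are assumptions.

```python
# Minimal sketch of a stock-quote producer (assumed names; not the project's actual script)
import json
import os
import time

import requests
from kafka import KafkaProducer

API_KEY = os.environ["FMP_API_KEY"]  # assumed variable name
BOOTSTRAP = os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
TOPIC = "stock_data"  # assumed topic name

producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # The FMP quote endpoint returns a JSON list of quote objects
    resp = requests.get(
        "https://financialmodelingprep.com/api/v3/quote/AAPL,MSFT",
        params={"apikey": API_KEY},
        timeout=10,
    )
    for quote in resp.json():
        producer.send(TOPIC, quote)
    producer.flush()
    time.sleep(60)  # poll once a minute
```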
The project uses `requirements.txt` files to manage the Python dependencies for the Kafka producer and Spark processing scripts. The dependencies are installed inside the respective Docker containers during the build process.
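The authoritative pins live in the repository's `requirements.txt` files; as a rough indication only, scripts of this kind typically depend on packages along these lines (the list below is an assumption, not the repo's actual contents):

```text
# Illustrative only - see the repository's requirements.txt files for the real pins
kafka-python
pyspark
requests
python-dotenv
```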
- Python (version 3.12)
- Docker: Install Docker and Docker Compose on your machine.
- Financial Modeling Prep API Key: Sign up for a free API key at Financial Modeling Prep.
- Clone the project repository:

  ```bash
  git clone https://github.com/hawa1222/real-time-data-processing.git
  ```

- Navigate to the project directory:

  ```bash
  cd real-time-data-processing
  ```
- Set up your environment:

  Make the setup script executable (if it isn't already), then run it from the root directory of the project to create a virtual environment and install all necessary packages:

  ```bash
  chmod +x setup_environment.sh
  ./setup_environment.sh
  ```
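If you want a sense of what the script does before running it, a setup script of this kind typically looks like the sketch below. This is an assumption about its shape, not the repository's actual `setup_environment.sh` (the `requirements.txt` paths are also assumptions):

```bash
#!/usr/bin/env bash
# Illustrative sketch: create a virtual environment and install dependencies
set -euo pipefail

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r kafka/requirements.txt -r spark/requirements.txt  # paths are assumptions
```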
- Create a `.env` file in the project root directory and provide the environment variables as specified in `.env_template`.
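  `.env_template` defines the actual variable names to use; purely as an illustration, a `.env` for this kind of stack usually holds values like the following (the names below are assumptions, so follow `.env_template` rather than this sketch):

  ```env
  # Illustrative .env - variable names are assumptions; follow .env_template
  FMP_API_KEY=your_api_key_here
  MYSQL_HOST=mysql
  MYSQL_PORT=3306
  MYSQL_DATABASE=stocks
  MYSQL_USER=stock_user
  MYSQL_PASSWORD=change_me
  GRAFANA_ADMIN_USER=admin
  GRAFANA_ADMIN_PASSWORD=change_me
  ```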
- Build and run the Docker containers:

  ```bash
  docker-compose up --build
  ```

  This command builds the Docker images and starts the containers for each service (Kafka, Spark, MySQL, and Grafana).
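The service layout is defined by the repository's `docker-compose.yml`; the abbreviated sketch below only shows the general shape of such a file (service names, images, and ports are assumptions, not the project's actual configuration):

```yaml
# Abbreviated sketch of a compose file for this kind of stack (not the actual docker-compose.yml)
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on: [zookeeper]
    ports: ["9092:9092"]

  mysql:
    image: mysql:8.0
    env_file: .env
    ports: ["3306:3306"]

  spark:
    build: ./spark          # runs the processing script
    depends_on: [kafka, mysql]

  grafana:
    image: grafana/grafana:latest
    env_file: .env
    ports: ["3000:3000"]
```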
If you wish to run the Spark and Kafka Python scripts individually without using Docker, activate the virtual environment created by `setup_environment.sh`, start ZooKeeper and Kafka locally, and run the scripts from the command line:
- For Kafka:

  ```bash
  python kafka/kafka_producer.py
  ```
- For Spark:

  ```bash
  python spark/process_data.py
  ```
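To give a sense of what the Spark job does, the sketch below reads the Kafka topic with Structured Streaming, parses the JSON payload, and appends each micro-batch to MySQL over JDBC. It is a minimal sketch, not the project's actual `spark/process_data.py`: the topic, schema, table name, and connection settings are assumptions.

```python
# Minimal Structured Streaming sketch (assumed names; not the project's actual script)
import os

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("stock-data-processor").getOrCreate()

# Assumed quote schema; the real payload has more fields
schema = StructType([
    StructField("symbol", StringType()),
    StructField("price", DoubleType()),
    StructField("timestamp", StringType()),
])

quotes = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "kafka:9092"))
    .option("subscribe", "stock_data")  # assumed topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("q"))
    .select("q.*")
)

def write_to_mysql(batch_df, batch_id):
    # Append each micro-batch to MySQL over JDBC (table and credentials are assumptions)
    (batch_df.write.format("jdbc")
        .option("url", f"jdbc:mysql://{os.environ.get('MYSQL_HOST', 'mysql')}:3306/stocks")
        .option("dbtable", "stock_data")
        .option("user", os.environ.get("MYSQL_USER", "root"))
        .option("password", os.environ.get("MYSQL_PASSWORD", ""))
        .mode("append")
        .save())

quotes.writeStream.foreachBatch(write_to_mysql).start().awaitTermination()
```

Note that running a job like this outside Docker also requires the Spark Kafka connector package and a MySQL JDBC driver on the Spark classpath.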
- Access the Grafana dashboard:

  Open your web browser and visit http://localhost:3000. Log in using the admin credentials you provided in the `.env` file.

  - The MySQL data source should be configured automatically based on the `datasource.yml` file.
  - The default dashboard for visualising stock data should be imported automatically based on the `stock_data_dashboard.json` file.
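For reference, a Grafana provisioning file for a MySQL data source generally looks like the sketch below (field values here are assumptions, not the project's actual `datasource.yml`):

```yaml
# Illustrative Grafana datasource provisioning (values are assumptions)
apiVersion: 1

datasources:
  - name: MySQL
    type: mysql
    url: mysql:3306
    database: stocks
    user: stock_user
    secureJsonData:
      password: ${MYSQL_PASSWORD}
```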
- Update the `.env` file in the project root directory to change the MySQL connection details if required.
- Customise the `stock_data_dashboard.json` file in the `grafana/` directory to modify the default Grafana dashboard.
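As an illustration, a dashboard panel typically runs a time-series SQL query like the one below against the table the Spark job writes to (the table and column names here are assumptions; adjust them to the actual schema):

```sql
-- Hypothetical panel query; adjust table/column names to the actual schema
SELECT
  timestamp AS "time",
  symbol AS metric,
  price AS value
FROM stock_data
WHERE $__timeFilter(timestamp)
ORDER BY timestamp;
```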
This project is licensed under the MIT License - see the LICENSE.txt file for details.