
🤖 Automates downloading and processing data from the WHO Global Tuberculosis Programme website, using Selenium for web scraping and Python's pandas library for data processing.


SyedFahad7/WebScraping


WHO Global Tuberculosis Programme Data Scraper and Processor

This repository contains a Python script that downloads data from the WHO Global Tuberculosis Programme website and applies basic data processing steps to the downloaded file.

Overview

The script uses Selenium, a web automation tool, to navigate to the WHO Global Tuberculosis Programme data page and download the data in CSV format. It then loads the downloaded CSV file with pandas, Python's data manipulation library, and applies the following processing steps:

  1. Handling Missing Values: Drops rows with any missing values.
  2. Data Transformation: Converts string columns to lowercase.
  3. Data Aggregation: Groups data by country and calculates the mean of numeric columns.
  4. Data Filtering: Filters rows based on a condition.

The processed data is then saved as separate CSV files in the specified output directory.
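The four steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the code in web_scraping.py: the column names (`country`, `value`), the filter condition, and the output file names are assumptions chosen for the example, since the README does not state them.

```python
from pathlib import Path

import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def drop_missing(df):
    """Step 1: drop every row that has at least one missing value."""
    return df.dropna()


def lowercase_strings(df):
    """Step 2: lowercase the text columns, leaving other columns untouched."""
    out = df.copy()
    for col in out.columns:
        if is_object_dtype(out[col]) or is_string_dtype(out[col]):
            out[col] = out[col].str.lower()
    return out


def mean_by_country(df, key="country"):
    """Step 3: group by the (assumed) country column, average numeric columns."""
    return df.groupby(key, as_index=False).mean(numeric_only=True)


def filter_rows(df, column, threshold):
    """Step 4: keep rows where `column` exceeds `threshold` (hypothetical condition)."""
    return df[df[column] > threshold]


def save_outputs(frames, output_dir):
    """Write each named DataFrame to its own CSV file in output_dir."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, frame in frames.items():
        frame.to_csv(out_dir / f"{name}.csv", index=False)
```

A typical call chain would read the downloaded file with `pd.read_csv`, pass it through `drop_missing` and `lowercase_strings`, and then hand the aggregated and filtered frames to `save_outputs` as a dict such as `{"aggregated": ..., "filtered": ...}`.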

Prerequisites

To run the script, you need the following:

  • Python 3.x installed on your system.
  • The necessary Python packages installed: selenium, pandas.
  • WebDriver installed and its path configured in the script.

Usage

  1. Clone the repository to your local machine.
  2. Install the required Python packages using pip:
    pip install -r requirements.txt
    
  3. Configure the path to the WebDriver in the script.
  4. Run the script web_scraping.py.
  5. The processed data will be saved as CSV files in the specified output directory.
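As a rough illustration of step 3, the sketch below shows one way to point Selenium at a WebDriver binary and send downloads to a chosen folder. It assumes Chrome with ChromeDriver and the Selenium 4 API; the README does not say which browser the script uses, and the function names `chrome_download_prefs` and `build_driver` are hypothetical.

```python
from pathlib import Path


def chrome_download_prefs(download_dir):
    """Chrome preferences that save downloads to download_dir without prompting."""
    return {
        "download.default_directory": str(Path(download_dir).resolve()),
        "download.prompt_for_download": False,
    }


def build_driver(driver_path, download_dir):
    """Start Chrome through the WebDriver binary located at driver_path."""
    # Imported here so the prefs helper above stays usable without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service, options=options)
```

With this in place, the script would call `build_driver` with the local ChromeDriver path and the output folder, use the returned driver to open the data page, and call `driver.quit()` once the CSV has finished downloading.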

Acknowledgements

Feel free to contribute to this project by opening issues or pull requests!

🚀 About Me

I'm a full-stack web and app developer and an undergraduate data science student 👨‍💻🙌

Authors
