Southampton-RSG/github-analysis

GitHub Analysis (gha)

A Python tool for scraping a set of repositories from GitHub to a MongoDB database.

Using the Data

You do not need to follow this readme and run gha yourself to use the data it has already collected. Instead, please see the project wiki page on using the data. Running gha yourself is still useful if you want to collect a small local dataset for testing.

Installing and Running

Prerequisites

  • Docker
  • Python 3.7 or greater

Install

  1. Clone this repository and cd into the cloned directory
  2. Create and activate a virtual environment
  3. Install this package (gha) into the virtual environment
git clone https://github.com/Southampton-RSG/github-analysis.git
cd github-analysis
python3 -m venv venv
source venv/bin/activate
pip install .

Configuration

  1. Create a GitHub personal access token at https://github.com/settings/tokens
    • No permissions are required
  2. Populate a .env file from .env.template
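
Before running the scraper, it can help to confirm that every entry in .env.template has a value in .env. The following is a minimal sketch, not part of gha: the helper names `read_env_keys` and `unfilled_keys` are hypothetical, and it assumes both files use plain `KEY=VALUE` lines with optional `#` comments.

```python
from pathlib import Path


def read_env_keys(path):
    """Parse KEY=VALUE lines from a dotenv-style file into a dict."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def unfilled_keys(template_path=".env.template", env_path=".env"):
    """Return template keys that are missing or empty in the .env file."""
    template = read_env_keys(template_path)
    env = read_env_keys(env_path)
    return sorted(key for key in template if not env.get(key))
```

Calling `unfilled_keys()` from the cloned directory returns a sorted list of keys that still need a value; an empty list means every key named in the template has been filled in.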

Running

  1. Start MongoDB database containers
    • docker-compose can be installed with pip if necessary
  2. Start gha scraper using a repo list file
    • The virtual environment created above must still be active
docker-compose up -d
gha fetch -f tests/data/UKRI_10.txt

The database web console can be accessed at http://localhost:8081/db/github/.
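
The containers may take a few seconds to start accepting connections after `docker-compose up -d` returns. The sketch below is one way to wait for the web console port before opening it. The function name `wait_for_port` is hypothetical and not part of gha, the default port 8081 is taken from the console URL above, and the timeout values are arbitrary.

```python
import socket
import time


def wait_for_port(host="localhost", port=8081, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A `True` result only shows that something is listening on the port; it does not confirm that the console or the database behind it is fully initialised.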
