
g4rr3tt/siads697-capstone


MADS Capstone - Power Grid Protagonists

Website

Please visit https://www.pg-protagonists.com for additional background and a detailed analysis of the results.

Abstract

This repository represents a capstone project for the University of Michigan, Master of Applied Data Science program.

The U.S. power grid network is both interesting and incredibly complex. It is governed by 70+ balancing authorities and comprises 13,000+ power plants, 77,000+ substations, 160,000+ miles of high-voltage power lines, and millions of low-voltage power lines and distribution transformers connecting customers around the United States. The notebooks in this repository seek to achieve the following:

  • Visualize the U.S. network of power plants, substations, and balancing authorities.
  • Identify potential vulnerabilities in the U.S. power grid.
  • Provide an inherent risk assessment based on network measures, outages, weather events, etc.

Data Sources

The 01_data_collection.ipynb notebook downloads and organizes all data sets from their source locations. The combined size of all data sets is approximately 400 MB.

The primary data sets are below:

Entity Relationship Diagram

The data sources used to create the network required significant cleaning. The entity relationship diagram, post-cleaning, is included below as a reference.

Requirements

If the notebooks are run locally, the following command will install the packages according to the configuration file requirements.txt.

# install requirements
$ pip install -r requirements.txt

Notebooks

The Jupyter notebooks are designed to be run in a specific order, both to clean and enrich the original data sets and to allow additional exploration at different stages. The notebooks can be run locally, or directly in Google Colab using the links below.

Each notebook will need access to data from the prior notebook, so if you are running in Google Colab, you will want to adjust the data storage location in the "Mount Drive" section.
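As a rough illustration of that adjustment, the sketch below picks a data directory depending on whether the code is running in Colab. The function name `resolve_data_root`, the `siads697-capstone/data` Drive folder, and the local `data` default are assumptions made for this example, not names taken from the notebooks; only the `data/raw/` layout comes from this README.

```python
from pathlib import Path


def resolve_data_root(local_root="data", drive_subdir="siads697-capstone/data"):
    """Return the data directory, mounting Google Drive when run in Colab.

    Both default arguments are hypothetical; change them to match where
    the previous notebook actually stored its output.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        # Local run: keep data next to the notebooks.
        root = Path(local_root)
    else:
        # Colab run: mount Drive so data persists between notebooks.
        drive.mount("/content/drive")
        root = Path("/content/drive/MyDrive") / drive_subdir

    # Make sure the raw-data folder exists before anything is downloaded.
    (root / "raw").mkdir(parents=True, exist_ok=True)
    return root
```

Using the same helper at the top of every notebook would keep each stage reading from the location the previous stage wrote to.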

  1. 01_data_collection.ipynb Open In Colab
    This notebook downloads all of the raw data sets and will create the necessary folder structure (data/raw/) in your working directory.

  2. 02_data_cleaning.ipynb Open In Colab
    This notebook performs various cleaning activities on the raw data sets, including cross-referencing to ensure there are proper primary and foreign keys amongst them.

  3. 03_network_analysis.ipynb Open In Colab
    This notebook imports the cleaned data and creates two networks: power plants with substations, and power plants with balancing authorities. It also calculates degree centrality, betweenness centrality, and clustering coefficients, and combines those metrics with the cleaned data. Because some of the metrics, such as betweenness centrality, take a long time to calculate, pickle files are provided in the models directory and used by default. The code to recalculate them is available in the notebook and can be uncommented.

  4. 04_electric_disturbance_events.ipynb Open In Colab
    This notebook imports the cleaned data and uses that to calculate the probability of a disturbance and/or outage at the power plant, substation, and balancing authority levels. The results are combined with the cleaned data for downstream analysis.

  5. 05_energy_forecasting.ipynb Open In Colab
    This notebook imports time series data to explore seasonality of balancing authorities and forecast energy generation and demand.

  6. 06_risk_analysis.ipynb Open In Colab
    This notebook leverages the metrics and analysis from prior notebooks to explore risk associated with substations and balancing authorities.
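
The caching behaviour described for 03_network_analysis.ipynb can be sketched as a load-or-compute pattern. This is a minimal, standard-library-only illustration and not the notebook's own code: `load_or_compute`, `degree_centrality`, the toy graph, and its node IDs are all hypothetical, and degree centrality stands in here for the much slower betweenness calculation.

```python
import pickle
from pathlib import Path


def degree_centrality(adjacency):
    """Degree centrality: each node's neighbour count divided by (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def load_or_compute(cache_path, compute):
    """Load a pickled result if it exists; otherwise compute and cache it.

    Only unpickle files from a source you trust, since loading a pickle
    can execute arbitrary code.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)

    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


# Toy undirected graph: one plant linked to two substations (made-up IDs).
toy_graph = {
    "plant_A": {"sub_1", "sub_2"},
    "sub_1": {"plant_A"},
    "sub_2": {"plant_A"},
}
```

Calling `load_or_compute(path, lambda: degree_centrality(toy_graph))` runs the calculation once and reads the result from disk on later calls. Deleting the pickle file forces a recalculation, which plays the same role as uncommenting the re-run code in the notebook.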

Sample Output

Some of the visuals and information from these notebooks can be found below.

Network

Forecasting

Risk

License

This project is distributed under the MIT License.

Team Members