Created by Maria Dong, Ryan Jones, Rebecca Leeds, & Amber Pizzo.
A site that analyzes and predicts climate change and its effects.
Access the deployed page here.
Figure 1: The Climateers homepage.
As our world progresses technologically, the health of our planet degrades day by day. Our group's aim was to analyze the trends seen today and model how our world could be affected by climate change in the future via machine learning.
Data sources:
- Our World in Data for:
- Datahub.io for:
- National Oceanic and Atmospheric Administration (NOAA) for:
- National Snow & Ice Data Center (NSIDC) for northern sea ice extent geoTIFFs
- Pew Research for:
- International Energy Agency (IEA) for hydropower investment data
- Environmental Protection Agency (EPA) for sea temperature data
Libraries & packages:
- amCharts for all visualizations
- scikit-learn for:
- Twint for pulling tweet information from Twitter
- fastai.text for text analysis & deep learning, pytorch for NLP models
- QGIS for vectorizing geoTIFFs & converting to geoJSON, mapshaper for simplifying & formatting JSON polygons
- pyreadstat for reading .sav files in Python
Tools & languages: JavaScript, HTML, CSS, Python, Pandas, Jupyter Notebook
Design & visuals:
We explored several different phenomena and questions for our models to answer. You can view the models and their brief explanations on our deployed page, while this README has more in-depth, technical information and analyses.
Figure 2: One example of our several interactive visualizations.
Is there a link between CO2 emissions and global temperature? Are they rising?
Maximum temperature data was collected from NOAA's Global Summary of the Year, which draws on multiple climate stations around the world. The station values were then averaged together for each year from 1880 to 2010. A linear regression machine learning model was used with the inputs listed below. Originally, the snow data was also used as an input for all the other outputs, but because it varied so greatly from year to year, it was not a reliable input and caused the testing scores to decrease significantly.
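The per-year averaging described above can be sketched with pandas. This is a minimal illustration, not the project's actual preprocessing: the station records and the column names (station, year, tmax) are made up for the example.

```python
import pandas as pd

# Hypothetical station records: one row per station per year.
# Column names (station, year, tmax) are illustrative assumptions.
records = pd.DataFrame({
    "station": ["A", "B", "C", "A", "B", "C"],
    "year": [1880, 1880, 1880, 1881, 1881, 1881],
    "tmax": [14.0, 16.0, 15.0, 14.5, 16.5, 15.5],
})

# Keep the study window, then average all stations within each year.
in_window = records[records["year"].between(1880, 2010)]
yearly_tmax = in_window.groupby("year")["tmax"].mean()

print(yearly_tmax)
```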
For the CO2 model, the selected features are related to different fossil fuel emission concentrations in the atmosphere and variations in global temperature. A multivariate regression was used because the relationship between these data points was shown to be linear.
Model Parameters:
- Maximum temperature (TMAX):
- Inputs: historical gas fuel CO2 level, liquid fuel CO2 level, solid fuel CO2 level, cement CO2 level, gas flaring CO2 level, population, sea level, extreme precipitation, total precipitation, minimum temperature
- Testing score: 0.91
- Atmospheric CO2:
- Inputs: historical gas fuel CO2 level, liquid fuel CO2 level, solid fuel CO2 level, cement CO2 level, gas flaring CO2 level
- Testing score: 0.96
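As a rough sketch of how a multivariate linear regression and its testing score can be produced with scikit-learn: the example below uses synthetic data in place of the five emission inputs, and it assumes the "testing score" is the R^2 value returned by score() on a held-out split, which this README does not state explicitly. The split size and random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the five emission inputs (gas, liquid, solid,
# cement, gas flaring); the real project data is not reproduced here.
n_years = 130
X = rng.uniform(0, 100, size=(n_years, 5))
true_weights = np.array([0.5, 0.3, 0.8, 0.1, 0.05])
# Target is a linear combination of the inputs plus small noise.
y = 280 + X @ true_weights + rng.normal(0, 2.0, size=n_years)

# Hold out a quarter of the rows for testing (an assumed split size).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# score() returns R^2 on whatever data is passed in.
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"train R^2: {train_score:.2f}, test R^2: {test_score:.2f}")
```

Comparing the training and testing scores is a quick way to spot an input that hurts generalization, as the snow data reportedly did.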
How have the world's oceans been affected?
We gathered northern sea ice extent polygons for September of every fifth year from 1980 to 2020, and plotted them consecutively. Numerically, from September 1980 to September 2020, the area of sea ice extent decreased from 7.67 to 3.92 million sq km, with devastating consequences for surrounding ecosystems and wildlife. This is consistent with rising global temperatures, both atmospheric and oceanic. This also directly contributes to rising sea levels, which in turn leads to flooding and higher likelihood of hurricane formation, which we discuss further below.
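To put the two extent figures in proportion, the arithmetic can be worked through directly. The per-decade value is a simple average over the 40-year span, not a fitted trend.

```python
# Sea ice extent figures quoted above, in million sq km.
extent_1980 = 7.67
extent_2020 = 3.92

loss = extent_1980 - extent_2020
percent_loss = loss / extent_1980 * 100
# Simple average over the four decades; not a fitted trend.
loss_per_decade = loss / 4

print(f"loss: {loss:.2f} million sq km ({percent_loss:.1f}%)")
print(f"average loss per decade: {loss_per_decade:.2f} million sq km")
```

In other words, the September extent shrank by roughly 3.75 million sq km, or just under half of its 1980 value.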
A linear regression machine learning model was used to predict the future trend of sea levels, as historical sea levels displayed a linear increase over time. When plotting this model with predictions to the year 2200, a steady increase over time can be seen. As a cross-check, future sea levels were also predicted by fitting a simple trend line to the historical sea level data. The trend line's result for the year 2200 was within 1.5 in of the machine learning model's result for the same year, which helps validate the machine learning model.
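The trend-line cross-check can be sketched as follows. The sea level series here is synthetic (the slope and noise are invented), and within_tolerance is a hypothetical helper for comparing two predictions against the 1.5 in margin mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for historical sea level change (inches) by year.
# The slope and noise level are made up for illustration.
years = np.arange(1880, 2014)
sea_level = 0.07 * (years - 1880) + rng.normal(0, 0.2, size=years.size)

# A degree-1 polynomial fit is a least-squares straight trend line.
slope, intercept = np.polyfit(years, sea_level, deg=1)

def trend_prediction(year):
    """Extrapolate the fitted trend line to a given year."""
    return slope * year + intercept

def within_tolerance(a, b, tol_inches=1.5):
    """Hypothetical helper: do two predictions agree within a margin?"""
    return abs(a - b) <= tol_inches

trend_2200 = trend_prediction(2200)
print(f"trend: {slope:.3f} in/year, predicted change by 2200: {trend_2200:.1f} in")
```

Extrapolating almost two centuries past the data assumes the linear trend holds, so agreement between two linear methods is a consistency check rather than independent confirmation.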
Noting that the historical sea temperature trend is also fairly linear, a linear regression machine learning model was employed to predict the future trend for sea temperature as well. When visualizing this model with predictions to the year 2200, the sea temperature is expected to continue to increase at its current rate.
Model Parameters:
- Sea levels:
- Inputs: historical cement emissions, global temperature, population, sea temperature change, CO2 emissions
- Testing score: 0.98
- Sea temperatures:
- Inputs: historical global temperature, population, glacier mass, sea level changes
- Testing score: 0.97
How have weather and natural disasters responded to climate change? Should we expect more extreme weather systems?
Like the maximum temperature data (see above), precipitation and snow data were collected from NOAA's Global Summary of the Year, which draws on multiple climate stations around the world. Those values were then averaged together for each year from 1880 to 2010. A linear regression machine learning model was used with the inputs listed below. Again, snow data was originally also used as an input for all the other outputs, but because it varied so greatly from year to year, it was not a reliable input. Because of this variability, we had to use the last 100 years of snow data to create the snow model, while for all other outputs, only the last 40 years were used as the basis for the predictions.
Model Parameters:
- Snow (SNOW):
- Inputs: historical gas fuel CO2 level, liquid fuel CO2 level, solid fuel CO2 level, cement CO2 level, gas flaring CO2 level, population, sea level, extreme precipitation, precipitation, maximum temperature, minimum temperature
- Testing score: 0.91
- Precipitation (PRCP):
- Inputs: historical gas fuel CO2 level, liquid fuel CO2 level, solid fuel CO2 level, cement CO2 level, gas flaring CO2 level, population, sea level, extreme precipitation, precipitation, maximum temperature, minimum temperature
- Testing score: 0.87
Hurricane data collected from the National Hurricane Center (NHC) ranged back to the 1850s, but did not include a storm intensity category, likely because the standard categorization scale (the Saffir-Simpson Hurricane Wind Scale) was not introduced until 1971. Therefore, unsupervised learning was used in the form of a K-means clustering model to group hurricanes by their minimum pressure and maximum wind speed. This model was optimized to have four categories of hurricane intensity.
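A minimal sketch of this kind of clustering with scikit-learn is shown below. The storm values are synthetic, and both the feature scaling and the ranking of clusters by mean wind speed are assumptions added for the example; the README does not say whether the original model did either.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic storms: (minimum pressure in mb, maximum wind speed in knots),
# drawn around four made-up intensity levels.
centers = np.array([[1000, 40], [980, 70], [955, 100], [925, 135]])
storms = np.vstack([c + rng.normal(0, [3, 4], size=(50, 2)) for c in centers])

# Scale both features so neither dominates the distance calculation.
scaled = StandardScaler().fit_transform(storms)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=1)
labels = kmeans.fit_predict(scaled)

# Cluster ids are arbitrary, so rank clusters by mean wind speed to get
# an ordered intensity category (1 = weakest, 4 = strongest).
mean_wind = [storms[labels == k, 1].mean() for k in range(4)]
rank = np.argsort(np.argsort(mean_wind)) + 1
category = rank[labels]

print(np.bincount(category)[1:])  # number of storms in each category
```

Ordering the clusters afterwards matters because K-means assigns label numbers arbitrarily; without it, "category 4" would not necessarily be the most intense group.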
From there, linear regression machine learning models were used to predict the frequency of total hurricanes, as well as the frequency of each category of hurricane. Most models prove to be relatively reliable, with testing scores at 0.75 and above. However, the model for category 4 hurricanes shows a less reliable testing score of 0.68. The K-means clustering model shows the category 4 data as the most diverse of the 4 categories, with both maximum wind speed and minimum pressure having a larger range than any of the other categories. This model could be further optimized by adding more categories, and thus reducing the large range in this particular category. This may lead to more reliable modeling of future trends.
Model Parameters:
- Total hurricanes:
- Inputs: historical gas fuel emissions, sea temperature changes, global temperature, sea level changes
- Testing score: 0.83
- Category 1 hurricanes:
- Inputs: historical gas fuel emissions, sea temperature changes, global temperature, sea level changes
- Testing score: 0.79
- Category 2 hurricanes:
- Inputs: historical gas fuel emissions, cement emissions, global temperature, sea temperature changes, sea level changes
- Testing score: 0.75
- Category 3 hurricanes:
- Inputs: historical gas fuel emissions, sea temperature changes, global temperature, sea level changes
- Testing score: 0.78
- Category 4 hurricanes:
- Inputs: historical gas fuel emissions, liquid fuel emissions, global temperature, sea temperature changes, sea level changes
- Testing score: 0.68
Linear regression models were used to predict the frequency of total tornadoes, as well as the frequency of tornado magnitudes 0 and 2 (based on the Fujita Scale). Reliable models could not be created for magnitude 1, 3, 4, or 5 tornadoes. This is likely because the historical data for tornadoes of these magnitudes shows very consistent numbers from year to year, making it difficult to predict their future with the global warming factors employed in this project that show a steady increase over time. This indicates tornadoes of these magnitudes are not reliant on the traditional indicators of global warming.
Model Parameters:
- Total tornadoes:
- Inputs: historical gas fuel emissions, cement emissions, global temperature, population, CO2 emissions, and sea level changes
- Testing score: 0.69
- Magnitude 0 tornadoes:
- Inputs: historical gas fuel emissions, liquid fuel emissions, cement emissions, global temperature, population, and CO2 emissions
- Testing score: 0.82
- Magnitude 2 tornadoes:
- Inputs: historical gas fuel emissions, liquid fuel emissions, cement emissions, global temperature, population and sea temperature changes
- Testing score: 0.74
How does the public feel about climate change? Can we replicate online opinions with deep learning?
In the U.S., the existence of climate change and the role of CO2 emissions have become a highly partisan topic. In a 2019 Pew Research survey in the U.S., 76.8% of Democratic-associated respondents answered that human activity contributes "a great deal" to climate change, compared with 20.1% of Republican-associated respondents.
Furthermore, in a 2018 Pew survey of 27 countries, the U.S. ranked in the lower half by share of respondents who consider climate change to be a "major threat" (58%), and had the fourth-highest share (18%) of respondents stating that climate change is "not a threat".
To measure online opinions on climate via Twitter, we explored two particularly opinionated hashtags: #climatechangeisreal and #climatechangehoax. We compiled lists of tweets (with Twint) for each tag from January 2014 to October 2020 that were retweeted at least 100 times, to isolate the more influential tweets.
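The date and retweet filtering can be sketched with pandas once the tweets are in a DataFrame. The rows below are invented, and the column names (date, tweet, retweets_count) are assumptions that may not match the actual Twint output.

```python
import pandas as pd

# Hypothetical scraped tweets; the column names are assumptions and
# may differ from the actual Twint output.
tweets = pd.DataFrame({
    "date": pd.to_datetime(
        ["2013-12-30", "2015-06-01", "2019-09-20", "2020-11-05"]
    ),
    "tweet": ["old tweet", "quiet tweet", "popular tweet", "late tweet"],
    "retweets_count": [500, 12, 2300, 800],
})

# Keep tweets from January 2014 through October 2020...
in_range = tweets["date"].between(
    pd.Timestamp("2014-01-01"), pd.Timestamp("2020-10-31")
)
# ...that were retweeted at least 100 times.
influential = tweets[in_range & (tweets["retweets_count"] >= 100)]

print(influential["tweet"].tolist())
```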
We then used the fastai library to build a language model from the tweets, based on the AWD-LSTM recurrent neural network. A learning rate of 1E-1 was selected as optimal for training with the one-cycle policy.
Finally, we created our own "tweets" by giving the model a starting word or phrase and having it predict a given number of following words. We used a TF-IDF vectorizer to identify some of the top-scoring n-grams for each hashtag, and chose some of them to feed the language model. You can view the outputs on the deployed page, or run the notebook (assets/data/climate_tweets/tweet_analysis/tweet_analysis.ipynb) yourself to return new tweets from the same model.
How are we directly impacting the CO2 concentration levels? What parts of our behavior could we change? What is the cost/effectiveness of renewable energy?
One of the major contributors to CO2 emissions is the agricultural supply chain that serves the world's population. Consumption of farm-sourced animal products has increased greatly in recent years, and land use change has increased in response to meet the demand of most current diets. This leads to significant amounts of waste and pollution; land use change & animal waste account for 83% of greenhouse gas emissions from agriculture. Furthermore, changing the landscape for agricultural purposes often leads to an unstable phosphate level in surrounding waterways via eutrophication.
As a result, both our population and our rate of pollution are growing at an alarming rate, suggesting dangerous implications for the future of our environment.
While eating local and "going organic" are trendy solutions to help prevent climate change, an action with a larger impact on our carbon footprint is decreasing our consumption of farmed animals, which would also decrease the rate at which land is changed for housing livestock. While it's not necessary to cut out food items such as beef and fish from our diets entirely, we can make a difference in overall demand by avoiding these items even just one day a week.
Investing in renewable energy is one of the best chances society has to combat the alarming changes seen on the planet today. Over the last several decades, the generation of renewable energy has been slowly increasing, with hydropower as a frontrunner. However, as of 2019, the energy generated by hydropower was just above 4200 TWh, which pales in comparison to the 40,000-50,000 TWh of energy consumed in 2019 by each of our main sources of non-renewable energy: gas, oil, and coal. This indicates that, while renewable energy generation is increasing over time, these technologies still need to mature before any significant global impacts can be seen.
When looking at the return on investment for hydropower, solar energy, and wind energy, hydropower seems to give the biggest payout. However, hydropower cannot be the only way forward in the renewable energy arena, as it can only be used in specific areas (dams and reservoirs) and can pose a threat to nearby habitats. Therefore, investments in wind and solar energy must continue, and likely increase, to improve their efficiency and reliability.
Some tasks we'd like to build on in future commits:
- Modeling CO2 prediction on exponential trendlines (rather than linear)
- Optimizing webpage design for mobile screens
- Plotting cities prone to flooding in the next few decades
- Performing ML on sea ice polygon shrinking
- Performing ML on renewable energy growth
- Making a visualization "movie" with amCharts for the front page