Using a simple Studio Ghibli dataset to demonstrate matplotlib and seaborn skills

jshapi16/studio_ghbli_matplotlib_seaborn_tableau

studio_ghbli_matplotlib_seaborn

Using a simple Studio Ghibli dataset to demonstrate matplotlib and seaborn skills

I downloaded this dataset from https://www.kaggle.com/datasets/shruthiiiee/studio-ghibli-dataset/data.

I had some questions about the quality of the revenue numbers, so I replaced them manually using figures from https://www.boxofficemojo.com. (Note: many of the most popular movies, including Spirited Away and My Neighbor Totoro, had multiple release dates. I did not add the revenue together; I only took the revenue from the first release.) In addition, I filled in missing screenwriter names with the top screenwriter for the project, who was usually the same as the director.

The data needed some cleaning. In particular, I changed the revenue figures from strings to integers, eliminated the "Genre 3" column due to NaNs, changed the "Duration" column to minutes in integers, and removed special characters from the "Name" column (movie title).

Original DataFrame (figure: original_df)

Clean DataFrame (figure: clean_df)
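The cleaning steps described above could look roughly like the sketch below. It is not the notebook code from this repository: the rows are made-up placeholders, and the raw string formats (`"$1,000,000"` for revenue, `"2h 5m"` for duration) and the `Revenue` column name are assumptions about the Kaggle file. The helper `duration_to_minutes` is hypothetical.

```python
import re

import pandas as pd

# Placeholder rows only; titles, values, and string formats are assumptions.
original_df = pd.DataFrame({
    "Name": ["Example Film One*", "Example Film Two#"],
    "Revenue": ["$1,000,000", "$250,000"],
    "Duration": ["2h 5m", "1h 26m"],
    "Genre 3": [None, None],
})


def duration_to_minutes(text: str) -> int:
    """Convert a string such as '2h 5m' to total minutes (hypothetical helper)."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


# Drop the mostly empty "Genre 3" column.
clean_df = original_df.drop(columns=["Genre 3"]).copy()

# Strip everything except digits from revenue, then cast to integer.
clean_df["Revenue"] = (
    clean_df["Revenue"].str.replace(r"[^\d]", "", regex=True).astype("int64")
)

# Convert duration strings to integer minutes.
clean_df["Duration"] = clean_df["Duration"].map(duration_to_minutes)

# Remove special characters from titles, keeping letters, digits, spaces, and apostrophes.
clean_df["Name"] = (
    clean_df["Name"].str.replace(r"[^\w\s']", "", regex=True).str.strip()
)

print(clean_df)
```

Keeping apostrophes in the title pattern is a design choice for this sketch, so that a title containing a possessive would not be altered; the original cleaning may have handled this differently.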

I decided to graph the revenue data by movie. Because there is a large gap between the highest-grossing and lowest-grossing films, I created four matplotlib graphs that show the full data and then segments of it.

Figure: ghibli_by_revenue
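A minimal sketch of the four-panel idea is shown here, assuming one panel for all films and three panels that split the ranked list into thirds. How the actual segments were chosen is not stated in this README, so the split, the panel titles, and the placeholder data are all assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data; real values would come from clean_df.
clean_df = pd.DataFrame({
    "Name": [f"Film {i}" for i in range(1, 10)],
    "Revenue": [500, 900, 2_000, 8_000, 15_000, 40_000, 120_000, 300_000, 900_000],
})

# Rank films from highest to lowest revenue.
ranked = clean_df.sort_values("Revenue", ascending=False).reset_index(drop=True)

# Split the ranking into thirds (assumed segmentation), rounding the size up.
third = -(-len(ranked) // 3)
panels = {
    "All films": ranked,
    "Top third": ranked.iloc[:third],
    "Middle third": ranked.iloc[third:2 * third],
    "Bottom third": ranked.iloc[2 * third:],
}

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
for ax, (title, subset) in zip(axes.flat, panels.items()):
    ax.barh(subset["Name"], subset["Revenue"])
    ax.invert_yaxis()  # highest-grossing film at the top of each panel
    ax.set_title(title)
    ax.set_xlabel("Revenue")
fig.tight_layout()
```

Giving each segment its own axis lets the lower-grossing films use the full width of their panel instead of being flattened by the top earner.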

Next, I wanted to know whether Studio Ghibli movies generated more revenue over time. To find out, I made two scatterplots with year on the x-axis and revenue on the y-axis. I made two plots because "The Boy and the Heron" generated far more revenue than the rest of the movies. I plotted the graphs on a logarithmic scale to better see the differences between revenues over time. I annotated each point for ease of viewing and added colors. The lightest colors don't show up well on the seaborn darkgrid theme, so a further modification would be to keep tweaking the colors until all the titles are easy to read.

Figure: ghibli_revenue_time
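The two scatterplots could be sketched as follows. This assumes the second plot simply drops the single highest-grossing film, which the README does not confirm, and it uses placeholder rows and a `Year` column name that are assumptions. Seaborn's built-in `"dark"` palette is used here as one possible way to keep points visible on the darkgrid theme.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set_theme(style="darkgrid")

# Placeholder data; real values would come from clean_df.
clean_df = pd.DataFrame({
    "Name": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "Year": [1988, 1995, 2001, 2010, 2023],
    "Revenue": [40_000, 900_000, 15_000, 300_000, 50_000_000],
})

# Assumed split: one view with every film, one without the top-grossing outlier.
outlier_index = clean_df["Revenue"].idxmax()
views = {
    "All films": clean_df,
    "Without the top-grossing film": clean_df.drop(index=outlier_index),
}

fig, axes = plt.subplots(1, 2, figsize=(16, 6))
for ax, (title, subset) in zip(axes, views.items()):
    sns.scatterplot(
        data=subset, x="Year", y="Revenue",
        hue="Name", palette="dark", legend=False, ax=ax,
    )
    ax.set_yscale("log")  # logarithmic revenue axis
    # Label each point with its title, offset slightly from the marker.
    for row in subset.itertuples():
        ax.annotate(
            row.Name, (row.Year, row.Revenue),
            xytext=(4, 4), textcoords="offset points", fontsize=8,
        )
    ax.set_title(title)
fig.tight_layout()
```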

Ultimately, the plots show no clear correlation between release year and revenue.
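One way to put a number on this observation is a Pearson correlation between year and log-scaled revenue. This check is an addition for illustration, not something the project necessarily ran, and the rows below are placeholders, so the printed value says nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Placeholder data; real values would come from clean_df.
clean_df = pd.DataFrame({
    "Year": [1986, 1992, 1999, 2006, 2013, 2020],
    "Revenue": [40_000, 900_000, 15_000, 300_000, 8_000, 120_000],
})

# Log-scale revenue to match the logarithmic axis used in the scatterplots.
log_revenue = np.log10(clean_df["Revenue"])

# Series.corr computes the Pearson coefficient by default.
r = clean_df["Year"].corr(log_revenue)
print(f"Pearson r (year vs. log10 revenue): {r:.2f}")
```

A coefficient near zero would be consistent with the visual impression, although with so few films a single outlier can shift the value noticeably.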

To finish the project, I created a Tableau dashboard with some additional charts and graphics: https://public.tableau.com/shared/7K7BCRTT7?:display_count=n&:origin=viz_share_link
