todays notes
brownsarahm committed Feb 15, 2023
1 parent 48468f5 commit 04e8951
Showing 2 changed files with 48 additions and 0 deletions.
1 change: 1 addition & 0 deletions _toc.yml
@@ -20,6 +20,7 @@ parts:
- file: notes/2023-02-02
- file: notes/2023-02-07
- file: notes/2023-02-09
- file: notes/2023-02-14
- caption: Assignments
numbered: True
chapters:
47 changes: 47 additions & 0 deletions notes/2023-02-14.md
@@ -159,11 +159,58 @@ sns.displot(data=coffe_scores_df2, x='score',hue='Country.of.Origin',
```

```{code-cell} ipython3
coffee_df.describe()
```

## More manipulations

Here, we will make a tiny `DataFrame` from scratch to illustrate a couple of points.

```{code-cell} ipython3
large_num_df = pd.DataFrame(data=[[730000000, 392000000, 580200000],
                                  [315040009, 580000000, 967290000]],
                            columns=['a', 'b', 'c'])
large_num_df
```

This dataset is not tidy, but it was faster to set up this way. We could make it tidy by calling `melt` on it as is.

```{code-cell} ipython3
large_num_df.melt()
```
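
To see what `melt` does with no arguments, here is a small sketch that rebuilds the same frame and inspects the result. The names `demo_df` and `melted_df` are only for illustration and are not part of the notes.

```python
import pandas as pd

# same values as large_num_df; demo_df is a hypothetical name for this sketch
demo_df = pd.DataFrame(data=[[730000000, 392000000, 580200000],
                             [315040009, 580000000, 967290000]],
                       columns=['a', 'b', 'c'])

melted_df = demo_df.melt()

# one row per original cell: 2 rows x 3 columns gives 6 rows
print(melted_df.shape)

# melt's default names for the two new columns
print(list(melted_df.columns))
```

Nothing in the melted result records which original row each value came from, so two values from the same column cannot be told apart by row.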

However, I want an additional variable, so I will reset the index. This moves the original index into a new column named `index` and gives the DataFrame a fresh numerical index. In this case the two are the same.

```{code-cell} ipython3
large_num_df.reset_index()
```
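
As a quick check of that behavior, the sketch below (again using a hypothetical `demo_df` with the same values) confirms that `reset_index` returns a new DataFrame with an extra `index` column and leaves the original untouched.

```python
import pandas as pd

# same values as large_num_df; demo_df is a hypothetical name for this sketch
demo_df = pd.DataFrame(data=[[730000000, 392000000, 580200000],
                             [315040009, 580000000, 967290000]],
                       columns=['a', 'b', 'c'])

reset_df = demo_df.reset_index()

# the old index is now a regular column called 'index'
print(list(reset_df.columns))

# reset_index returns a copy by default, so the original has no 'index' column
print(list(demo_df.columns))

# here the new column and the new index hold the same numbers
print((reset_df['index'] == reset_df.index).all())
```

Because the result is a new object, it has to be assigned to a name or chained into the next method call to be used.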

If I melt this one, using the index as the `id`, then I get a reasonable tidy DataFrame.

```{code-cell} ipython3
ls_tall_df = large_num_df.reset_index().melt(id_vars='index')
ls_tall_df
```
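
One way to convince ourselves that this tall version is tidy and complete is to check that every (`index`, `variable`) pair appears exactly once and that pivoting back gives the original numbers. This is a sketch with hypothetical names (`demo_df`, `tall_df`, `wide_again`), not a step from the notes.

```python
import pandas as pd

# same values as large_num_df; demo_df is a hypothetical name for this sketch
demo_df = pd.DataFrame(data=[[730000000, 392000000, 580200000],
                             [315040009, 580000000, 967290000]],
                       columns=['a', 'b', 'c'])

tall_df = demo_df.reset_index().melt(id_vars='index')

# 6 observations, each described by index, variable, and value
print(tall_df.shape)
print(list(tall_df.columns))

# tidy check: no (index, variable) pair is repeated
print(tall_df.duplicated(subset=['index', 'variable']).any())

# pivoting back to wide form recovers the original values
wide_again = tall_df.pivot(index='index', columns='variable', values='value')
print((wide_again.to_numpy() == demo_df.to_numpy()).all())
```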

Now, we can plot.

```{code-cell} ipython3
sns.catplot(data=ls_tall_df, x='variable', y='value',
            hue='index', kind='bar')
```

Since the numbers are so big, this might be hard to interpret. Displaying them with all the 0s would not be any easier to read. The best thing to do is to add a new column with adjusted values and a corresponding title.

```{code-cell} ipython3
ls_tall_df['value (millions)'] = ls_tall_df['value']/1000000
ls_tall_df.head()
```
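
The sketch below repeats this step on a hypothetical copy of the data (`demo_df`, `tall_df`) to show that the original `value` column stays in place next to the scaled one, so nothing is overwritten.

```python
import pandas as pd

# same values as large_num_df; demo_df and tall_df are hypothetical names
demo_df = pd.DataFrame(data=[[730000000, 392000000, 580200000],
                             [315040009, 580000000, 967290000]],
                       columns=['a', 'b', 'c'])
tall_df = demo_df.reset_index().melt(id_vars='index')

# the column name states the unit, so value and label are set together
tall_df['value (millions)'] = tall_df['value'] / 1_000_000

# both the raw and the scaled columns are available
print(list(tall_df.columns))
print(tall_df[['value', 'value (millions)']].head())
```

Overwriting `value` in place would also shrink the numbers, but then the unit would live only in our memory or in a plot label set somewhere else.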

Now we can plot again, with the smaller values and an updated axis label. Adding a column with the adjusted title is good practice: it does not lose any data, and because we set the values and the title at the same time, it stays clear what the values are.

```{code-cell} ipython3
sns.catplot(data=ls_tall_df, x='variable', y='value (millions)',
            hue='index', kind='bar')
```

```{code-cell} ipython3