This repo contains code for a number of helper functions mentioned in the Pandas Illustrated guide.
pip install pandas-illustrated
Basic operations:
find(s, x, pos=False)
findall(s, x, pos=False)
insert(dst, pos, value, label, axis=0, ignore_index=False, order=None, allow_duplicates=False, inplace=False)
append(dst, value, label=lib.no_default, axis=0, ignore_index=False, order=None, allow_duplicates: bool = False, inplace=False)
drop(obj, items=None, like=None, regex=None, axis=None)
move(obj, pos, label=None, column=None, index=None, axis=None, reset_index=False)
join(dfs, on=None, how="left", suffixes=None)
Visualization improvements:
patch_series_repr(footer=True)
unpatch_series_repr()
sidebyside(*dfs, names=[], index=True, valign="top")
sbs = sidebyside
MultiIndex helpers:
patch_mi_co()
from_dict(d)
from_kw(**kwargs)
Locking columns order:
locked(obj, level=None, axis=None, categories=None, inplace=False)
lock = locked with inplace=True
vis_lock(obj, checkmark="✓")
vis_patch()
vis_unpatch()
from_product(iterables, sortorder=None, names=lib.no_default, lock=True)
MultiIndex manipulations:
get_level(obj, level_id, axis=None)
set_level(obj, level_id, labels, name=lib.no_default, axis=None, inplace=False)
move_level(obj, src, dst, axis=None, inplace=False, sort=False)
insert_level(obj, pos, labels, name=lib.no_default, axis=None, inplace=False, sort=False)
drop_level(obj, level_id, axis=None, inplace=False)
swap_levels(obj, i: Axis = -2, j: Axis = -1, axis: Axis = None, inplace=False, sort=False)
join_levels(obj, name=None, sep="_", axis=None, inplace=False)
split_level(obj, names=None, sep="_", axis=None, inplace=False)
rename_level(obj, mapping, level_id=None, axis=None, inplace=False)
By default, find(series, value) looks for the first occurrence of the given value in a series and returns the corresponding index label.
>>> import pandas as pd
>>> import pdi
>>> s = pd.Series([4, 2, 4, 6], index=['cat', 'penguin', 'dog', 'butterfly'])
>>> pdi.find(s, 2)
'penguin'
>>> pdi.find(s, 4)
'cat'
When the value is not found, find() raises a ValueError.
findall(series, value) returns a (possibly empty) index of all matching occurrences:
>>> pdi.findall(s, 4)
Index(['cat', 'dog'], dtype='object')
With the pos=True keyword argument, find() and findall() return the positional index instead:
>>> pdi.find(s, 2, pos=True)
1
>>> pdi.find(s, 4, pos=True)
0
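How labels and positions relate can be reproduced with plain pandas and NumPy. The snippet below is only an illustration and does not call pdi; it rebuilds the same Series `s` and derives both views of the matches for the value 4.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 2, 4, 6], index=['cat', 'penguin', 'dog', 'butterfly'])

# Positions (0-based offsets) of every element equal to 4.
positions = np.flatnonzero((s == 4).to_numpy())

# Indexing the Series index with those positions recovers the labels.
labels = s.index[positions]

print(positions.tolist())  # [0, 2]
print(labels.tolist())     # ['cat', 'dog']
```

The first position and the first label correspond to what find() returns with and without pos=True in the examples above.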
There are a number of ways to find the index label for a given value. The most efficient of them are:
— s.index[s.tolist().index(x)] # faster for Series with less than 1000 elements
— s.index[np.where(s == x)[0][0]] # faster for Series with over 1000 elements
find() chooses the optimal implementation depending on the series size; findall() always uses the where implementation.
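A minimal sketch of how such a size-dependent choice might look is given below. It is not the library's actual source: the names `find_sketch`, `findall_sketch` and `SIZE_THRESHOLD` are hypothetical, and the cutoff of 1000 is simply taken from the two comments above.

```python
import numpy as np
import pandas as pd

SIZE_THRESHOLD = 1000  # assumed cutoff, taken from the comments above


def find_sketch(s, x, pos=False):
    """Hypothetical stand-in for find(): first occurrence of x in s."""
    if len(s) < SIZE_THRESHOLD:
        # list.index raises ValueError by itself when x is absent.
        i = s.tolist().index(x)
    else:
        hits = np.where((s == x).to_numpy())[0]
        if len(hits) == 0:
            raise ValueError(f"{x!r} is not in Series")
        i = int(hits[0])
    # Return either the position or the label at that position.
    return i if pos else s.index[i]


def findall_sketch(s, x):
    """Hypothetical stand-in for findall(): labels of all occurrences."""
    hits = np.where((s == x).to_numpy())[0]
    return s.index[hits]


s = pd.Series([4, 2, 4, 6], index=['cat', 'penguin', 'dog', 'butterfly'])
print(find_sketch(s, 4))               # cat
print(find_sketch(s, 2, pos=True))     # 1
print(findall_sketch(s, 4).tolist())   # ['cat', 'dog']
```

Both branches raise ValueError for a missing value, matching the behaviour described for find(), while the findall sketch returns an empty index in that case.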
Run pdi.patch_series_repr() to make Series look better.
If you want to display several Series from one cell, call display(s) for each.
To display several dataframes, series or indices side by side, run pdi.sidebyside(s1, s2, ...).
Run pytest in the project root.