feat: skew #1173

Open
wants to merge 7 commits into
base: main

CarloLepelaars
@CarloLepelaars CarloLepelaars commented Oct 14, 2024

This PR adds skew to Narwhals. Support is added for Polars, Pandas-like and Arrow.

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

@CarloLepelaars CarloLepelaars changed the title Skewness feat: skew Oct 14, 2024
@CarloLepelaars CarloLepelaars changed the title feat: skew feat: skew Oct 14, 2024
@github-actions github-actions bot added the enhancement New feature or request label Oct 14, 2024
Member

@MarcoGorelli MarcoGorelli left a comment

Awesome effort, thanks @CarloLepelaars, good to have you as a contributor! Looks like there's a doctest failure.

@CarloLepelaars
Author

Thanks for the kind words! Doctest should be fixed now.

Member

@MarcoGorelli MarcoGorelli left a comment

thanks for updating, just left some comments (i'm a little tired today though so sorry if my comments don't make sense 😅 )

narwhals/_arrow/series.py (outdated, resolved)
narwhals/_pandas_like/series.py (outdated, resolved)
narwhals/expr.py (outdated, resolved)
@MarcoGorelli
Member

btw, if you wanted to just fix a typo somewhere in a separate pr (or, say, take #1170), then once you're already a contributor, CI will always run automatically without me having to approve and run - just bringing this up in case it makes it easier for you

Member

@FBruzzesi FBruzzesi left a comment

Hey @CarloLepelaars, thanks for the PR!

I left a few comments. The main challenge seems to be how much the native pandas and Polars implementations differ. However, Polars provides the formula it uses for the computation, so it should be possible to reproduce it with native methods or with the series/expr methods that are already implemented in Narwhals :)
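
For reference, the formula mentioned above can be sketched in plain Python. Polars documents its default (biased) skewness as the Fisher-Pearson coefficient g1 = m3 / m2^(3/2), where m_i is the biased i-th central moment. The helper name `biased_skew` and the NaN result for empty or constant input are assumptions of this sketch, not code from Narwhals or Polars.

```python
import math
from typing import Sequence


def biased_skew(values: Sequence[float]) -> float:
    """Fisher-Pearson coefficient of skewness, g1 = m3 / m2**1.5.

    Hypothetical helper for illustration only.
    """
    n = len(values)
    if n == 0:
        # Assumption of this sketch: no data means no defined skewness.
        return math.nan
    mean = sum(values) / n
    # Biased central moments: divide by n, not n - 1.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        # Constant input: 0 / 0, reported as NaN in this sketch.
        return math.nan
    return m3 / m2**1.5
```

Only a mean, subtraction, powers and a division are needed, which is why the same computation could in principle be expressed with each backend's elementary operations.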

narwhals/_arrow/namespace.py (outdated, resolved)
@@ -298,6 +299,17 @@ def std(self, ddof: int = 1) -> int:

return pc.stddev(self._native_series, ddof=ddof) # type: ignore[no-any-return]

def skew(self) -> float:
Member

Although it would end up returning a pyarrow scalar, I think we should keep the implementation to native methods. Alternatively, you can reuse methods that are already implemented, such as the elementary operations.

narwhals/_pandas_like/namespace.py (outdated, resolved)
@@ -424,6 +426,23 @@ def std(
ser = self._native_series
return ser.std(ddof=ddof)

def skew(self) -> Any:
Member

As Marco pointed out in the example, the Polars and pandas implementations seem to differ. We should try to remap to the Polars behavior here as well.

Author

This now also has the default Polars behavior (i.e. biased skewness). Is that what you mean by remapping?

Member

Maybe this is not explained extensively, but we try to stick to the Polars API, which means matching both signature and behavior.

So, as mentioned, I am happy to keep the addition of the bias argument as a follow-up. In the meantime, everything else should match Polars behavior and output.
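
To illustrate the remapping discussed in this thread: pandas documents `Series.skew()` as an unbiased estimate, while Polars defaults to the biased one. Assuming the pandas value is the adjusted Fisher-Pearson coefficient G1 = sqrt(N(N-1)) / (N-2) * g1, the biased g1 can be recovered from it with a single scaling factor. The helper `biased_from_adjusted` below is hypothetical and is only a sketch of that relation, not the change made in this PR.

```python
import math


def biased_from_adjusted(adjusted: float, n: int) -> float:
    """Convert an adjusted skewness G1 into the biased coefficient g1.

    Hypothetical helper; assumes G1 = sqrt(n * (n - 1)) / (n - 2) * g1.
    `n` is the number of non-null observations behind `adjusted`.
    """
    if n < 3:
        # The adjustment factor divides by (n - 2), so G1 needs n >= 3.
        raise ValueError("adjusted skewness needs at least 3 observations")
    return adjusted * (n - 2) / math.sqrt(n * (n - 1))
```

The factor tends to 1 as `n` grows, so the two conventions mostly disagree on small inputs, which is exactly where doctests and unit tests tend to live.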

narwhals/_polars/namespace.py (outdated, resolved)
narwhals/expr.py (outdated, resolved)
@@ -433,6 +433,43 @@ def std(self, *, ddof: int = 1) -> Self:
"""
return self.__class__(lambda plx: self._call(plx).std(ddof=ddof))

def skew(self) -> Self:
Member

If you fancy going one step further, the Polars signature has a bias argument. I am also happy to keep it as a follow-up if that's too much for one PR.

Author

Interesting, will check it out. Any suggestions on how to easily expose this bias argument for Narwhals as well?

Member

A (perhaps more sophisticated) example of how such remapping would work is value_counts. It exists only for series, but you should get the gist of it.
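
As a rough sketch of what exposing the argument could look like, the function below mirrors the keyword-only `bias: bool = True` parameter of the Polars signature on a plain Python sequence. The function name, the NaN handling, and the use of the standard sqrt(N(N-1)) / (N-2) correction for `bias=False` are assumptions of this sketch; it is not the Narwhals implementation.

```python
import math
from typing import Sequence


def skew(values: Sequence[float], *, bias: bool = True) -> float:
    """Skewness with a Polars-style `bias` keyword (illustrative only)."""
    n = len(values)
    if n == 0:
        return math.nan
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return math.nan
    g1 = m3 / m2**1.5  # biased coefficient, the default
    if bias:
        return g1
    if n < 3:
        # Assumption: the corrected estimate is undefined below 3 points.
        return math.nan
    # Correct for statistical bias with the usual small-sample factor.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1
```

Keeping `bias` keyword-only with a default of `True` means existing calls such as `skew(values)` keep returning the biased value, so the argument could be added in a follow-up without changing current results.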

narwhals/_pandas_like/series.py (outdated, resolved)
@@ -519,6 +519,40 @@ def mean(self) -> Any:
"""
return self._compliant_series.mean()

def skew(self) -> Any:
Member

Same as for Expr.skew: Polars exposes a bias parameter.

@CarloLepelaars
Author

CarloLepelaars commented Oct 14, 2024

> Hey @CarloLepelaars, thanks for the PR!
>
> I left a few comments - the main challenge seems to be how different implementations are between pandas and polars native methods. However polars provide the formula it uses for the computation. It should be possible to reproduce that with native methods or using the series/expr methods that are already implemented in narwhals :)

This is indeed challenging @FBruzzesi. I've made it so every backend returns the biased population skewness, but we can potentially include an option for the unbiased skewness.
