feat: skew #1173
Awesome effort, thanks @CarloLepelaars, good to have you as a contributor! Looks like there's a doctest failure.
Thanks for the kind words! Doctest should be fixed now.
thanks for updating, just left some comments (i'm a little tired today though so sorry if my comments don't make sense 😅 )
btw, if you wanted to just fix a typo somewhere in a separate PR (or, say, take #1170), then once you're already a contributor, CI will always run automatically without me having to approve and run it - just bringing this up in case it makes it easier for you
Hey @CarloLepelaars, thanks for the PR!
I left a few comments - the main challenge seems to be how much the implementations differ between the pandas and polars native methods. However, polars documents the formula it uses for the computation. It should be possible to reproduce that with native methods or with the series/expr methods that are already implemented in narwhals :)
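For reference, the formula in question is the biased Fisher-Pearson coefficient, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments computed with a 1/n normalisation. Below is a minimal pure-Python sketch of that formula using only elementary operations. The helper name `biased_skew` and the handling of empty or constant input are assumptions made for illustration; they are not taken from this PR or confirmed against polars.

```python
import math


def biased_skew(values):
    """Biased (population) skewness g1 = m3 / m2**1.5.

    Hypothetical helper for illustration only; not part of narwhals.
    """
    n = len(values)
    if n == 0:
        # Assumption of this sketch: no data means no result.
        return None
    mean = sum(values) / n
    # Central moments with 1/n normalisation (the "biased" estimator).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        # Assumption of this sketch: constant input is treated as 0/0.
        return math.nan
    return m3 / m2**1.5


symmetric = biased_skew([1, 2, 3])     # symmetric data, so m3 is zero
right_tail = biased_skew([1, 1, 2])    # longer right tail, positive skew
```

Each step is a mean, a subtraction, a power or a division, so the same computation can in principle be expressed with the elementary operations that each backend already supports.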
@@ -298,6 +299,17 @@ def std(self, ddof: int = 1) -> int:
        return pc.stddev(self._native_series, ddof=ddof)  # type: ignore[no-any-return]

    def skew(self) -> float:
Although it would end up returning a pyarrow scalar, I think we should keep the implementation with native methods, or you can reuse methods that are already implemented, such as all the elementary operations.
@@ -424,6 +426,23 @@ def std(
        ser = self._native_series
        return ser.std(ddof=ddof)

    def skew(self) -> Any:
As Marco pointed out in the example, the polars and pandas implementations seem to differ. We should try to remap to polars behavior here as well.
This also has the default Polars behavior (i.e. Biased skewness) now. Is that what you mean by remapping?
Maybe this is not explained extensively, but we try to stick to the polars API - which means matching both signature and behavior.
So as mentioned, I am happy to leave the addition of the `bias` argument as a follow-up; in the meantime, everything else should match polars behavior and output.
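To make the gap concrete: the pandas native `Series.skew()` returns the bias-adjusted estimate, which relates to the biased g1 by G1 = g1 * sqrt(n(n-1)) / (n-2). A small sketch of that conversion follows. The helper name `unbiased_from_biased` is hypothetical, and raising for fewer than three observations is a choice of this sketch rather than documented behavior of either library.

```python
import math


def unbiased_from_biased(g1, n):
    """Convert biased skewness g1 into the adjusted estimate G1.

    Hypothetical helper for illustration only; not part of narwhals.
    """
    if n < 3:
        # The correction factor divides by (n - 2).
        raise ValueError("need at least 3 observations")
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


# For [1, 1, 2] the biased skewness is 1/sqrt(2); with n = 3 the
# correction factor is sqrt(6), so the adjusted value is sqrt(3).
adjusted = unbiased_from_biased(1 / math.sqrt(2), 3)
```

Inverting the same factor is one way a pandas-like backend could recover the biased value from the native result, although computing g1 directly from the moments avoids the n < 3 edge case.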
@@ -433,6 +433,43 @@ def std(self, *, ddof: int = 1) -> Self:
        """
        return self.__class__(lambda plx: self._call(plx).std(ddof=ddof))

    def skew(self) -> Self:
If you fancy going one step further, the polars signature has a `bias` argument - I am also happy to keep it as a follow-up if that's too much for one PR
Interesting, will check it out. Any suggestions on how to easily expose this `bias` argument for Narwhals as well?
A (maybe) sophisticated example of how such remapping would work is `value_counts` - which, however, exists only for series, but you should get the gist of it
@@ -519,6 +519,40 @@ def mean(self) -> Any:
        """
        return self._compliant_series.mean()

    def skew(self) -> Any:
Same as `Expr.skew`, polars exposes a `bias` parameter
This is indeed challenging @FBruzzesi. I've made it so every backend returns the biased population skewness, but we can potentially include an option for the unbiased skewness.
This PR adds `skew` to Narwhals. Support is added for Polars, Pandas-like and Arrow.