Bad developer experience - df.__dataframe_namespace__() is untyped #258

Closed
MarcoGorelli opened this issue Sep 13, 2023 · 5 comments
Labels
static typing type annotations, use of type checkers directly from the spec

Comments

@MarcoGorelli
Contributor

The developer experience of using the api could be improved (to put it mildly) - I'll give a demo

If I stick to dataframe methods, then it's all typed and so I get nice suggestions from my IDE:

image

But if I try getting the namespace out, I don't get any suggestions, because __dataframe_namespace__() is typed to return Any

image

Before I make suggestions, can we agree that this is a bad developer experience and that it needs fixing?
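
To illustrate why a return type of Any removes both completion and checking, here is a minimal sketch. `UntypedFrame` is a hypothetical stand-in, not part of the dataframe API; it only mimics a `__dataframe_namespace__` method annotated as returning Any.

```python
from typing import Any


class UntypedFrame:
    # Hypothetical stand-in for a DataFrame whose namespace is typed as Any.
    def __dataframe_namespace__(self) -> Any:
        import math  # any module will do for the illustration

        return math


namespace = UntypedFrame().__dataframe_namespace__()

# A static type checker accepts the attribute access below, because every
# attribute of an Any value is allowed. The mistake only surfaces at runtime.
try:
    namespace.colum  # typo for a function that does not exist
    typo_caught_at_runtime = False
except AttributeError:
    typo_caught_at_runtime = True
```

The typo is only found when the line executes, which is the gap the issue describes.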

@MarcoGorelli MarcoGorelli added the static typing type annotations, use of type checkers directly from the spec label Sep 13, 2023
@rgommers
Member

This indeed would be nice to improve for devs who rely on code completion in IDEs. I'll note that the same problem for __array_namespace__ is just being tackled at data-apis/array-api#685. I suggest looking at that and the discussion in the issue linked to that PR for the solution. You're pretty good at static typing, so reviewing that PR would be great. If it works on the array side, it should work for __dataframe_namespace__ as well.

@rgommers
Member

One other thought: for any concrete implementation I think this is a non-issue, because they should be returning a regular Python module rather than Any. The difficulty really is only in statically typing the spec itself correctly. Maybe you can try this already in one of your Pandas or Polars prototypes?

@MarcoGorelli
Contributor Author

MarcoGorelli commented Sep 13, 2023

The issue I was thinking of is the one which will be faced by developers trying to use the dataframe api to write dataframe-agnostic code

Say I'm writing a package awesome-feature-engineer, and I make a function:

from typing import Protocol, Any

from dataframe_api import DataFrame


class SupportsAPIStandard(Protocol):
    def __dataframe_consortium_standard__(
        self, *, api_version: str | None = None
    ) -> DataFrame:
        ...


def min_max_scaler(df_raw: SupportsAPIStandard) -> Any:
    df = df_raw.__dataframe_consortium_standard__()
    namespace = df.__dataframe_namespace__()
    col = namespace.col

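    df = df.update_columns(*[
        (col(column_name) - col(column_name).min()) / (col(column_name).max() - col(column_name).min())
        # Assumption: the original snippet never binds column_name; column_names
        # is assumed here to be the standard's accessor for the column names.
        for column_name in df.column_names
    ])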
    return df.dataframe

All I know about the input df_raw is that it's supposed to have an implementation of the dataframe api, i.e. that it has a __dataframe_consortium_standard__ method which I can call to get a standard-compliant object.
df is typed as DataFrame, so I can:

  • tab-complete to see what methods are available, and read their docstrings
  • run mypy on my code to validate that I'm staying within the scope of the Standard

But for namespace, all bets are off, because it's typed as Any: I don't get handy tab completion and, more importantly, I can't validate spec compliance automatically.

And I can't use the type hints from a concrete implementation, because by definition of the exercise I'm trying to write a dataframe-agnostic function (so I can't annotate it as def min_max_scaler(df_raw: pd.DataFrame)).
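
One possible direction, sketched under assumptions, is to describe the namespace with a Protocol so that `__dataframe_namespace__` can return something more specific than Any. The names `Expression`, `Namespace`, `TypedDataFrame` and `scaled_expr` below are hypothetical and cover only what `min_max_scaler` uses; this is not necessarily what the array API PR or the eventual fix here does.

```python
from typing import Protocol


class Expression(Protocol):
    # Hypothetical: only the operations min_max_scaler relies on.
    def min(self) -> "Expression": ...
    def max(self) -> "Expression": ...
    def __sub__(self, other: "Expression") -> "Expression": ...
    def __truediv__(self, other: "Expression") -> "Expression": ...


class Namespace(Protocol):
    # Hypothetical: the free functions a compliant namespace would expose.
    def col(self, name: str) -> Expression: ...


class TypedDataFrame(Protocol):
    # Same dunder as in the snippet above, but returning Namespace, not Any.
    def __dataframe_namespace__(self) -> Namespace: ...


def scaled_expr(df: TypedDataFrame, name: str) -> Expression:
    namespace = df.__dataframe_namespace__()
    col = namespace.col  # a type checker now knows col exists and its signature
    return (col(name) - col(name).min()) / (col(name).max() - col(name).min())
```

With this shape, a misspelled `namespace.colum` would be reported by a type checker instead of failing at runtime. PEP 544 also allows a module to satisfy a Protocol, so a concrete implementation could keep returning a regular Python module.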


I'll take a look at the array api PR, thanks!

@rgommers
Member

Thanks for spelling that out. It all makes sense to me, and I think the array API namespace type hinting solution will help (and is much easier to apply here, because we have very few free functions in the dataframe API standard).

@MarcoGorelli
Contributor Author

closed by #267


2 participants