Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arun sqc interop #5

Open
wants to merge 47 commits into
base: Feature_Branch_for_QC
Choose a base branch
from
Open

Conversation

arunjose696
Copy link
Owner

@arunjose696 arunjose696 commented Jul 12, 2024

What do these changes do?

Adds interoperablity between different query compiler by introducing a decorator function to do type casting of all query compiler arguments to the type of left operand

result=qc_native.op(pandas_qc) # This would result in result being of type native_qc
result=pandas_qc.op(qc_native) # This would result in result being of type pandas_qc

The tests for interoperablity have been added by modifying tests in modin/tests/pandas/dataframe/ folder such that operations would be performed across dataframes with different query compilers.

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves #?
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

@@ -697,6 +697,16 @@ jobs:
- run: python -m pytest modin/tests/pandas/dataframe/test_reduce.py
- run: python -m pytest modin/tests/pandas/dataframe/test_udf.py
- run: python -m pytest modin/tests/pandas/dataframe/test_window.py
- run: python -m pytest modin/tests/pandas/interop/test_binary.py
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- run: python -m pytest modin/tests/pandas/interop/test_binary.py
- run: python -m pytest modin/tests/native_df_mode/pandas/interop/test_binary.py

I envision we could implement a native df mode for polars so let's change the test directory stucture a bit.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"""
Module contains ``QueryCompilerTypeCaster`` class.

``QueryCompilerTypeCaster`` is used for va.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extend the docstring?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -585,7 +586,6 @@ class NativeQueryCompiler(BaseQueryCompiler):
_shape_hint: Optional[str]

def __init__(self, pandas_frame, shape_hint: Optional[str] = None):
assert NativeDataframeMode.get() == "Pandas"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removed?

Copy link
Owner Author

@arunjose696 arunjose696 Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now constructor of NativeQueryCompiler can be also called when NativeDataframeMode.get() is default, Irrespective of NativeDataframeMode when we call this constructor in operations where the left operand is a NativeQueryCompiler we construct a new query compiler of type NativeQueryCompiler

Fn = TypeVar("Fn", bound=Any)


class QueryCompilerTypeCaster:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class QueryCompilerTypeCaster:
class QueryCompilerCaster:

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return arguments


def apply_argument_casting():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def apply_argument_casting():
def apply_argument_cast():

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

A decorator function.
"""

def decorator(obj: Fn) -> Fn:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably use functools.wrap(obj) to keep properties of a function correct.

Copy link
Owner Author

@arunjose696 arunjose696 Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now added functools to cast_args the function returned from this decorator, from what I understand functools wrapping is required only for the wrapper method returned from the decorator and the decorator by itself does not need functools.wraps()

elif isinstance(obj, staticmethod):
return staticmethod(decorator(obj.__func__))

def cast_args(*args: Tuple, **kwargs: Dict) -> Any:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -2993,9 +2993,8 @@ def _create_or_update_from_compiler(
DataFrame or None
None if update was done, ``DataFrame`` otherwise.
"""
assert (
isinstance(new_query_compiler, type(self._query_compiler))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removed?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_query_compiler can also be an instance of a different query compiler(Nativequerycompiler or PandasQueryCompiler),

Eg during operations like insert in the _create_or_update_from_compiler the constructor gets called directly eg ,

Thus it would be normal for the cases where constructor may return a new query compiler for a case where user changes the query compiler mode between creating data_frame and insert operation

Comment on lines 121 to 158
current_qc = None
if isinstance(args[0], BaseQueryCompiler):
current_qc = args[0]

if current_qc:
args = cast_nested_args_to_current_qc_type(args, current_qc)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
current_qc = None
if isinstance(args[0], BaseQueryCompiler):
current_qc = args[0]
if current_qc:
args = cast_nested_args_to_current_qc_type(args, current_qc)
if isinstance(args[0], BaseQueryCompiler):
current_qc = args[0]
args = cast_nested_args_to_current_qc_type(args, current_qc)

What about kwargs?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

def decorator(obj: Fn) -> Fn:
"""Decorate function or class to cast all arguments that are query comilers to the current query compiler"""
if isinstance(obj, type):
seen: Dict[Any, Any] = {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this dict? Is there a case when we may not get into except KeyError branch?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to handle cases where for same function different aliases exists in basequerycompiler, This is no longer required and no cases in both the query compilers that would go to except so removing this.

ZhipengXue97 and others added 9 commits July 15, 2024 09:57
* FEAT-modin-project#7331: Initial Polars API

This commit adds a polars namespace to Modin, and the DataFrame and
Series objects and their respective APIs. This doesn't include error
handling and is still missing several polars features:

* LazyFrame
* Expressions
* String, Temporal, Struct, and other Series accessors
* Several parameters
* Operators that we don't have query compiler methods for
   * e.g. sin, cos, tan, etc.

Those will be handled in a future PR.

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Lint

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* flake8

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* isort

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* headers

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* forgot one

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Add test

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* header

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* isort

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Add to CI

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* fix name

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Update modin/polars/base.py

Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>

* address comments

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* polars 1

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Update for polars 1.x and fix some hacks

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Remove hax

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Black

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Address comments

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Lint

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

* Address comment

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>

---------

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>
Co-authored-by: Devin Petersohn <devin.petersohn@snowflake.com>
Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>
…odin-project#7352)

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>
…Dataset. (modin-project#7353)

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
…n-project#7354)

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
…er (modin-project#7356)

Signed-off-by: arunjose696 <arunjose696@gmail.com>
…odin-project#7358)

Signed-off-by: Devin Petersohn <devin.petersohn@snowflake.com>
@arunjose696 arunjose696 force-pushed the arun-sqc-interop branch 3 times, most recently from f788a20 to 68a51ef Compare August 14, 2024 16:18
Retribution98 and others added 13 commits August 19, 2024 13:56
…amic partitioning (modin-project#7369)

Signed-off-by: Kirill Suvorov <kirill.suvorov@intel.com>
Co-authored-by: Anatoly Myachev <anatoliimyachev@mail.com>
…ervice, pin to 5.0.13 (modin-project#7374)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Co-authored-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
Signed-off-by: arunjose696 <arunjose696@gmail.com>
…ply` (modin-project#7338)

Signed-off-by: Kirill Suvorov <kirill.suvorov@intel.com>
Signed-off-by: arunjose696 <arunjose696@gmail.com>
Signed-off-by: arunjose696 <arunjose696@gmail.com>
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
…rialized_dtypes to query compiler layer as in the code in multiple places the methods of private _modin_frame were used
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
@arunjose696 arunjose696 force-pushed the arun-sqc-interop branch 5 times, most recently from cd70589 to 9af55f2 Compare August 26, 2024 15:29
Signed-off-by: arunjose696 <arunjose696@gmail.com>
arunjose696 and others added 2 commits August 28, 2024 10:55
Co-authored-by: Iaroslav Igoshev <Poolliver868@mail.ru>
Co-authored-by: Anatoly Myachev <anatoliimyachev@mail.com>
@arunjose696 arunjose696 force-pushed the arun-sqc-interop branch 2 times, most recently from 94e8e48 to af0a7ba Compare August 28, 2024 10:10
Signed-off-by: arunjose696 <arunjose696@gmail.com>
@arunjose696 arunjose696 force-pushed the arun-sqc-interop branch 4 times, most recently from 04b7f60 to 03d137c Compare August 29, 2024 07:15
Signed-off-by: arunjose696 <arunjose696@gmail.com>
Signed-off-by: arunjose696 <arunjose696@gmail.com>
@arunjose696 arunjose696 force-pushed the arun-sqc-interop branch 2 times, most recently from 136bf1e to f488872 Compare September 2, 2024 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants