FEAT-#4605: Add native query compiler #7259

arunjose696 · 2024-05-13T18:53:19Z

What do these changes do?

first commit message and PR title follow format outlined here

NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.
passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
signed commit with git commit -s
Resolves #4605: Handle Empty/Small Data DataFrames as a separate case
tests added and passing
module layout described at docs/development/architecture.rst is up-to-date
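The core idea behind #4605, keeping small DataFrames in plain pandas instead of partitioning them, can be sketched as a query compiler that wraps one unpartitioned pandas DataFrame and runs every operation directly in pandas. This is a minimal illustration under that assumption only; the class and method names (`NativeQueryCompilerSketch`, `apply_pandas`) are hypothetical and are not the actual `NativeQueryCompiler` API added in this PR.

```python
import pandas


class NativeQueryCompilerSketch:
    """Hypothetical sketch: holds a single unpartitioned pandas DataFrame
    and executes every operation directly in pandas."""

    def __init__(self, frame: pandas.DataFrame):
        # The whole dataset lives in one in-memory pandas object.
        self._frame = frame

    @classmethod
    def from_pandas(cls, df: pandas.DataFrame) -> "NativeQueryCompilerSketch":
        # Copy so later in-place pandas operations cannot leak back to the caller.
        return cls(df.copy())

    def to_pandas(self) -> pandas.DataFrame:
        # Hand out a copy so callers cannot mutate the wrapped frame.
        return self._frame.copy()

    def apply_pandas(self, func, *args, **kwargs) -> "NativeQueryCompilerSketch":
        # Call a pandas function on the wrapped frame and re-wrap the result.
        # This sketch only handles DataFrame and Series results.
        result = func(self._frame, *args, **kwargs)
        if isinstance(result, pandas.Series):
            result = result.to_frame()
        return type(self)(result)


qc = NativeQueryCompilerSketch.from_pandas(pandas.DataFrame({"a": [1, 2], "b": [3, 4]}))
print(qc.apply_pandas(pandas.DataFrame.sum).to_pandas())
```

Because there is no partitioning step, there is also nothing to distribute, which is where the overhead savings for small frames would come from.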

modin/experimental/core/storage_formats/pandas/small_query_compiler.py

modin/pandas/dataframe.py

modin/pandas/io.py

modin/pandas/series.py

modin/pandas/base.py

docs/conf.py

modin/config/envvars.py

modin/core/dataframe/algebra/default2pandas/binary.py

setup.cfg

modin/pandas/utils.py

devin-petersohn

Great start on solving this problem! Is it possible to avoid so many of the test changes?

devin-petersohn · 2024-05-22T15:34:27Z

modin/config/envvars.py

@@ -851,4 +851,11 @@ def _check_vars() -> None:
        )


+class UsePlainPandasQueryCompiler(EnvironmentVariable, type=bool):


This name is probably a little confusing for users. I suggest something like SmallDataframeMode. It could default to None, and users could set it to "pandas" or, in the future, some other option (we may have other single-node options coming).

@devin-petersohn, do you think VanillaPandasMode is a good option? Also, why do you think this config should be a string with choices such as None/pandas/etc.? Wouldn't a boolean enable/disable config be sufficient?

In the future we may add a Polars mode. If that happens, we would also want an option for it. Making the config a string keeps it open to other options; if we put pandas in the name, we can only use that mode for pandas execution. I'm open to other names, but I think we don't want to keep adding more and more configs if we have more options later.
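To make the string-versus-boolean trade-off concrete, here is a minimal sketch of a string-valued setting with a fixed set of choices. It does not use Modin's `EnvironmentVariable` machinery; the class name `NativeModeSetting` and the variable `EXAMPLE_NATIVE_DATAFRAME_MODE` are hypothetical.

```python
import os
from typing import Optional, Tuple


class NativeModeSetting:
    """Hypothetical env-var-backed setting with string choices.

    A boolean flag can only express on/off for one engine; a string value
    leaves room for further single-node modes (e.g. "polars") without adding
    a new config for each one.
    """

    varname = "EXAMPLE_NATIVE_DATAFRAME_MODE"  # hypothetical variable name
    choices: Tuple[Optional[str], ...] = (None, "pandas")  # extend later, e.g. "polars"
    default: Optional[str] = None

    @classmethod
    def get(cls) -> Optional[str]:
        raw = os.environ.get(cls.varname, "").strip()
        if not raw:
            # Unset or empty: fall back to the default (regular partitioned execution).
            return cls.default
        value = raw.lower()
        if value not in cls.choices:
            raise ValueError(f"{cls.varname}={raw!r}; expected one of {cls.choices[1:]}")
        return value
```

Adding a new mode later means extending `choices`, not introducing another config.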

Doesn't this sound like we may have multiple storage formats for a single execution? Do we really want to support this in the future?

Potentially, yes. I think this is something we could support in the future.

@devin-petersohn, do you think we could support automatic initialization with the small query compiler depending on a data size threshold in the future?
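One way such automatic selection could work is sketched below: an explicitly configured mode always wins, and otherwise frames at or under a size threshold stay native. The function name `choose_backend`, the cell-count heuristic, and the 10,000-cell cutoff are all assumptions for illustration, not anything decided in this PR.

```python
from typing import Optional

# Hypothetical cutoff; a real value would need benchmarking.
DEFAULT_CELL_THRESHOLD = 10_000


def choose_backend(
    n_rows: int,
    n_cols: int,
    mode: Optional[str] = None,
    threshold: int = DEFAULT_CELL_THRESHOLD,
) -> str:
    """Return "native" or "partitioned" for a newly created frame."""
    if mode == "pandas":
        # The user explicitly asked for native pandas execution.
        return "native"
    if n_rows * n_cols <= threshold:
        # Empty and small frames skip partitioning overhead.
        return "native"
    return "partitioned"
```

Cell count is only a proxy; memory usage of the source data would be another reasonable signal.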

I propose to rename UsePlainPandasQueryCompiler to NativeDataframeMode and SmallQueryCompiler to NativeQueryCompiler, by analogy with the HdkOnNative execution we had previously.

At a minimum, a more complete definition of this class in the docstring is required.

I will rename UsePlainPandasQueryCompiler to NativeDataframeMode and SmallQueryCompiler to NativeQueryCompiler.

arunjose696 · 2024-05-22T16:31:33Z

Great start on solving this problem! Is it possible to avoid so many of the test changes?

Most of the test changes disable a few checks that can't be supported without partitions, or that rely on IO such as pd.read_csv(), which the current changes don't support yet. Is there something specific that should be avoided?
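For test changes like these, one pattern is to skip only the partition-dependent checks when native mode is active while keeping value comparisons in every mode. Below is a stdlib `unittest` sketch with a hypothetical environment variable name; Modin's own suite uses pytest, where `pytest.mark.skipif` plays the same role.

```python
import os
import unittest

MODE_VARNAME = "EXAMPLE_NATIVE_DATAFRAME_MODE"  # hypothetical variable name


def partition_checks_enabled() -> bool:
    """Partition-level assertions only make sense when data is partitioned."""
    return os.environ.get(MODE_VARNAME, "").strip().lower() != "pandas"


class ExampleTests(unittest.TestCase):
    @unittest.skipUnless(partition_checks_enabled(), "no partitions in native mode")
    def test_partition_layout(self):
        # Placeholder: a real test would assert on the partition layout here.
        self.assertTrue(partition_checks_enabled())

    def test_values(self):
        # Value-level checks run in every mode.
        self.assertEqual(sum([1, 2, 3]), 6)
```

Because `skipUnless` is evaluated when the class is defined, the variable must be set before the test module is imported.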

devin-petersohn · 2024-05-22T16:45:06Z

is there something specific that should be avoided?

Nothing specific, I was just trying to understand context. Thanks!

modin/pandas/dataframe.py

Co-authored-by: Iaroslav Igoshev <Poolliver868@mail.ru> Signed-off-by: arunjose696 <arunjose696@gmail.com>

modin/config/envvars.py

modin/core/execution/dispatching/factories/factories.py

modin/tests/pandas/dataframe/test_iter.py

modin/tests/test_utils.py

modin/core/storage_formats/pandas/native_query_compiler.py

Co-authored-by: Iaroslav Igoshev <Poolliver868@mail.ru>

YarShev

@arunjose696, please fix the ci-required / lint (pydocstyle) (pull_request) job. Other than that, LGTM!

Signed-off-by: arunjose696 <arunjose696@gmail.com>

YarShev

LGTM!

arunjose696 requested review from devin-petersohn, mvashishtha, RehanSD, YarShev, vnlitvinov, anmyachev, dchigarev and a team as code owners May 13, 2024 18:53

github-advanced-security bot found potential problems May 13, 2024


arunjose696 changed the title ~~Adding small query compiler~~ FEAT-#4605: Adding small query compiler May 13, 2024

arunjose696 force-pushed the arun-sqc branch 3 times, most recently from f80e353 to 41bab97 Compare May 16, 2024 11:45

YarShev force-pushed the arun-sqc branch from 41bab97 to b6dc27c Compare May 16, 2024 12:33

github-advanced-security bot found potential problems May 16, 2024


modin/pandas/base.py Fixed

YarShev reviewed May 16, 2024


modin/experimental/core/storage_formats/pandas/small_query_compiler.py Outdated

YarShev force-pushed the arun-sqc branch from 8c6544e to 165360f Compare May 16, 2024 15:17

github-advanced-security bot found potential problems May 16, 2024


YarShev reviewed May 16, 2024


arunjose696 force-pushed the arun-sqc branch from b9f1dc3 to df6b6dc Compare May 22, 2024 13:11

github-advanced-security bot found potential problems May 22, 2024


modin/pandas/utils.py Fixed

arunjose696 force-pushed the arun-sqc branch from df6b6dc to 1cd75e2 Compare May 22, 2024 13:15

devin-petersohn reviewed May 22, 2024


arunjose696 marked this pull request as draft May 22, 2024 19:49

arunjose696 force-pushed the arun-sqc branch 2 times, most recently from e6b035f to d406414 Compare May 23, 2024 11:08

github-advanced-security bot found potential problems May 23, 2024


modin/pandas/dataframe.py Fixed

modin/pandas/dataframe.py Fixed

arunjose696 force-pushed the arun-sqc branch from e9dbc16 to 631dbf2 Compare May 23, 2024 20:50

arunjose696 and others added 8 commits June 17, 2024 02:18

test_udf passing

820b399

All tests except one passing in modin/tests/pandas/dataframe

6a999aa

All tests in modin/tests/pandas/dataframe/ passing

3cf940a

PR comments

c85e708

renaming to PlainPandasQueryCompiler to NativeDataframeMode

88c4354

renaming to PlainPandasQueryCompiler to NativeDataframeMode

b09d0f7

PR comments + changes

e0590cb

Apply suggestions from code review

e8925cb

Co-authored-by: Iaroslav Igoshev <Poolliver868@mail.ru> Signed-off-by: arunjose696 <arunjose696@gmail.com>

arunjose696 force-pushed the arun-sqc branch from 20c2929 to 8c6ad55 Compare June 24, 2024 06:25

fix conflict

3585c74

arunjose696 force-pushed the arun-sqc branch from 8c6ad55 to 3585c74 Compare June 24, 2024 06:37

arunjose696 marked this pull request as ready for review June 24, 2024 13:05

YarShev mentioned this pull request Jun 26, 2024

Poor performance of df.insert and df.to_parquet #7325 (Open, 3 tasks)

YarShev reviewed Jul 2, 2024


Apply suggestions from code review

c748be6

Co-authored-by: Iaroslav Igoshev <Poolliver868@mail.ru>

arunjose696 force-pushed the arun-sqc branch from 3dd5412 to 3efc372 Compare July 5, 2024 06:42

YarShev reviewed Jul 5, 2024


arunjose696 force-pushed the arun-sqc branch 4 times, most recently from 3da8c6e to 17b12aa Compare July 8, 2024 10:51

PR comments

4f40c12

Signed-off-by: arunjose696 <arunjose696@gmail.com>

arunjose696 force-pushed the arun-sqc branch from 17b12aa to 4f40c12 Compare July 8, 2024 10:52

YarShev previously approved these changes Jul 8, 2024


arunjose696 force-pushed the arun-sqc branch from 96d7fca to aa25287 Compare July 11, 2024 08:00

arunjose696 dismissed YarShev’s stale review via 4f40c12 July 11, 2024 09:12

arunjose696 force-pushed the arun-sqc branch from aa25287 to 4f40c12 Compare July 11, 2024 09:12

YarShev changed the title ~~FEAT-#4605: Adding small query compiler~~ FEAT-#4605: Add native query compiler Jul 15, 2024

YarShev approved these changes Aug 26, 2024


YarShev merged commit da01571 into modin-project:main Aug 26, 2024
140 of 141 checks passed


