Modin 0.23.0
This release upgrades the supported pandas version to 2.0. It also includes a '.corr' speed-up,
new features, and bug fixes.
Key Features and Updates Since 0.22.0
- Stability and Bugfixes
- FIX-#1851: Squash multiple LogicalProject nodes (#6306)
- FIX-#3371: Remove pandas patch level pin (#6211)
- FIX-#4048: support sqlalchemy objects in `con` parameter for `to_sql` (#5940)
- FIX-#4485: fix 'clip' with list-like bounds and axis=None (#6344)
- FIX-#4954: defaults to pandas in `read_json` in case of rows having different columns (#5946)
- FIX-#5077: fix 'Series.rename_axis' signature (#6324)
- FIX-#5461: fix groupby if dataframe has empty partitions (#6307)
- FIX-#6035: Fall back to Pandas, when merging unsupported column types (#6036)
- FIX-#6085: HDK: Implemented support for datetime64 dtypes serialization (#6086)
- FIX-#6208: HDK: Added support for median aggregation (#6209)
- FIX-#6215: Process '.corr(numeric_only=False)' parameter at the qc level (#6242)
- FIX-#6218: Fix `read_excel` and unpin `openpyxl` (#6247)
- FIX-#6229: fix `Series.equals`/`DataFrame.equals` with NA entries (#6270)
- FIX-#6232: support DataFrame.cov(numeric_only=False) without fallback to pandas (#6262)
- FIX-#6237: Log errors only from deepest modin layer (#6238)
- FIX-#6245: support datetime64 with different resolutions types for HDK (#6255)
- FIX-#6246: fix 'groupby(..., as_index=False).agg(...)' case (#6263)
- FIX-#6258: Fix series to_dict (#6260)
- FIX-#6259: Fix astype("category") causing read-only buffer error (#6267)
- FIX-#6273: fix DataFrame.min/max/mean/median/skew/kurt with axis=None (#6275)
- FIX-#6297: fix experimental numpy.argmax/argmin with Nans in data (#6298)
- FIX-#6309: do not materialize axes for 'rank' operation (#6310)
- FIX-#6313: update MIN_RAY_VERSION var: 1.4.0 -> 1.13.0 (#6314)
- FIX-#6317: fix syntax error in 'push-to-master.yml' (#6318)
- FIX-#6336: pin 'pydantic<2' to fix CI (#6337)
- FIX-#6338: fix TypeError: WorksheetReader.__init__() got an unexpected keyword argument 'rich_text' (#6339)
- FIX-#6341: call _filter_empties only if shapes are different on particular axis (#6333)
- FIX-#6352: Fix the HdkOnNativeDataframePartition._width_cache property computation (#6353)
- FIX-#6354: Skip bad and pre-release versions (#6355)
- Performance enhancements
- Refactor Codebase
- Update testing suite
- New Features
- FEAT-#5684: Use TreeReduce implementation for 'pivot_table' in certain cases (#6089)
- FEAT-#5759: Implement lazy Arrow execution for the HDK engine (#6251)
- FEAT-#5936: support pandas 2.0.2 (#5995)
- FEAT-#6048: add `wait` method for Dask/Ray/Unidist wrappers (#6049)
- FEAT-#6191: Implement `groupby.rolling` API (#6292)
- FEAT-#6253: add 'dtype_backend' parameter support for read_parquet/read_feather (#6264)
- FEAT-#6256: HDK: Add support for DataFrameGroupBy.head/tail() (#6257)
- FEAT-#6284: Do not convert HDK query execution result to arrow. (#6286)
- FEAT-#6296: Add additional pyhdk launch parameters (#6303)
- FEAT-#6322: Give a warning only if the major or minor part of pandas version are different (#6323)
- FEAT-#6325: Add GPU execution option for HDK backend (#6326)
- FEAT-#6327: Bump pyhdk version to 0.7 (#6328)
- FEAT-#6351: Add a simple heuristic for fragment size when running on a GPU (#6346)
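
As an illustration of the `groupby.rolling` API listed above (FEAT-#6191), here is a minimal sketch. It assumes Modin mirrors the pandas `groupby(...).rolling(...)` call shape. The column names and data are made up for the example, and the fallback to plain pandas is only there so the snippet runs where Modin is not installed.

```python
try:
    # Assumes Modin 0.23.0 or later with an execution engine (e.g. Ray or Dask) available.
    import modin.pandas as pd
except ImportError:
    # Fallback so the sketch still runs without Modin; the API call is the same.
    import pandas as pd

# Hypothetical data: two groups with a numeric column.
df = pd.DataFrame(
    {
        "key": ["a", "a", "a", "b", "b"],
        "value": [1.0, 2.0, 3.0, 10.0, 20.0],
    }
)

# Rolling sum over a window of 2 rows, computed separately within each group.
# The first row of every group has no full window, so its result is NaN.
rolled = df.groupby("key")["value"].rolling(window=2).sum()

rolling_sums = rolled.tolist()
print(rolling_sums)
```

The result is indexed by the group key and the original row label, so each group's rolling window restarts at that group's first row.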
Contributors
@AndreyPavlenko
@YarShev
@alexbaden
@anmyachev
@dchigarev
@kurapov-peter
@mvashishtha
@vnlitvinov