Modin 0.23.0

@mvashishtha released this 06 Jul 15:35 (commit 6a5416c)

This release upgrades the supported pandas version to 2.0. It also includes a speed-up
for '.corr', new features, and bug fixes.

Key Features and Updates Since 0.22.0

  • Stability and Bugfixes
    • FIX-#1851: Squash multiple LogicalProject nodes (#6306)
    • FIX-#3371: Remove pandas patch level pin (#6211)
    • FIX-#4048: support sqlalchemy objects in con parameter for to_sql (#5940)
    • FIX-#4485: fix 'clip' with list-like bounds and axis=None (#6344)
    • FIX-#4954: defaults to pandas in read_json in case of rows having different columns (#5946)
    • FIX-#5077: fix 'Series.rename_axis' signature (#6324)
    • FIX-#5461: fix groupby if dataframe has empty partitions (#6307)
    • FIX-#6035: Fall back to Pandas, when merging unsupported column types (#6036)
    • FIX-#6085: HDK: Implemented support for datetime64 dtypes serialization (#6086)
    • FIX-#6208: HDK: Added support for median aggregation (#6209)
    • FIX-#6215: Process '.corr(numeric_only=False)' parameter at the qc level (#6242)
    • FIX-#6218: Fix read_excel and unpin openpyxl (#6247)
    • FIX-#6229: fix Series.equals/DataFrame.equals with NA entries (#6270)
    • FIX-#6232: support DataFrame.cov(numeric_only=False) without fallback to pandas (#6262)
    • FIX-#6237: Log errors only from deepest modin layer (#6238)
    • FIX-#6245: support datetime64 with different resolutions types for HDK (#6255)
    • FIX-#6246: fix 'groupby(..., as_index=False).agg(...)' case (#6263)
    • FIX-#6258: Fix series to_dict (#6260)
    • FIX-#6259: Fix astype("category") causing read-only buffer error (#6267)
    • FIX-#6273: fix DataFrame.min/max/mean/median/skew/kurt with axis=None (#6275)
    • FIX-#6297: fix experimental numpy.argmax/argmin with Nans in data (#6298)
    • FIX-#6309: do not materialize axes for 'rank' operation (#6310)
    • FIX-#6313: update MIN_RAY_VERSION var: 1.4.0 -> 1.13.0 (#6314)
    • FIX-#6317: fix syntax error in 'push-to-master.yml' (#6318)
    • FIX-#6336: pin 'pydantic<2' to fix CI (#6337)
    • FIX-#6338: fix TypeError: WorksheetReader.__init__() got an unexpected keyword argument 'rich_text' (#6339)
    • FIX-#6341: call _filter_empties only if shapes are different on particular axis (#6333)
    • FIX-#6352: Fix the HdkOnNativeDataframePartition._width_cache property computation (#6353)
    • FIX-#6354: Skip bad and pre-release versions (#6355)
  • Performance enhancements
    • PERF-#4560: Implement '.corr()' method using MapReduce pattern (#6193)
    • PERF-#6319: remove '__make_init_labels_args' explicit calls that materialize axes (#6312)
  • Refactor Codebase
    • REFACTOR-#0000: Remove OmnisciWorker as unused (#6278)
    • REFACTOR-#0000: rename 'exc' -> 'err' (#6252)
    • REFACTOR-#6279: HDK DataFrame should not have more than one partition (#6280)
    • REFACTOR-#6329: deprecate cloud feature (#6330)
  • Update testing suite
    • TEST-#6282: Reduce copy-pasteness in ci.yml (#6283)
    • TEST-#6308: add to_numpy ASV bench (#6305)
    • TEST-#6315: increase 'install_timeout' for ASV benchmarks: 600 -> 6000 sec (#6316)
  • New Features
    • FEAT-#5684: Use TreeReduce implementation for 'pivot_table' in certain cases (#6089)
    • FEAT-#5759: Implement lazy Arrow execution for the HDK engine (#6251)
    • FEAT-#5936: support pandas 2.0.2 (#5995)
    • FEAT-#6048: add wait method for Dask/Ray/Unidist wrappers (#6049)
    • FEAT-#6191: Implement groupby.rolling API (#6292)
    • FEAT-#6253: add 'dtype_backend' parameter support for read_parquet/read_feather (#6264)
    • FEAT-#6256: HDK: Add support for DataFrameGroupBy.head/tail() (#6257)
    • FEAT-#6284: Do not convert HDK query execution result to arrow. (#6286)
    • FEAT-#6296: Add additional pyhdk launch parameters (#6303)
    • FEAT-#6322: Give a warning only if the major or minor part of pandas version are different (#6323)
    • FEAT-#6325: Add GPU execution option for HDK backend (#6326)
    • FEAT-#6327: Bump pyhdk version to 0.7 (#6328)
    • FEAT-#6351: Add a simple heuristic for fragment size when running on a GPU (#6346)
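
The '.corr()' speed-up listed under PERF-#4560 relies on the MapReduce pattern. As a rough illustration of the general idea only, and not Modin's actual implementation, the sketch below computes a Pearson correlation for two columns split across row partitions. The function names (map_partition, reduce_stats, finalize, mapreduce_corr) are hypothetical, and the sketch assumes clean numeric data with no missing values.

```python
from functools import reduce
from math import sqrt


def map_partition(rows):
    """Map step: collapse one row partition into its sufficient statistics."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    syy = sum(y * y for _, y in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, syy, sxy)


def reduce_stats(a, b):
    """Reduce step: statistics from two partitions combine by plain addition."""
    return tuple(u + v for u, v in zip(a, b))


def finalize(stats):
    """Turn the combined statistics into a Pearson correlation coefficient."""
    n, sx, sy, sxx, syy, sxy = stats
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / sqrt(var_x * var_y)


def mapreduce_corr(partitions):
    """Hypothetical driver: map every partition, then reduce and finalize."""
    return finalize(reduce(reduce_stats, map(map_partition, partitions)))


# Two row partitions of (x, y) pairs, as if a frame were split by rows.
partitions = [[(1, 2), (2, 4)], [(3, 6)]]
print(mapreduce_corr(partitions))
```

Because each partition is reduced to a handful of numbers before anything is combined, no partition needs to see another partition's rows. The raw sum-of-products formula used here is kept for brevity; it can lose precision on large values, so a production implementation would likely use a more stable formulation.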

Contributors

@AndreyPavlenko
@YarShev
@alexbaden
@anmyachev
@dchigarev
@kurapov-peter
@mvashishtha
@vnlitvinov