Modin 0.30.0

@anmyachev anmyachev released this 15 May 10:28
· 55 commits to main since this release
0.30.0
51b0a78

This release introduces support for the DataFrame API standard, a distributed implementation of right merge/join,
more efficient implementations of internal operators that give a performance boost to almost all distributed Modin functions,
improved compatibility with pandas on the pyarrow backend, and type hints for the pandas API to improve UX.
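
As a rough illustration of the right merge mentioned above, here is a minimal sketch. It assumes Modin and one of its execution engines are installed. The frames, column names, and values are made up for the example, and the fallback to plain pandas is only there so the snippet still runs without Modin. It shows the pandas-compatible `how="right"` semantics, not how Modin distributes the work internally.

```python
try:
    # Assumes Modin plus an execution engine (for example Ray or Dask) is installed.
    import modin.pandas as pd
except ImportError:
    # Fallback so the sketch still runs; the merge semantics are the same.
    import pandas as pd

# Hypothetical example data, not taken from the release notes.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# A right merge keeps every row of `right`; keys that are absent
# from `left` get missing values in the columns coming from `left`.
merged = left.merge(right, on="key", how="right")

keys = merged["key"].tolist()
missing_left = int(merged["left_val"].isna().sum())
print(merged)
```

Here the key `4` exists only in `right`, so its row is kept with a missing `left_val`, while key `1` from `left` is dropped.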

Key Features and Updates Since 0.29.0

  • Stability and Bugfixes
    • FIX-#0000: Fix badge in README.md (#7213)
    • FIX-#0000: Make merge tests more stable by sorting results (#7266)
    • FIX-#6967: Remove read_pickle_distributed/to_pickle_distributed functions as deprecated (#7258)
    • FIX-#7093: Make sure idxmax and idxmin can work with string columns (#7193)
    • FIX-#7102: Remove enable_api_only mode in modin logging (#7194)
    • FIX-#7103: Move lower-level functionality logging to debug (#7184)
    • FIX-#7143: Constructing a DataFrame from a Modin Series with tuple name should produce MultiIndex columns (#7214)
    • FIX-#7185: Add extra check for some config classes (#7189)
    • FIX-#7201: Update docs on how to enable Modin logs for high-level API and low-level API (#7209)
    • FIX-#7206: Make sure df.melt handle duplicate value_vars correctly (#7208)
    • FIX-#7219: Pin dataframe-api-compat>=0.2.7 (#7220)
    • FIX-#7221: Don't use use_legacy_dataset=False for ParquetDataset (#7222)
    • FIX-#7224: Importing modin.pandas.api.extensions overwrites re-export of pandas.api submodules (#7225)
    • FIX-#7233: Display property name in default_to_pandas error messages (#7269)
    • FIX-#7234: Deprecate HDK engine (#7235)
    • FIX-#7238: Fix docstring inheritance for cached_property and use it (#7239)
    • FIX-#7240: Allow doc_checker.py works with functools.cached_property (#7241)
    • FIX-#7246: Pin pyarrow>=10.0.1 as pandas 2.2.* does (#7247)
    • FIX-#7248: Make sure _validate_dtypes_sum_prod_mean works correctly with datetime types (#7237)
    • FIX-#7250: Revert "PERF-#6666: Avoid internal reset_index for left merge" (#7251)
  • Performance enhancements
    • PERF-#7227: Call modin_frame.combine() for merge and join only when necessary (#7228)
    • PERF-#7230: Don't preserve bad partition for merge (#7229)
  • Refactor Codebase
    • REFACTOR-#7242: Add type hints for modin/core/dataframe/algebra/ (#7243)
    • REFACTOR-#7260: Use extract_dtype internal function in more places (#7261)
  • Update testing suite
    • TEST-#7049: Add some sanity tests with pyarrow-backed pandas dataframes (#7199)
    • TEST-#7191: Fix ASV after changing default branch (#7190)
  • Documentation improvements
    • DOCS-#0000: Fix a typo with MODIN_CPUS number (#7198)
    • DOCS-#0000: Supplement Optimization Notes with a link to configs (#7197)
    • DOCS-#7217: Update docs as to when Modin operators work best (#7218)
    • DOCS-#7255: Update docs as to from_* functions (#7256)
  • New Features
    • FEAT-#5394: Reduce amount of remote calls for Map operator (#7136)
    • FEAT-#5394: Reduce amount of remote calls for TreeReduce and GroupByReduce operators (#7245)
    • FEAT-#6492: Add from_map feature to create dataframe (#7215)
    • FEAT-#6498: Make Fold operator more flexible (#7257)
    • FEAT-#6808: Implement __arrow_array__ for Series (#7200)
    • FEAT-#6890: Modin implementation of DataFrame API standard (#7216)
    • FEAT-#7139: Use ray-core instead of ray-default (#6955)
    • FEAT-#7187: Change master branch to main (#7188)
    • FEAT-#7202: Use custom resources for Ray (#7205)
    • FEAT-#7203: Make sure Modin works correctly with pandas, which uses pyarrow as a backend (#7204)
    • FEAT-#7207: Add the ability to assign a df to a columns selection without d2p (#7210)
    • FEAT-#7252: Add type hints for base.py (#7253)
    • FEAT-#7254: Support right merge/join (#7226)

Contributors

@Retribution98
@YarShev
@anmyachev
@arunjose696
@noloerino
@sfc-gh-jkew