Modin 0.30.0
This release introduces support for the DataFrame API standard, a distributed implementation of right merge/join,
a more efficient implementation of internal operators that gives a performance boost to almost all distributed Modin functions,
improved compatibility with pandas on the pyarrow backend, and type hints for the pandas API to improve UX.
Key Features and Updates Since 0.29.0
- Stability and Bugfixes
- FIX-#0000: Fix badge in README.md (#7213)
- FIX-#0000: Make merge tests more stable by sorting results (#7266)
- FIX-#6967: Remove `read_pickle_distributed`/`to_pickle_distributed` functions as deprecated (#7258)
- FIX-#7093: Make sure `idxmax` and `idxmin` can work with string columns (#7193)
- FIX-#7102: Remove `enable_api_only` mode in modin logging (#7194)
- FIX-#7103: Move lower-level functionality logging to debug (#7184)
- FIX-#7143: Constructing a DataFrame from a Modin Series with tuple name should produce MultiIndex columns (#7214)
- FIX-#7185: Add extra check for some config classes (#7189)
- FIX-#7201: Update docs on how to enable Modin logs for high-level API and low-level API (#7209)
- FIX-#7206: Make sure `df.melt` handle duplicate `value_vars` correctly (#7208)
- FIX-#7219: Pin `dataframe-api-compat>=0.2.7` (#7220)
- FIX-#7221: Don't use `use_legacy_dataset=False` for `ParquetDataset` (#7222)
- FIX-#7224: Importing `modin.pandas.api.extensions` overwrites re-export of `pandas.api` submodules (#7225)
- FIX-#7233: Display property name in `default_to_pandas` error messages (#7269)
- FIX-#7234: Deprecate HDK engine (#7235)
- FIX-#7238: Fix docstring inheritance for `cached_property` and use it (#7239)
- FIX-#7240: Allow `doc_checker.py` works with `functools.cached_property` (#7241)
- FIX-#7246: Pin `pyarrow>=10.0.1` as `pandas 2.2.*` does (#7247)
- FIX-#7248: Make sure `_validate_dtypes_sum_prod_mean` works correctly with datetime types (#7237)
- FIX-#7250: Revert "PERF-#6666: Avoid internal reset_index for left merge" (#7251)
- Performance enhancements
- Refactor Codebase
- Update testing suite
- Documentation improvements
- New Features
- FEAT-#5394: Reduce amount of remote calls for `Map` operator (#7136)
- FEAT-#5394: Reduce amount of remote calls for `TreeReduce` and `GroupByReduce` operators (#7245)
- FEAT-#6492: Add `from_map` feature to create dataframe (#7215)
- FEAT-#6498: Make `Fold` operator more flexible (#7257)
- FEAT-#6808: Implement `__arrow_array__` for Series (#7200)
- FEAT-#6890: Modin implementation of DataFrame API standard (#7216)
- FEAT-#7139: Use `ray-core` instead of `ray-default` (#6955)
- FEAT-#7187: Change `master` branch to `main` (#7188)
- FEAT-#7202: Use custom resources for Ray (#7205)
- FEAT-#7203: Make sure Modin works correctly with pandas, which uses pyarrow as a backend (#7204)
- FEAT-#7207: Add the ability to assign a df to a columns selection without d2p (#7210)
- FEAT-#7252: Add type hints for `base.py` (#7253)
- FEAT-#7254: Support right `merge`/`join` (#7226)
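For readers unfamiliar with the operator names in the FEAT-#5394 entries, the general tree-reduce pattern can be sketched in plain Python. This is only an illustration of the idea (a per-partition map step followed by a combining reduce step), not Modin's internal code; the helper `tree_reduce` and the sample `partitions` are hypothetical names introduced here. In a distributed engine, each per-partition call would typically be a remote task, which is why the number of such calls matters for performance.

```python
from functools import reduce


def tree_reduce(partitions, map_func, reduce_func):
    """Apply map_func to every partition, then combine the partial results.

    Hypothetical helper for illustration only; it is not part of Modin's API.
    """
    # Map step: one call per partition (a remote task in a distributed engine).
    partials = [map_func(part) for part in partitions]
    # Reduce step: fold the partial results into a single value.
    return reduce(reduce_func, partials)


# Hypothetical data: one column of numbers split into three partitions.
partitions = [[1, 2, 3], [4, 5], [6]]

# Sum: partial sums per partition, then add the partial sums together.
total = tree_reduce(partitions, map_func=sum, reduce_func=lambda a, b: a + b)

# Max: partial maxima per partition, then the maximum of those.
largest = tree_reduce(partitions, map_func=max, reduce_func=max)
```

The pattern works for any aggregation whose partial results can be combined associatively, such as sum, min, max, or count.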
Contributors
@Retribution98
@YarShev
@anmyachev
@arunjose696
@noloerino
@sfc-gh-jkew