Modin 0.29.0
This release introduces the modin.pandas.testing and modin.pandas.arrays modules, a faster (range-partitioning) implementation of the pivot_table, unique, drop_duplicates, nunique, and df.resample functions, new functions to interact with Dask (to/from_dask), a distributed implementation of Series.case_when, and an optimization for the astype function with a scalar dtype.
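To illustrate why range-partitioning can help operations such as unique and drop_duplicates, here is a minimal pure-Python sketch of the general idea. It is not Modin's implementation: the function names, the list-based partitions, and the hand-picked pivots are all assumptions made for illustration. Once rows are re-split by key range, equal values are guaranteed to sit in the same bucket, so each bucket can be deduplicated independently.

```python
from bisect import bisect_right
from typing import List, Sequence


def range_partition(values: Sequence[int], pivots: Sequence[int]) -> List[List[int]]:
    """Route each value to the bucket whose key range contains it.

    `pivots` must be sorted; it defines len(pivots) + 1 key ranges.
    (Hypothetical helper, not part of Modin's API.)
    """
    buckets: List[List[int]] = [[] for _ in range(len(pivots) + 1)]
    for value in values:
        buckets[bisect_right(pivots, value)].append(value)
    return buckets


def unique_via_range_partitioning(
    partitions: Sequence[Sequence[int]], pivots: Sequence[int]
) -> List[int]:
    """Deduplicate values spread over several partitions.

    (Hypothetical helper, not part of Modin's API.)
    """
    # Shuffle step: re-split every input partition by key range and
    # concatenate the pieces that belong to the same range.
    merged: List[List[int]] = [[] for _ in range(len(pivots) + 1)]
    for partition in partitions:
        for i, bucket in enumerate(range_partition(partition, pivots)):
            merged[i].extend(bucket)

    # Local step: duplicates can only occur within a bucket, so each
    # bucket is deduplicated on its own (this is the parallelizable part).
    result: List[int] = []
    for bucket in merged:
        result.extend(dict.fromkeys(bucket))  # order-preserving dedupe
    return result


partitions = [[5, 1, 9, 1], [3, 5, 7, 9], [1, 8, 3]]
pivots = [3, 6]  # assumed pivots; a real system would derive them by sampling
uniques = unique_via_range_partitioning(partitions, pivots)
```

In this sketch the result is grouped by key range rather than by order of first appearance, so it contains the same values as a sequential deduplication but not necessarily in the same order.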
Key Features and Updates Since 0.28.0
- Stability and Bugfixes
  - FIX-#6227: Make sure Series.unique() with pyarrow dtype returns ArrowExtensionArray (#7042)
  - FIX-#6793: Use pandas_dtype instead of np.dtype for some more places in Modin code (#6794)
  - FIX-#7039: Pass scalar dtype as is to astype query compiler (#7152)
  - FIX-#7051: Update exception message for astype function (#7052)
  - FIX-#7054: Update exception message for shift function (#7055)
  - FIX-#7056: Update exception message for iloc/loc functions (#7057)
  - FIX-#7058: Update exception message for insert function (#7059)
  - FIX-#7060: Fix pivot when index or columns are of Index type (#7061)
  - FIX-#7062: Update exception message for aggregate function (#7063)
  - FIX-#7072: Replace MaterializationHook with the materialized object on serialization (#7075)
  - FIX-#7088: Make sure rank raises No axis named None... exception (#7089)
  - FIX-#7115: Exclude Ray 2.10.0 from deps installation (#7116)
  - FIX-#7135: Fix appending a new row (#7172)
  - FIX-#7153: Fix Series.corr with method != pearson (#7158)
  - FIX-#7157: Make sure quantile function works with numeric_only=True (#7160)
  - FIX-#7170: Don't use MinPartitionSize configuration variable in remote context (#7177)
- Performance enhancements
  - PERF-#5296: Partition parquet file if it has too few row groups (#7016)
  - PERF-#7068: Provide shape_hint="column" for some more operations with Series (#7069)
  - PERF-#7123: Preserve shape_hint for dropna (#7124)
  - PERF-#7130: Preserve partition lengths in apply_full_axis with keep_partitioning=True (#7131)
  - PERF-#7132: Preserve partition lengths in apply_full_axis with keep_partitioning=False (#7133)
  - PERF-#7150: Reduce peak memory consumption (#7149)
- Refactor Codebase
- Update testing suite
  - TEST-#3622: Centralize tests in Modin (#7137)
  - TEST-#6016: Make sure eval_general doesn't expect exceptions by default (#6954)
  - TEST-#7064: Explicitly check for exceptions in test_groupby.py (#7065)
  - TEST-#7066: Explicitly check for exceptions in test_io.py (#7067)
  - TEST-#7073: Explicitly check for exceptions in test_default.py (#7074)
  - TEST-#7076: Explicitly check for exceptions in test_map_metadata.py (#7077)
  - TEST-#7082: Explicitly check for exceptions in test_series.py (#7083)
  - TEST-#7084: Explicitly check for exceptions in test_indexing.py (#7085)
  - TEST-#7086: Explicitly check for exceptions in test_reduce.py (#7087)
  - TEST-#7094: Rename raising_exceptions argument of eval_general testing function (#7095)
  - TEST-#7125: Explicitly install modin in CI tests (#7126)
  - TEST-#7165: Add codecov token to fix CI on master (#7175)
  - TEST-#7166: Fix HDF tests in CI (#7167)
  - TEST-#7173: Update github actions (#7168)
- Documentation improvements
- New Features
  - FEAT-#4527: Add Modin logging to AxisPartition and BlockPartition classes (#7079)
  - FEAT-#6783: Implement modin.pandas.testing module (#7045)
  - FEAT-#6929: Implement Series.case_when in a distributed way (#6972)
  - FEAT-#7004: Use generators when returning from _deploy_ray_func remote function. (#7005)
  - FEAT-#7021: Implement to/from_dask functions (#7022)
  - FEAT-#7047: Add range-partitioning implementation for .pivot_table() (#7048)
  - FEAT-#7070: Add modin.pandas.arrays module (#7071)
  - FEAT-#7078: Add modin_layer names to classes that inherit ClassLogger (#7099)
  - FEAT-#7090: Add range-partitioning implementation for .unique() and .drop_duplicates() (#7091)
  - FEAT-#7100: Add range-partitioning impl for nunique() (#7101)
  - FEAT-#7102: Deprecate enable_api_only mode in modin logging (#7114)
  - FEAT-#7111: Implemented @remote_function decorator with cache (#7112)
  - FEAT-#7117: Support building range-partitioning from an index level (#7120)
  - FEAT-#7118: Add range-partitioning impl for df.resample() (#7140)
  - FEAT-#7128: Update minimal supported version of Ray up to 2.1.0 (#7129)
  - FEAT-#7141: Add an ability to use config variables with a context manager (#7142)
  - FEAT-#7146: Use BaseQueryCompiler, BasePandasDataset, DataFrame or Series type hints at a high level (#7147)
  - FEAT-#7156: Add type hints for Series (#7154)
  - FEAT-#7178: Add type hints for DataFrame (#7179)
  - FEAT-#7180: Add type hints for modin.pandas.[functions] (#7181)
Contributors
@AndreyPavlenko
@Retribution98
@YarShev
@anmyachev
@arunjose696
@dchigarev
@sfc-gh-mvashishtha