Modin 0.29.0

@anmyachev released this 15 Apr 18:05
0.29.0
6d64e08

This release introduces the modin.pandas.testing and modin.pandas.arrays modules; a faster, range-partitioning
implementation of pivot_table, unique, drop_duplicates, nunique, and df.resample; new to_dask and from_dask functions
for interacting with Dask; a distributed implementation of Series.case_when; and an optimization of astype with a scalar dtype.

Key Features and Updates Since 0.28.0

  • Stability and Bugfixes
    • FIX-#6227: Make sure Series.unique() with pyarrow dtype returns ArrowExtensionArray (#7042)
    • FIX-#6793: Use pandas_dtype instead of np.dtype for some more places in Modin code (#6794)
    • FIX-#7039: Pass scalar dtype as is to astype query compiler (#7152)
    • FIX-#7051: Update exception message for astype function (#7052)
    • FIX-#7054: Update exception message for shift function (#7055)
    • FIX-#7056: Update exception message for iloc/loc functions (#7057)
    • FIX-#7058: Update exception message for insert function (#7059)
    • FIX-#7060: Fix pivot when index or columns are of Index type (#7061)
    • FIX-#7062: Update exception message for aggregate function (#7063)
    • FIX-#7072: Replace MaterializationHook with the materialized object on serialization (#7075)
    • FIX-#7088: Make sure rank raises No axis named None... exception (#7089)
    • FIX-#7115: Exclude Ray 2.10.0 from deps installation (#7116)
    • FIX-#7135: Fix appending a new row (#7172)
    • FIX-#7153: Fix Series.corr with method != pearson (#7158)
    • FIX-#7157: Make sure quantile function works with numeric_only=True (#7160)
    • FIX-#7170: Don't use MinPartitionSize configuration variable in remote context (#7177)
  • Performance enhancements
    • PERF-#5296: Partition parquet file if it has too few row groups (#7016)
    • PERF-#7068: Provide shape_hint="column" for some more operations with Series (#7069)
    • PERF-#7123: Preserve shape_hint for dropna (#7124)
    • PERF-#7130: Preserve partition lengths in apply_full_axis with keep_partitioning=True (#7131)
    • PERF-#7132: Preserve partition lengths in apply_full_axis with keep_partitioning=False (#7133)
    • PERF-#7150: Reduce peak memory consumption (#7149)
  • Refactor Codebase
    • REFACTOR-#3257: Move logging and caching to the gen_data internal function (#7046)
    • REFACTOR-#7105: Deprecate cfg.RangePartitioningGroupby (#7161)
    • REFACTOR-#7106: Rename from/to_ray_dataset to from/to_ray (#7107)
    • REFACTOR-#7109: Remove the outdated aws_example.yaml file (#7110)
  • Update testing suite
    • TEST-#3622: Centralize tests in Modin (#7137)
    • TEST-#6016: Make sure eval_general doesn't expect exceptions by default (#6954)
    • TEST-#7064: Explicitly check for exceptions in test_groupby.py (#7065)
    • TEST-#7066: Explicitly check for exceptions in test_io.py (#7067)
    • TEST-#7073: Explicitly check for exceptions in test_default.py (#7074)
    • TEST-#7076: Explicitly check for exceptions in test_map_metadata.py (#7077)
    • TEST-#7082: Explicitly check for exceptions in test_series.py (#7083)
    • TEST-#7084: Explicitly check for exceptions in test_indexing.py (#7085)
    • TEST-#7086: Explicitly check for exceptions in test_reduce.py (#7087)
    • TEST-#7094: Rename raising_exceptions argument of eval_general testing function (#7095)
    • TEST-#7125: Explicitly install modin in CI tests (#7126)
    • TEST-#7165: Add codecov token to fix CI on master (#7175)
    • TEST-#7166: Fix HDF tests in CI (#7167)
    • TEST-#7173: Update github actions (#7168)
  • Documentation improvements
    • DOCS-#2434: Clarify the use of --signoff option (#7145)
    • DOCS-#6987: Rework range-partitioning docs (#7169)
    • DOCS-#7144: Add information about logging from user defined function (#7155)
  • New Features
    • FEAT-#4527: Add Modin logging to AxisPartition and BlockPartition classes (#7079)
    • FEAT-#6783: Implement modin.pandas.testing module (#7045)
    • FEAT-#6929: Implement Series.case_when in a distributed way (#6972)
    • FEAT-#7004: Use generators when returning from _deploy_ray_func remote function. (#7005)
    • FEAT-#7021: Implement to/from_dask functions (#7022)
    • FEAT-#7047: Add range-partitioning implementation for .pivot_table() (#7048)
    • FEAT-#7070: Add modin.pandas.arrays module (#7071)
    • FEAT-#7078: Add modin_layer names to classes that inherit ClassLogger (#7099)
    • FEAT-#7090: Add range-partitioning implementation for .unique() and .drop_duplicates() (#7091)
    • FEAT-#7100: Add range-partitioning impl for nunique() (#7101)
    • FEAT-#7102: Deprecate enable_api_only mode in modin logging (#7114)
    • FEAT-#7111: Implemented @remote_function decorator with cache (#7112)
    • FEAT-#7117: Support building range-partitioning from an index level (#7120)
    • FEAT-#7118: Add range-partitioning impl for df.resample() (#7140)
    • FEAT-#7128: Update minimal supported version of Ray up to 2.1.0 (#7129)
    • FEAT-#7141: Add an ability to use config variables with a context manager (#7142)
    • FEAT-#7146: Use BaseQueryCompiler, BasePandasDataset, DataFrame or Series type hints at a high level (#7147)
    • FEAT-#7156: Add type hints for Series (#7154)
    • FEAT-#7178: Add type hints for DataFrame (#7179)
    • FEAT-#7180: Add type hints for modin.pandas.[functions] (#7181)
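
Several entries above add range-partitioning implementations (for unique, drop_duplicates, nunique, and others). As a rough illustration of the general idea only, the pure-Python sketch below splits values into buckets by value range so that equal values always land in the same bucket, then deduplicates each bucket independently. This is not Modin's implementation: the helper names (pick_pivots, range_partitioned_unique, range_partitioned_nunique) are hypothetical, and pivots are chosen here by sorting the whole input for simplicity, whereas a real engine would more plausibly sample.

```python
from bisect import bisect_right


def pick_pivots(values, num_partitions):
    """Pick up to num_partitions - 1 distinct pivots (hypothetical helper).

    Assumption: sorting the full input keeps the sketch short; it is not
    how a distributed engine would choose its pivots.
    """
    ordered = sorted(values)
    if not ordered:
        return []
    pivots = []
    for i in range(1, num_partitions):
        candidate = ordered[i * len(ordered) // num_partitions]
        # Skip repeated pivots so every bucket covers a distinct range.
        if not pivots or candidate != pivots[-1]:
            pivots.append(candidate)
    return pivots


def range_partitioned_unique(values, num_partitions=4):
    """Return the distinct values, computed bucket by bucket."""
    pivots = pick_pivots(values, num_partitions)
    buckets = [[] for _ in range(len(pivots) + 1)]
    for value in values:
        # bisect_right sends equal values to the same bucket index.
        buckets[bisect_right(pivots, value)].append(value)
    # No value appears in two buckets, so each bucket can be
    # deduplicated on its own and the results simply concatenated.
    per_bucket = [list(dict.fromkeys(bucket)) for bucket in buckets]
    return [value for bucket in per_bucket for value in bucket]


def range_partitioned_nunique(values, num_partitions=4):
    """Count distinct values using the same bucketing."""
    return len(range_partitioned_unique(values, num_partitions))


data = [5, 3, 5, 1, 9, 3, 7, 1, 9, 2]
print(sorted(range_partitioned_unique(data)))  # [1, 2, 3, 5, 7, 9]
print(range_partitioned_nunique(data))  # 6
```

Because no value can appear in two buckets, the per-bucket results need no cross-bucket merge step. Note that the output of this sketch is grouped by bucket rather than ordered by first appearance; it makes no claim about the ordering Modin returns.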

Contributors

@AndreyPavlenko
@Retribution98
@YarShev
@anmyachev
@arunjose696
@dchigarev
@sfc-gh-mvashishtha