Commit

Merge branch 'modin-project:main' into arun-sqc
arunjose696 authored Jul 11, 2024
2 parents 4f40c12 + 4e7afa7 commit 96d7fca
Showing 21 changed files with 224 additions and 73 deletions.
2 changes: 1 addition & 1 deletion .github/actions/python-only/action.yml
@@ -3,7 +3,7 @@ description: "Prepare the environment to run simple tasks"
inputs:
python-version:
description: "Python version to install"
default: "3.9.x"
default: "3.9"

runs:
using: "composite"
10 changes: 10 additions & 0 deletions .github/dependabot.yaml
@@ -0,0 +1,10 @@
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "monthly"
groups:
github-actions:
patterns:
- "*"
2 changes: 1 addition & 1 deletion .github/workflows/ci-notebooks.yml
@@ -35,7 +35,7 @@ jobs:
activate-environment: modin_on_unidist
if: matrix.execution == 'pandas_on_unidist'
- name: Cache datasets
uses: actions/cache@v2
uses: actions/cache@v4
with:
path: taxi.csv
# update cache only if notebooks require it to be changed
4 changes: 2 additions & 2 deletions .github/workflows/ci-required.yml
@@ -13,7 +13,7 @@ jobs:
check-pr-title:
runs-on: ubuntu-latest
steps:
- uses: Slashgear/action-check-pr-title@v3.0.0
- uses: Slashgear/action-check-pr-title@v4.3.0
with:
# NOTE: If you change the allowed prefixes here, update
# the documentation about them in /docs/development/contributing.rst
@@ -28,7 +28,7 @@ jobs:
fetch-depth: 1
- uses: actions/setup-python@v5
with:
python-version: "3.9.x"
python-version: "3.9"
architecture: "x64"
cache: "pip"
cache-dependency-path: '**/requirements-doc.txt'
63 changes: 49 additions & 14 deletions .github/workflows/ci.yml
@@ -17,6 +17,9 @@ on:
- setup.py
- versioneer.py
push:
schedule:
- cron: "30 2 * * WED"
- cron: "30 2 * * THU"
concurrency:
# Cancel other jobs in the same branch. We don't care whether CI passes
# on old commits.
@@ -26,21 +29,44 @@ env:
MODIN_GITHUB_CI: true

jobs:
python-filter:
runs-on: ubuntu-latest
outputs:
python-version: ${{ steps.choose.outputs.python-version }}
steps:
- id: choose
run: |
if [[ "${{ github.event.schedule }}" = "30 2 * * WED" ]]
then
echo "python-version=3.10" >> "$GITHUB_OUTPUT"
elif [[ "${{ github.event.schedule }}" = "30 2 * * THU" ]]
then
echo "python-version=3.11" >> "$GITHUB_OUTPUT"
else
echo "python-version=3.9" >> "$GITHUB_OUTPUT"
fi
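Review note: the selection logic of the `python-filter` job above can be mirrored in a few lines of Python, which may help when reasoning about which interpreter a given run uses. This is only a sketch under stated assumptions: the names `SCHEDULE_TO_PYTHON`, `DEFAULT_PYTHON`, and `choose_python_version` are hypothetical and do not exist in the repository; the cron strings and version numbers are copied from the workflow.

```python
from typing import Optional

# Cron expressions copied from the workflow's `schedule` trigger.
SCHEDULE_TO_PYTHON = {
    "30 2 * * WED": "3.10",  # Wednesday scheduled run
    "30 2 * * THU": "3.11",  # Thursday scheduled run
}

# Used for every run that was not started by one of the schedules above.
DEFAULT_PYTHON = "3.9"


def choose_python_version(schedule: Optional[str]) -> str:
    """Return the Python version for a run (hypothetical helper).

    `schedule` stands in for `github.event.schedule`, which is only set
    for scheduled runs; push and pull_request runs fall through to the
    default, matching the `else` branch of the shell script.
    """
    return SCHEDULE_TO_PYTHON.get(schedule or "", DEFAULT_PYTHON)
```

Because the shell script compares the schedule string literally, the cron expressions under `schedule:` and inside the `if` conditions have to stay in sync; changing one without the other would silently fall back to 3.9.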
lint-mypy:
needs: [python-filter]
name: lint (mypy)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/python-only
with:
python-version: ${{ needs.python-filter.outputs.python-version }}
- run: pip install -r requirements-dev.txt
- run: mypy --config-file mypy.ini

lint-flake8:
needs: [python-filter]
name: lint (flake8)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/python-only
with:
python-version: ${{ needs.python-filter.outputs.python-version }}
# NOTE: If you are changing the set of packages installed here, make sure that
# the dev requirements match them.
- run: pip install flake8 flake8-print flake8-no-implicit-concat
@@ -49,6 +75,7 @@ jobs:
- run: flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py

test-api-and-no-engine:
needs: [python-filter]
name: Test API, headers and no-engine mode
runs-on: ubuntu-latest
defaults:
@@ -59,14 +86,15 @@ jobs:
- uses: ./.github/actions/mamba-env
with:
environment-file: requirements/requirements-no-engine.yml
python-version: ${{ needs.python-filter.outputs.python-version }}
- run: python -m pytest modin/tests/pandas/test_api.py
- run: python -m pytest modin/tests/test_executions_api.py
- run: python -m pytest modin/tests/test_headers.py
- run: python -m pytest modin/tests/core/test_dispatcher.py::test_add_option
- uses: ./.github/actions/upload-coverage

test-clean-install:
needs: [lint-flake8]
needs: [lint-flake8, python-filter]
strategy:
matrix:
os:
@@ -80,6 +108,8 @@ jobs:
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/python-only
with:
python-version: ${{ needs.python-filter.outputs.python-version }}
- run: python -m pip install -e ".[all]"
- name: Ensure Ray and Dask engines start up
run: |
@@ -94,7 +124,7 @@ jobs:
if: matrix.os == 'ubuntu'

test-internals:
needs: [lint-flake8]
needs: [lint-flake8, python-filter]
runs-on: ubuntu-latest
defaults:
run:
@@ -105,6 +135,7 @@ jobs:
- uses: ./.github/actions/mamba-env
with:
environment-file: environment-dev.yml
python-version: ${{ needs.python-filter.outputs.python-version }}
- name: Internals tests
run: python -m pytest modin/tests/core/test_dispatcher.py
- run: python -m pytest modin/tests/config
@@ -120,7 +151,7 @@ jobs:
- uses: ./.github/actions/upload-coverage

test-defaults:
needs: [lint-flake8]
needs: [lint-flake8, python-filter]
runs-on: ubuntu-latest
defaults:
run:
@@ -130,12 +161,13 @@ jobs:
execution: [BaseOnPython]
env:
MODIN_TEST_DATASET_SIZE: "small"
name: Test ${{ matrix.execution }} execution, Python 3.9
name: Test ${{ matrix.execution }} execution, Python ${{ needs.python-filter.outputs.python-version }}
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/mamba-env
with:
environment-file: environment-dev.yml
python-version: ${{ needs.python-filter.outputs.python-version }}
- name: Install HDF5
run: sudo apt update && sudo apt install -y libhdf5-dev
- name: xgboost tests
@@ -244,15 +276,15 @@ jobs:
"${{ steps.filter.outputs.ray }}" "${{ steps.filter.outputs.dask }}" >> $GITHUB_OUTPUT
test-all-unidist:
needs: [lint-flake8, execution-filter]
needs: [lint-flake8, execution-filter, python-filter]
if: github.event_name == 'push' || needs.execution-filter.outputs.unidist == 'true'
runs-on: ubuntu-latest
defaults:
run:
shell: bash -l {0}
strategy:
matrix:
python-version: ["3.9"]
python-version: [ "${{ needs.python-filter.outputs.python-version }}" ]
unidist-backend: ["mpi"]
env:
MODIN_ENGINE: "Unidist"
@@ -318,13 +350,13 @@ jobs:
- uses: ./.github/actions/upload-coverage

test-all:
needs: [lint-flake8, execution-filter]
needs: [lint-flake8, execution-filter, python-filter]
strategy:
matrix:
os:
- ubuntu
- windows
python-version: ["3.9"]
python-version: [ "${{ needs.python-filter.outputs.python-version }}" ]
engine: ${{ fromJSON( github.event_name == 'push' && '["python", "ray", "dask"]' || needs.execution-filter.outputs.engines ) }}
test_task:
- group_1
@@ -450,14 +482,14 @@ jobs:
if: matrix.os == 'windows'

test-sanity:
needs: [lint-flake8, execution-filter]
needs: [lint-flake8, execution-filter, python-filter]
if: github.event_name == 'pull_request'
strategy:
matrix:
os:
- ubuntu
- windows
python-version: ["3.9"]
python-version: [ "${{ needs.python-filter.outputs.python-version }}" ]
execution:
- name: ray
shell-ex: "python -m pytest"
@@ -583,7 +615,7 @@ jobs:
- uses: ./.github/actions/upload-coverage

test-experimental:
needs: [lint-flake8]
needs: [lint-flake8, python-filter]
runs-on: ubuntu-latest
defaults:
run:
@@ -605,6 +637,7 @@ jobs:
- uses: ./.github/actions/mamba-env
with:
environment-file: environment-dev.yml
python-version: ${{ needs.python-filter.outputs.python-version }}
- name: Install HDF5
run: sudo apt update && sudo apt install -y libhdf5-dev
- run: python -m pytest -n 2 modin/tests/pandas/dataframe/test_map_metadata.py
@@ -614,14 +647,14 @@ jobs:
- uses: ./.github/actions/upload-coverage

test-spreadsheet:
needs: [lint-flake8]
needs: [lint-flake8, python-filter]
runs-on: ubuntu-latest
defaults:
run:
shell: bash -l {0}
strategy:
matrix:
python-version: ["3.9"]
python-version: [ "${{ needs.python-filter.outputs.python-version }}" ]
engine: ["ray", "dask"]
env:
MODIN_EXPERIMENTAL: "True"
@@ -682,7 +715,7 @@ jobs:
delete-merged: true

upload-coverage:
needs: [merge-coverage-artifacts]
needs: [merge-coverage-artifacts, python-filter]
if: always() # we need to run it regardless of some job being skipped, like in PR
runs-on: ubuntu-latest
defaults:
@@ -691,6 +724,8 @@ jobs:
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/python-only
with:
python-version: ${{ needs.python-filter.outputs.python-version }}
- name: Download coverage data
uses: actions/download-artifact@v4
with:
6 changes: 3 additions & 3 deletions .github/workflows/codeql.yml
@@ -32,16 +32,16 @@ jobs:
uses: actions/checkout@v4

- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
queries: +security-and-quality
config-file: ./.github/workflows/codeql/codeql-config.yml

- name: Autobuild
uses: github/codeql-action/autobuild@v2
uses: github/codeql-action/autobuild@v3

- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v3
with:
category: "/language:${{ matrix.language }}"
2 changes: 1 addition & 1 deletion NOTICE
@@ -1,3 +1,3 @@
Modin

Copyright (c) 2018-2023 Modin Developers.
Copyright (c) 2018-2024 Modin Developers.
16 changes: 16 additions & 0 deletions docs/_static/custom.js
@@ -0,0 +1,16 @@
document.addEventListener("DOMContentLoaded", function () {
var script = document.createElement("script");
script.type = "module";
script.id = "runllm-widget-script";

script.src = "https://cdn.jsdelivr.net/npm/@runllm/search-widget@stable/dist/run-llm-search-widget.es.js";

script.setAttribute("version", "stable");
script.setAttribute("runllm-keyboard-shortcut", "Mod+j"); // cmd-j or ctrl-j to open the widget.
script.setAttribute("runllm-name", "Modin");
script.setAttribute("runllm-position", "BOTTOM_RIGHT");
script.setAttribute("runllm-assistant-id", "164");

script.async = true;
document.head.appendChild(script);
});
4 changes: 3 additions & 1 deletion docs/conf.py
@@ -61,7 +61,7 @@ def noop_decorator(*args, **kwargs):
export_config_help(configs_file_path)

project = "Modin"
copyright = "2018-2023, Modin Developers."
copyright = "2018-2024, Modin Developers."
author = "Modin contributors"

# The short X.Y version
@@ -115,6 +115,8 @@ def noop_decorator(*args, **kwargs):
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path .
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
html_static_path = ["_static"]
html_js_files = ["custom.js"]

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
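Review note: the two settings added in this hunk work together. `html_static_path` tells Sphinx which directories to copy into the built HTML, and `html_js_files` adds a `<script>` tag for `custom.js` to the generated pages. The fragment below is a sketch of that configuration. The first two assignments mirror the commit; the tuple form with attributes is a standard Sphinx option, but the variable name `html_js_files_with_attrs` and the use of `defer` are assumptions for illustration only and are not part of this change.

```python
# conf.py fragment (sketch; only the first two assignments mirror the commit)

# Directories whose contents are copied into the HTML output's _static/ folder.
html_static_path = ["_static"]

# File names are resolved relative to html_static_path and loaded by the pages.
html_js_files = ["custom.js"]

# Hypothetical alternative, NOT in the commit: Sphinx also accepts
# (filename, attributes) tuples, where the attributes are written onto the
# generated <script> tag. A real conf.py would assign this list to
# html_js_files; a separate name is used here to keep both forms visible.
html_js_files_with_attrs = [("custom.js", {"defer": "defer"})]
```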
5 changes: 2 additions & 3 deletions docs/usage_guide/examples/index.rst
@@ -16,16 +16,15 @@ The following tutorials cover the basic usage of Modin. `Here <https://www.youtu
The following tutorials cover more advanced features in Modin:

- Exercise 4: Experimental Features in Modin (Spreadsheet, Progress Bar) [`Source PandasOnRay <https://github.com/modin-project/modin/blob/main/examples/tutorial/jupyter/execution/pandas_on_ray/local/exercise_4.ipynb>`__, `Source PandasOnDask <https://github.com/modin-project/modin/blob/main/examples/tutorial/jupyter/execution/pandas_on_dask/local/exercise_4.ipynb>`__]
- Exercise 5: Setting up Modin in a Cluster Environment [`Source PandasOnRay <https://github.com/modin-project/modin/blob/main/examples/tutorial/jupyter/execution/pandas_on_ray/cluster/exercise_5.ipynb>`__]
- Exercise 6: Running Modin in a Cluster Environment [`Source PandasOnRay <https://github.com/modin-project/modin/blob/main/examples/tutorial/jupyter/execution/pandas_on_ray/cluster/exercise_6.ipynb>`__]
- Exercise 5: Setting up Modin in a Cluster Environment [`Source PandasOnRay <https://github.com/modin-project/modin/blob/main/examples/tutorial/jupyter/execution/pandas_on_ray/cluster/exercise_5.py>`__]

For how to get the required dependencies for the tutorial notebooks and how to run them, please refer to the respective `README.md <https://github.com/modin-project/modin/tree/main/examples/tutorial/jupyter/README.md>`__ file.


Data Science Benchmarks
'''''''''''''''''''''''

- Using Modin with the NYC Taxi Dataset [`Source <https://github.com/modin-project/modin/blob/main/examples/jupyter/NYC_Taxi.ipynb>`__]
- Using Modin with the NYC Taxi Dataset [`Source <https://github.com/modin-project/modin/blob/main/examples/jupyter/Modin_Taxi.ipynb>`__]
- Using Modin with the Census Dataset (coming soon...)
- Using Modin with the Plasticc Dataset (coming soon...)

4 changes: 4 additions & 0 deletions modin/core/dataframe/algebra/binary.py
@@ -298,6 +298,7 @@ def register(
cls,
func: Callable[..., pandas.DataFrame],
join_type: str = "outer",
sort: Optional[bool] = None,
labels: str = "replace",
infer_dtypes: Optional[str] = None,
) -> Callable[..., PandasQueryCompiler]:
@@ -310,6 +311,8 @@ def register(
Binary function to execute. Must be able to accept at least two arguments.
join_type : {'left', 'right', 'outer', 'inner', None}, default: 'outer'
Type of join that will be used if indices of operands are not aligned.
sort : bool, default: None
Whether to sort index and columns or not.
labels : {"keep", "replace", "drop"}, default: "replace"
Whether to keep labels from the left Modin DataFrame, replace them with labels
from the joined DataFrame, or drop them altogether so that they are computed lazily later.
@@ -419,6 +422,7 @@ def caller(
lambda x, y: func(x, y, *args, **kwargs),
[other._modin_frame],
join_type=join_type,
sort=sort,
labels=labels,
dtypes=dtypes,
),
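Review note: the new `sort` argument is forwarded alongside `join_type`, so it only matters when the operands' labels are not already aligned. The toy model below illustrates how the two options interact on plain lists of labels. It is not Modin's implementation: `join_labels` is a hypothetical helper, and treating `None` as "do not force sorting" is an assumption, since the diff does not show how `None` is interpreted downstream.

```python
from typing import List, Optional


def join_labels(
    left: List[str],
    right: List[str],
    how: str = "outer",
    sort: Optional[bool] = None,
) -> List[str]:
    """Combine two label lists the way an index join would (toy model)."""
    if how == "left":
        joined = list(left)
    elif how == "right":
        joined = list(right)
    elif how == "inner":
        # Keep labels present on both sides, in left-hand order.
        right_set = set(right)
        joined = [label for label in left if label in right_set]
    elif how == "outer":
        # Keep all left labels, then append right labels not seen on the left.
        seen = set(left)
        joined = list(left) + [label for label in right if label not in seen]
    else:
        raise ValueError(f"unsupported join type: {how!r}")
    # Assumption of this sketch: only an explicit True forces sorting.
    return sorted(joined) if sort else joined
```

With `left = ["b", "a"]` and `right = ["c", "a"]`, an outer join without sorting keeps the order of first appearance, while `sort=True` returns the labels in lexical order.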