
REFACTOR: Unite Your Requirements! #1068

Open · wants to merge 3 commits into main

Conversation

akmalsoliev

With the arrival of a lightning-fast package manager, the mighty uv, requirements.txt is yesterday's news. Instead of navigating a jungle of dependencies, like a mismatched sock drawer, everything can now unite in one glorious spot, just like Italy in 1861!

This marvelous change even makes installing dependencies blazingly fast (except for cudf; polars is better anyway). Now it's as easy as pie with just one command: uv sync --all-extras. No more typing three commands when you can do it in one! 1 > 3 gives you more time to enjoy that delicious Italian caffè.

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below.

@akmalsoliev akmalsoliev marked this pull request as draft September 24, 2024 19:10
@akmalsoliev akmalsoliev marked this pull request as ready for review September 24, 2024 19:13
@akmalsoliev (Author):

P.S. I've updated pyproject.toml and banished those pesky .txt files. More changes are coming; just testing the waters before diving in!

@FBruzzesi (Member) left a comment:

Hey there! Thanks for your PR. We definitely started being opinionated about suggesting uv in our contribution guide. However, I have mixed feelings about this one. Let me try to mention a few reasons (sorted by dev experience):

  • If someone's workflow is not based on uv, then how can they pick up the dev dependencies?
  • uv sync is too strict for my (and I guess many) development setups; to keep it from uninstalling the rest of the libraries in the env, I need to remember to run uv sync --inexact instead.
  • For a few setups based on docker (e.g. devcontainers and codespaces), having a requirements file allows you to copy and install the dependencies before even copying the python package. This leads to easier caching (at least in my experience).

pyproject.toml (review thread, outdated):
modin = ["modin"]
pandas = ["pandas>=0.25.3"]
polars = ["polars>=0.20.3"]
pyarrow = ["pyarrow>=11.0.0"]
-dask = ["dask[dataframe]>=2024.7"]
+dask = ["dask[dataframe]>=2023.5.0"]
Member:

Maybe Marco recalls why that was the min version

Member:

Not sure why this was resolved, but 2024.7 is indeed the min version we can support, as we need dask_expr.

@akmalsoliev (Author):

Buongiorno Francesco (@FBruzzesi),

Embracing change can indeed be challenging, and that's perfectly normal.

  • If someone's workflow is not on uv, then how can he/she/they pick up dev-dependencies?

It's understandable that this might seem like a hurdle. However, I see it as an exciting chance to introduce the efficiency of uv. For those who prefer sticking with pip (those poor souls), I've ensured that the pyproject.toml is structured so that all optional dependencies can be installed effortlessly with one command:
pip install ".[docs,modin,pandas,polars,pyarrow,dask,dev]"
(the quotes keep shells like zsh from interpreting the square brackets)

  • uv sync is too strict for my (but I guess many) development settings, and to avoid that it doesn't uninstall the rest of the libraries in the env, I need to remember to run uv sync --inexact instead.

I understand that might be inconvenient, but this rigidity also promotes consistency in workflows. uv sync --inexact is a helpful option to adapt it to your specific needs.

  • For a few setups based on Docker (e.g., devcontainers and Codespaces), having a file allows for easier copying and installation of dependencies before even copying the Python package, which leads to better caching (at least in my experience).

From what I understand, Docker caching should remain intact whether using pyproject.toml or requirements.txt. If the current setup is causing issues, I'm keen to explore alternatives that work better for you. Here’s an example to illustrate:

FROM python:latest

COPY ./pyproject.toml /pyproject.toml

RUN pip install .

RUN echo installing dev && pip install .[dev]

RUN echo installing polars && pip install .[polars]

Please let me know if there's anything specific you find challenging, and I’d be happy to discuss further.

@FBruzzesi (Member):

Embracing change can indeed be challenging, and that's perfectly normal.

Definitely, and we would also like to let people use whatever workflow works for them (not only pip). Thanks for adjusting the pyproject section.

From what I understand, Docker caching should remain intact whether using pyproject.toml or requirements.txt. If the current setup is causing issues, I'm keen to explore alternatives that work better for you. Here’s an example to illustrate:

FROM python:latest

COPY ./pyproject.toml /pyproject.toml

RUN pip install .

RUN echo installing dev && pip install .[dev]

RUN echo installing polars && pip install .[polars]

Please let me know if there's anything specific you find challenging, and I’d be happy to discuss further.

The above snippet would rerun from RUN pip install . each time something changes in the codebase.
What I usually do is something along these lines:

...
COPY ./requirements.txt /requirements.txt

RUN uv pip install -r requirements.txt # this will be cached until something changes in requirements.txt

COPY <rest of codebase> <rest of codebase>

RUN uv pip install .
...

There should be a way to link requirements.txt dynamically into pyproject.toml; personally that's how I would do it, but I'm happy to discuss.
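For reference, a minimal sketch of that dynamic linking, assuming a setuptools build backend (the project name and file names here are illustrative, not taken from this repo):

```toml
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"  # illustrative
dynamic = ["dependencies", "optional-dependencies"]

# setuptools reads the requirement files at build time, so a plain
# requirements.txt can keep serving as the single docker cache key.
[tool.setuptools.dynamic]
dependencies = { file = ["requirements.txt"] }

[tool.setuptools.dynamic.optional-dependencies]
dev = { file = ["requirements-dev.txt"] }
```

With something like this, COPY requirements*.txt followed by uv pip install -r requirements.txt stays cache-friendly, while pip install ".[dev]" still resolves the same pins.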

@MarcoGorelli (Member):

Thanks! You'll need to update the instructions for this to work (e.g. CI, contributing).

Fancy making a PR first, so that GitHub automatically runs CI for you, and then coming back to this?

@akmalsoliev (Author) commented Sep 27, 2024:

Ciao @FBruzzesi,
Sorry for the late reply. What do you think about this setup?

FROM python:3.12-slim-bookworm
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

COPY ./pyproject.toml pyproject.toml

RUN uv venv

RUN uv pip compile pyproject.toml --extra dev > requirements.txt
RUN uv pip install -r requirements.txt

RUN echo hello 

RUN uv pip install -r pyproject.toml --extra docs

COPY . .

RUN uv sync --all-extras 

When trying this on my local machine, the following layers appeared to be cached:

RUN uv pip compile pyproject.toml --extra dev > requirements.txt
RUN uv pip install -r requirements.txt
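As a possible refinement (an assumption on my part, not something this PR does): uv's Docker guidance suggests a BuildKit cache mount, so that even when a layer is rebuilt it can reuse previously downloaded wheels. A hedged sketch along the lines of the snippet above, assuming BuildKit is enabled and uv's default cache directory of /root/.cache/uv:

```dockerfile
FROM python:3.12-slim-bookworm
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

# Copy only the project metadata first, so this layer stays cached
# until pyproject.toml itself changes.
COPY ./pyproject.toml pyproject.toml
RUN uv venv

# Persist uv's wheel cache across builds with a BuildKit cache mount:
# even a changed requirements set reuses already-downloaded wheels.
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip compile pyproject.toml --extra dev -o requirements.txt && \
    uv pip install -r requirements.txt

# Copy the rest of the codebase last; source edits only invalidate
# the layers from here down.
COPY . .
RUN --mount=type=cache,target=/root/.cache/uv uv sync --all-extras
```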

@FBruzzesi (Member) left a comment:

Hey @akmalsoliev, thanks again for the effort. The workflow you are proposing seems to work quite well and is able to cache layers properly in docker.

Given that that's sorted out, I left some comments that probably require attention and, as @MarcoGorelli mentioned, we need this to be incorporated into the CI/CD workflows properly.
If possible, include nox and the contributing documentation as well!

Let me know if I can help in any way forward 😉


Comment on lines +35 to +51
dev = [
    "covdefaults>=2.3.0",
    "duckdb>=1.1.1",
    "hypothesis>=6.112.1",
    "pandas>=2.0.3",
    "polars>=1.8.1",
    "pre-commit>=3.5.0",
    "pyarrow>=17.0.0",
    "pytest>=8.3.3",
    "pytest-cov>=5.0.0",
    "pytest-env>=1.1.5",
    "pytest-randomly>=3.15.0",
    "scikit-learn>=1.3.2",
    "tqdm>=4.66.5",
    "typing-extensions>=4.12.2",
    "dask[dataframe]>=2023.5.0",
]
Member:

For running in CI/CD we should not pin any of these, as we want to test against some old and random versions as well. Also, dask requires Python 3.9 or above.

Author:

Not an issue; it was more of a demonstration and will be fixed.

"mkdocstrings[python]>=0.26.1",
"polars>=1.0.0",
]
#cudf = ["cudf>=23.08.00"]
Member:

Any reason for commenting this one out? 🤔

Author:

Hi, sorry for the delayed response. If I recall correctly, @MarcoGorelli brought up this issue regarding installing cuDF with conda. The installation process is quite complex, and I'm not sure how to resolve it.
[image attachment]
