Consider publishing dask/distributed nightlies to our nightly pip index #85

Open
vyasr opened this issue Jul 22, 2024 · 3 comments

vyasr commented Jul 22, 2024

Currently we use rapids-dask-dependency to manage our dask pinnings across RAPIDS both during the development cycle and at release time. Since dask does not publish nightly wheels, only conda packages, during the development cycle we point directly to git URLs in the pip metadata (for more details, see the RDD Readme). This approach has generally been working for us, but it has some serious drawbacks:

  • In general, pip's dependency resolver is far less intelligent when dealing with URLs than with versioned wheels and will often reclone/rebuild a package unnecessarily.
  • DLFW's particular build pipeline, which involves building pinned versions once and then installing them later, runs into the above issue because it simultaneously sees a built dask wheel and the git dependency from rapids-dask-dependency, and the former does not satisfy the latter. In the case of DLFW, because the later stage is not allowed to download new wheels at all, this actually results in a failure.
  • Direct URLs are explicitly disallowed by the official spec (and the original source, PEP 440). PyPI will reject any packages containing such dependencies, which in turn means that we will be blocked from publishing our nightlies on PyPI. This will not affect our release builds, however, since at that point we do pin to a specific version instead.
  • uv does not support transitive URL dependencies, and this is documented as an intentional behavior. This was discussed in a recent issue and seems unlikely to change any time soon. Since I anticipate uv usage only growing over time, we can reasonably expect that we'll start seeing users of our nightlies (perhaps only internal users to start, but still) run into this limitation. We can observe the issue easily by attempting to install a nightly RAPIDS package that depends on rapids-dask-dependency:
(rapids) coder ➜ ~ $ uv pip install --extra-index-url https://pypi.anaconda.org/rapidsai-wheels-nightly/simple 'dask-cudf-cu12>=24.10.00a0' --dry-run --prerelease=allow
error: Package `dask` attempted to resolve via URL: git+https://github.com/dask/dask.git@main. URL dependencies must be expressed as direct requirements or constraints. Consider adding `dask @ git+https://github.com/dask/dask.git@main` to your dependencies or constraints file.
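
To make the URL-dependency problem above concrete, here is a minimal sketch of how one might scan a package's `Requires-Dist` entries for direct references of the form `name @ url`. It uses a simplified regular expression rather than a full PEP 508 parser, and everything in it is an assumption for illustration: the function name `find_direct_references`, the two metadata lists, the `distributed` git URL (modeled on the dask one in the error message), and the pinned release versions are hypothetical and not taken from rapids-dask-dependency itself.

```python
import re

# Simplified approximation of the PEP 508 direct-reference form
# "name [extras] @ url [; marker]". Not a complete parser.
_DIRECT_REF = re.compile(
    r"""^\s*
        (?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)   # project name
        \s*(?:\[[^\]]*\])?                      # optional extras
        \s*@\s*
        (?P<url>[^\s;]+)                        # URL, up to whitespace or a marker
    """,
    re.VERBOSE,
)


def find_direct_references(requires_dist):
    """Return (name, url) pairs for requirements that point at a URL."""
    found = []
    for requirement in requires_dist:
        match = _DIRECT_REF.match(requirement)
        if match:
            found.append((match.group("name"), match.group("url")))
    return found


# Hypothetical metadata, for illustration only.
nightly_metadata = [
    "dask @ git+https://github.com/dask/dask.git@main",
    "distributed @ git+https://github.com/dask/distributed.git@main",
    "dask-expr",
]
release_metadata = ["dask==2024.7.1", "distributed==2024.7.1", "dask-expr"]

for name, url in find_direct_references(nightly_metadata):
    print(f"{name} is a direct URL dependency: {url}")
```

Under these assumptions, the nightly-style list yields two direct references while the release-style list yields none, which mirrors the distinction drawn above: version pins can be resolved against an index, whereas URL requirements are the part that PyPI and uv object to.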

Based on the above concerns, I believe it is time for us to consider publishing dask nightly wheels to our nightly pip index. We have previously discussed having the dask project build these themselves, but the response has generally been that they would want us to maintain this since they don't see much interest in such nightlies. We can restart that discussion if we think it's beneficial, but realistically I don't anticipate anything changing. Therefore, if we are going to build these I suggest that we manage building this in our own standalone repo and publish these to our own nightly index so that it's clear that these are just for our use in nightlies and not for general use. We should never upload these to our release index (or pypi.org). We now have precedent for building a wheel for an external project with the ucx-wheels repo. dask and distributed should be far easier to handle in this respect because they're pure Python, so building the wheels themselves should involve little complexity.


vyasr commented Jul 22, 2024

CC @rjzamora @pentschev @trxcllnt


vyasr commented Jul 22, 2024

Also CC @charlesbluca

@rjzamora

> We can restart that discussion if we think it's beneficial, but realistically I don't anticipate anything changing. Therefore, if we are going to build these I suggest that we manage building this in our own standalone repo and publish these to our own nightly index so that it's clear that these are just for our use in nightlies and not for general use.

Thanks for creating this issue @vyasr - I think this makes sense.

Once we decide that we will do this, we should communicate our intentions to non-NVIDIA dask developers in case interest in nightlies has changed. However, I'm pretty confident that your proposed plan makes the most sense.
