Split recipe in components #48
Comments
Related: conda-forge/nvcc-feedstock#35 |
Personally I'd hope that we wait until the CI infrastructure is ready to actually test GPU packages (conda-forge/conda-forge.github.io#1062)... The current approach provides a certain degree of guarantee that packages built like this will work fine, but I am less confident if all components are split. In addition, at least CuPy is designed to be released for each CUDA version, so the original concern is not really solved unless each package changes its packaging expectations. Just my two cents. |
Agreed. Libraries would need to move towards a path of gracefully handling things at runtime, i.e. what CUDA 11.2 introduced with ... It would protect libraries from failing to load due to missing symbols, though. |
Isn't nvrtc used by packages using ...? |
I don't believe so? If I remember correctly, the only thing nvcc links is ... |
As of CUDA 11.0 it has stayed at ... |
So let's think about how this could be done. Nvidia maintains official RPMs for CentOS. Can we repackage them right away? |
What do you mean by that? We already have a mechanism to download cudatoolkit and repackage it. |
I realized we are also packaging Windows here, so we can't automate the RPM->Conda extraction (which was the idea I was hinting at). In that case, I guess my proposal means "repackage in different outputs following the same split Nvidia uses for CentOS". I haven't checked if these follow the same scheme proposed in the CUDA components page. Anyway, in short, the question is which scheme we follow: A. The official one proposed by Nvidia here, ... |
@jakirkham it would likely be good for you to chime in here |
I'd rather we not spend too much time on the current structure of the package, for the same reason Leo already outlined above ( #48 (comment) ). |
How do we proceed here? The wait for the GPU CI infrastructure that Leo mentions is potentially quite long. With 10.2-11.4 we'd be back to building 6 versions for everything (especially painful for packages like pytorch/tensorflow), where we could realistically get away with 1-2 if we built per major version - that would clearly be a big win. However, perhaps these two efforts are not mutually exclusive? Could we start migrating for 11.3/11.4 while still pushing forward design & implementation for a major-based split? And as soon as the latter becomes available, we could try switching feedstocks. |
I don't think we need GPU CI infrastructure to do a split. We can split now. It's a lot to ask of maintainers of packages like tensorflow to rebuild when we can do the split and avoid any rebuilds. |
I agree that a split package is the best way forward, but we also need to decide which general strategy we are going to follow for CUDA support. ... If we do go ahead and decide on a split package, we still need to decide which subpackages will be created. This might involve more work than it looks like, since I don't know if there's a resource where the different CUDA components list their corresponding files. Also, we need to account for this: ... which allows us to mix and match components from previous releases as part of the new CUDA "distribution" once we drop 10.2. The question is how to express all of these relationships in the meta.yaml so it works for all versions as it does now. In short, all of this is a non-trivial amount of work. |
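For concreteness, here is a minimal sketch of how a metapackage could tie component versions together in meta.yaml; every package name, version number, and pin below is an illustrative assumption, not a decided scheme:

```yaml
# Hypothetical cudatoolkit metapackage tying component versions together.
# All names, versions, and pins are illustrative assumptions.
package:
  name: cudatoolkit
  version: 11.4.0

requirements:
  run:
    # Libraries covered by the CUDA 11.x ABI guarantee could allow any 11.x
    # build, which is what would let components from different releases coexist.
    - libcublas >=11.0,<12
    - libcusparse >=11.0,<12
    # nvRTC is the exception, so it would need a tighter pin.
    - libnvrtc 11.4.*
```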
Mix-and-match of CUDA components is problematic even within patch versions. For example, cuSPARSE added new APIs in 11.3.1 that were not present in 11.3.0, which led to hard-to-diagnose bug reports against CuPy (credit goes to @kmaehashi). Splitting is hypothetically doable, but without GPU CI to test it we are handing a time bomb to feedstock maintainers, who should not have to worry about this; AFAIK no upstream library tests this setting seriously, at least CuPy doesn't. |
I don't get why we need to mix and match components.
Why do we need to take this into account? If we go with the original idea, we'll split the toolkit into 2 packages. We'd have two run_exports from ... |
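A rough sketch of what that two-package split with two run_exports could look like; output names and pin levels are assumptions for illustration, not the actual proposal:

```yaml
# Hypothetical two-output split of the toolkit; names and pins are assumptions.
package:
  name: cuda-split
  version: 11.4.0

outputs:
  # Everything covered by the CUDA 11.x ABI compatibility promise.
  - name: cuda-libraries
    build:
      run_exports:
        # Downstream packages built against 11.x would stay compatible with
        # any later 11.y, hence a major-version pin.
        - {{ pin_subpackage("cuda-libraries", max_pin="x") }}

  # nvRTC, which does not carry the same guarantee.
  - name: cuda-nvrtc
    build:
      run_exports:
        # Pinned more tightly, e.g. to the minor version.
        - {{ pin_subpackage("cuda-nvrtc", max_pin="x.x") }}
```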
Ah, wait, I had misunderstood the proposed split scheme, then. I thought we were going to package ..., but instead it seems that the proposal is ... |
Yes
Not exactly. |
Ok, got it! If we can find a list of libraries and their corresponding groups, I think I can do this. The split only makes sense for CUDA >= 11, right? Before that there were no ABI guarantees, if I recall correctly? |
Yes. |
For additional context, I am working with other folks at NVIDIA to improve CUDA Toolkit packaging. One of the things we are doing is splitting the package into various components, so it should be easier to consume only what one needs. As this work has already been done internally and we are now focused on testing out these packages, I would like to avoid redoing the same work in conda-forge and instead reuse what has already been done. |
In that case I'd say we wait until the component-based packages are public and we can repackage those directly, possibly grouping them under ... |
@jakirkham, is there any sort of goal or timeline for when the component packages should be available? |
Maybe this is helpful ( #62 )? |
x-linking conda-forge/conda-smithy#1494 to prevent DoS-y feedstocks |
Starting with CUDA 12.0, packages are now split into more granular components ( conda-forge/staged-recipes#21382 ) |
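For reference, a downstream recipe on CUDA 12 might then pull in only the components it needs, roughly like the sketch below; treat the exact component list and pins as assumptions for illustration:

```yaml
# Illustrative requirements for a downstream recipe targeting CUDA 12;
# which dev packages are actually needed depends on the project.
requirements:
  build:
    - {{ compiler("cuda") }}     # resolves to the CUDA 12 nvcc package
  host:
    - cuda-version 12.0.*        # constrains which CUDA components are resolved
    - cuda-cudart-dev
    - libcublas-dev
  run:
    # run_exports from the host packages normally add the runtime pins;
    # an explicit cuda-version ceiling is shown here only for clarity.
    - cuda-version >=12.0,<13
```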
nvRTC is the exception to this rule. If we split the `cudatoolkit` feedstock up into separate libraries and update recipes accordingly, then we absolutely could for most of the libraries. Originally posted by @kkraus14 in conda-forge/conda-forge-pinning-feedstock#1162 (comment)
Maybe we can create separate packages for each component, and leave `cudatoolkit` as a metapackage. This could also include #27.