
Split recipe in components #48

Closed
jaimergp opened this issue Jan 28, 2021 · 27 comments

@jaimergp (Member)

I'm beginning to wonder if it's OK to always use the latest 11.x to build for all 11.y with y <= x. I think CUDA 11 provides a strong ABI-compatibility guarantee?

nvRTC is the exception to this rule. If we split the cudatoolkit feedstock up into separate libraries and update recipes accordingly then we absolutely could for most of the libraries.

Originally posted by @kkraus14 in conda-forge/conda-forge-pinning-feedstock#1162 (comment)


Maybe we can create separate packages for each component, and leave cudatoolkit as a metapackage. This could also include #27.

@leofang (Member) commented Jan 28, 2021

Related: conda-forge/nvcc-feedstock#35

@leofang (Member) commented Jan 28, 2021

Personally I'd hope that we wait until the CI infrastructure is ready to actually test GPU packages (conda-forge/conda-forge.github.io#1062)... The current approach provides a certain degree of guarantee that packages built like this will work fine, but I'd be less confident if all the components were split.

In addition, at least CuPy is designed to be released for each CUDA version, so the original concern

Do we want to support 7 CUDA versions? This would result in big matrix of builds per feedstock.

is not really solved unless each piece of software changes its packaging expectations. Just my two cents.

@kkraus14 (Contributor)

Agreed. Libraries would need to move towards a path of gracefully handling things at runtime. E.g. CUDA 11.2 introduced cudaMallocAsync; if the driver isn't sufficiently new, it returns a specific error code at runtime when used.

It would protect libraries from failing to load due to missing symbols though.

@isuruf (Member) commented Jan 28, 2021

Isn't nvrtc used by packages using nvcc? I.e. this change wouldn't benefit many packages.

@kkraus14 (Contributor)

Isn't nvrtc used by packages using nvcc? I.e. this change wouldn't benefit many packages.

I don't believe so? If I remember correctly the only thing nvcc links is libcudart. Everything else needs to be explicitly linked.

@isuruf (Member) commented Jan 28, 2021

libcudart also has major.minor SONAME.

@kkraus14 (Contributor) commented Jan 29, 2021

libcudart also has major.minor SONAME.

As of CUDA 11.0 it has stayed at 11.0:

```
(dev) keith@Keith-PC:/usr/local/cuda-11.0/lib64$ readelf -d libcudart.so | grep SONAME
 0x000000000000000e (SONAME)             Library soname: [libcudart.so.11.0]
(dev) keith@Keith-PC:/usr/local/cuda-11.1/lib64$ readelf -d libcudart.so | grep SONAME
 0x000000000000000e (SONAME)             Library soname: [libcudart.so.11.0]
(dev) keith@Keith-PC:/usr/local/cuda-11.2/lib64$ readelf -d libcudart.so | grep SONAME
 0x000000000000000e (SONAME)             Library soname: [libcudart.so.11.0]
```

@jaimergp (Member, Author)

So let's think about how this could be done.

Nvidia maintains official RPMs for CentOS. Can we repackage them right away?

@isuruf (Member) commented Jun 28, 2021

What do you mean by that? We already have a mechanism to download the cudatoolkit and repackage it.

@jaimergp (Member, Author)

I realized we are also packaging Windows here, so we can't automate the RPM->Conda extraction (which was the idea I was hinting at).

In that case, I guess my proposal means "repackage in different outputs following the same split Nvidia uses for CentOS". I haven't checked if these follow the same scheme proposed in the CUDA components page.

Anyway, in short: the question is which scheme we follow:

A. The official one proposed by Nvidia here.
B. The strategy followed on CentOS or any other main distro.

@kkraus14 (Contributor)

@jakirkham it would likely be good for you to chime in here

@jakirkham (Member)

I'd rather we not spend too much time on the current structure of the package, for the same reason Leo already outlined above ( #48 (comment) ).

@h-vetinari (Member)

How do we proceed here?

The wait for the GPU CI infrastructure that Leo mentions is potentially quite long. With 10.2-11.4 we'd be back to building 6 versions for everything (especially painful for packages like pytorch/tensorflow), where we could realistically get away with 1-2 if we build per major version - that would clearly be a big win.

However, perhaps these two efforts can proceed in parallel? We can start migrating for 11.3/11.4 while still pushing forward design & implementation for a major-based split. As soon as the latter becomes available, we could try switching feedstocks.

@isuruf (Member) commented Jun 30, 2021

I don't think we need GPU CI infrastructure to do a split; we can split now. It's a lot to ask maintainers of packages like tensorflow to rebuild when we could do the split and avoid any rebuilds.

@jaimergp (Member, Author) commented Jul 1, 2021

I agree that a split package is the best way forward, but we also need to decide which general strategy we're going to follow for CUDA support, e.g. the last major.minor of each of the last two major releases (10.2 and... 11.4 now?). That conversation can happen in a different issue.

If we do go ahead and decide on a split package, we still need to decide which subpackages will be created. This might involve more work than it looks like, since I don't know if there's a resource listing which files belong to each CUDA component.

Also, we need to account for this:

Starting with CUDA 11, the various components in the toolkit are versioned independently.

This allows us to mix and match components from previous releases as part of the new CUDA "distribution" once we drop 10.2. The question is how to express all of these relationships in the meta.yaml so it works for all versions as it does now.

In short, all of this is a non-trivial amount of work.

@leofang (Member) commented Jul 1, 2021

Mix-and-match of CUDA components is problematic even within patch versions. For example, cuSPARSE added new APIs in 11.3.1 that were not in 11.3.0, which led to hard-to-diagnose bug reports against CuPy (credit goes to @kmaehashi). Splitting is hypothetically doable, but without a GPU CI to test it we are handing a time bomb to the feedstock maintainers, who shouldn't have to worry about this because AFAIK no upstream library tests this setting seriously; at least CuPy doesn't.

@isuruf (Member) commented Jul 1, 2021

Which allows us to mix and match components from previous releases as part of the new CUDA "distribution" once we drop 10.2. Question is how to express all of these relationships in the meta.yaml so it works for all versions as it does now.

I don't get why we need to mix and match components.

Also, we need to account for this:
Starting with CUDA 11, the various components in the toolkit are versioned independently.

Why do we need to take this into account? If we go with the original idea, we'll split the toolkit into two packages.

Question is how to express all of these relationships in the meta.yaml so it works for all versions as it does now.

We'd have two run_exports from cudatoolkit, say cudatoolkit-major (the major-only part) and cudatoolkit-major-minor (the major-minor part).
Then, for packages that don't depend on the major-minor part, we'd add ignore_run_exports: cudatoolkit-major-minor.
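In meta.yaml terms, a rough sketch of the consumer side (the package names follow the placeholders above; this is illustrative, not a tested recipe):

```yaml
# Hypothetical consumer recipe: this package relies only on the CUDA
# major-version ABI guarantee, so it drops the stricter major.minor
# run_export while keeping the major-only constraint.
build:
  ignore_run_exports:
    - cudatoolkit-major-minor   # keep only cudatoolkit-major's pin

requirements:
  host:
    - cudatoolkit               # placeholder name from the discussion above
```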

@jaimergp (Member, Author) commented Jul 1, 2021

Ah, wait, I had misunderstood the proposed split scheme, then. I thought we were going to package cudatoolkit-cudart, cudatoolkit-cublas, cudatoolkit-cufft, etc, and maintainers would pick the parts they need for their package. An overarching cudatoolkit would be a metapackage listing all of the parts needed (and this is where the mix-and-match part could be, but I see how we don't want that).

Instead it seems that the proposal is cudatoolkit-major and cudatoolkit-major-minor so they behave like a configurable run_exports? In that case, is it just a matter of adding these "virtual" outputs?

@isuruf (Member) commented Jul 1, 2021

Instead it seems that the proposal is cudatoolkit-major and cudatoolkit-major-minor so they behave like a configurable run_exports?

Yes

In that case, is it just a matter of adding these "virtual" outputs?

Not exactly. cudatoolkit-major-minor will have the libraries that depend on the major.minor version (currently nvrtc and a couple of others, I think). cudatoolkit-major will have the libraries that depend only on the major version. Then cudatoolkit will be a metapackage.
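A rough meta.yaml sketch of that layout (hypothetical output names, and the library-to-group mapping is illustrative, not exhaustive):

```yaml
outputs:
  # Libraries whose ABI tracks major.minor (e.g. nvrtc)
  - name: cudatoolkit-major-minor
    build:
      run_exports:
        - {{ pin_subpackage('cudatoolkit-major-minor', max_pin='x.x') }}
  # Libraries that only need the major version (e.g. cudart)
  - name: cudatoolkit-major
    build:
      run_exports:
        - {{ pin_subpackage('cudatoolkit-major', max_pin='x') }}
  # Metapackage tying both groups together
  - name: cudatoolkit
    requirements:
      run:
        - {{ pin_subpackage('cudatoolkit-major-minor', exact=True) }}
        - {{ pin_subpackage('cudatoolkit-major', exact=True) }}
```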

@jaimergp (Member, Author) commented Jul 1, 2021

Ok, got it! If we can find a list of the libraries and their corresponding groups, I think I can do this. The split only makes sense for CUDA >= 11, right? Before that there were no ABI guarantees, as I recall?

@isuruf (Member) commented Jul 1, 2021

The split only makes sense for CUDA >= 11, right? Before that there were no ABI guarantees, as I recall?

Yes.

@jakirkham (Member)

For additional context, I'm working with other folks at NVIDIA to improve CUDA Toolkit packaging. One of the things we are doing is splitting the package into various components, so it should be easier to consume what one needs. As this work has already been done internally and we are now focused on testing these packages, I'd like to avoid redoing the same work in conda-forge and instead reuse what has already been done.

@jaimergp (Member, Author) commented Jul 3, 2021

In that case I'd say we wait until the component-based packages are public and we can repackage those directly, possibly grouping them under cudatoolkit-{major,minor} if necessary?

@h-vetinari (Member)

@jakirkham, is there any sort of goal or timeline when the component packages should be available?

@jakirkham (Member)

Maybe this is helpful ( #62 )?

@jaimergp (Member, Author)

x-linking conda-forge/conda-smithy#1494 to prevent DoS-y feedstocks

@jakirkham (Member)

Starting with CUDA 12.0, packages are now split into more granular components ( conda-forge/staged-recipes#21382 )
