
Split recipe in components #48

Closed
jaimergp opened this issue Jan 28, 2021 · 27 comments

@jaimergp (Member)

I'm beginning to wonder if it's OK to always use the latest 11.x to build for all 11.y with y <= x. I think CUDA 11 provides a strong ABI-compatibility guarantee?

nvRTC is the exception to this rule. If we split the cudatoolkit feedstock up into separate libraries and update recipes accordingly then we absolutely could for most of the libraries.

Originally posted by @kkraus14 in conda-forge/conda-forge-pinning-feedstock#1162 (comment)


Maybe we can create separate packages for each component, and leave cudatoolkit as a metapackage. This could also include #27.

@leofang (Member) commented Jan 28, 2021

Related: conda-forge/nvcc-feedstock#35

@leofang (Member) commented Jan 28, 2021

Personally I'd hope that we wait until the CI infrastructure is ready to actually test GPU packages (conda-forge/conda-forge.github.io#1062)... The current approach provides a certain degree of guarantee that packages built like this will work fine, but I'd be less confident if all the components were split.

In addition, at least CuPy is designed to be released for each CUDA version, so the original concern

Do we want to support 7 CUDA versions? This would result in big matrix of builds per feedstock.

is not really solved unless each piece of software changes its packaging expectations. Just my two cents.

@kkraus14 (Contributor)

Agreed. Libraries would need to move towards a path of gracefully handling things at runtime. E.g. CUDA 11.2 introduced cudaMallocAsync; if the driver isn't sufficiently new, it returns a specific error code at runtime when used.

It would protect libraries from failing to load due to missing symbols though.

@isuruf (Member) commented Jan 28, 2021

Isn't nvrtc used by packages using nvcc? I.e. this change wouldn't benefit many packages.

@kkraus14 (Contributor)

Isn't nvrtc used by packages using nvcc? I.e. this change wouldn't benefit many packages.

I don't believe so? If I remember correctly the only thing nvcc links is libcudart. Everything else needs to be explicitly linked.

@isuruf (Member) commented Jan 28, 2021

libcudart also has major.minor SONAME.

@kkraus14 (Contributor) commented Jan 29, 2021

libcudart also has major.minor SONAME.

As of CUDA 11.0 it has stayed at 11.0:

```
(dev) keith@Keith-PC:/usr/local/cuda-11.0/lib64$ readelf -d libcudart.so | grep SONAME
 0x000000000000000e (SONAME)             Library soname: [libcudart.so.11.0]
(dev) keith@Keith-PC:/usr/local/cuda-11.1/lib64$ readelf -d libcudart.so | grep SONAME
 0x000000000000000e (SONAME)             Library soname: [libcudart.so.11.0]
(dev) keith@Keith-PC:/usr/local/cuda-11.2/lib64$ readelf -d libcudart.so | grep SONAME
 0x000000000000000e (SONAME)             Library soname: [libcudart.so.11.0]
```

@jaimergp (Member, Author)

So let's think about how this could be done.

Nvidia maintains official RPMs for CentOS. Can we repackage them right away?

@isuruf (Member) commented Jun 28, 2021

What do you mean by that? We already have a mechanism to download the cudatoolkit and repackage it.

@jaimergp (Member, Author)

I realized we are also packaging Windows here, so we can't automate the RPM->Conda extraction (which was the idea I was hinting at).

In that case, I guess my proposal means "repackage in different outputs following the same split Nvidia uses for CentOS". I haven't checked if these follow the same scheme proposed in the CUDA components page.

Anyway, in short: the question is which scheme we follow:

A. The official one proposed by Nvidia here.
B. The strategy followed on CentOS or any other main distro.

@kkraus14 (Contributor)

@jakirkham it would likely be good for you to chime in here

@jakirkham (Member)

I'd rather we not spend too much time on the current structure of the package, for the same reason Leo already outlined above ( #48 (comment) ).

@h-vetinari (Member)

How do we proceed here?

The wait for the GPU CI infrastructure that Leo mentions is potentially quite long. With 10.2-11.4 we'd be back to building 6 versions for everything (especially painful for packages like pytorch/tensorflow), where we could realistically get away with 1-2 if we build per major version - that would clearly be a big win.

However, perhaps these two efforts can proceed in parallel? We can start migrating for 11.3/11.4 while still pushing forward design & implementation for a major-based split. As soon as the latter becomes available, we could try switching feedstocks.

@isuruf (Member) commented Jun 30, 2021

I don't think we need GPU CI infrastructure to do a split; we can split now. It's a lot to ask maintainers of packages like tensorflow to rebuild when we could do the split and avoid any rebuilds.

@jaimergp (Member, Author) commented Jul 1, 2021

I agree that a split package is the best way forward, but we also need to decide which general strategy we're going to follow for CUDA support, e.g. the last major.minor of each of the last two major releases (10.2 and... 11.4 now?). That conversation can happen in a different issue.

If we do go ahead and decide on a split package, we still need to decide which subpackages will be created. This might involve more work than it looks like, since I don't know if there's a resource listing which files belong to each CUDA component.

Also, we need to account for this:

Starting with CUDA 11, the various components in the toolkit are versioned independently.

This allows us to mix and match components from previous releases as part of the new CUDA "distribution" once we drop 10.2. The question is how to express all of these relationships in the meta.yaml so it works for all versions as it does now.

In short, all of this is a non-trivial amount of work.

@leofang (Member) commented Jul 1, 2021

Mix-and-match of CUDA components is problematic even within patch versions. For example, cuSPARSE added new APIs in 11.3.1 that were not in 11.3.0, which led to hard-to-diagnose bug reports against CuPy (credit goes to @kmaehashi). Splitting is hypothetically doable, but without a GPU CI to test it we are handing a time bomb to the feedstock maintainers, who shouldn't have to worry about this because AFAIK no upstream library tests this setting seriously; at least CuPy doesn't.

@isuruf (Member) commented Jul 1, 2021

Which allows us to mix and match components from previous releases as part of the new CUDA "distribution" once we drop 10.2. Question is how to express all of these relationships in the meta.yaml so it works for all versions as it does now.

I don't get why we need to mix and match components.

Also, we need to account for this:
Starting with CUDA 11, the various components in the toolkit are versioned independently.

Why do we need to take this into account? If we go with the original idea, we'll split the toolkit into two packages.

Question is how to express all of these relationships in the meta.yaml so it works for all versions as it does now.

We'd have two run_exports from cudatoolkit, say cudatoolkit-major (the major-only part) and cudatoolkit-major-minor (the major-minor part).
Then, for packages that don't depend on the major-minor part, we'd add ignore_run_exports: cudatoolkit-major-minor.
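In meta.yaml terms, a rough sketch of the consumer side (the package names follow the placeholders above; this is illustrative, not a tested recipe):

```yaml
# Hypothetical consumer recipe: this package relies only on the CUDA
# major-version ABI guarantee, so it drops the stricter major.minor
# run_export while keeping the major-only constraint.
build:
  ignore_run_exports:
    - cudatoolkit-major-minor   # keep only cudatoolkit-major's pin

requirements:
  host:
    - cudatoolkit               # placeholder name from the discussion above
```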

@jaimergp (Member, Author) commented Jul 1, 2021

Ah, wait, I had misunderstood the proposed split scheme, then. I thought we were going to package cudatoolkit-cudart, cudatoolkit-cublas, cudatoolkit-cufft, etc, and maintainers would pick the parts they need for their package. An overarching cudatoolkit would be a metapackage listing all of the parts needed (and this is where the mix-and-match part could be, but I see how we don't want that).

Instead it seems that the proposal is cudatoolkit-major and cudatoolkit-major-minor so they behave like a configurable run_exports? In that case, is it just a matter of adding these "virtual" outputs?

@isuruf (Member) commented Jul 1, 2021

Instead it seems that the proposal is cudatoolkit-major and cudatoolkit-major-minor so they behave like a configurable run_exports?

Yes

In that case, is it just a matter of adding these "virtual" outputs?

Not exactly. cudatoolkit-major-minor will have the libraries that depend on the major.minor version (currently nvrtc and a couple of others, I think). cudatoolkit-major will have the libraries that depend only on the major version. Then cudatoolkit will be a metapackage.
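A rough meta.yaml sketch of that layout (hypothetical output names, and the library-to-group mapping is illustrative, not exhaustive):

```yaml
outputs:
  # Libraries whose ABI tracks major.minor (e.g. nvrtc)
  - name: cudatoolkit-major-minor
    build:
      run_exports:
        - {{ pin_subpackage('cudatoolkit-major-minor', max_pin='x.x') }}
  # Libraries that only need the major version (e.g. cudart)
  - name: cudatoolkit-major
    build:
      run_exports:
        - {{ pin_subpackage('cudatoolkit-major', max_pin='x') }}
  # Metapackage tying both groups together
  - name: cudatoolkit
    requirements:
      run:
        - {{ pin_subpackage('cudatoolkit-major-minor', exact=True) }}
        - {{ pin_subpackage('cudatoolkit-major', exact=True) }}
```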

@jaimergp (Member, Author) commented Jul 1, 2021

Ok, got it! If we can find a list of the libraries and their corresponding groups, I think I can do this. The split only makes sense for CUDA >= 11, right? Before that there were no ABI guarantees, as I recall?

@isuruf (Member) commented Jul 1, 2021

The split only makes sense for CUDA >= 11, right? Before that there were no ABI guarantees, as I recall?

Yes.

@jakirkham (Member)

For additional context, I'm working with other folks at NVIDIA to improve CUDA Toolkit packaging. One of the things we are doing is splitting the package into various components, so it should be easier to consume what one needs. As this work has already been done internally and we are now focused on testing these packages, I'd like to avoid redoing the same work in conda-forge and instead reuse what has already been done.

@jaimergp (Member, Author) commented Jul 3, 2021

In that case I'd say we wait until the component-based packages are public and we can repackage those directly, possibly grouping them under cudatoolkit-{major,minor} if necessary?

@h-vetinari (Member)

@jakirkham, is there any sort of goal or timeline when the component packages should be available?

@jakirkham (Member)

Maybe this is helpful ( #62 )?

@jaimergp (Member, Author)

x-linking conda-forge/conda-smithy#1494 to prevent DoS-y feedstocks

@jakirkham (Member)

Starting with CUDA 12.0, packages are now split into more granular components ( conda-forge/staged-recipes#21382 )
