Restart CUDA 12 migration with CUDA arch builds #4799

Merged
jakirkham merged 3 commits into conda-forge:main from jakirkham:start_cuda12_arch on Aug 23, 2023

Conversation

jakirkham (Member)

As all CUDA Linux builds are now on equal footing, fold the CUDA 12 arch migration into the CUDA 12 migration and restart it. That way the latest migrator can be retrieved.
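For context, a minimal sketch of what the combined migrator file might look like after this change. The cuda120.yaml filename and the migration_number bump from 1 to 2 are taken from this PR; the path and every other key/value shown are illustrative assumptions, not the actual file contents:

$ cat recipe/migrations/cuda120.yaml   # path assumed
migrator_ts: ...                       # left elided; unchanged by this PR (see discussion below)
__migrator:
  kind: version                        # assumed
  migration_number:
    2                                  # bumped from 1 to restart the migration
cuda_compiler:                         # assumed key/value
  - cuda-nvcc
cuda_compiler_version:                 # assumed key/value
  - "12.0"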


Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

jakirkham requested a review from a team as a code owner on August 17, 2023 00:34
conda-forge-webservices (Contributor)

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

h-vetinari (Member)

Isn't this still dependent on having had the arch migration on that feedstock first? Or have all CUDA feedstocks passed that migration already?

jakirkham (Member, Author)

During the core meeting on July 26, 2023, we discussed how CUDA arch libraries make use of a newer GLIBC. In particular, some libraries use symbols from GLIBC versions newer than 2.17. However, GLIBC 2.28 support in conda-forge is still under discussion ( conda-forge/conda-forge.github.io#1941 )

That said, one of the points brought up in the meeting is that, from a conda-forge build perspective, we are mainly interested in the GLIBC symbol usage of static libraries that get linked into builds. With CUDA, this is generally libcudart. So a takeaway from that meeting was to investigate the symbol usage of libcudart and report back on which GLIBC versions were found

Have since done this and reported it in the last conda-forge meeting. Only saw GLIBC 2.17 symbols in use. Am including the results below for completeness:

$ conda create -n cuda cuda-cudart-dev -y
Channels:                                                                       
 - conda-forge
Platform: linux-aarch64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/cuda

  added / updated specs:
    - cuda-cudart-dev


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _openmp_mutex-4.5          |            2_gnu          23 KB  conda-forge
    cuda-cccl_linux-aarch64-12.0.90|       h579c4fd_1         1.1 MB  conda-forge
    cuda-cudart-12.0.107       |       hac28a21_6          21 KB  conda-forge
    cuda-cudart-dev-12.0.107   |       hac28a21_6          22 KB  conda-forge
    cuda-cudart-dev_linux-aarch64-12.0.107|       hac28a21_6         317 KB  conda-forge
    cuda-cudart-static-12.0.107|       hac28a21_6          22 KB  conda-forge
    cuda-cudart-static_linux-aarch64-12.0.107|       hac28a21_6         641 KB  conda-forge
    cuda-cudart_linux-aarch64-12.0.107|       hac28a21_6         180 KB  conda-forge
    cuda-version-12.0          |       hffde075_2          20 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.3 MB

The following NEW packages will be INSTALLED:

  _openmp_mutex      conda-forge/linux-aarch64::_openmp_mutex-4.5-2_gnu 
  cuda-cccl_linux-a~ conda-forge/noarch::cuda-cccl_linux-aarch64-12.0.90-h579c4fd_1 
  cuda-cudart        conda-forge/linux-aarch64::cuda-cudart-12.0.107-hac28a21_6 
  cuda-cudart-dev    conda-forge/linux-aarch64::cuda-cudart-dev-12.0.107-hac28a21_6 
  cuda-cudart-dev_l~ conda-forge/noarch::cuda-cudart-dev_linux-aarch64-12.0.107-hac28a21_6 
  cuda-cudart-static conda-forge/linux-aarch64::cuda-cudart-static-12.0.107-hac28a21_6 
  cuda-cudart-stati~ conda-forge/noarch::cuda-cudart-static_linux-aarch64-12.0.107-hac28a21_6 
  cuda-cudart_linux~ conda-forge/noarch::cuda-cudart_linux-aarch64-12.0.107-hac28a21_6 
  cuda-version       conda-forge/noarch::cuda-version-12.0-hffde075_2 
  libgcc-ng          conda-forge/linux-aarch64::libgcc-ng-13.1.0-h2b4548d_0 
  libgomp            conda-forge/linux-aarch64::libgomp-13.1.0-h2b4548d_0 
  libstdcxx-ng       conda-forge/linux-aarch64::libstdcxx-ng-13.1.0-h452befe_0 



Downloading and Extracting Packages
                                                                                
Preparing transaction: done                                                     
Verifying transaction: done                                                     
Executing transaction: done                                                     
#                                                                               
# To activate this environment, use                                             
#                                                                               
#     $ conda activate cuda                                                     
#                                                                               
# To deactivate an active environment, use
#
#     $ conda deactivate

$ strings /opt/conda/envs/cuda/targets/sbsa-linux/lib/libcudart_static.a | grep GLIBC
GLIBC_2.17
GLIBC_2.17
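
(As a hedged aside, a stricter variant of the same check: inspecting the archive's symbol table rather than raw strings, assuming the GLIBC versions are recorded as versioned symbol references. nm is the binutils tool; -u lists only undefined, i.e. externally referenced, symbols.)

$ # list the unique GLIBC symbol versions referenced by the static archive
$ nm -u /opt/conda/envs/cuda/targets/sbsa-linux/lib/libcudart_static.a | grep -o 'GLIBC_[0-9.]*' | sort -u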

jakirkham (Member, Author) commented Aug 17, 2023

> Isn't this still dependent on having had the arch migration on that feedstock first? Or have all CUDA feedstocks passed that migration already?

If the arch migration has not occurred on a feedstock (either via the bot or a maintainer manually enabling those builds), then the CUDA 12 migrator will not add those jobs

If the arch migration has run (or is run later), then the CUDA 12 arch builds will be added

IOW they are orthogonal

Edit: Here's an example with a feedstock where arch builds are disabled, the CUDA 12 migration is run, and then arch builds are enabled ( conda-forge/nccl-feedstock#93 ). Note CUDA 12 arch builds are not added until arch builds are enabled
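
(For reference, arch builds are typically enabled by adding providers for the extra platforms in the feedstock's conda-forge.yml, roughly as sketched below; which providers a given feedstock uses varies:)

$ cat conda-forge.yml    # excerpt, illustrative only
provider:
  linux_aarch64: default
  linux_ppc64le: default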

jakirkham (Member, Author)

@conda-forge/core, would be interested to hear your thoughts on this

Comment on lines 5 to +6:

  migration_number:
-    1
+    2
jakirkham (Member, Author)

Need to double-check this doesn't drop the migrator in already-migrated feedstocks

jakirkham (Member, Author)

IIUC this code path in conda-smithy continues to use the migrator already in a feedstock when a new migration_number occurs. IOW it will wait until the migrator bot adds the new migration file (or a user adds it manually)

Removal doesn't occur as long as migrator_ts is the same, which is the case here
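
(As a hedged illustration of that invariant: the two keys driving the keep/remove decision can be compared between the feedstock's local copy of the migration file and the one in the pinning repo; the grep assumes the usual file layout.)

$ grep -E 'migrator_ts|migration_number' .ci_support/migrations/cuda120.yaml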

jakirkham (Member, Author) commented Aug 23, 2023

Also, IIUC the migrators used by conda-smithy can be overridden by passing --exclusive-config-file during rerender. Tried this with the ucx-split-feedstock using the latest changes and found no difference, so it looks like this is not an issue

Edit: Should add that one can see this migration file being used from log lines like the following, which point to the local copy of the migrator:

INFO:conda_smithy.configure_feedstock:Applying migrations: /Users/jkirkham/Developer/conda-forge/feedstocks/feedstocks/ucx-split/.ci_support/migrations/cuda_112_ppc64le_aarch64.yaml,/Users/jkirkham/Developer/conda-forge/feedstocks/feedstocks/ucx-split/.ci_support/migrations/cuda120.yaml
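
(Roughly the command used for that test, as a hedged sketch; the relative path to the local pinning checkout is illustrative:)

$ cd ucx-split-feedstock
$ conda smithy rerender --exclusive-config-file ../conda-forge-pinning-feedstock/recipe/conda_build_config.yaml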

jakirkham (Member, Author)

Have also done a few re-renders myself and via the bot after this was merged, and everything seems to be working correctly

Since `cudnn` is no longer in `zip_keys` with `cuda_compiler_version`
and the version of `cudnn` used by CUDA 12 is the same as CUDA 11, go
ahead and drop `cudnn` from the CUDA 12 migrator. It is not actually
needed or used here.
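
(As a hedged sanity check of that premise against the global pinning; the path and zip_keys layout are assumed, and per the commit message the grep should return nothing:)

$ # confirm cudnn is no longer listed under zip_keys with cuda_compiler_version
$ grep -A40 '^zip_keys:' recipe/conda_build_config.yaml | grep 'cudnn'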
jakirkham (Member, Author)

Based on discussion in the conda-forge meeting today, it sounds like we are ok going ahead with this.

Only item was to check on the behavior of migration_number, which was done above (and it behaves as intended).

Given this, going to go ahead and get this started. Happy to follow up on anything else as needed. Thanks all! 🙏

jakirkham merged commit cbaccc9 into conda-forge:main on Aug 23, 2023
2 checks passed
jakirkham deleted the start_cuda12_arch branch on August 23, 2023 21:58
jakirkham (Member, Author)

Posted an update about this in the conda-forge CUDA 12 bringup issue: conda-forge/conda-forge.github.io#1963 (comment)
