Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure Windows builds #2

Merged

Conversation

conda-forge-admin
Copy link
Contributor

@conda-forge-admin conda-forge-admin commented Apr 13, 2023

Fixes #1

Configure Windows builds of cuda-cudart.

xref: conda-forge/staged-recipes#21723 (comment)


Hi! This is the friendly automated conda-forge-webservice.

I've rerendered the recipe as instructed in #1.

Here's a checklist to do before merging.

  • Bump the build number if needed.

@conda-forge-webservices
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jakirkham jakirkham changed the title MNT: rerender [WIP] Configure Windows builds Apr 13, 2023
Configure Windows builds on the feedstock.
@jakirkham
Copy link
Member

@conda-forge-admin, please re-render

@jakirkham jakirkham marked this pull request as draft April 13, 2023 00:26
Configuring to allow the packages to be downloaded from CI for debugging
purposes. This should be reverted once it is no longer needed.
@jakirkham
Copy link
Member

@conda-forge-admin, please re-render

@jakirkham
Copy link
Member

@conda-forge-admin, please re-render

@github-actions
Copy link
Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/cuda-cudart-feedstock/actions/runs/4684345001.

Workaround to ignore Windows linking issues with `cuda-cudart` to get
artifacts uploaded on CI. If a better fix is found, this should be
reverted.
recipe/meta.yaml Outdated Show resolved Hide resolved
@jakirkham
Copy link
Member

jakirkham commented Apr 13, 2023

For posterity attaching the Windows CI log and the artifacts produced on CI.

@adibbley
Copy link
Contributor

@conda-forge-admin, please re-render

conda-forge-webservices[bot] and others added 2 commits August 24, 2023 19:43
Only linux files here
@jakirkham
Copy link
Member

Alex think we need to raise an upstream issue about the missing libraries noted above ( #2 (comment) ). Would you be willing to pick this up?

@adibbley
Copy link
Contributor

adibbley commented Aug 31, 2023

The libraries being installed seem to be sufficient to get nvcc working locally with these outputs. I think we should merge this and enable all nvcc packages in conda-forge/cuda-nvcc-impl-feedstock#5.

If newer libraries are still needed upstream I can follow up on that after.

@adibbley adibbley marked this pull request as ready for review August 31, 2023 18:07
conda-forge.yml Outdated Show resolved Hide resolved
@jakirkham
Copy link
Member

jakirkham commented Aug 31, 2023

Am not sure that is sufficient

This library is linking to libraries outside the VC redist. It will be linked into everything built on Windows with CUDA support. Suspect this will cause issues building other packages down the line

@isuruf do you have any thoughts here? How should we handle these libraries outside the redist? Can we add them to the redist?

@isuruf
Copy link
Member

isuruf commented Sep 1, 2023

I checked and these DLLs are not found anywhere on the windows VM we have. There is a mention of the first DLL in https://learn.microsoft.com/en-us/uwp/win32-and-com/win32-apis. Something about UWP apps calling win32 APIs. The other DLL is not mentioned in the docs at all.

The two DLLs are in my wine installation, so they must be used by some apps.

Is there anybody on the cudart team that you could talk to?

Just ignoring the DLL here would probably result in downstream packages not working.

@jakirkham
Copy link
Member

Thanks Isuru & Alex! 🙏

If that's true, then maybe these libraries would make sense to list in some sort of Windows sysroot package

Though am little confused as at least one of the libraries is in the redist (just with an older version), which suggests cudart requires newer system libraries for some reason

It would help to get some more from the cudart team about what VS version they are using and OS version minimum they are targeting

@adibbley
Copy link
Contributor

adibbley commented Sep 1, 2023

Sounds like it was built with with VS2015 (14.0) and Windows 10 is the minimum OS.

@isuruf
Copy link
Member

isuruf commented Sep 2, 2023

If that's true, then maybe these libraries would make sense to list in some sort of Windows sysroot package

Did you read what I said? This makes no sense because the DLL is not found anywhere in a windows VM I checked.
What's the use of asking me for feedback and then ignoring it?

@jakirkham
Copy link
Member

It was responding to what the cudart team said (quoted by Alex above) as you asked us to talk to them

In any event it sounds like you are disputing the cudart team's statement. Am I understanding that correctly? FWIW it's ok to disagree, am just trying to understand what you are saying

@isuruf
Copy link
Member

isuruf commented Sep 2, 2023

In any event it sounds like you are disputing the cudart team's statement. Am I understanding that correctly?

Yes

@jakirkham
Copy link
Member

Ok, thanks Isuru 🙏

Could you please share how the VM was setup? We can share this info with the cudart team. Maybe that will help clarify the issue

@isuruf
Copy link
Member

isuruf commented Sep 2, 2023

Could you please share how the VM was setup? We can share this info with the cudart team. Maybe that will help clarify the issue

It's the OVH VM that we have access to

@aterrel
Copy link

aterrel commented Oct 5, 2023

@isuruf is this the Windows Server 2022 edition VM? I'm spinning up a node on OVH to test now.

@jakirkham
Copy link
Member

Andy & I paired up and tried looking at this issue on the lowest tier OVH machine. We tried to build the package there, but were running into issues with memory. So were not able to progress very far. Also we tried looking for the libraries listed above as well as ones that were in the redist, but actually were not finding any libraries (including ones from the redist on the machine).

Andy has a local Windows machine on Windows 11 that he is going to start looking at. Have offered to help however I can there.

We searched around on the web as well. Andy found this link, which mentions api-ms-win-core-libraryloader-l1-2-0 is intended for support on older Windows systems. It does suggest that is should be in the Microsoft Visual C++ Redistributable for Visual Studio 2019. So maybe we can add this to the redist.

It is much harder to find info about api-ms-win-security-systemfunctions-l1-1-0.dll, which is the other missing DSO above. Though did find a thread in MinGW about using this library. AFAICT this library is primarily on older Windows systems and used for compatibility with them. So it may make sense that it cannot be found on newer Windows OSes.

One thing we might consider is going ahead and building this package so we can get to cuda-nvcc where we have bit more of the testing pieces (like running CMake’s test suite and building sample programs). That might make it easier to see what issues are encountered on Azure (if any).

Must admit Windows is not my strong suit. So this is probably as far as I can go.

@aterrel
Copy link

aterrel commented Oct 6, 2023

I was able to get a Windows 11 machine to build this feedstock (after commenting out the missing_dso_whitelist lines). See the attached log.

condart-feedstock-log.txt

Additionally this library api-ms-win-core-libraryloader-l1-2-0.dll does not appear to be on my Windows 11 box either.

I hope this allays @isuruf fears that accepting this patch would break all the other cuda dependencies.

@isuruf
Copy link
Member

isuruf commented Oct 7, 2023

No, it doesn't. If you do conda build recipe --error-overlinking, conda-build would error.

I'm not sure why you are saying that things would work when you clearly say that the dependent DLL does not appear anywhere on your windows box.

@aterrel
Copy link

aterrel commented Oct 7, 2023

@isuruf thanks for the command to check. I ran it this morning and it throws the error.

Our hypothesis was that this dll is only needed for older systems and there wouldn't be a linking. @jakirkham and I are not windows experts so we were trying to go through the process to validate the competing hypothesis in this thread.

I've attached the log for folks to see.
condart-feedstock-log-with-error-overlinking.txt

@aterrel
Copy link

aterrel commented Oct 7, 2023

Okay it looks like these files are part of the Visual C++ Redistibution Package [0]. It looks like conda-forge already redistibutes some of win-core dlls, might be able to include those as well, e.g.

NVIDIA.COM+aterrel@NV-BGPW1N3 MINGW64 ~
$ find /c -name api-ms-win-core-libraryloader-l1-1-0.dll 2>/dev/null
/c/Program Files/Dell/DTP/IPDT/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Program Files/dotnet/shared/Microsoft.NETCore.App/6.0.22/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Program Files/Microsoft OneDrive/23.194.0917.0001/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Program Files (x86)/Microsoft Azure Information Protection/x64/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Program Files (x86)/Microsoft Azure Information Protection/x86/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/conda-bld/cuda-cudart-split_1696692242872/_h_env/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/conda-bld/cuda-cudart-split_1696692242872/_h_env/Library/bin/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/envs/nvidia/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/envs/nvidia/Library/bin/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/envs/rapids/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/envs/rapids/Library/bin/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/Library/bin/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/pkgs/libsolv-0.7.25-h12be248_0/Library/bin/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/pkgs/ucrt-10.0.22621.0-h57928b3_0/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/mambaforge/pkgs/ucrt-10.0.22621.0-h57928b3_0/Library/bin/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/Microsoft/Teams/current/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/Microsoft/Teams/previous/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/miniconda3/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/miniconda3/Library/bin/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/miniconda3/pkgs/vs2015_runtime-14.27.29016-h5e58377_2/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/miniconda3/pkgs/vs2015_runtime-14.27.29016-h5e58377_2/Library/bin/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/slack/app-4.34.119/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Local/slack/app-4.34.121/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Users/aterrel/AppData/Roaming/Zoom/bin/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Windows/System32/downlevel/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Windows/SysWOW64/downlevel/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Windows/WinSxS/amd64_microsoft-windows-m..namespace-downlevel_31bf3856ad364e35_10.0.22621.1_none_28faa66474d87e1a/api-ms-win-core-libraryloader-l1-1-0.dll
/c/Windows/WinSxS/wow64_microsoft-windows-m..namespace-downlevel_31bf3856ad364e35_10.0.22621.1_none_334f50b6a9394015/api-ms-win-core-libraryloader-l1-1-0.dll

NVIDIA.COM+aterrel@NV-BGPW1N3 MINGW64 ~
$ find /c -name api-ms-win-core-libraryloader-l1-2-0.dll 2>/dev/null

[0] https://learn.microsoft.com/en-us/cpp/windows/redistributing-visual-cpp-files?view=msvc-170

@isuruf
Copy link
Member

isuruf commented Oct 7, 2023

DLLs starting with api- are special. See https://learn.microsoft.com/en-us/windows/win32/apiindex/windows-apisets. Basically these are DLLs that redirects symbols in the core OS components. When linking against the core OS libraries, the downstream packages are linked against these API sets so that the actual definition of these files can be changed from the different core OS components like kernel32.dll, advapi32.dll, etc.

I assume cudart was compiled with MinGW-w64 and MinGW-w64 emulates this mechanism as well. api-ms-win-core-libraryloader-l1-2-0.dll is an API set used for windows apps and is not present in regular x86_64 apps. MinGW-w64 however redirects symbols like LoadLibraryA to this API set DLL present in windows apps by mistake (I assume. Not 100% sure). So, technically a libcudart_64.dll should fail to load since the API set is not found. However the windows loader redirects all unknown API sets to kernelbase.dll as indicated by lucasg/Dependencies@3edc2a4 and therefore works fine.

Therefore it's fine to ignore these two API sets.

@aterrel
Copy link

aterrel commented Oct 7, 2023

Thanks for the heads up! Definitely learned something new there. We will get it merged on Monday.

@jakirkham
Copy link
Member

@leofang recently pointed out offline that some CUDA 11 Windows libraries are also linking to the same API DLLs. Indeed those seem to be loading without issues. Though it was still unclear why that worked. Your explanation helps makes sense of that. Thanks Isuru! 🙏

@leofang
Copy link
Member

leofang commented Oct 8, 2023

Yes, I was puzzled when I inspected the old cudatoolkit build log (ex: here) and saw that cudart has been linked to the two DLLs in question since CUDA 11.7. Glad to know what's protecting us!

@jakirkham
Copy link
Member

@conda-forge-admin, please re-render

@jakirkham jakirkham changed the title [WIP] Configure Windows builds Configure Windows builds Oct 9, 2023
Co-authored-by: adibbley <103537006+adibbley@users.noreply.github.com>
@jakirkham
Copy link
Member

@conda-forge-admin, please re-render

@jakirkham jakirkham added the automerge Merge the PR when CI passes label Oct 9, 2023
@github-actions github-actions bot merged commit d6fd097 into conda-forge:main Oct 9, 2023
5 checks passed
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: passed
  • travis: passed
  • azure: passed

Thus the PR was passing and merged! Have a great day!

@@ -25,13 +25,16 @@ source:

build:
number: 6
skip: true # [osx or win]
skip: true # [osx]

outputs:
- name: cuda-cudart_{{ target_platform }}
build:
noarch: generic
binary_relocation: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we know it was patchelf causing issues, perhaps it's time to remove this workaround?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the patchelf bug is not fixed: NixOS/patchelf#492

So no don't think we should be removing that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not fixed (I am fully aware), but we marked 0.18.0 as broken (you did it 🙂) and there's no 0.18.1 yet. I'd assume we are safe for now, and be extra cautious when a new patchelf is out.

We can even go further and pin patchelf globally, which I was surprised to see not done yet. We should consider it given its high impact.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would rather not open us to unnecessary risk with patchelf. This keeps us protected

We explored that before ( conda-forge/conda-smithy#1737 ), but there wasn't a good option to pursue ( conda-forge/conda-smithy#1737 (comment) )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to add patchelf to conda-forge-pinning-feedstock, not conda-smithy. Is it not an option (and why)?

Copy link
Member

@jakirkham jakirkham Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear on how that would help. conda-forge-pinning only affects dependencies in the requirements of recipes themselves

Not build tools like patchelf that are installed in base (before the recipe is seen). This would require changes to conda-smithy to work

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, John, I see. I think this is where it confused me.

I know for a fact that installing patchelf to host could help ignore the one installed by conda-smithy. But it requires the recipe maintainer to manually add patchelf to host, which is something awkward that we don't want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And there would still be a patchelf installed in base that conda-build would use. IOW the problem would still be there

@jakirkham
Copy link
Member

Next steps will be to update the cuda-nvcc* feedstocks:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Merge the PR when CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable Windows
6 participants