
Runtime detection MacOS #258

Merged: 24 commits, Nov 12, 2023
Conversation

SignalRT
Collaborator

@SignalRT SignalRT commented Nov 6, 2023

  • Add MacOS x86_64 to the build
  • Copy the different runtimes to the output folder
  • Dynamically load the right MacOS library
  • Tested on MacOS Arm64
  • Tested on MacOS x86_64
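The "dynamically load the right MacOS library" step can be sketched like this (a Python illustration of the selection logic only; the directory names are assumptions, not necessarily LLamaSharp's exact layout):

```python
import platform

# Sketch of the dynamic selection logic this PR describes: pick the native
# llama library matching the current macOS architecture. The directory
# layout below is illustrative, not LLamaSharp's exact paths.
def select_macos_library(machine: str = "") -> str:
    machine = machine or platform.machine()
    if machine == "arm64":
        return "runtimes/macos-arm64/libllama.dylib"
    if machine == "x86_64":
        return "runtimes/macos-x86_64/libllama.dylib"
    raise RuntimeError(f"unsupported macOS architecture: {machine}")
```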

@SignalRT SignalRT mentioned this pull request Nov 6, 2023
@martindevans
Member

I guess https://github.com/SciSharp/LLamaSharp/blob/6334f25627d4a5903e14345fb50fd4cf03389cef/LLama/runtimes/build/LLamaSharp.Backend.Cpu.nuspec# probably needs updating to reference this new binary?

Also, does the MacMetal nuspec still need to exist? I think we dropped that as a specific backend a while ago?

@SignalRT
Collaborator Author

SignalRT commented Nov 6, 2023

@martindevans, this is related to Issue #251. I reorganized the binaries into directories (only the MacOS binaries), changed the dynamic detection, changed the build to include Intel binaries, and updated the MacOS binaries to run a test in my branch (https://github.com/SignalRT/LLamaSharp/actions/runs/6777594696), where the Mac tests on the Intel platform are included.

There is no work on Windows / Linux binaries.

@martindevans
Member

What I was meaning about the nuspec is at the moment it does this:

<file src="runtimes/libllama.dylib" target="runtimes\osx-x64\native\libllama.dylib" />
<file src="runtimes/ggml-metal.metal" target="runtimes\osx-x64\native\ggml-metal.metal" />
<file src="runtimes/libllama.dylib" target="runtimes\osx-arm64\native\libllama.dylib" />
<file src="runtimes/ggml-metal.metal" target="runtimes\osx-arm64\native\ggml-metal.metal" />

i.e. the same files on ARM64 and x64.

Should that be changed to something like this:

<file src="runtimes/macos-x86_64/libllama.dylib" target="runtimes\macos-x86_64\native\libllama.dylib" />
<file src="runtimes/macos-arm64/libllama.dylib" target="runtimes\macos-arm64\native\libllama.dylib" />
<file src="runtimes/macos-arm64/ggml-metal.metal" target="runtimes\macos-arm64\native\ggml-metal.metal" />

Note: I haven't worked with nuspec before, so I'm not sure that's correct!

@SignalRT
Collaborator Author

SignalRT commented Nov 7, 2023

@martindevans, @AsakusaRinne I did not change any NuGet package because there are several possible directions for these packages. With the dynamic library load it could be only "one" NuGet package, as the most extreme solution, or a NuGet package per OS version, etc.

Talking about the Mac use case only, to limit the options: the Intel build supports CPU only, while the ARM build supports, with the same binary, both CPU and GPU (configured via the number of GPU layers). So it could make sense to put all of this into only ONE NuGet package. But that package would be neither CPU-only nor Metal-only; it would just be a MacOS package that uses the best available option.

Should we keep the current packages, or think about another NuGet package distribution (by OS, for example)?

Once it is clear whether you want to keep the current NuGet packages or group them into another layout, I can make the changes.

@AsakusaRinne
Collaborator

AsakusaRinne commented Nov 7, 2023

@SignalRT Thank you a lot for this great work! Each of these options has its own advantages and disadvantages, and we need to make a choice among them.

In my opinion, the following options are ranked from high to low priority:

  1. Keep all binaries in one package, but allow using a self-compiled binary without installing the backend.
  2. Split binaries by OS.
  3. Keep the current distribution.

However, my view may be limited, so I'll explain my reasons, and I'm open to suggestions and different voices.

The binaries are not large (the sum of all of them is less than 20MB), so they won't be a burden in most cases. Besides, "the backend is not installed" is one of the most frequently asked questions, which makes me think stacking all binaries in one package may be a good choice. However, we should also support the case where users want to compile the binaries themselves, or just don't want to keep all of them.

There's an even more aggressive, but not bad, option: keep all binaries along with our main library in the LLamaSharp package, and instead name the one without binaries LLamaSharp.Lite (for those who want to run on mobile, for example).

Splitting by OS is a good compromise. To be honest, I'm quite hesitant between the first two options. The risk comes from the users: I think the gap of moving from packages split by computation ability to packages split by OS is larger than that of moving to one package.

The last point relates to the risk of options 1 and 2. Changing the package distribution has so many effects, especially on other libraries that depend on us, that I'm wondering if we should bump the major version to 1.0.0. But is this already a good moment to publish 1.0.0? I'm not sure either. Keeping the current distribution has only one advantage: we can make the major version change when everything is ready in the future.

cc @martindevans @saddam213 @xbotter @Oceania2018 @sagilio I'd also like to hear from you, please.

@martindevans
Member

One option we could consider would be the extreme version SignalRT mentioned, where each package has exactly one DLL (e.g. LLamaSharp.Backend.CPU.AVX512, LLamaSharp.Backend.CUDA12.AVX2, etc.), but then we distribute "meta" packages which simply depend on those other packages. For example, LLamaSharp.Backend.CPU would depend on every single CPU package for all platforms (relying on runtime feature detection to pick the right one). That way users can pick a very specific package if they want, but the default would look a lot like it does today: either install Backend.CPU or Backend.CUDA_VERSION and it all just works.

My concern with this approach is that some users might try to install various different specific packages to debug DLL dependency issues and end up with a bit of a mess.


For this immediate release I would vote for keeping it simple: add the MacOS binaries for both architectures into the Backend.CPU package (since that's the "default" backend, even if the name isn't quite right given that MacOS+ARM64 includes Metal).

@SignalRT
Collaborator Author

SignalRT commented Nov 7, 2023

@martindevans When I talk about the extreme solution, I mean having only ONE package (option 1 that @AsakusaRinne refers to).

It's OK for me to move my changes into the Backend.CPU package. In that case I think the Metal package should disappear.

I will try to make the changes today.

@martindevans
Member

Oh sorry, I misunderstood what you meant.

When I was talking about "extreme" I meant we would have:

  • LLamaSharp.Backend.CPU, which depends on:
    • LLamaSharp.Backend.Windows-x64-AVX
    • LLamaSharp.Backend.Windows-x64-AVX2
    • LLamaSharp.Backend.Windows-x64-AVX512
    • LLamaSharp.Backend.Linux-x64-AVX
    • etc...
  • LLamaSharp.Backend.CUDA12, which depends on:
    • LLamaSharp.Backend.Windows-x64-CUDA12-AVX
    • LLamaSharp.Backend.Windows-x64-CUDA12-AVX2
    • LLamaSharp.Backend.Windows-x64-CUDA12-AVX512
    • LLamaSharp.Backend.Linux-x64-CUDA12-AVX
    • etc...

So we would be shipping a lot of packages. Users would have the ability to pick one exactly specific backend if they want (by just installing one single package). But they could also just install the LLamaSharp.Backend.CPU to get the "best" option for them.

I'm not actually sure if I like this option, to be honest; I'm just presenting it as another option to consider!
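For illustration, the meta-package layout described above might look roughly like this in a nuspec (a sketch only: the package ids are the hypothetical ones from the list above, and I haven't validated this against NuGet):

```xml
<!-- Hypothetical LLamaSharp.Backend.CPU.nuspec fragment: a meta-package
     with no files of its own, depending on one package per native binary. -->
<metadata>
  <id>LLamaSharp.Backend.CPU</id>
  <dependencies>
    <dependency id="LLamaSharp.Backend.Windows-x64-AVX" version="$version$" />
    <dependency id="LLamaSharp.Backend.Windows-x64-AVX2" version="$version$" />
    <dependency id="LLamaSharp.Backend.Windows-x64-AVX512" version="$version$" />
    <dependency id="LLamaSharp.Backend.Linux-x64-AVX" version="$version$" />
    <!-- etc. -->
  </dependencies>
</metadata>
```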

Delete Backend.Metal because it is not needed anymore.
Do not include .metal in x86_64 binaries
@SignalRT
Collaborator Author

SignalRT commented Nov 7, 2023

Eliminated Backend.Metal.
Included all the MacOS binaries and the metal file in Backend.CPU.

@AsakusaRinne
Collaborator

> Oh sorry, I misunderstood what you meant. When I was talking about "extreme" I meant we would have: […]

I'd agree with that if the number of packages were not so large. However, as you can see, the combinations could reach 20, which may confuse users who are not familiar with this area. Besides, if other options are added to llama.cpp in the future, we'll have to release new packages, which seems inconvenient. Anyway, thanks a lot for your suggestion.

@SignalRT @martindevans Can the feature detection select the best package now? I think stacking all binaries in one package is the best choice if it can; otherwise we should keep the current distribution. But I'm still open to other voices.

@martindevans
Member

We can put everything in one package except CUDA; we don't currently have a runtime way to detect that. In theory it could be done in the future, and then we could truly have just one package with all backends.
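If we do want runtime CUDA detection later, one common approach is simply probing for the CUDA runtime library. A minimal sketch (illustrative only, not LLamaSharp code; the candidate library names are assumptions about common cudart sonames):

```python
import ctypes

# Illustrative sketch of runtime CUDA detection: try to load a CUDA runtime
# library before choosing a CUDA backend binary. The loader is injectable
# so the logic can be tested without a real CUDA installation.
def cuda_available(candidates=("libcudart.so", "libcudart.so.12", "cudart64_12.dll"),
                   loader=ctypes.CDLL) -> bool:
    for name in candidates:
        try:
            loader(name)  # succeeds only if a CUDA runtime is present
            return True
        except OSError:
            continue
    return False
```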

@AsakusaRinne
Collaborator

@martindevans @SignalRT I think I found a method to detect CUDA and AVX support. I've pushed my changes in #268, please take a look. Since it's already quite late for me, it's only a template for now and I haven't verified that it works. If it does, we'd have one more choice: cherry-pick my commit and support feature detection based on it. The file structure I listed in that PR is only for experiment; feel free to change it.

BTW, whether we detect CUDA and AVX or not, we should consider how to provide a convenient way for users to run LLamaSharp with a self-compiled DLL. I haven't had a good idea yet. Do you have any ideas about it?
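One possible shape for the self-compiled DLL case is an explicit override that is checked before the bundled binary. A minimal sketch, assuming an invented environment variable name (nothing here is existing LLamaSharp behavior):

```python
import os

# Hypothetical sketch of a self-compiled-binary override: check a
# user-provided path first (the env var name is invented for illustration),
# then fall back to the binary shipped by the backend package.
def resolve_native_library(bundled_path: str,
                           env=os.environ,
                           exists=os.path.exists) -> str:
    override = env.get("LLAMA_NATIVE_LIBRARY_PATH")  # invented variable name
    if override and exists(override):
        return override  # the user's self-compiled binary wins
    return bundled_path
```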

@SignalRT
Collaborator Author

SignalRT commented Nov 9, 2023

Reverted #268

@martindevans
Member

What's left to do before we merge this PR?

@SignalRT
Collaborator Author

@martindevans To test the NuGet package. I will do it only if there are no layout changes before the release. If the plan is to release changes to the NuGet packages, it's better to wait and test the final configuration.

@martindevans
Member

It's up to @AsakusaRinne but I think the plan is to do a release as soon as this is merged?

@SignalRT SignalRT marked this pull request as ready for review November 10, 2023 15:28
@AsakusaRinne
Collaborator

> It's up to @AsakusaRinne but I think the plan is to do a release as soon as this is merged?

Yes, I think we won't change the distribution of the NuGet packages and will include #244 in the next release. It seems that we didn't reach an agreement, so I'll open a vote after investigating the possibility of CUDA feature detection. To be honest, I'm also not sure which approach the users would prefer. 😶‍🌫️

@AsakusaRinne
Collaborator

There's a conflict caused by downgrading the semantic-kernel version in the examples in #244. @SignalRT Could you please resolve it and test the NuGet package on MacOS? I'd like to help if there's a problem generating the NuGet package (I don't have a MacOS machine). Thank you for this hard work!

@AsakusaRinne AsakusaRinne mentioned this pull request Nov 11, 2023
@SignalRT
Collaborator Author

@AsakusaRinne I think it is OK. I built the NuGet package and tested the CPU package on ARM. The tests pass.

I changed the prepare_release.sh script to be able to execute it on osx (there are some pending changes). I hope I haven't broken anything in the Linux execution, but the osx bash seems to have some incompatibilities.


@AsakusaRinne AsakusaRinne left a comment


LGTM, it's good work. :) However, we'll delay the next release for a while, for two reasons:

  1. I have to test on my fork to check whether the packages are generated correctly by the workflow.
  2. This PR removed the MacMetal backend, which is a breaking change. Therefore we'll publish a minor release instead, and include feat: cuda feature detection (#275) if possible.

.github/prepare_release.sh (review comment resolved)
@AsakusaRinne
Collaborator

One more question: why is the Mac CI so slow? 🤣 It won't block the merging of this PR though.

@martindevans
Member

martindevans commented Nov 11, 2023

It's been waiting for an entire hour on this step!?

Waiting for a runner to pick up this job...

@AsakusaRinne
Collaborator

> It's been waiting for an entire hour on this step!?

Not sure if it's because of the limited resources of the GitHub CI cluster. But anyway, one hour of waiting is too long. Are Mac runners very rare? 😹

@SignalRT
Collaborator Author

@AsakusaRinne ...Occam's razor... It was a mismatch in renaming macOS to osx.

@AsakusaRinne AsakusaRinne merged commit ed479d1 into SciSharp:master Nov 12, 2023
5 checks passed
@SignalRT SignalRT deleted the RuntimeDetection branch November 13, 2023 19:32