
GGUF #122

Merged · 16 commits · Sep 2, 2023

Conversation

@martindevans (Member) commented Aug 25, 2023

Initial changes required to support GGUF. Using these changes with llama.cpp b1081, this model loaded successfully. Resolves #121.

This is just a preview. Required changes (all now resolved):

  • Modify the LLama.Unittest project to auto-download a GGUF version of a basic llama2-chat model (the closest equivalent of llama-2-7b-chat.ggmlv3, which is currently used)
    • I've switched to codellama-7b
    • Switched back to LLama2-7B
  • New binaries
    • Most binaries generated in the new GitHub Actions pipeline
    • macOS Metal binaries created by SignalRT
  • Testing on macOS
    • Tested by SignalRT and swrhim
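
Since old GGMLv3 files and new GGUF files are easy to mix up during this transition, here is a minimal sketch (a hypothetical helper, not part of this PR) for telling them apart by the file magic defined in the GGUF spec:

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of this PR): distinguishes new GGUF files
// from old GGML-era files by checking the 4-byte magic at the start of the file.
public static class GgufCheck
{
    public static bool IsGguf(string path)
    {
        using var stream = File.OpenRead(path);
        Span<byte> magic = stackalloc byte[4];
        if (stream.Read(magic) != 4)
            return false;

        // GGUF files start with the ASCII bytes "GGUF"; a little-endian
        // uint32 format version follows. Anything else is a pre-GGUF file.
        return magic[0] == (byte)'G' && magic[1] == (byte)'G'
            && magic[2] == (byte)'U' && magic[3] == (byte)'F';
    }
}
```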

…ne in llama.cpp with a few different options:

 - Just convert it to a `string`: nice and simple
 - Write the bytes to a `Span<byte>`: no allocations
 - Write the chars to a `StringBuilder`: potentially no allocations
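
For illustration, a rough sketch of what those three options might look like as helpers; the names and shapes here are hypothetical, not the final LLamaSharp API:

```csharp
using System;
using System.Text;

// Hypothetical wrapper shapes for the three detokenization options above.
public static class TokenToPiece
{
    // Option 1: just convert to a string. Simple, but allocates per call.
    public static string AsString(ReadOnlySpan<byte> utf8Piece)
        => Encoding.UTF8.GetString(utf8Piece);

    // Option 2: copy the raw bytes into a caller-supplied buffer.
    // No allocations; returns how many bytes were written.
    public static int AsBytes(ReadOnlySpan<byte> utf8Piece, Span<byte> dest)
    {
        utf8Piece.CopyTo(dest);
        return utf8Piece.Length;
    }

    // Option 3: decode the chars into a reusable StringBuilder. Potentially
    // allocation-free if the builder's capacity is already large enough.
    public static void AsStringBuilder(ReadOnlySpan<byte> utf8Piece, StringBuilder output)
    {
        Span<char> chars = stackalloc char[Encoding.UTF8.GetMaxCharCount(utf8Piece.Length)];
        int written = Encoding.UTF8.GetChars(utf8Piece, chars);
        output.Append(chars[..written]);
    }
}
```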
@saddam213 (Collaborator) commented Aug 28, 2023

Would it make sense to use `llama_token_to_piece` instead of `llama_token_to_piece_with_model`, and move the Tokenize methods from LLamaModelHandle to LLamaContextHandle now?

@martindevans (Member, Author)

Keeping it in the model handle allows tokenizing without having to allocate a context, so I think leaving it on the model makes more sense.
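
A sketch of the usage this keeps possible (the exact Tokenize signature here is approximate, not a confirmed API):

```csharp
using System.Text;
using LLama.Native;

// Sketch of why tokenization stays on the model handle: counting prompt
// tokens needs no KV cache, so no context has to be allocated first.
static int CountPromptTokens(SafeLlamaModelHandle model, string prompt)
{
    // Second argument: whether to prepend the BOS token.
    int[] tokens = model.Tokenize(prompt, true, Encoding.UTF8);
    return tokens.Length;
}
```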

@AsakusaRinne (Collaborator)

Good job! Is it ready to publish a new version of the binaries now? And maybe a new version of the main package along with them?

@martindevans (Member, Author)

> Is it ready to publish a new version of the binaries now?

We do need new binaries for this to work.

The new GitHub action I've been working on builds everything except for Metal. It's running right now; I'll copy the binaries into this PR once it finishes.

However, that doesn't do Metal yet. I need some help building that and/or setting it up in the action.

> And maybe a new version of the main package along with them?

Definitely! There are a lot of new features, and I think not supporting GGUF will confuse people using the library (e.g. #129).

@martindevans (Member, Author) commented Aug 28, 2023

Does Not Include:

  • Linux CUDA (added from the GitHub action)
  • macOS Metal (added by SignalRT)

@AsakusaRinne (Collaborator)

That's great. I can help compile the Linux CUDA and Metal binaries. Unfortunately, I don't know how to add those two build processes to the actions either.

@martindevans (Member, Author)

I managed to figure out how to build the Linux CUDA binaries (I'll add them once the action finishes running).

I really have no idea how to do Metal though. If you could add the Metal stuff into this PR, that would be great.

Do you have a script you use to build the Metal binaries? If so, I can probably adapt it into the GitHub action.

@drasticactions (Contributor)

Compiling for Metal should just be a matter of adding `-DLLAMA_METAL=ON` to the macOS CMake config.

You could probably do what was done for the Windows builds and have separate includes for the non-Metal and Metal builds.
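
For reference, the build being described is roughly the following; treat this as a sketch rather than the final CI script:

```sh
# Rough sketch of a macOS Metal build of llama.cpp (flag names from the
# b1081 era; the final pipeline may differ):
cmake -B build -DBUILD_SHARED_LIBS=ON -DLLAMA_METAL=ON
cmake --build build --config Release
# This produces the dylib plus the ggml-metal.metal shader source that
# needs to ship alongside it (see the discussion below).
```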

@martindevans (Member, Author)

Aha, thanks for the tip @drasticactions!

One thing I'm a bit confused about with the Metal build: there's libllama-metal.dylib in the runtimes folder (which I assume `-DLLAMA_METAL=ON` will produce), but there's also `ggml-metal.metal`, which seems to be a source file. Is this really something we need to distribute with the binaries? If so, I assume I should just grab the file from llama.cpp (https://github.com/ggerganov/llama.cpp/blob/master/ggml-metal.metal)?

@drasticactions (Contributor)

@martindevans Yes, it's part of the build output and is read in by the native library: https://github.com/ggerganov/llama.cpp/blob/bcce96ba4dd95482824700c4ce2455fe8c49055a/ggml-metal.m#L143-L156

But since it is a build artifact, IMO bundling it as part of the runtime library makes more sense than keeping it in source control: that's where it's used, and it ensures the exact matching version ends up in the right place.
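
A sketch of what that bundling could look like as a packaging step (all paths here are illustrative, not the actual pipeline):

```sh
# Hypothetical packaging step along those lines: copy both Metal build
# outputs into the runtimes folder, so the shader source that the native
# library reads at runtime always matches the dylib it was built with.
cp build/libllama.dylib       LLama/runtimes/libllama-metal.dylib
cp llama.cpp/ggml-metal.metal LLama/runtimes/ggml-metal.metal
```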

@martindevans (Member, Author)

Thanks again, I think I should be able to piece together a Metal build now 👍

@AsakusaRinne I probably won't be able to look into this GitHub Action + Metal work until next weekend, so if you want to go ahead with manually building the Metal binaries and doing the NuGet release, that's fine by me.

@AsakusaRinne (Collaborator) commented Aug 29, 2023

@martindevans Are the Metal binaries and this GGUF support the only barriers before the next release? I'm free to compile them and make a release in the next few days.

@martindevans (Member, Author)

Yes, I think so 🥳

@swrhim commented Aug 30, 2023

I was able to build this branch and implement the example from the docs. The one change I had to make to the example was replacing LLamaModel with LLamaContext. After that, I was able to load the model successfully and run it locally on Mac arm64.
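
For anyone else updating from the docs example, the swap looked roughly like this (a sketch: the model filename is a placeholder and the constructor shapes are approximate):

```csharp
using LLama;
using LLama.Common;

// Placeholder filename, not a confirmed configuration.
var parameters = new ModelParams("llama-2-7b-chat.Q4_0.gguf", contextSize: 1024);

// Old docs example:
//   using var model = new LLamaModel(parameters);
//   var executor = new InteractiveExecutor(model);

// On this branch, LLamaModel becomes LLamaContext:
using var context = new LLamaContext(parameters);
var executor = new InteractiveExecutor(context);
```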

This commit includes changes to compile in VS Mac + changes to use llama2, not codellama.

It includes macOS binaries (standard and Metal)
@SignalRT (Collaborator)

@martindevans
In this branch (https://github.com/SignalRT/LLamaSharp/tree/GGUF-MERGE) I updated the macOS binaries (standard and Metal), changed codellama to a llama2 model, and fixed a test that doesn't compile in VS Mac, in case you are interested.

All the tests pass correctly, and at least the basic manual tests also work.

@martindevans (Member, Author)

@SignalRT could you PR those changes into this branch (on my fork)? I'll merge it and it'll become part of this PR.

Changes to compile in VS Mac + change model to llama2
@martindevans martindevans marked this pull request as ready for review August 31, 2023 13:50
@martindevans (Member, Author)

@AsakusaRinne The final binaries have been added by SignalRT, so this is now fully ready to merge and release when you're ready.

@Arlodotexe

Eagerly awaiting 👀

@theolivenbaum

Same here! 🚀
(if it helps: I tested the PR on Windows and it works fine with both CPU and CUDA)

@AsakusaRinne (Collaborator)

Great job! I'll merge it and publish a new release today. Thank you, martindevans and the other developers, for your contributions. ❤️

@AsakusaRinne AsakusaRinne merged commit 4e83e48 into SciSharp:master Sep 2, 2023
4 checks passed
@martindevans martindevans deleted the gguf branch September 2, 2023 12:43
@Arlodotexe Arlodotexe mentioned this pull request Sep 5, 2023