GGUF #122
Conversation
…ne in llama.cpp with a few different options:
- Just convert it to a `string`, nice and simple
- Write the bytes to a `Span<byte>`, no allocations
- Write the chars to a `StringBuilder`, potentially no allocations
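The reason detokenization needs byte-level handling at all can be illustrated with a small language-neutral sketch (Python here for brevity; LLamaSharp itself is C#): a multi-byte UTF-8 character can be split across a token boundary, so token pieces must be accumulated as bytes and decoded incrementally rather than decoded one token at a time. The token pieces below are invented for illustration.

```python
import codecs

# Two hypothetical token pieces that split the two-byte UTF-8
# encoding of "é" (0xC3 0xA9) across a token boundary.
pieces = [b"caf\xc3", b"\xa9 au lait"]

# Naive per-token decoding fails: the first piece ends mid-character.
try:
    "".join(p.decode("utf-8") for p in pieces)
except UnicodeDecodeError:
    pass  # 0xC3 alone is an incomplete UTF-8 sequence

# Accumulating bytes before decoding (the Span<byte>-style approach)
# works: the incremental decoder holds the dangling lead byte until
# the continuation byte arrives in the next piece.
decoder = codecs.getincrementaldecoder("utf-8")()
text = "".join(decoder.decode(p) for p in pieces)
assert text == "café au lait"
```

This is why the `Span<byte>`/`StringBuilder` variants are attractive: they let the caller control where the byte accumulation happens without forcing an intermediate `string` allocation per token.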
Fix Tokenize of new line, Remove space inserts
Would it make sense to use …
Keeping it in the model handle allows tokenizing without having to allocate a context, so I think keeping it in the model makes more sense.
Good job! Is it ready to publish a new version of the binaries now? And maybe along with a new version of the main package?
We do need new binaries for this to work. The new GitHub action I've been working on builds everything except for Metal. It's running right now, I'll copy across the binaries in this PR once it finishes. However, that doesn't do Metal yet. I need some help building that and/or setting it up in the action.
Definitely! There's a lot of new features and I think not supporting GGUF will confuse people using the library (e.g. #129).
Does Not Include:
…clude missing binaries
That's great. I could help compile the Linux CUDA and Metal binaries. Unfortunately, I don't know how to add these two processes to the actions either.
I managed to figure out how to build the Linux CUDA binaries (I'll add them once the action finishes running). I really have no idea how to do Metal though. If you could add the Metal stuff into this PR that would be great. Do you have a script you use to build the Metal stuff? If you do, I can probably take that and adapt it into the GitHub action.
Compiling Metal should be adding … You could probably do what was done for the Windows builds: have an Includes for a non-Metal build and a Metal build.
Aha, thanks for the tip @drasticactions! One thing I'm a bit confused about with the Metal build is there's …
@martindevans Yes, it's part of the build output and read by the native library: https://github.com/ggerganov/llama.cpp/blob/bcce96ba4dd95482824700c4ce2455fe8c49055a/ggml-metal.m#L143-L156. But it is a build artifact, so instead of having it in source control, IMO bundling it as part of the runtime library would make more sense, since that's where it's used, and it will ensure the exact version is put into the right place.
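One way that bundling could look is a packaging fragment that ships the Metal shader source alongside the native library. This is only an illustrative sketch, not what the PR actually did; the file name `ggml-metal.metal` comes from llama.cpp, while the `runtimes/` package path is an assumption about the NuGet native-asset layout.

```xml
<!-- Illustrative csproj fragment: pack the Metal shader source next to
     the native library so ggml-metal.m can locate it at runtime.
     The PackagePath below is an assumed runtimes/ layout. -->
<ItemGroup>
  <None Include="ggml-metal.metal"
        Pack="true"
        PackagePath="runtimes/osx-arm64/native/" />
</ItemGroup>
```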
Thanks again, I think I should be able to piece together a Metal build now 👍 @AsakusaRinne I probably won't be able to look into this GitHub action + Metal stuff until next weekend, so if you want to go ahead with manually building the Metal stuff and doing the NuGet release, that's fine by me.
@martindevans Are the Metal binary and this GGUF support the only barriers before the next release? I'm free to compile them and make a release in the next few days.
Yes, I think so 🥳
I was able to build this branch and implement the example from the docs. One change I had to make from the example was replacing `LLamaModel` with `LLamaContext`. After that, I was able to load the model successfully and run it locally on mac arm64.
This commit includes changes to compile in VS for Mac, plus changes to use llama2 instead of codellama. It includes the macOS binaries and Metal support.
@martindevans All the tests pass correctly and at least the basic manual test also works.
@SignalRT could you PR those changes into this branch (on my fork)? I'll merge it and it'll become part of this PR.
Changes to compile in VS Mac + change model to llama2
@AsakusaRinne The final binaries have been added by SignalRT, so this is now fully ready to merge and release when you're ready.
Eagerly awaiting 👀
Same here! 🚀
Great job! I'll merge it and publish a new release today. Thank you to martindevans and the other developers for the contributions. ❤️
Initial changes required to support GGUF. Using these changes and llama.cpp b1081, I successfully loaded this model. Resolves #121.
This is just a preview; required changes (all resolved):
- Change the `LLama.Unittest` project to auto-download a GGUF version of basic llama2-chat (closest equivalent of `llama-2-7b-chat.ggmlv3`, which is currently used). I've switched to codellama-7b.
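An auto-download step for the test model could be sketched as an MSBuild target in the unit-test project, using the built-in `DownloadFile` task and skipping the download when the file already exists. This is only an illustrative sketch, not the PR's actual implementation; the model URL, folder, and file name below are placeholders.

```xml
<!-- Illustrative MSBuild target for LLama.Unittest: fetch a GGUF test
     model before building, skipping the download when the file is
     already present. The SourceUrl is a placeholder. -->
<Target Name="DownloadTestModel" BeforeTargets="Build"
        Condition="!Exists('Models/model.gguf')">
  <DownloadFile SourceUrl="https://example.com/path/to/model.gguf"
                DestinationFolder="Models"
                DestinationFileName="model.gguf" />
</Target>
```

Keeping the download in the build rather than in test code means a CI cache of the `Models` folder avoids re-fetching the model on every run.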