
GGUF #122

Merged · 16 commits · Sep 2, 2023

Conversation

@martindevans (Member) commented Aug 25, 2023

Initial changes required to support GGUF. Using these changes with llama.cpp b1081, this model loaded successfully. Resolves #121.

This is just a preview. Required changes (all now resolved):

  • Modify the LLama.Unittest project to auto-download a GGUF version of a basic llama2-chat model (the closest equivalent of llama-2-7b-chat.ggmlv3, which is currently used)
    • I've switched to codellama-7b
    • Switched back to LLama2-7B
  • New binaries
    • Most binaries generated in the new GitHub Actions pipeline
    • macOS Metal binaries created by SignalRT
  • Testing on macOS
    • Tested by SignalRT and swrhim
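
Since old GGMLv3 files and new GGUF files are easy to mix up during this transition, here is a minimal sketch (a hypothetical helper, not part of this PR) for telling them apart by the file magic defined in the GGUF spec:

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of this PR): distinguishes new GGUF files
// from old GGML-era files by checking the 4-byte magic at the start of the file.
public static class GgufCheck
{
    public static bool IsGguf(string path)
    {
        using var stream = File.OpenRead(path);
        Span<byte> magic = stackalloc byte[4];
        if (stream.Read(magic) != 4)
            return false;

        // GGUF files start with the ASCII bytes "GGUF"; a little-endian
        // uint32 format version follows. Anything else is a pre-GGUF file.
        return magic[0] == (byte)'G' && magic[1] == (byte)'G'
            && magic[2] == (byte)'U' && magic[3] == (byte)'F';
    }
}
```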

…ne in llama.cpp with a few different options:

 - Just convert it to a `string`: nice and simple
 - Write the bytes to a `Span<byte>`: no allocations
 - Write the chars to a `StringBuilder`: potentially no allocations
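
For illustration, a rough sketch of what those three options might look like as helpers; the names and shapes here are hypothetical, not the final LLamaSharp API:

```csharp
using System;
using System.Text;

// Hypothetical wrapper shapes for the three detokenization options above.
public static class TokenToPiece
{
    // Option 1: just convert to a string. Simple, but allocates per call.
    public static string AsString(ReadOnlySpan<byte> utf8Piece)
        => Encoding.UTF8.GetString(utf8Piece);

    // Option 2: copy the raw bytes into a caller-supplied buffer.
    // No allocations; returns how many bytes were written.
    public static int AsBytes(ReadOnlySpan<byte> utf8Piece, Span<byte> dest)
    {
        utf8Piece.CopyTo(dest);
        return utf8Piece.Length;
    }

    // Option 3: decode the chars into a reusable StringBuilder. Potentially
    // allocation-free if the builder's capacity is already large enough.
    public static void AsStringBuilder(ReadOnlySpan<byte> utf8Piece, StringBuilder output)
    {
        Span<char> chars = stackalloc char[Encoding.UTF8.GetMaxCharCount(utf8Piece.Length)];
        int written = Encoding.UTF8.GetChars(utf8Piece, chars);
        output.Append(chars[..written]);
    }
}
```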
@saddam213 (Collaborator) commented Aug 28, 2023

Would it make sense to use `llama_token_to_piece` instead of `llama_token_to_piece_with_model`, and move the Tokenize methods from LLamaModelHandle to LLamaContextHandle now?

@martindevans (Member, Author)

Keeping it in the model handle allows tokenizing without having to allocate a context, so I think leaving it on the model makes more sense.
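
A sketch of the usage this keeps possible (the exact Tokenize signature here is approximate, not a confirmed API):

```csharp
using System.Text;
using LLama.Native;

// Sketch of why tokenization stays on the model handle: counting prompt
// tokens needs no KV cache, so no context has to be allocated first.
static int CountPromptTokens(SafeLlamaModelHandle model, string prompt)
{
    // Second argument: whether to prepend the BOS token.
    int[] tokens = model.Tokenize(prompt, true, Encoding.UTF8);
    return tokens.Length;
}
```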

@AsakusaRinne (Collaborator)

Good job! Is it ready to publish a new version of the binaries now? And maybe a new version of the main package along with them?

@martindevans (Member, Author)

> Is it ready to publish a new version of the binaries now?

We do need new binaries for this to work.

The new GitHub action I've been working on builds everything except for Metal. It's running right now; I'll copy the binaries into this PR once it finishes.

However, that doesn't do Metal yet. I need some help building that and/or setting it up in the action.

> And maybe a new version of the main package along with them?

Definitely! There are a lot of new features, and I think not supporting GGUF will confuse people using the library (e.g. #129).

@martindevans (Member, Author) commented Aug 28, 2023

Does Not Include:

  • Linux CUDA (added from the GitHub action)
  • macOS Metal (added by SignalRT)

@AsakusaRinne (Collaborator)

That's great. I can help compile the Linux CUDA and Metal binaries. Unfortunately, I don't know how to add those two build processes to the actions either.

@martindevans (Member, Author)

I managed to figure out how to build the Linux CUDA binaries (I'll add them once the action finishes running).

I really have no idea how to do Metal though. If you could add the Metal stuff into this PR, that would be great.

Do you have a script you use to build the Metal binaries? If so, I can probably adapt it into the GitHub action.

@drasticactions (Contributor)

Compiling for Metal should just be a matter of adding `-DLLAMA_METAL=ON` to the macOS CMake config.

You could probably do what was done for the Windows builds and have separate includes for the non-Metal and Metal builds.
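
For reference, the build being described is roughly the following; treat this as a sketch rather than the final CI script:

```sh
# Rough sketch of a macOS Metal build of llama.cpp (flag names from the
# b1081 era; the final pipeline may differ):
cmake -B build -DBUILD_SHARED_LIBS=ON -DLLAMA_METAL=ON
cmake --build build --config Release
# This produces the dylib plus the ggml-metal.metal shader source that
# needs to ship alongside it (see the discussion below).
```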

@martindevans (Member, Author)

Aha, thanks for the tip @drasticactions!

One thing I'm a bit confused about with the Metal build: there's libllama-metal.dylib in the runtimes folder (which I assume `-DLLAMA_METAL=ON` will produce), but there's also `ggml-metal.metal`, which seems to be a source file. Is this really something we need to distribute with the binaries? If so, I assume I should just grab the file from llama.cpp (https://github.com/ggerganov/llama.cpp/blob/master/ggml-metal.metal)?

@drasticactions (Contributor)

@martindevans Yes, it's part of the build output and is read in by the native library: https://github.com/ggerganov/llama.cpp/blob/bcce96ba4dd95482824700c4ce2455fe8c49055a/ggml-metal.m#L143-L156

But since it is a build artifact, IMO bundling it as part of the runtime library makes more sense than keeping it in source control: that's where it's used, and it ensures the exact matching version ends up in the right place.
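
A sketch of what that bundling could look like as a packaging step (all paths here are illustrative, not the actual pipeline):

```sh
# Hypothetical packaging step along those lines: copy both Metal build
# outputs into the runtimes folder, so the shader source that the native
# library reads at runtime always matches the dylib it was built with.
cp build/libllama.dylib       LLama/runtimes/libllama-metal.dylib
cp llama.cpp/ggml-metal.metal LLama/runtimes/ggml-metal.metal
```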

@martindevans (Member, Author)

Thanks again, I think I should be able to piece together a Metal build now 👍

@AsakusaRinne I probably won't be able to look into this GitHub Action + Metal work until next weekend, so if you want to go ahead with manually building the Metal binaries and doing the NuGet release, that's fine by me.

@AsakusaRinne (Collaborator) commented Aug 29, 2023

@martindevans Are the Metal binaries and this GGUF support the only barriers before the next release? I'm free to compile them and make a release in the next few days.

@martindevans (Member, Author)

Yes, I think so 🥳

@swrhim commented Aug 30, 2023

I was able to build this branch and implement the example from the docs. The one change I had to make to the example was replacing LLamaModel with LLamaContext. After that, I was able to load the model successfully and run it locally on Mac arm64.
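
For anyone else updating from the docs example, the swap looked roughly like this (a sketch: the model filename is a placeholder and the constructor shapes are approximate):

```csharp
using LLama;
using LLama.Common;

// Placeholder filename, not a confirmed configuration.
var parameters = new ModelParams("llama-2-7b-chat.Q4_0.gguf", contextSize: 1024);

// Old docs example:
//   using var model = new LLamaModel(parameters);
//   var executor = new InteractiveExecutor(model);

// On this branch, LLamaModel becomes LLamaContext:
using var context = new LLamaContext(parameters);
var executor = new InteractiveExecutor(context);
```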

This commit includes changes to compile in VS Mac + changes to use llama2, not codellama.

It includes macOS binaries (standard and Metal)
@SignalRT (Collaborator)

@martindevans
In this branch (https://github.com/SignalRT/LLamaSharp/tree/GGUF-MERGE) I updated the macOS binaries (standard and Metal), changed codellama to a llama2 model, and fixed a test that doesn't compile in VS Mac, in case you are interested.

All the tests pass correctly, and at least the basic manual tests also work.

@martindevans (Member, Author)

@SignalRT could you PR those changes into this branch (on my fork)? I'll merge it and it'll become part of this PR.

Changes to compile in VS Mac + change model to llama2
@martindevans martindevans marked this pull request as ready for review August 31, 2023 13:50
@martindevans (Member, Author)

@AsakusaRinne The final binaries have been added by SignalRT, so this is now fully ready to merge and release when you're ready.

@Arlodotexe

Eagerly awaiting 👀

@theolivenbaum

Same here! 🚀
(if it helps: I tested the PR on Windows and it works fine with both CPU and CUDA)

@AsakusaRinne (Collaborator)

Great job! I'll merge it and publish a new release today. Thank you, martindevans and the other developers, for your contributions. ❤️

@AsakusaRinne AsakusaRinne merged commit 4e83e48 into SciSharp:master Sep 2, 2023
4 checks passed
@martindevans martindevans deleted the gguf branch September 2, 2023 12:43
@Arlodotexe Arlodotexe mentioned this pull request Sep 5, 2023