Fixes/updates the following server-side components:

- Upgrade CUDA drivers `11.4 -> 12.2` and NVIDIA to gpgpu (not actually part of this PR code-wise; but was necessary)
- Use `vLLM` for batching requests and Paged Attention. Engines at `0.9` fractional GPU utilisation; `20GB` swap space.
- Add `StarCoder2-3b` as a backend model, replacing `CodeGPT` and `UniXCoder`.
  - `float16`.
  - Why do we store ground truths only for accepted completions?
- Store `v1` user requests under `data/user_uuid/json_uuid.json`, to avoid counting all invocations on every request. However, this brings two issues:
  - `user_uuid-json_uuid.json` to `user_uuid/json_uuid.json`; but this can be done with a simple replacement command on the server.
  - `user/json` structure to make processing locally manageable).
- Fix User Study passthrough filter; I forgot to save before amending my last commit on the `aral_user_study` branch.

Client side (`vsc`):

- `shown_times` is used before declared.
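
For reviewers unfamiliar with vLLM, the engine settings named in the server-side list (`0.9` fractional GPU utilisation, `20GB` swap space, `float16`, `StarCoder2-3b`) could map onto the offline `LLM` constructor roughly as below. This is a sketch, not the code in this PR: the Hugging Face model id `bigcode/starcoder2-3b`, the `EngineSettings` and `build_engine` names, and the use of `LLM` rather than an async engine are all assumptions. Note that vLLM's `swap_space` is expressed in GiB per GPU.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EngineSettings:
    """Hypothetical container for the engine values quoted in the description."""

    model: str = "bigcode/starcoder2-3b"  # assumed Hugging Face id for StarCoder2-3b
    dtype: str = "float16"                # half-precision weights and activations
    gpu_memory_utilization: float = 0.9   # fraction of GPU memory the engine may claim
    swap_space: int = 20                  # CPU swap space, in GiB per GPU


def build_engine(settings: EngineSettings = EngineSettings()):
    """Create a vLLM engine from the settings (needs vLLM and a CUDA GPU)."""
    # Imported lazily so the settings can be inspected on machines without vLLM.
    from vllm import LLM

    return LLM(**asdict(settings))
```

Keeping the values in one frozen dataclass makes it easy to log or diff the exact configuration an engine was started with.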
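
If the first storage issue refers to moving existing files from the flat `user_uuid-json_uuid.json` naming to the nested `user_uuid/json_uuid.json` layout (an assumption; the description is terse), a one-off migration could look like the sketch below. It assumes both identifiers are canonical 36-character UUID strings, which is what allows the flat name to be split unambiguously even though UUIDs themselves contain hyphens. The function names are hypothetical and do not come from this PR.

```python
import re
import shutil
from pathlib import Path

# Canonical 8-4-4-4-12 hexadecimal UUID text form (assumed for both ids).
_UUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
_FLAT_NAME = re.compile(rf"^({_UUID})-({_UUID})\.json$")


def migrate_flat_to_nested(data_dir: Path) -> int:
    """Move data_dir/<user>-<json>.json to data_dir/<user>/<json>.json.

    Returns the number of files moved; anything that does not match the
    flat naming pattern is left untouched.
    """
    moved = 0
    # sorted() materialises the listing before new directories are created.
    for path in sorted(data_dir.iterdir()):
        match = _FLAT_NAME.match(path.name) if path.is_file() else None
        if match is None:
            continue
        user_uuid, json_uuid = match.groups()
        target_dir = data_dir / user_uuid
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / f"{json_uuid}.json"))
        moved += 1
    return moved


def count_user_requests(data_dir: Path, user_uuid: str) -> int:
    """Count one user's stored requests by listing only that user's directory."""
    user_dir = data_dir / user_uuid
    return sum(1 for _ in user_dir.glob("*.json")) if user_dir.is_dir() else 0
```

With the nested layout, counting a user's requests only touches that user's directory instead of scanning every stored invocation, which matches the motivation given for the change.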