Efficiently Serving Many Transformer Adapters #5449
smellslikeml started this conversation in Ideas
Replies: 2 comments
Recently found Batched LoRAs. For those interested in running with Triton, here is an implementation.
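The linked implementation itself isn't reproduced here, but the core batched-LoRA trick is easy to sketch: keep one frozen base weight for the whole batch and gather a different low-rank (A, B) pair per request, so requests targeting different adapters can share a single forward pass. The code below is a rough PyTorch illustration under assumed names and shapes, not the implementation referenced above.

```python
import torch

def batched_lora_linear(x, base_weight, lora_A, lora_B, adapter_ids, scaling=1.0):
    """Shared base linear layer plus a per-request LoRA update (illustrative sketch).

    x:           (batch, in_features)                 one request per row
    base_weight: (out_features, in_features)          frozen base weight, shared by all requests
    lora_A:      (num_adapters, rank, in_features)    stacked LoRA A matrices
    lora_B:      (num_adapters, out_features, rank)   stacked LoRA B matrices
    adapter_ids: (batch,) long tensor selecting an adapter per request
    """
    # Shared base projection: every request uses the same frozen weight.
    base_out = x @ base_weight.T                              # (batch, out)

    # Gather each request's adapter and apply the low-rank update B(Ax).
    A = lora_A[adapter_ids]                                   # (batch, rank, in)
    B = lora_B[adapter_ids]                                   # (batch, out, rank)
    lora_out = torch.einsum("bri,bi->br", A, x)               # (batch, rank)
    lora_out = torch.einsum("bor,br->bo", B, lora_out)        # (batch, out)

    return base_out + scaling * lora_out


if __name__ == "__main__":
    batch, d_in, d_out, rank, n_adapters = 4, 64, 128, 8, 3
    x = torch.randn(batch, d_in)
    W = torch.randn(d_out, d_in)
    A = torch.randn(n_adapters, rank, d_in) * 0.01
    B = torch.randn(n_adapters, d_out, rank) * 0.01
    ids = torch.tensor([0, 2, 1, 0])  # each request picks its own adapter
    y = batched_lora_linear(x, W, A, B, ids)
    print(y.shape)  # torch.Size([4, 128])
```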
Now we have S-LoRA.
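For context, the serving pattern S-LoRA targets is keeping the base model resident on the GPU while the much smaller adapter weights live in host memory, with only the adapters needed by the current batch copied over. The snippet below sketches that fetch-on-demand idea in plain PyTorch; the `AdapterPool` class and its methods are made up for illustration, and the real system adds unified paging of adapter weights and KV cache plus custom CUDA kernels.

```python
import torch

class AdapterPool:
    """Keep all LoRA adapters in host memory; move only the active ones to the GPU.

    Hypothetical helper for illustration only. A real server would also bound
    the GPU working set and evict adapters that are no longer referenced.
    """

    def __init__(self, device=None):
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.cpu_adapters = {}   # name -> dict of small LoRA tensors
        self.gpu_cache = {}      # working set currently resident on the GPU

    def register(self, name, adapter_weights):
        # Pinning host memory allows asynchronous copies later; skip it on CPU-only machines.
        if torch.cuda.is_available():
            adapter_weights = {k: v.pin_memory() for k, v in adapter_weights.items()}
        self.cpu_adapters[name] = adapter_weights

    def fetch(self, names):
        # Copy just the adapters referenced by the current batch onto the device.
        for name in names:
            if name not in self.gpu_cache:
                self.gpu_cache[name] = {
                    k: v.to(self.device, non_blocking=True)
                    for k, v in self.cpu_adapters[name].items()
                }
        return [self.gpu_cache[name] for name in names]


if __name__ == "__main__":
    pool = AdapterPool()
    for i in range(3):
        pool.register(f"adapter_{i}", {"lora_A": torch.randn(8, 64), "lora_B": torch.randn(64, 8)})
    active = pool.fetch(["adapter_0", "adapter_2"])   # only what this batch needs is moved
    print(len(active), active[0]["lora_A"].device)
```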
Original post: Transformer adapters are used for efficient fine-tuning; can we optimize tritonserver to serve inference for many adapters in a way that efficiently shares the base model's memory between them?
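To make the memory-sharing question concrete: the dominant cost is the base model's weights, so a server can hold them once and expose many logical models that each add only a rank-r delta on top. Below is a minimal, Triton-agnostic sketch of that accounting; the `SharedBaseLoRA` class, names, and sizes are assumptions for the example, and in practice this logic would sit inside whichever backend hosts the model.

```python
import torch
import torch.nn as nn

class SharedBaseLoRA(nn.Module):
    """One frozen base projection shared by many named LoRA adapters (illustrative).

    Adding an adapter costs O(rank * (d_in + d_out)) parameters instead of
    another full copy of the base weights.
    """

    def __init__(self, d_in, d_out):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)   # base weights are frozen and shared
        self.adapters = nn.ModuleDict()          # adapter name -> small LoRA delta

    def add_adapter(self, name, rank):
        delta = nn.Sequential(
            nn.Linear(self.base.in_features, rank, bias=False),   # A
            nn.Linear(rank, self.base.out_features, bias=False),  # B
        )
        nn.init.zeros_(delta[1].weight)  # standard LoRA init: B = 0, so a fresh adapter is a no-op
        self.adapters[name] = delta

    def forward(self, x, adapter_name):
        out = self.base(x)
        if adapter_name in self.adapters:
            out = out + self.adapters[adapter_name](x)
        return out


if __name__ == "__main__":
    layer = SharedBaseLoRA(d_in=4096, d_out=4096)
    for name in ["customer_a", "customer_b", "customer_c"]:
        layer.add_adapter(name, rank=16)

    base_params = sum(p.numel() for p in layer.base.parameters())
    adapter_params = sum(p.numel() for p in layer.adapters.parameters())
    print(f"base (shared once): {base_params:,} params")     # 16,777,216
    print(f"3 adapters total:   {adapter_params:,} params")  # 393,216

    y = layer(torch.randn(2, 4096), adapter_name="customer_b")
    print(y.shape)  # torch.Size([2, 4096])
```

The printout shows the point: three adapters add under half a million parameters on top of a roughly 16.8M-parameter base layer, and the same ratio holds per layer in a full transformer.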