
[Misc]: [Question] vLLM's model loading & instance contract, 1 model per vLLM instance, or multiple models per vLLM instance #9429

yx-lamini opened this issue Oct 16, 2024 · 1 comment

@yx-lamini

Anything you want to discuss about vllm.

Can vLLM consider supporting multiple models on the same vLLM instance?

We are evaluating vLLM for large-scale LLM inference serving, but we are concerned by the limit of one model per vLLM instance, as we are serving small models on beefy GPUs (which keep getting bigger).

Managing a separate vLLM instance for each model on the same GPU is very challenging.
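
For context, here is a minimal sketch of the workaround we mean: one vLLM engine per model, each running in its own process on the same GPU and capped to a fraction of GPU memory via `gpu_memory_utilization`. The model names, memory fractions, and prompts are illustrative only, and whether two engines actually fit side by side depends on the GPU and the models.

```python
# Sketch of the current one-model-per-instance workaround:
# two vLLM engines on the same GPU, each in its own process,
# each capped to a fraction of GPU memory.
import multiprocessing as mp


def run_engine(model_name: str, mem_fraction: float, prompt: str) -> None:
    # Import inside the worker so the parent process never touches CUDA.
    from vllm import LLM, SamplingParams

    # Each engine claims only its share of GPU memory.
    llm = LLM(model=model_name, gpu_memory_utilization=mem_fraction)
    params = SamplingParams(temperature=0.0, max_tokens=32)
    out = llm.generate([prompt], params)[0].outputs[0].text
    print(f"{model_name}: {out!r}")


if __name__ == "__main__":
    mp.set_start_method("spawn")  # safer default when CUDA is involved
    jobs = [
        mp.Process(target=run_engine, args=("facebook/opt-125m", 0.4, "Hello from model A")),
        mp.Process(target=run_engine, args=("facebook/opt-350m", 0.4, "Hello from model B")),
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
```

Each process carries its own CUDA context, scheduler, and hand-tuned memory fraction, which is exactly the operational overhead that multi-model support in a single instance would remove.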

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
yx-lamini added the misc label Oct 16, 2024
@russellb
Collaborator

This is listed under "Help Wanted" on this roadmap issue: #9006. I suggest you keep an eye on that for now to see if someone picks it up.

Does that answer your question?
