
[Misc]: [Question] vLLM's model loading & instance contract, 1 model per vLLM instance, or multiple models per vLLM instance #9429

yx-lamini opened this issue Oct 16, 2024 · 1 comment

@yx-lamini

Anything you want to discuss about vllm.

Can vLLM consider supporting multiple models on the same vLLM instance?

We are evaluating vLLM for large-scale LLM inference serving, but we are concerned by the limit of one model per vLLM instance, as we are serving small models on beefy GPUs (which keep getting bigger).

Managing a separate vLLM instance for each model on the same GPU is very challenging.
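
For context, here is a minimal sketch of the workaround we mean: one vLLM engine per model, each running in its own process on the same GPU and capped to a fraction of GPU memory via `gpu_memory_utilization`. The model names, memory fractions, and prompts are illustrative only, and whether two engines actually fit side by side depends on the GPU and the models.

```python
# Sketch of the current one-model-per-instance workaround:
# two vLLM engines on the same GPU, each in its own process,
# each capped to a fraction of GPU memory.
import multiprocessing as mp


def run_engine(model_name: str, mem_fraction: float, prompt: str) -> None:
    # Import inside the worker so the parent process never touches CUDA.
    from vllm import LLM, SamplingParams

    # Each engine claims only its share of GPU memory.
    llm = LLM(model=model_name, gpu_memory_utilization=mem_fraction)
    params = SamplingParams(temperature=0.0, max_tokens=32)
    out = llm.generate([prompt], params)[0].outputs[0].text
    print(f"{model_name}: {out!r}")


if __name__ == "__main__":
    mp.set_start_method("spawn")  # safer default when CUDA is involved
    jobs = [
        mp.Process(target=run_engine, args=("facebook/opt-125m", 0.4, "Hello from model A")),
        mp.Process(target=run_engine, args=("facebook/opt-350m", 0.4, "Hello from model B")),
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
```

Each process carries its own CUDA context, scheduler, and hand-tuned memory fraction, which is exactly the operational overhead that multi-model support in a single instance would remove.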

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
yx-lamini added the misc label Oct 16, 2024
@russellb
Collaborator

This is listed under "Help Wanted" on this roadmap issue: #9006. I suggest you keep an eye on that for now to see if someone picks it up.

Does that answer your question?
