I currently have an LLM engine built with TensorRT-LLM and am trying to evaluate different deployment setups and the gains from each.
I want to deploy the Llama model across 4 GPUs, with an independent copy of the model running on each GPU (data-parallel replicas rather than splitting the model across GPUs). Something like the config sketch below is what I have in mind.
Is this possible with the NVIDIA Triton inference container?
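For reference, a minimal `config.pbtxt` sketch of the layout I'm after, assuming the backend honours standard `instance_group` placement (the model name, backend, and batch size here are placeholders, and I'm not sure the `tensorrtllm` backend handles instance placement the same way other backends do):

```
# config.pbtxt -- hypothetical model repository entry; names are placeholders
name: "llama"
backend: "tensorrtllm"
max_batch_size: 8

# One instance per listed GPU, i.e. four independent copies of the engine
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1, 2, 3 ]
  }
]
```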