Inference with multiple backends? #5553
Suppose that I have the following model repo:
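For concreteness, a hypothetical layout along these lines (model names, file formats, and version numbers are placeholders), where ideally each version of a model is exported for a different backend:

```
model_repository/
  model_a/
    config.pbtxt        # single config shared by all versions
    1/model.onnx        # intended for the ONNX Runtime backend
    2/model.plan        # intended for the TensorRT backend
  model_b/
    config.pbtxt
    1/model.pt          # intended for the PyTorch (LibTorch) backend
    2/model.onnx        # intended for the ONNX Runtime backend
```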
Also, the models should be executed in the following manner: assume that we have hundreds of model versions across different backends, so by default we only load v0 of every model due to GPU memory limitations. We then use "Model Control Mode EXPLICIT" to load/unload the different model versions depending on the input request. Do I need to write a custom backend to handle the model execution, or is this achievable using Triton's built-in functionality?
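Here is a minimal sketch of the load/unload flow I have in mind, assuming the server is started with `--model-control-mode=explicit` and the `tritonclient` Python package is used (model name and URL are placeholders):

```python
# Hypothetical sketch: on-demand load/unload with explicit model control.
# Assumes tritonserver runs with --model-control-mode=explicit.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Load a model on demand. Which versions are loaded is governed by the
# version_policy in the model's single, shared config.pbtxt.
client.load_model("model_a")
assert client.is_model_ready("model_a")

# ... run inference against model_a here ...

# Unload it again to free GPU memory for whatever the next request needs.
client.unload_model("model_a")
```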
Replies: 1 comment
There should be no issue with running inference on models using multiple backends. Triton will load them, as needed, based on the models you load.

That said, I don't think you can do the above with most of the backends. Different versions of a model still share the same config, so they will all need to use the same backend. It may be possible to do the above with a custom backend or the Python backend.
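For instance, here is a rough sketch of how the Python backend could be used for this. Each version directory ships its own `model.py`, so one version could wrap ONNX Runtime while another wraps a different framework, all under a single `config.pbtxt` with `backend: "python"`. The tensor names, file layout, and single-output assumption below are illustrative, not a tested recipe:

```python
# Hypothetical per-version model.py for the Python backend, wrapping
# ONNX Runtime. A sibling version directory could ship a model.py that
# wraps a different framework instead, under the same shared config.
# INPUT0/OUTPUT0 and model.onnx are placeholder names.
import os

import onnxruntime as ort
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # model.py lives inside this version's directory, so the .onnx
        # file shipped next to it can be found relative to this file.
        version_dir = os.path.dirname(os.path.abspath(__file__))
        self.session = ort.InferenceSession(
            os.path.join(version_dir, "model.onnx"))

    def execute(self, requests):
        responses = []
        for request in requests:
            inp = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # Assumes the wrapped model has one input and one output.
            (out,) = self.session.run(None, {"INPUT0": inp.as_numpy()})
            responses.append(pb_utils.InferenceResponse(
                output_tensors=[pb_utils.Tensor("OUTPUT0", out)]))
        return responses
```

The trade-off is that framework dependencies and per-version execution logic move into your `model.py` rather than being handled by Triton's native backends.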