Use request priority with the ensemble scheduler #6019
Unanswered
chandanidoshi asked this question in Q&A
Replies: 0 comments
We are using an ensemble model whose first two steps are a Python backend tokenizer and an ONNX model that produces embeddings. Dynamic batching is enabled on the ONNX model, and we want to set request priority on it. However, the priority level does not appear to be passed through to the ONNX model within the ensemble. We have checked that priority works when we send requests directly to the ONNX model, and it also appears to work when the ONNX model is the first step in the ensemble. What should we do to ensure the request priority reaches the ONNX model when the tokenizer precedes it in the ensemble?
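For context, here is a minimal sketch of how a request priority can be attached to an inference request body, assuming Triton's HTTP/REST protocol with the schedule policy extension, where `priority` is sent as a request-level parameter. The input name `TEXT`, the sample text, and the helper `build_infer_request` are hypothetical placeholders for illustration and are not taken from our actual setup.

```python
import json


def build_infer_request(input_name, texts, priority):
    """Build a v2-protocol style JSON body that carries a request priority.

    Assumption: the server honors a request-level "priority" parameter,
    as described by Triton's schedule policy extension.
    """
    return {
        "inputs": [
            {
                "name": input_name,        # hypothetical input tensor name
                "shape": [len(texts), 1],  # one string per batch row
                "datatype": "BYTES",
                "data": list(texts),
            }
        ],
        # Request-level parameters; the priority goes here, not on a tensor.
        "parameters": {"priority": priority},
    }


body = build_infer_request("TEXT", ["hello world"], priority=1)
print(json.dumps(body, indent=2))
```

A body like this would be POSTed to the inference endpoint of either the ensemble or the ONNX model directly; the question above concerns whether the `priority` value survives to the ONNX step when it is sent to the ensemble.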
ONNX config:
Ensemble config: