Use request priority with the ensemble scheduler #6019

chandanidoshi · 2023-07-04T01:59:11Z


We are using an ensemble model whose first two steps are a Python-backend tokenizer and an ONNX model that produces embeddings, and we have enabled dynamic batching on the ONNX model. We want to set request priority on the ONNX model, but the priority level does not seem to be passed through to it inside the ensemble. We have confirmed that priority works when we send requests directly to the ONNX model, and it also appears to work when the ONNX model is the first step in the ensemble. What should we do to make sure the request priority reaches the ONNX model when the tokenizer is included in the ensemble?
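For reference, a request priority is attached through the schedule-policy extension of Triton's HTTP/REST protocol, which reads `priority` from the request-level `parameters` object. The Python `tritonclient` `infer()` call exposes the same setting as its `priority` argument. The minimal sketch below only builds the JSON body for a request to the ensemble and does not contact a server. The helper name `build_infer_request` is hypothetical. The input name, the `[batch, 1]` shape and the `BYTES` datatype are assumed from the ensemble config in this post.

```python
import json


def build_infer_request(texts, priority):
    """Build a KServe v2 JSON body for the ensemble, carrying a request priority.

    Hypothetical helper: the input name and shape follow the ensemble config
    ("input", TYPE_STRING, dims: 1, max_batch_size > 0).
    """
    return {
        "inputs": [
            {
                "name": "input",               # ensemble input name from the config
                "shape": [len(texts), 1],      # [batch, 1] because batching is enabled
                "datatype": "BYTES",           # TYPE_STRING travels as BYTES on the wire
                "data": list(texts),           # flattened, row-major tensor contents
            }
        ],
        # Schedule-policy extension: 1 is the highest level; 0 means "use the default".
        "parameters": {"priority": priority},
    }


body = build_infer_request(["a sentence to embed"], priority=1)
payload = json.dumps(body)  # POST this to /v2/models/all-mpnet-base-v2-inference/infer
```

When the request goes directly to the ONNX model, this is the priority its dynamic batcher sees. The open question in this thread is whether the ensemble carries it over to the requests it issues to composing models after the first step.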

ONNX config:

name: "all-mpnet-base-v2-onnx"
max_batch_size: 256
input {
  name: "input_ids"
  data_type: TYPE_INT64
  dims: -1
}
input {
  name: "attention_mask"
  data_type: TYPE_INT64
  dims: -1
}
output {
  name: "last_hidden_state"
  data_type: TYPE_FP32
  dims: -1
  dims: 768
}
instance_group {
  count: 1
  kind: KIND_GPU
}
dynamic_batching {
  max_queue_delay_microseconds: 10000
  priority_levels: 3
  default_priority_level: 3
}
backend: "onnxruntime"
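To make the symptom concrete, here is a toy Python model of a priority queue configured like the ONNX model above. It is not Triton's implementation. Level 1 is the highest priority, and a request whose priority is 0 (unset) falls back to `default_priority_level: 3`. If an upstream ensemble step issues its request to the ONNX model without copying the original priority, that request lands in the lowest level, behind earlier background traffic. The names `ToyPriorityQueue` and `effective_level` are hypothetical. Rejecting out-of-range values is this sketch's own choice.

```python
import heapq
import itertools

PRIORITY_LEVELS = 3         # mirrors priority_levels in the ONNX config
DEFAULT_PRIORITY_LEVEL = 3  # mirrors default_priority_level in the ONNX config


def effective_level(requested: int) -> int:
    """Map a request's priority to a queue level; 0 means 'use the default'."""
    if requested == 0:
        return DEFAULT_PRIORITY_LEVEL
    if not 1 <= requested <= PRIORITY_LEVELS:
        # Assumption of this sketch: out-of-range priorities are rejected.
        raise ValueError(f"priority {requested} outside 1..{PRIORITY_LEVELS}")
    return requested


class ToyPriorityQueue:
    """Hypothetical queue: lower level first, FIFO within the same level."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival counter used as a FIFO tie-break

    def push(self, name: str, requested_priority: int) -> None:
        level = effective_level(requested_priority)
        heapq.heappush(self._heap, (level, next(self._seq), name))

    def drain(self) -> list:
        """Pop every queued request name in scheduling order."""
        return [heapq.heappop(self._heap)[2] for _ in range(len(self._heap))]


q = ToyPriorityQueue()
q.push("background", 3)
q.push("urgent-direct", 1)        # priority preserved (direct request)
q.push("urgent-via-ensemble", 0)  # priority dropped upstream, so default level 3
order = q.drain()
# order == ["urgent-direct", "background", "urgent-via-ensemble"]
```

The last request was sent as urgent but is scheduled after the background one. This matches the reported behavior if the priority is lost between the tokenizer step and the ONNX step.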

Ensemble config:

name: "all-mpnet-base-v2-inference"
platform: "ensemble"
max_batch_size: 256
input {
  name: "input"
  data_type: TYPE_STRING
  dims: 1
}
output {
  name: "normalized_embeddings"
  data_type: TYPE_FP32
  dims: 768
}
ensemble_scheduling {
  step {
    model_name: "all-mpnet-base-v2-tokenizer"
    model_version: -1
    input_map {
      key: "TEXT"
      value: "input"
    }
    output_map {
      key: "attention_mask"
      value: "attention_mask"
    }
    output_map {
      key: "input_ids"
      value: "input_ids"
    }
  }
  step {
    model_name: "all-mpnet-base-v2-onnx"
    model_version: -1
    input_map {
      key: "attention_mask"
      value: "attention_mask"
    }
    input_map {
      key: "input_ids"
      value: "input_ids"
    }
    output_map {
      key: "last_hidden_state"
      value: "last_hidden_state"
    }
  }
  step {
    model_name: "mean-pooling"
    model_version: -1
    input_map {
      key: "attention_mask"
      value: "attention_mask"
    }
    input_map {
      key: "last_hidden_state"
      value: "last_hidden_state"
    }
    output_map {
      key: "embeddings"
      value: "embeddings"
    }
  }
}
