Triton Model Navigator v0.7.5
Updates:
- new: FP8 precision support for TensorRT
- new: Support for autocast and inference mode configuration for Torch runners
- new: Allow selecting the device for Torch and ONNX runners (see the example below)
- new: Add support for `default_model_filename` in Triton model configuration (see the `config.pbtxt` sketch below)
- new: Detailed profiling of inference steps (pre- and postprocessing, memcpy and compute)
- fix: JAX export and TensorRT conversion fail when a custom workspace is used
- fix: Max workspace size was not passed to TensorRT conversion
- fix: Execution of TensorRT optimize raised an error while handling output metadata
- fix: Limited the Polygraphy version to one that works correctly with the onnxruntime-gpu package
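
A minimal sketch of how the new runner options from this release could be combined through the Python API. The `autocast`, `inference_mode`, and `device` parameter names and the `"fp8"` precision value are assumptions inferred from the entries above, not verified signatures:

```python
import torch

import model_navigator as nav

model = torch.nn.Linear(10, 10).eval()
dataloader = [torch.randn(2, 10) for _ in range(8)]

package = nav.torch.optimize(
    model=model,
    dataloader=dataloader,
    custom_configs=[
        # Assumed parameter names for the new autocast / inference-mode
        # configuration and device selection on Torch runners.
        nav.TorchConfig(autocast=True, inference_mode=True, device="cuda"),
        # "fp8" is assumed to be accepted alongside the existing
        # precision values now that TensorRT FP8 support was added.
        nav.TensorRTConfig(precision=("fp16", "fp8")),
    ],
)
```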
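For context, `default_model_filename` is a standard field of Triton's model configuration (`config.pbtxt`) that points the backend at a model file with a non-default name. A generated configuration using it might look like this (model name and backend are illustrative):

```protobuf
name: "linear_model"
backend: "onnxruntime"
default_model_filename: "model.onnx"
max_batch_size: 8
```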
Versions of external components used during testing:
- PyTorch 2.2.0a0+6a974be
- TensorFlow 2.13.0
- TensorRT 8.6.1
- ONNX Runtime 1.16.2
- Polygraphy 0.49.0
- GraphSurgeon 0.3.27
- tf2onnx 1.15.1
- Other component versions depend on the framework container versions used. See the framework containers support matrix for a detailed summary.