Triton Model Navigator v0.7.5
Updates:
- new: FP8 precision support for TensorRT
- new: Support for autocast and inference mode configuration for Torch runners
- new: Allow selecting the device for Torch and ONNX runners (see the example below)
- new: Add support for `default_model_filename` in Triton model configuration (see the `config.pbtxt` sketch below)
- new: Detailed profiling of inference steps (pre- and postprocessing, memcpy and compute)
- fix: JAX export and TensorRT conversion fail when a custom workspace is used
- fix: Max workspace size was not passed to TensorRT conversion
- fix: Execution of TensorRT optimize raised an error while handling output metadata
- fix: Limited the Polygraphy version to one that works correctly with the onnxruntime-gpu package
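
A minimal sketch of how the new runner options from this release could be combined through the Python API. The `autocast`, `inference_mode`, and `device` parameter names and the `"fp8"` precision value are assumptions inferred from the entries above, not verified signatures:

```python
import torch

import model_navigator as nav

model = torch.nn.Linear(10, 10).eval()
dataloader = [torch.randn(2, 10) for _ in range(8)]

package = nav.torch.optimize(
    model=model,
    dataloader=dataloader,
    custom_configs=[
        # Assumed parameter names for the new autocast / inference-mode
        # configuration and device selection on Torch runners.
        nav.TorchConfig(autocast=True, inference_mode=True, device="cuda"),
        # "fp8" is assumed to be accepted alongside the existing
        # precision values now that TensorRT FP8 support was added.
        nav.TensorRTConfig(precision=("fp16", "fp8")),
    ],
)
```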
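For context, `default_model_filename` is a standard field of Triton's model configuration (`config.pbtxt`) that points the backend at a model file with a non-default name. A generated configuration using it might look like this (model name and backend are illustrative):

```protobuf
name: "linear_model"
backend: "onnxruntime"
default_model_filename: "model.onnx"
max_batch_size: 8
```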
Versions of external components used during testing:
- PyTorch 2.2.0a0+6a974be
- TensorFlow 2.13.0
- TensorRT 8.6.1
- ONNX Runtime 1.16.2
- Polygraphy 0.49.0
- GraphSurgeon 0.3.27
- tf2onnx 1.15.1
- Other component versions depend on the framework container versions used. See the framework containers support matrix for a detailed summary.