Triton Model Navigator v0.11.0
kacper-kleczewski released this on 05 Aug 12:44
Updates:
- new: Python 3.12 support
- new: Improved logging
- new: optimized in-place modules can be stored in a Triton model repository
- new: multi-profile support for TensorRT model build and runtime
- new: measure duration of each command executed in optimization pipeline
- new: TensorRT-LLM model store generation for deployment on Triton Inference Server
- change: filter unsupported runners instead of raising an error when running optimize
- change: moved JAX support to the experimental module, with limited support
- change: use autocast=True for Torch-based runners
- change: use torch.inference_mode or torch.no_grad context in nav.profile measurements
- change: use multiple strategies to select the optimized runtime; defaults to [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy]
- change: trt_profiles are not set automatically for a module when using nav.optimize
- fix: properly revert log level after torch onnx dynamo export
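The multi-strategy runtime selection above can be pictured as trying each strategy in order and falling back to the next when one cannot pick a winner. The sketch below is a conceptual illustration only, not Model Navigator's implementation; the result dictionaries and helper names are hypothetical.

```python
# Conceptual sketch (NOT Model Navigator's actual code) of ordered-strategy
# runtime selection: try each strategy in turn, fall back to the next one.

def select_runtime(results, strategies):
    """Return the first runtime any strategy can pick, or None."""
    for strategy in strategies:
        runtime = strategy(results)
        if runtime is not None:
            return runtime
    return None

def max_throughput_and_min_latency(results):
    # Succeeds only if a single runtime wins on BOTH metrics.
    best_tp = max(results, key=lambda r: r["throughput"])
    best_lat = min(results, key=lambda r: r["latency"])
    return best_tp if best_tp is best_lat else None

def min_latency(results):
    # Fallback: simply pick the lowest-latency runtime.
    return min(results, key=lambda r: r["latency"])

# Hypothetical profiling results for three runtimes.
results = [
    {"name": "TorchScript", "throughput": 900, "latency": 2.1},
    {"name": "TensorRT", "throughput": 1500, "latency": 1.2},
    {"name": "ONNX Runtime", "throughput": 1600, "latency": 1.4},
]

chosen = select_runtime(results, [max_throughput_and_min_latency, min_latency])
print(chosen["name"])  # TensorRT: no runtime wins both metrics, so fall back to min latency
```

With these sample numbers the first strategy fails (ONNX Runtime has the best throughput but TensorRT the best latency), so the second strategy decides, mirroring the [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy] default ordering.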
Version of external components used during testing:
- PyTorch 2.4.0a0+07cecf4
- TensorFlow 2.15.0
- TensorRT 10.0.1.6
- Torch-TensorRT 2.4.0.a0
- ONNX Runtime 1.18.1
- Polygraphy 0.49.10
- GraphSurgeon 0.5.2
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used; see the support matrix for a detailed summary.