Triton Model Navigator v0.11.0
kacper-kleczewski released this on 05 Aug 12:44
Updates:
- new: Python 3.12 support
- new: Improved logging
- new: optimized in-place modules can be stored in a Triton model repository
- new: multi-profile support for TensorRT model build and runtime
- new: measure duration of each command executed in optimization pipeline
- new: TensorRT-LLM model store generation for deployment on Triton Inference Server
- change: filter unsupported runners instead of raising an error when running optimize
- change: moved JAX support to the experimental module, with limited support
- change: use autocast=True for Torch-based runners
- change: use torch.inference_mode or torch.no_grad context in nav.profile measurements
- change: use multiple strategies to select the optimized runtime; defaults to [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy]
- change: trt_profiles are not set automatically for a module when using nav.optimize
- fix: properly revert log level after torch onnx dynamo export
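The multi-strategy runtime selection above can be pictured as trying each strategy in order and falling back to the next when one cannot pick a winner. The sketch below is a conceptual illustration only, not Model Navigator's implementation; the result dictionaries and helper names are hypothetical.

```python
# Conceptual sketch (NOT Model Navigator's actual code) of ordered-strategy
# runtime selection: try each strategy in turn, fall back to the next one.

def select_runtime(results, strategies):
    """Return the first runtime any strategy can pick, or None."""
    for strategy in strategies:
        runtime = strategy(results)
        if runtime is not None:
            return runtime
    return None

def max_throughput_and_min_latency(results):
    # Succeeds only if a single runtime wins on BOTH metrics.
    best_tp = max(results, key=lambda r: r["throughput"])
    best_lat = min(results, key=lambda r: r["latency"])
    return best_tp if best_tp is best_lat else None

def min_latency(results):
    # Fallback: simply pick the lowest-latency runtime.
    return min(results, key=lambda r: r["latency"])

# Hypothetical profiling results for three runtimes.
results = [
    {"name": "TorchScript", "throughput": 900, "latency": 2.1},
    {"name": "TensorRT", "throughput": 1500, "latency": 1.2},
    {"name": "ONNX Runtime", "throughput": 1600, "latency": 1.4},
]

chosen = select_runtime(results, [max_throughput_and_min_latency, min_latency])
print(chosen["name"])  # TensorRT: no runtime wins both metrics, so fall back to min latency
```

With these sample numbers the first strategy fails (ONNX Runtime has the best throughput but TensorRT the best latency), so the second strategy decides, mirroring the [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy] default ordering.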
Version of external components used during testing:
- PyTorch 2.4.0a0+07cecf4
- TensorFlow 2.15.0
- TensorRT 10.0.1.6
- Torch-TensorRT 2.4.0.a0
- ONNX Runtime 1.18.1
- Polygraphy 0.49.10
- GraphSurgeon 0.5.2
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used; see the support matrix for a detailed summary.