Releases: triton-inference-server/model_navigator
Triton Model Navigator v0.12.0
Updates:
- new: simple and detailed reporting of the optimization process
- new: adjusted exporting TensorFlow SavedModel for Keras 3.x
- new: inform the user when a wrapped module is not called during optimize (see the sketch after this list)
- new: inform the user when a module uses a custom forward function
- new: support for dynamic shapes in Torch ExportedProgram
- new: use ExportedProgram for Torch-TensorRT conversion
- new: support back-off policy during profiling to avoid reporting local minimum
- new: automatically scale the conversion batch size when modules have different batch sizes within a single pipeline
- change: TensorRT conversion max batch size search relies on saturating throughput for base formats
- change: adjusted profiling configuration for throughput cutoff search
- change: include the optimized pipeline in the list of variants examined during `nav.profile`
- change: performance tests are not executed when correctness fails for a format and runtime
- change: the verify command is not executed when a verify function is not provided
- change: do not create a model copy before executing `torch.compile`
- fix: pipelines sometimes obtain the model and tensors on different devices during `nav.profile`
- fix: extract graph from ExportedProgram for running inference
- fix: runner configuration not propagated to pre-processing steps
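
A minimal sketch of the in-place flow these notes describe, illustrating the new warning about wrapped modules that are never called during optimize. The placeholder model, dataloader item format, and exact call signatures are assumptions and may differ between versions:

```python
import torch
import model_navigator as nav

# Placeholder module wrapped for in-place optimization.
model = nav.Module(torch.nn.Linear(8, 8), name="linear")

# Dataloader item format assumed: (batch_size, sample) pairs.
dataloader = [(1, torch.randn(1, 8)) for _ in range(4)]

def call(sample):
    # The wrapped module must actually run here; as of v0.12.0 the
    # user is warned when a wrapped module is never called during optimize.
    return model(sample)

nav.optimize(call, dataloader)
```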
Version of external components used during testing:
- PyTorch 2.4.0a0+3bcc3cddb5
- TensorFlow 2.16.1
- TensorRT 10.3.0.26
- Torch-TensorRT 2.4.0.a0
- ONNX Runtime 1.18.1
- Polygraphy 0.49.12
- GraphSurgeon 0.5.2
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used. See the support matrix for a detailed summary.
Triton Model Navigator v0.11.0
Updates:
- new: Python 3.12 support
- new: Improved logging
- new: optimized in-place module can be stored to Triton model repository
- new: multi-profile support for TensorRT model build and runtime
- new: measure duration of each command executed in optimization pipeline
- new: TensorRT-LLM model store generation for deployment on Triton Inference Server
- change: filter unsupported runners instead of raising an error when running optimize
- change: moved JAX support to the experimental module with limited scope
- change: use `autocast=True` for Torch-based runners
- change: use `torch.inference_mode` or `torch.no_grad` context in `nav.profile` measurements
- change: use multiple strategies to select the optimized runtime; defaults to `[MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy]` (see the sketch after this list)
- change: `trt_profiles` are not set automatically for a module when using `nav.optimize`
- fix: properly revert the log level after Torch ONNX dynamo export
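
A hedged sketch of the new runtime selection strategies. The strategy class names come from the notes above, but the `strategies` keyword and their exposure at the package top level are assumptions:

```python
import model_navigator as nav

# Load previously optimized modules; strategies are tried in order,
# mirroring the documented default of
# [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy].
nav.load_optimized(
    strategies=[  # keyword name assumed
        nav.MaxThroughputAndMinLatencyStrategy(),
        nav.MinLatencyStrategy(),
    ]
)
```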
Version of external components used during testing:
- PyTorch 2.4.0a0+07cecf4
- TensorFlow 2.15.0
- TensorRT 10.0.1.6
- Torch-TensorRT 2.4.0.a0
- ONNX Runtime 1.18.1
- Polygraphy 0.49.10
- GraphSurgeon 0.5.2
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used. See the support matrix for a detailed summary.
Triton Model Navigator v0.10.1
Updates:
- fix: Check if Torch 2 is available before performing dynamo cleanup
Version of external components used during testing:
- PyTorch 2.4.0a0+07cecf4
- TensorFlow 2.15.0
- TensorRT 10.0.1.6
- Torch-TensorRT 2.4.0.a0
- ONNX Runtime 1.18.0
- Polygraphy 0.49.10
- GraphSurgeon 0.5.2
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used. See the support matrix for a detailed summary.
Triton Model Navigator v0.10.0
Updates:
- new: inplace `nav.Module` accepts a `batching` flag, which overrides the config setting, and a `precision` option that allows setting the appropriate configuration for TensorRT (see the sketch after this list)
- new: Allow setting the device when loading optimized modules using `nav.load_optimized()`
- new: Add support for custom I/O names and dynamic shapes in the Torch ONNX dynamo path
- new: Added `nav.bundle.save` and `nav.bundle.load` to save and load optimized models from the cache
- change: Improved optimize and profile status in inplace mode
- change: Improved handling of defaults for ONNX dynamo when executing `nav.package.optimize`
- fix: Maintain the modules' device in `nav.profile()`
- fix: Add support for all precisions for TensorRT in `nav.profile()`
- fix: Forward method not passed to other inplace modules
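
A sketch combining the additions above; the `precision` value spelling, the cache path, and the exact `nav.bundle` signatures are illustrative assumptions:

```python
import torch
import model_navigator as nav

# Per-module overrides added in v0.10.0; "fp16" spelling assumed.
model = nav.Module(
    torch.nn.Linear(8, 8),
    name="linear",
    batching=True,
    precision="fp16",
)

# After nav.optimize(...) has run, load optimized modules on a
# chosen device (v0.10.0) ...
nav.load_optimized(device="cuda")

# ... and persist or restore the optimization cache (path illustrative).
nav.bundle.save("linear.nav.bundle")
nav.bundle.load("linear.nav.bundle")
```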
Version of external components used during testing:
- PyTorch 2.4.0a0+07cecf4
- TensorFlow 2.15.0
- TensorRT 10.0.1.6
- Torch-TensorRT 2.4.0.a0
- ONNX Runtime 1.18.0
- Polygraphy 0.49.10
- GraphSurgeon 0.5.2
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used. See the support matrix for a detailed summary.
Triton Model Navigator v0.9.0
Updates:
- new: TensorRT timing tactics cache management - timing tactics cache files are used to improve optimization performance
- new: Added throughput saturation verification in `nav.profile()` (enabled by default)
- new: Allow overriding the Inplace cache directory through the `MODEL_NAVIGATOR_DEFAULT_CACHE_DIR` environment variable (see the sketch after this list)
- new: inplace `nav.Module` can now receive a function name to be used instead of `__call__` in modules/submodules, which allows customizing modules with non-standard calls
- fix: Torch dynamo export and Torch dynamo ONNX export
- fix: measurement stabilization in `nav.profile()`
- fix: inplace inference through Torch
- fix: `trt_profiles` argument handling in ONNX to TensorRT conversion
- fix: optimal shape configuration for batch size in Inplace API
- change: Disable TensorRT profile builder
- change: `nav.optimize()` does not override the module configuration
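
Two of these additions sketched together. The environment variable name comes from the notes; the assumption that it must be set before import and the keyword for the custom call name are not confirmed by the source:

```python
import os

# Override the Inplace cache directory (v0.9.0); assumed to be read
# at import time, so it is set before importing the package.
os.environ["MODEL_NAVIGATOR_DEFAULT_CACHE_DIR"] = "/tmp/nav_cache"

import torch
import model_navigator as nav

class Generator(torch.nn.Module):
    def generate(self, x):
        # Non-standard entry point, not invoked via __call__.
        return x * 2

# Keyword name assumed; v0.9.0 lets nav.Module target a method
# other than __call__ for modules with non-standard calls.
model = nav.Module(Generator(), name="generator", forward_func="generate")
```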
Known issues and limitations
- DistilBERT ONNX dynamo export does not support dynamic shapes
Version of external components used during testing:
- PyTorch 2.3.0a0+6ddf5cf85e
- TensorFlow 2.15.0
- TensorRT 8.6.3
- Torch-TensorRT 2.0.0.dev0
- ONNX Runtime 1.17.1
- Polygraphy 0.49.4
- GraphSurgeon 0.4.6
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used. See the support matrix for a detailed summary.
Triton Model Navigator v0.8.1
- fix: Inference with TensorRT when the model has an input with an empty shape
- fix: Using stabilized runners when the model has no batching
- fix: Invalid dependencies for cuDNN; review the known issues
- fix: Make ONNX GraphSurgeon produce artifacts within the protobuf limit (2 GB)
- change: Remove TensorRTCUDAGraph from default runners
- change: updated ONNX package to 1.16.0
Version of external components used during testing:
- PyTorch 2.3.0a0+40ec155e58
- TensorFlow 2.15.0
- TensorRT 8.6.3
- Torch-TensorRT 2.0.0.dev0
- ONNX Runtime 1.17.1
- Polygraphy 0.49.4
- GraphSurgeon 0.4.6
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used. See the support matrix for a detailed summary.
Triton Model Navigator v0.8.0
Updates:
- new: Allow selecting the device for the TensorRT runner
- new: Add device output buffers to the TensorRT runner
- new: `nav.profile` added for profiling any Python function (see the sketch after this list)
- change: API for Inplace optimization (breaking change)
- fix: Passing inputs for Torch to ONNX export
- fix: Parse args to kwargs in torchscript-trace export
- fix: Lower peak memory usage when loading a Torch inplace-optimized model
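
A minimal sketch of profiling an arbitrary Python function with the new `nav.profile`; the dataloader item format is an assumption:

```python
import torch
import model_navigator as nav

def preprocess_and_run(batch):
    # Any plain Python callable can be profiled, not only wrapped modules.
    return torch.nn.functional.relu(batch)

# Dataloader item format assumed: (batch_size, sample) pairs.
dataloader = [(1, torch.randn(1, 16)) for _ in range(4)]

nav.profile(preprocess_and_run, dataloader)
```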
Version of external components used during testing:
- PyTorch 2.3.0a0+ebedce2
- TensorFlow 2.15.0
- TensorRT 8.6.3
- Torch-TensorRT 2.0.0.dev0
- ONNX Runtime 1.17.1
- Polygraphy 0.49.4
- GraphSurgeon 0.4.6
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used. See the support matrix for a detailed summary.
Triton Model Navigator v0.7.7
Updates:
- change: Add input and output specs for Triton model repositories generated from packages
Version of external components used during testing:
- PyTorch 2.2.0a0+81ea7a48
- TensorFlow 2.14.0
- TensorRT 8.6.1
- ONNX Runtime 1.16.2
- Polygraphy 0.49.0
- GraphSurgeon 0.3.27
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used. See the support matrix for a detailed summary.
Triton Model Navigator v0.7.6
Updates:
- fix: Passing inputs for Torch to ONNX export
- fix: Passing input data to OnnxCUDA runner
Version of external components used during testing:
- PyTorch 2.2.0a0+81ea7a48
- TensorFlow 2.14.0
- TensorRT 8.6.1
- ONNX Runtime 1.16.2
- Polygraphy 0.49.0
- GraphSurgeon 0.3.27
- tf2onnx 1.16.1
- Other component versions depend on the framework container versions used. See the support matrix for a detailed summary.
Triton Model Navigator v0.7.5
Updates:
- new: FP8 precision support for TensorRT (see the sketch after this list)
- new: Support for autocast and inference mode configuration for Torch runners
- new: Allow selecting the device for Torch and ONNX runners
- new: Add support for `default_model_filename` in Triton model configuration
- new: Detailed profiling of inference steps (pre- and post-processing, memcpy, and compute)
- fix: JAX export and TensorRT conversion fail when a custom workspace is used
- fix: Missing max workspace size passed to TensorRT conversion
- fix: Execution of TensorRT optimize raises an error while handling output metadata
- fix: Limited the Polygraphy version to work correctly with the onnxruntime-gpu package
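
A hedged sketch of requesting FP8 for the TensorRT path via a custom config; `nav.torch.optimize` and `nav.TensorRTConfig` belong to this API family, but the `"fp8"` value spelling is an assumption:

```python
import torch
import model_navigator as nav

dataloader = [torch.randn(2, 8) for _ in range(4)]

# Request FP8 (v0.7.5) for the TensorRT conversion path;
# value spelling assumed.
package = nav.torch.optimize(
    model=torch.nn.Linear(8, 8),
    dataloader=dataloader,
    custom_configs=[nav.TensorRTConfig(precision=("fp8",))],
)
```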
Version of external components used during testing:
- PyTorch 2.2.0a0+6a974be
- TensorFlow 2.13.0
- TensorRT 8.6.1
- ONNX Runtime 1.16.2
- Polygraphy 0.49.0
- GraphSurgeon 0.3.27
- tf2onnx 1.15.1
- Other component versions depend on the framework container versions used. See the support matrix for a detailed summary.