
Releases: triton-inference-server/model_navigator

Triton Model Navigator v0.12.0

10 Sep 12:27
Updates:

    • new: simple and detailed reporting of the optimization process
    • new: adjusted TensorFlow SavedModel export for Keras 3.x
    • new: inform the user when a wrapped module is not called during optimize (see the sketch after this list)
    • new: inform the user when a module uses a custom forward function
    • new: support for dynamic shapes in Torch ExportedProgram
    • new: use ExportedProgram for Torch-TensorRT conversion
    • new: support a back-off policy during profiling to avoid reporting a local minimum
    • new: automatically scale the conversion batch size when modules have different batch sizes within a single pipeline
    • change: the TensorRT conversion max batch size search relies on saturating throughput for base formats
    • change: adjusted the profiling configuration for the throughput cutoff search
    • change: include the optimized pipeline in the list of examined variants during nav.profile
    • change: performance is not measured when correctness fails for a format and runtime
    • change: the verify command is not executed when no verify function is provided
    • change: do not create a model copy before executing torch.compile
    • fix: pipelines sometimes placed the model and tensors on different devices during nav.profile
    • fix: extract the graph from ExportedProgram for running inference
    • fix: runner configuration not propagated to pre-processing steps
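
The warnings above apply to the inplace optimization flow. Below is a minimal sketch of that flow, assuming the nav.Module / nav.optimize inplace API; the pipeline function, module name, and dataloader contents are illustrative, and the exact dataloader format may differ.

```python
import torch
import model_navigator as nav

# Wrap the module in-place; nav.Module registers it for optimization.
model = nav.Module(torch.nn.Linear(8, 4), name="linear")

def pipeline(batch):
    # The wrapped module must be invoked inside the optimized function;
    # v0.12.0 now warns when a wrapped module is never called here.
    return model(batch)

# A small list of sample batches serves as the dataloader.
dataloader = [torch.randn(2, 8) for _ in range(10)]

nav.optimize(pipeline, dataloader)
```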
Version of external components used during testing:

Triton Model Navigator v0.11.0

05 Aug 12:44
Updates:

    • new: Python 3.12 support
    • new: improved logging
    • new: an optimized in-place module can be stored in a Triton model repository
    • new: multi-profile support for TensorRT model build and runtime
    • new: measure the duration of each command executed in the optimization pipeline
    • new: TensorRT-LLM model store generation for deployment on Triton Inference Server
    • change: filter unsupported runners instead of raising an error when running optimize
    • change: moved JAX support to an experimental module with limited support
    • change: use autocast=True for Torch-based runners
    • change: use the torch.inference_mode or torch.no_grad context in nav.profile measurements
    • change: use multiple strategies to select the optimized runtime; defaults to [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy] (see the sketch after this list)
    • change: trt_profiles are not set automatically for a module when using nav.optimize
    • fix: properly revert the log level after torch onnx dynamo export
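
A hedged sketch of the new runtime selection, assuming the strategy classes are exposed on the nav namespace and that nav.load_optimized() accepts a strategies argument; both assumptions are extrapolated from the default list quoted above.

```python
import model_navigator as nav

# After nav.optimize() has run, pick the runtime with an explicit
# strategy order instead of the default
# [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy].
nav.load_optimized(
    strategies=[
        nav.MinLatencyStrategy(),                  # prefer the lowest latency
        nav.MaxThroughputAndMinLatencyStrategy(),  # fall back to the balanced pick
    ]
)
```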
Version of external components used during testing:

Triton Model Navigator v0.10.1

26 Jun 19:23

Triton Model Navigator v0.10.0

24 Jun 12:49
Updates:

    • new: inplace nav.Module accepts a batching flag, which overrides the config setting, and a precision flag, which sets the appropriate configuration for TensorRT (see the sketch after this list)
    • new: allow setting the device when loading optimized modules using nav.load_optimized()
    • new: add support for custom i/o names and dynamic shapes in the Torch ONNX Dynamo path
    • new: added nav.bundle.save and nav.bundle.load to save and load optimized models from the cache
    • change: improved optimize and profile status in inplace mode
    • change: improved handling of defaults for ONNX Dynamo when executing nav.package.optimize
    • fix: maintain the modules' device in nav.profile()
    • fix: add support for all precisions for TensorRT in nav.profile()
    • fix: forward method not passed to other inplace modules
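
A minimal sketch combining the three additions above; the string form of the precision value ("fp16") and the bundle path are assumptions, while batching, precision, device, and the nav.bundle functions come from the notes themselves.

```python
import torch
import model_navigator as nav

# Per-module batching and precision flags override the global config;
# precision steers the TensorRT conversion for this module.
decoder = nav.Module(
    torch.nn.Linear(16, 16),
    name="decoder",
    batching=True,
    precision="fp16",  # assumed string form of the precision flag
)

# ... run nav.optimize(...) on a pipeline that calls `decoder` ...

# Load the optimized variants onto a specific device (new in v0.10.0).
nav.load_optimized(device="cuda")

# Persist the optimization cache and restore it elsewhere.
nav.bundle.save("navigator.bundle")
nav.bundle.load("navigator.bundle")
```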
Version of external components used during testing:

Triton Model Navigator v0.9.0

07 May 18:21
Updates:

    • new: TensorRT timing tactics cache management - use timing tactics cache files to speed up optimization
    • new: added throughput saturation verification in nav.profile() (enabled by default)
    • new: allow overriding the Inplace cache dir through the MODEL_NAVIGATOR_DEFAULT_CACHE_DIR env variable
    • new: inplace nav.Module can now receive a function name to be used instead of __call__ in modules/submodules, which allows customizing modules with non-standard calls (see the sketch after this list)
    • fix: torch dynamo export and torch dynamo onnx export
    • fix: measurement stabilization in nav.profile()
    • fix: inplace inference through Torch
    • fix: trt_profiles argument handling in ONNX to TRT conversion
    • fix: optimal shape configuration for batch size in Inplace API
    • change: Disable TensorRT profile builder
    • change: nav.optimize() does not override module configuration
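
A hedged sketch of the cache-dir override and the custom entry point; the keyword name forward_func is hypothetical (the notes only say a function name can be passed), and the timing of the env variable is an assumption.

```python
import os
import torch
import model_navigator as nav

# Override the Inplace cache dir (new in v0.9.0); presumably this must be
# set before any module is wrapped or optimized.
os.environ["MODEL_NAVIGATOR_DEFAULT_CACHE_DIR"] = "/tmp/nav_cache"

class Generator(torch.nn.Module):
    def generate(self, x):  # non-standard entry point instead of __call__
        return x * 2

# `forward_func` is a hypothetical keyword: point nav.Module at the method
# to optimize when the module is not invoked through __call__.
generator = nav.Module(Generator(), name="generator", forward_func="generate")
```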
Known issues and limitations:

    • DistilBERT ONNX dynamo export does not support dynamic shapes
Version of external components used during testing:

Triton Model Navigator v0.8.1

04 Apr 14:04
  • fix: inference with TensorRT when the model has an input with an empty shape
  • fix: using stabilized runners when the model has no batching
  • fix: invalid dependencies for cuDNN - review known issues
  • fix: make ONNX GraphSurgeon produce artifacts within the protobuf limit (2 GB)
  • change: Remove TensorRTCUDAGraph from default runners
  • change: updated ONNX package to 1.16.0

Triton Model Navigator v0.8.0

22 Mar 16:23

Updates:

Triton Model Navigator v0.7.7

09 Feb 05:43

Updates:

  • change: Add input and output specs for Triton model repositories generated from packages

Version of external components used during testing:

Triton Model Navigator v0.7.6

29 Jan 12:00

Updates:

  • fix: Passing inputs for Torch to ONNX export
  • fix: Passing input data to OnnxCUDA runner

Version of external components used during testing:

Triton Model Navigator v0.7.5

20 Dec 18:37

Updates:

  • new: FP8 precision support for TensorRT
  • new: support for autocast and inference mode configuration for Torch runners (see the sketch after this list)
  • new: allow selecting the device for Torch and ONNX runners
  • new: add support for default_model_filename in Triton model configuration
  • new: detailed profiling of inference steps (pre- and postprocessing, memcpy and compute)
  • fix: JAX export and TensorRT conversion fail when a custom workspace is used
  • fix: missing max workspace size passed to TensorRT conversion
  • fix: execution of TensorRT optimize raised an error while handling output metadata
  • fix: limited the Polygraphy version to work correctly with the onnxruntime-gpu package
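
A hedged sketch of the Torch runner configuration, assuming nav.torch.optimize accepts a nav.TorchConfig custom config carrying the new autocast / inference-mode switches; the exact field names are assumptions based on the entries above.

```python
import torch
import model_navigator as nav

model = torch.nn.Linear(8, 4)
dataloader = [torch.randn(2, 8) for _ in range(10)]

package = nav.torch.optimize(
    model=model,
    dataloader=dataloader,
    custom_configs=[
        # Assumed fields: run Torch runners under torch.autocast and
        # torch.inference_mode during measurements.
        nav.TorchConfig(autocast=True, inference_mode=True),
    ],
)
```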

Version of external components used during testing: