# Recommended Installation Method

## Triton SDK Container

The recommended way to "install" Perf Analyzer is to run the pre-built executable from within the Triton SDK docker container available on the NVIDIA GPU Cloud Catalog. As long as the SDK container has its network exposed to the address and port of the inference server, Perf Analyzer will be able to run.

```bash
export RELEASE=<yy.mm> # e.g. to use the release from the end of February of 2023, do `export RELEASE=23.02`

docker pull nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

docker run --gpus all --rm -it --net host nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

# inside container
perf_analyzer -m <model>
```
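If Perf Analyzer cannot connect, first confirm that the server is reachable from inside the SDK container. A minimal sketch, assuming the server exposes its HTTP endpoint on the default port 8000 on `localhost`:

```shell
# Query Triton's readiness endpoint; a running, ready server returns HTTP 200.
# An unreachable server prints "000" instead.
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
```

Because the container is started with `--net host`, it shares the host's network namespace, so a server listening on the host is reachable via `localhost`.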

# Alternative Installation Methods

## Pip

```bash
pip install tritonclient

perf_analyzer -m <model>
```

**Warning**: If any runtime dependencies are missing, Perf Analyzer will report errors indicating which ones they are. You will need to install them manually.
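One way to spot missing shared libraries on Linux (a diagnostic sketch, not part of the official instructions) is to inspect the installed binary with `ldd`:

```shell
# List the binary's shared-library dependencies; any line containing
# "not found" names a runtime library that still needs to be installed.
ldd "$(command -v perf_analyzer)" | grep "not found" || echo "no missing libraries"
```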

## Build from Source

The Triton SDK container is used for building, so some build and runtime dependencies are already installed.

```bash
export RELEASE=<yy.mm> # e.g. to use the release from the end of February of 2023, do `export RELEASE=23.02`

docker pull nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

docker run --gpus all --rm -it --net host nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

# inside container
# prep installing newer version of cmake
apt update && apt install -y gpg wget && \
  wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | tee /usr/share/keyrings/kitware-archive-keyring.gpg >/dev/null && \
  . /etc/os-release && \
  echo "deb [signed-by=/usr/share/keyrings/kitware-archive-keyring.gpg] https://apt.kitware.com/ubuntu/ $UBUNTU_CODENAME main" | tee /etc/apt/sources.list.d/kitware.list >/dev/null

# install build/runtime dependencies
apt update && apt install -y cmake-data=3.27.7* cmake=3.27.7* libcurl4-openssl-dev rapidjson-dev

rm -rf perf_analyzer ; git clone --depth 1 https://github.com/triton-inference-server/perf_analyzer

mkdir perf_analyzer/build ; cd perf_analyzer/build

cmake ..

make -j8 perf-analyzer

perf_analyzer/src/perf-analyzer-build/perf_analyzer -m <model>
```
- To enable CUDA shared memory, add `-DTRITON_ENABLE_GPU=ON` to the `cmake` command.
- To enable C API mode, add `-DTRITON_ENABLE_PERF_ANALYZER_C_API=ON` to the `cmake` command.
- To enable the TorchServe backend, add `-DTRITON_ENABLE_PERF_ANALYZER_TS=ON` to the `cmake` command.
- To enable the TensorFlow Serving backend, add `-DTRITON_ENABLE_PERF_ANALYZER_TFS=ON` to the `cmake` command.
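The options above can be combined in a single configure step. For example, to build with CUDA shared memory and C API mode both enabled, the `cmake ..` step becomes:

```shell
# Configure the build with optional features enabled (run from perf_analyzer/build)
cmake -DTRITON_ENABLE_GPU=ON -DTRITON_ENABLE_PERF_ANALYZER_C_API=ON ..
```

Re-run `make -j8 perf-analyzer` after changing any of these flags so the new configuration takes effect.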