🐛 [Bug] Detectron2 with TensorRT is slower than vanilla Detectron2 #2098
Comments
Torch-TensorRT does not use ONNX, FYI. As for PyTorch vs. TensorRT speed, it would be good to know how much of the model is being converted. Printing out the debug logs will tell you (you should also be able to print the compiled graph).
I set Python's logger level, but that did not produce the debug logs. @narendasan, can you be more specific (ideally in code) about how to print them?
```python
import torch_tensorrt
...
# Wrapping compilation in the debug logging context prints the full
# conversion log, including which ops are converted to TensorRT and
# which fall back to Torch.
with torch_tensorrt.logging.debug():
    trt_module = torch_tensorrt.compile(my_module, ...)
results = trt_module(input_tensors)
```
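As mentioned above, the compiled graph itself can also be printed. Assuming the TorchScript frontend (the default here), the object returned by torch_tensorrt.compile is a torch.jit.ScriptModule, so a quick sketch would be:

```python
# The compiled module is a torch.jit.ScriptModule; segments handed to
# TensorRT appear as engine-execution nodes in the printed graph, while
# everything else still runs in Torch.
print(trt_module.graph)
```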
@narendasan Thanks for the detailed instructions, I am able to dump the logs, which are attached below. The logs are huge, and I can see TensorRT doing a ton of work.
Yeah, so the reason I was asking for the logs: the first thing to look at regarding performance with Torch-TensorRT is how much the graph is getting cut up. Looking here, there are upwards of 130 "blocks", i.e. graph breaks. The more switches between PyTorch and TensorRT, the worse the performance. The graph gets cut up based on what Torch-TensorRT's converter library supports, so implementing support for key ops typically improves performance. The next thing I see is that there are some ops TRT just won't be able to run, e.g. requires_grad, and this causes some dependent ops to stay in Torch even where we have support. @bowang007 any idea if we can get more of the Torch ops in this graph to run in TRT? In particular, graphs like the one quoted from the log.
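One way to keep training-only attributes such as requires_grad out of the graph before compilation is to script and freeze the module in eval mode. A minimal sketch, assuming a placeholder model class (whether freezing removes these particular nodes from detectron2's graph is untested here):

```python
import torch

# MyDetector is a placeholder for the detectron2 model. Freezing an
# eval-mode ScriptModule inlines submodules and folds parameters and
# attributes into constants, which can drop training-only bookkeeping
# (e.g. requires_grad accesses) from the graph before TensorRT conversion.
model = MyDetector().eval().cuda()
frozen = torch.jit.freeze(torch.jit.script(model))
```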
@narendasan thanks for your great analysis! This makes a lot of things clearer! In addition, in the graph you quoted, it seems that you are pointing at https://github.com/facebookresearch/detectron2/blob/main/detectron2/modeling/backbone/resnet.py#L362. Given that ResNet is pretty old stuff, I'm surprised that the module isn't compiled into one big single graph, unless the detectron2 guys are doing something wrong or weird?
Hi @fumin, thanks for sharing the log. Basically, after compilation most of this model still runs in Torch; I think that's the reason you observed a slowdown.
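For anyone debugging the same issue: the TorchScript frontend also exposes partitioning settings that control how the graph is split between Torch and TensorRT. A sketch with a placeholder model and input shape (the pinned op is purely illustrative):

```python
import torch_tensorrt

# min_block_size avoids offloading tiny TensorRT segments whose
# engine-launch overhead outweighs their speedup; torch_executed_ops pins
# specific ops to Torch deliberately. Alternatively, passing
# require_full_compilation=True makes compilation fail on the first
# unsupported op, which quickly enumerates the missing converters.
trt_module = torch_tensorrt.compile(
    model,                                             # placeholder module
    inputs=[torch_tensorrt.Input((1, 3, 800, 800))],   # placeholder shape
    min_block_size=5,
    torch_executed_ops=["aten::nonzero"],              # illustrative op choice
)
```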
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
Description
Detectron2 with TensorRT is slower than vanilla Detectron2 out of the box
Environment
TensorRT Version: 8.6.1.6-1+cuda11.8
NVIDIA GPU: NVIDIA GeForce RTX 3060
NVIDIA Driver Version: 530.41.03
CUDA Version: 11.8
CUDNN Version: 8.9.2
Operating System:
Python Version (if applicable): 3.8.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.0.1+cu118
Baremetal or Container (if so, version):
Relevant Files
track.zip
Model link:
Steps To Reproduce
Unzip the file above, which results in a single reproducible Python script.
Run it with `python track.py`. By default it does not use TensorRT, and it will print:
Now comment out line 504 of track.py, which essentially compiles the model with TensorRT and runs the compiled TensorRT model.
It will print something like:
As you can see, whereas vanilla PyTorch runs at 10.5 FPS, TensorRT runs at only 9.3 FPS!
How could compiled TensorRT be slower than barebones PyTorch!?
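As a sanity check on numbers like these, here is a minimal FPS-measurement sketch (the module and example input are placeholders); without explicit synchronization, asynchronous CUDA execution can make either framework's timing misleading:

```python
import time
import torch

def measure_fps(module, example, iters=100, warmup=10):
    # Placeholder benchmark: time `iters` forward passes after a warm-up,
    # so one-time initialization costs are excluded from the measurement.
    with torch.no_grad():
        for _ in range(warmup):
            module(example)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            module(example)
        torch.cuda.synchronize()  # drain queued GPU work before stopping the clock
    return iters / (time.perf_counter() - start)
```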
Commands or scripts:
Have you tried the latest release?:
Yes, mine is the latest version.
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (`polygraphy run <model.onnx> --onnxrt`): Yes, as far as I know, torch_tensorrt utilizes ONNX in an intermediate phase, so it's possible to extract the ONNX model from my script above.
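For completeness, testing the ONNXRuntime path the template suggests would start from a standard export; a sketch with placeholder model, input, and opset (note that, as stated above, Torch-TensorRT itself does not go through ONNX):

```python
import torch

# Placeholder export; the resulting file could then be checked with
#   polygraphy run model.onnx --onnxrt
torch.onnx.export(model, example_input, "model.onnx", opset_version=17)
```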
P.S. This was previously filed as NVIDIA/TensorRT#3116, but the folks over there suggested reporting this to Torch-TensorRT, too.