INT8 Precision for higher FPS #55
Ironclad17 started this conversation in Ideas (1 comment, 1 reply)
-
Hi, int8 is a hidden advanced feature that may provide additional acceleration. However, applying int8 inference acceleration to neural networks is not an easy task: it usually requires sophisticated calibration to limit the loss of accuracy, and the problem is much more severe for image-processing tasks. Since the feature cannot be used out of the box, it requires advanced users to manually follow guides elsewhere.
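To illustrate why calibration matters (this is a toy sketch of symmetric max-abs quantization, not vs-mlrt or TensorRT code): the int8 scale is derived from representative data, and every value then round-trips through a 256-level grid, so a poorly chosen scale inflates the error the network must absorb.

```python
# Toy int8 quantization with max-abs calibration (illustrative only).
def quantize(x, scale):
    q = round(x / scale)
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale):
    return q * scale

# "Calibration": pick a scale from representative activation values
# so that the largest magnitude maps near the edge of the int8 range.
calib = [0.03, -0.8, 0.5, 1.2, -0.25]
scale = max(abs(v) for v in calib) / 127.0

# Round-trip one value and measure the quantization error.
x = 0.5
err = abs(dequantize(quantize(x, scale), scale) - x)
# err is bounded by half a quantization step (scale / 2)
```

If the calibration set misses the real activation range, values clip hard at ±127 (as `quantize(10.0, scale)` would here), which is the accuracy loss the comment above warns about.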
-
Considering the FPS gains that fp16-precision inference provides: https://github.com/AmusementClub/vs-mlrt/wiki/RealESRGANv2
Has anyone attempted int8 precision? The parameter appears to be accessible: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec-flags
It should also be accessible in ort: https://cloudblogs.microsoft.com/opensource/2022/05/02/optimizing-and-deploying-transformer-int8-inference-with-onnx-runtime-tensorrt-on-nvidia-gpus/
Apologies, I'm out of my depth. I have no idea what level of precision is necessary for a useful result, but a lot of research seems to focus on lower-precision inference and training for higher throughput.
https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
https://developer.nvidia.com/blog/int4-for-ai-inference/
https://youtu.be/hCxvS1dVufs?si=hqpzognE2-hCT8Gz&t=224
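One detail from the research linked above that affects how usable int8 is: quantization granularity. A single per-tensor scale struggles when magnitudes vary widely across channels (common for conv weights in image models), while per-channel scales lose much less. The following is a toy illustration of that trade-off, not code from vs-mlrt, TensorRT, or ONNX Runtime:

```python
# Toy comparison (illustrative only): per-channel int8 scales vs one
# per-tensor scale, for weights whose channel magnitudes differ a lot.
def quantize_dequantize(x, scale):
    q = max(-128, min(127, round(x / scale)))
    return q * scale

# Two hypothetical "channels" of conv weights with very different magnitudes.
channels = [[0.9, -0.45, 0.3], [0.012, -0.007, 0.004]]

# Per-tensor: one scale from the global max-abs value.
tensor_scale = max(abs(w) for ch in channels for w in ch) / 127.0
err_tensor = sum(abs(quantize_dequantize(w, tensor_scale) - w)
                 for ch in channels for w in ch)

# Per-channel: each channel gets its own scale, so the small-magnitude
# channel is no longer crushed into a handful of int8 levels.
err_channel = 0.0
for ch in channels:
    s = max(abs(w) for w in ch) / 127.0
    err_channel += sum(abs(quantize_dequantize(w, s) - w) for w in ch)
```

Here the small-magnitude channel lands on only one or two int8 levels under the shared scale, so `err_channel` comes out well below `err_tensor`; this is one reason int8 pipelines need more tuning than simply flipping a precision flag.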