TensorRT FP16 Problem #7
-
Sorry about that, I'm not familiar with the details of the NN behind w2x, but when I tried to execute:

```python
from vsmlrt import Waifu2x, Waifu2xModel, Backend

Waifu2x(src_rgbs, 0, 1, model=Waifu2xModel.cunet, backend=Backend.TRT(fp16=True, num_streams=2)).set_output()
```

the trtexec console keeps logging warnings. Could anybody explain the "issues when converted to FP16"? Is it avoidable?

Machine: 5950X, 3090 Ti

P.S. Comparing fp16=True and fp16=False, the ~3x difference in running speed suggests that FP16 does actually take effect (see the output comparison sketch below).
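For completeness, here is a minimal sketch (not from the original post) of how the fp16 and fp32 outputs could be compared numerically with VapourSynth's std.PlaneStats; the blank RGBS clip is just a placeholder standing in for src_rgbs:

```python
import vapoursynth as vs
from vsmlrt import Waifu2x, Waifu2xModel, Backend

core = vs.core

# Placeholder input standing in for src_rgbs (assumption: any RGBS clip works here).
src_rgbs = core.std.BlankClip(width=640, height=360, format=vs.RGBS, length=1)

out_fp16 = Waifu2x(src_rgbs, 0, 1, model=Waifu2xModel.cunet, backend=Backend.TRT(fp16=True))
out_fp32 = Waifu2x(src_rgbs, 0, 1, model=Waifu2xModel.cunet, backend=Backend.TRT(fp16=False))

# With two clips, PlaneStats attaches PlaneStatsDiff: the normalized mean
# absolute difference between the two outputs on plane 0.
diff = core.std.PlaneStats(out_fp16, out_fp32)
print(diff.get_frame(0).props['PlaneStatsDiff'])
```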
-
These NNs are usually not trained with quantization in mind: since not all fp32 values can be represented in fp16, the output of fp16 acceleration may differ from fp32 inference. This warning was introduced in TensorRT 8.4 to warn against naive fp16 quantization, but in practice I think it can be ignored in most cases.
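The representation gap the warning refers to can be illustrated outside of TensorRT by round-tripping fp32 values through fp16 with NumPy (a minimal sketch, not tied to any particular network):

```python
import numpy as np

# Round-trip fp32 values through fp16 to see the quantization error.
rng = np.random.default_rng(0)
x32 = (rng.standard_normal(10_000) * 1e-3).astype(np.float32)  # small, weight-like values

x16 = x32.astype(np.float16).astype(np.float32)

abs_err = np.abs(x32 - x16)
rel_err = abs_err / np.maximum(np.abs(x32), np.finfo(np.float32).tiny)
print("max abs error:", abs_err.max())
print("max rel error:", rel_err.max())

# fp16 also has a much narrower range: values below the smallest normal become
# subnormal (losing precision) and values above the max overflow to inf.
print("fp16 smallest normal:", np.finfo(np.float16).tiny)  # ~6.1e-05
print("fp16 max:", np.finfo(np.float16).max)               # 65504.0
```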