TensorRT FP16 Problem #7

Answered by WolframRhodium
RyougiKukoc asked this question in Q&A

These NNs are usually not trained with quantization in mind: since not all fp32 values can be represented in fp16, the output of fp16-accelerated inference may differ from fp32 inference. This warning was introduced in TensorRT 8.4 to flag naive fp16 conversion, but in practice I think it can be ignored most of the time.
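
As a small illustration (not from the original answer), here is a minimal NumPy sketch of why fp16 and fp32 outputs diverge: fp16 keeps only 10 mantissa bits and its largest finite value is 65504, so casting a network's fp32 weights and activations down rounds some values and can overflow others. The `out_fp32` / `out_fp16` names below are hypothetical placeholders for the outputs of two engine builds.

```python
import numpy as np

# fp16 keeps only 10 mantissa bits, so most fp32 values round when cast down.
x = np.float32(0.1)
print(np.float16(x) - x)  # small but nonzero rounding error (~ -2.4e-05)

# fp16 also has a much narrower dynamic range: the largest finite value
# is 65504, so anything bigger overflows to inf.
print(np.float16(1e5))  # inf (NumPy may warn about the overflowing cast)

# A common sanity check is to feed the same input to an fp32 build and an
# fp16 build and compare with a loose tolerance; out_fp32 / out_fp16 are
# hypothetical output arrays from the two engines:
# np.testing.assert_allclose(out_fp16, out_fp32, rtol=1e-2, atol=1e-3)
```

For context, fp16 mode is opted into at engine build time (in the TensorRT Python API via the `config.set_flag(trt.BuilderFlag.FP16)` builder flag), which is the point where TensorRT 8.4+ emits this warning.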

Answer selected by RyougiKukoc