I have a trained ONNX model that needs to be quantized to INT8, but I want my last fully connected layers to stay in FP32 or FP16. How can I choose specific layers to quantize (or exclude from quantization)?
P.S. When I was working with NNCF, I just used the ignored_scopes parameter. Is there something similar in DL Workbench?
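For reference, this is roughly how the ignored_scopes option mentioned above is used in an NNCF configuration. It is only a minimal sketch: the input shape and the scope name are placeholders that depend on your model and its layer naming.

```python
# Rough illustration of the NNCF training-time quantization config the
# question refers to; "MyModel/Linear[fc]" is a placeholder scope name.
from nncf import NNCFConfig

nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {
        "algorithm": "quantization",
        # Layers matching these scopes are excluded from quantization,
        # e.g. the final fully connected head.
        "ignored_scopes": ["MyModel/Linear[fc]"],
    },
})
```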
DL Workbench uses POT for quantization. It seems that POT does not provide a layer-ignoring option like NNCF does.
You can use the Jupyter notebooks in DL Workbench for more fine-grained quantization and inference: in the notebooks you can pass your own parameters to the OpenVINO tools (in your case, for quantization), and you can also install any additional tools you need (in your case, NNCF) and use them alongside the rest of the OpenVINO packages.
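For example, inside a DL Workbench notebook one could install a recent NNCF release and quantize the ONNX model directly while keeping the last layers in their original precision. This is only a sketch under the assumption that the installed NNCF version provides the post-training quantization API (nncf.quantize with ignored_scope); my_data_loader and the node names are placeholders for your own calibration data and layer names.

```python
# !pip install nncf onnx  # run once inside the notebook environment

import onnx
import nncf

model = onnx.load("model.onnx")

# Calibration data: any iterable of samples plus a function that turns a
# sample into the model input (my_data_loader is a placeholder here).
calibration_dataset = nncf.Dataset(my_data_loader, lambda item: item)

quantized_model = nncf.quantize(
    model,
    calibration_dataset,
    # Nodes listed here (e.g. the last fully connected layers) are left
    # in the original FP32/FP16 precision instead of being quantized.
    ignored_scope=nncf.IgnoredScope(names=["fc_head/MatMul", "fc_head/Add"]),
)

onnx.save(quantized_model, "model_int8.onnx")
```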
Should you have any more questions, please do contact us.