Convert model.bin (fp32) to model.bin (int8) #1761
The model.bin format is quite strict: the weights and their sizes are stored contiguously. There is currently no way to save the quantized weights while loading the model.
However, quantization can be applied while converting the model from, say, OpenNMT-py, and the quantized weights are saved at that point. I need this to reduce the size of the model.bin file. Would it be possible to add an API that quantizes an existing model and saves it as a new model.bin file? That seems logical to me, since such an option exists in most frameworks that support quantization.
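For the common case where the original OpenNMT-py checkpoint is still available, the conversion can simply be re-run with int8 quantization so the quantized weights are written directly into a new model.bin. A minimal sketch using the Python converter API; the file names `model.pt` and `ct2_int8` are placeholders, not names from this issue:

```python
# Sketch: re-convert an OpenNMT-py checkpoint with int8 quantization.
# Assumes the original checkpoint "model.pt" is still available;
# the paths here are illustrative placeholders.
import ctranslate2

converter = ctranslate2.converters.OpenNMTPyConverter("model.pt")
converter.convert("ct2_int8", quantization="int8")
```

The same effect can be obtained from the command line with the converter's `--quantization int8` option; either way the size reduction happens at conversion time, not at load time.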
I think it could be implemented fairly easily with the existing code, but I cannot figure out how to get the model_spec of an existing model.bin. Once that is available, the converter code could be modified to save the quantized model again.
We could develop a new feature to save the quantized tensors in a new model.bin, but it isn't simple (it would require a new converter that loads a model from model.bin back into a spec). Currently, we have no plan to do it.
I have a pretrained model.bin file which was earlier converted using OpenNMTPy converter using fp32 quantisation. Now i want to reduce the size of the model and thought of quantising it to int8. But, I was only able to find ways to quantise it upon loading of the model, not able to find how to save it for further use.
Any idea how it can be achieved?
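For reference, the quantize-on-load path mentioned above looks roughly like the sketch below. The model directory name is a placeholder, and this approach only reduces memory use at runtime; it does not shrink the model.bin on disk:

```python
# Sketch: load an fp32 CTranslate2 model and quantize its weights to int8 at load time.
# This affects runtime memory only; the model.bin on disk is unchanged.
import ctranslate2

translator = ctranslate2.Translator(
    "ct2_fp32_model",      # placeholder path to the converted fp32 model
    device="cpu",
    compute_type="int8",   # weights are quantized when the model is loaded
)

# Illustrative call; the actual tokens depend on the model's tokenizer.
print(translator.translate_batch([["▁Hello", "▁world"]]))
```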