Different way of saving models #236

Open
niekdejonge opened this issue Sep 30, 2024 · 2 comments

@niekdejonge (Collaborator)

We currently save models using torch.save() and torch.load(). As far as I could find, this is the recommended way for PyTorch: https://pytorch.org/tutorials/beginner/saving_loading_models.html#:~:text=Saving%20the%20model%27s%20state_dict%20with,recommended%20method%20for%20saving%20models.
However, this method relies on pickle, which poses a security risk. SonarCloud now flags this as a security hotspot: https://sonarcloud.io/project/security_hotspots?id=matchms_ms2deepscore&branch=cleaning_refactoring_232&issueStatuses=OPEN%2CCONFIRMED&sinceLeakPeriod=true&tab=code
One way to reduce the risk would be to load only the weights (see pytorch/pytorch#52181). However, we need the settings too, so this is not really an option unless we store the settings separately in a JSON file, which I don't think is ideal.
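
For reference, a minimal sketch of that weights-only route (the SiameseModel stand-in and file names are hypothetical): torch.load(..., weights_only=True) restricts unpickling to tensors and primitive types, which is exactly why the settings would have to travel in a separate file.

import json

import torch
from torch import nn

# Hypothetical stand-in for the actual ms2deepscore model class.
class SiameseModel(nn.Module):
    def __init__(self, embedding_dim: int):
        super().__init__()
        self.encoder = nn.Linear(1000, embedding_dim)

    def forward(self, x):
        return self.encoder(x)

# weights_only=True restricts unpickling to tensors and primitive types,
# so custom objects (such as a settings object) cannot ride along.
state_dict = torch.load("model.pt", weights_only=True)

# The architecture settings then have to come from somewhere else,
# e.g. a separate JSON file.
with open("model_settings.json") as f:
    settings = json.load(f)

model = SiameseModel(embedding_dim=settings["embedding_dim"])
model.load_state_dict(state_dict)
model.eval()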

So I currently don't see a good alternative for saving and loading. @florian-huber, do you know of an alternative that is commonly used with PyTorch?

@florian-huber (Contributor)

SonarCloud suggests the following solution (https://sonarcloud.io/organizations/matchms/rules?open=python%3AS6985&rule_key=python%3AS6985):

import torch
import safetensors.torch  # the torch submodule must be imported explicitly

model = MyModel()  # the model class still has to be defined and instantiated
safetensors.torch.load_model(model, 'model.pth')

But that would require saving the model differently as well.
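
For completeness, the matching save side would look roughly like this (a sketch, reusing the model instance from the snippet above). safetensors stores only tensors; the format does allow a small string-to-string metadata dict, which might hold serialized settings, but that is just a possibility, not something the package does for us:

import safetensors.torch

# save_model writes only the model's tensors in the safetensors format;
# the optional metadata dict (str -> str) could carry serialized settings.
safetensors.torch.save_model(
    model,
    "model.safetensors",
    metadata={"embedding_dim": "200"},  # illustrative
)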

@niekdejonge (Collaborator, Author) commented Oct 30, 2024

@florian-huber The issue with that solution is that, as far as I can tell, it would not store settings like the embedding dimensions, so we would either have to give up that flexibility or add a separate settings file.

An alternative, which can be read about here, stores the complete runnable model: https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-load-entire-model

import torch

model_scripted = torch.jit.script(model)  # export to TorchScript
model_scripted.save('model_scripted.pt')  # save

# load
model = torch.jit.load('model_scripted.pt')
model.eval()

Using the TorchScript format, you will be able to load the exported model and run inference without defining the model class.

This is how I implemented running ms2deepscore in MZmine (Java). The only difficulty is that the loaded deep learning model only maps tensorized_spectrum -> embedding, while the tensorization of spectra is not fixed for our models, since we can use different kinds of metadata and binning widths. So the tensorization settings would have to be shared next to the model (which is what I do for running in MZmine).
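
To make that split concrete, a minimal sketch (the input shape is illustrative): the scripted model covers only the tensor-to-embedding step, so the caller has to reproduce the binning and metadata tensorization themselves.

import torch

model = torch.jit.load('model_scripted.pt')
model.eval()

# The caller must reproduce the tensorization (binning widths, metadata
# encoding) used at training time; the scripted model starts from the tensor.
tensorized_spectrum = torch.zeros(1, 1000)  # illustrative shape
with torch.no_grad():
    embedding = model(tensorized_spectrum)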

We could start using this method for saving/loading ms2deepscore in the Python package as well, but it would also require sharing the settings file for determining the tensorization, which is not ideal, since it means two files. We could of course just zip them; a sketch of that idea follows below.
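
A minimal sketch of the zip idea (all file names and the settings content are illustrative):

import io
import json
import zipfile

import torch

# Bundle the TorchScript model and the tensorization settings into one file.
settings = {"binning_width": 0.1, "metadata_fields": ["ionmode"]}  # illustrative
with zipfile.ZipFile("ms2deepscore_model.zip", "w") as archive:
    archive.write("model_scripted.pt")
    archive.writestr("settings.json", json.dumps(settings))

# Load both members back from the single archive.
with zipfile.ZipFile("ms2deepscore_model.zip") as archive:
    settings = json.loads(archive.read("settings.json"))
    model = torch.jit.load(io.BytesIO(archive.read("model_scripted.pt")))
model.eval()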
