Reduce Simplemma memory usage using TrieDictionaryFactory #812

Open
osma opened this issue Oct 21, 2024 · 0 comments

osma commented Oct 21, 2024

Since version 1.1.0, Simplemma has supported trie-backed data structures (TrieDictionaryFactory) that substantially reduce memory requirements, at the cost of runtime performance.
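For reference, a minimal sketch of how the trie-backed dictionaries might be switched on, based on the usage shown in the Simplemma README. The helper name `build_lemmatizer` and the `use_trie` flag are hypothetical and not part of Annif or Simplemma. I believe `TrieDictionaryFactory` also needs the `marisa-trie` package to be installed, which is why the imports are kept inside the function here.

```python
def build_lemmatizer(use_trie: bool = False):
    """Return a Simplemma Lemmatizer, optionally backed by tries.

    Hypothetical helper for illustration; not existing Annif code.
    """
    # Imported lazily so the module loads even without Simplemma installed.
    from simplemma import Lemmatizer
    from simplemma.strategies import DefaultStrategy

    if not use_trie:
        # Default behaviour: in-memory dictionaries.
        return Lemmatizer()

    # Assumption: the marisa-trie package is available for the trie backend.
    from simplemma.strategies.dictionaries import TrieDictionaryFactory

    return Lemmatizer(
        lemmatization_strategy=DefaultStrategy(
            dictionary_factory=TrieDictionaryFactory()
        )
    )


# Example usage (requires simplemma >= 1.1.0):
#   lemmatizer = build_lemmatizer(use_trie=True)
#   lemmatizer.lemmatize("masks", lang="en")
```

Wrapping the choice in one function would make it easy to expose as a configuration option later, or to flip the default if the slowdown turns out to be acceptable.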

I think this could be a good trade-off for Annif, because the lemmatization performance is probably not the main bottleneck in processing, but memory can be costly.

We should investigate how enabling this support would affect Annif and either make it available as an option or, if the slowdown is acceptable, switch to it entirely.
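One way to start the investigation could be a small harness that builds a lemmatizer, runs it over a token list, and reports peak memory and elapsed time. This is only a sketch: `measure` is a hypothetical helper, and `tracemalloc` only sees allocations made through Python's allocator, so memory held by a native trie library may not show up. Comparing process RSS would be needed for a full picture.

```python
import time
import tracemalloc
from typing import Callable, Iterable, Tuple


def measure(
    build: Callable[[], Callable[[str], str]],
    tokens: Iterable[str],
) -> Tuple[int, float]:
    """Return (peak traced bytes, elapsed seconds) for build + lemmatization.

    `build` is a zero-argument function returning a one-argument
    lemmatize callable, so both setup and lookups are measured.
    """
    tracemalloc.start()
    start = time.perf_counter()
    lemmatize = build()          # includes any dictionary loading done here
    for token in tokens:
        lemmatize(token)         # lazy loading, if any, is triggered here
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, elapsed


# Example usage with a stand-in lemmatizer; a real comparison would pass
# builders for the default and the trie-backed Simplemma lemmatizers.
peak_bytes, seconds = measure(lambda: str.lower, ["Cats", "Dogs"])
print(f"peak={peak_bytes} bytes, elapsed={seconds:.6f} s")
```

Running the same token list through both configurations should give a rough idea of the memory saved versus the time lost.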

One question is how to initialize the tries. Simplemma does this lazily the first time a language is needed, but this could be problematic for Annif, especially if it's running as a service. So maybe there should be a separate CLI operation that performs the initialization just once for all languages.
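A rough sketch of what such a one-off initialization step might look like. It assumes that lemmatizing a single token is enough to make Simplemma build and cache the trie for that language, which would need to be verified. The function name `warm_up` and the probe token are hypothetical.

```python
from typing import Callable, Iterable, List


def warm_up(
    lemmatize: Callable[[str, str], str],
    languages: Iterable[str],
    probe: str = "a",
) -> List[str]:
    """Call the lemmatizer once per language to force lazy initialization.

    `lemmatize` takes (token, language code). Returns the languages that
    were touched, in order, so a CLI command could report them.
    """
    warmed = []
    for lang in languages:
        # Assumption: a single lookup triggers loading for this language.
        lemmatize(probe, lang)
        warmed.append(lang)
    return warmed


# Example usage with Simplemma (not executed here):
#   lemmatizer = Lemmatizer(...)  # configured with TrieDictionaryFactory
#   warm_up(lambda tok, lang: lemmatizer.lemmatize(tok, lang=lang),
#           ["en", "fi", "sv"])
```

A CLI command could call this once for the languages configured in the projects, so that a long-running service never pays the initialization cost on a live request.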
