High memory usage #15
Comments
Seems like we load the whole VCF into memory here (varifier/varifier/vcf_stats.py, line 70 at commit 04105de) and then we create a nested dictionary for each record.
Yes, we could rewrite that file so it doesn't load the VCF into memory. I wasn't expecting such big VCF files.
I am thinking about the simplest way to deal with this memory issue. Could we have a new function that returns the
Sounds good to me @leoisl. I was thinking similar. Definitely can't break backwards compatibility. That would break minos (and maybe gramtools), which needs things in memory because of all the VCF record merging it does.
I don't think that solves the memory issue though. It's not necessarily reading the VCF into memory that causes the memory explosion; I think creating a nested dictionary for each record contributes to it as well?
Yes, was thinking iterate over the VCF, update a final dict of stats. Don't store any of the intermediate dicts. |
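To make the suggestion above concrete, here is a minimal sketch of the "iterate and update one final dict" idea. It is not the code in vcf_stats.py: the function name `stream_vcf_stats` is hypothetical, and counting FILTER values is only a stand-in for whatever per-record stats varifier actually collects. The point is just that each record is parsed, folded into the running totals, and then dropped.

```python
import io
from collections import Counter


def stream_vcf_stats(lines):
    """Accumulate summary counts over VCF lines without keeping any record.

    Hypothetical helper: the stats gathered here (record count, FILTER
    counts) are placeholders, not varifier's real statistics.
    """
    stats = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        stats["records"] += 1
        stats["filter_" + fields[6]] += 1  # column 7 of a VCF line is FILTER
        # `fields` is rebound on the next iteration, so no per-record
        # structure outlives its own loop pass.
    return dict(stats)


# Tiny in-memory example; a real call would pass an open file handle instead.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "ref\t10\t.\tA\tG\t.\tPASS\t.\n"
    "ref\t25\t.\tC\tT\t.\tFAIL\t.\n"
    "ref\t40\t.\tG\tA\t.\tPASS\t.\n"
)
stats = stream_vcf_stats(example)
```

Because this would be a new function, the existing in-memory loader could stay as it is for callers such as minos that need all records at once.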
Varifier's memory usage seems quite excessive. For example, I had a ~350 MB VCF that took 13 GB of RAM to complete (the most I've seen so far is 21 GB).
Here is an idea of where the problem lies (produced by memory_profiler).

Is there a way we could be more efficient with the way we get per-record stats?
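As a rough way to check whether holding every per-record dict is the expensive part, the sketch below compares peak allocations for a "load everything, then count" approach against a streaming one. It uses the standard library's `tracemalloc` rather than memory_profiler, and the records are made-up nested dicts, not real parsed VCF records; all function names here are hypothetical.

```python
import tracemalloc
from collections import Counter


def fake_records(n):
    """Yield n made-up nested dicts (stand-ins for parsed VCF records)."""
    for i in range(n):
        yield {"pos": i, "info": {"filter": "PASS" if i % 2 else "FAIL"}}


def count_after_loading(n):
    records = list(fake_records(n))  # every nested dict is alive at once
    return Counter(r["info"]["filter"] for r in records)


def count_while_streaming(n):
    # Only one record dict is alive at any moment.
    return Counter(r["info"]["filter"] for r in fake_records(n))


def peak_bytes(func, *args):
    """Return (result, peak traced allocation in bytes) for one call."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears traces, so the next call starts fresh
    return result, peak


loaded, loaded_peak = peak_bytes(count_after_loading, 20000)
streamed, streamed_peak = peak_bytes(count_while_streaming, 20000)
print(f"load-all peak: {loaded_peak} bytes, streaming peak: {streamed_peak} bytes")
```

Both functions return the same counts; only the peak differs. The absolute numbers depend on the Python version and say nothing about varifier's real footprint, so this is only a way to see the shape of the problem.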