Continuously increasing RAM demand #21
Hm, that is strange. In the current version I reset everything between contigs (which I probably shouldn't do, since contigs in the same file should probably be treated as the same genome by default, with a --meta flag to treat them as separate genomes). I wonder if Python's automatic garbage collector isn't clearing the old memory between contigs. Or perhaps I have a memory leak somewhere. Thanks for the info, I will try to track down the cause. If you have the time, can you check whether the issue happens with my newer code?
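As a rough sketch of the reset idea (hypothetical names, not PHANOTATE's actual structures): clear the per-contig containers at the end of each contig so the old objects become unreachable, optionally nudging the collector.

```python
import gc

class ContigProcessor:
    """Toy model of per-contig state that must be reset between contigs."""
    def __init__(self):
        self.orfs = []   # per-contig ORF candidates (hypothetical)
        self.edges = {}  # per-contig connection weights (hypothetical)

    def process(self, contig):
        # Populate per-contig structures (here: naive codon split as a stand-in).
        self.orfs = [contig[i:i + 3] for i in range(0, len(contig) - 2, 3)]
        self.edges = {i: -len(orf) for i, orf in enumerate(self.orfs)}
        result = len(self.orfs)
        # Reset state so the next contig starts clean and the old
        # objects can be reclaimed.
        self.orfs = []
        self.edges = {}
        gc.collect()  # usually unnecessary, but forces an immediate sweep
        return result

proc = ContigProcessor()
counts = [proc.process(c) for c in ["ATGAAATAG", "ATGTGA"]]
print(counts)  # [3, 2]
```

If CPython's reference counting were working as expected here, the explicit `gc.collect()` would be redundant; it mainly helps when reference cycles keep per-contig objects alive.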
Interesting thought regarding the garbage collector. I tried to run the new code. It worked on the test data, but then failed on many contigs with the following error:
Yeah, the problem with making it the default is that most people won't read the full docs, and so will run MAG files through as the same genome, which will skew the results. The tRNA error should be fixed; I forgot to push a change I made to the code.
Yeah, users ;). You could enforce a choice with a required mode argument.

Caught another error...
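The required-mode suggestion could look something like this in argparse (the `--mode` flag and its `single`/`meta` choices are hypothetical names, just to illustrate forcing the user to decide):

```python
import argparse

# Hypothetical CLI sketch: make the user say whether the input FASTA
# is one genome ("single") or a bag of MAG contigs ("meta").
parser = argparse.ArgumentParser(prog="phanotate")
parser.add_argument("infile")
parser.add_argument("--mode", choices=["single", "meta"], required=True,
                    help="treat all contigs as one genome, or each as its own")

args = parser.parse_args(["phages.fasta", "--mode", "meta"])
print(args.mode)  # meta
```

With `required=True`, omitting `--mode` makes argparse print a usage error and exit, so nobody silently gets the wrong default.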
Ah, that error is caused by the weights not being dialed in yet. I will see if I can get the weights sorted out, and/or track down the cause of the RAM usage. Thanks so much for your help and feedback.
Sure, happy to help. Debugging my own code as we speak. Let me know if you want me to give it another try.
The negative weight cycle happens when the cost of going backwards via an overlap is not large enough relative to a really good gene (in this case most likely a tRNA): during the path-solving step, Bellman-Ford will go through that gene, take a backwards overlap, go through the same gene again, and so on indefinitely.
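A minimal sketch of the mechanics (a toy two-node graph, not PHANOTATE's actual ORF graph): a strongly rewarding gene edge plus a backwards overlap whose penalty is too small together form a cycle with negative total weight, which Bellman-Ford's extra relaxation pass detects.

```python
def bellman_ford_has_negative_cycle(n, edges):
    dist = [0.0] * n                  # start all at 0 to find any cycle
    for _ in range(n - 1):            # standard |V|-1 relaxation rounds
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # One more round: any further relaxation implies a negative cycle.
    return any(dist[u] + w < dist[v] for u, v, w in edges)

# Node 0 -> 1 is a "gene" with a big reward (-20, like the tRNA weight);
# 1 -> 0 is a backwards overlap whose cost (+5) is too small to stop the
# solver from looping: each lap gains -20 + 5 = -15.
edges = [(0, 1, -20.0), (1, 0, 5.0)]
print(bellman_ford_has_negative_cycle(2, edges))  # True
```

Raising the backwards-overlap cost above 20 (or lowering the tRNA reward, as discussed below) makes the cycle's total weight non-negative, and the path solver terminates normally.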
Hmm, I still get the negative weight cycle error. I'm currently trying to find out which sequence exactly is causing it. If I find it and it would help, I can send it to you.
OK, now it is getting bizarre. When I split the file into one file per sequence and run phanotate on each, everything runs through fine. But if I run phanotate on the file with all sequences, I get the error after a few sequences... Any ideas? If it would help, I can share the data, assuming you would handle it confidentially.
Ah, one of the improvements I made in version 2 was moving the ORF-to-ORF connection creation step to CPython. It looks like I forgot to empty the data structure between contigs. I will have to get that fixed.
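This would also explain the behavior above: one sequence per file works, many sequences per file fails. A minimal Python illustration of the bug pattern (hypothetical code, not the actual extension): a module-level structure that accumulates edges across contigs unless explicitly cleared.

```python
# Hypothetical sketch of the bug: connections pile up across contigs
# because the shared structure is never emptied between calls.
_connections = []  # module-level state shared by every contig

def add_connections(contig_orfs, reset=False):
    if reset:
        _connections.clear()  # the fix: empty the structure per contig
    for a in contig_orfs:
        for b in contig_orfs:
            if a != b:
                _connections.append((a, b))
    return len(_connections)

n1 = add_connections(["orf1", "orf2"])              # 2 edges
n2 = add_connections(["orf3", "orf4"])              # 4: stale edges remain!
n3 = add_connections(["orf5", "orf6"], reset=True)  # back to 2
print(n1, n2, n3)  # 2 4 2
```

The stale edges both grow memory monotonically and can wire unrelated contigs into one graph, which fits the per-file-works / combined-file-fails symptom.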
Hi, I am also running the dev branch because the master doesn't work on some of my phage sequences. I also get the error:
The "negative weight cycle" error is the reason why I have not pushed this version 2 to the main. I mentioned in this version I had to simplify the weighting of the ORFs but kept the tRNAs the same weight (-20). If I drop the weight to -1 it should run fine, but I have to tune it properly, which I havent done. One reason is that I dont have a contig that generates the error. |
Unfortunately I can't share the contigs. I am pretty sure there are public sequences that would have the error, I just don't have the capacity to analyze them until I hit the error. |
I am revisiting this issue. Can you clarify whether you got the 'negative cycle' error with the develop or the main branch? If it was only the dev branch, what was the error that caused the main branch not to work on some phages? I have also set the weight of tRNAs to 0, so they are neither rewarded nor penalized, with the idea that I will add ALL of them back in post-processing. Can you test out the develop branch on the contig that gave you the error?
Hi Katelyn,
thanks for PHANOTATE, great tool!
I'm using it for larger sets of viral contigs, and noticed that it seems not to free up RAM between contigs. On my current set of 100+ sequences, it needs >5 GB of RAM towards the end. Obviously, I can split the input file to get around that, but I'd also suspect it wouldn't be too difficult to free the RAM after a contig has been processed. Might make sense as an improvement for future versions.
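For anyone wanting to reproduce the growth, here is a small diagnostic sketch (my own, not part of PHANOTATE) that logs the process's peak resident set size after each contig; if the number keeps climbing across contigs, memory is not being released. Note that `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS.

```python
import resource

def peak_rss():
    """Peak resident set size of this process (KB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Hypothetical diagnostic loop: stand-in "work" per contig, then log RSS.
for i, contig in enumerate(["A" * 100_000, "C" * 100_000]):
    _ = contig.lower()  # placeholder for the real per-contig processing
    print(f"contig {i}: peak RSS so far = {peak_rss()}")
```

Since `ru_maxrss` is a high-water mark it never decreases; a healthy run plateaus after the largest contig, while a leak shows roughly linear growth with contig count.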
Cheers
Thomas