Add VirSorter2 process as an alternative to VirSorter #128

fischer-hub · 2024-07-17T11:47:50Z

Added VirSorter2 as a new process that can be used instead of VirSorter via the new flag --use_virsort2, and adapted the downstream processes (the parse process and the GFF generation script) to work with the different results of VirSorter2. For example, VirSorter2 reports a confidence score (0-1) for every viral hit in the input data instead of a category.
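
To illustrate the difference between a category and a score, here is a minimal sketch of binning VirSorter2 hits by confidence. It assumes a tab-separated score table with `seqname` and `max_score` columns and a `||` suffix on sequence names; the cut-offs, the function name `bin_by_score`, and the example rows are hypothetical and not taken from this PR.

```python
import csv
import io

# Illustrative stand-in for a VirSorter2 score table; the rows are made up.
# The column names are an assumption about the table layout.
EXAMPLE_TSV = (
    "seqname\tmax_score\tmax_score_group\tlength\thallmark\n"
    "contig_1||full\t0.98\tdsDNAphage\t45210\t6\n"
    "contig_2||full\t0.62\tssDNA\t5120\t0\n"
    "contig_3||0_partial\t0.91\tdsDNAphage\t18233\t2\n"
)


def bin_by_score(handle, high=0.9, low=0.5):
    """Split contigs into confidence bins using max_score.

    The cut-offs are hypothetical placeholders, not the values used in the PR.
    """
    bins = {"high": [], "low": [], "discarded": []}
    for row in csv.DictReader(handle, delimiter="\t"):
        score = float(row["max_score"])
        # Drop the assumed "||full" / "||N_partial" suffix from the name.
        contig = row["seqname"].split("||")[0]
        if score >= high:
            bins["high"].append(contig)
        elif score >= low:
            bins["low"].append(contig)
        else:
            bins["discarded"].append(contig)
    return bins


bins = bin_by_score(io.StringIO(EXAMPLE_TSV))
print(bins)
```

A continuous score means the downstream parser has to pick thresholds itself, whereas the VirSorter categories could be mapped directly.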

I also had to change the GFF generation process because I kept running into file collisions whenever more than one input sample reported significant viral sequences: the VirSorter and VirFinder reports have the same file name for every sample. This might also be solved by running the GFF script once per sample containing viral sequences instead of once for all samples, but I already asked about this in #127. Maybe I'm missing something here!
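
The collision described here comes from several samples emitting a file with the same basename into one process. A small sketch of how such clashes could be detected before staging follows; the paths and the helper `find_collisions` are made up for illustration and are not part of the pipeline.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_collisions(paths):
    """Return basenames that occur more than once, with the paths sharing them."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: hits for name, hits in by_name.items() if len(hits) > 1}


# Hypothetical per-sample outputs; directory and file names are made up.
staged = [
    "work/sample_A/virsorter_report.tsv",
    "work/sample_B/virsorter_report.tsv",
    "work/sample_A/sample_A_taxonomy.tsv",
]
collisions = find_collisions(staged)
print(collisions)
```

Files that already carry the sample name do not clash; only the reports with a fixed name do.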

hoelzer · 2024-07-17T15:42:14Z

Great, thanks a lot!

@mberacochea @guille0387: @fischer-hub is working with me at the RKI, and we wanted to use VS2 for some annotations anyway. Thus, David kindly added the functionality to VIRify. Of course, this would need more thorough benchmarking, but the idea would be to replace VS with VS2 at some point.

For now, the new parameter switch makes it possible to choose between the two.

mberacochea · 2024-07-18T16:16:10Z

@hoelzer @fischer-hub this is amazing, thank you so much. This has been in our backlog for ages. I will go over the PR as soon as I can, but from a quick read it looks great.

mberacochea

Thanks @fischer-hub, great stuff. I left a few comments, nothing major.
The one thing I would like to add is a few more unit tests for parse_viral_pred.py, can you send me an example of the virsorter2 outputs?

bin/parse_viral_pred.py

mberacochea · 2024-08-15T08:55:01Z

virify.nf

+        viphos_annotations = annotation.out.map { _, __, annotations -> tuple(annotations, i++) }.collect(){it -> it[0]} //{ annotations, count -> "$annotations".replace('.', '_' + count + '.') }
+        taxonomy_annotations = assign.out.map { _, __, taxonomy -> tuple(taxonomy, j++) }.collect(){it -> it[0]} //{ taxonomy, count -> "$taxonomy".replace('.', '_' + count + '.') }
+        checkv_results = checkV.out.map { _, __, quality_summary, ___ -> tuple(quality_summary, k++) }.collect(){it -> it[0]} //{ quality_summary, count -> "$quality_summary".replace('.', '_' + count + '.') }


This is to support the -list of assemblies, correct?

When running on a list of samples this was my first fix to prevent the write_gff file collision, but I think this should be sufficient as well since then the input files are enumerated and have different names too. (I just forgot to remove this afterwards, will check again)

Also this might be solved when removing the .first() from contigs.first() later on because then each sample will run in its own write_gff process instance? Or could there be annotation files from the same sample with the same name too, e.g.: from different contigs of the same assembly?
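
As a rough illustration of the enumeration idea, the sketch below gives each file a unique name by inserting a counter before the last extension. It also mirrors the commented-out `replace('.', '_' + count + '.')` from the diff, on the assumption that a plain string replace touches every dot in the name. The file names and both helper functions are hypothetical.

```python
from pathlib import PurePosixPath


def enumerate_name(filename, count):
    """Insert "_<count>" before the last extension only."""
    path = PurePosixPath(filename)
    return f"{path.stem}_{count}{path.suffix}"


def replace_every_dot(filename, count):
    """Mirror of a plain replace('.', '_<count>.'), which rewrites every dot."""
    return filename.replace(".", f"_{count}.")


# Hypothetical file names; two of them would collide without renaming.
names = ["quality_summary.tsv", "quality_summary.tsv", "viral.annot.tsv"]
unique = [enumerate_name(name, i) for i, name in enumerate(names)]
print(unique)
print(replace_every_dot("viral.annot.tsv", 2))
```

Both variants avoid the collision, but the counter alone does not record which sample a file came from.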

I see. The change to contigs.first() is a bit more involved, as we need to .join() the assemblies with the corresponding annotations. Otherwise, the pipeline could mix results from different assemblies.

Yes, true. Maybe it's better then to keep the fix and handle the bug in a separate PR.
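
The risk of mixing results can be shown with a small Python analogue of a keyed join. This is only a sketch of the idea, not Nextflow code: the sample IDs, file names, and the helper `join_by_key` are hypothetical, and unmatched keys are simply dropped.

```python
def join_by_key(left, right):
    """Pair items from two (key, value) lists that share a key.

    Loosely modelled on a keyed join; keys missing on either side are dropped.
    """
    lookup = dict(right)
    return [(key, value, lookup[key]) for key, value in left if key in lookup]


# Hypothetical sample IDs and file names; emission order differs on purpose.
assemblies = [("sample_A", "A.fasta"), ("sample_B", "B.fasta")]
annotations = [("sample_B", "B_annotation.tsv"), ("sample_A", "A_annotation.tsv")]

# Pairing by position combines sample_A contigs with sample_B annotations.
paired_by_position = list(zip(assemblies, annotations))
# Pairing by key keeps each assembly with its own annotations.
paired_by_key = join_by_key(assemblies, annotations)
print(paired_by_key)
```

Because parallel processes can finish in any order, pairing by position is only safe when a single sample is run.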

virify.nf

fischer-hub · 2024-08-15T09:42:09Z

> Thanks @fischer-hub, great stuff. I left a few comments, nothing major. The one thing I would like to add is a few more unit tests for parse_viral_pred.py, can you send me an example of the virsorter2 outputs?

Sure, will go over the changes asap and attach some example output here!

Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>

Fischer, David and others added 10 commits July 9, 2024 13:02

add virsorter2 module

a6f2472

adjust summary scrikpt for virus sorter 2

6f9b323

add some documentation

d701e04

adjust resources for virsorter2

51f8439

add process tag

42df50f

remove view()

00b776d

adjust GFF generation script

e87e937

adjust docstring

b2cf197

fix typo

fba759d

remove comments

24a3863

mberacochea requested changes Aug 15, 2024

fischer-hub and others added 5 commits August 15, 2024 19:33

Update bin/parse_viral_pred.py

ed8b948

Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>

Update bin/parse_viral_pred.py

ee41ddd

Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>

remove trailing newlines

ae8bd1d

Co-authored-by: Martín Beracochea <mbc@ebi.ac.uk>

clean up merge

9f94fd1

remove comment

e53f7f1

KateSakharova mentioned this pull request Oct 7, 2024

Fix/virsorter2 #135

Open
