Taxonomic ranks are inverted #113

Open
Ales-ibt opened this issue Sep 7, 2023 · 11 comments

Comments

@Ales-ibt

Ales-ibt commented Sep 7, 2023

Hello there!

I've been testing VIRify v2.0 and realised that the taxonomic annotation in the GFF file has its ranks inverted.

For instance:
taxonomy=Entomopoxvirinae;Poxviridae;Chitovirales

Should be:
taxonomy=Chitovirales;Poxviridae;Entomopoxvirinae

And ideally, it would be great to have the whole lineage like:
taxonomy=Viruses;Bamfordvirae;Nucleocytoviricota;Pokkesviricetes;Chitovirales;Poxviridae;Entomopoxvirinae
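
The requested reordering is a plain reversal of the rank list. The following is a minimal sketch assuming the attribute value is a `;`-separated list ordered from lowest to highest rank, as in the example above; `invert_taxonomy` is a hypothetical helper for illustration, not a function in VIRify.

```python
def invert_taxonomy(value: str, sep: str = ";") -> str:
    """Reverse a lowest-to-highest rank string into highest-to-lowest order.

    Empty fields (e.g. from a trailing separator) are dropped.
    """
    ranks = [rank for rank in value.split(sep) if rank]
    return sep.join(reversed(ranks))


print(invert_taxonomy("Entomopoxvirinae;Poxviridae;Chitovirales"))
# Chitovirales;Poxviridae;Entomopoxvirinae
```

Reversing twice gives back the original string, so the same helper also works if the order ever needs to be flipped back.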

There are also some problems with names such as Caudovirales, which appears in the NCBI taxonomy database as Caudoviricetes.

Thanks in advance!

Ales.

@hoelzer
Collaborator

hoelzer commented Sep 15, 2023

Hey, thx @Ales-ibt !

Yes, agreed: inverting the ranks would probably make more sense. Showing the full lineage should also be possible with the NCBI taxonomy file, @guille0387, right?
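
Producing the full lineage amounts to walking parent links up to the root and then printing root-first. This is a minimal sketch assuming a child-to-parent mapping has already been loaded; for the NCBI taxonomy dump that would mean joining the parent IDs in `nodes.dmp` with the scientific names in `names.dmp`. The `PARENT` table below is illustrative only, built from the ranks in the lineage requested above (the real NCBI table contains more intermediate nodes), and `full_lineage` is a hypothetical helper.

```python
from typing import Dict, List, Optional

# Illustrative child -> parent table (hypothetical; derived only from the
# lineage in the original request, not loaded from NCBI).
PARENT: Dict[str, Optional[str]] = {
    "Entomopoxvirinae": "Poxviridae",
    "Poxviridae": "Chitovirales",
    "Chitovirales": "Pokkesviricetes",
    "Pokkesviricetes": "Nucleocytoviricota",
    "Nucleocytoviricota": "Bamfordvirae",
    "Bamfordvirae": "Viruses",
    "Viruses": None,
}


def full_lineage(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Follow parent links upward, then return the path root-first.

    Stops at a missing parent, at a node that points to itself (as the NCBI
    root node does), or at any node already visited.
    """
    lineage: List[str] = []
    seen = set()
    node: Optional[str] = taxon
    while node is not None and node not in seen:
        seen.add(node)
        lineage.append(node)
        node = parent.get(node)
    return list(reversed(lineage))


print(";".join(full_lineage("Entomopoxvirinae", PARENT)))
# Viruses;Bamfordvirae;Nucleocytoviricota;Pokkesviricetes;Chitovirales;Poxviridae;Entomopoxvirinae
```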

Regarding Caudovirales vs. Caudoviricetes: Caudovirales should actually no longer be in the pipeline, because the taxon was discontinued by ICTV. We added the following warning message when running VIRify:

Warning: --meta_version v4 does not include the following discontinued virus taxa 
(according to ICTV) anymore and they have been excluded from the dataset.
- Allolevivirus
- Autographivirinae
- Buttersvirus
- Caudovirales
- Chungbukvirus
- Incheonvirus
- Leviviridae
- Levivirus
- Mandarivirus
- Pbi1virus
- Phicbkvirus
- Radnorvirus
- Sitaravirus
- Vidavervirus
- Myoviridae
- Siphoviridae
- Podoviridae
- Viunavirus
- Orthohepevirus
- Klosneuvirus
- Hendrixvirus
- Rubulavirus
- Avulavirus
- Catovirus
- Nucleorhabdovirus
- Viunavirus
- Gammalipothrixvirus
- Peduovirinae
- Sedoreovirinae

Do you still have Caudovirales in your results? Can you try a fresh installation and, most importantly, re-download the database files? Maybe an old database file was still being used.
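
If a results table still contains taxa from the warning list, that points to the stale-database problem described here. A quick check could look like the sketch below, assuming each lineage is a `;`-joined string; `discontinued_in` is a hypothetical helper, not part of VIRify, the set is copied from the warning above (with the repeated Viunavirus entry counted once), and the example lineages are made up for illustration.

```python
from typing import List

# Taxa named in the --meta_version v4 warning above (duplicates removed).
DISCONTINUED = {
    "Allolevivirus", "Autographivirinae", "Buttersvirus", "Caudovirales",
    "Chungbukvirus", "Incheonvirus", "Leviviridae", "Levivirus",
    "Mandarivirus", "Pbi1virus", "Phicbkvirus", "Radnorvirus",
    "Sitaravirus", "Vidavervirus", "Myoviridae", "Siphoviridae",
    "Podoviridae", "Viunavirus", "Orthohepevirus", "Klosneuvirus",
    "Hendrixvirus", "Rubulavirus", "Avulavirus", "Catovirus",
    "Nucleorhabdovirus", "Gammalipothrixvirus", "Peduovirinae",
    "Sedoreovirinae",
}


def discontinued_in(lineage: str, sep: str = ";") -> List[str]:
    """Return the discontinued taxa found in a separator-joined lineage."""
    return [taxon for taxon in lineage.split(sep) if taxon in DISCONTINUED]


print(discontinued_in("Viruses;Caudovirales;Myoviridae"))  # ['Caudovirales', 'Myoviridae']
print(discontinued_in("Viruses;Caudoviricetes"))  # []
```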

@guille0387
Collaborator

Hi @hoelzer @Ales-ibt

Yes, I think it should be possible to invert the order of the ranks and include the complete lineage. Let me look into this and I'll get back to you ASAP.

@guille0387
Collaborator

Hi @hoelzer @Ales-ibt

I created a new branch called out_lineage with modifications to the contig taxonomic assignment script. The output should now reflect Ales's suggestions. I tested it with the two mock datasets we used in the paper and it worked, but perhaps Ales would like to try it with her own data? Let me know if you have any issues.

@hoelzer
Collaborator

hoelzer commented Sep 20, 2023

Great, thx @guille0387! Looks good to me as well. @Ales-ibt, can you give it a try too? Thx!

@Ales-ibt
Author

Great, I'll run a test and get back to you soon.

@Ales-ibt
Author

Ales-ibt commented Oct 4, 2023

Hello, sorry for taking so long to get back to you. I updated the NCBI database and now I get the correct Caudoviricetes annotation :D. I also tested the pipeline on the out_lineage branch and I can see the complete lineages nicely sorted in 08-final/taxonomy/*prodigal_annotation_taxonomy.tsv; thank you so much for this. The only remaining detail is that this fix is not reflected in the GFF output file.
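
One way to carry the fix over to the GFF is to rewrite only the taxonomy attribute in column 9. This is a hedged sketch, not VIRify's code: it assumes the attribute is named `taxonomy` (as in the original report), that its value is still ordered from lowest to highest rank, and that the ranks are joined with raw `;` characters, so tokens without `=` are treated as part of the previous attribute's value. GFF3 itself expects a `;` inside a value to be escaped as `%3B`, which this sketch does not decode. The helper names and the example feature line are hypothetical.

```python
from typing import List, Tuple


def parse_attributes(col9: str) -> List[Tuple[str, str]]:
    """Split GFF column 9 into (key, value) pairs.

    A token without '=' is appended to the previous value, so a raw
    ';'-joined lineage stays inside a single attribute.
    """
    pairs: List[Tuple[str, str]] = []
    for token in col9.strip().split(";"):
        if "=" in token:
            key, value = token.split("=", 1)
            pairs.append((key, value))
        elif pairs and token:
            key, value = pairs[-1]
            pairs[-1] = (key, value + ";" + token)
    return pairs


def invert_taxonomy_in_gff_line(line: str) -> str:
    """Reverse the rank order of the taxonomy attribute on one GFF line."""
    if line.startswith("#"):
        return line  # header/comment lines pass through unchanged
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a feature line; leave it alone
    fixed = [
        (key, ";".join(reversed(value.split(";"))) if key == "taxonomy" else value)
        for key, value in parse_attributes(cols[8])
    ]
    cols[8] = ";".join(f"{key}={value}" for key, value in fixed)
    return "\t".join(cols) + ("\n" if line.endswith("\n") else "")


example = ("contig_1\tprodigal\tCDS\t1\t300\t.\t+\t0\t"
           "ID=contig_1_1;taxonomy=Entomopoxvirinae;Poxviridae;Chitovirales\n")
print(invert_taxonomy_in_gff_line(example), end="")
```

Rewriting line by line keeps every other column and attribute untouched, which makes the change easy to diff against the previous GFF output.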

Thank you again!

Ales

@hoelzer
Collaborator

hoelzer commented Oct 7, 2023

Awesome, thanks for checking, @Ales-ibt !

@guille0387, can you also do the GFF fix? Then we could merge this into dev, @mberacochea.

@mberacochea
Member

Excellent, @guille0387! Thank you for the fix. Let me know if you need a hand fixing the GFF.

@guille0387
Collaborator

guille0387 commented Oct 20, 2023 via email

@mberacochea
Member

Hey folks,

I'm trying to catch up with the VIRify backlog. There is an excellent PR, #84, that adds support for VirSorter2, so it's the perfect opportunity to make a new release that also includes this fix.

Cheers

@hoelzer
Collaborator

hoelzer commented Jul 30, 2024

Hey, yes, agreed: it would be perfect to have another release with VS2 support and some of the current open issues resolved.

I think everything here was solved:

"I created a new branch called out_lineage with modifications in the contig taxonomic assignment script."

Only the change of rank order in the GFF is still missing... Ah, or was this done in #129, @mberacochea? If so, this issue should be solved.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants