[Question] GTDB plus NCBI kraken2 database building using GTDB grafted NCBI taxonomy #6
Comments
Someone had the same need before, so I wrote up these steps and will add them to the TaxonKit documentation.

Merging GTDB and NCBI taxonomy

Sometimes (1) one needs to build a database that includes bacteria and archaea (from GTDB) and viral sequences from NCBI.
Some tests
Hi Shenwei, great documentation! Thank you for your time! I have 2 related questions:
Especially for part a.: does the taxdump contain all the files needed for the taxonomy folder? For a standard download with kraken2-build --download-taxonomy, the files I got are:
Thank you very much.
Whatever you've got, you can follow the steps above to 1) export/create complete lineages and 2) create taxdump files from them.
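As a rough illustration of step 1, the sketch below combines two lineage tables into one tab-separated table of complete lineages, keeping GTDB rows for bacteria and archaea and only the viral rows from NCBI. The row layout (an `id` column plus one column per rank), the rank names, and the helper names `merge_lineages` and `to_tsv` are assumptions made for this example; they are not TaxonKit's or gtdb-taxdump's own format, so adapt them to whatever your taxdump-creation step expects.

```python
import csv
import io

# Hypothetical rank columns for the combined lineage table.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def merge_lineages(gtdb_rows, ncbi_rows, ncbi_keep=("Viruses",)):
    """Return GTDB rows plus the NCBI rows whose superkingdom is in ncbi_keep.

    Each row is a dict with an "id" key and one key per rank.
    Missing ranks are filled with an empty string.
    """
    merged = []
    seen = set()
    for source, rows in (("gtdb", gtdb_rows), ("ncbi", ncbi_rows)):
        for row in rows:
            # Take bacteria/archaea from GTDB only; filter NCBI to viruses.
            if source == "ncbi" and row.get("superkingdom") not in ncbi_keep:
                continue
            if row["id"] in seen:
                raise ValueError(f"duplicate id: {row['id']}")
            seen.add(row["id"])
            record = {"id": row["id"]}
            record.update({rank: row.get(rank, "") for rank in RANKS})
            merged.append(record)
    return merged


def to_tsv(rows):
    """Serialize merged rows as a TSV string with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["id"] + RANKS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Writing `to_tsv(merge_lineages(gtdb_rows, ncbi_rows))` to a file gives a single lineage table covering both sources. Raising on duplicate IDs is a deliberate choice here, so that an accession present in both inputs is noticed rather than silently overwritten.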
The steps in "Custom database for Kraken and Bracken" are a useful reference. Just prepare the files in the format that Kraken2 requires.
Essential taxdump files are:
The accession2taxid files map accessions (only those in RefSeq/GenBank) to TaxIds; they are optional if you format the FASTA IDs for Kraken yourself. For the sequences from GTDB, the mapping relationship is provided by
Viral sequences require an extra step. The relationship is
What I can provide is a way to generate taxdump files that combine the GTDB and NCBI taxonomies, plus a taxid.map file that maps accessions (whatever identifiers you provide; sequence accessions work) to TaxIds. Good luck!
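To make the "format the FASTA IDs for Kraken yourself" option concrete, here is a minimal sketch that uses a two-column taxid.map (accession, tab, TaxId) to append the `kraken:taxid|<TaxId>` tag that Kraken2 recognizes in sequence IDs. The function names `load_taxid_map` and `tag_fasta`, and the assumption that the first whitespace-delimited token of each header is the accession found in taxid.map, are choices made for this example only.

```python
def load_taxid_map(lines):
    """Parse 'accession<TAB>taxid' lines into a dict, skipping blank lines."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        accession, taxid = line.split("\t")[:2]
        mapping[accession] = taxid
    return mapping


def tag_fasta(lines, mapping):
    """Yield FASTA lines with 'kraken:taxid|<taxid>' appended to each sequence ID.

    Raises KeyError when a sequence ID has no entry in the mapping, so that
    unmapped sequences are noticed instead of being added without a TaxId.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # Assumption: the sequence ID is the first space-delimited token.
            seq_id, _, description = header.partition(" ")
            tagged = f">{seq_id}|kraken:taxid|{mapping[seq_id]}"
            if description:
                tagged += " " + description
            yield tagged + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line
```

With real files, `load_taxid_map(open("taxid.map"))` and `tag_fasta(open("genomes.fna"), mapping)` would stream the rewritten FASTA line by line; both file names are placeholders.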
Hi Shenwei,
We are exploring building a database that includes bacteria and archaea (from GTDB) and viral sequences from NCBI, using an NCBI-style taxonomy with the GTDB taxonomy grafted in for bacteria and archaea. Do you know how I can achieve this with gtdb-taxdump?
FlexTaxD is another tool that promises a similar result. However, we have already downloaded the Struo2 r207 genomes and taxdump, so I think we might be better off using your tool to combine them with the NCBI taxonomy instead of FlexTaxD. Or have I not fully grasped the idea?