This project contains the functions necessary to update the list of species used by LINNAEUS.
Gerner, M.; Nenadic, G. & Bergman, C. M. LINNAEUS: A species name identification system for biomedical literature. BMC Bioinformatics, 2010, 11, 85.
The species dictionaries available for LINNAEUS via https://sourceforge.net/projects/linnaeus/files/Entity_packs/ have not been updated since 2011. To tag more recently described species, build a new dictionary with the following steps:
git clone https://github.com/JULIELab/taxonupdate.git
cd taxonupdate
wget ftp://ftp.ebi.ac.uk/pub/databases/taxonomy/taxonomy.dat
# Run without arguments, DictWriter.py uses its defaults, which is equivalent to:
# python DictWriter.py -i taxonomy.dat -o taxonomy.tsv --rank species
python DictWriter.py
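For orientation, here is a minimal sketch of how a flat file such as taxonomy.dat could be read and filtered by rank. It is not the code in DictWriter.py. It assumes a layout of `KEY : VALUE` lines with records terminated by a line containing only `//`, and field names such as `RANK`, `SCIENTIFIC NAME` and `SYNONYM`; check these against the downloaded file. The function names and the sample records are illustrative only.

```python
import io


def parse_taxonomy(handle):
    """Yield one dict per record from a taxonomy.dat-style stream.

    Assumed layout (verify against the real file): 'KEY : VALUE' lines,
    with each record terminated by a line containing only '//'.
    """
    record = {}
    for line in handle:
        line = line.strip()
        if line == "//":
            if record:
                yield record
            record = {}
        elif ":" in line:
            # Split on the first colon only, since names may contain colons.
            key, _, value = line.partition(":")
            # Keys such as SYNONYM may repeat, so values are kept in lists.
            record.setdefault(key.strip(), []).append(value.strip())
    if record:
        yield record


def species_names(handle):
    """Map taxon ID -> list of names for records whose rank is 'species'."""
    names = {}
    for rec in parse_taxonomy(handle):
        if rec.get("RANK") == ["species"]:
            names[rec["ID"][0]] = (
                rec.get("SCIENTIFIC NAME", []) + rec.get("SYNONYM", [])
            )
    return names


# Toy records for illustration, written in the assumed layout.
SAMPLE = """\
ID                        : 562
PARENT ID                 : 561
RANK                      : species
SCIENTIFIC NAME           : Escherichia coli
SYNONYM                   : Bacterium coli
//
ID                        : 561
PARENT ID                 : 543
RANK                      : genus
SCIENTIFIC NAME           : Escherichia
//
"""

result = species_names(io.StringIO(SAMPLE))
print(result)
```

Only the species record survives the filter; the genus record is parsed but skipped.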
The dictionary can also be restricted to a subtree of the taxonomy, identified by the ID of its root node. For example, to extract only bacterial species, run:
python DictWriter.py -o bacterial_species.tsv --root 2
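Conceptually, restricting to a subtree means keeping only the taxa that descend from the chosen root. The sketch below shows one way to compute that set from a child-to-parent map. It is an illustration under stated assumptions, not the implementation behind `--root`: the function name `descendants` is hypothetical, and the parent map is a toy excerpt with intermediate ranks omitted.

```python
def descendants(parent_of, root):
    """Return the set of taxon IDs in the subtree rooted at `root` (inclusive)."""
    # Invert the child -> parent map into parent -> list of children.
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    # Iterative depth-first walk; avoids recursion limits on deep trees.
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against self-referencing root entries
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen


# Toy child -> parent map (intermediate ranks omitted for brevity).
PARENT_OF = {
    131567: 1,     # cellular organisms
    2: 131567,     # Bacteria
    2157: 131567,  # Archaea
    2759: 131567,  # Eukaryota
    1224: 2,       # a bacterial phylum
    561: 1224,     # Escherichia
    562: 561,      # Escherichia coli
    9606: 2759,    # Homo sapiens
}

bacterial_ids = descendants(PARENT_OF, 2)
print(sorted(bacterial_ids))
```

With the toy map, only the IDs below Bacteria (plus the root itself) are returned; the archaeal and eukaryotic entries are excluded.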
The following lists of commonly used root IDs may help when choosing a subtree:
Top-level groups: Archaea (ID: 2157), Bacteria (ID: 2), Eukaryota (ID: 2759), Viroids (ID: 12884), Viruses (ID: 10239)
Kingdoms within Eukaryota: Fungi (ID: 4751), Metazoa (ID: 33208), Viridiplantae (ID: 33090)
This work has been funded by the German Research Foundation (DFG) as part of the project D01 in the Collaborative Research Center (CRC) 1076 “AquaDiva”.