Release 2.2.0

@evanroyrees released this on 16 Dec 19:01
52fd5b0

Breaking changes 💣 💔🙈

The autometa-taxonomy and autometa-binning-summary entrypoints no longer take --ncbi as an input parameter. Instead, they take --dbdir and --dbtype, so the user can choose either the NCBI or the GTDB database.

| Command | 💔 Previous 💔 | 💚 New 💚 |
| --- | --- | --- |
| autometa-taxonomy | --ncbi | --dbdir <ncbi-database-dirpath>, --dbtype ncbi (choices: ncbi, gtdb) |
| autometa-binning-summary | --ncbi | --dbdir <ncbi-database-dirpath>, --dbtype ncbi (choices: ncbi, gtdb) |
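To illustrate how the new flags fit together, here is a minimal sketch of an argument parser that mirrors the table. It is hypothetical and is not Autometa's actual parser: only --dbdir, --dbtype and the ncbi/gtdb choices come from these notes, while the parse_db_args helper and the "ncbi" default are assumptions. Check autometa-taxonomy -h for the real behavior.

```python
import argparse
from typing import Optional, Sequence


def parse_db_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Hypothetical parser mirroring the 2.2.0 database flags (not Autometa's code)."""
    parser = argparse.ArgumentParser(prog="autometa-taxonomy-sketch")
    # Replaces the removed --ncbi flag: path to the taxonomy database directory.
    parser.add_argument("--dbdir", required=True, help="Path to taxonomy database directory")
    # Selects which taxonomy database --dbdir points to.
    parser.add_argument(
        "--dbtype",
        choices=["ncbi", "gtdb"],
        default="ncbi",  # assumed default for this sketch
        help="Taxonomy database type",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    # A 2.1.0-style `--ncbi <dir>` call becomes `--dbdir <dir> --dbtype ncbi`.
    args = parse_db_args(["--dbdir", "databases/ncbi", "--dbtype", "ncbi"])
    print(args.dbdir, args.dbtype)
```

Restricting --dbtype with argparse choices means an unsupported database name (for example "silva") is rejected at parse time with a usage error rather than failing later.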

NOTE: For implementation details on integrating other taxonomy databases, see #284.

🐚 Additional autometa workflows

  • Added a workflow whose only required inputs are reads and an assembly.
  • autometa.sh and autometa-large-data-mode.sh now require taxa_routine as an input parameter (choices: "ncbi" or "ncbi_gtdb"). The Autometa workflow now contains an optional sub-workflow in which binning is guided by GTDB taxonomy, applied to the bacteria and archaea first classified using the NCBI database.
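The selection step behind the "ncbi_gtdb" routine can be sketched in Python as follows. This is not the code in autometa.sh: the contigs_for_gtdb helper and its contig-to-superkingdom mapping input are hypothetical, and only the two taxa_routine values and the bacteria/archaea filter come from the notes above.

```python
from typing import Dict, List

# Values accepted by taxa_routine, per the 2.2.0 release notes.
TAXA_ROUTINES = {"ncbi", "ncbi_gtdb"}


def contigs_for_gtdb(taxa_routine: str, ncbi_superkingdom: Dict[str, str]) -> List[str]:
    """Hypothetical helper: pick contigs for the GTDB-guided sub-workflow.

    ncbi_superkingdom maps contig name -> superkingdom assigned with the NCBI database.
    """
    if taxa_routine not in TAXA_ROUTINES:
        raise ValueError(
            f"taxa_routine must be one of {sorted(TAXA_ROUTINES)}, got {taxa_routine!r}"
        )
    if taxa_routine == "ncbi":
        # NCBI-only routine: the optional GTDB sub-workflow is skipped.
        return []
    # ncbi_gtdb: keep only contigs that NCBI classified as bacteria or archaea.
    keep = {"bacteria", "archaea"}
    return [contig for contig, kingdom in ncbi_superkingdom.items() if kingdom.lower() in keep]


if __name__ == "__main__":
    assignments = {"contig_1": "bacteria", "contig_2": "eukaryota", "contig_3": "archaea"}
    print(contigs_for_gtdb("ncbi_gtdb", assignments))  # ['contig_1', 'contig_3']
```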

TaxonomyDatabase

🐍 GTDB taxonomy integration to use within Autometa's taxon-binning and genome-binning workflows.

For more information on GTDB database setup, see the Autometa GTDB database documentation.

(autometa) evan@userserver:~/Autometa$ autometa-setup-gtdb -h
usage: autometa-setup-gtdb [-h] --reps-faa REPS_FAA --dbdir DBDIR [--cpus CPUS]

optional arguments:
  -h, --help           show this help message and exit
  --reps-faa REPS_FAA  Path to directory containing GTDB ref genome amino acid data sequences. Can be tarballed.
  --dbdir DBDIR        Path to output GTDB database directory
  --cpus CPUS          Number of cpus to use for diamond-formatting GTDB database
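If you drive the setup from a Python script, a small wrapper can assemble the command from the three flags shown in the help output. This is a minimal sketch: the setup_gtdb_command and run_setup_gtdb helper names and the example paths are hypothetical, and it assumes autometa-setup-gtdb is on your PATH.

```python
import shlex
import subprocess
from typing import List, Optional


def setup_gtdb_command(reps_faa: str, dbdir: str, cpus: Optional[int] = None) -> List[str]:
    """Build an autometa-setup-gtdb command from the flags listed in its -h output."""
    cmd = ["autometa-setup-gtdb", "--reps-faa", reps_faa, "--dbdir", dbdir]
    if cpus is not None:
        # --cpus is optional; it sets the CPUs used for diamond-formatting the database.
        cmd += ["--cpus", str(cpus)]
    return cmd


def run_setup_gtdb(reps_faa: str, dbdir: str, cpus: Optional[int] = None) -> None:
    """Run the setup command; raises CalledProcessError on a non-zero exit."""
    cmd = setup_gtdb_command(reps_faa, dbdir, cpus)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths; replace with your GTDB protein reps tarball and output directory.
    print(shlex.join(setup_gtdb_command("gtdb_reps.tar.gz", "databases/gtdb", cpus=8)))
```

Passing the command as a list (rather than a single shell string) avoids quoting problems with paths that contain spaces.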

TL;DR

Taxonomy databases are now abstracted behind the TaxonomyDatabase abstract base class, which declares the abstract methods every database must implement.

This is currently implemented for both the NCBI and GTDB taxonomy databases. Future taxonomy database integrations should follow the interface defined by the TaxonomyDatabase class.
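The pattern can be sketched as follows. The method names here (parent, name, rank, lineage) and the ToyDatabase subclass are illustrative assumptions, not Autometa's actual abstract methods (see #284 for those). The point is that shared logic such as lineage is written once against the abstract methods, and each backend only supplies lookups. The root-is-its-own-parent convention matches NCBI's nodes.dmp, where taxid 1 is its own parent.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class TaxonomyDatabase(ABC):
    """Hypothetical sketch of a taxonomy-database interface (method names are illustrative)."""

    @abstractmethod
    def parent(self, taxid: int) -> int:
        """Return the parent taxid of taxid."""

    @abstractmethod
    def name(self, taxid: int) -> str:
        """Return the scientific name of taxid."""

    @abstractmethod
    def rank(self, taxid: int) -> str:
        """Return the rank (e.g. 'species') of taxid."""

    def lineage(self, taxid: int) -> List[str]:
        """Shared logic built only on the abstract methods: names from root down to taxid."""
        names = []
        while True:
            names.append(self.name(taxid))
            parent = self.parent(taxid)
            if parent == taxid:  # the root is its own parent in this sketch
                break
            taxid = parent
        return list(reversed(names))


class ToyDatabase(TaxonomyDatabase):
    """In-memory backend standing in for an NCBI- or GTDB-backed subclass."""

    def __init__(self, nodes: Dict[int, Tuple[int, str, str]]):
        # taxid -> (parent_taxid, name, rank)
        self.nodes = nodes

    def parent(self, taxid: int) -> int:
        return self.nodes[taxid][0]

    def name(self, taxid: int) -> str:
        return self.nodes[taxid][1]

    def rank(self, taxid: int) -> str:
        return self.nodes[taxid][2]


if __name__ == "__main__":
    db = ToyDatabase({
        1: (1, "root", "no rank"),
        2: (1, "Bacteria", "superkingdom"),
        3: (2, "ExamplePhylum", "phylum"),
    })
    print(db.lineage(3))  # ['root', 'Bacteria', 'ExamplePhylum']
```

Because TaxonomyDatabase declares abstract methods, instantiating it directly raises TypeError. A new backend must implement every abstract method before it can be used.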

CAMI formatter

Added and updated binning validation/benchmarking utilities, such as formatting Autometa binning results into the CAMI biobox format.
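For reference, the CAMI binning (biobox) format is a tab-separated text file with a short header. Below is a minimal sketch of such a formatter, independent of autometa-cami-format. It assumes the header layout "@Version:0.9.1", "@SampleID:...", a blank line, then the "@@SEQUENCEID<TAB>BINID" column header (verify against the CAMI specification). The to_cami_binning helper and the "unclustered" label for unbinned contigs are hypothetical.

```python
from typing import Dict

UNCLUSTERED = "unclustered"  # assumed label for unbinned contigs in this sketch


def to_cami_binning(assignments: Dict[str, str], sample_id: str, version: str = "0.9.1") -> str:
    """Format contig -> bin assignments as CAMI binning (biobox) text."""
    lines = [f"@Version:{version}", f"@SampleID:{sample_id}", "", "@@SEQUENCEID\tBINID"]
    for contig, bin_id in assignments.items():
        if bin_id == UNCLUSTERED:
            # Unbinned contigs are omitted rather than reported as their own bin.
            continue
        lines.append(f"{contig}\t{bin_id}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    bins = {"contig_1": "bin_1", "contig_2": "unclustered", "contig_3": "bin_2"}
    print(to_cami_binning(bins, "sample1"), end="")
```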

Misc

  • 💚 Fix pytest requirements in GitHub Actions
  • 🐛⬇️ Pin scipy and joblib to avoid an hdbscan import error
  • 🐍🤫 Fix deprecated pandas method invocation in bedtools.py

What's Changed

  • 📝 Added contribution documentation by @jason-c-kwan in #277
  • 🎨🐍 Add CAMI formatter entrypoint autometa-cami-format by @WiscEvan in #276
  • 🎨🐍 Update deprecated pandas method invocation by @WiscEvan in #279
  • ⬇️ 💚 force scipy==1.8 by @kaw97 in #286
  • 🐚 New bash files for Autometa workflow by @samche42 in #281
  • ⬆️ 🎨 Allow the use of gtdb taxonomy in Autometa by @Sidduppal in #284
  • Add --average-method parameter to autometa-benchmark by @WiscEvan in #290
  • GTDB integration by @Sidduppal and @WiscEvan in #284

New Contributors

Full Changelog: 2.1.0...2.2.0