atavide is a simple yet complete workflow for metagenomics data analysis, including QC/QA, optional host removal, assembly and cross-assembly, and individual read-based annotations. We have also built in some advanced analytics, including tools to assign annotations from reads to contigs and to generate metagenome-assembled genomes in several different ways, giving you the power to explore your data!
atavide is 100% snakemake and conda, so you only need to install the snakemake workflow, and then everything else will be installed with conda.
It is definitely a work in progress, but you can run it with the following command:
snakemake --configfile config/atavide.yaml -s workflow/atavide.snakefile --profile slurm
But you will need a slurm profile to make this work!
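If you have not set up a profile before, the following is a minimal sketch of one. It assumes a Snakemake release older than 8, where a profile is a directory holding a `config.yaml` whose keys mirror long command-line options such as `--cluster` and `--jobs`; newer releases use executor plugins instead. The `PROFILE_DIR` variable, the `sbatch` options, and the job limit are illustrative assumptions, not values taken from atavide.

```shell
# Hypothetical location: Snakemake searches ~/.config/snakemake/<profile name>
PROFILE_DIR="${PROFILE_DIR:-$HOME/.config/snakemake/slurm}"
mkdir -p "$PROFILE_DIR"

# Only write the file if no profile exists yet, so nothing is overwritten.
if [ ! -e "$PROFILE_DIR/config.yaml" ]; then
    # Keys mirror Snakemake's long options (Snakemake < 8 assumed).
    # The sbatch arguments and limits are placeholders for your cluster.
    cat > "$PROFILE_DIR/config.yaml" <<'EOF'
jobs: 100
use-conda: true
latency-wait: 60
cluster: "sbatch --cpus-per-task={threads} --time=1-0"
EOF
fi

echo "Profile available at $PROFILE_DIR/config.yaml"
```

With a profile like this in place, `--profile slurm` resolves to that directory by name.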
- Clone this repository from GitHub:
git clone https://github.com/linsalrob/atavide.git
- Set the location of the repository:
export ATAVIDE_DIR=$PWD/atavide/
- Install a few python dependencies. You probably already have most of these, but the one that most often trips people up is pysam. We're working on getting conda configured properly to do this automatically.
pip install -r $ATAVIDE_DIR/requirements.txt
- Install the appropriate super-focus database [hint: probably version 2] and set the SUPERFOCUS_DB environment variable to point to the location of those files.
- Copy the NCBI taxonomy (you really just need the taxdump.tar.gz file), and set the NCBI_TAXONOMY environment variable to point to the location of those files.
- Have a directory of fastq files with both _R1_ and _R2_ files in a data directory: $DATA_DIR/fastq
- Run atavide:
cd $DATA_DIR && snakemake --configfile $ATAVIDE_DIR/atavide.yaml -s $ATAVIDE_DIR/workflow/atavide.snakefile --profile slurm
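Before launching the run, it can help to confirm that the setup steps above actually took effect. The function below is a sketch of such a check, not part of atavide: the name `check_atavide_setup` is hypothetical, and it assumes that each environment variable should point at a directory and that mates differ only by `_R1_` versus `_R2_` in the file name.

```shell
# Illustrative preflight check (hypothetical helper, not shipped with atavide).
# Usage: check_atavide_setup "$DATA_DIR"
check_atavide_setup() (
    data_dir="${1:-.}"
    status=0

    # The three variables named in the setup steps must be set and be directories.
    for var in ATAVIDE_DIR SUPERFOCUS_DB NCBI_TAXONOMY; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "ERROR: $var is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "ERROR: $var=$value is not a directory" >&2
            status=1
        fi
    done

    # Every _R1_ file in $data_dir/fastq needs a matching _R2_ file.
    pairs=0
    for r1 in "$data_dir"/fastq/*_R1_*; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        r2_name=$(basename "$r1" | sed 's/_R1_/_R2_/')
        if [ -e "$data_dir/fastq/$r2_name" ]; then
            pairs=$((pairs + 1))
        else
            echo "ERROR: no _R2_ mate for $r1" >&2
            status=1
        fi
    done

    if [ "$pairs" -eq 0 ]; then
        echo "ERROR: no _R1_/_R2_ pairs found in $data_dir/fastq" >&2
        status=1
    fi

    # The function body runs in a subshell, so this only sets its return code.
    exit "$status"
)
```

Calling `check_atavide_setup "$DATA_DIR" && echo "ready to run"` prints the confirmation only when every check passes; otherwise each problem is reported on stderr.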
- QC/QA with prinseq++
- optional host removal using bowtie2 and samtools, as described previously. To enable this, you need to provide both the path to the host database and the name of the host database.
- pairwise assembly of each sample using megahit
- extraction of all reads that do not assemble using samtools flags
- assembly of all unassembled reads using megahit
- compilation of all contigs into a single unified set using Flye
- comparison of reads -> contigs to generate coverage
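The "samtools flags" mentioned in the steps above refer to the bitwise FLAG field defined by the SAM format. As an illustration, a read pair in which neither mate mapped carries flag 77 on the first read and 141 on the second. The exact flag values atavide uses are not stated here, so treat those two as assumptions; the function name `sam_flag_bits` and the file names in the comments are hypothetical as well.

```shell
# Decode a SAM FLAG value into the names of the bits that are set.
sam_flag_bits() {
    flag="$1"
    out=""
    for entry in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                 16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair; do
        bit=${entry%%:*}     # number before the colon
        name=${entry#*:}     # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "$out"
}

sam_flag_bits 77     # paired,unmapped,mate_unmapped,first_in_pair
sam_flag_bits 141    # paired,unmapped,mate_unmapped,second_in_pair

# With samtools installed, flags like these can select unmapped mates, e.g.:
#   samtools fastq -f 77  sample.bam > unassembled_R1.fastq
#   samtools fastq -f 141 sample.bam > unassembled_R2.fastq
```

Requiring both the unmapped and mate-unmapped bits keeps only pairs where neither read aligned, so the R1 and R2 outputs stay in step with each other.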
Want something else added to the suite? File an issue on GitHub and we'll add it ASAP!