A workflow for metagenomics.
PikaVirus is a bioinformatics best-practice analysis pipeline for metagenomic analysis. It follows a new approach based on eliminatory k-mer analysis, followed by assembly and subsequent contig binning.
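The following minimal Python sketch illustrates k-mer based read classification. It is not the PikaVirus or Kraken2 implementation. It assumes a toy, hypothetical reference set in which each organism is represented by the set of its k-mers. Each read is assigned to the reference that shares the most k-mers with it. Reads with no hits are set aside as unknown, which is the pool the pipeline would go on to assemble. Kraken2 itself uses a minimizer-based compact hash table and lowest-common-ancestor assignment over a taxonomy, and this sketch omits both. The helper names (`kmers`, `build_index`, `classify`, `split_reads`) are illustrative only.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Hypothetical toy index: map each k-mer to the reference labels containing it."""
    index: dict[str, set[str]] = {}
    for label, genome in references.items():
        for km in kmers(genome, k):
            index.setdefault(km, set()).add(label)
    return index


def classify(read: str, index: dict[str, set[str]], k: int, min_hits: int = 1) -> str:
    """Vote for the reference sharing the most k-mers; 'unclassified' if too few hits."""
    votes: Counter = Counter()
    for km in kmers(read, k):
        for label in index.get(km, ()):
            votes[label] += 1
    if not votes:
        return "unclassified"
    label, hits = votes.most_common(1)[0]
    return label if hits >= min_hits else "unclassified"


def split_reads(reads: dict[str, str], index: dict[str, set[str]], k: int):
    """Separate classified reads from unknown ones (the latter would go on to assembly)."""
    classified: dict[str, str] = {}
    unknown: list[str] = []
    for name, seq in reads.items():
        label = classify(seq, index, k)
        if label == "unclassified":
            unknown.append(name)
        else:
            classified[name] = label
    return classified, unknown


if __name__ == "__main__":
    refs = {"virusA": "ACGTACGTGGCCTTAA", "bacteriumB": "TTTTGGGGCCCCAAAA"}
    reads = {"r1": "ACGTACGTGG", "r2": "GGGGCCCCAA", "r3": "ATATATATAT"}
    print(split_reads(reads, build_index(refs, 5), 5))
    # ({'r1': 'virusA', 'r2': 'bacteriumB'}, ['r3'])
```

Real classifiers use much longer k-mers (tens of bases) against far larger databases. The tiny k here only keeps the example readable.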
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
- Install nextflow
- Install any of Docker, Singularity or Podman for full pipeline reproducibility (please only use Conda as a last resort; see docs)
- Download the pipeline and test it on a minimal dataset with a single command:
  nextflow run nf-core/pikavirus -profile test,<docker/singularity/podman/conda/institute>
  Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- Start running your own analysis!
  nextflow run nf-core/pikavirus -profile <docker/singularity/podman/conda> --input '*_R{1,2}.fastq.gz'
See usage docs for all of the available options when running the pipeline.
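The `--input '*_R{1,2}.fastq.gz'` glob depends on paired-end files sharing a sample prefix. The Python sketch below shows how such file names can be grouped into R1/R2 pairs per sample. It is similar in spirit to Nextflow's `Channel.fromFilePairs`, but it is not that implementation. The helper name `pair_fastqs` and the exact `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme are assumptions for illustration.

```python
import re

# Assumed naming scheme: "<sample>_R1.fastq.gz" / "<sample>_R2.fastq.gz".
PAIR_RE = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq\.gz$")


def pair_fastqs(filenames: list[str]) -> dict[str, tuple[str, str]]:
    """Hypothetical helper: group R1/R2 files by sample prefix.

    Files that do not match the pattern are ignored; a sample with only
    one mate raises ValueError instead of being silently dropped.
    """
    pairs: dict[str, dict[str, str]] = {}
    for name in filenames:
        m = PAIR_RE.match(name)
        if m is None:
            continue  # not part of the '*_R{1,2}.fastq.gz' pattern
        pairs.setdefault(m["sample"], {})[m["mate"]] = name

    result: dict[str, tuple[str, str]] = {}
    for sample, mates in sorted(pairs.items()):
        if set(mates) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing a mate: {mates}")
        result[sample] = (mates["1"], mates["2"])
    return result


if __name__ == "__main__":
    files = ["s1_R1.fastq.gz", "s1_R2.fastq.gz", "s2_R2.fastq.gz", "s2_R1.fastq.gz", "notes.txt"]
    print(pair_fastqs(files))
    # {'s1': ('s1_R1.fastq.gz', 's1_R2.fastq.gz'), 's2': ('s2_R1.fastq.gz', 's2_R2.fastq.gz')}
```

Failing loudly on a missing mate is a deliberate choice in this sketch. Otherwise a lone R1 file would quietly disappear from the analysis.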
By default, the pipeline currently performs the following:
- Sequencing quality control (FastQC)
- Trimming of low-quality regions in the reads (fastp)
- Quality control of the trimmed sequences (FastQC)
- Identification and isolation of viral, bacterial, fungal and unknown reads (Kraken2)
- Assembly of unknown reads (metaSPAdes) with assembly quality assessment (MetaQuast), and mapping against databases (Kaiju) to identify possible new pathogens
- Selection of suitable viral, bacterial and fungal references from the provided directory (Mash)
- Alignment of viral, bacterial and fungal reads against reference genomes to confirm the presence of specific organisms (Bowtie2)
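The Mash step ranks candidate references by an estimated distance. Mash estimates the Jaccard index j between the k-mer sets of two sequences from MinHash sketches. It then converts that index to the Mash distance D = -(1/k) ln(2j / (1 + j)). The Python sketch below computes the exact Jaccard index on full k-mer sets rather than on sketches. It therefore illustrates the formula but not Mash's speed-ups. The function names and the `max_dist` cut-off are hypothetical and do not reflect how PikaVirus actually chooses references.

```python
import math


def kmer_set(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard index |A ∩ B| / |A ∪ B| (Mash estimates this from sketches)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); infinite when no k-mers are shared."""
    if j <= 0.0:
        return math.inf
    return -math.log(2 * j / (1 + j)) / k


def rank_references(sample_seq: str, references: dict[str, str],
                    k: int = 21, max_dist: float = 0.05) -> list[str]:
    """Hypothetical selection: keep references within max_dist, closest first."""
    sample = kmer_set(sample_seq, k)
    hits = []
    for name, ref_seq in references.items():
        d = mash_distance(jaccard(sample, kmer_set(ref_seq, k)), k)
        if d <= max_dist:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    refs = {"A": "ACGTACGTGGCCTTAA", "B": "TTTTGGGGCCCCAAAA"}
    print(rank_references("ACGTACGTGGCCTTAA", refs, k=5))  # ['A']
```

D grows as j shrinks, so sorting by distance puts the closest references first. Exact k-mer sets are fine for toy sequences. For whole genomes and large read sets, Mash's fixed-size sketches keep each comparison cheap.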
The nf-core/pikavirus pipeline comes with documentation covering usage and output.
PikaVirus 2.0 was originally written by Guillermo Jorge Gorines Cordero, under the supervision of the BU-ISCIII team in Madrid, Spain.
We thank the following people for their extensive assistance in the development of this pipeline:
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #pikavirus channel (you can join with this invite).
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x
In addition, references of tools and data used in this pipeline are as follows:
Improved metagenomic analysis with Kraken 2.
Derrick E Wood, Jennifer Lu & Ben Langmead.
Genome Biology 2019 Nov 28. doi: 10.1186/s13059-019-1891-0
fastp: an ultra-fast all-in-one FASTQ preprocessor.
Shifu Chen, Yanqing Zhou, Yaru Chen & Jia Gu.
Bioinformatics, Volume 34, Issue 17, 01 September 2018, Pages i884–i890. doi: 10.1093/bioinformatics/bty560
Fast and sensitive taxonomic classification for metagenomics with Kaiju.
Peter Menzel, Kim Lee Ng & Anders Krogh.
Nature Communications, Volume 7, Article number: 11257 (2016). doi: 10.1038/ncomms11257
QUAST: quality assessment tool for genome assemblies.
Alexey Gurevich, Vladislav Saveliev, Nikolay Vyahhi & Glenn Tesler.
Bioinformatics, Volume 29, Issue 8, 15 April 2013, Pages 1072–1075. doi: 10.1093/bioinformatics/btt086
Bioconda: sustainable and comprehensive software distribution for the life sciences.
Björn Grüning, Ryan Dale, Andreas Sjödin, Brad A. Chapman, Jillian Rowe, Christopher H. Tomkins-Tinch, Renan Valieris, Johannes Köster & The Bioconda Team.
Nature Methods, Volume 15, Pages 475–476 (2018). doi: 10.1038/s41592-018-0046-7
Mash: fast genome and metagenome distance estimation using MinHash.
Brian D. Ondov, Todd J. Treangen, Páll Melsted, Adam B. Mallonee, Nicholas H. Bergman, Sergey Koren & Adam M. Phillippy.
Genome Biology 17, Article number: 132 (2016). doi: 10.1186/s13059-016-0997-x
metaSPAdes: a new versatile metagenomic assembler.
Sergey Nurk, Dmitry Meleshko, Anton Korobeynikov & Pavel A. Pevzner.
Genome Research 27: 824–834 (2017). doi: 10.1101/gr.213959.116