Here is a high-level summary of the pipeline (a command-line sketch of the core steps follows the list):
- Convert BAM to FASTQ.
- FastQC on the input files.
- Trim Galore! on the input files to trim reads, then repeat quality control on the trimmed reads.
- Align reads to a reference genome with BWA-MEM.
- Sort, index and compute statistics with samtools.
- Remove duplicate reads with Picard.
- Qualimap on the deduplicated reads.
- Not run: GATK realignment of reads at positions where there are indels (this was deprecated in GATK 4).
- Base recalibration with the GATK tools BaseRecalibrator, ApplyBQSR and AnalyzeCovariates.
- SNP calling with GATK HaplotypeCaller.
- MultiQC to summarise the various QC checks.
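The pipeline wraps standard tools, so for orientation here is a minimal sketch of the core alignment-and-calling steps as you might run them by hand. The file names (ref.fa, sample_1.fq.gz, sample_2.fq.gz, known.vcf.gz) are hypothetical placeholders; the authoritative process definitions live in main.nf.

```
# Align paired-end reads and coordinate-sort (BWA-MEM + samtools)
bwa mem -R '@RG\tID:sample\tSM:sample' ref.fa sample_1.fq.gz sample_2.fq.gz \
    | samtools sort -o sample.sorted.bam -
samtools index sample.sorted.bam

# Remove duplicate reads (Picard)
picard MarkDuplicates I=sample.sorted.bam O=sample.dedup.bam \
    M=sample.dup_metrics.txt REMOVE_DUPLICATES=true

# Base quality score recalibration (GATK)
gatk BaseRecalibrator -I sample.dedup.bam -R ref.fa \
    --known-sites known.vcf.gz -O sample.recal.table
gatk ApplyBQSR -I sample.dedup.bam -R ref.fa \
    --bqsr-recal-file sample.recal.table -O sample.recal.bam

# Call SNPs (GATK HaplotypeCaller)
gatk HaplotypeCaller -R ref.fa -I sample.recal.bam \
    -O sample.g.vcf.gz -ERC GVCF
```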
See main.nf for full details.
The pipeline itself needs no installation: Nextflow will automatically fetch it from GitHub (rbpisupati/nf-haplocaller). It runs inside a Singularity container built from the environment.yml file included with the package. Alternatively, clone the repository yourself:

```
git clone https://github.com/Gregor-Mendel-Institute/nf-haplocaller.git
```
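If you let Nextflow fetch the pipeline for you instead of cloning, you can pull it once and then run it by its GitHub name; a minimal sketch, using the same inputs as the example below:

```
# Fetch (or update) the pipeline into Nextflow's local cache
nextflow pull rbpisupati/nf-haplocaller

# Run it by name rather than by a local path
nextflow run rbpisupati/nf-haplocaller \
    --reads "data/*bam" \
    --fasta "data/TAIR10_wholeGenome.fasta" \
    --outdir output_folder \
    -profile cbe
```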
Assuming you have:
- cloned the repo into directory library,
- a set of BAM files to process in directory data,
- a FASTA file giving a reference genome to align to in directory data,
- a valid, active installation of Nextflow,
then a minimal command to run the pipeline is:
```
nextflow run library/nf-haplocaller/main.nf \
    --reads "data/*bam" \
    --fasta "data/TAIR10_wholeGenome.fasta" \
    --outdir output_folder \
    -profile cbe
```
This will take a long time, so it is recommended to run it in a detachable session, such as tmux.
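For example, a minimal tmux workflow (the session name nf-haplo is arbitrary):

```
# Start a detachable session
tmux new -s nf-haplo
# ...launch the pipeline inside the session, then detach with Ctrl-b d.
# Reattach later to check on progress:
tmux attach -t nf-haplo
```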
Here is a full list of arguments and options; a fuller example command follows the list.

--reads
: Path to the input files. This will usually include a wildcard to match all files fitting a pattern, and should be enclosed in double quotes ("").

--fasta
: Optional path to a reference FASTA file to align reads to.

--file_ext
: File type of the input files. Options are "bam", "fastq" and "aligned_bam".

--singleEnd
: Flag for whether the data are single- or paired-end. Defaults to false.

-profile
: Give a Nextflow profile to allow the pipeline to talk to the job-scheduling system on your machine. Valid inputs are:
- mendel for PBS systems
- cbe for SLURM systems
- singularity
- local to run on a local machine

--saveTrimmed
: If true, keep trimmed data. Defaults to false.

--notrim
: If true, skip trimming reads. Defaults to false.

--clip_r1
: Integer number of bases to trim from the 5' end of read 1.

--clip_r2
: Integer number of bases to trim from the 5' end of read 2.

--three_prime_clip_r1
: Integer number of bases to trim from the 3' end of read 1.

--three_prime_clip_r2
: Integer number of bases to trim from the 3' end of read 2.

--project
: Project name.

--outdir
: Path to the directory for the results.

--cohort
: Optional. Specify a group of samples to lump together into a single output file.

--email
: Optional email address to contact when the pipeline finishes.

--plaintext_email
: If true, send the notification email in plain text.

-w
: Path to the working directory. Defaults to the current working directory. Note that -w is preceded by only one hyphen.
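For instance, a fuller invocation for paired-end FASTQ input might look like the following; the read pattern, clip values, cohort name, email address and working directory are all placeholders to adapt to your data:

```
nextflow run library/nf-haplocaller/main.nf \
    --reads "data/*_{1,2}.fastq.gz" \
    --file_ext fastq \
    --fasta "data/TAIR10_wholeGenome.fasta" \
    --clip_r1 5 \
    --clip_r2 5 \
    --saveTrimmed \
    --cohort my_cohort \
    --outdir output_folder \
    --email you@example.org \
    -profile cbe \
    -w /scratch/nf_work
```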
- Rahul Pisupati (rahul.pisupati[at]gmi.oeaw.ac.at)
- Fernando Rabanal (fernando.rabanal@tuebingen.mpg.de)
Please cite the paper below if you use this pipeline.
Pisupati, R. et al. Verification of Arabidopsis stock collections using SNPmatch, a tool for genotyping high-plexed samples. Scientific Data 4, 170184 (2017). doi:10.1038/sdata.2017.184