NGMLR
- Author: Viginesh Vaibhav Muraliraman
- Date: May 17, 2019
- Purpose: Description of installing and using NGMLR
- documentation (website): https://github.com/philres/ngmlr
- documentation (publication): https://www.nature.com/articles/s41592-018-0001-7
Install NGMLR using any one of the following methods.
Download the precompiled Linux (x86_64) binary:
Note: This installs version 0.2.7, while the latest version is 0.2.8.
wget https://github.com/philres/ngmlr/releases/download/v0.2.7/ngmlr-0.2.7-linux-x86_64.tar.gz
tar xvzf ngmlr-0.2.7-linux-x86_64.tar.gz
cd ngmlr-0.2.7/
Install with conda from the Bioconda channel:
Note: This installs version 0.2.7, while the latest version is 0.2.8.
conda install ngmlr -c bioconda
Build from source:
Note: This method installs the latest version (0.2.8).
OS: Linux and Mac OS X
Required dependencies: zlib-dev, cmake, gcc/g++ (>=4.8.2)
git clone https://github.com/philres/ngmlr.git
cd ngmlr/
mkdir -p build
cd build/
cmake ..
make
cd ../bin/ngmlr-*/
./ngmlr
Build from source inside the philres/nextgenmaplr-buildenv Docker image:
git clone https://github.com/philres/ngmlr.git
mkdir -p ngmlr/build
docker run -v `pwd`/ngmlr:/ngmlr philres/nextgenmaplr-buildenv bash -c "cd /ngmlr/build && cmake .. && make"
`pwd`/ngmlr/bin/ngmlr-*/ngmlr
./ngmlr [options] -r <reference> -q <reads> [-o <output>]
Here, -r is required. If -q is omitted, reads are taken from standard input (/dev/stdin); if -o is omitted, alignments are written to standard output.
The parameters available in ngmlr are listed below. Default values are shown in square brackets.
-r <file>, --reference <file>
(required) Path to the reference genome (FASTA/Q, can be gzipped)
-q <file>, --query <file>
Path to the read file (FASTA/Q) [/dev/stdin]
-o <string>, --output <string>
Path to output file [stdout]
--skip-write
Don't write reference index to disk [false]
--bam-fix
Report reads with > 64k CIGAR operations as unmapped. Required to be compatible with the BAM format [false]
--rg-id <string>
Adds RG:Z:<string> to all alignments in SAM/BAM [none]
--rg-sm <string>
RG header: Sample [none]
--rg-lb <string>
RG header: Library [none]
--rg-pl <string>
RG header: Platform [none]
--rg-ds <string>
RG header: Description [none]
--rg-dt <string>
RG header: Date (format: YYYY-MM-DD) [none]
--rg-pu <string>
RG header: Platform unit [none]
--rg-pi <string>
RG header: Median insert size [none]
--rg-pg <string>
RG header: Programs [none]
--rg-cn <string>
RG header: sequencing center [none]
--rg-fo <string>
RG header: Flow order [none]
--rg-ks <string>
RG header: Key sequence [none]
-t <int>, --threads <int>
Number of threads [1]
-x <pacbio, ont>, --presets <pacbio, ont>
Parameter presets for different sequencing technologies [pacbio]
-i <0-1>, --min-identity <0-1>
Alignments with an identity lower than this threshold will be discarded [0.65]
-R <int/float>, --min-residues <int/float>
Alignments containing less than <int> or (<float> * read length) residues will be discarded [0.25]
--no-smallinv
Don't detect small inversions [false]
--no-lowqualitysplit
Split alignments with poor quality [false]
--verbose
Debug output [false]
--no-progress
Don't print progress info while mapping [false]
--match <float>
Match score [2]
--mismatch <float>
Mismatch score [-5]
--gap-open <float>
Gap open score [-5]
--gap-extend-max <float>
Gap open extend max [-5]
--gap-extend-min <float>
Gap open extend min [-1]
--gap-decay <float>
Gap extend decay [0.15]
-k <10-15>, --kmer-length <10-15>
K-mer length in bases [13]
--kmer-skip <int>
Number of k-mers to skip when building the lookup table from the reference [2]
--bin-size <int>
Sets the size of the grid used during candidate search [4]
--max-segments <int>
Max number of segments allowed for a read per kb [1]
--subread-length <int>
Length of fragments reads are split into [256]
--subread-corridor <int>
Length of corridor sub-reads are aligned with [40]
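The -R option accepts either an absolute count or a fraction. As a minimal sketch of that rule, the shell function below prints the minimum number of residues an alignment must contain to be kept. It assumes that values below 1 are read as a fraction of the read length and values of 1 or more as an absolute count; this is our interpretation of the help text above, not something taken from the NGMLR source. The name min_residues_threshold is hypothetical and not part of NGMLR.

```shell
# min_residues_threshold: hypothetical helper, not part of NGMLR.
# Assumption: -R values < 1 are a fraction of the read length,
# values >= 1 are an absolute number of residues.
# Arguments: $1 = value given to -R, $2 = read length in bp
min_residues_threshold() {
  awk -v r="$1" -v len="$2" 'BEGIN {
    if (r < 1) printf "%.2f\n", r * len   # fraction of read length
    else       printf "%d\n", r           # absolute residue count
  }'
}

min_residues_threshold 0.25 8857   # default -R on an 8857 bp read: prints 2214.25
min_residues_threshold 500 8857    # -R 500: prints 500
```

With the default of 0.25, an 8857 bp read (the average read length in the progress example further down this page) would need roughly 2214 aligned residues under this assumption.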
In this example, we align Oxford Nanopore reads of Mycobacterium tuberculosis to the corresponding reference genome. NGMLR can align PacBio reads in the same way; use -x pacbio (the default preset) instead of -x ont.
- Install the SRA toolkit
sudo apt install sra-toolkit
- Navigate to the NGMLR working directory
cd /path/to/ngmlr/bin/ngmlr-0.2.8
- Download the Nanopore reads (SRA run SRR6490088) and convert them from FASTQ to FASTA
fastq-dump SRR6490088 --split-files >& SRR6490088.out
cat SRR6490088_1.fastq | paste - - - - | sed 's/^@/>/g' | cut -f1-2 | tr '\t' '\n' > Nanopore_TB.fa
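The FASTQ-to-FASTA pipeline above is compact, so here it is run on a tiny, made-up FASTQ file to show what each stage does. It assumes every record spans exactly four lines and that header lines contain no tab characters; wrapped (multi-line) FASTQ records would break it. The file names toy.fastq and toy.fa are illustrative only.

```shell
# Work in a scratch directory so the toy files do not clutter the NGMLR folder
cd "$(mktemp -d)"

# A made-up two-record FASTQ (4 lines per record), for illustration only
printf '@read1 length=8\nACGTACGT\n+\nIIIIIIII\n@read2 length=4\nGGCC\n+\nIIII\n' > toy.fastq

# paste - - - -  : join each 4-line record into one tab-separated line
# sed 's/^@/>/g' : replace the leading FASTQ '@' with the FASTA '>'
# cut -f1-2      : keep header and sequence, drop the '+' line and qualities
# tr '\t' '\n'   : split header and sequence back onto separate lines
cat toy.fastq | paste - - - - | sed 's/^@/>/g' | cut -f1-2 | tr '\t' '\n' > toy.fa

cat toy.fa
# >read1 length=8
# ACGTACGT
# >read2 length=4
# GGCC
```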
- Download and decompress the Mycobacterium tuberculosis reference genome
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/585/GCF_000008585.1_ASM858v1/GCF_000008585.1_ASM858v1_genomic.fna.gz
gunzip GCF_000008585.1_ASM858v1_genomic.fna.gz
- Run ngmlr
./ngmlr -t 2 -r GCF_000008585.1_ASM858v1_genomic.fna -q Nanopore_TB.fa -o Output.SAM -x ont
This command uses 2 threads (-t 2) and the Nanopore preset (-x ont). When it finishes, the alignments are in Output.SAM. NGMLR also writes its reference index to disk as files with the .ngm extension, so that later runs against the same reference can reuse the index instead of rebuilding it (pass --skip-write to avoid writing these files).
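To get a quick summary of Output.SAM, the sketch below counts primary mapped, unmapped, and secondary/supplementary records using the FLAG bits defined in the SAM specification (0x4, 0x100, 0x800). It also counts records whose CIGAR string has more than 65535 operations, the most BAM can store because the operation count is a 16-bit field; these are the reads that --bam-fix reports as unmapped. summarize_sam is a hypothetical helper, and the toy SAM records are made up for illustration.

```shell
# summarize_sam: hypothetical helper (not part of NGMLR) that tallies SAM records.
# FLAG bits from the SAM specification: 0x4 unmapped, 0x100 secondary, 0x800 supplementary.
# Bits are tested with integer division so this also runs on awks without and().
summarize_sam() {
  awk -F '\t' '
    /^@/ { next }                                   # skip header lines
    {
      flag = $2
      unmapped  = int(flag / 4)    % 2
      secondary = int(flag / 256)  % 2
      supp      = int(flag / 2048) % 2
      if (secondary || supp) other++
      else if (unmapped)     unm++
      else                   mapped++
      cigar = $6
      nops = gsub(/[MIDNSHP=X]/, "", cigar)         # one letter per CIGAR operation
      if (nops > 65535) too_long++                  # more than BAM can store
    }
    END {
      printf "primary_mapped=%d unmapped=%d secondary_or_supplementary=%d cigar_over_65535=%d\n",
             mapped, unm, other, too_long
    }' "$1"
}

# Tiny made-up SAM file: one mapped read, one unmapped read, one supplementary record
cd "$(mktemp -d)"
printf '@HD\tVN:1.6\n' > toy.sam
printf 'r1\t0\tchr1\t1\t60\t8M\t*\t0\t0\tACGTACGT\t*\n' >> toy.sam
printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\t*\n' >> toy.sam
printf 'r1\t2048\tchr1\t100\t60\t4S4M\t*\t0\t0\tACGTACGT\t*\n' >> toy.sam

summarize_sam toy.sam
# primary_mapped=1 unmapped=1 secondary_or_supplementary=1 cigar_over_65535=0
```

On the tutorial output, run summarize_sam Output.SAM. The primary_mapped share is not guaranteed to match the mapped fraction in NGMLR's progress output exactly.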
While mapping, NGMLR prints progress information to the terminal (disable it with --no-progress). Take the following example:
Processed: 92198 (0.66), R/S: 37.44, RL: 8857, Time: 2.00 5.00 11.62, Align: 0.96, 490, 0.81
What this means:
- 92198 reads were processed so far
- 66% of these reads were mapped (with more than 25% of their bases aligned)
- 37.44 reads were mapped per second on average
- 8857 bp is the average read length so far
- "Time" and "Align" are used for debugging purposes and will be removed.
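The progress line can also be split into its fields programmatically, for example to log mapping speed during a long run. The sketch below assumes the field layout of the single example line shown above; other NGMLR versions may print it differently. parse_progress is a hypothetical helper name, and mapped_approx is only an estimate because the mapped fraction is printed with two decimals.

```shell
# parse_progress: hypothetical helper (not part of NGMLR).
# Assumes the progress line layout shown in this manual.
parse_progress() {
  printf '%s\n' "$1" | awk '{
    gsub(/[(),]/, " ")          # drop brackets and commas; awk re-splits $0 on spaces
    processed = $2; fraction = $3; rps = $5; avg_len = $7
    printf "processed=%d mapped_fraction=%s mapped_approx=%d reads_per_sec=%s avg_read_length=%d\n",
           processed, fraction, processed * fraction + 0.5, rps, avg_len
  }'
}

parse_progress "Processed: 92198 (0.66), R/S: 37.44, RL: 8857, Time: 2.00 5.00 11.62, Align: 0.96, 490, 0.81"
# processed=92198 mapped_fraction=0.66 mapped_approx=60851 reads_per_sec=37.44 avg_read_length=8857
```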