NGMLR

Author: Viginesh Vaibhav Muraliraman
Date: May 17, 2019
Purpose: Description of installing and using NGMLR
Documentation (website): https://github.com/philres/ngmlr
Documentation (publication): https://www.nature.com/articles/s41592-018-0001-7

Installation

Use any one of the following methods.

Install with Precompiled Binaries:

Note: This installs version 0.2.7, while the latest version is 0.2.8.

wget https://github.com/philres/ngmlr/releases/download/v0.2.7/ngmlr-0.2.7-linux-x86_64.tar.gz
tar xvzf ngmlr-0.2.7-linux-x86_64.tar.gz
cd ngmlr-0.2.7/

Install with Conda:

Note: This installs version 0.2.7, while the latest version is 0.2.8.

conda install ngmlr -c bioconda

Build from source:

Note: This method installs the latest version (0.2.8).

OS: Linux and Mac OSX

Required dependencies: zlib-dev, cmake, gcc/g++ (>=4.8.2)

git clone https://github.com/philres/ngmlr.git
cd ngmlr/
mkdir -p build
cd build/
cmake ..
make

cd ../bin/ngmlr-*/
./ngmlr

Build with Docker (Linux only):

git clone https://github.com/philres/ngmlr.git
mkdir -p ngmlr/build
docker run -v `pwd`/ngmlr:/ngmlr philres/nextgenmaplr-buildenv bash -c "cd /ngmlr/build && cmake .. &&  make"
`pwd`/ngmlr/bin/ngmlr-*/ngmlr

Basic commands

./ngmlr [options] -r <reference> -q <reads> [-o <output>]

Only -r is required. If -q is omitted, reads are taken from /dev/stdin; if -o is omitted, alignments are written to stdout.
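The defaults above can be illustrated with a small dry-run sketch. `build_ngmlr_cmd` is a hypothetical helper (not part of NGMLR), and the file names are placeholders; it only prints the command line that would be run, leaving out -q and -o when they are not given.

```shell
#!/usr/bin/env bash
# build_ngmlr_cmd is a hypothetical helper, not part of NGMLR.
# It prints the ngmlr command line for a reference, optional reads, optional output.
build_ngmlr_cmd() {
    local ref="${1:-}" reads="${2:-}" out="${3:-}"
    # -r is the only required option
    if [ -z "$ref" ]; then
        echo "error: a reference (-r) is required" >&2
        return 1
    fi
    local cmd="ngmlr -r $ref"
    # Without -q, ngmlr reads from /dev/stdin
    if [ -n "$reads" ]; then cmd="$cmd -q $reads"; fi
    # Without -o, ngmlr writes to stdout
    if [ -n "$out" ]; then cmd="$cmd -o $out"; fi
    echo "$cmd"
}

# Placeholder file names, for illustration only
build_ngmlr_cmd reference.fa reads.fastq alignments.sam
build_ngmlr_cmd reference.fa
```

The second form is the one to use when reads arrive on a pipe and the alignments are redirected or piped onward.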

The parameters accepted by ngmlr are listed below. The default value of each parameter is shown in square brackets.

Input/Output Parameters

-r <file>,  --reference <file>
    (required)  Path to the reference genome (FASTA/Q, can be gzipped)
-q <file>,  --query <file>
    Path to the read file (FASTA/Q) [/dev/stdin]
-o <string>,  --output <string>
    Path to output file [stdout]
--skip-write
    Don't write reference index to disk [false]
--bam-fix
    Report reads with > 64k CIGAR operations as unmapped. Required to be compatible with the BAM format [false]
--rg-id <string>
    Adds RG:Z:<string> to all alignments in SAM/BAM [none]
--rg-sm <string>
    RG header: Sample [none]
--rg-lb <string>
    RG header: Library [none]
--rg-pl <string>
    RG header: Platform [none]
--rg-ds <string>
    RG header: Description [none]
--rg-dt <string>
    RG header: Date (format: YYYY-MM-DD) [none]
--rg-pu <string>
    RG header: Platform unit [none]
--rg-pi <string>
    RG header: Median insert size [none]
--rg-pg <string>
    RG header: Programs [none]
--rg-cn <string>
    RG header: sequencing center [none]
--rg-fo <string>
    RG header: Flow order [none]
--rg-ks <string>
    RG header: Key sequence [none]
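As a rough illustration of the read-group options, the sketch below prints an ngmlr command line using three of them, together with the kind of `@RG` header line the SAM format defines for those tags. The sample values and file names are made up, and the exact header that NGMLR writes may differ in tag order or content.

```shell
#!/usr/bin/env bash
# Hypothetical sample values, for illustration only
rg_id="tb_run1"
rg_sm="TB_sample"
rg_pl="ONT"

# The read-group options as they would be passed to ngmlr (placeholder file names)
rg_opts="--rg-id $rg_id --rg-sm $rg_sm --rg-pl $rg_pl"
echo "ngmlr -r reference.fa -q reads.fastq -o alignments.sam $rg_opts"

# Shape of an @RG header line in SAM: tab-separated TAG:value pairs
rg_header=$(printf '@RG\tID:%s\tSM:%s\tPL:%s' "$rg_id" "$rg_sm" "$rg_pl")
echo "$rg_header"
```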

General Parameters

-t <int>,  --threads <int>
   Number of threads [1]
-x <pacbio, ont>,  --presets <pacbio, ont>
   Parameter presets for different sequencing technologies [pacbio]
-i <0-1>,  --min-identity <0-1>
   Alignments with an identity lower than this threshold will be discarded [0.65]
-R <int/float>,  --min-residues <int/float>
   Alignments containing less than <int> or (<float> * read length) residues will be discarded [0.25]
--no-smallinv
   Don't detect small inversions [false]
--no-lowqualitysplit
   Don't split alignments with poor quality [false]
--verbose
   Debug output [false]
--no-progress
   Don't print progress info while mapping [false]
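The -R threshold can be worked through with a short calculation. This is only a sketch: `min_residues` is a hypothetical helper, and it assumes that a value below 1 is read as a fraction of the read length while any other value is read as an absolute residue count. How NGMLR itself tells the two apart is not stated here.

```shell
#!/usr/bin/env bash
# min_residues is a hypothetical helper, not part of NGMLR.
# Assumption: a value below 1 is a fraction of the read length,
# anything else is an absolute number of residues.
min_residues() {
    local value="$1" read_length="$2"
    awk -v v="$value" -v len="$read_length" 'BEGIN {
        if (v < 1) printf "%.2f\n", v * len
        else       printf "%.2f\n", v
    }'
}

min_residues 0.25 8857   # default fraction applied to an 8857 bp read
min_residues 500 8857    # absolute count, independent of read length
```

Under that assumption, the default of 0.25 means an 8857 bp read needs roughly 2214 aligned residues to be kept.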

Advanced Parameters

--match <float>
    Match score [2]
--mismatch <float>
    Mismatch score [-5]
--gap-open <float>
    Gap open score [-5]
--gap-extend-max <float>
    Gap extend max [-5]
--gap-extend-min <float>
    Gap extend min [-1]
--gap-decay <float>
    Gap extend decay [0.15]
-k <10-15>,  --kmer-length <10-15>
    K-mer length in bases [13]
--kmer-skip <int>
    Number of k-mers to skip when building the lookup table from the reference [2]
--bin-size <int>
    Sets the size of the grid used during candidate search [4]
--max-segments <int>
    Max number of segments allowed for a read per kb [1]
--subread-length <int>
    Length of fragments reads are split into [256]
--subread-corridor <int>
    Length of corridor sub-reads are aligned with [40]

Example

In this example, we align Nanopore reads of the Tuberculosis genome to its associated reference genome. NGMLR can align PacBio reads in the same way.

1. Get the Input Files

Install the SRA toolkit, which provides fastq-dump
```
 sudo apt install sra-toolkit
```
Navigate to the NGMLR working directory
```
 cd /path/to/ngmlr/bin/ngmlr-0.2.8
```

Get the Nanopore reads for Tuberculosis

 fastq-dump SRR6490088 --split-files >& SRR6490088.out
 cat SRR6490088_1.fastq | paste - - - - | sed 's/^@/>/g'| cut -f1-2 | tr '\t' '\n' > Nanopore_TB.fa
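
To see what the FASTQ-to-FASTA conversion does, the same pipeline can be run on a tiny file. The two reads below are made up for illustration. The second read has a quality string that starts with @, which shows why the four lines of each record are joined with paste before sed touches the header.

```shell
#!/usr/bin/env bash
# Toy two-read FASTQ file (made-up reads, for illustration only)
tmpdir=$(mktemp -d)
cat > "$tmpdir/toy.fastq" <<'EOF'
@read1 length=8
ACGTACGT
+
IIIIIIII
@read2 length=4
GGCC
+
@III
EOF

# Join each 4-line record with tabs, turn the leading @ into >,
# keep only header and sequence, then put them back on separate lines
paste - - - - < "$tmpdir/toy.fastq" | sed 's/^@/>/' | cut -f1-2 | tr '\t' '\n' > "$tmpdir/toy.fa"

cat "$tmpdir/toy.fa"
```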

Get the Reference Tuberculosis Genome

 wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/585/GCF_000008585.1_ASM858v1/GCF_000008585.1_ASM858v1_genomic.fna.gz
 gunzip GCF_000008585.1_ASM858v1_genomic.fna.gz

Run ngmlr
```
 ./ngmlr -t 2 -r GCF_000008585.1_ASM858v1_genomic.fna -q Nanopore_TB.fa -o Output.SAM -x ont
```
After the command finishes, the alignments are in the Output.SAM file. NGMLR also generates a few NGM files, which hold the reference index it uses for aligning the reads (see --skip-write).
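
A quick sanity check on a SAM file is to count mapped and unmapped records from the FLAG column (bit 0x4 marks an unmapped record). The sketch below runs on a toy SAM file with made-up records so that it is self-contained; the same awk command can be pointed at Output.SAM instead.

```shell
#!/usr/bin/env bash
# Toy SAM file with made-up records, for illustration only
toy_sam=$(mktemp)
{
    printf '@SQ\tSN:chr1\tLN:1000\n'
    printf 'read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\t*\n'
    printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\t*\n'
    printf 'read3\t16\tchr1\t500\t60\t4M\t*\t0\t0\tTTAA\t*\n'
} > "$toy_sam"

# Skip header lines (they start with @), then test bit 0x4 of the FLAG column
counts=$(awk -F'\t' '!/^@/ {
    if (int($2 / 4) % 2 == 1) unmapped++
    else mapped++
} END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }' "$toy_sam")
echo "$counts"
```

This counts alignment records, not reads: a read that is reported as several split alignments would be counted once per record.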

2. Making Sense of the NGMLR Progress Information

Take the following example:

Processed: 92198 (0.66), R/S: 37.44, RL: 8857, Time: 2.00 5.00 11.62, Align: 0.96, 490, 0.81

What this means:

92198 reads have been processed so far.
66 % of the 92198 reads were mapped (with > 25 % of their bp mapped).
37.44 reads are mapped per second on average.
8857 is the average read length so far.
"Time" and "Align" are for debugging purposes and will be removed.
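
The fields described above can be pulled out of a progress line with awk. This is a minimal sketch that assumes the line has exactly the layout of the example; the output labels (`reads`, `mapped_pct`, and so on) are chosen here for illustration.

```shell
#!/usr/bin/env bash
# Progress line copied from the example above
line='Processed: 92198 (0.66), R/S: 37.44, RL: 8857, Time: 2.00 5.00 11.62, Align: 0.96, 490, 0.81'

summary=$(echo "$line" | awk '{
    gsub(/[(),]/, "")   # drop parentheses and commas, fields are re-split
    printf "reads=%d mapped_pct=%.0f reads_per_sec=%s avg_read_len=%d\n", $2, $3 * 100, $5, $7
}')
echo "$summary"
```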

Bioinformatics tools

This is a growing collection of manuals for commonly used bioinformatics tools.

How to use

Go to the page for the tool you want to use and follow its download and installation steps. The goal is to add extra documentation for using these tools, in addition to what is already supplied by the manual pages for the programs.
