Viginesh Vaibhav edited this page May 25, 2019 · 2 revisions

NGMLR

Installation

Use any one of the following methods.

Install with precompiled binaries:

Note: This installs version 0.2.7, while the latest version is 0.2.8.

wget https://github.com/philres/ngmlr/releases/download/v0.2.7/ngmlr-0.2.7-linux-x86_64.tar.gz
tar xvzf ngmlr-0.2.7-linux-x86_64.tar.gz
cd ngmlr-0.2.7/

Install with Conda:

Note: This installs version 0.2.7, while the latest version is 0.2.8.

conda install ngmlr -c bioconda

Build from source:

Note: This method installs the latest version (0.2.8).

OS: Linux and Mac OS X
Required dependencies: zlib-dev, cmake, gcc/g++ (>= 4.8.2)

git clone https://github.com/philres/ngmlr.git
cd ngmlr/
mkdir -p build
cd build/
cmake ..
make

cd ../bin/ngmlr-*/
./ngmlr 

Build with Docker (Linux only):

git clone https://github.com/philres/ngmlr.git
mkdir -p ngmlr/build
docker run -v `pwd`/ngmlr:/ngmlr philres/nextgenmaplr-buildenv bash -c "cd /ngmlr/build && cmake .. &&  make"
`pwd`/ngmlr/bin/ngmlr-*/ngmlr
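Both the source build and the Docker build place the binary in a versioned directory (bin/ngmlr-<version>/), which is why the commands above use a wildcard. As a minimal sketch, assuming bash and the repository layout shown above, the hypothetical helper find_ngmlr_bin (not part of ngmlr) locates that directory so it can be added to PATH:

```bash
# Hypothetical helper (not part of ngmlr): print the directory holding the
# ngmlr binary produced by a source or Docker build, i.e. <repo>/bin/ngmlr-<version>/.
find_ngmlr_bin() {
  local repo="${1:-.}"
  local candidates=("$repo"/bin/ngmlr-*/ngmlr)
  # If the glob matched nothing, bash keeps the literal pattern in the array,
  # so the executable test fails and an error is reported.
  if [ ! -x "${candidates[0]}" ]; then
    echo "no ngmlr binary under $repo/bin" >&2
    return 1
  fi
  # With several builds present, take the last match in glob (lexical) order.
  local bin="${candidates[${#candidates[@]}-1]}"
  echo "${bin%/*}"   # strip the trailing /ngmlr to get the directory
}
```

Usage: `dir=$(find_ngmlr_bin ~/ngmlr) && export PATH="$dir:$PATH"`, after which `ngmlr` can be run from any directory. The `&&` leaves PATH unchanged if no binary is found.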

Basic commands

./ngmlr [options] -r <reference> -q <reads> [-o <output>]

Here, -r is required. If -q is omitted, reads are read from /dev/stdin; if -o is omitted, the output is written to stdout.
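Because of these defaults, ngmlr can sit in the middle of a pipeline. A minimal sketch, assuming bash and gzip; map_gz_reads and the NGMLR and THREADS variables are hypothetical names used only for illustration. The reads are decompressed before they reach ngmlr because the parameter list below only mentions gzip support for the reference.

```bash
# Hypothetical wrapper: decompress gzipped reads and stream them into ngmlr.
# -q is omitted, so ngmlr reads the query from /dev/stdin; -o is omitted, so
# the SAM output goes to stdout. NGMLR selects the binary (default: ngmlr on PATH).
map_gz_reads() {
  local ref="$1" reads_gz="$2" preset="${3:-ont}"
  gzip -dc -- "$reads_gz" | "${NGMLR:-ngmlr}" -t "${THREADS:-1}" -x "$preset" -r "$ref"
}
```

For example, `THREADS=4 map_gz_reads ref.fa reads.fastq.gz ont > aln.sam` writes the alignments to aln.sam.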

The parameters ngmlr accepts are listed below. Default values are shown in square brackets.

Input/Output Parameters

-r <file>,  --reference <file>
    (required)  Path to the reference genome (FASTA/Q, can be gzipped)
-q <file>,  --query <file>
    Path to the read file (FASTA/Q) [/dev/stdin]
-o <string>,  --output <string>
    Path to output file [stdout]
--skip-write
    Don't write reference index to disk [false]
--bam-fix
    Report reads with > 64k CIGAR operations as unmapped. Required to be compatible with the BAM format [false]
--rg-id <string>
    Adds RG:Z:<string> to all alignments in SAM/BAM [none]
--rg-sm <string>
    RG header: Sample [none]
--rg-lb <string>
    RG header: Library [none]
--rg-pl <string>
    RG header: Platform [none]
--rg-ds <string>
    RG header: Description [none]
--rg-dt <string>
    RG header: Date (format: YYYY-MM-DD) [none]
--rg-pu <string>
    RG header: Platform unit [none]
--rg-pi <string>
    RG header: Median insert size [none]
--rg-pg <string>
    RG header: Programs [none]
--rg-cn <string>
    RG header: Sequencing center [none]
--rg-fo <string>
    RG header: Flow order [none]
--rg-ks <string>
    RG header: Key sequence [none]
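A minimal sketch of assembling read-group options for one run. rg_args is a hypothetical helper, and the run ID, sample name, and platform value are placeholders (ONT and PACBIO are platform values defined by the SAM specification). The date defaults to today in the YYYY-MM-DD format that --rg-dt expects.

```bash
# Hypothetical helper: print one read-group option or value per line.
# The flags are the ngmlr options listed above; the values are placeholders.
rg_args() {
  local id="$1" sample="$2" platform="${3:-ONT}" run_date="${4:-$(date +%F)}"
  printf '%s\n' --rg-id "$id" --rg-sm "$sample" --rg-pl "$platform" --rg-dt "$run_date"
}
```

Reading the output into an array keeps values with spaces intact (mapfile needs bash 4 or newer): `mapfile -t rg < <(rg_args run1 sampleA ONT)`, then `./ngmlr -r ref.fa -q reads.fa -o out.sam "${rg[@]}"`.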

General Parameters

-t <int>,  --threads <int>
   Number of threads [1]
-x <pacbio, ont>,  --presets <pacbio, ont>
   Parameter presets for different sequencing technologies [pacbio]
-i <0-1>,  --min-identity <0-1>
   Alignments with an identity lower than this threshold will be discarded [0.65]
-R <int/float>,  --min-residues <int/float>
   Alignments containing fewer than <int> residues, or fewer than (<float> * read length) residues, will be discarded [0.25]
--no-smallinv
   Don't detect small inversions [false]
--no-lowqualitysplit
   Split alignments with poor quality [false]
--verbose
   Debug output [false]
--no-progress
   Don't print progress info while mapping [false]
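Together, -i and -R decide whether an alignment is kept. With the default -R 0.25, an 8857-bp read needs at least 0.25 × 8857 = 2214.25 aligned residues. A minimal sketch of that check, assuming that -R values below 1 are fractions of the read length and values of 1 or more are absolute counts (the help text gives both forms but not the cut-off rule); passes_filters is a hypothetical helper, not part of ngmlr.

```bash
# Hypothetical helper: would an alignment survive the -i and -R filters?
# Args: identity (0-1), aligned residues, read length, [min identity], [min residues].
# Assumption: a min-residues value below 1 is a fraction of the read length.
passes_filters() {
  local identity="$1" residues="$2" read_len="$3"
  local min_id="${4:-0.65}" min_res="${5:-0.25}"
  awk -v id="$identity" -v res="$residues" -v len="$read_len" \
      -v mi="$min_id" -v mr="$min_res" 'BEGIN {
    need = (mr < 1) ? mr * len : mr   # minimum number of aligned residues
    ok = (id >= mi && res >= need)    # lower identity or fewer residues: discard
    printf "need=%g %s\n", need, (ok ? "keep" : "discard")
    exit !ok                          # exit status 0 = keep, 1 = discard
  }'
}
```

`passes_filters 0.80 3000 8857` prints `need=2214.25 keep`; with 2000 residues it prints `need=2214.25 discard` and returns 1.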

Advanced Parameters

--match <float>
    Match score [2]
--mismatch <float>
    Mismatch score [-5]
--gap-open <float>
    Gap open score [-5]
--gap-extend-max <float>
    Gap extend max [-5]
--gap-extend-min <float>
    Gap extend min [-1]
--gap-decay <float>
    Gap extend decay [0.15]
-k <10-15>,  --kmer-length <10-15>
    K-mer length in bases [13]
--kmer-skip <int>
    Number of k-mers to skip when building the lookup table from the reference [2]
--bin-size <int>
    Sets the size of the grid used during candidate search [4]
--max-segments <int>
    Max number of segments allowed for a read per kb [1]
--subread-length <int>
    Length of the fragments (sub-reads) that reads are split into [256]
--subread-corridor <int>
    Length of the corridor used to align sub-reads [40]
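Two of these parameters are easy to reason about with a little arithmetic. A rough sketch, assuming reads are cut into consecutive, non-overlapping sub-reads of --subread-length bases (the help text does not say whether sub-reads overlap); advanced_stats is a hypothetical helper. With the defaults, an 8857-bp read yields about 35 sub-reads, and -k 13 allows 4^13 = 67,108,864 distinct k-mers, four times as many as -k 12.

```bash
# Back-of-the-envelope numbers for --subread-length and --kmer-length.
# Assumption: sub-reads are consecutive and non-overlapping (may differ in ngmlr).
advanced_stats() {
  local read_len="$1" subread_len="${2:-256}" k="${3:-13}"
  local subreads=$(( (read_len + subread_len - 1) / subread_len ))  # ceiling division
  local kmer_space=$(( 4 ** k ))   # number of distinct DNA k-mers of length k
  echo "subreads=$subreads kmer_space=$kmer_space"
}
```

`advanced_stats 8857` prints `subreads=35 kmer_space=67108864`.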

Example

In this example, we align nanopore reads from Mycobacterium tuberculosis to its reference genome. NGMLR can align PacBio reads the same way (with -x pacbio, the default preset).

1. Get the Input Files

  • Install the SRA toolkit (on Debian/Ubuntu, the sra-toolkit package provides fastq-dump)
     sudo apt install sra-toolkit
    
  • Navigate to the NGMLR working directory
     cd /path/to/ngmlr/bin/ngmlr-0.2.8
    
  • Get the M. tuberculosis nanopore reads
     fastq-dump SRR6490088 --split-files >& SRR6490088.out
     cat SRR6490088_1.fastq | paste - - - - | sed 's/^@/>/g'| cut -f1-2 | tr '\t' '\n' > Nanopore_TB.fa
    
  • Get the M. tuberculosis reference genome
     wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/585/GCF_000008585.1_ASM858v1/GCF_000008585.1_ASM858v1_genomic.fna.gz
     gunzip GCF_000008585.1_ASM858v1_genomic.fna.gz
    
  • Run ngmlr
     ./ngmlr -t 2 -r GCF_000008585.1_ASM858v1_genomic.fna -q Nanopore_TB.fa -o Output.SAM -x ont
    
    After running the above command, the alignments are in Output.SAM. NGMLR also writes a few index files with the .ngm extension, which it uses to align the reads to the reference genome; the --skip-write option stops this index from being written to disk.
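The FASTQ-to-FASTA one-liner works because paste - - - - joins each four-line FASTQ record into a single tab-separated line, so the @ substitution only touches headers and never a quality string that happens to start with @. The same pipeline as a reusable function, assuming every record is exactly four lines (no wrapped sequence or quality lines). Since -q also accepts FASTQ, this conversion step is optional.

```bash
# Convert 4-line FASTQ records (stdin) to FASTA (stdout), as in the one-liner above.
fastq_to_fasta() {
  paste - - - - |    # join header, sequence, '+' and quality lines with tabs
    sed 's/^@/>/' |  # FASTQ header marker '@' becomes FASTA '>' (headers only)
    cut -f1-2 |      # keep only the header and sequence columns
    tr '\t' '\n'     # put header and sequence back on separate lines
}
```

`fastq_to_fasta < SRR6490088_1.fastq > Nanopore_TB.fa` produces the same file as the one-liner.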

2. Making Sense of the NGMLR Progress Information

Take the following example:

Processed: 92198 (0.66), R/S: 37.44, RL: 8857, Time: 2.00 5.00 11.62, Align: 0.96, 490, 0.81

What this means:

  • 92198 reads were processed so far
  • 66 % of the 92198 reads were mapped (with > 25 % of their bp mapped)
  • 37.44 reads are mapped per second on average
  • 8857 is the average read length so far
  • "Time" and "Align" are used for debugging purposes and will be removed.
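The fields described above can be extracted from a progress line with sed and awk. A minimal sketch; parse_progress is a hypothetical helper, and its estimate of mapped reads is simply processed reads × mapped fraction (92198 × 0.66 ≈ 60851). Time and Align are skipped because they are debugging output.

```bash
# Hypothetical helper: read ngmlr progress lines on stdin and print the documented
# fields plus an estimate of mapped reads (processed * fraction). Other lines are ignored.
parse_progress() {
  sed -n 's/.*Processed: \([0-9]*\) (\([0-9.]*\)), R\/S: \([0-9.]*\), RL: \([0-9]*\),.*/\1 \2 \3 \4/p' |
    awk '{ printf "processed=%d mapped_fraction=%s reads_per_sec=%s avg_read_len=%d mapped_est=%.0f\n",
                  $1, $2, $3, $4, $1 * $2 }'
}
```

For example, assuming the progress lines go to stderr: run `./ngmlr -r ref.fa -q reads.fa -o out.sam 2> progress.log`, then `tr '\r' '\n' < progress.log | parse_progress | tail -n 1`. The tr step only matters if the progress lines are separated by carriage returns rather than newlines.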
