This project is no longer maintained. Any logical and performance differences between this program and the lumpy_filter program maintained as part of lumpy-sv have been addressed. You should move to using lumpy_filter. The easiest way to do so is to use smoove.
The purpose of this program is to extract splitter and discordant reads from a CRAM or BAM file using logic identical to SAMBLASTER. This allows the generation of splitter and discordant files without name-sorting the input file. Unlike SAMBLASTER, which appends '_1' and '_2' to splitter read names, this program alters read names in the splitter output file by changing the first character to an 'A' for read1 and a 'B' for read2.
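The renaming described above can be illustrated with a small shell sketch. It only mimics the documented behavior on a single read name; rename_splitter is a hypothetical helper written for this example and is not part of extract-sv-reads, and the read name is made up.

```shell
# Sketch only: mimic the splitter renaming described above.
# rename_splitter is a hypothetical helper, not part of extract-sv-reads.
# Usage: rename_splitter <read_name> <1|2>
rename_splitter() {
    case "$2" in
        1) prefix="A" ;;   # read1: first character becomes 'A'
        2) prefix="B" ;;   # read2: first character becomes 'B'
        *) echo "mate must be 1 or 2" >&2; return 1 ;;
    esac
    # ${1#?} is the read name with its first character removed.
    printf '%s%s\n' "$prefix" "${1#?}"
}

rename_splitter "HWI-ST1234:5:1101:1000:2000" 1   # prints AWI-ST1234:5:1101:1000:2000
rename_splitter "HWI-ST1234:5:1101:1000:2000" 2   # prints BWI-ST1234:5:1101:1000:2000
```

Because the first character is replaced rather than extended, the renamed read keeps its original length, but it no longer matches the name in the input file exactly.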
Splitters and discordants are output in BAM files. Duplicates are included by default, but can be excluded using the -e option. As of version 1.2.0, threading affects performance for both BAM and CRAM input, and specifying more than one thread will speed up the program significantly.
CRAM is supported as an input format; however, I highly recommend using the -T option when running on a CRAM file. The -T option prevents htslib from downloading the reference sequence used to encode the CRAM to the REF_CACHE location. By default, this is in the current user's home directory and may prove problematic for those with smallish home directories. See the htslib documentation for more information.
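If supplying a reference with -T is not practical, htslib's cache can instead be pointed away from the home directory through the REF_CACHE environment variable. The following is a minimal sketch under stated assumptions: the scratch directory is hypothetical, and the %2s/%2s/%s layout is the pattern shown in the htslib and samtools documentation. Check that documentation for the version of htslib you are using before relying on it.

```shell
# Sketch only: move the htslib reference cache out of the home directory.
# cache_root is a hypothetical location; pick any directory with enough space.
cache_root="${TMPDIR:-/tmp}/hts-ref"
mkdir -p "$cache_root"

# %2s/%2s/%s spreads cached sequences over subdirectories by MD5 prefix,
# following the layout given in the htslib documentation.
export REF_CACHE="$cache_root/%2s/%2s/%s"

echo "htslib reference cache: $REF_CACHE"
```

This only changes where downloaded references are stored; it does not stop the download itself, which is what -T avoids.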
This program is heavily based on code from SAMBLASTER, unpublished code from Ryan Layer and code written by Travis Abbott in diagnose_dups.
Currently, extract_sv_reads must be compiled from source code. It is routinely tested using both the gcc and clang compilers on Ubuntu 12.04. It should work on other Unix-based operating systems, but they are not supported.
- git
- cmake 2.8+ (cmake.org)
Boost 1.59, htslib 1.6, and zlib 1.2.8 are included with the source code and will be utilized during compilation. Older versions of Boost will not work if specified directly.
- For APT-based systems (Debian, Ubuntu), install the following packages:
sudo apt-get install build-essential git-core cmake
Download and extract the code of the latest release or clone from the master branch using git:
git clone git://github.com/hall-lab/extract_sv_reads.git
extract-sv-reads does not support in-source builds, so create a subdirectory, enter it, build, and run the tests:
mkdir extract_sv_reads/build
cd extract_sv_reads/build
cmake ..
make -j
make test
Tests should pass. The binary extract-sv-reads can then be found under extract_sv_reads/build/bin. If you have administrative rights, then run sudo make install to install the tool for all users under /usr/bin.
htslib can be linked against curl for interaction with AWS and GCS. In addition, it can be linked with lzma and bz2 for full read support of all types of CRAM files. To enable these features install the following packages.
- For APT-based systems (Debian, Ubuntu):
sudo apt-get install libbz2-dev liblzma-dev libssl-dev libcurl4-openssl-dev
Then configure the build with the corresponding options enabled, build, and run the tests:
mkdir extract_sv_reads/build
cd extract_sv_reads/build
cmake -DHTSLIB_USE_LIBCURL=1 -DHTSLIB_USE_LZMA=1 -DHTSLIB_USE_BZ2=1 ..
make -j
make test
Please cite extract-sv-reads using its DOI. Note that this link corresponds to the latest version. If you used an earlier version then your DOI may be different and you can find it on Zenodo.
Please open issues on the GitHub repository to obtain help.