Commit

Merge pull request #126 from Bioinformatics/dev
Dev to master
Skola, Dylan authored and GitHub Enterprise committed Oct 10, 2019
2 parents 9a93d08 + 8a5ed7c commit cc949d5
Showing 9 changed files with 38 additions and 20 deletions.
5 changes: 4 additions & 1 deletion .dockerignore
@@ -24,4 +24,7 @@ compile_commands.json
tags
tags_sorted_by_file
.tags
.tags_sorted_by_file
.tags_sorted_by_file
cmake-build-debug
cmake-build-release
build
6 changes: 3 additions & 3 deletions Dockerfile
@@ -1,4 +1,6 @@
FROM ubuntu:16.04
FROM ubuntu:18.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
apt-get install -y \
@@ -22,14 +24,12 @@ RUN apt-get update && \
python-pandas \
python-distribute \
python-pysam \
python-software-properties \
python-scipy \
software-properties-common \
wget \
zlib1g-dev && \
apt-get clean -y


RUN pip install bx-python

# copy git repository into the image
2 changes: 1 addition & 1 deletion Dockerfile.ubuntu-with-tests
@@ -1,4 +1,4 @@
FROM ubuntu:16.04
FROM ubuntu:18.04

RUN apt-get update && \
apt-get install -y \
6 changes: 3 additions & 3 deletions README.md
@@ -105,7 +105,7 @@ when using variant calling methods that produce many complex variant calls,
these corner cases can become relevant. Moreover, when benchmarking against
gold-standard datasets that cover difficult regions of the genome (e.g.
[Platinum Genomes](http://www.illumina.com/platinumgenomes/)), the more complicated
subsets of the genome will be respnsible for most of the difference between
subsets of the genome will be responsible for most of the difference between
methods.

### Variant preprocessing
@@ -164,7 +164,7 @@ prefer to combine local haplotypes in the same variant records
different variant calling methods.

```
chr1 201586350 . CTCTCTCTCT C
chr1 201586350 . CTCTCTCTC C
chr1 201586359 . T A
```

@@ -351,7 +351,7 @@ docker build -f Dockerfile.centos6 .
You will need these tools / libraries on your system to compile the code:

* CMake > 2.8
* GCC/G++ 4.8+ for compiling
* GCC/G++ 4.9.2+ for compiling
* Boost 1.55+
* Python 2, version 2.7.8 or greater
* Python packages: Pandas, Numpy, Scipy, pysam, bx-python
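The README hunk at line 164 shortens the REF allele of the first example record by one base. Counting bases shows the effect: with 1-based VCF positions, a REF allele of length n starting at POS covers POS through POS + n - 1, so the ten-base `CTCTCTCTCT` ended at 201586359, the same position as the `T` to `A` SNP, while the nine-base `CTCTCTCTC` ends at 201586358 and the two records no longer overlap. A minimal C++ sketch of that check follows; `VariantRecord`, `refEnd`, and `refSpansOverlap` are hypothetical names for illustration, not part of hap.py.

```cpp
#include <cstdint>
#include <string>

// Hypothetical minimal VCF-like record: chromosome, 1-based POS, REF allele.
struct VariantRecord
{
    std::string chrom;
    std::int64_t pos;
    std::string ref;
};

// Last reference base covered by the REF allele (1-based, inclusive).
std::int64_t refEnd(const VariantRecord &v)
{
    return v.pos + static_cast<std::int64_t>(v.ref.size()) - 1;
}

// True when the REF spans of two records share at least one reference base.
bool refSpansOverlap(const VariantRecord &a, const VariantRecord &b)
{
    return a.chrom == b.chrom && a.pos <= refEnd(b) && b.pos <= refEnd(a);
}
```

With the old record, `refSpansOverlap` reports an overlap with the SNP at 201586359; with the corrected nine-base REF it does not.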
10 changes: 5 additions & 5 deletions doc/happy.md
@@ -48,13 +48,13 @@ are not supported, all input bed or bed.gz files must only contain bed records).
Hap.py will report counts of

* ***true-positives (TP)***: variants/genotypes that match in truth and query.
* ***false-positives (FP)***: variants that have mismatching genotypes or alt
* ***true-positives (TP)*** : variants/genotypes that match in truth and query.
* ***false-positives (FP)*** : variants that have mismatching genotypes or alt
alleles, as well as query variant calls in regions a truth set would call
confident hom-ref regions.
* ***false-negatives (FN)*** : variants present in the truth set, but missed
in the query.
* ***non-assessed calls (UNK)***: variants outside the truth set regions
* ***non-assessed calls (UNK)*** : variants outside the truth set regions

From these counts, we are able to calculate

@@ -488,8 +488,8 @@ a ROC curve based on the query GQX field:
The `--roc` switch specifies the feature to filter on. Hap.py translates the
truth and query GQ(X) fields into the INFO fields T_GQ and Q_GQ, it tries to
use GQX first, if this is not present, it will use GQ. When run without
internal preprocessing any other input INFO field can be used (e.g. VQSLOD for
GATK).
internal preprocessing any other input INFO field can be used (e.g.
--roc INFO.VQSLOD for GATK).

The `--roc-filter` switch may be used to specify the particular VCF filter
which implements a threshold on the quality score. When calculating filtered
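The `doc/happy.md` hunk at line 48 lists the TP, FP, FN and UNK counts and then says metrics are calculated from them; the formulas themselves fall in the part of the file elided from this diff. As an illustration only, the sketch below applies the standard definitions, recall = TP / (TP + FN) and precision = TP / (TP + FP), with UNK calls left out of both ratios. This is an assumption, not necessarily the exact computation hap.py performs; `BenchmarkCounts` and the function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical container for the counts described above. UNK calls fall
// outside the truth-set regions and are not used in either ratio.
struct BenchmarkCounts
{
    std::uint64_t tp;
    std::uint64_t fp;
    std::uint64_t fn;
    std::uint64_t unk;
};

// Guarded division: returns 0.0 when the denominator is zero.
static double ratio(std::uint64_t num, std::uint64_t denom)
{
    return denom == 0 ? 0.0 : static_cast<double>(num) / static_cast<double>(denom);
}

// recall = TP / (TP + FN): fraction of truth variants found by the query
double recall(const BenchmarkCounts &c) { return ratio(c.tp, c.tp + c.fn); }

// precision = TP / (TP + FP): fraction of assessed query calls that are correct
double precision(const BenchmarkCounts &c) { return ratio(c.tp, c.tp + c.fp); }

// F1 score: harmonic mean of precision and recall
double f1Score(const BenchmarkCounts &c)
{
    const double p = precision(c);
    const double r = recall(c);
    return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
}
```

For example, 90 TP, 10 FP and 30 FN give a recall of 0.75 and a precision of 0.9, regardless of the UNK count.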
2 changes: 1 addition & 1 deletion example/happy/microbenchmark.sh
@@ -44,7 +44,7 @@ REF=$DIR/hg38.chr21.fa
# -------
#
# To make ROCs for GATK, we discard the LowQual filter and use QUAL
# For VQSR ROCs, we would use VQSLOD and discard the VQSR Tranche filters
# For VQSR ROCs, we would use INFO.VQSLOD and discard the VQSR Tranche filters

f=GATK3
g=${DIR}/NA12878-GATK3-chr21.vcf.gz
5 changes: 5 additions & 0 deletions src/c++/lib/diploidgraphs/DiploidReference.cpp
@@ -230,6 +230,11 @@ void DiploidReference::setRegion(
if(opposite_path != nu_haps.end() && opposite_path != nu_haps.begin())
{
size_t p2 = opposite_path->second;
// make order reproducible since map is not ordered
if(p2 > p1)
{
std::swap(p1, p2);
}

nu_haps.erase(nu_haps.begin());
nu_haps.erase(opposite_path);
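The `DiploidReference::setRegion` hunk swaps `p1` and `p2` whenever `p2 > p1`, with the comment that the map is not ordered. The idea is to put the two path indices into a canonical order, so that the result no longer depends on which entry the container happened to visit first. A standalone sketch of that canonicalization follows; `canonicalPathPair` is a hypothetical name, and the "larger index first" convention is taken from the swap in the hunk.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper mirroring the swap in the hunk above: put the larger
// index first so the pair is identical no matter which of the two paths
// an unordered container yielded first.
std::pair<std::size_t, std::size_t> canonicalPathPair(std::size_t p1, std::size_t p2)
{
    if (p2 > p1)
    {
        std::swap(p1, p2);
    }
    return {p1, p2};
}
```

Both `canonicalPathPair(3, 7)` and `canonicalPathPair(7, 3)` yield the pair `(7, 3)`.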
20 changes: 15 additions & 5 deletions src/c++/lib/quantify/QuantifyRegions.cpp
@@ -41,7 +41,6 @@
#include "helpers/BCFHelpers.hh"

#include <map>
#include <regex>
#include <unordered_map>
#include <htslib/vcf.h>

@@ -102,7 +101,6 @@ namespace variant
void QuantifyRegions::load(std::vector<std::string> const &rnames, bool fixchr)
{
std::unordered_map<std::string, size_t> label_map;
const std::regex trailing_number_regex ("(.+)_([0-9]+)$");
for (std::string const &f : rnames)
{
std::vector<std::string> v;
@@ -238,10 +236,22 @@ namespace variant

if (!fixed_label && v.size() > 3)
{
std::smatch string_matches;
if(std::regex_match(v[3], string_matches, trailing_number_regex))
size_t split = v[3].size();
for (size_t pos = v[3].size(); pos != 0; --pos)
{
label_ids.insert(getLabelId(label + "_" + string_matches.str(1), 1));
if (v[3][pos] < '0' || v[3][pos] > '9')
{
break;
}
else
{
split = pos;
}
}

if(split < v[3].size() && v[3][split] == '_')
{
label_ids.insert(getLabelId(label + "_" + v[3].substr(0, split), 1));
label_ids.insert(getLabelId(label + "_" + v[3], 2));
}
else
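The `QuantifyRegions.cpp` hunks drop `<regex>` and replace the `std::regex_match` against `(.+)_([0-9]+)$` with a manual backwards scan for a trailing digit run. The sketch below is one way to reproduce the removed regex's first capture group (the label prefix before `_<digits>`) without `<regex>`. It assumes C++17 for `std::optional`, and `stripTrailingNumber` is a hypothetical name. It indexes with `pos - 1` and checks the character immediately before the digit run for the underscore. By contrast, in the loop as shown in the diff, the first iteration reads `v[3][v[3].size()]`, which for `std::string` (C++11 and later) is the terminating null character, so that loop exits before examining any digit.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: returns the prefix that the removed regex
// "(.+)_([0-9]+)$" would capture as group 1, or std::nullopt on no match.
std::optional<std::string> stripTrailingNumber(const std::string &s)
{
    std::size_t split = s.size();  // index of the first trailing digit
    // Walk backwards over trailing digits; s[pos - 1] is always in range.
    for (std::size_t pos = s.size(); pos != 0; --pos)
    {
        const char c = s[pos - 1];
        if (c < '0' || c > '9')
        {
            break;
        }
        split = pos - 1;
    }
    // Need at least one digit, an underscore right before the digit run,
    // and a non-empty prefix before that underscore (the "(.+)" part).
    if (split == s.size() || split < 2 || s[split - 1] != '_')
    {
        return std::nullopt;
    }
    return s.substr(0, split - 1);
}
```

For example, `"exon_12"` yields `"exon"` and `"a_b_3"` yields `"a_b"`, while `"exon12"`, `"_12"` and `"exon_"` do not match, just as with the original regex.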
2 changes: 1 addition & 1 deletion src/sh/run_happy_pg_test.sh
@@ -95,7 +95,7 @@ ${PYTHON} ${HCDIR}/hap.py \
-r ${DIR}/../../example/chr21.fa \
-o ${TMP_OUT}.unhappy \
-X --unhappy \
--roc VQSLOD \
--roc INFO.VQSLOD \
--force-interactive

if [[ $? != 0 ]]; then