QC TSV output file
This page describes the QC gzipped tab-delimited file qc.tsv.gz that is made when running viridian run_one_sample.
It contains per-base information on the consensus sequence and how it aligns to the reference genome.
The first 11 columns are:
- Ref_pos - reference position (1-based)
- Ref_nt - reference nucleotide
- Cons_pos - consensus position (1-based)
- Cons_nt - consensus nucleotide
- Masked_cons_nt - consensus nucleotide after masking
- Amplicon - name of the amplicon(s) this position belongs to
- Primer - name of the primer(s) this position belongs to
- Mask - list of mask reasons, or PASS if not masked
- Total_depth - total read depth at this position
- Clean_depth - total "clean" read depth, i.e. excluding primer portions of reads
- Cons_depth - clean depth that supports the consensus call
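As a minimal sketch (not part of viridian itself), the file can be loaded in Python with the standard gzip and csv modules. It assumes that the first line of qc.tsv.gz is a header row with the column names above, that "." marks a missing value (as in the toy example on this page), and that every column other than the text columns holds integers; the function names are hypothetical.

```python
import csv
import gzip

# Columns holding text; every other column (positions, depths, counts) is
# assumed to hold integers. Assumption based on the column descriptions above.
TEXT_COLUMNS = {"Ref_nt", "Cons_nt", "Masked_cons_nt", "Amplicon", "Primer", "Mask"}


def parse_row(row):
    """Convert one row (a dict of strings) to Python values.

    Assumption: "." marks a missing value and is mapped to None.
    """
    parsed = {}
    for name, value in row.items():
        if value == ".":
            parsed[name] = None
        elif name in TEXT_COLUMNS:
            parsed[name] = value
        else:
            parsed[name] = int(value)
    return parsed


def read_qc_rows(lines):
    """Parse an iterable of tab-separated lines whose first line is the header."""
    return [parse_row(row) for row in csv.DictReader(lines, delimiter="\t")]


def read_qc_tsv(path):
    """Hypothetical helper: read a whole qc.tsv.gz file into a list of dicts."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_qc_rows(handle)
```

For example, read_qc_tsv("qc.tsv.gz") returns one dict per reference position, keyed by column name.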
The remaining columns are the counts of A/C/G/T/insertion/deletion pileup depths from the reads on each strand. The counts are split into "clean" read depth (i.e. excluding primer portions of reads) and "bad" read depth (primer portions of reads). A/a are the clean counts of A from the reads on the forward/reverse strand, and similarly for the other nucleotides. I/i and D/d give the clean counts of insertions and deletions (but not their lengths, which are in the detailed entries of the self_qc entry of the log JSON file).
The "bad" read depths are given in columns of the same name, but with _X appended; for example, A_X/a_X are the bad read depths of A on the forward and reverse strands.
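A minimal sketch of how the strand-split columns combine, assuming a row is a dict keyed by column name (for example, one row from csv.DictReader) and that the insertion and deletion columns follow the same _X naming as A_X/a_X. Uppercase names are forward-strand counts and lowercase names are reverse-strand counts; the function names are hypothetical.

```python
# Symbols with a forward/reverse pair of count columns: A/a, C/c, G/g, T/t, I/i, D/d.
SYMBOLS = ["A", "C", "G", "T", "I", "D"]


def strand_pair(row, symbol, bad=False):
    """Return (forward, reverse) counts for one symbol.

    Clean counts are in e.g. "A" and "a"; bad (primer) counts in "A_X" and "a_X".
    Assumption: I/i and D/d also have I_X/i_X and D_X/d_X columns.
    """
    suffix = "_X" if bad else ""
    return int(row[symbol + suffix]), int(row[symbol.lower() + suffix])


def depth_by_symbol(row):
    """Sum the forward and reverse strands, giving clean and bad totals per symbol."""
    clean = {s: sum(strand_pair(row, s)) for s in SYMBOLS}
    bad = {s: sum(strand_pair(row, s, bad=True)) for s in SYMBOLS}
    return clean, bad
```

Keeping the strands separate in strand_pair also makes it easy to check for strand bias, for example a base that is only seen on one strand.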
Here is a toy example, showing only the first 11 columns (otherwise it is far too wide!):
Ref_pos  Ref_nt  Cons_pos  Cons_nt  Masked_cons_nt  Amplicon  Primer  Mask   Total_depth  Clean_depth  Cons_depth
1        A       0         -        -               .         .       .      .            .            .
2        T       0         -        -               .         .       .      .            .            .
3        G       1         G        N               A1        A1_l_0  DEPTH  700          0            0
4        C       2         C        N               A1        A1_l_0  DEPTH  702          0            0
5        G       3         G        N               A1        A1_l_0  DEPTH  705          0            0
6        G       4         G        N               A1        A1_l_0  DEPTH  701          0            0
7        A       5         A        A               A1        .       PASS   800          800          800
8        A       6         A        A               A1        .       PASS   804          804          799
9        C       7         C        C               A1        .       PASS   805          805          804
10       A       8         A        A               A1        .       PASS   800          800          800
11       A       8         -        -               A1        .       .      .            .            .
12       T       9         T        T               A1        .       PASS   800          800          800
13       C       10        C        C               A1;A2     A2_l_0  PASS   1500         802          801
14       G       11        G        G               A1;A2     A2_l_0  PASS   1503         801          799
15       C       12        C        C               A1;A2     A2_l_0  PASS   1501         804          804
In this example, the first amplicon, called A1, starts at reference position 3. Its left primer, called A1_l_0, covers reference positions 3-6. All of the read depth there is "bad" (Clean_depth is 0), so those positions are masked in the consensus sequence: Masked_cons_nt is N and the Mask column gives the reason DEPTH.
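To report masked regions rather than single positions, consecutive masked reference positions can be merged into intervals. This sketch treats a position as masked when Mask is neither PASS nor "." (in the toy example, "." appears where there is no consensus base); the function name and the (Ref_pos, Mask) pair input are illustrative assumptions. On the toy example it reports reference positions 3-6.

```python
def masked_intervals(rows):
    """Merge masked reference positions into inclusive (start, end) intervals.

    rows: (Ref_pos, Mask) pairs in increasing Ref_pos order.
    Assumption: "PASS" means not masked and "." means no consensus base.
    """
    intervals = []
    for ref_pos, mask in rows:
        if mask in ("PASS", "."):
            continue  # not masked, or no consensus base at this position
        if intervals and intervals[-1][1] == ref_pos - 1:
            intervals[-1] = (intervals[-1][0], ref_pos)  # extend the current run
        else:
            intervals.append((ref_pos, ref_pos))  # start a new run
    return intervals


# (Ref_pos, Mask) pairs copied from the toy example.
TOY_ROWS = [
    (1, "."), (2, "."), (3, "DEPTH"), (4, "DEPTH"), (5, "DEPTH"),
    (6, "DEPTH"), (7, "PASS"), (8, "PASS"), (9, "PASS"), (10, "PASS"),
    (11, "."), (12, "PASS"), (13, "PASS"), (14, "PASS"), (15, "PASS"),
]

print(masked_intervals(TOY_ROWS))  # [(3, 6)]
```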
Reference position 11 is deleted in the consensus: its Cons_nt column is -, and its Cons_pos repeats the value 8 from the previous row.
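Reference positions 1-2 also have - in Cons_nt, but there the consensus has not started yet (Cons_pos is 0), so a - on its own does not mean a deletion. One way to pick out internal deletions, sketched below under the assumption that only gaps between the first and last consensus bases count, is to look only at rows between those two. The function name and the (Ref_pos, Cons_nt) pair input are hypothetical.

```python
def internal_deletions(rows):
    """Return reference positions deleted between the first and last consensus bases.

    rows: (Ref_pos, Cons_nt) pairs in increasing Ref_pos order.
    Assumption: gaps before the first or after the last consensus base are not deletions.
    """
    aligned = [i for i, (_, cons_nt) in enumerate(rows) if cons_nt != "-"]
    if not aligned:
        return []
    first, last = aligned[0], aligned[-1]
    return [rows[i][0] for i in range(first + 1, last) if rows[i][1] == "-"]


# (Ref_pos, Cons_nt) pairs copied from the toy example.
TOY_ROWS = [
    (1, "-"), (2, "-"), (3, "G"), (4, "C"), (5, "G"), (6, "G"), (7, "A"),
    (8, "A"), (9, "C"), (10, "A"), (11, "-"), (12, "T"), (13, "C"),
    (14, "G"), (15, "C"),
]

print(internal_deletions(TOY_ROWS))  # [11]
```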
The second amplicon, called A2, starts at reference position 13. Reference positions 13-15 belong to amplicons A1 and A2, and also to the left primer A2_l_0 of amplicon A2. This means there is a mix of good and bad coverage at positions 13-15: the good coverage comes from the reads from amplicon A1, and the bad coverage comes from the primer portions of the reads from amplicon A2. We see that the total depth (Total_depth) is around 1500X, but the clean depth (Clean_depth) is only around 800X, and almost all of that clean depth supports the consensus call (Cons_depth).
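The comparison in the last paragraph can be expressed as two ratios per position: the fraction of Total_depth that is clean, and the fraction of Clean_depth that supports the consensus. This is a minimal illustrative sketch; the function name is hypothetical, and positions with zero depth are given a ratio of 0.0 by assumption.

```python
def depth_fractions(total_depth, clean_depth, cons_depth):
    """Return (Clean_depth / Total_depth, Cons_depth / Clean_depth).

    Assumption: a ratio with a zero denominator is reported as 0.0.
    """
    clean_fraction = clean_depth / total_depth if total_depth else 0.0
    support_fraction = cons_depth / clean_depth if clean_depth else 0.0
    return clean_fraction, support_fraction


# (Ref_pos, Total_depth, Clean_depth, Cons_depth) copied from the toy example.
OVERLAP_ROWS = [(13, 1500, 802, 801), (14, 1503, 801, 799), (15, 1501, 804, 804)]

for ref_pos, total, clean, cons in OVERLAP_ROWS:
    clean_fraction, support_fraction = depth_fractions(total, clean, cons)
    print(f"{ref_pos}: {clean_fraction:.2f} of total depth is clean, "
          f"{support_fraction:.3f} of clean depth supports the consensus")
```

At all three positions roughly half of the total depth is clean, and nearly all of the clean depth supports the consensus call.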