JSON output file
This page describes the contents of the file log.json, made when running viridian_workflow run_one_sample.
The main entries in the file are:
- run_summary - has high-level details on the run
- read_and_primer_stats - high-level read counts (and reads mapped etc), and amplicon scheme identification details
- read_sampling - read depths and related information for each amplicon
- viridian - details of the results of consensus calling using Viridian
- self_qc - this is to be implemented.
Please read on below for more details about the contents of each of those entries.
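As a starting point, here is a minimal sketch of loading the file and checking which of the entries listed above are present. The function names and the OUT/log.json path are illustrative assumptions (the path assumes the file sits in the directory given to --outdir); they are not part of viridian_workflow.

```python
import json

# Top-level entries named on this page. self_qc is documented as
# "to be implemented", so it may legitimately be absent.
EXPECTED_ENTRIES = [
    "run_summary",
    "read_and_primer_stats",
    "read_sampling",
    "viridian",
    "self_qc",
]


def load_log(path):
    """Read a log.json file into a dict (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


def missing_entries(log):
    """Return the expected top-level entries that are absent from the log."""
    return [key for key in EXPECTED_ENTRIES if key not in log]


# Usage (the path is an assumption, adjust it to your own output directory):
# log = load_log("OUT/log.json")
# print(missing_entries(log))
```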
An example run_summary entry is:
"run_summary": {
    "last_stage_completed": "Finished",
    "command": "viridian_workflow run_one_sample --tech illumina --ref_fasta ref.fa --reads1 reads_1.fastq.gz --reads2 reads_2.fastq.gz --outdir OUT",
    "options": {
        "debug": false,
        ... etc listing all the command line options ...
    },
    "cwd": "/hps/nobackup/iqbal/mhunt/Covid_test_data_20210813.VWF.20211213.d1932ec1ea/Thielen",
    "version": "0.1.1",
    "finished_running": true,
    "start_time": "2021-12-13T10:51:03",
    "end_time": "2021-12-13T10:54:49",
    "hostname": "myhost",
    "result": "Success",
    "run_time": "0:03:46.060333"
}
This should be mostly self-explanatory.
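If you want to cross-check the timing fields, the sketch below computes the elapsed time from start_time and end_time. It assumes both are ISO 8601 timestamps as in the example; the function name is made up for illustration. Since the timestamps only have one-second resolution, the result can differ from run_time by a fraction of a second.

```python
from datetime import datetime


def elapsed_from_timestamps(run_summary):
    """Return end_time minus start_time as a datetime.timedelta."""
    start = datetime.fromisoformat(run_summary["start_time"])
    end = datetime.fromisoformat(run_summary["end_time"])
    return end - start


example = {
    "start_time": "2021-12-13T10:51:03",
    "end_time": "2021-12-13T10:54:49",
}
print(elapsed_from_timestamps(example))  # 0:03:46
```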
The file is written at several stages during the pipeline. Initially, result will be Unknown. The above example is how it looks at the end of a successful run - the key thing is that result says Success. If the pipeline detects something wrong during the run, then result will be a list of error messages. For example, if too many amplicons have too few reads to reliably call a consensus, the pipeline will stop and this will be in the output:
"result": ["Too many amplicons are too low depth. STOPPING"]
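The three cases described above (Unknown, Success, or a list of error messages) can be told apart with a small helper. This is only a sketch: the function name and the returned labels are illustrative, and any value other than Success or a list is treated here as "not finished", which is an assumption.

```python
def run_status(run_summary):
    """Classify the "result" field of a run_summary entry.

    Returns a (status, errors) tuple, where status is one of
    "success", "failed" or "incomplete" (labels chosen for this sketch).
    """
    result = run_summary.get("result", "Unknown")
    if isinstance(result, list):
        # The pipeline stopped and recorded one or more error messages
        return "failed", result
    if result == "Success":
        return "success", []
    # "Unknown" (or anything unexpected): the run has not finished cleanly
    return "incomplete", []


status, errors = run_status(
    {"result": ["Too many amplicons are too low depth. STOPPING"]}
)
print(status, errors)
```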
The read_and_primer_stats section contains information on mapping all the original input reads to the reference genome, and attempting to allocate them to amplicon(s). Here is an example for paired Illumina reads:
"read_and_primer_stats": {
    "unpaired_reads": 0,
    "reads1": 489949,
    "reads2": 489949,
    "total_reads": 979898,
    "mapped": 971562,
    "match_any_amplicon": 486271,
    "read_lengths": {
        "149": 1023,
        "150": 692551,
        ... etc. key=length, value=number of reads ...
    },
    "amplicon_scheme_set_matches": {
        "COVID-ARTIC-V3;COVID-ARTIC-V4;COVID-MIDNIGHT-1200": 83644,
        "COVID-ARTIC-V3;COVID-MIDNIGHT-1200": 298897,
        "COVID-ARTIC-V3": 84823,
        "COVID-ARTIC-V3;COVID-ARTIC-V4": 18654,
        "COVID-MIDNIGHT-1200": 252,
        "COVID-ARTIC-V4": 1
    },
    "amplicon_scheme_simple_counts": {
        "COVID-ARTIC-V3": 486018,
        "COVID-ARTIC-V4": 102299,
        "COVID-MIDNIGHT-1200": 382793
    },
    "chosen_amplicon_scheme": "COVID-ARTIC-V3"
}
The first few entries contain the number of reads. These are paired reads, so there are counts for forward reads (reads1) and reverse reads (reads2), and total_reads = forward plus reverse reads. For unpaired nanopore reads, the read count would be in reads, the reads1/reads2 values would be zero, and reads = total_reads.
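For paired data, the relationship between these counts can be checked directly. A minimal sketch, using the numbers from the example above (the function name is illustrative, and the check only applies to paired reads):

```python
def paired_counts_consistent(stats):
    """True if total_reads equals forward plus reverse reads."""
    return stats["total_reads"] == stats["reads1"] + stats["reads2"]


example_stats = {
    "reads1": 489949,
    "reads2": 489949,
    "total_reads": 979898,
}
print(paired_counts_consistent(example_stats))  # True
```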
The mapped entry is simply the total number of mapped reads.
The read_lengths entry is a histogram of read length to number of reads (it includes all reads, whether mapped or not).
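These two entries lend themselves to simple summary statistics. The sketch below computes the fraction of reads that mapped, and the read count and mean read length from the histogram. Note that JSON object keys are strings, so the lengths need converting to integers. The function names are illustrative, and the small histogram in the usage example is made up, because the example above is truncated.

```python
def mapped_fraction(stats):
    """Fraction of all reads that mapped to the reference."""
    total = stats["total_reads"]
    return stats["mapped"] / total if total else 0.0


def read_length_summary(read_lengths):
    """Return (number of reads, mean read length) from the histogram.

    Keys are read lengths stored as strings, values are read counts.
    The mean is None when the histogram is empty.
    """
    total_reads = sum(read_lengths.values())
    if total_reads == 0:
        return 0, None
    total_bases = sum(int(length) * n for length, n in read_lengths.items())
    return total_reads, total_bases / total_reads


# Counts from the example above: roughly 99% of reads mapped
print(mapped_fraction({"total_reads": 979898, "mapped": 971562}))

# Made-up histogram, for illustration only
print(read_length_summary({"100": 1, "200": 3}))  # (4, 175.0)
```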
The match_any_amplicon count is the number of reads (if the reads are unpaired), or the number of fragments/read pairs (if the reads are paired), that match any amplicon from any of the amplicon schemes under consideration. For read pairs, the entire fragment (i.e. start of left read to end of right read) is considered, and therefore the count is for read pairs, not individual reads.
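Because the unit of match_any_amplicon depends on whether the reads are paired, the denominator needs some care when turning it into a fraction. The sketch below assumes that a non-zero reads1 means paired data, and that the number of fragments then equals reads1. Both are assumptions made for illustration, as is the function name.

```python
def amplicon_match_fraction(stats):
    """Fraction of fragments (paired) or reads (unpaired) matching any amplicon."""
    paired = stats["reads1"] > 0  # assumption: non-zero reads1 means paired reads
    # Paired: one fragment per read pair. Unpaired: one fragment per read.
    fragments = stats["reads1"] if paired else stats["total_reads"]
    return stats["match_any_amplicon"] / fragments if fragments else 0.0


example_stats = {
    "reads1": 489949,
    "reads2": 489949,
    "total_reads": 979898,
    "match_any_amplicon": 486271,
}
print(amplicon_match_fraction(example_stats))  # roughly 0.99
```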
Since amplicon positions can overlap between amplicon schemes, a read (pair) can be allocated to zero, one, or more than one amplicon. The entry amplicon_scheme_set_matches shows the number of reads matching different combinations of schemes. For example, a read could match amplicon 1 from ARTIC V3 and amplicon 1 from Midnight-1200, and in this case the counter for "COVID-ARTIC-V3;COVID-MIDNIGHT-1200" would be incremented.
The entry amplicon_scheme_simple_counts shows the number of reads allocated to each amplicon scheme, ignoring combinations. For example, a read matching both COVID-ARTIC-V3 and COVID-MIDNIGHT-1200 would result in the counters for both those schemes being incremented.
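The relationship between the two entries can be illustrated by rebuilding the simple counts from the set matches: each combination key is split on ";" and its count is added to every scheme it names. This is a sketch that is consistent with the example numbers above, not necessarily how the pipeline computes the counts internally. The function name is illustrative.

```python
from collections import Counter


def simple_counts_from_set_matches(set_matches):
    """Sum the count of each scheme combination into every scheme it contains."""
    counts = Counter()
    for scheme_set, n_reads in set_matches.items():
        for scheme in scheme_set.split(";"):
            counts[scheme] += n_reads
    return dict(counts)


set_matches = {
    "COVID-ARTIC-V3;COVID-ARTIC-V4;COVID-MIDNIGHT-1200": 83644,
    "COVID-ARTIC-V3;COVID-MIDNIGHT-1200": 298897,
    "COVID-ARTIC-V3": 84823,
    "COVID-ARTIC-V3;COVID-ARTIC-V4": 18654,
    "COVID-MIDNIGHT-1200": 252,
    "COVID-ARTIC-V4": 1,
}
print(simple_counts_from_set_matches(set_matches))
```

With the example data this reproduces the amplicon_scheme_simple_counts values shown earlier, and the set match counts sum to match_any_amplicon (486271).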
Finally, the entry chosen_amplicon_scheme shows the amplicon scheme that was chosen. Currently the naive method of taking the scheme with the most counts from amplicon_scheme_simple_counts is used. This may change in the future.
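The naive selection described above amounts to taking the key with the largest value. A minimal sketch follows; the function name is illustrative, and how the pipeline breaks ties is not documented here, so this version simply returns the first scheme with the maximum count.

```python
def scheme_with_most_counts(simple_counts):
    """Return the scheme name with the highest count, or None if there are none."""
    if not simple_counts:
        return None
    return max(simple_counts, key=simple_counts.get)


simple_counts = {
    "COVID-ARTIC-V3": 486018,
    "COVID-ARTIC-V4": 102299,
    "COVID-MIDNIGHT-1200": 382793,
}
print(scheme_with_most_counts(simple_counts))  # COVID-ARTIC-V3
```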
Note that there is a top-level entry in the JSON called amplicon_scheme_name. This is the scheme that was actually used. It will usually be the same as chosen_amplicon_scheme. However, if the option to force the scheme was used (--force_amp_scheme) then amplicon_scheme_name will be that forced choice, regardless of the result in chosen_amplicon_scheme.
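When processing many samples, it can be useful to flag runs where the scheme actually used differs from the automatically chosen one. A sketch, assuming the log has been loaded into a dict (the function name is illustrative):

```python
def scheme_used_and_chosen(log):
    """Return (used, chosen, differs) for the amplicon scheme of a run.

    used    - top-level amplicon_scheme_name (the scheme actually used)
    chosen  - chosen_amplicon_scheme from read_and_primer_stats
    differs - True when they disagree, which per the description above
              suggests the scheme was forced with --force_amp_scheme
    """
    used = log.get("amplicon_scheme_name")
    chosen = log.get("read_and_primer_stats", {}).get("chosen_amplicon_scheme")
    return used, chosen, used != chosen


example_log = {
    "amplicon_scheme_name": "COVID-ARTIC-V4",
    "read_and_primer_stats": {"chosen_amplicon_scheme": "COVID-ARTIC-V3"},
}
print(scheme_used_and_chosen(example_log))
```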
To be completed