
JSON output file

martinghunt edited this page Jan 3, 2024 · 28 revisions

This page describes the contents of the gzipped JSON file log.json.gz, made when running viridian run_one_sample.

The file contains a dictionary of run details. The main entries (keys) are:

  • run_summary - has high-level details on the run
  • stages_completed - the progress of each main stage in the pipeline
  • reads - high-level summary of read counts
  • read_depth - genome coverage and read depth information
  • amplicon_scheme_name - name of the identified amplicon scheme
  • scheme_choice - details of the amplicon scheme scoring
  • amplicons - details of the amplicon scheme that was used
  • self_qc - details of read pileup information at each masked position
  • sequences - consensus sequence and variations (for MSAs and tree building)

Please read on for more details about the contents of each of those entries.
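As a minimal sketch of how the file might be inspected, the snippet below reads a gzipped JSON file with the Python standard library and returns the top-level dictionary. The function name load_viridian_log and the path OUT/log.json.gz are illustrative assumptions, not part of Viridian itself.

```python
import gzip
import json


def load_viridian_log(path):
    """Read a gzipped JSON log file and return its contents as a dict.

    Hypothetical helper for illustration; Viridian does not ship this function.
    """
    # "rt" opens the gzip stream in text mode so json.load can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Example usage (assumes the run used an output directory called OUT):
#   log = load_viridian_log("OUT/log.json.gz")
#   print(sorted(log))  # lists the top-level keys, e.g. run_summary, sequences
```

The later sections describe what to expect under each top-level key once the dictionary is loaded.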

run_summary

This section is a dictionary with a basic summary of the run. Here is an example (some key/value pairs are omitted for brevity):

"last_stage_completed": "Finished",
"command": "viridian run_one_sample ... full command line used",
"options": {
  "debug": false,
  "outdir": "OUT",
  "force": false,
  ... all the other command line options ...
},
"cwd": "/foo/bar/",
"version": "1.1.0",
"finished_running": true,
"start_time": "2023-09-08T13:37:59+00:00",
"end_time": "2023-09-08T13:39:28+00:00",
"hostname": "thehoff",
"result": "Success",
"errors": [],
"temp_processing_dir": "/tmp/viridian.rxs2ttki",
"total_amplicons": 98,
"successful_amplicons": 98,
"consensus_length": 29836,
"consensus_N_count": 96,
"consensus_N_percent": 0.32,
"consensus_ACGT_count": 29740,
"consensus_ACGT_percent": 99.68,
"consensus_het_count": 0,
"consensus_het_percent": 0.0,
"run_time": "0:01:29.384867"

The most important thing to check is:

"result": "Success"

meaning that the run finished successfully. If it instead says "Fail", then something went wrong, and the details will be in the stages_completed section. The other entries should be self-explanatory.
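The check described above could be scripted roughly as follows. This is a sketch only: it takes the already-parsed contents of log.json.gz as a dictionary and assumes the key names shown in the example (result, last_stage_completed, errors, and the top-level stages_completed). The function name summarise_run is hypothetical.

```python
def summarise_run(log):
    """Return (ok, report_lines) for a parsed log dictionary.

    Hypothetical helper; key names are taken from the example on this page.
    """
    summary = log.get("run_summary", {})
    ok = summary.get("result") == "Success"
    lines = [f"result: {summary.get('result')}"]
    if not ok:
        # On failure, collect the fields most likely to explain what happened.
        lines.append(f"last_stage_completed: {summary.get('last_stage_completed')}")
        for error in summary.get("errors", []):
            lines.append(f"error: {error}")
        for stage in log.get("stages_completed", []):
            lines.append(f"stage: {stage}")
    return ok, lines
```

Using .get() with defaults means a log written before the run finished, which may lack some keys, is reported as not successful instead of raising a KeyError.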

stages_completed

This is a list of the stages that were completed. Each time a stage finishes, the JSON file is written, so that if Viridian crashes or is killed, you can see the last stage that was run.

A successful run looks like this:

"stages_completed": [
  "1/10 Start pipeline (0.0s)",
  "2/10 Process amplicon scheme files (0.1s)",
  "3/10 Map reads to reference (36.8s)",
  "4/10 Detect amplicon scheme (2.7s)",
  "5/10 Sample reads (23.3s)",
  "6/10 Initial consensus sequence (6.1s)",
  "7/10 Initial VCF and MSA of consensus/reference (0.4s)",
  "8/10 QC using reads vs consensus sequence (17.9s)",
  "9/10 Final QC checks (0.1s)",
  "10/10 Tidy up final files and log (1.0s)",
  "Finished"
]

The entries can vary depending on the command line options. For example, if a BAM file of mapped reads was provided, then the "Map reads to reference" stage would not be present. However, the final entry for a successful run is always "Finished".
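A sketch of how these entries might be read programmatically is given below. It assumes each stage string follows the "N/TOTAL description (SECONDSs)" pattern seen in the example above; that pattern is inferred from the example and is not a documented format, and the names parse_stage and pipeline_finished are hypothetical.

```python
import re

# Pattern inferred from the example entries, e.g. "3/10 Map reads to reference (36.8s)".
STAGE_PATTERN = re.compile(r"^(\d+)/(\d+) (.+) \(([\d.]+)s\)$")


def parse_stage(entry):
    """Split one stage string into its parts, or return None if it does not match."""
    match = STAGE_PATTERN.match(entry)
    if match is None:
        return None  # e.g. the final "Finished" marker
    index, total, name, seconds = match.groups()
    return {
        "index": int(index),
        "total": int(total),
        "name": name,
        "seconds": float(seconds),
    }


def pipeline_finished(stages):
    """True if the list is non-empty and its final entry is "Finished"."""
    return bool(stages) and stages[-1] == "Finished"
```

Because the set of stages depends on the command line options, checking the final entry is more robust than counting entries or looking for a specific stage name.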

reads

TBC

read_depth

TBC

amplicon_scheme_name

TBC

scheme_choice

TBC

amplicons

self_qc

TBC

sequences

TBC
