Running cte
cte can be run in two modes: evaluate a single consensus sequence, or evaluate a batch of consensus sequences.
To evaluate one SARS-CoV-2 consensus sequence, you will need:
- A VCF file of the "truth" calls, truth.vcf, as documented on the Truth VCF file page.
- The consensus sequence to evaluate, in a FASTA file cons.fa.
- The primer scheme. Currently supported: COVID-ARTIC-V3, COVID-ARTIC-V4, COVID-MIDNIGHT-1200. Alternatively, use your own TSV file of primers in Viridian Workflow format.
For example, assuming primer scheme COVID-ARTIC-V4:
cte eval_one_run \
--outdir OUT \
--truth_vcf truth.vcf \
--fasta_to_eval cons.fa \
--primers COVID-ARTIC-V4
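If you drive cte from a Python script, the same call can be assembled with the standard library's subprocess module. This is a minimal sketch: the helper names eval_one_run_cmd and run_eval_one_run are hypothetical, only the options from the example command are used, and it assumes cte is installed and on your PATH when the command is actually run.

```python
import subprocess
from typing import List


def eval_one_run_cmd(outdir: str, truth_vcf: str, fasta: str, primers: str) -> List[str]:
    """Build the argument list for `cte eval_one_run` (hypothetical helper)."""
    return [
        "cte", "eval_one_run",
        "--outdir", outdir,
        "--truth_vcf", truth_vcf,
        "--fasta_to_eval", fasta,
        # Either a built-in scheme name (e.g. COVID-ARTIC-V4) or a path to a primers TSV
        "--primers", primers,
    ]


def run_eval_one_run(outdir: str, truth_vcf: str, fasta: str, primers: str) -> None:
    # check=True raises CalledProcessError if cte exits with a non-zero status
    subprocess.run(eval_one_run_cmd(outdir, truth_vcf, fasta, primers), check=True)
```

For example, run_eval_one_run("OUT", "truth.vcf", "cons.fa", "COVID-ARTIC-V4") mirrors the command above. Passing a list rather than a single string avoids shell quoting problems with unusual filenames.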
Please read the output files page for a description of the output. Briefly, the output files are:
- results.tsv - most likely the file you want. A tab-delimited file with counts of the truth bases vs what was called in the consensus. The same information is also written to a JSON file, results.json.
- per_position.tsv - a more detailed TSV file, with information at each position in the genome.
In batch mode, each consensus sequence is evaluated independently, producing the same files as if it were run on its own. Additionally, a "grand total" results.tsv file is made, containing the sum of all the values from the individual results files.
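The "grand total" is a cell-by-cell sum across the per-consensus results. As an illustration only, assume (hypothetically) that each results file has already been parsed into a mapping from a (row label, column label) pair to an integer count; the real layout of results.tsv is described on the output files page, and the labels below are invented.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical representation of one parsed results table:
# (row label, column label) -> count. This is not cte's internal data structure.
Table = Dict[Tuple[str, str], int]


def grand_total(tables: Iterable[Table]) -> Table:
    """Sum counts cell by cell across all per-consensus tables."""
    total: Counter = Counter()
    for table in tables:
        # Counter.update adds counts to existing values instead of replacing them
        total.update(table)
    return dict(total)
```

A cell present in only some of the tables still appears in the total, carrying the sum of the tables that have it.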
To run a batch you will need the same information as for running on a single consensus. That information goes in a tab-delimited manifest file, with one row per consensus giving a name for the consensus, the truth VCF filename, the consensus FASTA filename, and the primers name or primers filename.
The file must have the columns name, truth_vcf, eval_fasta and primers. The order does not matter (and any other columns are ignored).
Example:
name truth_vcf eval_fasta primers
consensus1 truth1.vcf cons1.fasta COVID-ARTIC-V3
consensus1.2 truth1.vcf cons1.2.fasta COVID-ARTIC-V3
consensus2 truth2.vcf cons2.fasta COVID-ARTIC-V3
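A manifest like this can be written and checked with Python's csv module. The sketch below is an assumption about how you might prepare the file, not part of cte: write_manifest and read_manifest are hypothetical helpers. The reader looks columns up by header name, so column order does not matter and extra columns are dropped, matching the rules above.

```python
import csv
import io
from typing import Dict, List, TextIO

REQUIRED_COLUMNS = ("name", "truth_vcf", "eval_fasta", "primers")


def write_manifest(rows: List[Dict[str, str]], handle: TextIO) -> None:
    """Write manifest rows as a tab-delimited file with a header line."""
    writer = csv.DictWriter(handle, fieldnames=list(REQUIRED_COLUMNS),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_manifest(handle: TextIO) -> List[Dict[str, str]]:
    """Read a manifest by column name, keeping only the required columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"manifest is missing columns: {missing}")
    return [{c: row[c] for c in REQUIRED_COLUMNS} for row in reader]


if __name__ == "__main__":
    # Usage: build the first row of the example manifest in memory and print it
    buf = io.StringIO()
    write_manifest([{"name": "consensus1", "truth_vcf": "truth1.vcf",
                     "eval_fasta": "cons1.fasta", "primers": "COVID-ARTIC-V3"}], buf)
    print(buf.getvalue(), end="")
```

In practice you would pass an open file, e.g. open("manifest.tsv", "w", newline=""), instead of the in-memory buffer.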
The name must be unique within the file and also be "filesystem friendly", because the output files are put in directories named using the name column.
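Before starting a long batch, it can help to check the name column. This sketch uses a conservative, hypothetical definition of "filesystem friendly" (letters, digits, dot, underscore and hyphen only, and not "." or ".."); cte's own rules may be more or less strict, and check_names is not part of cte.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical, conservative rule for "filesystem friendly" names.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def check_names(names: Iterable[str]) -> List[str]:
    """Return a list of problems found in the manifest's name column."""
    names = list(names)
    problems = []
    # Names must be unique because each one becomes an output directory
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate name: {name!r}")
    for name in names:
        if name in (".", "..") or not SAFE_NAME.fullmatch(name):
            problems.append(f"not filesystem friendly: {name!r}")
    return problems
```

An empty result means no problems were found; the names in the example manifest above all pass.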
Run the batch with this command:
cte eval_runs --outdir out manifest.tsv
The output directory out contains the following files and directories:
- Processing/ - this contains one directory per consensus, each named using the name column from manifest.tsv. Each one is the result of running cte on that consensus, i.e. equivalent to running cte eval_one_run.
- results.tsv - the results summed across all of the input consensus sequences. It has the same format as a per-consensus results.tsv file.
- results_per_run.tsv - the results of each individual run. This has the same format as the per-consensus results.tsv, but with an extra name column. The same information is also written to results_per_run.json.
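Because each directory under Processing/ is named after the manifest's name column and holds the same files as a single eval_one_run, the per-consensus results can be located directly from the manifest names. A small pathlib sketch; the helper name is hypothetical, and it assumes results.tsv sits at the top level of each per-consensus directory, as it does for a single run.

```python
from pathlib import Path
from typing import Dict, Iterable


def per_consensus_results(outdir: str, names: Iterable[str]) -> Dict[str, Path]:
    """Map each manifest name to its per-consensus results.tsv path."""
    processing = Path(outdir) / "Processing"
    # One directory per consensus, named using the manifest's name column
    return {name: processing / name / "results.tsv" for name in names}
```

Calling .exists() on each returned path is a quick way to spot a consensus whose run did not finish.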