Minos in single sample mode
This page describes running Minos on a single sample. The use-case is that you have one or more VCF files (probably generated using different methods), with calls with respect to the same reference genome. These calls can overlap and/or be at the same positions, or be completely different -- it does not matter and Minos will handle this by merging all overlapping variants. Minos will take all variants from all input VCF files, and genotype your sample at each site. The output is a single VCF file.
Required input files:
- The reference genome in FASTA format (must be the same reference as used in all VCF files).
- At least one VCF file of calls (with respect to the reference FASTA above).
- At least one reads file, in FASTA/Q or BAM format.
Only VCF records that have the GT field present are used from the input VCF file(s). All other records are ignored. Further, only the called alleles from each VCF record are used, and only alleles that consist of one or more A, C, G, T characters.
Here is an example VCF input file (header missing for brevity):
ref_name 100 . C A . . . GT 0/0
ref_name 101 . C <FOO> . . . GT 1/1
ref_name 102 . T A,G . . . GT 0/1
ref_name 103 . G A,C,<BAR> . . . GT 1/2
ref_name 104 . C T . . . foo bar
The resulting variants that would be considered for genotyping are: T102A, G103A, G103C. All others are excluded:
- C100A has reference genotype.
- C101<FOO> only has a non-ACGT allele genotyped.
- T102G was not genotyped (this position was genotyped as the T or A allele).
- G103<BAR> was not genotyped and is non-ACGT.
- C104T has no GT field.
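The selection rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described on this page, not the code Minos itself runs; the function name `candidate_variants` is made up for this example, and it assumes single-sample records split on whitespace (real VCF files are tab-delimited, which this also handles).

```python
import re

# Alleles must consist of one or more A, C, G, T characters.
ACGT_ONLY = re.compile(r"^[ACGT]+$")


def candidate_variants(vcf_lines):
    """Return REF+POS+ALT strings for alleles that pass the rules above.

    Hypothetical helper for illustration; not part of Minos.
    """
    variants = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no GT
        _chrom, pos, _id, ref, alt, _qual, _filt, _info, fmt, sample = fields[:10]
        format_keys = fmt.split(":")
        if "GT" not in format_keys:
            continue  # records without a GT field are ignored
        genotype = sample.split(":")[format_keys.index("GT")]
        alleles = [ref] + alt.split(",")
        # Collect the allele indexes that were actually called.
        called = {int(x) for x in re.split(r"[/|]", genotype) if x.isdigit()}
        for index in sorted(called):
            if index == 0 or index >= len(alleles):
                continue  # reference call, or index outside the ALT list
            if ACGT_ONLY.match(alleles[index]):
                variants.append(f"{ref}{pos}{alleles[index]}")
    return variants


example = [
    "ref_name 100 . C A . . . GT 0/0",
    "ref_name 101 . C <FOO> . . . GT 1/1",
    "ref_name 102 . T A,G . . . GT 0/1",
    "ref_name 103 . G A,C,<BAR> . . . GT 1/2",
    "ref_name 104 . C T . . . foo bar",
]
print(candidate_variants(example))  # prints ['T102A', 'G103A', 'G103C']
```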
Example command, where we have two VCF files of variants:
minos adjudicate --reads reads.fastq.gz outdir ref.fasta 1.vcf 2.vcf
Notes:
- The --reads option can be used as many times as you like, once for each reads file. Paired info is not used, so the order of these files does not matter, e.g. if you have paired reads in two files.
- outdir should not exist, and is the output directory that will be made to store all output files (you can add the --force option to overwrite outdir if you're feeling confident).
- You can list as many VCF files as you like at the end of the command. In that example there are two files.
Example command, where we have two reads files, three VCF files of variants, and overwrite the output directory (if it exists already):
minos adjudicate --force --reads reads1.fastq.gz --reads reads2.fastq.gz outdir ref.fasta 1.vcf 2.vcf 3.vcf
Use the --sample_name option to put the name of your sample into the final VCF file:
minos adjudicate --sample_name sample_42 --reads reads.fastq.gz outdir ref.fasta 1.vcf 2.vcf
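If you launch Minos from a script, the commands above can be assembled programmatically. The following is a minimal sketch that uses only the options shown on this page (--reads, --force, --sample_name); the helper name `build_adjudicate_command` and its parameters are hypothetical and chosen for this example.

```python
import shlex


def build_adjudicate_command(outdir, ref_fasta, vcf_files, reads_files,
                             sample_name=None, force=False):
    """Build a `minos adjudicate` argument list (hypothetical helper)."""
    if not vcf_files:
        raise ValueError("at least one VCF file is required")
    if not reads_files:
        raise ValueError("at least one reads file is required")
    cmd = ["minos", "adjudicate"]
    if force:
        cmd.append("--force")  # overwrite outdir if it already exists
    if sample_name is not None:
        cmd += ["--sample_name", sample_name]
    for reads in reads_files:
        cmd += ["--reads", reads]  # one --reads per reads file
    # Positional arguments: output directory, reference, then all VCF files.
    cmd += [outdir, ref_fasta, *vcf_files]
    return cmd


cmd = build_adjudicate_command(
    "outdir", "ref.fasta",
    vcf_files=["1.vcf", "2.vcf", "3.vcf"],
    reads_files=["reads1.fastq.gz", "reads2.fastq.gz"],
    force=True,
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to run the command. Passing a list rather than a single string avoids shell quoting problems with unusual file names.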
The important output files are:
- final.vcf - this is a VCF file containing the final call set, and is most likely the only file you need.
- log.txt - logging information.
- debug.calls_with_zero_cov_alleles.vcf - this is the initial call set. It includes all sites (and all their alleles) considered by minos, after combining all the original input VCF files. Alleles with no coverage are removed from this to make the final call set in final.vcf. If you want a consistent VCF file across multiple runs, e.g. using the same input VCF files, but each sample has their own reads file, then you may want to use debug.calls_with_zero_cov_alleles.vcf to compare the outputs.
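As a rough sketch of such a comparison, the Python below indexes two VCF files by site and reports where they differ. It is plain text parsing and makes no use of Minos itself. The function names are invented for this example, and keying each record on CHROM, POS and REF is an assumption about what "the same site" means.

```python
def site_alleles(vcf_lines):
    """Map (CHROM, POS, REF) to the tuple of ALT alleles at that site."""
    sites = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        sites[(chrom, pos, ref)] = tuple(alt.split(","))
    return sites


def compare_runs(lines_a, lines_b):
    """Return sites only in A, sites only in B, and shared sites whose ALTs differ."""
    a, b = site_alleles(lines_a), site_alleles(lines_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    different_alts = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, different_alts
```

To use it, open the debug.calls_with_zero_cov_alleles.vcf file from each output directory and pass the two file objects to `compare_runs`. Three empty lists mean the two runs describe the same sites and alleles.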
The VCF file final.vcf has these filters implemented in the FILTER column:
- MIN_DP - requires the total read depth to be at least 2.
- MAX_DP - requires the total read depth to be less than the mean plus 3 standard deviations.
- MIN_FRS - "minimum fraction of read support" - requires at least 90% of the reads to support the called allele.
- MIN_GCP - "minimum genotype confidence percentile" - this is a "normalised" genotype confidence score that can be used across samples. It is described in full in the Minos publication. Briefly, the GCP is the percentile of the genotype confidence score inside the expected distribution from the genotyping model. Since read depth is a parameter of the model, this is different for each run of Minos (unless two samples happen to have identical read depth/standard deviation etc). A call must have GCP of at least 0.5% to pass.
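The four thresholds can be summarised as a small function. This is a sketch of the rules as worded above, not Minos's own implementation. The function and parameter names are hypothetical, FRS is assumed to be a fraction between 0 and 1, GCP is assumed to be a percentage, and the handling of values exactly on a boundary follows the wording of each bullet.

```python
def failed_filters(dp, frs, gcp, mean_depth, depth_std,
                   min_dp=2, min_frs=0.9, min_gcp=0.5, max_dp_stdevs=3.0):
    """Return the names of the filters a call would fail (illustrative only).

    dp: total read depth at the site
    frs: fraction of reads supporting the called allele (assumed 0 to 1)
    gcp: genotype confidence percentile (assumed 0 to 100)
    mean_depth, depth_std: mean and standard deviation of read depth for the run
    """
    failed = []
    if dp < min_dp:
        failed.append("MIN_DP")  # depth must be at least min_dp
    if dp >= mean_depth + max_dp_stdevs * depth_std:
        failed.append("MAX_DP")  # depth must be below mean + 3 standard deviations
    if frs < min_frs:
        failed.append("MIN_FRS")  # at least 90% of reads must support the call
    if gcp < min_gcp:
        failed.append("MIN_GCP")  # GCP must be at least 0.5%
    return failed


# A call with typical depth and good support passes every filter.
print(failed_filters(dp=30, frs=0.95, gcp=80.0, mean_depth=30.0, depth_std=5.0))
```

An empty list corresponds to a call that passes all four filters.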
All of the default filter cutoffs can be changed, using the options --filter_min_dp, --filter_min_frs, --filter_min_gcp, and --filter_max_dp. However, we recommend keeping the defaults unless you have a compelling reason to change them.
We do not recommend changing any options other than those listed above.