Display AS VQSR culprit information in site quality metrics table #1600

Open
ch-kr opened this issue Jul 24, 2024 · 0 comments

An external user reached out to Mark for guidance on interpreting variants that are flagged as failing our variant QC model (AS_VQSR). After some discussion, we concluded that it would help to indicate which annotation is the AS_culprit for variants flagged with AS_VQSR.

The public exomes and genomes release HTs include this annotation in the vqsr_results struct:

    'vqsr_results': struct {
        AS_VQSLOD: float64,
        AS_culprit: str,
        positive_train_site: bool,
        negative_train_site: bool
    }

For example, the variant chr11-747571-C-T fails AS_VQSR in the v4 genomes, and the v4.1 genomes release HT shows that its AS_culprit is AS_ReadPosRankSum:

    >>> import hail as hl
    >>> ht = hl.read_table('gs://gcp-public-data--gnomad/release/4.1/ht/genomes/gnomad.genomes.v4.1.sites.ht')
    >>> ht = hl.filter_intervals(ht, [hl.parse_locus_interval('chr11:747571-747572', reference_genome='GRCh38')])
    >>> ht.vqsr_results.show()

    +---------------+------------+------------------------+-------------------------+----------------------------------+
    | locus         | alleles    | vqsr_results.AS_VQSLOD | vqsr_results.AS_culprit | vqsr_results.positive_train_site |
    +---------------+------------+------------------------+-------------------------+----------------------------------+
    | locus<GRCh38> | array<str> |                float64 | str                     |                             bool |
    +---------------+------------+------------------------+-------------------------+----------------------------------+
    | chr11:747571  | ["C","T"]  |              -7.48e+00 | "AS_ReadPosRankSum"     |                            False |
    +---------------+------------+------------------------+-------------------------+----------------------------------+

    +----------------------------------+
    | vqsr_results.negative_train_site |
    +----------------------------------+
    |                             bool |
    +----------------------------------+
    |                             True |
    +----------------------------------+

Would it be possible to visually indicate which metric is the AS_culprit in the Site Quality Metrics table (e.g., bold text, an asterisk, or a different text color)?
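A minimal sketch of the requested marker, written in Python only for illustration. It assumes the table labels metrics without the "AS_" prefix (so "AS_ReadPosRankSum" maps to "ReadPosRankSum"), uses an asterisk as the marker, and only marks the culprit when the variant's filters contain AS_VQSR. The function names `culprit_metric_name` and `mark_culprit` are hypothetical and are not part of the gnomAD browser.

```python
# Hypothetical sketch: mark the AS_culprit row in a site quality metrics table.
# The "AS_" prefix mapping, metric labels, and "*" marker are assumptions.
from typing import Dict, List, Optional, Set, Tuple


def culprit_metric_name(as_culprit: str) -> str:
    """Map an AS_culprit value to a table label by stripping an "AS_" prefix."""
    prefix = "AS_"
    return as_culprit[len(prefix):] if as_culprit.startswith(prefix) else as_culprit


def mark_culprit(
    metrics: Dict[str, float],
    filters: Set[str],
    as_culprit: Optional[str],
) -> List[Tuple[str, float]]:
    """Return (label, value) rows, appending " *" to the culprit metric.

    The marker is added only for variants flagged with AS_VQSR.
    """
    culprit = culprit_metric_name(as_culprit) if as_culprit else None
    flagged = "AS_VQSR" in filters
    rows: List[Tuple[str, float]] = []
    for name, value in metrics.items():
        label = f"{name} *" if flagged and name == culprit else name
        rows.append((label, value))
    return rows


# Illustrative values only, not real gnomAD data.
rows = mark_culprit(
    {"QD": 12.1, "ReadPosRankSum": -3.2, "SOR": 0.7},
    {"AS_VQSR"},
    "AS_ReadPosRankSum",
)
# rows == [("QD", 12.1), ("ReadPosRankSum *", -3.2), ("SOR", 0.7)]
```

Restricting the marker to AS_VQSR-flagged variants follows the request above; passing variants also carry an AS_culprit, but highlighting it there could suggest a problem that does not exist.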

For context, these fields are described as follows:

vqsr_results: VQSR-related variant annotations.

    - AS_VQSLOD: Allele-specific log-odds ratio of being a true variant versus being a false positive under the trained VQSR Gaussian mixture model.
    - AS_culprit: Allele-specific worst-performing annotation in the VQSR Gaussian mixture model.
    - positive_train_site: Variant was used to build the positive training set of high-quality variants for VQSR.
    - negative_train_site: Variant was used to build the negative training set of low-quality variants for VQSR.
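To make the AS_VQSLOD and AS_culprit definitions concrete, here is a toy sketch with two diagonal-covariance Gaussian mixture models, a "good" model and a "bad" model, over the annotation dimensions. It assumes that "worst-performing annotation" means the annotation whose one-dimensional log-odds (good vs. bad marginal density) is lowest. This is our reading of the description, not GATK's exact implementation. The model parameters are invented, and the sketch uses the natural log, whereas the released values may use a different base.

```python
# Toy illustration of a VQSLOD-style score and a "culprit" annotation.
# Models, parameters, and the per-dimension culprit rule are assumptions.
import math
from typing import List, Sequence, Tuple

# A mixture component: (weight, per-dimension means, per-dimension variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_normal(x: float, mean: float, var: float) -> float:
    """Log density of a one-dimensional normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_density(x: Sequence[float], model: List[Component], dims: Sequence[int]) -> float:
    """Log density of a diagonal GMM, marginalized to the given dimensions."""
    terms = []
    for weight, means, variances in model:
        lp = math.log(weight)
        for d in dims:
            lp += log_normal(x[d], means[d], variances[d])
        terms.append(lp)
    return log_sum_exp(terms)


def vqslod(x: Sequence[float], good: List[Component], bad: List[Component]) -> float:
    """Log-odds of the good model versus the bad model over all annotations."""
    dims = list(range(len(x)))
    return log_density(x, good, dims) - log_density(x, bad, dims)


def culprit(x: Sequence[float], good: List[Component], bad: List[Component],
            names: Sequence[str]) -> str:
    """Annotation with the lowest one-dimensional log-odds (the 'worst')."""
    scores = [log_density(x, good, [d]) - log_density(x, bad, [d]) for d in range(len(x))]
    return names[min(range(len(x)), key=scores.__getitem__)]


names = ["QD", "ReadPosRankSum"]
good = [(1.0, [20.0, 0.0], [4.0, 1.0])]   # invented "good" model
bad = [(1.0, [5.0, -3.0], [4.0, 1.0])]    # invented "bad" model
x = [18.0, -3.0]                          # invented annotation values
score = vqslod(x, good, bad)
worst = culprit(x, good, bad, names)
```

Here QD looks like a true variant, but ReadPosRankSum sits at the bad model's mean. As a result, ReadPosRankSum has the lowest per-annotation log-odds and is reported as the culprit, even though the overall score in this toy case is positive.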
