Add v3 SV dataset #1209

phildarnowsky-broad · 2023-10-16T20:39:35Z

This adds a data pipeline, dataset metadata, API support, and other miscellaneous things to add gnomAD v3 structural variants to the browser.

phildarnowsky-broad · 2023-10-16T21:03:00Z

@mattsolo1 this one will definitely be easier to review commit by commit

rileyhgrant

Heya Phil,

Lookin' very slick. I love all the testing, and selfishly I am excited to have such a diverse set of tests to reference when adding to the test suite. On the whole this PR is looking great.

I am requesting changes mainly so a few comments get addressed, and so there's a record of the fact that this PR is awaiting some copy edits (and, as discussed in standups, at least one further commit that replaces v3 -> v4). Further, when the demo is back up, I'd like to get looped in to poke around and confirm things are stable.

I always enjoy reviewing your PRs as I get to learn more design and testing patterns as I do so -- very nicely done.

rileyhgrant · 2023-10-25T21:33:57Z

browser/src/StructuralVariantList/StructuralVariants.tsx

    // @ts-expect-error TS(7006) FIXME: Parameter 'variant' implicitly has an 'any' type.
    (variant) => {


Can this be typed and the @ts-expect-error be removed?

phildarnowsky-broad · 2023-10-26

@rileyhgrant I defined this as a StructuralVariantPropType rather than a StructuralVariant. As a comment in this PR implies, I took a look at collapsing the two types into one, which turns out to be more finicky than it might seem. Given that we're under the gun I've decided not to touch that for now. Better two partially redundant types than none at all.

rileyhgrant · 2023-10-25T21:38:27Z

browser/src/StructuralVariantPage/StructuralVariantAttributeList.tsx

+  LOWQUAL_WHAM_SR_DEL: 'Wham And Spilt-Read Evidence Only',
+


Is this meant to read "Split" read, rather than "Spilt"? My domain knowledge fails me, so wanted to double check.

rileyhgrant · 2023-10-25T21:46:40Z

browser/src/missingContent.ts

+  key: string
+) => textMapping[key] || `TEXT NEEDED FOR ${entityType.toUpperCase()} "${key}"`


Interesting, so is the intended use case of this to give the scientists reviewing demos the chance to see that text is missing? And then there are several pieces of copy that get wrapped in this function?

I'm not entirely sure how I feel about the need for this in production, as opposed to being a squeaky wheel to get copy, but I suppose I can see the value.

phildarnowsky-broad · 2023-10-26

> Interesting, so is the intended use case of this to give the scientists reviewing demos the chance to see that text is missing?

Basically, yes. This came up a few times in the course of development, before I had all of the copy in question: it would be confusing for the scientists QAing a demo if they saw a blank space where one of those missing pieces of copy should go. With this helper, it's at least clear that something is supposed to go there, rather than, say, me leaving something out of the frontend by accident, and there's a strong hint as to what exactly is missing.

I'd argue that we should keep it on the same principle: worst case, we miss some copy somewhere, someone sees this "TEXT NEEDED" message in prod, and they can give us a bug report that points directly to the problem.
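For readers following along, here is a minimal sketch of how a helper like this might look. Only the `key` parameter and the fallback template string appear in the diff above; the function name `textOrMissingTextWarning`, the parameter order, the `TextMapping` type, and the sample mapping are assumptions made for illustration.

```typescript
// Hypothetical reconstruction: only `key: string` and the fallback template
// are visible in the diff. Every other name here is an assumption.
type TextMapping = Record<string, string>

const textOrMissingTextWarning = (
  entityType: string,
  textMapping: TextMapping,
  key: string
): string => textMapping[key] || `TEXT NEEDED FOR ${entityType.toUpperCase()} "${key}"`

// Illustrative mapping only; the real copy lives elsewhere in the browser.
const filterLabels: TextMapping = {
  PASS: 'Pass',
}

// Known key: the mapped copy is returned unchanged.
const known = textOrMissingTextWarning('filter', filterLabels, 'PASS')

// Unknown key: a loud placeholder is returned instead of a blank space.
const missing = textOrMissingTextWarning('filter', filterLabels, 'SOME_NEW_FILTER')
```

Because the fallback uses `||`, an empty string in the mapping would also trigger the "TEXT NEEDED" message, which seems consistent with the goal of flagging copy that has not been written yet.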

rileyhgrant · 2023-10-25T21:58:19Z

data-pipeline/src/data_pipeline/pipelines/gnomad_sv_v3.py

+        "structural_variants_step_1": "import_all_svs_from_vcfs",
+        "structural_variants_step_2": "add_histograms",
+        "structural_variants": "add_variant_id_upper_case",


This is a bit of a nitpick - I believe pipeline outputs are primarily used when referencing the pipeline in other modules to find the location of the created hailtables (e.g. to export the final table to elasticsearch).

As such, it's not really needed to set outputs for each step of the pipeline; in this case just the final step should suffice.

phildarnowsky-broad · 2023-10-26

TY, this is one detail I have a habit of forgetting.

rileyhgrant · 2023-10-25T22:03:25Z

browser/src/StructuralVariantList/structuralVariantTableColumns.tsx

-const getContextType = (context: any) => {
-  if (context.transcript_id) {
-    return 'transcript'
-  }
-  if (context.gene_id) {
-    return 'gene'
-  }
-  return 'region'
-}
-
-export const getColumnsForContext = (context: any) => {
-  const contextType = getContextType(context)
+export const getColumnsForContext = (context: Context) => {
  const columns = structuralVariantTableColumns
    .filter(
-      (column) =>
-        // @ts-expect-error TS(2554) FIXME: Expected 1 arguments, but got 2.
-        column.shouldShowInContext === undefined || column.shouldShowInContext(context, contextType)
+      (column) => column.shouldShowInContext === undefined || column.shouldShowInContext(context)
    )
-    .map((column) => ({
-      ...column,
-      description: (column as any).descriptionInContext
-        ? (column as any).descriptionInContext(context, contextType)
-        : (column as any).description,
-    }))


Where did this logic go, was it unnecessary?

Apologies if I missed it, but I didn't see anything else in this PR that moved this logic elsewhere.

phildarnowsky-broad · 2023-10-26

> Where did this logic go, was it unnecessary?

It was. This file here has similar functionality to several other modules in the codebase, and was originally done by copypasta from one of those. If you look over the various column definitions in structuralVariantTableColumns, you'll see none of them have a description or descriptionInContext field set--meaning these lines at the end were dead weight. Another good example of why coding by copypasta is problematic.
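A rough sketch of the simplified shape, under stated assumptions: the `Context` and `Column` types below are pared-down stand-ins for the real ones, the two column definitions are invented, and the function returns the filtered list directly rather than going through an intermediate `columns` variable as the diff does. The point is only that when no column carries `description` or `descriptionInContext`, the filter on its own does all the work.

```typescript
// Assumed, simplified shapes; the real Context and column types are richer.
type Context = { gene_id?: string; transcript_id?: string }

type Column = {
  key: string
  heading: string
  shouldShowInContext?: (context: Context) => boolean
}

// Hypothetical column definitions. Note that none of them sets
// `description` or `descriptionInContext`.
const structuralVariantTableColumns: Column[] = [
  { key: 'variant_id', heading: 'Variant ID' },
  {
    key: 'consequence',
    heading: 'Consequence',
    // Assumed rule for illustration: only show this column in a gene context.
    shouldShowInContext: (context) => Boolean(context.gene_id),
  },
]

// Columns without a predicate are always shown; the rest decide per context.
const getColumnsForContext = (context: Context): Column[] =>
  structuralVariantTableColumns.filter(
    (column) => column.shouldShowInContext === undefined || column.shouldShowInContext(context)
  )

const regionColumns = getColumnsForContext({})
const geneColumns = getColumnsForContext({ gene_id: 'EXAMPLE_GENE_ID' })
```

Dropping the second `contextType` argument is also what lets the old `@ts-expect-error TS(2554)` go away, since `shouldShowInContext` is now called with the single argument its type expects.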

rileyhgrant · 2023-10-25T22:04:08Z

browser/help/topics/structural-variants/sv-effect-overview.md

+1. ![](https://placehold.it/15/D43925/000000?text=+) **Predicted loss-of-function (pLoF)**: SV is predicted to delete the gene or truncate the gene product.
+2. ![](https://placehold.it/15/7459B2/000000?text=+) **Intragenic exonic duplication (IED)**: SV is predicted to result in duplicated exons within the gene, without extending beyond the boundaries of the open reading frame. (New in gnomAD v3)


Are these icons placeholders so you can immediately see they need review? This comment is pretty much just personal curiosity.

phildarnowsky-broad · 2023-10-26

TBH I'm not sure why these icons are here or why it makes sense to use a random web service to render them, but that's how I found it in the original.

phildarnowsky-broad · 2023-10-26T21:37:51Z

@rileyhgrant I believe I've addressed all your feedback, and I've done the "v3" to "v4" switch. Ready for re-review.

rileyhgrant

LGTM!

phildarnowsky-broad requested a review from mattsolo1 October 16, 2023 21:02

rileyhgrant self-requested a review October 25, 2023 20:02

rileyhgrant assigned phildarnowsky-broad Oct 25, 2023

rileyhgrant requested changes Oct 25, 2023

rileyhgrant approved these changes Oct 26, 2023

phildarnowsky-broad added 23 commits October 27, 2023 10:42

Add pipeline to build v3 SVs

488c6e4

Most of the key transformations in this new pipeline are taken from the v2 SV pipeline in `gnomad_sv_v2.py`.

Add configuration to allow export of v3 SVs to Elasticsearch

9d5786f

Characterize v2 SV GraphQL queries

677a338

Refactor SV queries to make it easier to add v3 SVs

1018480

The only difference (at least in this first iteration) between v2 and v3 SV queries is the index they use. Here we refactor so that we'll be able to re-use the existing SV queries, largely by folding two separate modules into one.
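As a sketch of the idea in this commit message (not the actual refactor), a single query builder can take the dataset ID and look up the index, so v2 and v3 share every other part of the query. The index names, the v2 dataset ID, the `genes` field, and the function name below are all placeholders assumed for illustration; only `gnomad_sv_r3` is named in this PR.

```typescript
// Assumption: `gnomad_sv_r3` comes from this PR; the v2 ID is assumed.
type SvDatasetId = 'gnomad_sv_r2_1' | 'gnomad_sv_r3'

// Placeholder index names, not the real Elasticsearch indices.
const SV_INDICES: Record<SvDatasetId, string> = {
  gnomad_sv_r2_1: 'placeholder_v2_sv_index',
  gnomad_sv_r3: 'placeholder_v3_sv_index',
}

type SearchRequest = {
  index: string
  body: { query: { term: { genes: string } } }
}

// One shared builder: the dataset only influences which index is searched.
// The `genes` term query is a stand-in for whatever the real query does.
const buildSvsByGeneRequest = (datasetId: SvDatasetId, geneSymbol: string): SearchRequest => ({
  index: SV_INDICES[datasetId],
  body: { query: { term: { genes: geneSymbol } } },
})

const v2Request = buildSvsByGeneRequest('gnomad_sv_r2_1', 'GENE1')
const v3Request = buildSvsByGeneRequest('gnomad_sv_r3', 'GENE1')
```

Typing the lookup table as `Record<SvDatasetId, string>` means adding a new SV dataset ID to the union fails to compile until an index is supplied for it.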

Add support for v3 SVs to GraphQL API

d1c5091

Add gnomad_sv_r3 as a valid dataset ID to metadata

e79dd0d

Add v3 SVs option to main Searchbox

b98979b

Eliminate PropTypes from StructuralVariantPage

08903e8

Characterize StructuralVariantPage

fe97f25

Characterize isStructuralVariantId

da82e58

Search recognizes V3 SV ID format

c9fdd45

Add new consequences for v3 SVs

65f25c0

Display SV IDs in all upper case

c46a5b3

The hexadecimal suffixes for the new v3 IDs were in lowercase, which looked sloppy.
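A minimal sketch of the display-side normalization, assuming it amounts to upper-casing the whole ID string. The helper name `displaySvId` and the sample ID are made up for illustration and are not taken from the PR.

```typescript
// Hypothetical helper: upper-case the entire ID so that the hexadecimal
// suffix matches the rest of the identifier.
const displaySvId = (variantId: string): string => variantId.toUpperCase()

// Made-up ID for illustration only.
const shown = displaySvId('DEL_chr1_1a2b3c4d')
```

The pipeline step named `add_variant_id_upper_case` in the diff suggests an upper-cased ID is also stored alongside the original, which would let lookups use the same form that is displayed.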

Add helpers for missing copy

116a6ef

Add new pop names to v3 SV table

79775d8

Add v3 SVs to dataset selector

641892f

Default structural variants to coloring by type, not consequence

48a2141

Add assortment of missing copy for v3 SVs

4b55f54

Add reference genome header for v3 SVs to dataset selector

16c3ad9

Fill in actual sample count for v3 SVs

6e6e873

Make ESlint happy

901a65b

Type a few things

e6cf2dd

Fix a typo

41fc7c5

Change user-facing references to dataset to say "v4 SVs"

d890f3b

This includes the dataset ID `gnomad_sv_v4` itself as that appears in URLs.

phildarnowsky-broad force-pushed the v3_svs branch from 5dcd503 to d890f3b Compare October 27, 2023 14:43

phildarnowsky-broad merged commit c1a11c2 into main Oct 27, 2023
3 checks passed

rileyhgrant deleted the v3_svs branch January 17, 2024 23:18
