Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add v3 SV dataset #1209

Merged
merged 24 commits into from
Oct 27, 2023
Merged

Add v3 SV dataset #1209

merged 24 commits into from
Oct 27, 2023

Conversation

phildarnowsky-broad
Copy link
Contributor

This adds a data pipeline, dataset metadata, API support, and other miscellaneous things to add gnomAD v3 structural variants to the browser.

@phildarnowsky-broad
Copy link
Contributor Author

@mattsolo1 this one will definitely be easier to review commit by commit

Copy link
Contributor

@rileyhgrant rileyhgrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heya Phil,

Lookin' very slick. I love all the testing, and I selfishly I am excited to have such a diverse set of tests for me to personally reference when adding to the test suite. On the whole this PR is looking great.

I am requesting changes mainly so a few comments get addressed, and so there's record of the fact that this PR is awaiting some copy edits (and as discussed in standups, at least one further commit that replaces v3 -> v4). Further, I'd like to request that when the demo is back up, that I could get looped in to poke around and also confirm things are stable.

I always enjoy reviewing your PRs as I get to learn more design and testing patterns as I do so -- very nicely done.

Comment on lines 186 to 187
// @ts-expect-error TS(7006) FIXME: Parameter 'variant' implicitly has an 'any' type.
(variant) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be typed and the @ts-expect-error be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rileyhgrant I defined this as a StructuralVariantPropType rather than a StructuralVariant. As a comment in this PR implies, I took a look at collapsing the two types into one, which turns out to be more finicky than it might seem. Given that we're under the gun I've decided not to touch that for now. Better two partially redundant types than none at all.

Comment on lines 18 to 19
LOWQUAL_WHAM_SR_DEL: 'Wham And Spilt-Read Evidence Only',

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to read "Split" read, rather than "Spilt"? My domain knowledge fails me, so wanted to double check.

Comment on lines +4 to +5
key: string
) => textMapping[key] || `TEXT NEEDED FOR ${entityType.toUpperCase()} "${key}"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so is the intended use case of this is to give the scientists reviewing demos the chance to see that text is missing? And then there are several pieces of copy that get wrapped in this function?

I'm not entirely sure how I feel about the need for this in production, as opposed to being a squeaky wheel to get copy, but I suppose I can see the value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so is the intended use case of this is to give the scientists reviewing demos the chance to see that text is missing?

Basically, yes. With this helper, it's at least clear that something is supposed to go there, rather than say me leaving something out of the frontend by accident. This came up a few times in the course of development, before I had all of the copy in question, where it would be confusing for the scientists QAing a demo if they saw a blank space where one of those missing pieces of copy should go. This way it's at least clear that something is supposed to go there, and there's a strong hint as to what exactly is missing.

I'd argue that we should keep it on the same principle: worst case, we miss some copy somewhere, someone sees this "TEXT NEEDED" message in prod, and they can give us a bug report that points directly to to the problem.

"structural_variants_step_1": "import_all_svs_from_vcfs",
"structural_variants_step_2": "add_histograms",
"structural_variants": "add_variant_id_upper_case",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a nitpick - I believe pipeline outputs are primarily used when referencing the pipeline in other modules to find the location of the created hailtables (e.g. to export the final table to elasticsearch).

As such, it's really needed to set outputs for each step of the pipeline, in this cause just the final step should suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY this is one detail I have a habit of forgetting

Comment on lines -181 to -204
const getContextType = (context: any) => {
if (context.transcript_id) {
return 'transcript'
}
if (context.gene_id) {
return 'gene'
}
return 'region'
}

export const getColumnsForContext = (context: any) => {
const contextType = getContextType(context)
export const getColumnsForContext = (context: Context) => {
const columns = structuralVariantTableColumns
.filter(
(column) =>
// @ts-expect-error TS(2554) FIXME: Expected 1 arguments, but got 2.
column.shouldShowInContext === undefined || column.shouldShowInContext(context, contextType)
(column) => column.shouldShowInContext === undefined || column.shouldShowInContext(context)
)
.map((column) => ({
...column,
description: (column as any).descriptionInContext
? (column as any).descriptionInContext(context, contextType)
: (column as any).description,
}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this logic go, was it unnecessary?

Apologies if I missed it, I didn't seem to see anything else in this PR that moved this logic to elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this logic go, was it unnecessary?

It was. This file here has similar functionality to several other modules in the codebase, and was originally done by copypasta from one of those. If you look over the various column definitions in structuralVariantTableColumns, you'll see none of them have a description or descriptionInContext field set--meaning these lines at the end were dead weight. Another good example of why coding by copypasta is problematic.

Comment on lines 16 to 17
1. ![](https://placehold.it/15/D43925/000000?text=+) **Predicted loss-of-function (pLoF)**: SV is predicted to delete the gene or truncate the gene product.
2. ![](https://placehold.it/15/7459B2/000000?text=+) **Intragenic exonic duplication (IED)**: SV is predicted to result in duplicated exons within the gene, without extending beyond the boundaries of the open reading frame. (New in gnomAD v3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these icons placeholders so you can immediately see they need review? This comment is pretty much just personal curiosity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I'm not sure of why these icons are here or why it makes sense to use a random web service to render them, but that's how I found it in the original.

@phildarnowsky-broad
Copy link
Contributor Author

@rileyhgrant I believe I've addressed all your feedback, and I've done the "v3" to "v4" switch. Ready for re-review.

Copy link
Contributor

@rileyhgrant rileyhgrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Most of the key transformations in this new pipeline are taken from the v2 SV pipeline in `gnomad_sv_v2.py`.
The only difference (at least in this first iteration) between v2 and v3 SV queries is the index they use. Here we refactor so that we'll be able to re-use the existing SV queries, largely by folding two separate modules into one.
The hexidecimal suffixes for the new v3 IDs was in lowercase, which looked sloppy.
This includes the dataset ID `gnomad_sv_v4` itself as that appears in URLs.
@phildarnowsky-broad phildarnowsky-broad merged commit c1a11c2 into main Oct 27, 2023
3 checks passed
@rileyhgrant rileyhgrant deleted the v3_svs branch January 17, 2024 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants