-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add VA/VRS GraphQL API #1426
Add VA/VRS GraphQL API #1426
Conversation
closes #1132 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one teeny tiny nitpick, and one request:
Since there's a new config file that should be present for successful deployments, can we add to our CONTRIBUTING.md
to include this information? Maybe a la these lines, or in some other way that seems fitting?
I'm not entirely sure how places normally handle this, but it would be good to get ahead of the issue of someone deploying without this file on accident.
.gitignore
Outdated
@@ -29,3 +29,5 @@ hail-*.log | |||
|
|||
# vscode settings | |||
**/.vscode/ | |||
|
|||
graphql-api/unversioned_config.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick (non-blocking): It looks like the other env/config type files bundled into various dockerfiles starting at line 16, maybe this is a better fit to put in there?
dc41bdf
to
ee902e1
Compare
f32ef00
to
6733d19
Compare
@rileyhgrant @mattsolo1 ready for another look. This is basically a full overhaul so probably best just to approach it fresh. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of good things in this PR, the commitment to strong types and the inclusion of tests to confirm behavior are very nice.
2 main changes requested, and one other small thing:
- I don't appear to be able to compile this on my machine, I fail with this error from
ts
.
❯ ELASTICSEARCH_USERNAME=elastic ELASTICSEARCH_PASSWORD=$ELASTICSEARCH_PASSWORD ./start.sh
/Users/rgrant/dev/work-broad/gnomad-browser/node_modules/.pnpm/ts-node@10.9.1_@types+node@20.7.0_typescript@5.2.2/node_modules/ts-node/src/index.ts:859
return new TSError(diagnosticText, diagnosticCodes, diagnostics);
^
TSError: ⨯ Unable to compile TypeScript:
src/graphql/schema.ts:28:8 - error TS1192: Module '"/Users/rgrant/dev/work-broad/gnomad-browser/graphql-api/src/graphql/resolvers/va"' has no default export.
28 import vaResolvers from './resolvers/va'
~~~~~~~~~~~
...
pnpm typecheck
also complains about this import:
❯ pnpm typecheck
> @ typecheck /Users/rgrant/dev/work-broad/gnomad-browser
> pnpm tsc --noEmit
graphql-api/src/graphql/schema.ts:28:8 - error TS1192: Module '"/Users/rgrant/dev/work-broad/gnomad-browser/graphql-api/src/graphql/resolvers/va"' has no default export.
28 import vaResolvers from './resolvers/va'
~~~~~~~~~~~
Found 1 error in graphql-api/src/graphql/schema.ts:28
ELIFECYCLE Command failed with exit code 2.
I'd guess this was mistakenly introduced when cleaning up the code to get it PR ready.
- The variant pipeline has huge swaths of the pipeline commented out, most likely to aid in speedy development of the pipeline by ensuring you're annotating on top of the previous finished variant table. Can we un comment this, and add the step to the end of the pipeline, now that it has been run based off of the previous output in your development of this feature?
The types look well defined, and I like the solution of annotating the IDs in our own pipeline rather than waiting for someone upstream to do that -- just a few clean up things on this PR to my eyes.
vrs = exome_vrs.union(genome_vrs) | ||
vrs = vrs.group_by(vrs.locus, vrs.alleles).aggregate(vrs=hl.agg.collect(vrs.vrs)) | ||
ds = ds.join(vrs) | ||
ds.describe() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this describe()
be removed now that this is production ready?
# pipeline.add_task( | ||
# name="prepare_gnomad_v4_variants", | ||
# task_function=prepare_gnomad_v4_variants, | ||
# output_path=f"{output_sub_dir}/gnomad_v4_variants_base.ht", | ||
# inputs={ | ||
# "exome_variants_path": "gs://gcp-public-data--gnomad/release/4.1/ht/exomes/gnomad.exomes.v4.1.sites.ht", | ||
# "genome_variants_path": "gs://gcp-public-data--gnomad/release/4.1/ht/genomes/gnomad.genomes.v4.1.sites.ht", | ||
# "variants_joint_frequency_path": "gs://gcp-public-data--gnomad/release/4.1/ht/joint/gnomad.joint.v4.1.sites.ht", | ||
# }, | ||
# ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this pipeline has been run with the new task, can these steps be un-commented again?
I imagine that the purpose of commenting these out was to ensure that these steps didn't run, so that you could take the old output and annotate on top of that. It appears as though that has happened. As such, could we return the pipeline to the previous, un-commented state, now with the inclusion of your additional step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, I meant to do so but apparently either forgot or accidentally reverted it
graphql-api/src/graphql/schema.ts
Outdated
@@ -25,6 +25,7 @@ import variantResolvers from './resolvers/variants' | |||
import variantFieldResolvers from './resolvers/variant-fields' | |||
import variantCooccurrenceResolvers from './resolvers/variant-cooccurrence' | |||
import cnvCoverageResolvers from './resolvers/cnv-coverage' | |||
import vaResolvers from './resolvers/va' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get a typescript error from here (as ./resolvers/va
doesn't have a default export).
❯ pnpm typecheck
> @ typecheck /Users/rgrant/dev/work-broad/gnomad-browser
> pnpm tsc --noEmit
graphql-api/src/graphql/schema.ts:28:8 - error TS1192: Module '"/Users/rgrant/dev/work-broad/gnomad-browser/graphql-api/src/graphql/resolvers/va"' has no default export.
28 import vaResolvers from './resolvers/va'
~~~~~~~~~~~
Found 1 error in graphql-api/src/graphql/schema.ts:28
ELIFECYCLE Command failed with exit code 2.
Is it possible that some refactoring happened and this error got mistakenly introduced? I was going to start a local API to check the VA fields that become queryable by this PR, and got this error.
As something not directly related:
I'm not sure why the graphql-api
CI didn't run on this PR, the yaml file makes it look like it should have run.
...
pull_request:
paths:
- 'graphql-api/**'
- 'dataset-metadata/**'
- 'package.json'
- 'pnpm-lock.yaml'
- '.github/workflows/**'
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed some unused code at the very end but missed a few references to it due to negelecting to run typecheck
, fixed
000c99f
to
6439818
Compare
@rileyhgrant ready for another look here, it just needed some post-rebase cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Started a local API and queried a few variants to check out the new VA data, all was looking good to my eyes.
validate_step4_output(pipeline) | ||
# write_schemas( | ||
# [pipeline], | ||
# os.path.expanduser("~/schemas"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh nice one, os.path.expanduser()
is a neat function
@rileyhgrant couple of updates here for your review, the big items are, we now support search by VRS ID, and the HT data gets massaged into a more pleasant shape before we export to ES. If you want to check this out you can use the index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Started a local API and ran some test queries against a variant.
With the assumption that the temporary indice change will be reverted in favor of aliasing the more recent variant index, and with the comment about including caids in the load if possible, lgtm!
@@ -132,6 +132,7 @@ def add_liftover_document_id(ds): | |||
"locus", | |||
"transcript_consequences.gene_id", | |||
"transcript_consequences.transcript_id", | |||
"vrs.alt.allele_id", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly related, but I realized that caid
is still commented out a few lines above this.
Might be worth it to uncomment that before (as I think this is still before) we do the next full load of the variants, as the table has that information in it already.
31a79e2
to
9f6bea8
Compare
9f6bea8
to
ef8be18
Compare
This allows querying of VA/VRS data (see https://vrs.ga4gh.org) through the gnomAD browser API.