Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VA/VRS GraphQL API #1426

Merged
merged 3 commits into from
Jun 24, 2024
Merged

Add VA/VRS GraphQL API #1426

merged 3 commits into from
Jun 24, 2024

Conversation

phildarnowsky-broad
Copy link
Contributor

This allows querying of VA/VRS data (see https://vrs.ga4gh.org) through the gnomAD browser API.

@phildarnowsky-broad
Copy link
Contributor Author

closes #1132

Copy link
Contributor

@rileyhgrant rileyhgrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one teeny tiny nitpick, and one request:

Since there's a new config file that should be present for successful deployments, can we add to our CONTRIBUTING.md to include this information? Maybe a la these lines, or in some other way that seems fitting?

I'm not entirely sure how places normally handle this, but it would be good to get ahead of the issue of someone deploying without this file on accident.

.gitignore Outdated
@@ -29,3 +29,5 @@ hail-*.log

# vscode settings
**/.vscode/

graphql-api/unversioned_config.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (non-blocking): It looks like the other env/config type files bundled into various dockerfiles starting at line 16, maybe this is a better fit to put in there?

@phildarnowsky-broad
Copy link
Contributor Author

@rileyhgrant @mattsolo1 ready for another look. This is basically a full overhaul so probably best just to approach it fresh.

@rileyhgrant rileyhgrant self-requested a review June 4, 2024 02:08
Copy link
Contributor

@rileyhgrant rileyhgrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of good things in this PR, the commitment to strong types and the inclusion of tests to confirm behavior are very nice.

2 main changes requested, and one other small thing:

  1. I don't appear to be able to compile this on my machine, I fail with this error from ts.
❯ ELASTICSEARCH_USERNAME=elastic ELASTICSEARCH_PASSWORD=$ELASTICSEARCH_PASSWORD ./start.sh
/Users/rgrant/dev/work-broad/gnomad-browser/node_modules/.pnpm/ts-node@10.9.1_@types+node@20.7.0_typescript@5.2.2/node_modules/ts-node/src/index.ts:859
    return new TSError(diagnosticText, diagnosticCodes, diagnostics);
           ^
TSError: ⨯ Unable to compile TypeScript:
src/graphql/schema.ts:28:8 - error TS1192: Module '"/Users/rgrant/dev/work-broad/gnomad-browser/graphql-api/src/graphql/resolvers/va"' has no default export.

28 import vaResolvers from './resolvers/va'
          ~~~~~~~~~~~
...

pnpm typecheck also complains about this import:

❯ pnpm typecheck

> @ typecheck /Users/rgrant/dev/work-broad/gnomad-browser
> pnpm tsc --noEmit

graphql-api/src/graphql/schema.ts:28:8 - error TS1192: Module '"/Users/rgrant/dev/work-broad/gnomad-browser/graphql-api/src/graphql/resolvers/va"' has no default export.

28 import vaResolvers from './resolvers/va'
          ~~~~~~~~~~~


Found 1 error in graphql-api/src/graphql/schema.ts:28

ELIFECYCLE  Command failed with exit code 2.

I'd guess this was mistakenly introduced when cleaning up the code to get it PR ready.

  1. The variant pipeline has huge swaths of the pipeline commented out, most likely to aid in speedy development of the pipeline by ensuring you're annotating on top of the previous finished variant table. Can we un comment this, and add the step to the end of the pipeline, now that it has been run based off of the previous output in your development of this feature?

The types look well defined, and I like the solution of annotating the IDs in our own pipeline rather than waiting for someone upstream to do that -- just a few clean up things on this PR to my eyes.

vrs = exome_vrs.union(genome_vrs)
vrs = vrs.group_by(vrs.locus, vrs.alleles).aggregate(vrs=hl.agg.collect(vrs.vrs))
ds = ds.join(vrs)
ds.describe()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this describe() be removed now that this is production ready?

Comment on lines 51 to 60
# pipeline.add_task(
# name="prepare_gnomad_v4_variants",
# task_function=prepare_gnomad_v4_variants,
# output_path=f"{output_sub_dir}/gnomad_v4_variants_base.ht",
# inputs={
# "exome_variants_path": "gs://gcp-public-data--gnomad/release/4.1/ht/exomes/gnomad.exomes.v4.1.sites.ht",
# "genome_variants_path": "gs://gcp-public-data--gnomad/release/4.1/ht/genomes/gnomad.genomes.v4.1.sites.ht",
# "variants_joint_frequency_path": "gs://gcp-public-data--gnomad/release/4.1/ht/joint/gnomad.joint.v4.1.sites.ht",
# },
# )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this pipeline has been run with the new task, can these steps be un-commented again?

I imagine that the purpose of commenting these out was to ensure that these steps didn't run, so that you could take the old output and annotate on top of that. It appears as though that has happened. As such, could we return the pipeline to the previous, un-commented state, now with the inclusion of your additional step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, I meant to do so but apparently either forgot or accidentally reverted it

@@ -25,6 +25,7 @@ import variantResolvers from './resolvers/variants'
import variantFieldResolvers from './resolvers/variant-fields'
import variantCooccurrenceResolvers from './resolvers/variant-cooccurrence'
import cnvCoverageResolvers from './resolvers/cnv-coverage'
import vaResolvers from './resolvers/va'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a typescript error from here (as ./resolvers/va doesn't have a default export).

❯ pnpm typecheck

> @ typecheck /Users/rgrant/dev/work-broad/gnomad-browser
> pnpm tsc --noEmit

graphql-api/src/graphql/schema.ts:28:8 - error TS1192: Module '"/Users/rgrant/dev/work-broad/gnomad-browser/graphql-api/src/graphql/resolvers/va"' has no default export.

28 import vaResolvers from './resolvers/va'
          ~~~~~~~~~~~


Found 1 error in graphql-api/src/graphql/schema.ts:28

ELIFECYCLE  Command failed with exit code 2.

Is it possible that some refactoring happened and this error got mistakenly introduced? I was going to start a local API to check the VA fields that become queryable by this PR, and got this error.


As something not directly related:

I'm not sure why the graphql-api CI didn't run on this PR, the yaml file makes it look like it should have run.

...
 pull_request:
    paths:
      - 'graphql-api/**'
      - 'dataset-metadata/**'
      - 'package.json'
      - 'pnpm-lock.yaml'
      - '.github/workflows/**'
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed some unused code at the very end but missed a few references to it due to negelecting to run typecheck, fixed

@rileyhgrant rileyhgrant linked an issue Jun 4, 2024 that may be closed by this pull request
@phildarnowsky-broad
Copy link
Contributor Author

@rileyhgrant ready for another look here, it just needed some post-rebase cleanup

Copy link
Contributor

@rileyhgrant rileyhgrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Started a local API and queried a few variants to check out the new VA data, all was looking good to my eyes.

validate_step4_output(pipeline)
# write_schemas(
# [pipeline],
# os.path.expanduser("~/schemas"),
Copy link
Contributor

@rileyhgrant rileyhgrant Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh nice one, os.path.expanduser() is a neat function

@phildarnowsky-broad
Copy link
Contributor Author

@rileyhgrant couple of updates here for your review, the big items are, we now support search by VRS ID, and the HT data gets massaged into a more pleasant shape before we export to ES. If you want to check this out you can use the index gnomad_v4_variants-2024-06-13--14-27; note that that is downsampled to only 100 variants, though. LMK if you want help finding variants to check this against.

@rileyhgrant rileyhgrant self-requested a review June 17, 2024 15:31
Copy link
Contributor

@rileyhgrant rileyhgrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started a local API and ran some test queries against a variant.

With the assumption that the temporary indice change will be reverted in favor of aliasing the more recent variant index, and with the comment about including caids in the load if possible, lgtm!

@@ -132,6 +132,7 @@ def add_liftover_document_id(ds):
"locus",
"transcript_consequences.gene_id",
"transcript_consequences.transcript_id",
"vrs.alt.allele_id",
Copy link
Contributor

@rileyhgrant rileyhgrant Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly related, but I realized that caid is still commented out a few lines above this.

Might be worth it to uncomment that before (as I think this is still before) we do the next full load of the variants, as the table has that information in it already.

@phildarnowsky-broad phildarnowsky-broad merged commit f451fb8 into main Jun 24, 2024
6 checks passed
@mattsolo1 mattsolo1 mentioned this pull request Jul 8, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose VA API via GraphQL
2 participants