Sites table generation #692

Draft
michael-harper wants to merge 32 commits into base: main
Conversation

michael-harper
Contributor

To generate our final callset, we need a list of high-quality sites specific to our datasets (both exomes and genomes) that can be used to infer relatedness and ancestry. gnomAD v4 has already done this. We cannot use their final sites table as a resource, because of the nature of our cohorts and our ultimate goal of finding novel sites in certain populations, but we can follow their process for developing a high-quality sites table.

A couple of points:

  • I'm unsure whether this should live in the prod-pipes repo or the references repo, mainly because I don't know whether the script can be run from within the references repo.
  • IntervalQC should be performed before this step, so that only high-quality regions of the exome data are used for the sites analysis.
  • The script that generates the sites table uses the function get_qc_mt, which provides all of the parameters and supplementary Hail functions needed to implement the filters.
    • @KatalinaBobowik has done this previously based on gnomAD v3 parameters, though only using genomes from HGDP and 1KG. The current approach will use genomes and exomes from our own cohorts.
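For anyone reviewing who hasn't worked with this kind of filtering, here is a toy, pure-Python sketch of the sort of site-level filters involved (biallelic SNVs only, minimum call rate, minimum alternate allele frequency). It is not the get_qc_mt implementation and does not use Hail. The `Site` class, the helper names, and the threshold values are all hypothetical and chosen only for illustration. Steps such as inbreeding-coefficient filtering and LD pruning are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """Hypothetical stand-in for one row of a variant matrix."""
    locus: str
    alleles: List[str]              # [ref, alt, ...]
    genotypes: List[Optional[int]]  # alt-allele count per sample (0, 1, 2); None = missing


def call_rate(site: Site) -> float:
    """Fraction of samples with a non-missing genotype at this site."""
    if not site.genotypes:
        return 0.0
    called = [g for g in site.genotypes if g is not None]
    return len(called) / len(site.genotypes)


def alt_allele_frequency(site: Site) -> float:
    """Alternate allele frequency among called diploid genotypes."""
    called = [g for g in site.genotypes if g is not None]
    if not called:
        return 0.0
    return sum(called) / (2 * len(called))


def is_biallelic_snv(site: Site) -> bool:
    """True when there is exactly one alt allele and both alleles are single bases."""
    return len(site.alleles) == 2 and all(len(a) == 1 for a in site.alleles)


def select_qc_sites(
    sites: List[Site],
    min_af: float = 0.001,       # illustrative threshold, an assumption
    min_callrate: float = 0.99,  # illustrative threshold, an assumption
) -> List[Site]:
    """Keep sites that pass all three toy filters."""
    return [
        s
        for s in sites
        if is_biallelic_snv(s)
        and call_rate(s) >= min_callrate
        and alt_allele_frequency(s) > min_af
    ]
```

In the real pipeline these filters would run over a Hail MatrixTrix of our own exomes and genomes, restricted to the intervals that pass IntervalQC. The sketch is only meant to make the filtering logic concrete.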

@michael-harper
Contributor Author

@katiedelange
