Add dedicated seqr sync module to db layers #869
Draft
Seqr Sync module
This PR contains a module to manage syncing between Metamist and seqr. The idea is to abstract the individual steps of the seqr sync process so that the data synced to a seqr project can be customized per project.
Metamist already has an existing seqr db layer. This module looks to replace the existing layer with a more robust and abstracted implementation while reusing existing code where possible. A lot of the code in this module was also lifted from the sync_seqr.py script in /scripts.
To break down the module:
seqr_sync.py
The main part of the module, which takes the transformed data and posts it to seqr. Any script used to sync data to seqr should create an instance of this class and call the sync_dataset methods.
data_fetchers.py
Contains the classes MetamistFetcher and FileFetcher. These classes contain methods to get data from Metamist and from files respectively. The data then needs to be transformed into seqr's expected formats before being loaded.
data_transformers.py
Contains the SeqrTransformer class, which converts the data formats output by Metamist into the data formats expected by seqr, e.g. processing ped sex & affected values, processing hpo terms, formatting the es-index json post, etc.
config.py
The definitions and global variables that clutter the top of the sync_seqr.py Metamist script have been put into this file for cleaner access.
utils.py
Contains helper methods needed by parts of the sync process, e.g. writing the SG - PID map to the bucket, diffing sequencing groups when loading a new es-index.
logging_config.py
Neatly contains the logging initialization for simple import and use.
Still TODO:
generate_seqr_auth_token, send_slack_notification
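For reviewers unfamiliar with the transform and diff steps mentioned under data_transformers.py and utils.py, here is a rough sketch of what they might look like. This is not the code in this PR: the function names (ped_sex_to_code, ped_affected_to_code, diff_sequencing_groups) and the output letter codes are assumptions made for illustration. The only thing taken as given is the standard PED convention (sex: 1 = male, 2 = female; affected: 1 = unaffected, 2 = affected; anything else treated as unknown).

```python
from typing import Iterable, Set, Tuple

# Assumed output codes for illustration only; the real SeqrTransformer
# may emit different values.
SEX_CODES = {1: "M", 2: "F"}
AFFECTED_CODES = {1: "N", 2: "A"}


def ped_sex_to_code(value: int) -> str:
    """Map a PED sex value to a letter code; unrecognised values become 'U'."""
    return SEX_CODES.get(value, "U")


def ped_affected_to_code(value: int) -> str:
    """Map a PED affected value to a letter code; 0, -9, etc. become 'U'."""
    return AFFECTED_CODES.get(value, "U")


def diff_sequencing_groups(
    previous: Iterable[str], current: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) sequencing group IDs between two es-indexes."""
    previous_ids = set(previous)
    current_ids = set(current)
    added = current_ids - previous_ids
    removed = previous_ids - current_ids
    return added, removed


if __name__ == "__main__":
    # Hypothetical sequencing group IDs, not real data.
    added, removed = diff_sequencing_groups(["SG1", "SG2"], ["SG2", "SG3"])
    print("added:", sorted(added), "removed:", sorted(removed))
    print(ped_sex_to_code(2), ped_affected_to_code(-9))
```

Keeping these as small pure functions would make them easy to unit test separately from the fetch and post steps, which is the main point of splitting the transformer and utils out of the original script.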