-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRAFT] Pugh lab main [just to compare] #17
base: master
Are you sure you want to change the base?
Conversation
…insensitive match
@@ -613,7 +613,8 @@ def get_clinical_ids_from_sample_ids(self) -> Dict[ClinicalID, str]: | |||
self._clinical_data.items()} | |||
else: | |||
return {clinical_id: clinical_data['SAMPLE_ID'] for clinical_id, clinical_data in | |||
self._clinical_data.items() if clinical_data['VITAL_STATUS'] == 'alive'} | |||
self._clinical_data.items() if (clinical_data['VITAL_STATUS'] is not None and clinical_data['VITAL_STATUS'].lower() == 'alive')} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, we should have that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new ME version moves this logic around a bit. Now it ignores users with VITAL_STATUS "deceased," rather than including only users with VITAL_STATUS "alive" (i.e. everyone is alive by default).
# recompile the query to be case insensitive | ||
# convert the $in into a list of $or conditions so we can use $regex inside a $in | ||
# mongo has a limitation that cannot use $regex within a $in | ||
# using regex | ||
if "ONCOTREE_PRIMARY_DIAGNOSIS_NAME" in query_part.query: | ||
if "$in" in query_part.query['ONCOTREE_PRIMARY_DIAGNOSIS_NAME']: | ||
new_conditions = [ | ||
{'ONCOTREE_PRIMARY_DIAGNOSIS_NAME': {'$regex': f'^{old_query}$', '$options': 'i'}} for | ||
old_query in query_part.query['ONCOTREE_PRIMARY_DIAGNOSIS_NAME']['$in']] | ||
del query_part.query['ONCOTREE_PRIMARY_DIAGNOSIS_NAME'] # Remove old query from query_part | ||
query_part.query['$or'] = new_conditions # Add new conditions to query_part | ||
else: | ||
org_query = query_part.query['ONCOTREE_PRIMARY_DIAGNOSIS_NAME']; | ||
ignore_case_query = {'$regex': f'^{org_query}$', '$options': 'i'} | ||
query_part.query['ONCOTREE_PRIMARY_DIAGNOSIS_NAME'] = ignore_case_query | ||
|
||
# Exclude documents where 'ONCOTREE_PRIMARY_DIAGNOSIS_NAME' is 'NA' | ||
new_query = { | ||
'$and': [ | ||
{join_field: {'$in': list(need_new)}}, | ||
query_part.query, | ||
{'ONCOTREE_PRIMARY_DIAGNOSIS_NAME': {'$ne': 'NA'}} | ||
] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasonhansel we should discuss
@@ -34,7 +34,7 @@ def get_genomic_details(genomic_doc: Dict, trial_match: TrialMatch): | |||
alteration.append(hugo_symbol) | |||
|
|||
# add mutation | |||
if true_protein is not None: | |||
if true_protein is not None and true_protein: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. @jasonhansel do we already have that fixed or 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to incorporate this change without issues.
@@ -3529,6 +3530,7 @@ | |||
"Papillary Glioneuronal Tumor", | |||
"Ewing Sarcoma", | |||
"Renal Angiomyolipoma", | |||
"Large Cell Neuroendocrine Carcinoma" | |||
"Large Cell Neuroendocrine Carcinoma", | |||
"Breast Invasive Carcinoma, NOS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate with the one above?
@@ -315,6 +315,7 @@ | |||
"Invasive Breast Carcinoma", | |||
"Phyllodes Tumor of the Breast", | |||
"Breast Invasive Carcinosarcoma, NOS", | |||
"Breast Invasive Carcinoma, NOS", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is generated programmatically from the OncoTree data. We should check with PMATCH to see why they want something other than what OncoTree provides; the config JSON file allows you to specify a separate path within this folder to use for this mapping.
@@ -235,6 +235,9 @@ def create_trial_matches(self, trial_match: TrialMatch, new_trial_match: Dict) - | |||
query = trial_match.match_reason.extract_raw_query() | |||
clinical_doc = self.cache.docs[trial_match.match_reason.clinical_id] | |||
new_trial_match.update({'cancer_type_match': get_cancer_type_match(trial_match)}) | |||
# Add in additional fields we need for frontend | |||
if ('arm_description' in trial_match.match_clause_data.match_clause_additional_attributes): | |||
new_trial_match.update({'arm_description': trial_match.match_clause_data.match_clause_additional_attributes['arm_description']}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to incorporate this change without issues.
elif trial_value.upper() == 'FALSE': | ||
return QueryTransformerResult({sample_key: 'Negative'}, False) | ||
else: | ||
return QueryTransformerResult({sample_key: trial_value}, False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to incorporate this change without issues.
@@ -39,7 +39,8 @@ def main(run_args): | |||
me.update_all_matches() | |||
|
|||
if run_args.csv_output: | |||
me.create_output_csv() | |||
from matchengine.internals.utilities.output import create_output_csv | |||
create_output_csv(me) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest ME version fixes this.
@@ -71,7 +72,7 @@ def main(run_args): | |||
subp_p = subp.add_parser('load', help='Sets up your MongoDB for matching.') | |||
subp_p.add_argument('-t', dest='trial', default=None, help=param_trials_help) | |||
subp_p.add_argument('-c', dest='clinical', default=None, help=param_clinical_help) | |||
subp_p.add_argument('-g', dest='extended_attributes', default=None, help=param_genomic_help) | |||
subp_p.add_argument('-g', dest='genomic', default=None, help=param_genomic_help) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we can fix this, we should be discouraging using this "loading" functionality for anything other than trials, in favor of having users load data into MongoDB directly using e.g. an ETL process of some sort.
@@ -71,7 +72,7 @@ def main(run_args): | |||
subp_p = subp.add_parser('load', help='Sets up your MongoDB for matching.') | |||
subp_p.add_argument('-t', dest='trial', default=None, help=param_trials_help) | |||
subp_p.add_argument('-c', dest='clinical', default=None, help=param_clinical_help) | |||
subp_p.add_argument('-g', dest='extended_attributes', default=None, help=param_genomic_help) | |||
subp_p.add_argument('-g', dest='genomic', default=None, help=param_genomic_help) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we can fix this, we should be discouraging using this "loading" functionality for anything other than trials, in favor of having users load data into MongoDB directly using e.g. an ETL process of some sort.
# if isinstance(identifier, ObjectId) or identifier is None: | ||
# pass | ||
# else: | ||
# sort_array.append(int(identifier.replace("-", ""))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new version of ME will move this into TrialMatchDocumentCreator where it can be modified more easily.
if matchengine.report_all_clinical_reasons or \ | ||
keys.issubset(matchengine.match_criteria_transform.valid_clinical_reasons): | ||
should_add_reason = True | ||
if should_add_reason: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new version of ME moves this code into TrialMatchDocumentCreator where it can be modified more easily (though it seems like the change here may just be a refactor).
@@ -72,7 +73,30 @@ async def execute_clinical_queries(matchengine: MatchEngine, | |||
matchengine.cache.in_process.setdefault(query_hash, set()).update(need_new) | |||
|
|||
if need_new: | |||
new_query = {'$and': [{join_field: {'$in': list(need_new)}}, query_part.query]} | |||
# recompile the query to be case insensitive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something we don't want to incorporate. There may be ways of doing this through the query transformers, but it will never be performant because of the cost of regex lookups. Ideally this could be fixed at the data ingestion layer by (say) lowercasing all inputs and then searching for lowercase cancer types or by making cancer types match oncotree.
@@ -172,7 +172,7 @@ def load_file(db_rw, filetype: str, path: str, collection: str): | |||
db_rw[collection].insert_one(row) | |||
else: | |||
raw_file_data = file_handle.read() | |||
if filetype == 'yaml': | |||
if filetype == 'yml': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a bugfix that we can incorporate.
@@ -3,7 +3,7 @@ | |||
"trial_identifier": "protocol_no", | |||
"match_trial_link_id": "protocol_no", | |||
"trial_status_key": { | |||
"key_name": null, | |||
"key_name": "summary", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the changes to this file can be incorporated without issues. The one exception is the "trial_status_key," which determines how we decide if trials are open or closed; that may be something we need to keep separate for PMATCH.
add value match for wildtype when recording result
CTM-289: fix structural variation matching without report date
No description provided.