Pathfinding approaches to ML-Guided knowledge exploration #9
TL;DR (earlier/longer notes from which the above summary was condensed)

To date, most CQ notebooks have explored relatively simple retrieval and faceting queries. But the real utility of the Translator will be in supporting more open-ended exploration of data, enabling users to populate a blackboard with knowledge that drives serendipitous discovery and novel insight.

Given the graph-based nature of much of the knowledge in the Translator system, 'pathfinding' operations are one potentially useful approach for this type of exploration and blackboard construction. Here, the system would return paths through the graph connecting entities of interest, and allow users to filter and facet these paths to home in on those representing meaningful evidence in support of their larger question or use case.

For example, given a set of candidate FA modifier genes, explore paths linking these genes to FA in the data to provide evidence for prioritizing/ranking these candidates and to suggest possible mechanisms of action. Here, we have positive controls that we can start with, as ALDH2, ADH5, and TGFbeta are known modifiers with established mechanisms.

We will write pathfinding Cypher queries that return all paths through the Monarch and SemMedDB data connecting these genes to FA, and explore requirements and approaches for refining/constraining these paths to home in on those representing the most meaningful evidence.

Example: Return all paths between ALDH2 and FA -> filter, facet, and expand the results to identify the most meaningful paths that support this known fact and that might have led to its hypothesis before its official discovery.

Using these controls for pilot experiments, we can think about the types of evidence that would support inference to these answers, the types of data that would support these inferences, how the data would have to be modeled, queried, and presented to users to support such inference, and the tooling required to support these tasks.
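To make the "return all paths, then filter and facet" idea concrete, here is a minimal Python sketch over an in-memory toy graph. It is not the Cypher query itself, and nothing in it comes from Monarch or SemMedDB: the node names (`GeneA`, `ProcessX`, `DiseaseD`), the predicate names, and the helper functions are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy edges (subject, predicate, object).
# These are placeholders, NOT real Monarch or SemMedDB content.
TOY_EDGES = [
    ("GeneA", "participates_in", "ProcessX"),
    ("ProcessX", "disrupted_in", "DiseaseD"),
    ("GeneA", "interacts_with", "GeneB"),
    ("GeneB", "associated_with", "DiseaseD"),
    ("GeneA", "mentioned_with", "DiseaseD"),
]


def build_adjacency(edges):
    """Index edges in both directions so paths are traversed as undirected."""
    adj = defaultdict(list)
    for subj, pred, obj in edges:
        adj[subj].append((pred, obj))
        adj[obj].append((pred, subj))
    return adj


def find_paths(adj, start, goal, max_hops=3):
    """Return all simple paths from start to goal with at most max_hops edges.

    Each path is a list of (predicate, next_node) steps.
    """
    paths = []

    def walk(node, visited, steps):
        if node == goal:
            paths.append(list(steps))
            return
        if len(steps) == max_hops:
            return
        for pred, nxt in adj[node]:
            if nxt not in visited:
                steps.append((pred, nxt))
                walk(nxt, visited | {nxt}, steps)
                steps.pop()

    walk(start, {start}, [])
    return paths


def facet_by_predicates(paths):
    """Count paths by their sequence of predicates (a simple facet)."""
    return Counter(tuple(pred for pred, _ in path) for path in paths)


def drop_predicate(paths, predicate):
    """Filter out paths that use a given predicate anywhere along the way."""
    return [p for p in paths if all(pred != predicate for pred, _ in p)]


adj = build_adjacency(TOY_EDGES)
paths = find_paths(adj, "GeneA", "DiseaseD")
facets = facet_by_predicates(paths)
filtered = drop_predicate(paths, "mentioned_with")

for predicate_sequence, count in facets.items():
    print(count, " -> ".join(predicate_sequence))
```

The facet here is just the predicate sequence of each path; in practice one might also facet on node types, data source, or supporting publications, and the hop limit would need tuning to keep the result set manageable.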
Ultimately, we hope that this exercise will inform requirements for many aspects of Translator development:
Tasks:
Regarding Task 5 (but also probably 3 and 4), I'm thinking a machine learning approach may be useful here. It could work much like prediction in drug repurposing: given a set of known drug-disease pairs, the edge types along the paths that connect these known "true" pairs are selected for and weighted more strongly than edge types that do not connect them, or that are less useful for doing so. A technique like that could be applied here to refine queries and try to select more meaningful paths. I or @veleritas could look into this more deeply...
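As a rough illustration of the weighting idea in this comment, the sketch below scores edge types by a smoothed log-odds ratio: how often each type appears on paths for known "true" pairs versus paths for background pairs. This is a deliberately simple stand-in, not the method used in any particular drug repurposing pipeline, and the predicate names, example paths, and function names are all hypothetical.

```python
import math
from collections import Counter


def edge_type_weights(positive_paths, background_paths, smoothing=1.0):
    """Smoothed log-odds of each edge type appearing on positive vs. background paths.

    Each path is a tuple of edge-type (predicate) names.
    A positive weight means the type is enriched on paths for known-true pairs.
    """
    pos = Counter(etype for path in positive_paths for etype in path)
    bg = Counter(etype for path in background_paths for etype in path)
    edge_types = set(pos) | set(bg)
    pos_total = sum(pos.values()) + smoothing * len(edge_types)
    bg_total = sum(bg.values()) + smoothing * len(edge_types)
    return {
        etype: math.log(
            ((pos[etype] + smoothing) / pos_total)
            / ((bg[etype] + smoothing) / bg_total)
        )
        for etype in edge_types
    }


def score_path(path, weights):
    """Average edge-type weight along a path; unseen edge types count as 0."""
    if not path:
        return 0.0
    return sum(weights.get(etype, 0.0) for etype in path) / len(path)


# Hypothetical training data: predicate sequences only, invented for illustration.
positive_paths = [
    ("participates_in", "disrupted_in"),
    ("participates_in", "associated_with"),
]
background_paths = [
    ("mentioned_with",),
    ("mentioned_with", "associated_with"),
]

weights = edge_type_weights(positive_paths, background_paths)

# Rank candidate paths so the most "positive-like" ones surface first.
candidates = [("mentioned_with",), ("participates_in", "disrupted_in")]
ranked = sorted(candidates, key=lambda p: score_path(p, weights), reverse=True)
```

With real data, the positive set would come from the known controls and the background set from random or unrelated gene-disease pairs; a trained classifier over path-type features would be the natural next step beyond this counting scheme.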
Use an integrated neo4j database to explore how human and machine learning agents might collaborate to extract evidence from knowledge graphs to derive predictions and mechanistic hypotheses.
Tasks
Goals
Valued Expertise