-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding annotation of potential doublets to anndata obs #193
Conversation
e79fbf4
to
e08516d
Compare
…o_split_potential_doublets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Some comments to take into account, but approving to not delay a post review merge once they are answered or tackled.
@@ -433,4 +437,12 @@ def anndata_metrics(adata: AnnData) -> AnnotateAnndataStatistics: | |||
if "doublet_size_threshold" in adata.uns: | |||
metrics["doublet_size_threshold"] = adata.uns["doublet_size_threshold"] | |||
|
|||
if "is_potential_doublet" in adata.obs: | |||
metrics["fraction_potential_doublets"] = adata.obs[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no is_potential_doublet
here, what will it happen when run mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be a key error, that's why we check for it first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But when adata.obs["is_potential_doublet"]
returns empty and we called mean()
- what happens then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only run the mean() function if "is_potential_doublet" exists as a column in adata.obs. Otherwise that part is not run and "fraction_potential_doublets" remains the default value which is 0 right now but following the discussion we're going to change it to None.
metrics["fraction_potential_doublets"] = adata.obs[ | ||
"is_potential_doublet" | ||
].mean() | ||
metrics["n_edges_to_split_potential_doublets"] = adata.obs[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with sum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some small suggestions regarding the type annotations. I think the code is clear and simple. Nice job!
Co-authored-by: Johan Dahlberg <johan.dahlberg@pixelgen.tech>
Co-authored-by: Johan Dahlberg <johan.dahlberg@pixelgen.tech>
Co-authored-by: Johan Dahlberg <johan.dahlberg@pixelgen.tech>
Co-authored-by: Johan Dahlberg <johan.dahlberg@pixelgen.tech>
…l_doublets to None
During the graph step, we split some components that appear to be technical multiplets through community detection. However we set parameters in a relatively conservative way so to avoid breaking up cells into multiple components. In the annotation step we are adding a less conservative parametrization to mark "potential doublets".
What is added:
Fixes: EXE-2025
Type of change
How Has This Been Tested?
The unit tests.
PR checklist: