Providing pre-generated clusters #6
Comments
Hi! The So install CHOIR from the dev branch, then use function
Alternately, without needing to install from the dev branch, you can apply two of CHOIR's hidden functions sequentially like this:
Let me know if you run into issues with this or the subsequent steps, since this is not the primary use-case for CHOIR. |
Thanks! I will try this and get back to you. By any chance, do you think this clustering method would work for general data as well? Or is it tailored specifically for scRNAseq data? |
Great! And yes, CHOIR is compatible with any data that is in the shape of one or more cell x feature matrices. So any single-cell data (both single- and multi-omic), including single-cell RNA-seq, ATAC-seq, proteomics data, etc. Outside of typical single-cell data, other data types that have this similar shape can also be used, though CHOIR has not been as thoroughly tested with these. For example, I've applied CHOIR to the MNIST dataset in the past (where each "cell" would be a digit sample, and the features would be the pixel values), with interesting results. |
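As a rough illustration of the "cell x feature" shape described above, here is a minimal Python sketch that flattens small images into one row of pixel features per sample. The toy 2x2 images and the `flatten_images` helper are hypothetical stand-ins (real MNIST digits are 28x28), and nothing here is part of CHOIR's API.

```python
# Hypothetical toy data: three 2x2 grayscale "images" standing in for digits.
images = [
    [[0, 255], [255, 0]],
    [[255, 255], [0, 0]],
    [[0, 0], [0, 255]],
]


def flatten_images(imgs):
    """Turn each 2-D image into a single row of pixel-value features."""
    return [[pixel for row in img for pixel in row] for img in imgs]


# One row per sample ("cell"), one column per pixel ("feature").
matrix = flatten_images(images)
n_samples = len(matrix)
n_features = len(matrix[0])
```

The resulting table has the same shape as a single-cell matrix: samples play the role of cells and pixel positions play the role of features.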
Thanks for working through this! I've made some additional changes to the dev branch (so please re-install) to hopefully make starting with pre-generated clusters smoother. First, for running But since other inputs and parameters will not have been calculated or provided by the
I've tested this with two datasets myself, but please let me know if any other errors pop up. I'd also be curious how your final results compare to just running the entire default CHOIR algorithm with function |
After running: I can finally extract And if I remove the snn_matrix variable then I get this error again:
|
Great! Glad that worked for you.
The main parameter I'd try adjusting is the When run in full, CHOIR generates multiple dimensionality reductions as it builds out the hierarchical clustering tree. A “root” dimensionality reduction encompassing all cells is followed by multiple “subtree” dimensionality reductions that each encompass a subset of the total cells. These subsetted reductions, and their accompanying sets of highly variable features, allow CHOIR to pick up on more nuanced differences between cell subtypes/states. So the lack of these subsetted reductions in your case may cause CHOIR to be more conservative and merge more clusters.
No, benchmarking can absolutely be done with real datasets; I just wouldn't use ARI for real datasets. There are certainly other methods that can be used to benchmark with real datasets, but I feel that benchmarking metrics that require a "ground truth" set of cell type labels are not ideal for real-world data. You are welcome to try the simulated datasets from the paper; they're available here: https://files.corces.gladstone.org/Publications/2024_Petersen_CHOIR
And yeah, for those UMAP and plotting functions, you'd have to specify a dimensionality reduction that exists in your Seurat object, instead of "P0_reduction" (which is the default name given to the dimensionality reduction calculated during the
I'm going to close this issue, but feel free to continue replying here if anything else comes up! |
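For readers unfamiliar with the ARI mentioned in the comment above, the following is a minimal standard-library Python sketch of the adjusted Rand index. It shows why the metric needs a "ground truth" label for every sample. The function name and toy labels are illustrative assumptions; this is not code from CHOIR or from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(truth) != len(predicted):
        raise ValueError("both labelings must cover the same samples")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: nothing to disagree about

    # Pairs of samples grouped together in both, in truth, and in predicted.
    both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    in_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = in_truth * in_pred / total_pairs  # agreement expected by chance
    maximum = (in_truth + in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (both - expected) / (maximum - expected)


# Hypothetical labels: cluster names differ, but the grouping is identical.
same_grouping = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
# Merging everything into one cluster scores no better than chance.
all_merged = adjusted_rand_index([0, 0, 1, 1], [0, 0, 0, 0])
```

Because the score is computed against `truth`, it is only as meaningful as those reference labels, which is the concern raised for real datasets.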
Great. Thanks for all your help @catpetersen! Even those 2 new clusters on the left could be biologically meaningful, since Cytocipher discovered a similar thing in their paper. I will have to double check.
|
To keep these GitHub issue threads on topic, I'll confine my answers below to questions directly related to CHOIR. For your question related to benchmarking, I'll just point back to the pre-print, which contains a lot of my thought process around how to effectively benchmark clustering methods.
Here's an example with the default value
I'll take a look at this, but this function is probably best reserved for those who generate their dimensionality reductions using CHOIR, rather than providing pre-generated reductions.
Yes! Check out the
I'm not the best person to answer how to wrangle different types of data into a Seurat object. But you should really just need a cell x feature matrix with some sort of row and column names (e.g., 1, 2, 3..) to make this work. CHOIR is also compatible with SingleCellExperiment objects. |
Much appreciated! |
Glad to hear CHOIR is working well for you! No, I don't have any plans to add that feature. I think the easiest solution would be to manually overwrite the cluster labels with those original 3 clusters after the fact. You can then use the |
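The manual overwrite suggested above could look roughly like the following Python sketch. It assumes two per-cell label lists in the same cell order; the function name, the label values, and the `keep` set are all hypothetical, and this is not a CHOIR function.

```python
def restore_original_clusters(final_labels, original_labels, keep):
    """Overwrite final labels with original ones for the clusters in `keep`.

    Assumes both lists are in the same cell order and that the original
    cluster names do not collide with names used in `final_labels`.
    """
    if len(final_labels) != len(original_labels):
        raise ValueError("label lists must have one entry per cell")
    return [
        original if original in keep else final
        for final, original in zip(final_labels, original_labels)
    ]


# Hypothetical example: clusters "A" and "B" were merged into "m1" and
# should be restored; cluster "m2" keeps its final label.
final = ["m1", "m1", "m1", "m2", "m2"]
original = ["A", "A", "B", "C", "C"]
restored = restore_original_clusters(final, original, keep={"A", "B"})
```

Doing the overwrite on a copy of the labels keeps the unmodified clustering result available for comparison.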
Hi @catpetersen , I finally managed to get CHOIR working on all scRNAseq clustering on my precomputed clusters. Here is what finally made them work:
Yes, it takes much longer, but it correctly splits up the clusters as expected. Another observation is that when two clusters have an equal number of data points, CHOIR can split them using the default settings. However, when there is a class imbalance (e.g., cluster A has 600 points and a neighboring cluster B has 300 points, and the two must stay separate), CHOIR fails to keep them separate under the default settings. That's when the new settings work. I tried setting
After benchmarking on many precomputed scRNAseq clusters and getting good results, I am moving forward with using CHOIR on pre-generated clusters of general (non-scRNAseq) data. However, since CHOIR is made to work with Seurat scRNAseq data, I am having trouble making it work as you suggested above. I am starting off with a very simple dataset, the blobs dataset. My new clustering algorithm finds 8 clusters, and ad-hoc merging reduces them to 3, but I want to use CHOIR to do the merging instead. Everything is the same as for scRNAseq: I have precomputed labels for each of the 1000 data points.
Then I do dimensionality reduction using a different PCA package, since the Seurat one is tailored for scRNAseq.
Find neighbors for the nearest neighbor and shared nearest neighbor graphs required for
Simply importing clusters as discussed a few weeks ago.
Since there are no highly variable genes, I just resort to using the
Everything looks the same as that from scRNAseq, but
The error appears to be from the ranger function for the decision trees.
|
Glad you're having good results using CHOIR!
Set
If you only have 2 features, I'm really not sure CHOIR is the best tool for this dataset. Random forest classifiers are typically used with a larger feature space.
Does your input matrix have row & column names for the features & cells? If not, that might be causing the ranger error (see imbs-hl/ranger#597) |
I think the problem is that this method needs many features (columns) in general data that is not RNAseq. I tried a dataset with 18 features, it runs to 33% and it gives an error. So I guess the problem is that CHOIR needs really large datasets making it ideal for scRNA seq. |
Hello,
I am trying to apply CHOIR on my precomputed clusters from a method I developed. I am working with the PBMC3k dataset in fact.
Preprocessing follows this tutorial exactly, step by step:
https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
After I perform my clustering, I obtain a list of cluster labels for each of the 2638 cells in the order they appear in the Seurat object. I do not have any information about their hierarchical trees, because my method does not use trees. My question is: how do I provide this cluster_tree dataframe for my precomputed cluster labels? All I have is my cluster labels, and I want to use CHOIR to correctly merge them. In scSHC and Cytocipher I can simply pass my list of cluster labels to their functions and get a final answer. But I am not sure how to do the same with CHOIR. I could not find a vignette or example.
https://www.choirclustering.com/articles/CHOIR.html#providing-pre-generated-clusters