Is there a way we can add in our own reference as training data #46

Sudheshna30 · 2024-06-28T14:56:17Z

Description of feature

Adding our own reference would be a great way to run this pipeline.

marcovarrone · 2024-07-02T07:36:58Z

Hi @Sudheshna30, what do you mean exactly by using our own reference?

Do you mean for generating the embedding or for clustering samples?
For the first one you can simply train your own scVI or trVAE model using the official tutorials of the packages.

To fit the clustering model on one dataset and then cluster a different dataset, you can use tl.Cluster.fit on the first dataset and then tl.Cluster.predict on the other one.

I hope I understood the question, let me know if you meant something else :)

Sudheshna30 · 2024-07-02T18:55:52Z

Thank you for responding, Marco! I really appreciate it. I'm interested in the second option: fitting the clustering model on a reference dataset and applying it to the actual dataset. We tried CellCharter on our pancreatic CosMx dataset and didn't get good clustering results, so I'm looking into whether we can train the model on a reference dataset and use it to cluster the original dataset. Can you help me with an example of how to apply tl.Cluster.fit and tl.Cluster.predict <https://cellcharter.readthedocs.io/en/latest/generated/cellcharter.tl.Cluster.html#cellcharter.tl.Cluster.predict>? Best


marcovarrone · 2024-07-03T05:50:53Z

Hi @Sudheshna30. If I can ask, what was not good in your results for the pancreatic CosMx? You are actually the second person who told me that CellCharter didn't work so well on pancreatic tissue, so I am curious about whether there is something specific in the tissue structure that requires different parameters for CellCharter.
If you want to show me some images to better understand the problem you can send me an email at marco.varrone@unil.ch.

Regarding fit and predict, you can look at the CosMx tutorial. There I used them on the same dataset, but nothing prevents you from processing the two datasets in the same way, then calling fit on the reference dataset and predict on your dataset. In the tutorial I used ClusterAutoK rather than Cluster to estimate the best number of clusters (it requires more runtime, so if you are just exploring I would suggest using Cluster).

So basically what you would do is:

1. Compute the spatial neighbors for both datasets
2. Train a scVI model on the reference dataset and extract the features for both datasets
3. Run cc.tl.Cluster.fit on the reference dataset
4. Run cc.tl.Cluster.predict on your dataset
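
The last two steps can be sketched as a small helper. This is only an illustration: `cluster_query_with_reference` is a hypothetical name, and the code assumes that both datasets were preprocessed identically and that the model's `fit` and `predict` accept a `use_rep` argument naming the feature matrix (here assumed to be `"X_cellcharter"`). The number of clusters in the usage comment is arbitrary.

```python
def cluster_query_with_reference(model, adata_ref, adata_query, use_rep="X_cellcharter"):
    """Fit `model` on the reference dataset and return labels for the query dataset.

    `model` is any object exposing fit(adata, use_rep=...) and
    predict(adata, use_rep=...); `use_rep` is an assumed key under which
    both datasets store features computed in the same way.
    """
    # The mixture components are learned from the reference cells only.
    model.fit(adata_ref, use_rep=use_rep)
    # The query cells are assigned to the components learned above.
    return model.predict(adata_query, use_rep=use_rep)


# Intended use with CellCharter (not executed here; n_clusters is arbitrary):
# import cellcharter as cc
# model = cc.tl.Cluster(n_clusters=10, random_state=0)
# adata_query.obs["cluster"] = cluster_query_with_reference(model, adata_ref, adata_query)
```

Passing the model in as an argument keeps the helper usable with either Cluster or a differently configured clustering object, as long as it follows the same fit/predict pattern.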

However, this assumes that there are no strong batch effects between the reference dataset and your dataset; otherwise the features from a scVI model trained on the reference dataset will not work well for your dataset.
If there are batch effects, you may want to concatenate the two datasets, set adata.obs['dataset'] to the dataset each cell comes from, and then train a scVI model on both datasets together using batch_key='dataset'.
Then run cc.tl.Cluster.fit on both datasets together and cc.tl.Cluster.predict on your dataset.
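
The bookkeeping for the batch-effect case can be sketched without any library. The snippet below is only an illustration, with plain Python lists standing in for cells; `tag_and_concatenate` and `select_dataset` are hypothetical helpers, not part of CellCharter or scVI. With AnnData objects, `anndata.concat` with its `label` argument plays the role of the first helper.

```python
def tag_and_concatenate(datasets):
    """Concatenate datasets and record which dataset every cell came from.

    `datasets` maps a dataset name to a list of cells. Returns the combined
    list of cells and a parallel list of dataset names (the 'dataset' column).
    """
    cells, labels = [], []
    for name, dataset_cells in datasets.items():
        cells.extend(dataset_cells)
        labels.extend([name] * len(dataset_cells))
    return cells, labels


def select_dataset(values, labels, name):
    """Keep only the values whose dataset label equals `name`."""
    return [value for value, label in zip(values, labels) if label == name]


# Toy cells: three from the reference, two from the dataset of interest.
cells, dataset = tag_and_concatenate(
    {"reference": ["r1", "r2", "r3"], "query": ["q1", "q2"]}
)

# The joint models would be fitted on all of `cells`, using `dataset` as the
# batch key; prediction is then restricted to the cells of your own dataset.
query_cells = select_dataset(cells, dataset, "query")
```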

It may be a bit of work and will not necessarily help much unless the reference dataset is quite similar to your dataset. So, as I mentioned at the beginning, I suggest you share with me why you think the results are not good, so that we can figure out together how to improve them instead of using a reference dataset.

marcovarrone · 2024-07-11T13:08:22Z

After interacting privately, I want to clarify a common misconception about CellCharter that I keep seeing, even though it should be clear from reading the paper.

CellCharter was not originally designed to find cell types but to find cell niches, which are areas with the same combination of cell types and cell states. You can identify cell types by running it with n_layers=0, which can be convenient because it is very scalable, but this is not its original purpose.
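
To illustrate why n_layers=0 behaves like cell-type clustering, here is a simplified pure-Python sketch. It assumes that neighborhood aggregation averages the features of the cells found at each hop distance in the spatial graph and concatenates those averages to the cell's own features; `hop_distances` and `aggregate` are hypothetical helpers, not CellCharter's implementation.

```python
from collections import deque


def hop_distances(adjacency, start, max_hops):
    """Breadth-first search: map each cell within `max_hops` of `start` to its hop count."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if dist[cell] == max_hops:
            continue  # do not expand beyond the outermost layer
        for neighbor in adjacency[cell]:
            if neighbor not in dist:
                dist[neighbor] = dist[cell] + 1
                queue.append(neighbor)
    return dist


def aggregate(features, adjacency, n_layers):
    """Concatenate each cell's own features with the mean features of every layer."""
    aggregated = {}
    for cell, own in features.items():
        dist = hop_distances(adjacency, cell, n_layers)
        vector = list(own)  # layer 0: the cell itself
        for layer in range(1, n_layers + 1):
            ring = [features[c] for c, d in dist.items() if d == layer]
            if ring:
                vector.extend(sum(column) / len(ring) for column in zip(*ring))
            else:
                vector.extend([0.0] * len(own))  # no cells at this distance
        aggregated[cell] = vector
    return aggregated


# Three cells in a row: a - b - c
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

own_only = aggregate(features, adjacency, n_layers=0)
with_neighbors = aggregate(features, adjacency, n_layers=1)
```

With n_layers=0 every cell keeps only its own features, so clustering groups cells by what they are (cell type or state). With n_layers=1 or more, the vector also encodes the surroundings, so clustering groups cells by the niche they sit in.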

Sudheshna30 added the enhancement New feature or request label Jun 28, 2024
