Is there a way we can add in our own reference as training data #46

Open
Sudheshna30 opened this issue Jun 28, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@Sudheshna30

Description of feature

Adding our own reference would be a great way to run this pipeline.

@Sudheshna30 Sudheshna30 added the enhancement New feature or request label Jun 28, 2024
@marcovarrone
Collaborator

Hi @Sudheshna30, what do you mean exactly by using our own reference?

Do you mean for generating the embedding or for clustering samples?
For the first one you can simply train your own scVI or trVAE model using the official tutorials of the packages.

To fit the clustering model on one dataset and then cluster a different dataset, you can use tl.Cluster.fit on the first dataset and then tl.Cluster.predict on the other one.
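
For illustration, the fit/predict split could look roughly like the sketch below. It assumes both AnnData objects already hold CellCharter features under `obsm["X_cellcharter"]`, computed in the same way for both. The function name, the default `n_clusters=10`, and the output column `cluster_cellcharter` are placeholders chosen for this example, not values taken from this thread.

```python
def transfer_clusters(adata_ref, adata_query, n_clusters=10, use_rep="X_cellcharter"):
    """Fit a CellCharter Cluster model on adata_ref and label adata_query with it."""
    # Imported inside the function so the sketch can be loaded without the package.
    import cellcharter as cc

    # n_clusters and random_state are illustrative values, not recommendations.
    model = cc.tl.Cluster(n_clusters=n_clusters, random_state=12345)

    # Learn the clusters from the first (reference) dataset only.
    model.fit(adata_ref, use_rep=use_rep)

    # Assign every cell of the second dataset to one of the learned clusters.
    adata_query.obs["cluster_cellcharter"] = model.predict(adata_query, use_rep=use_rep)
    return model
```

The returned model can be reused to label further datasets, as long as their features were produced by the same preprocessing.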

I hope I understood the question, let me know if you meant something else :)

@Sudheshna30
Author

Sudheshna30 commented Jul 2, 2024 via email

@marcovarrone
Collaborator

Hi @Sudheshna30. If I can ask, what was not good in your results for the pancreatic CosMx? You are actually the second person who told me that CellCharter didn't work so well on pancreatic tissue, so I am curious about whether there is something specific in the tissue structure that requires different parameters for CellCharter.
If you want to show me some images to better understand the problem you can send me an email at marco.varrone@unil.ch.

Regarding fit and predict, you can look at the CosMx tutorial. There I used them on the same dataset, but nothing prevents you from processing the two datasets in the same way and then using fit on the reference dataset and predict on your dataset. In the tutorial I used ClusterAutoK rather than Cluster to estimate the best number of clusters (it requires more runtime, so if you are just exploring I would suggest using Cluster).

So basically what you would do is:

  • Compute the spatial neighbors for both datasets
  • Train a scVI model on the reference dataset and extract the features for both datasets
  • Run cc.tl.Cluster.fit on the reference dataset
  • Run cc.tl.Cluster.predict on your dataset
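
The four steps above can be sketched as follows. This is only a rough outline under several assumptions: both datasets contain the same genes in the same order, raw counts are stored in a layer named `counts`, and coordinates are in `obsm["spatial"]`. The neighborhood aggregation step (`cc.gr.aggregate_neighbors`) is not in the list but is included here on the assumption that it is part of the usual CellCharter workflow. The function name, `sample_key`, and the default parameter values are hypothetical.

```python
def cluster_with_reference_features(adata_ref, adata_query, n_clusters=10,
                                    n_layers=3, sample_key=None):
    """Train scVI on the reference only, then fit on reference and predict on query."""
    # Heavy dependencies are imported lazily so the sketch loads without them.
    import cellcharter as cc
    import scvi
    import squidpy as sq

    # 1. Spatial neighbors for both datasets (same settings for both).
    #    sample_key is an optional obs column that separates slides or FOVs.
    for adata in (adata_ref, adata_query):
        sq.gr.spatial_neighbors(adata, library_key=sample_key,
                                coord_type="generic", delaunay=True)
        cc.gr.remove_long_links(adata)

    # 2. Train scVI on the reference dataset only ...
    scvi.model.SCVI.setup_anndata(adata_ref, layer="counts")
    scvi_model = scvi.model.SCVI(adata_ref)
    scvi_model.train(early_stopping=True)

    #    ... and extract features for both datasets with that single model.
    for adata in (adata_ref, adata_query):
        adata.obsm["X_scVI"] = scvi_model.get_latent_representation(adata)

    # Assumed extra step: aggregate neighborhood features for both datasets.
    for adata in (adata_ref, adata_query):
        cc.gr.aggregate_neighbors(adata, n_layers=n_layers, use_rep="X_scVI",
                                  out_key="X_cellcharter", sample_key=sample_key)

    # 3. Fit on the reference dataset.
    gmm = cc.tl.Cluster(n_clusters=n_clusters, random_state=12345)
    gmm.fit(adata_ref, use_rep="X_cellcharter")

    # 4. Predict on your own dataset.
    adata_query.obs["cluster_cellcharter"] = gmm.predict(adata_query, use_rep="X_cellcharter")
    return gmm
```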

However, this assumes that there are no strong batch effects between the reference dataset and your dataset; otherwise the features from scVI trained on the reference dataset will not work well for your dataset.
If there are batch effects, you may want to concatenate the two datasets, set adata.obs['dataset'] to the dataset each cell comes from, and then train a scVI model on both datasets together using batch_key='dataset'.
Then run cc.tl.Cluster.fit on both datasets together and cc.tl.Cluster.predict on your dataset.
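
A possible sketch of this batch-corrected variant is shown below. It assumes raw counts in a `counts` layer and coordinates in `obsm["spatial"]` for both objects. It also treats each dataset as a single spatial sample; with several slides per dataset, a finer sample column would be needed for the neighbor graph. The labels `"reference"` and `"query"`, the function name, and the default parameters are placeholders for this example.

```python
def integrate_then_cluster(adata_ref, adata_query, n_clusters=10, n_layers=3):
    """Train scVI on both datasets with a batch key, fit on both, predict on the query."""
    # Heavy dependencies are imported lazily so the sketch loads without them.
    import anndata as ad
    import cellcharter as cc
    import scvi
    import squidpy as sq

    # Concatenate and record the dataset of origin in adata.obs["dataset"].
    adata = ad.concat({"reference": adata_ref, "query": adata_query},
                      label="dataset", index_unique="-")

    # Build the spatial graph per dataset so no edges link the two tissues.
    sq.gr.spatial_neighbors(adata, library_key="dataset",
                            coord_type="generic", delaunay=True)
    cc.gr.remove_long_links(adata)

    # Train scVI on both datasets together, correcting for the dataset of origin.
    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="dataset")
    scvi_model = scvi.model.SCVI(adata)
    scvi_model.train(early_stopping=True)
    adata.obsm["X_scVI"] = scvi_model.get_latent_representation(adata)

    # Assumed extra step: aggregate neighborhood features within each dataset.
    cc.gr.aggregate_neighbors(adata, n_layers=n_layers, use_rep="X_scVI",
                              out_key="X_cellcharter", sample_key="dataset")

    # Fit on both datasets together.
    gmm = cc.tl.Cluster(n_clusters=n_clusters, random_state=12345)
    gmm.fit(adata, use_rep="X_cellcharter")

    # Predict only on the cells that come from your own dataset.
    own = adata[adata.obs["dataset"] == "query"].copy()
    own.obs["cluster_cellcharter"] = gmm.predict(own, use_rep="X_cellcharter")
    return own
```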

It may be a bit of work and will not necessarily help much unless the reference dataset is quite similar to your dataset. So, as I mentioned at the beginning, I suggest you share with me why you think the results are not good, so that we can figure out together how to improve them instead of using a reference dataset.

@marcovarrone
Collaborator

After interacting privately, I want to clarify a common misconception about CellCharter that I keep seeing, even though it should be clear from reading the paper.

CellCharter was not originally designed to find cell types but to find cell niches, which are areas with the same combination of cell types and cell states. You can identify cell types by running it with n_layers=0, which can be convenient because it is very scalable, but this is not its original purpose.
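
As a rough illustration of the `n_layers=0` case, the sketch below assumes that `n_layers` refers to the argument of `cc.gr.aggregate_neighbors`, that per-cell features are already stored in `obsm["X_scVI"]`, and that spatial neighbors have already been computed (the function may still look up the graph even when no neighbors are aggregated). The function name, output keys, and `n_clusters=10` are placeholders.

```python
def cluster_cell_types(adata, n_clusters=10, use_rep="X_scVI"):
    """Cluster cells on their own features only, i.e. without neighborhood context."""
    # Imported inside the function so the sketch can be loaded without the package.
    import cellcharter as cc

    # n_layers=0: no neighbors are aggregated, so each cell keeps only its own
    # features and the clusters reflect cell types/states rather than niches.
    cc.gr.aggregate_neighbors(adata, n_layers=0, use_rep=use_rep,
                              out_key="X_cellcharter_l0")

    gmm = cc.tl.Cluster(n_clusters=n_clusters, random_state=12345)
    gmm.fit(adata, use_rep="X_cellcharter_l0")
    adata.obs["cell_type_cluster"] = gmm.predict(adata, use_rep="X_cellcharter_l0")
    return gmm
```

Running the same function with a larger `n_layers` in the aggregation call would bring the neighborhood back in and move the result toward niches.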
