Guidance for how to select optimal max_iter when running cellcharter.tl.ClusterAutoK #53

Open
paularstrpo opened this issue Oct 21, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@paularstrpo

Description of feature

Hi. Thanks for making this tool, it's really useful!

I was wondering whether you could recommend a lower bound for max_iter, depending on how many cells/samples a user has, while keeping the computation efficient.

I have 810k cells in my MERFISH data with around 110 samples, and my k stability plot at max_iter=50 is wildly different than when I set max_iter=1000. However, at max_iter=1000, the autok.stability matrix has 90 columns, not 1000. Does that mean it took 90 iterations to reach convergence_tol?

Is there a way to estimate a reasonable number of iterations to use when running cellcharter.tl.ClusterAutoK, so that the stability curve converges without running more iterations than necessary? I suspect it depends on how many cells/samples are being run, but some guidance on how to select the optimal number of iterations would be greatly appreciated.

Thank you!!

@paularstrpo paularstrpo added the enhancement New feature or request label Oct 21, 2024
@marcovarrone
Collaborator

Hi @paularstrpo,

The convergence of the stability depends a lot on the dataset.
For this reason, I implemented the convergence_tol parameter. It compares the stability values between adjacent iterations, and if the curve changes by less than a given relative amount (computed using the Mean Absolute Percentage Error, MAPE), it stops.
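To make the stopping rule concrete, here is a minimal sketch of a MAPE-based convergence check. It is not CellCharter's actual code: the helper names `mape` and `run_until_converged`, the way the curve is passed in, and the toy values are all assumptions made for illustration.

```python
import itertools

import numpy as np


def mape(previous, current, eps=1e-12):
    """Mean absolute percentage error of `current` relative to `previous`."""
    previous = np.asarray(previous, dtype=float)
    current = np.asarray(current, dtype=float)
    # eps guards against division by zero; it is an illustrative choice.
    return float(np.mean(np.abs(current - previous) / np.maximum(np.abs(previous), eps)))


def run_until_converged(curves, convergence_tol=1e-2, max_iter=1000):
    """Count how many iterations are consumed before the curve stops changing.

    `curves` is a hypothetical iterable that yields one stability curve
    (one value per candidate k) per iteration.
    """
    previous = None
    n_done = 0
    for n_done, curve in enumerate(itertools.islice(curves, max_iter), start=1):
        # Stop as soon as two adjacent curves differ by less than the tolerance.
        if previous is not None and mape(previous, curve) < convergence_tol:
            break
        previous = curve
    return n_done


# Toy curves: the third one barely differs from the second, so the loop stops there.
toy_curves = [[0.5, 0.6], [0.7, 0.8], [0.7001, 0.8001], [0.9, 0.9]]
print(run_until_converged(toy_curves))
```

In this sketch max_iter only acts as a safety cap. The number of iterations actually used is decided by convergence_tol, which would explain ending up with fewer columns than max_iter.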

If you end up with 90 columns at the end of the execution, it's very likely that convergence was reached after 90 iterations. In theory, you should also see a warning when convergence is reached.
The fact that iter=50 looks very different from iter=90 is exactly why the method didn't stop earlier :) By contrast, I would expect iter=89 and iter=90 to be quite similar.
If they are not, you should probably decrease convergence_tol (for example, from 1e-2 to 1e-3).
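One way to check whether the last iterations really are similar is to compute the relative change between consecutive curves yourself. The sketch below runs on a synthetic array and rests on assumptions that may not match the real autok.stability object: rows are taken to be candidate values of k, columns to be iterations, and the curve at each step is taken to be the running mean over the columns seen so far. The function name `consecutive_changes` is hypothetical.

```python
import numpy as np


def consecutive_changes(stability, eps=1e-12):
    """MAPE between the running-mean curve after i and after i + 1 columns."""
    stability = np.asarray(stability, dtype=float)
    counts = np.arange(1, stability.shape[1] + 1)
    # Running mean over columns: column i holds the mean of columns 0..i.
    running_mean = np.cumsum(stability, axis=1) / counts
    prev, curr = running_mean[:, :-1], running_mean[:, 1:]
    return np.mean(np.abs(curr - prev) / np.maximum(np.abs(prev), eps), axis=0)


# Synthetic stand-in for a stability matrix with 8 values of k and 90 iterations.
rng = np.random.default_rng(0)
toy = 0.6 + 0.05 * rng.standard_normal((8, 90))

changes = consecutive_changes(toy)
# Compare the last change against two candidate tolerances.
print(changes[-1] < 1e-2, changes[-1] < 1e-3)
```

If the last few changes hover just below the current tolerance while the curve still drifts visibly, that would be a sign to try a smaller convergence_tol.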

In general, I would suggest relying mainly on convergence_tol rather than max_iter, as it's much more dataset-independent. For example, in all of my tests I reached convergence in fewer than 5-10 iterations, which is quite different from your 90.
Still, with convergence_tol you know that if the process stopped automatically at 90 iterations, the curve didn't change much between consecutive iterations.

I hope I have answered your question!
