Guidance for how to select optimal max_iter when running cellcharter.tl.ClusterAutoK #53

paularstrpo · 2024-10-21T14:14:51Z

Description of feature

Hi. Thanks for making this tool, it's really useful!

I was wondering if you have any recommendations for the minimum value to set for max_iter, depending on how many cells/samples a user has, while keeping the computation efficient.

I have 810k cells in my MERFISH data with around 110 samples, and my k stability plot at max_iter=50 is wildly different than when I set max_iter=1000. However, at max_iter=1000, the autok.stability matrix has 90 columns, not 1000. Does that mean it took 90 iterations to reach convergence_tol?

Is there a way to estimate a reasonable number of iterations to use when running tl.ClusterAutoK to maximize stability/convergence while keeping the number of iterations low? I suspect it depends on how many cells/samples are being run, but some guidance on how to select the optimal number of iterations would be greatly appreciated.

Thank you!!

marcovarrone · 2024-10-22T15:17:09Z

Hi @paularstrpo,

The convergence of the stability depends a lot on the dataset.
For this reason, I implemented the convergence_tol parameter. It compares the stability values of adjacent iterations, and if the curve changes by less than a certain relative amount (computed using the Mean Absolute Percentage Error), it stops.
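
To make the stopping rule described above concrete, here is a minimal NumPy sketch of a MAPE-based convergence check (MAPE being the mean absolute percentage error). It is not CellCharter's actual implementation: the function names `mape` and `first_converged_iteration`, the `eps` guard, and the toy curves are all hypothetical and only illustrate the idea of comparing the stability curve between adjacent iterations.

```python
import numpy as np


def mape(previous, current, eps=1e-12):
    """Mean absolute percentage error of `current` relative to `previous`."""
    previous = np.asarray(previous, dtype=float)
    current = np.asarray(current, dtype=float)
    # eps (an assumption of this sketch) avoids division by zero.
    return float(np.mean(np.abs(current - previous) / np.maximum(np.abs(previous), eps)))


def first_converged_iteration(curves, convergence_tol=1e-2):
    """Return the 1-based iteration whose curve differs from the previous
    one by less than `convergence_tol`, or None if that never happens."""
    for i in range(1, len(curves)):
        if mape(curves[i - 1], curves[i]) < convergence_tol:
            return i + 1
    return None


# Toy stability curves (one value per candidate k), one curve per iteration.
# These numbers are made up for illustration only.
curves = [
    [0.50, 0.60, 0.40],
    [0.70, 0.80, 0.50],
    [0.72, 0.81, 0.52],
    [0.721, 0.811, 0.521],
]

print(first_converged_iteration(curves, convergence_tol=1e-2))
print(first_converged_iteration(curves, convergence_tol=1e-3))
```

With these toy curves, the looser tolerance (1e-2) stops at iteration 4, while the stricter one (1e-3) is never satisfied, so a stricter tolerance needs more iterations before stopping.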

If you have 90 columns at the end of the execution, it's very likely that you reached convergence after 90 iterations. In theory, you should also see a warning when convergence is reached.
The reason why iter=50 is very different from iter=90 is exactly why the method doesn't stop earlier :) I would expect iter=89 and iter=90, on the other hand, to be quite similar.
If they are not, you should probably decrease the convergence_tol (for example, from 1e-2 to 1e-3).

In general, I would suggest relying mainly on convergence_tol, rather than max_iter, as it's much more dataset-independent. For example, in all of my tests, I always reached convergence in fewer than 5-10 iterations, which is quite different from your 90.
With convergence_tol, on the other hand, you know that if the process stopped automatically at 90 iterations, the curve didn't change much between consecutive iterations.

I hope I have answered your question!

paularstrpo added the enhancement New feature or request label Oct 21, 2024
