Different results each time CITE-seq-Count is run #165

colin986 · 2022-03-11T21:05:04Z

Hi,

I'm getting a different output each time CITE-seq-Count is run. My whitelist and parameters do not change between runs.

Is this expected? Is there any way to control this for reproducibility (e.g. by setting a seed)?

Thanks,
Colin

Hoohm · 2022-03-11T22:21:10Z

Hey Colin, this is really strange, as there is no randomness in the code; it should produce pretty much the exact same output each time for the same parameters. Could you show me some examples?


colin986 · 2022-03-14T17:44:26Z

Hi Hoohm,

Thanks for coming back to me.

You were right. The CITE-seq-Count output is the same each time.

The variation in the result seems to come from the HTODemux function in Seurat when using the clara clustering option (with kmeans clustering the output is consistent). The HTODemux result changes each time I re-run CITE-seq-Count. The function has an option to set the seed, but the output still changes. To be clear: HTODemux is reproducible when run on the same CITE-seq-Count output, and CITE-seq-Count is also reproducible. However, when I re-run both CITE-seq-Count and HTODemux I get a different result, and I don't understand why this is happening.

I know HTODemux draws 100 samples from the dataset for clara clustering. I wonder whether CITE-seq-Count, while producing the same data, writes them in a different order on each run, so that the 100 samples are drawn in a different order, and that gives rise to the variability in the output?
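The hypothesis above can be illustrated with a small sketch. It is not the clara or HTODemux implementation; it only assumes that a seeded sampler picks items by position, and the barcode names and the helper `sample_barcodes` are hypothetical. With a fixed seed the same positions are drawn every time, so the same input order gives the same sample, but a different column order puts different cells at those positions.

```python
import random


def sample_barcodes(barcodes, k, seed):
    """Pick k barcodes by position using a seeded generator (illustrative only)."""
    rng = random.Random(seed)
    positions = rng.sample(range(len(barcodes)), k)
    return [barcodes[i] for i in positions]


# Hypothetical barcodes: the same 50 cells written in two different orders.
barcodes_run1 = [f"cell_{i:03d}" for i in range(50)]
barcodes_run2 = list(reversed(barcodes_run1))

picked_run1 = sample_barcodes(barcodes_run1, k=5, seed=42)
picked_again = sample_barcodes(barcodes_run1, k=5, seed=42)
picked_run2 = sample_barcodes(barcodes_run2, k=5, seed=42)

# Same order + same seed: identical sample.
print(picked_run1 == picked_again)
# Same seed but different order: a different set of cells is sampled.
print(set(picked_run1) == set(picked_run2))
```

If the sampled subset differs, any clustering fitted on that subset can differ too, which would be consistent with the behaviour described here even when a seed is set.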

Thanks,
Colin

johnyaku · 2024-09-05T01:15:22Z

I can verify "different" CITE-seq-count results on different runs.

The difference is in the column order, not in the actual content of the count matrices. Reordering the columns to match each other (or the whitelist) results in identical matrices.
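A minimal sketch of this kind of check, assuming both count matrices have already been loaded as a list of barcodes plus a list of rows (one row per feature, in the same feature order), and that barcodes are unique. The helper names and the toy data are hypothetical.

```python
def columns_by_barcode(barcodes, rows):
    """Map each barcode to its column of counts (one value per feature row)."""
    return {bc: tuple(row[j] for row in rows) for j, bc in enumerate(barcodes)}


def same_up_to_column_order(barcodes_a, rows_a, barcodes_b, rows_b):
    """True if both matrices hold the same counts per barcode, ignoring column order."""
    return columns_by_barcode(barcodes_a, rows_a) == columns_by_barcode(barcodes_b, rows_b)


# Toy data: two runs with the same counts but shuffled columns.
barcodes_a = ["AAAC", "CCGT", "GGTA"]
rows_a = [[5, 0, 2],
          [1, 7, 0]]

barcodes_b = ["GGTA", "AAAC", "CCGT"]
rows_b = [[2, 5, 0],
          [0, 1, 7]]

print(rows_a == rows_b)  # naive comparison ignores the labels
print(same_up_to_column_order(barcodes_a, rows_a, barcodes_b, rows_b))
```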

I haven't been able to pin down the source of the variation. I can't see any random functions. Initially I suspected parallelization, with different chunks finishing in different orders depending on the run, but the problem persists even with only one thread.

This difference in ordering produces different assignments from Seurat::HTODemux() when kfunc='clara' (the default). In the good quality dataset where I have been testing this, assignments are different for about 5% of total barcodes. In a low or even medium quality dataset I suspect the variability might be worse.

I haven't looked at why, but @colin986's suggestion that different ordering might produce different sampling (even with the same seed) seems plausible to me.

Setting kfunc = 'kmeans' results in consistent demux assignments, despite the difference in ordering.

For now I am reordering CITE-seq-count outputs based on the whitelist, and also using kmeans rather than clara.
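The reordering workaround could look roughly like the sketch below. It assumes the count matrix is held in memory as a list of barcodes plus a list of feature rows, and that the whitelist is a plain list of barcodes; the function name `reorder_to_whitelist` and the toy data are hypothetical, and loading and writing the files is left out.

```python
def reorder_to_whitelist(barcodes, rows, whitelist):
    """Return (barcodes, rows) with columns arranged in whitelist order.

    Whitelist entries absent from the matrix are skipped.
    """
    position = {bc: j for j, bc in enumerate(barcodes)}
    kept = [bc for bc in whitelist if bc in position]
    reordered = [[row[position[bc]] for bc in kept] for row in rows]
    return kept, reordered


# Toy data: columns arrive in an arbitrary order.
barcodes = ["GGTA", "AAAC", "CCGT"]
rows = [[2, 5, 0],
        [0, 1, 7]]
whitelist = ["AAAC", "CCGT", "GGTA", "TTTT"]

ordered_barcodes, ordered_rows = reorder_to_whitelist(barcodes, rows, whitelist)
print(ordered_barcodes)
print(ordered_rows)
```

Applying the same whitelist to every run gives every output the same column order before it is passed downstream.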

Hoohm · 2024-09-05T06:32:04Z

Thank you for looking into this. I was afraid there was a bug I had missed in my code, but the downstream issues seem more plausible. Btw, if you are interested in testing it out, I have a beta branch rewritten in Polars that is available. Some input names have changed, but overall it should decrease memory usage and improve speed.

johnyaku · 2024-09-11T01:39:38Z

Thanks @Hoohm. I'll check out the beta branch when I get a moment.

I'm not sure if it is worth making a feature request, but I do think it would be helpful if CITE-seq-count produced identical output for identical input (including sort order).
