Very slow paired reads mode for transcriptome #31

Open
siddharthab opened this issue Sep 3, 2024 · 1 comment · May be fixed by #32

@siddharthab

Hi!

I am trying to make UMICollapse the default UMI deduplication tool in nf-core/rnaseq, one of the popular RNA-seq analysis pipelines (nf-core/rnaseq#1087).

I am not sure whether this is already covered by #5, but with paired reads aligned to the human transcriptome, UMICollapse seems to be about 20x slower than umi-tools. UMICollapse takes 9 to 10 hours on the BAM files we are considering, whereas umi-tools takes about 30 minutes. The slowdown occurs in both two-pass and single-pass modes.

I have not gone through how UMICollapse works, so I do not have an opinion on whether this is expected or not. If it is expected, some commentary on this in the README would be appreciated.

I have made some test data available in Google Drive. You will notice that the BAM file has 44319354 read pairs with 8 bp UMIs.

Thank you for continuing to follow up on your work from a long time ago.

@siddharthab
Author

Profiling suggests that about 98% of the CPU time is spent in write.

[Screenshot: profiler output, captured 2024-09-03]

siddharthab linked a pull request (#32) on Sep 4, 2024 that will close this issue.