Discussion question -- is it normal to take so long? #178

Open
dagarfield opened this issue Apr 11, 2023 · 3 comments

Comments

@dagarfield

I am abusing this tool a bit: instead of CITE-seq barcodes, I am looking at a TRACE-seq barcoding experiment that has (empirically) about 66,000 barcodes. The run is much slower than expected. Any thoughts? As the log below shows, the current pace isn't really scalable.

(This worked great with our pilot, but that had far fewer cells. Here I've raised -cells to cover the expected ~11k cells plus extra for ambient correction/estimation.)

% CITE-seq-Count -R1 $read1 -R2 $read2 -t output.csv -cbf 1 -cbl 16 -umif 17 -umil 28 -cells 30000 -trim 25 -o cite_out --threads 7
Counting number of reads
Started mapping
Processing 66,484,476 reads
CITE-seq-Count is running with 7 cores.
Processed 1,000,000 reads in 10.0 hours, 12.0 minutes, 4.962 seconds. Total reads: 1,000,000 in child 26029
Processed 1,000,000 reads in 10.0 hours, 14.0 minutes, 47.28 seconds. Total reads: 1,000,000 in child 26030
Processed 1,000,000 reads in 10.0 hours, 16.0 minutes, 14.01 seconds. Total reads: 1,000,000 in child 26033
Processed 1,000,000 reads in 10.0 hours, 17.0 minutes, 18.27 seconds. Total reads: 1,000,000 in child 26035
Processed 1,000,000 reads in 10.0 hours, 17.0 minutes, 48.96 seconds. Total reads: 1,000,000 in child 26032
Processed 1,000,000 reads in 10.0 hours, 18.0 minutes, 18.83 seconds. Total reads: 1,000,000 in child 26031
Processed 1,000,000 reads in 10.0 hours, 20.0 minutes, 44.28 seconds. Total reads: 1,000,000 in child 26034
Processed 1,000,000 reads in 10.0 hours, 9.0 minutes, 53.68 seconds. Total reads: 2,000,000 in child 26029
Processed 1,000,000 reads in 10.0 hours, 10.0 minutes, 52.67 seconds. Total reads: 2,000,000 in child 26030
Processed 1,000,000 reads in 10.0 hours, 14.0 minutes, 17.73 seconds. Total reads: 2,000,000 in child 26033
Processed 1,000,000 reads in 10.0 hours, 13.0 minutes, 21.22 seconds. Total reads: 2,000,000 in child 26032
Processed 1,000,000 reads in 10.0 hours, 13.0 minutes, 53.29 seconds. Total reads: 2,000,000 in child 26035
Processed 1,000,000 reads in 10.0 hours, 13.0 minutes, 51.79 seconds. Total reads: 2,000,000 in child 26031
Processed 1,000,000 reads in 10.0 hours, 15.0 minutes, 43.0 seconds. Total reads: 2,000,000 in child 26034
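
For scale, here is a back-of-the-envelope estimate of the total run time implied by the log above. It is only a sketch: it assumes the reads are split evenly across the 7 children and that the rate of roughly 10 hours 14 minutes per million reads (an eyeballed average of the lines above) stays constant for the whole run.

```python
# Figures copied from the log above; the per-batch time is a rough average.
total_reads = 66_484_476
workers = 7
reads_per_batch = 1_000_000
hours_per_batch = 10.0 + 14.0 / 60  # assumed average: ~10 h 14 min per million reads per child

# Assumption: each child processes an equal share of the reads in parallel.
reads_per_worker = total_reads / workers
batches_per_worker = reads_per_worker / reads_per_batch
estimated_hours = batches_per_worker * hours_per_batch

print(f"reads per worker: {reads_per_worker:,.0f}")
print(f"estimated wall time: {estimated_hours:.0f} h (~{estimated_hours / 24:.1f} days)")
```

Under those assumptions the mapping step alone would take on the order of four days.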
@dagarfield
Author

I should mention that my version is installed via conda (https://anaconda.org/bioconda/cite-seq-count), so I think it is v1.4.4. The Python version is 3.7.12 (as installed by mamba/conda).

@dagarfield
Author

And there are 64k tags in that -t file, which I am starting to think is the essential issue here.
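
To illustrate why the number of tags could matter, here is a minimal sketch in Python. It is not taken from CITE-seq-Count, and how the tool actually matches tags is not confirmed here. The sketch assumes a matcher that compares each read against every tag with a Hamming distance, and contrasts it with a precomputed dictionary of all one-substitution variants. All function names (`match_linear`, `build_index`, `match_indexed`) are hypothetical, and tags are assumed to be unique and of equal length.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_linear(read, tags, max_dist=1):
    """Compare the read against every tag: work per read grows with len(tags)."""
    best = None
    for tag in tags:
        if hamming(read, tag) <= max_dist:
            if best is not None:
                return None  # more than one tag is close enough: ambiguous
            best = tag
    return best


def build_index(tags, alphabet="ACGTN"):
    """Map every sequence within one substitution of a tag back to that tag."""
    index = {}
    ambiguous = set()
    for tag in tags:
        variants = {tag}
        for i in range(len(tag)):
            for base in alphabet:
                variants.add(tag[:i] + base + tag[i + 1:])
        for variant in variants:
            if variant in index and index[variant] != tag:
                ambiguous.add(variant)  # reachable from two different tags
            index[variant] = tag
    for variant in ambiguous:
        del index[variant]
    return index


def match_indexed(read, index):
    """One dictionary lookup per read, independent of the number of tags."""
    return index.get(read)


# Toy tags, made up for illustration only.
tags = ["ACGTAC", "TTGCAA", "GGATCC"]
index = build_index(tags)
print(match_linear("ACGTAA", tags), match_indexed("ACGTAA", index))
```

With 64k tags, the linear version does roughly 64k comparisons per read, while the indexed version trades memory (a few dozen variants per tag) for a single lookup. Whether this applies to the actual tool depends on its implementation.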

@Hoohm
Owner

Hoohm commented Jul 22, 2023

Hello @dagarfield,
I'm guessing this dataset is too heavy. Have you tried running it without cell barcode and UMI correction?
I'm afraid this software was not built for datasets as big as this one.
