lr-kallisto quant-tcc seg fault with bulk ONT #463

Open
sbresnahan opened this issue Sep 26, 2024 · 8 comments

@sbresnahan

sbresnahan commented Sep 26, 2024

Version: kallisto 0.51.1

I'm following a workflow outlined in issue 456 for using lr-kallisto with bulk ONT. kallisto bus, bustools sort, and bustools count steps complete without errors. However, the kallisto quant-tcc step is being dumped by LSF with 554689 Segmentation fault shortly after processing sample/cell N.

I'm using a kallisto index with kmer-length=63 built from transcripts pulled from the GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta and gencode v45 gtf using gffread. An index built from these transcripts with kmer-length=31 has no issues with kallisto quant using short reads.

@bound-to-love
Collaborator

Hi, Sean, since you are processing bulk, it should only print out processing sample/cell 0; is this the case? Can you please post the full output?

@sbresnahan
Author

sbresnahan commented Sep 27, 2024

If I run with --threads=1, it is indeed only processing sample/cell 0 before the seg fault:

[index] k-mer length: 63
[index] number of targets: 252,723
[index] number of k-mers: 157,178,936
[index] number of equivalence classes loaded from file: 327,292
[tcc] Parsing transcript-compatibility counts (TCC) file as a matrix file
[tcc] Matrix dimensions: 72 x 327,292
[quant] Running EM algorithm...
[   em] reading priors from file ONT
[quant] Processing sample/cell 0
/home/stbresnahan/.lsbatch/1727389319.16590285.shell: line 39: 55903 Segmentation fault     (core dumped) kallisto quant-tcc -t 1 --long -p ONT -f ${DIR_OUT}/flens.txt -i kallisto_index/gencode_v45 -e ${DIR_OUT}/count.ec.txt -o ${DIR_OUT}/quant-tcc ${DIR_OUT}/count.mtx

However, if I set --threads to anything other than 1 (in this case, 12), it reports processing more than one sample/cell:

[index] k-mer length: 63
[index] number of targets: 252,723
[index] number of k-mers: 157,178,936
[index] number of equivalence classes loaded from file: 327,292
[tcc] Parsing transcript-compatibility counts (TCC) file as a matrix file
[tcc] Matrix dimensions: 72 x 327,292
[quant] Running EM algorithm...
[   em] reading priors from file ONT
[quant] Processing sample/cell 0quant] Processing sample/cell [quant] Processing sample/cell 2[quant] Processing sample/cell [quant] Processing sample/cell quant] Processing sample/cell 5
[quant] Processing sample/cell 3[quant] Processing sample/cell [quant] Processing sample/cell 6
[quant] Processing sample/cell 4
[quant] Processing sample/cell 77



[quant] Processing sample/cell 88
[[[

quant] Processing sample/cell 11
[quant] Processing sample/cell 9quant] Processing sample/cell [quant] Processing sample/cell 11uant] Processing sample/cell [quant] Processing sample/cell [quant] Processing sample/cell 1
0
0


/home/stbresnahan/.lsbatch/1727384386.16588742.shell: line 38: 3476442 Segmentation fault     (core dumped) kallisto quant-tcc -t 12 --long -p ONT -f ${DIR_OUT}/flens.txt -i kallisto_index/gencode_v45 -e ${DIR_OUT}/count.ec.txt -o ${DIR_OUT}/quant-tcc ${DIR_OUT}/count.mtx

This occurs regardless of whether I start the process with a single .fastq or multiple .fastq files.

@Yenaled
Collaborator

Yenaled commented Oct 9, 2024

@sbresnahan can you post the exact commands you’re running?

And can you try the official binaries on the Releases page to make sure it’s not a compilation error?

kallisto_LongKmer_NoOpt-v0.51.1.tar.gz

@sbresnahan
Author

> @sbresnahan can you post the exact commands you’re running?
>
> And can you try the official binaries on the Releases page to make sure it’s not a compilation error?
>
> kallisto_LongKmer_NoOpt-v0.51.1.tar.gz

Building transcriptome index:

gffread -F -w GCA_000001405.15_GRCh38_no_alt_analysis_set_gencode_v45.fasta \
   -g GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
   gencode.v45.annotation.gtf

kallisto index -k 63 -t 10 -i gencode_v45 GCA_000001405.15_GRCh38_no_alt_analysis_set_gencode_v45.fasta

Running lr-kallisto:

kallisto bus -t 8 --long --threshold 0.8 -x bulk -i gencode_v45 \
  -o kallisto_out fullLength.and.rescued.fastq 

bustools sort -t 8 kallisto_out/output.bus \
 -o kallisto_out/sorted.bus
 
bustools count kallisto_out/sorted.bus \
 -t kallisto_out/transcripts.txt \
 -e kallisto_out/matrix.ec \
 -g kallisto_out/gencode_v45_tx2g.tsv \
 -o kallisto_out/count --cm -m

kallisto quant-tcc -t 8 \
	--long -p ONT -f kallisto_out/flens.txt \
	-i kallisto_index/gencode_v45 \
	-e kallisto_out/count.ec.txt \
	-o kallisto_out/quant-tcc \
	--matrix-to-files \
	kallisto_out/count.mtx

I will try the linked binary and get back to you.

@MustafaElshani

MustafaElshani commented Nov 6, 2024

I get a similar error (line 76: 5394 Segmentation fault). I have tried both compiling it myself and using the official release binaries that @Yenaled suggested.

[index] k-mer length: 63
[index] number of targets: 385,659
[index] number of k-mers: 186,649,435
[index] number of equivalence classes loaded from file: 193,836
[tcc] Parsing transcript-compatibility counts (TCC) file as a matrix file
[tcc] Matrix dimensions: 1 x 193,836
[quant] Running EM algorithm...
[   em] reading priors from file ONT
[quant] Processing sample/cell 0
/var/spool/slurm/job23490963/slurm_script: line 76:  5394 Segmentation fault      (core dumped) $SCRATCH/bioinformatic_tools/kallisto/kallisto/kallisto_linux-v0.51.1_kmer64 quant-tcc --long -p ONT -t $SLURM_CPUS_PER_TASK -i "$INDEX_PATH" -o "$OUTPUT_DIR/$SAMPLE_NAME" --matrix-to-files -f "$OUTPUT_DIR/$SAMPLE_NAME/flens.txt" -e "$OUTPUT_DIR/$SAMPLE_NAME/count.ec.txt" "$OUTPUT_DIR/$SAMPLE_NAME/count.mtx"
Is this mainly an issue with `v0.51.1`?

@Yenaled
Collaborator

Yenaled commented Nov 6, 2024

Very strange — quant-tcc seems to have issues with the input files supplied. If you are able to upload the files somewhere (the files supplied to quant-tcc) and email them to me, I can help debug.

@MustafaElshani

MustafaElshani commented Nov 6, 2024

Sorted it.
The problem was the -p flag. I was reading https://pachterlab.github.io/kallisto/manual, where -p is given as the platform option, but the platform option is actually -P.
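
For anyone landing here with the same crash, a rough sketch of a pre-flight check. It assumes, as the log line "reading priors from file ONT" suggests, that -p is taken as a path to a priors file, so a value such as ONT that is not a readable file points to a mistyped -P. The helper name guard_priors_flag is made up for illustration and is not part of kallisto; the paths in the commented call are the ones from the commands posted earlier in this thread.

```shell
# Hypothetical helper (not part of kallisto): scan an argument list and fail
# if the value that follows -p is not a readable file.
guard_priors_flag() {
  prev=""
  for arg in "$@"; do
    if [ "$prev" = "-p" ] && [ ! -r "$arg" ]; then
      echo "error: -p '$arg' is not a readable file; did you mean -P $arg?" >&2
      return 1
    fi
    prev="$arg"
  done
  return 0
}

# Same quant-tcc call as posted above, with -p ONT changed to -P ONT.
# Shown as a comment so the sketch runs without kallisto installed:
#
#   guard_priors_flag --long -P ONT -f kallisto_out/flens.txt && \
#   kallisto quant-tcc -t 8 \
#     --long -P ONT -f kallisto_out/flens.txt \
#     -i kallisto_index/gencode_v45 \
#     -e kallisto_out/count.ec.txt \
#     -o kallisto_out/quant-tcc \
#     --matrix-to-files \
#     kallisto_out/count.mtx
```

The check only looks at the argument list, so it cannot tell whether a readable file passed to -p is a valid priors file. It just catches the case reported here, where a platform name ends up after -p.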

@Yenaled
Collaborator

Yenaled commented Nov 6, 2024

Oh good catch! And yay!
