kallisto 0.50.0 can't index #411
Comments
Thanks for bringing up this issue; it seems to be specific to the lamanno workflow (which we're deprecating in favor of something better). The way you're running the programs is correct, so the problem is on our end. I'm looking into it.
I've identified the issue: the lamanno workflow is poorly implemented. Every single intron is treated as a unique "transcript", so the index contains k-mers that can potentially map to over 200K different transcripts. That's one reason we're deprecating lamanno (if you look at the devel version of kb-python, we have a new --workflow=nac that will entirely supersede lamanno). Building the lamanno index now takes 152 GB of memory (though it only takes 25 GB to actually load it, which is a substantial improvement). I admittedly did not consider lamanno when writing the index construction step of kallisto, since we were deprecating it anyway. We just released the newest version of kallisto (0.50.1), which still contains this issue. However, I'll narrow down the issue in the code and see if there's a trivial way to fix it (if so, a new release will be put out very shortly; if not, well, consider upgrading ;) ). I hope that makes sense! Let me know if you have any questions!
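To illustrate why treating every intron as its own "transcript" inflates the index, here is a toy Python sketch. It is not kallisto's code or data structure; the helper `kmer_to_targets`, the repeat sequence, the flanks, and `k=8` are all made up for illustration. It only shows that a k-mer from a sequence shared by many introns ends up associated with every one of those targets.

```python
from collections import defaultdict

def kmer_to_targets(targets, k):
    """Map every k-mer to the set of target names whose sequence contains it."""
    index = defaultdict(set)
    for name, seq in targets.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

# Hypothetical toy data: 500 "introns" that all contain the same repeat element,
# each registered as a separate target (as described for the lamanno workflow).
REPEAT = "ACGTACGTTGCA"
introns = {
    f"intron_{i}": "G" * (i % 7) + REPEAT + "T" * (i % 5)
    for i in range(500)
}

index = kmer_to_targets(introns, k=8)
largest = max(len(names) for names in index.values())
print(f"{len(index)} distinct k-mers; the most shared one maps to {largest} targets")
```

In this toy case the k-mers inside the repeat map to all 500 targets. With hundreds of thousands of introns as separate targets, the per-k-mer bookkeeping grows accordingly.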
Thank you for answering so quickly. I do have some more questions, but first I should give you some context for why I opened this issue with kallisto 0.50.0 in the first place. I'm trying to perform RNA velocity on some 10x v2 scRNA-seq files that have 160-210M reads each (12-15 GB per file). At first I tried version 0.50.0, but because it didn't work, I resorted to version 0.48.0. That version can create an index within my available 125 GB of RAM (the generated index file takes 45 GB of storage) and can also pseudoalign. The problem is that the pseudoalignment step with kallisto 0.48.0 takes days for some of my files, and with my current resources I have to process them one at a time. So I tried to make kallisto 0.50.0 work because of its improved indexing method. Since I couldn't get it to work at all, I opened this issue and am still using kallisto 0.48.0. My questions:
I'm sorry if I drifted slightly from the topic of the original issue. And thank you again for answering so fast.
I've finished writing a detailed manual and will release it sometime this month. Happy to answer any questions in the meantime or walk you through things.
I had the same issue on an M2 Mac. I had to conda install an older version of kallisto (0.46.2) to get it to work.
Hello All: I have a question related to the discussion above. I am trying to follow https://www.kallistobus.tools/tutorials/kb_velocity/python/kb_velocity/ using the new version of kb (--workflow nac). As of Dec 14, I can't run it with kb-python 0.28.0 because kb count runs out of memory (with over 100 GB available).
Regards,
Can you show the commands you’re using to build the index (including which FASTA/GTF files are being used) as well as the commands you’re using for kb count? kb count (with a 0.28.0 nac index) will not consume even a third of that amount of memory, so something is wrong on your end.
The commands are:
The error message from kb count is:
Further, I ran the dmesg command, and its output implies that memory consumption was over 100 GB.
Can you run it under /usr/bin/time -v (include the -v), and also pass --verbose to kb count? It works on my end. Edit: It takes 18 GB on my end for the human index (10 GB for the mouse index).
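For anyone who prefers to check peak memory from Python rather than with /usr/bin/time -v, the following is a minimal sketch using the standard-library `resource` module (Unix only). The helper name `run_and_report_peak_rss` and the stand-in child command are assumptions for illustration; replace the command list with your own kb count invocation. Note that `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS.

```python
import resource
import subprocess
import sys

def run_and_report_peak_rss(cmd):
    """Run cmd to completion and return (exit code, ru_maxrss for children).

    RUSAGE_CHILDREN covers every child this Python process has waited for,
    so the value is the largest peak among them, not only the last command.
    """
    proc = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak

# Stand-in child process that allocates roughly 50 MB; substitute your real command.
exit_code, peak_rss = run_and_report_peak_rss(
    [sys.executable, "-c", "x = b'x' * 50_000_000"]
)
print(f"exit code: {exit_code}, peak RSS (kB on Linux): {peak_rss}")
```

A negative exit code from `subprocess.run` means the child was terminated by a signal (for example, -9 for SIGKILL), which helps distinguish an out-of-memory kill from an ordinary failure.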
I tried the following:
The output is:
It says "Maximum resident set size (kbytes): 548888". That is only 0.5 gigabytes. The "Signals.SIGILL: 4" means illegal instruction. That likely means that the prepackaged binaries do NOT work on your system and that you need to compile kallisto (and possibly bustools) from source. See the instructions on the first page here: https://www.biorxiv.org/content/biorxiv/early/2023/11/22/2023.11.21.568164/DC1/embed/media-1.pdf for information on how to compile from source on your system and how to use your source-compiled kallisto+bustools within kb-python.
Hello Delaney: I followed the instructions in the PDF and it works. Regards,
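The two readings above (the memory figure and the signal number) can be double-checked with a few lines of Python. This is a small sketch, assuming a POSIX system and that the "kbytes" reported by /usr/bin/time -v are units of 1024 bytes; the helper names are made up for illustration.

```python
import signal

def kbytes_to_gib(kbytes):
    """Convert a size in kilobytes (1024 bytes each) to GiB."""
    return kbytes / (1024 * 1024)

def describe_signal(number):
    """Return the symbolic name for a POSIX signal number."""
    return signal.Signals(number).name

print(f"{kbytes_to_gib(548888):.2f} GiB")  # prints "0.52 GiB"
print(describe_signal(4))  # SIGILL: illegal instruction
print(describe_signal(9))  # SIGKILL: e.g. sent by the kernel's out-of-memory killer
```

So a run that dies with signal 4 after using about half a gigabyte points to an incompatible binary, whereas a signal 9 during index construction is consistent with the process being killed for exhausting memory.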
How do you install the specific older version 0.48.0 with conda?
To install a specific version using conda or mamba you can do:
This is explained in the conda documentation. To see the available versions of a package you can search here. The option
I'm trying to perform RNA velocity with kallisto, bustools and their wrapper kb-python following the instructions in this R Notebook. But I'm unable to generate an index with kallisto 0.50.0.
Summary of what I tried
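1) kallisto 0.50.0 from bioconda: Illegal instruction (core dumped)
2) kallisto 0.50.0 binary from GitHub: died with <Signals.SIGKILL: 9>
3) kallisto 0.50.0 compiled from source: died with <Signals.SIGKILL: 9>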
I include more information of hardware and commands in the next section if you need it.
According to the release notes for kallisto 0.50.0, "The improved kallisto index reduces memory consumption for large FASTA files", but with this version I can't generate an index because it exhausts the RAM, whereas with version 0.48.0 I can.
Is it normal for it to use up so much RAM? Am I missing something?
Supporting information
I ran all commands on a computer with an Intel i7-6950X, 125 GB of RAM, 120 GB of free storage space, and Ubuntu 22.04.3 installed.
1) Using kallisto 0.50.0 from bioconda.
I tested the version from bioconda using the test folder from kallisto's GitHub page. I also tested this on a different computer with AMD Ryzen 7-5800H and 16GB of RAM and I got the same error.
I ran:
Output:
2 & 3) Using kallisto 0.50.0 binary from GitHub and compiled from source
Both of these versions can process the files in the test folder without errors. But when I try to index the RNA velocity transcriptome, it exhausts the 125 GB of RAM.
I ran:
This is the output I get: