Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kallisto|bustools 0.28.2 update #294

Merged
merged 39 commits into from
Mar 7, 2024
Merged

kallisto|bustools 0.28.2 update #294

merged 39 commits into from
Mar 7, 2024

Conversation

gennadyFauna
Copy link
Contributor

Starting work on bumping up kb version to 0.28.2. The nucleus and lamanno workflows are obsolete as of 0.28.0, so the interface for generating count matrices needs to be updated.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/scrnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Copy link

github-actions bot commented Jan 23, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 600619f

+| ✅ 171 tests passed       |+
#| ❔   4 tests were ignored |#
!| ❗   3 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.13.1
  • Run at 2024-03-07 11:52:16

@gennadyFauna
Copy link
Contributor Author

gennadyFauna commented Jan 24, 2024

I am working on testing the current version to have a baseline. There is a potential existing bug: using the kallisto pseudoaligner with the lamanno workflow works, but only if Nextflow builds a new reference. Specifying an existing reference using kallisto_gene_map (the t2g.txt file?) and kallisto_index (the kb_ref_out.idx file) does not work, and throws an error pointing at

protocol_config['protocol'],
with the following explanation:

Missing process or function mix([DataflowStream[?], [DataflowGetPropertyExpression(value=null)]])

I suspect that this is because KALLISTOBUSTOOLS_COUNT takes in the cDNA/intron FASTA files in this line:


But I do not believe these variables (t1c and t2c) are ever defined if the index is provided:
if (!kallisto_index) {

I assume fixing this bug would require changing the kallisto workflow parameters. Though it does look as though these are already present in the following line:

That is not really what this argument does – it just concatenates the gene_version and transcript_version fields to IDs.
The txp2gene file should really be generated by kallisto|bustools. The t2g.py script generates a 3-column rather than 7-column t2g, which seems to be incompatible with kb count.
This .collect() is incompatible with passing in an existing t2g, and superfluous given the .collect() after the kb ref call.
@gennadyFauna
Copy link
Contributor Author

gennadyFauna commented Feb 12, 2024

Fixed some issues in kallisto_bustools.nf: the way the code was set up made it impossible to pass in an existing kallisto reference. Removed t2g.py. This script is from 2017 and a few versions behind; it may be more reliable to just use the t2g.txt generated by kallisto.

In the test run, kb exits with a SIGKILL error: https://github.com/nf-core/scrnaseq/actions/runs/7876828275/job/21493304177#step:5:93. This is actually likely related to the binary issue in pachterlab/kallisto#411.

t1c and t2c are required to run nac and lamanno count workflows. Since they are already defined when running kb ref, they cannot be easily defined through ext.args when using an existing reference
scrnaseq does not actually use kallisto_gene_map anywhere.
Changed kallisto icon from a salmon to a rainbow (there is no bear icon, but bustools uses the Pink Floyd-esque prism image, so it seems most appropriate)
Copy link
Member

@grst grst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gennadyFauna,

thanks for working on this! I'd like to get in #291 before we merge this to ensure everything is properly tested.

You can request to be added to the nf-core github organization in the #github-invitations channel - then github actions should run for you without anyone approving every single time.

Comment on lines -7 to -8
'https://depot.galaxyproject.org/singularity/kb-python:0.27.2--pyhdfd78af_0' :
'biocontainers/kb-python:0.27.2--pyhdfd78af_0' }"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once everything works, please update the modules in nf-core/modules and re-install them into the pipeline using nf-core modules update. We don't allow modules in the pipelines to diverge from the central version stored in modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I have put in a request.
I appreciate the clarification regarding the nf-core/modules source – so I would need to make changes in https://github.com/nf-core/modules/tree/master/modules/nf-core/kallistobustools, then pull them in.

If I understand correctly, these module essentially have the input/output conventions and a link to the docker image, which does not actually work.
Is there any way to rebuild the binary for the kallisto dependency?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Whenever possible, we try to use the docker images provided by Biocontainers, which are derived from the bioconda recipe. So if the container is broken, it would be ideal to make it work there. If this is not possible for whatever reason, we can also use custom docker images in nf-core/modules.

To just rebuild, you can make a PR to bioconda recipes that increases the build number: https://github.com/bioconda/bioconda-recipes/blob/2094e1c611eedfe500a145bd442640f0440f68a5/recipes/kb-python/meta.yaml#L14

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bioconda recipe has been fixed. I have opened a PR at nf-core/modules#4981; this should be complete relatively soon. In the meantime, I have updated the tests and adjusted the downstream mtx operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grst the nf-core module PR is merged and I have updated the kb modules in the PR branch. It still fails linting due to other files – not sure what to do; would love any suggestions. Running nf-core lint locally does not give any failures.

gennadyFauna and others added 7 commits February 27, 2024 13:01
mtx should take in nascent/ambiguous/mature matrices.
The genes.names txt contains gene names, which are typically what is desired (e.g. cellranger has names rather than IDs)
@grst
Copy link
Member

grst commented Mar 4, 2024

@nf-core-bot fix linting

Copy link
Member

@grst grst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working on this!

@grst
Copy link
Member

grst commented Mar 5, 2024

Don't worry about the linting failures. This is related to an update in the nf-core template that has not been yet integrated into the scrnaseq pipeline.

@grst
Copy link
Member

grst commented Mar 7, 2024

@nf-core-bot fix linting

@grst grst merged commit ae5b28b into nf-core:dev Mar 7, 2024
10 checks passed
@grst grst mentioned this pull request Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants