trying to fix the concatenation of fastq files #46

Merged: 1 commit into master from fixing_cat on Oct 31, 2023

jemten
Contributor

@jemten jemten commented Oct 30, 2023

When running Tomte with samples that have multiple FASTQ pairs linked to them, Tomte wasn't able to concatenate the FASTQ files belonging to the same sample. This should fix that issue.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Oct 30, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 09628ac

+| ✅ 130 tests passed       |+
#| ❔ 258 tests were ignored |#
!| ❗  16 tests had warnings |!

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in README.md: If applicable, make list of people who have also contributed
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in WorkflowMain.groovy: Add Zenodo DOI for pipeline after first release
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in ci.yml: You can customise CI pipeline run tests as required
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in tomte.nf: change this to use the plugin nf-validation
  • schema_lint - Schema $id should be https://raw.githubusercontent.com/genomic-medicine-sweden/tomte/master/nextflow_schema.json
    Found https://raw.githubusercontent.com/nf-core/tomte/master/nextflow_schema.json

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.10
  • Run at 2023-10-30 15:36:44

@jemten jemten marked this pull request as ready for review October 30, 2023 15:43
@jemten jemten requested a review from Lucpen as a code owner October 30, 2023 15:43
Contributor Author

jemten commented Oct 30, 2023

@ramprasadn, would very much appreciate your thoughts on this PR :)

Comment on lines +105 to +121
def branchFastqToSingleAndMulti(ch_reads) {

    return ch_reads
        .map {
            meta, fastq ->
                original_id = meta.id.split('_T')[0..-2].join('_')
                [ meta + [id: original_id], fastq ]
        }
        .groupTuple()
        .branch {
            meta, fastq ->
                single_fq: fastq.size() == 1
                    return [ meta, fastq.flatten() ]
                multiple_fq: fastq.size() > 1
                    return [ meta, fastq.flatten() ]
        }
}
Collaborator

It's true, as written it would concatenate every time.
They do it like this in RNAseq. It is pretty much the same as what you have done, but what if the sample id does not end in _T? Could we instead ask for the input R1 fastq files to be separated by " "?
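
To make the question about sample ids concrete, here is a minimal plain-Groovy sketch (no Nextflow needed). The helper `stripLaneSuffix` and the sample ids are hypothetical and not part of the PR; the helper removes only a trailing `_T{n}` suffix with an anchored regex, and the last line shows what the PR's split/join expression returns for an id that also contains `_T` in the middle.

```groovy
// Hypothetical helper (not in the PR): remove only a trailing _T{n} suffix.
def stripLaneSuffix = { String id -> id.replaceFirst('_T\\d+$', '') }

assert stripLaneSuffix('sampleA_T1') == 'sampleA'
assert stripLaneSuffix('sampleC')    == 'sampleC'   // no suffix: id is unchanged
assert stripLaneSuffix('A_T_B_T12')  == 'A_T_B'     // inner _T is preserved

// For comparison, the split/join expression from the PR on the same id:
// the pieces are rejoined with '_' rather than '_T', so the inner _T is lost.
assert 'A_T_B_T12'.split('_T')[0..-2].join('_') == 'A__B'
```

Whether this edge case matters depends on which sample names the samplesheet check allows, which this thread does not state.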

Contributor Author

We could think of changing the T to something else. Right now it is added to the sample name in this function.

In addition to the validation, also rename all samples to have a suffix of _T{n}, where n is the

So the sample name will always have this suffix. When we switch to nf-validation we can change this :)
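
As a rough illustration of why the suffix matters, the sketch below models the `map` and `groupTuple` steps on plain Groovy lists. It is not the pipeline code: the sample ids and file names are made up, and `groupBy` stands in for `groupTuple`. Once the `_T{n}` suffix is stripped, the entries of one sample share an id and end up in the same group.

```groovy
// Plain-Groovy model of the map + groupTuple steps; no Nextflow required.
// Sample ids and file names are invented for illustration.
def reads = [
    [[id: 'sampleA_T1'], ['a_L1_R1.fq.gz', 'a_L1_R2.fq.gz']],
    [[id: 'sampleA_T2'], ['a_L2_R1.fq.gz', 'a_L2_R2.fq.gz']],
    [[id: 'sampleB_T1'], ['b_L1_R1.fq.gz', 'b_L1_R2.fq.gz']],
]

// Restore the original sample id with the same expression the PR uses.
def restoreId = { String id -> id.split('_T')[0..-2].join('_') }

// Group the FASTQ lists by restored id (the role groupTuple plays in the PR).
def grouped = reads.groupBy { row -> restoreId(row[0].id) }
                   .collectEntries { id, rows -> [id, rows.collect { it[1] }] }

grouped.each { id, fastqs -> println "${id}: ${fastqs.size()} FASTQ pair(s)" }
```

Here `sampleA` ends up with two pairs and `sampleB` with one, which is the size that the later `branch` step tests.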

Collaborator

@Lucpen Lucpen Oct 31, 2023

Makes sense :)

Comment on lines +116 to +119
single_fq: fastq.size() == 1
    return [ meta, fastq.flatten() ]
multiple_fq: fastq.size() > 1
    return [ meta, fastq.flatten() ]

Is branching needed here? Asking because the returned output looks the same for both branches.

Contributor Author

We feed the multiple_fq branch into the CAT process and then mix its output back in later, so the two branches should not end up the same 😅
This is mostly taken from the RNA fusion pipeline, which took the code from the rnaseq pipeline.
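
The flow described here (branch, concatenate only the multi-pair samples, then mix) can be sketched in plain Groovy. This is only a model under stated assumptions: `catFastq` is a hypothetical stand-in that reports the merged file names it would produce, the file names are invented, and list concatenation plays the role of `mix`.

```groovy
// Plain-Groovy model of branch -> CAT -> mix; file names are illustrative.
def grouped = [
    sampleA: [['a_L1_R1.fq.gz', 'a_L1_R2.fq.gz'], ['a_L2_R1.fq.gz', 'a_L2_R2.fq.gz']],
    sampleB: [['b_L1_R1.fq.gz', 'b_L1_R2.fq.gz']],
]

// Branch on the number of grouped entries, flattening the file lists.
def rows     = grouped.collect { id, pairs -> [id: id, nPairs: pairs.size(), files: pairs.flatten()] }
def singleFq = rows.findAll { it.nPairs == 1 }
def multiFq  = rows.findAll { it.nPairs > 1 }

// Hypothetical stand-in for the concatenation process: it only names the
// merged files it would produce; nothing is read or written.
def catFastq = { row ->
    [id: row.id, files: [row.id + '_R1.merged.fq.gz', row.id + '_R2.merged.fq.gz']]
}

// "Mix": untouched single-pair samples plus concatenated multi-pair samples.
def mixed = singleFq.collect { [id: it.id, files: it.files] } + multiFq.collect(catFastq)

mixed.each { println "${it.id}: ${it.files}" }
```

Both branches emit the same tuple shape, which is why they look identical in the diff; the difference only appears downstream, where just the multi-pair branch is concatenated.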

Alright cool! :D

@jemten jemten merged commit 896fc64 into master Oct 31, 2023
7 checks passed
@jemten jemten deleted the fixing_cat branch October 31, 2023 12:39