trying to fix the concatenation of fastq files #46

jemten · 2023-10-30T15:34:46Z

When running Tomte on samples that have multiple fastq pairs linked to them, Tomte wasn't able to concatenate the fastq files belonging to the same sample. This PR should fix that issue.

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
If you've added a new tool - have you followed the pipeline conventions in the contribution docs
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Usage Documentation in docs/usage.md is updated.
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).

github-actions · 2023-10-30T15:37:06Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 09628ac

+| ✅ 130 tests passed       |+
#| ❔ 258 tests were ignored |#
!| ❗  16 tests had warnings |!

❗ Test warnings:

readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
pipeline_todos - TODO string in README.md: If applicable, make list of people who have also contributed
pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in WorkflowMain.groovy: Add Zenodo DOI for pipeline after first release
pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
pipeline_todos - TODO string in ci.yml: You can customise CI pipeline run tests as required
pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
pipeline_todos - TODO string in tomte.nf: change this to use the plugin nf-validation
schema_lint - Schema $id should be https://raw.githubusercontent.com/genomic-medicine-sweden/tomte/master/nextflow_schema.json
Found https://raw.githubusercontent.com/nf-core/tomte/master/nextflow_schema.json

❔ Tests ignored:

files_exist - File is ignored: false - assets/nf-core-tomte_logo_light.png - docs/images/nf-core-tomte_logo_light.png - docs/images/nf-core-tomte_logo_dark.png - docs/images/tomte_logo.eps - docs/images/tomte_pipeline_metromap.eps - docs/images/tomte_pipeline_metromap.png
nextflow_config - Config variable ignored: manifest.name
nextflow_config - Config variable ignored: manifest.homePage
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/feature_request.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
files_unchanged - File does not exist: assets/nf-core-tomte_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-tomte_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-tomte_logo_dark.png
files_unchanged - File ignored due to lint config: docs/README.md
files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
multiqc_config - multiqc_config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: lib/WorkflowTomte.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-tomte_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 1.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
files_unchanged - pyproject.toml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (181 files)
schema_lint - Schema lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: ci.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.10
Run at 2023-10-30 15:36:44

jemten · 2023-10-30T16:06:39Z

@ramprasadn, would very much appreciate your thoughts on this PR :)

Lucpen · 2023-10-31T06:24:47Z

subworkflows/local/alignment.nf

+def branchFastqToSingleAndMulti(ch_reads) {
+
+    return ch_reads
+        .map {
+            meta, fastq ->
+                original_id = meta.id.split('_T')[0..-2].join('_')
+                [ meta + [id: original_id], fastq ]
+        }
+        .groupTuple()
+        .branch {
+            meta, fastq ->
+                single_fq: fastq.size() == 1
+                    return [ meta, fastq.flatten() ]
+                multiple_fq: fastq.size() > 1
+                    return [ meta, fastq.flatten() ]
+        }
+}


It's true, as it is written it would concatenate every time.
This is how it is done in RNAseq, and it is pretty much the same as what you have done. But what if the sample id does not end in _T? Could we instead ask for the input fastq files for R1 to be separated by " "?

jemten · 2023-10-31

We could consider changing the T to something else. Right now the suffix is added to the sample name in this function:

tomte/bin/check_samplesheet.py

Line 147 in ac85767

In addition to the validation, also rename all samples to have a suffix of _T{n}, where n is the

So the sample name will always have this suffix. When we switch to nf-validation we can change this :)

Lucpen · 2023-10-31

Makes sense :)
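
For readers following the `_T{n}` discussion, here is a minimal Python sketch of what the `split('_T')[0..-2].join('_')` expression in the diff does to a sample id. It is an illustration only, not code from the PR: both function names are hypothetical, and the second function is an assumed alternative that strips only a trailing `_T<digits>`. How the Groovy expression behaves on an id without the suffix is not modelled here.

```python
import re


def strip_suffix_as_in_diff(sample_id: str) -> str:
    # Python rendering of meta.id.split('_T')[0..-2].join('_'):
    # split on every "_T", drop the last piece, rejoin the rest with "_".
    parts = sample_id.split("_T")
    return "_".join(parts[:-1])


def strip_suffix_anchored(sample_id: str) -> str:
    # Hypothetical alternative (not part of this PR): remove only a
    # trailing "_T<digits>" and leave ids without the suffix untouched.
    return re.sub(r"_T\d+$", "", sample_id)


for sample_id in ["sampleA_T1", "sampleA_T2", "patient_T_cell_T1"]:
    print(
        sample_id,
        strip_suffix_as_in_diff(sample_id),
        strip_suffix_anchored(sample_id),
    )
```

Both versions map `sampleA_T1` and `sampleA_T2` to `sampleA`, which is what lets the rows be grouped. They differ when `_T` also occurs earlier in the id: the split-and-join version turns `patient_T_cell_T1` into `patient__cell`, while the anchored version keeps `patient_T_cell`.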

ramprasadn · 2023-10-31T11:07:23Z

subworkflows/local/alignment.nf

+                single_fq: fastq.size() == 1
+                    return [ meta, fastq.flatten() ]
+                multiple_fq: fastq.size() > 1
+                    return [ meta, fastq.flatten() ]


Is branching needed here? Asking because the returned output looks the same for both branches.

jemten · 2023-10-31

We feed the multiple_fq branch into the CAT process and then mix it back in later, so the two branches are not handled the same way 😅
This is mostly taken from the rnafusion pipeline, which took the code from the rnaseq pipeline.

https://github.com/nf-core/rnaseq/blob/3bec2331cac2b5ff88a1dc71a21fab6529b57a0f/workflows/rnaseq.nf#L211

ramprasadn · 2023-10-31

Alright cool! :D
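
The group-then-branch idea discussed in this thread can be sketched in plain Python. This is a rough model under stated assumptions, not the Nextflow implementation: `group_and_branch` and the file names are hypothetical, sample ids are assumed to have had their `_T{n}` suffix stripped already, and the concatenation step itself is left out.

```python
from collections import defaultdict


def group_and_branch(reads):
    """Split samples into those with one fastq pair and those with several.

    reads: iterable of (sample_id, [fastq files]) tuples, one per input row.
    """
    # Collect all rows that share a sample id (the role groupTuple() plays).
    grouped = defaultdict(list)
    for sample_id, fastqs in reads:
        grouped[sample_id].append(list(fastqs))

    single_fq, multiple_fq = [], []
    for sample_id, fastq_lists in grouped.items():
        # Flatten the per-row lists into one list of files per sample.
        flat = [f for row in fastq_lists for f in row]
        if len(fastq_lists) == 1:
            single_fq.append((sample_id, flat))    # passed through unchanged
        else:
            multiple_fq.append((sample_id, flat))  # would go to concatenation
    return single_fq, multiple_fq


reads = [
    ("sampleA", ["a_L1_R1.fq.gz", "a_L1_R2.fq.gz"]),
    ("sampleA", ["a_L2_R1.fq.gz", "a_L2_R2.fq.gz"]),
    ("sampleB", ["b_R1.fq.gz", "b_R2.fq.gz"]),
]
single_fq, multiple_fq = group_and_branch(reads)
print("single:", single_fq)
print("multiple:", multiple_fq)
```

The tuples in both outputs have the same shape, which matches the observation that the two branches return the same structure. The difference is only in where each output is sent next: in this sketch, `sampleA` lands in `multiple_fq` with four files, and `sampleB` lands in `single_fq` with two.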

trying to fix the concatenation of fastq files

09628ac

jemten marked this pull request as ready for review October 30, 2023 15:43

jemten requested a review from Lucpen as a code owner October 30, 2023 15:43

Lucpen reviewed Oct 31, 2023

Lucpen approved these changes Oct 31, 2023

ramprasadn reviewed Oct 31, 2023

ramprasadn approved these changes Oct 31, 2023

jemten merged commit 896fc64 into master Oct 31, 2023
7 checks passed

jemten deleted the fixing_cat branch October 31, 2023 12:39
