trying to fix the concatenation of fastq files #46

Merged
merged 1 commit into from
Oct 31, 2023
37 changes: 35 additions & 2 deletions subworkflows/local/alignment.nf
@@ -28,9 +28,13 @@ workflow ALIGNMENT {
     main:
         ch_versions = Channel.empty()

-        CAT_FASTQ(reads)
+        ch_fastq = branchFastqToSingleAndMulti(reads)

-        FASTP(CAT_FASTQ.out.reads,[],false,false)
+        CAT_FASTQ(ch_fastq.multiple_fq)
+            .reads.mix(ch_fastq.single_fq)
+            .set { ch_cat_fastq }
+
+        FASTP(ch_cat_fastq, [], false, false)

         STAR_ALIGN(FASTP.out.reads, star_index, gtf, false, 'illumina', false)

@@ -86,3 +90,32 @@ workflow ALIGNMENT {
         salmon_info = SALMON_QUANT.out.json_info
         versions = ch_versions
 }
+
+
+// Custom functions
+
+/**
+* Branch the read channel into different channels,
+* depending on whether the sample has multiple fastq files or not.
+* The resulting channels get the original sample id in meta.
+*
+* @param ch_reads Channel containing meta and fastq reads
+* @return Channel containing meta with original id and branched on number of fastq files
+*/
+def branchFastqToSingleAndMulti(ch_reads) {
+
+    return ch_reads
+        .map {
+            meta, fastq ->
+                original_id = meta.id.split('_T')[0..-2].join('_')
+                [ meta + [id: original_id], fastq ]
+        }
+        .groupTuple()
+        .branch {
+            meta, fastq ->
+                single_fq: fastq.size() == 1
+                    return [ meta, fastq.flatten() ]
+                multiple_fq: fastq.size() > 1
+                    return [ meta, fastq.flatten() ]
Comment on lines +116 to +119
Is branching needed here? Asking because the returned output looks the same for both branches.

Contributor Author
We feed the multiple branch into the CAT process and then mix it with the single branch later. They should not be the same 😅
This is mostly taken from the rnafusion pipeline, which took the code from the rnaseq pipeline.
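
For readers less familiar with the pattern, here is a minimal plain-Groovy sketch of the branch, concatenate, and mix flow described above. Lists stand in for Nextflow channels, the sample ids and file names are made up, and `catFastq` is a hypothetical stand-in for the CAT_FASTQ process, not its real implementation. The `_T<digits>` suffix pattern is also an assumption.

```groovy
// Lists stand in for Nextflow channels; ids and file names are invented.
def reads = [
    [[id: 'sampleA_T1'], ['A_L001_R1.fastq.gz', 'A_L001_R2.fastq.gz']],
    [[id: 'sampleA_T2'], ['A_L002_R1.fastq.gz', 'A_L002_R2.fastq.gz']],
    [[id: 'sampleB_T1'], ['B_L001_R1.fastq.gz', 'B_L001_R2.fastq.gz']]
]

// Restore the original id and collect the fastq lists per sample
// (models the map + groupTuple steps; the suffix pattern is an assumption).
def grouped = reads
    .groupBy { entry -> entry[0].id.replaceFirst('_T\\d+$', '') }
    .collect { id, entries -> [[id: id], entries.collect { it[1] }] }

// Models branch: samples with one fastq entry vs. samples with several.
def flattenFq   = { entry -> [entry[0], entry[1].flatten()] }
def single_fq   = grouped.findAll { it[1].size() == 1 }.collect(flattenFq)
def multiple_fq = grouped.findAll { it[1].size() > 1 }.collect(flattenFq)

// Hypothetical stand-in for CAT_FASTQ: one merged file name per read direction.
def catFastq = { entry ->
    [entry[0], [entry[0].id + '_1.merged.fastq.gz', entry[0].id + '_2.merged.fastq.gz']]
}

// Only the multi-file branch is concatenated; the single-file branch is mixed back in.
def ch_cat_fastq = multiple_fq.collect(catFastq) + single_fq

ch_cat_fastq.each { println it }
```

In this sketch only `sampleA`, which has two sets of fastq files, passes through the concatenation stand-in; `sampleB` is mixed back in untouched, which is why the two branches carry the same shape of output but are not interchangeable.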

Alright cool! :D

+        }
+}
Comment on lines +105 to +121
Collaborator
It's true, it would concatenate all the time as it is written.
They do it like this in RNAseq. It is pretty much the same as what you have done, but what if the sample id does not end in _T? Could we instead ask for the input fastq files from R1 to be separated by " "?

Contributor Author
We could think of changing the T to something else. Right now it is added to the sample name in this function.

In addition to the validation, also rename all samples to have a suffix of _T{n}, where n is the

So the sample name will always have this suffix. When we switch to nf-validation we can change this :)
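
To make the suffix handling concrete, here is a small plain-Groovy sketch comparing the split-based stripping from the diff with an anchored regex. The `regexBased` closure and the sample ids are hypothetical, and the sketch assumes that n in `_T{n}` is always one or more digits.

```groovy
// Stripping as written in the diff: split on every '_T', drop the last piece, rejoin with '_'.
def splitBased = { String id -> id.split('_T')[0..-2].join('_') }

// Hypothetical alternative: remove only a trailing '_T<digits>' suffix.
def regexBased = { String id -> id.replaceFirst('_T\\d+$', '') }

// Both agree when '_T' appears only in the added suffix.
assert splitBased('sampleA_T1') == 'sampleA'
assert regexBased('sampleA_T1') == 'sampleA'

// They differ when the original id itself contains '_T'.
assert splitBased('case_Tumor_T1') == 'case_umor'
assert regexBased('case_Tumor_T1') == 'case_Tumor'

// An id without the suffix passes through the regex version unchanged.
assert regexBased('sampleA') == 'sampleA'
```

Since the suffix is always added upstream, the split-based version works for typical ids; the anchored pattern is just one possible way to stay robust to ids that contain `_T` elsewhere.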

Collaborator

@Lucpen Lucpen Oct 31, 2023
Makes sense :)