trying to fix the concatenation of fastq files #46
@ramprasadn, would very much appreciate your thoughts on this PR :)
def branchFastqToSingleAndMulti(ch_reads) {
    return ch_reads
        .map {
            meta, fastq ->
                original_id = meta.id.split('_T')[0..-2].join('_')
                [ meta + [id: original_id], fastq ]
        }
        .groupTuple()
        .branch {
            meta, fastq ->
                single_fq: fastq.size() == 1
                    return [ meta, fastq.flatten() ]
                multiple_fq: fastq.size() > 1
                    return [ meta, fastq.flatten() ]
        }
}
It's true, as it is written it would concatenate every time.
This is how they do it in rnaseq. It is pretty much the same as what you have done, but what if the sample id does not end in _T? Could we instead ask for the input fastq files for R1 to be separated by " "?
We could think of changing the T to something else. Right now it is added to the sample name in this function.
tomte/bin/check_samplesheet.py
Line 147 in ac85767
In addition to the validation, also rename all samples to have a suffix of _T{n}, where n is the
So the sample name will always have this suffix. When we switch to nf-validation we can change this :)
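A minimal Python sketch of the suffix round trip discussed here, not the code in check_samplesheet.py or in the PR. The helper names `add_run_suffix` and `strip_run_suffix` are hypothetical, and the sketch assumes the suffix is always `_T` followed by a run number. Splitting only at the last `_T` keeps a sample name that itself contains `_T` intact.

```python
def add_run_suffix(sample: str, n: int) -> str:
    """Append the _T{n} suffix described above (hypothetical helper)."""
    return f"{sample}_T{n}"


def strip_run_suffix(sample_id: str) -> str:
    """Recover the original sample name by cutting at the LAST '_T' only."""
    base, sep, _ = sample_id.rpartition("_T")
    # If no '_T' is present, rpartition returns an empty separator:
    # leave the id unchanged in that case.
    return base if sep else sample_id


if __name__ == "__main__":
    for name in ("sampleA", "case_Tumor"):
        suffixed = add_run_suffix(name, 1)
        print(suffixed, "->", strip_run_suffix(suffixed))
```

Because the suffix is always added by the samplesheet check, the strip step can rely on it being present; the fallback for ids without `_T` is only a safety net in this sketch.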
Makes sense :)
single_fq: fastq.size() == 1
    return [ meta, fastq.flatten() ]
multiple_fq: fastq.size() > 1
    return [ meta, fastq.flatten() ]
Is branching needed here? Asking because the returned output looks the same for both branches.
We feed the multiple branch into the CAT process and then mix it with the single branch later. They should not be the same 😅
This is mostly taken from the rnafusion pipeline, which took the code from the rnaseq pipeline.
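To make the purpose of the two branches concrete, here is a rough Python illustration of the grouping and branching logic. It is a simplification, not the Nextflow code itself: `branch_fastq` is a hypothetical name, the real `meta` map is reduced to a plain sample id string, and the file names are made up.

```python
from collections import defaultdict


def branch_fastq(reads):
    """Group fastq lists by sample name and split into single/multiple.

    `reads` is a list of (sample_id, [fastq paths]) tuples, where
    sample_id carries the _T{n} suffix (assumed, as discussed above).
    """
    grouped = defaultdict(list)
    for sample_id, fastqs in reads:
        # Drop the trailing _T{n}; fall back to the full id if absent.
        base = sample_id.rpartition("_T")[0] or sample_id
        grouped[base].append(fastqs)

    single, multiple = [], []
    for base, fq_groups in grouped.items():
        flat = [fq for group in fq_groups for fq in group]
        # One entry per sample: nothing to concatenate.
        # Several entries: these would go to the concatenation step.
        target = single if len(fq_groups) == 1 else multiple
        target.append((base, flat))
    return single, multiple


if __name__ == "__main__":
    reads = [
        ("A_T1", ["a1_R1.fq.gz", "a1_R2.fq.gz"]),
        ("A_T2", ["a2_R1.fq.gz", "a2_R2.fq.gz"]),
        ("B_T1", ["b_R1.fq.gz", "b_R2.fq.gz"]),
    ]
    single, multiple = branch_fastq(reads)
    print("single:", single)
    print("multiple:", multiple)
```

The returned tuples have the same shape in both branches, which is what the review question picked up on; the difference is only where each branch is sent next.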
Alright cool! :D
When running Tomte on samples that have multiple fastq pairs linked to them, Tomte was not able to concatenate the fastq files belonging to the same sample. This PR should fix that issue.
PR checklist
Code lints (nf-core lint).
Test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
docs/usage.md is updated.
docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).