sRNA spike-ins #116
Comments
You will just need your spike-in sequences in both your mature file and your precursor file (length doesn't matter):
cat mature.fa
>spikein1
CTACGATCGATGCTAGCTAGCAGGCGATCAGCATCGATGATCGATTA
cat pres.fa
>spikein1
CTACGATCGATGCTAGCTAGCAGGCGATCAGCATCGATGATCGATTA
Just run it in its simplest form and give mature.fa as the precursor input as well:
perl quantifier.pl -r reads_collapsed.fa -m mature.fa -p mature.fa
So just add your spike-ins to your mature file and to the precursor file.
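For anyone scripting this step, here is a minimal Python sketch of appending the same spike-in records to both FASTA inputs. The helper names (`format_fasta`, `append_spikeins`) and the `mirna1` record are made up for illustration and are not part of miRDeep2; only the `spikein1` record comes from the comment above.

```python
def format_fasta(records):
    """Render a {name: sequence} dict as FASTA text, one line per sequence."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


def append_spikeins(fasta_text, spikeins):
    """Return fasta_text with the spike-in records appended at the end."""
    if fasta_text and not fasta_text.endswith("\n"):
        fasta_text += "\n"
    return fasta_text + format_fasta(spikeins)


# Spike-in record taken from the comment above.
spikeins = {"spikein1": "CTACGATCGATGCTAGCTAGCAGGCGATCAGCATCGATGATCGATTA"}

# Hypothetical existing content of the mature and precursor files.
mature_text = ">mirna1\nACGTACGTACGTACGTACGTAC"
precursor_text = ">mirna1\nGGGACGTACGTACGTACGTACGTACGGG\n"

# The same spike-in records go into both inputs.
new_mature = append_spikeins(mature_text, spikeins)
new_precursor = append_spikeins(precursor_text, spikeins)
```

The resulting strings would then be written back to the files passed to `-m` and `-p`.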
Thank you. I can confirm that copying the 'mature' sequences to the 'precursor' FASTA results in miRDeep2 picking up the RNA spike-ins for quantification. |
Sorry for re-opening this: I noticed that only reads matching the spike-in exactly are quantified using this approach. I noticed because the results did not add up and by a simple edit: Just a quick update: It looks like indeed I missed a leading |
Then just put a few A's at the end of the precursor. Mismatches can be allowed up to 2, if I remember correctly. If the mature is longer than the precursor, then bowtie will fail to map, I guess.
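A rough Python sketch of that suggestion: build the spike-in "precursor" by appending a few A's to the mature sequence, and check that the mature sequence still fits inside it. The function names and the default of three A's are assumptions for illustration; the thread only says "a few".

```python
def pad_precursor(mature_seq, pad_base="A", pad_len=3):
    """Append pad_len copies of pad_base to the 3' end of the mature sequence."""
    return mature_seq + pad_base * pad_len


def mature_fits_precursor(mature_seq, precursor_seq):
    """True if the mature sequence occurs within the precursor sequence."""
    return len(mature_seq) <= len(precursor_seq) and mature_seq in precursor_seq


spikein = "CTACGATCGATGCTAGCTAGCAGGCGATCAGCATCGATGATCGATTA"
padded = pad_precursor(spikein)

# The padded precursor is longer than the mature sequence and contains it.
ok = mature_fits_precursor(spikein, padded)
```

The containment check covers the case mentioned above where the mature sequence is longer than the precursor.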
That's what I am currently guessing as well. Do you know if N's would be allowed as well? I would prefer not to bias the mapper towards any specific untemplated base. Thank you for the fast response. |
I can confirm the spike-in counts changed by orders of magnitude while miRNA counts remain almost unchanged. So the extreme sensitivity to non-templated bases appears to be an issue for the spike-ins only, likely due to what was discussed here. I will leave this issue open as a reminder to update it with my final conclusions, in case anyone else searches this tracker looking for advice on how to include spike-ins.
When padding the 'mature' spike-ins with ten N's on either side for the 'hairpins', I do not get any spike-in quantification. IIRC, miRDeep2 discards sequences with non-ACTG(U?) characters (at least by default). edit: As expected, replacing the N's with A's did yield spike-in results again, with slightly more reads mapping to them than before. I feel like some padding should be done to remove the bias against spike-ins (compared to endogenous miRNAs) in case of imperfect pre-processing, but I am concerned that picking a specific nucleotide introduces another bias. @Drmirdeep: Could you provide your point of view?
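To catch this before running the quantifier, a small pre-flight check along these lines could flag FASTA records containing anything other than A, C, G, T or U. This is only a sketch: `find_invalid_records` is a made-up helper, and the allowed alphabet is based on the recollection above rather than on confirmed miRDeep2 behaviour.

```python
def find_invalid_records(fasta_text, allowed="ACGTU"):
    """Return names of FASTA records containing characters outside `allowed`."""
    allowed_set = set(allowed.upper())
    invalid = []
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: remember the record name for the sequence lines below.
            name = line[1:].strip()
        elif name is not None and set(line.upper()) - allowed_set:
            if name not in invalid:
                invalid.append(name)
    return invalid


spikein = "CTACGATCGATGCTAGCTAGCAGGCGATCAGCATCGATGATCGATTA"

# Ten N's on either side, as tried above, versus ten A's on either side.
n_padded = f">spikein1\n{'N' * 10}{spikein}{'N' * 10}\n"
a_padded = f">spikein1\n{'A' * 10}{spikein}{'A' * 10}\n"

flagged_n = find_invalid_records(n_padded)
flagged_a = find_invalid_records(a_padded)
```

With the N-padded record, `flagged_n` contains `spikein1`, while the A-padded record passes the check.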
If the preprocessing is the problem, then this should be solved on the preprocessing side, not in a tool that expects the input data to follow certain conventions.
I guess I was not clear enough: The pre-processing issue was the reason the quantification was off by orders of magnitude. That is fixed and not my issue. However, that made me realise that a read that, for whatever reason, has an extra base maps just fine for actual miRNAs but not for spike-ins. The whole point of spike-ins is to serve as controls, so they should be treated as close to identically as possible. So in a way the problem is that my pre-processing mistake affected spike-ins but not miRNAs (or at least much less). Padding the spike-ins with A's helps, because now they are treated more similarly to miRNAs. But as long as I fix that padding, there is still a bias that is spike-in specific. And ideally, nothing should be specific to the thing you normalize to. 😉
Dear @Drmirdeep,
More and more sRNA-seq projects utilize RNA spike-ins as controls.
What is the recommended workflow to quantify these along miRNAs using miRDeep2's quantifier module?
I added the corresponding sequences to the mature miRNA input FASTA but they don't get picked up. I am guessing I need to at least make up some 'hairpins' for them as well (How? Can I just use the mature sequence or do I need to actually craft legit hairpin-structures to bypass some internal filtering? And what do I do with spike-ins that are longer than your typical mature miRNA but well within the read length - say 35nt RNA spike-in @ 51 cycles single-end Illumina sequencing?).
But will I need to make up a fake chromosome as well that hosts those fake hairpins, and re-build a new genome bowtie index, just to quantify the spike-ins along with endogenous miRNAs?
Any advice would be highly appreciated.
Also, would you be up for adding a new optional input FASTA with 'mature' spike-in sequences that are quantified as-is by the quantifier module?