Optimise STAR command for riboseq data #30
Comments
I will explore this when I have time. Below are some thoughts on it. Any input would be helpful. Looks like this param is a bit permissive. I often refer to what ORFik does and they use I also wonder what the trade off is between
I am unclear how we get the 'too short' as it was my understanding that
Could you PR any modifications you feel may be appropriate, and post the above plot to show the impact of the change? The 'too short' label is a bit misleading with STAR; it doesn't quite mean what it seems.
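For context on the 'too short' label: as far as I understand the STAR manual, it refers to the aligned portion of a read being too short relative to the read length, not to the read itself being short. The relevant options are `--outFilterMatchNminOverLread` and `--outFilterScoreMinOverLread`, both 0.66 by default. The snippet below is only a simplified sketch of that check; the function name is hypothetical, and it ignores STAR's exact internal rounding and the absolute `--outFilterMatchNmin` / `--outFilterScoreMin` thresholds.

```python
def passes_length_filters(read_length, matched_bases, score,
                          match_frac=0.66, score_frac=0.66):
    """Simplified model of STAR's relative length filters (an assumption,
    not STAR's actual code): an alignment that fails either check would be
    counted as 'unmapped: too short'."""
    enough_matches = matched_bases >= match_frac * read_length
    enough_score = score >= score_frac * read_length
    return enough_matches and enough_score


# A 28-nt footprint needs more than 0.66 * 28 = 18.48 matched bases
# under this simplified model.
print(passes_length_filters(28, matched_bases=18, score=18))
print(passes_length_filters(28, matched_bases=19, score=19))
```

With reads this short, losing a handful of bases at the ends can be enough to push an alignment under the threshold, which may be part of why riboseq shows more 'too short' reads than RNA-seq.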
Hey everyone, I’ll chip in in case it’s helpful. My apologies @JackCurragh if you’ve already figured these things out for your upcoming PR.
The stringent requirement of EndToEnd alignment likely fails when the ends of the reads do not perfectly match the reference, hence "too short". However, these alterations might affect the distribution of reads in other categories as well, i.e. if we allow too many mismatches we might get many more multi-mappers. I recommend checking the MultiQC/STAR log output alongside the BAMs’ riboseq QC metrics to fully assess the impact of any parameter adjustments.
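One way to make that comparison concrete is to diff the percentage categories in STAR's `Log.final.out` between two runs. The following is a minimal sketch, assuming the usual `label | value` layout of that file; the function names and the file paths in the usage note are hypothetical.

```python
# Percentage categories as labelled in STAR's Log.final.out.
CATEGORIES = [
    "Uniquely mapped reads %",
    "% of reads mapped to multiple loci",
    "% of reads mapped to too many loci",
    "% of reads unmapped: too many mismatches",
    "% of reads unmapped: too short",
    "% of reads unmapped: other",
]


def parse_star_log(text):
    """Return {category: percentage} for the categories found in the log text."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers and blank lines carry no value
        label, _, value = line.partition("|")
        label = label.strip()
        if label in CATEGORIES:
            stats[label] = float(value.strip().rstrip("%"))
    return stats


def compare_runs(before, after):
    """Percentage-point change (after minus before) for shared categories."""
    return {
        key: round(after[key] - before[key], 2)
        for key in CATEGORIES
        if key in before and key in after
    }
```

Usage would look something like `compare_runs(parse_star_log(open("old/Log.final.out").read()), parse_star_log(open("new/Log.final.out").read()))`. A drop in 'too short' that is matched by a rise in 'mapped to multiple loci' would suggest the change is mostly moving reads between categories rather than rescuing them.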
Hi Ira, thanks for the input. I have this ready to PR with the updated params but I agree that it is worth further investigation. Unfortunately I probably won't have time to do this until later this week but will be sure to report back when we have results to discuss.
I messed around quite a bit (see e.g. #37) and didn't manage to make any improvements to the mapping statistics, so didn't proceed further. Of course the parameter space is large, so I'm sure there are improvements to be made, we just need to demonstrate the utility.
Description of feature
I'm noticing a lot of multi-mapping reads and those unmapped as 'too short'. Some of this is just down to the short read lengths I guess, but we should try to add some STAR optimisations. See the STAR MultiQC plot below. The top runs are RNA-seq, bottom Riboseq.