Generate contaminants file #28

Open
pinin4fjords opened this issue Feb 20, 2024 · 2 comments
Labels
enhancement New feature or request
Milestone

Comments

@pinin4fjords
Member

Description of feature

sortmerna is implemented in the pipeline and runs by default. There will also be a number of other short RNA species we should remove, for which we can use the (also inherited) bbsplit functionality.

But we do need to derive a list of contaminant sequences and figure out where to store it.

@pinin4fjords pinin4fjords added the enhancement New feature or request label Feb 20, 2024
@pinin4fjords pinin4fjords added this to the v1.1.0 milestone Feb 20, 2024
@JackCurragh
Contributor

So is it just rRNA that is removed by default? I am not clear on what the combination of bbsplit and sortmerna achieves, so it is hard to know what kinds of contaminants you have in mind (tRNA, phiX?).

@pinin4fjords
Member Author

I came to the conclusion that a blanket cross-species set was not practical.

For test_full I used the usual rRNA complement with human tRNA sequences added (https://github.com/nf-core/test-datasets/blob/riboseq/testdata/rrna-db-full.txt), but I think this will be down to the user, so maybe this is a documentation issue.
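
For anyone wanting to assemble their own list in the same spirit (the usual rRNA entries plus species-specific additions such as tRNA), here is a minimal sketch. It assumes the database list is a plain-text file with one FASTA path or URL per line; that format, the function names, and the file names in the usage note are all illustrative assumptions, so check the pipeline documentation for what it actually expects.

```python
from typing import Iterable, List


def build_manifest(rrna_entries: Iterable[str], extra_entries: Iterable[str]) -> List[str]:
    """Merge two lists of FASTA locations into one.

    Blank lines and '#' comment lines are dropped, duplicates are removed,
    and first-seen order is preserved. The one-entry-per-line layout is an
    assumption, not a confirmed pipeline format.
    """
    seen = set()
    merged = []
    for entry in list(rrna_entries) + list(extra_entries):
        entry = entry.strip()
        if not entry or entry.startswith("#"):
            continue
        if entry not in seen:
            seen.add(entry)
            merged.append(entry)
    return merged


def write_manifest(entries: Iterable[str], path: str) -> None:
    """Write one entry per line to the given (hypothetical) manifest path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(entries) + "\n")
```

Usage would look something like `write_manifest(build_manifest(open("rrna-db-defaults.txt"), ["human_trna.fa"]), "contaminants-manifest.txt")`, where every file name is a placeholder.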

2 participants