Add more data to recount3 #50

Open
4 tasks
lcolladotor opened this issue Dec 1, 2023 · 0 comments

This is a recurring goal, as new data is deposited in the Sequence Read Archive nearly every day.

  • To add more data to recount3, we first need computing credits on a large computing cluster such as ACCESS (formerly called XSEDE) https://access-ci.org/.

  • Next, we have to run Monorail https://github.com/langmead-lab/monorail-external to process new data.

  • The outputs are then transferred to a local cluster where we can keep a backup of the data. In the recount3 paper, this is called the aggregation node; it is where files across studies are aggregated.

  • The data is then uploaded to IDIES, the AWS Open Data Sponsorship Program https://aws.amazon.com/marketplace/pp/prodview-t3rflz3f557jq#resources, AnVIL, or any other active mirrors. The uploaded files have to follow the data structure that the recount3 R package expects.
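
As a rough illustration of the last step, a staged directory could be checked against a list of expected relative paths before it is pushed to a mirror. This is only a sketch: the function name `missing_files` and the paths in `EXPECTED` are hypothetical placeholders, not the actual layout the recount3 R package expects, which would need to be taken from the package and the recount3 paper.

```python
from pathlib import Path
import tempfile


def missing_files(root, expected_relative_paths):
    """Return the expected relative paths that do not exist as files under root."""
    root = Path(root)
    return [rel for rel in expected_relative_paths if not (root / rel).is_file()]


# Placeholder layout for illustration only; NOT the real recount3 structure.
EXPECTED = [
    "organism/study_A/counts.gz",
    "organism/study_A/metadata.gz",
]

if __name__ == "__main__":
    # Build a throwaway staging directory that is missing one expected file.
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp)
        (staged / "organism/study_A").mkdir(parents=True)
        (staged / "organism/study_A/counts.gz").touch()
        print(missing_files(staged, EXPECTED))
```

An empty list would mean every expected file is present; anything else names the files to fix before uploading.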

There are additional steps that are part of the recount3 world such as:

This goal really falls outside the scope of the recount3 R package, though the R package is one of the most commonly used interfaces for the data. Accomplishing this goal will likely need its own support and/or coordination with Wilks et al. and/or Razi et al.
