
need to design a tree store for large studies/trees that are integrated into OpenTree via scripts #111

Open
mtholder opened this issue Sep 23, 2016 · 6 comments

Comments

@mtholder
Member

@josephwb has a supertree study with >1000 trees.
@snacktavish will soon have automatically generated gene trees.
Brian O'Meara has some very large trees for which he has had, or wants to have, the OTU mapping done by scripts.

These don't really fit nicely into the design of phylesystem as a data store for manually curated trees (most of which are presumably intended for synthesis).

Huge studies will make us hit our git repo size limit sooner (we can use shards then, but that is still a bit of an annoyance). Plus, they won't load in the curator app (they'll require too much RAM).

Automatically curated trees are a generated product that probably should not be versioned.

So basically we need a data store that:

  1. is updatable,
  2. plays well with synthesis,
  3. makes it easy for users to find the trees,
  4. but which does not put everything into one NexSON that is pulled down by the curator app.
@mtholder
Member Author

Just to offer one vague suggestion: we could have a tiny NexSON stub for the study that lives in phylesystem and holds the reference info, other associated data, and (in some cases) perhaps the OTU mapping. Then it would just have links to newicks that are stored elsewhere.
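A minimal sketch of what such a stub could look like, assuming a hypothetical schema (the class names, field names, and example URL below are illustrative, not existing NexSON or phylesystem structures). The point it shows: the stub carries only metadata and links, and each newick is fetched on demand, so the curator app never has to pull every tree at once.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ExternalTree:
    """Link to a newick file stored outside phylesystem (hypothetical schema)."""
    tree_id: str
    newick_url: str


@dataclass
class StudyStub:
    """Small study record that could live in phylesystem: reference info plus links.

    All field names here are hypothetical; no tree bodies are embedded.
    """
    study_id: str
    reference: str
    trees: List[ExternalTree] = field(default_factory=list)
    # Optional OTU mapping: original tip label -> mapped taxon identifier.
    otu_mapping: Dict[str, str] = field(default_factory=dict)

    def find_tree(self, tree_id: str) -> Optional[ExternalTree]:
        """Look up a tree link by id without fetching any newick text."""
        return next((t for t in self.trees if t.tree_id == tree_id), None)

    def load_newick(self, tree_id: str, fetch: Callable[[str], str]) -> str:
        """Fetch one tree's newick on demand via a caller-supplied fetch function."""
        tree = self.find_tree(tree_id)
        if tree is None:
            raise KeyError(tree_id)
        return fetch(tree.newick_url)


# Usage with an in-memory dict standing in for the external store.
store = {"https://example.org/study_x/tree1.tre": "((A,B),C);"}
stub = StudyStub(
    study_id="study_x",
    reference="hypothetical reference",
    trees=[ExternalTree("tree1", "https://example.org/study_x/tree1.tre")],
)
print(stub.load_newick("tree1", store.__getitem__))  # prints ((A,B),C);
```

Passing `fetch` in as a function keeps the stub agnostic about where the newicks actually live (a separate repo, S3, or plain HTTP).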

We'd need to work out a syntax for augmenting the raw newick with extra info (e.g., the ingroup and perhaps how to interpret branch lengths). But it shouldn't be too hard to do that.
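One possible shape for that syntax, offered only as a sketch: since square brackets delimit comments in newick, the extra info could go in a leading bracketed header such as `[ot: ingroup=node5, branch_lengths=substitutions]`. The `ot:` prefix, the keys, and the comma-separated `key=value` layout are all hypothetical, and whether a given newick reader tolerates a comment before the first parenthesis would need checking for the tools actually in use.

```python
import re
from typing import Dict, Tuple

# Hypothetical header: a bracketed newick comment at the very start of the text,
# e.g. "[ot: ingroup=node5, branch_lengths=substitutions]((A,B),C);"
_HEADER = re.compile(r"^\s*\[ot:(?P<body>[^\]]*)\]")


def split_annotated_newick(text: str) -> Tuple[Dict[str, str], str]:
    """Return (metadata, plain newick). Text without a header yields ({}, text)."""
    match = _HEADER.match(text)
    if match is None:
        return {}, text.strip()
    meta: Dict[str, str] = {}
    # Header body is a comma-separated list of key=value pairs.
    for item in match.group("body").split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            meta[key.strip()] = value.strip()
    return meta, text[match.end():].strip()


meta, newick = split_annotated_newick(
    "[ot: ingroup=node5, branch_lengths=substitutions]((A,B),C);"
)
print(meta)    # {'ingroup': 'node5', 'branch_lengths': 'substitutions'}
print(newick)  # ((A,B),C);
```

Keeping the annotations in a comment means the file stays a valid newick for readers that honor comments, while scripts that know the convention can recover the extra fields.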

@josephwb
Member

Probably not relevant, but most of my trees are tiny (there are just a lot of them):
[image: rplot]

@snacktavish
Member

+1 to the NexSON stub idea, especially for the published large trees. And maybe a fully separate repo for the automatically generated trees that haven't been peer reviewed. I'll think on this!

@jar398
Member

jar398 commented Sep 23, 2016

S3 storage was $0.03/GB/month last I checked. The command-line tool for reading and writing is called 's3cmd'. I think the files are readable via HTTP, but I'm not sure how.
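On the HTTP question: an S3 object that has been made publicly readable (via its ACL or a bucket policy) can be fetched over plain HTTPS at a virtual-hosted-style URL of the form `https://<bucket>.s3.amazonaws.com/<key>`; the exact endpoint can depend on the bucket's region. A small sketch, where the bucket name is hypothetical and the price constant is just the 2016 figure quoted above, not current pricing:

```python
from urllib.parse import quote

# Price quoted in this thread (2016); check current S3 pricing before relying on it.
PRICE_PER_GB_MONTH = 0.03


def public_object_url(bucket: str, key: str) -> str:
    """Virtual-hosted-style HTTPS URL for an S3 object.

    Only works for anonymous reads if the object is publicly readable.
    """
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


def monthly_storage_cost(total_gb: float, price: float = PRICE_PER_GB_MONTH) -> float:
    """Storage-only estimate; ignores request and data-transfer charges."""
    return total_gb * price


# "opentree-trees-example" is a hypothetical bucket name.
print(public_object_url("opentree-trees-example", "study_x/tree 1.tre"))
# https://opentree-trees-example.s3.amazonaws.com/study_x/tree%201.tre
print(round(monthly_storage_cost(100), 2))  # 3.0
```

At that rate, even a few hundred GB of newicks would cost only a few dollars a month to store.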

@jar398
Member

jar398 commented Oct 5, 2016

@jar398
Member

jar398 commented Oct 6, 2016

A use case from @bomeara: OpenTreeOfLife/opentree#788

4 participants