need to design a tree store for large studies/trees that are integrated into OpenTree via scripts #111

mtholder · 2016-09-23T02:23:34Z

@josephwb has a supertree study with >1000 trees.
@snacktavish will soon have automatically generated gene trees.
Brian O'Meara has some very large trees for which he has/wants to have the OTU mapping done with scripts.

These don't really fit nicely into the design of phylesystem as a data store for manually curated trees (most of which are presumably intended for synthesis).

Huge studies will make us hit our git repo limit sooner (we can use shards then, but it's still a bit of an annoyance). Plus they won't load in the curator app (they'll require too much RAM).

Automatically curated trees are a generated product that probably should not be versioned.

So basically we need a data store that:

- is updatable,
- plays well with synthesis,
- makes it easy for users to find the trees,
- but which does not put everything into one NexSON that is pulled down by the curator app.

mtholder · 2016-09-23T02:36:04Z

Just to offer one vague suggestion: We could have a tiny NexSON stub for the study that lives in phylesystem and has the reference info, other associated data, and (in some cases) perhaps the OTU mapping. And then just have links to newicks that are stored elsewhere.

We'd need to work on a syntax for augmenting the raw newick with extra info (e.g. ingroup and perhaps interpretation of branch lengths). But it shouldn't be too hard to do that.
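
To make the stub idea concrete, here is a rough sketch of what such a record might hold. Every key name below (`study_id`, `external_trees`, `newick_url`, `ingroup`, `branch_length_mode`) is a made-up placeholder, not existing NexSON syntax, and the URL is a dummy; the point is only that the stub stays small while the newicks live elsewhere.

```python
import json


def make_study_stub(study_id, reference, trees, otu_mapping=None):
    """Build a small stub record for a study whose trees are stored elsewhere.

    All key names are hypothetical; they are not part of the NexSON format.
    """
    return {
        "study_id": study_id,
        "reference": reference,
        # The OTU mapping is optional; some studies may keep it elsewhere.
        "otu_mapping": otu_mapping or {},
        "external_trees": [
            {
                "tree_id": t["tree_id"],
                # Link to the raw newick file instead of embedding the tree.
                "newick_url": t["newick_url"],
                # Extra info that raw newick cannot carry by itself.
                "ingroup": t.get("ingroup"),
                "branch_length_mode": t.get("branch_length_mode"),
            }
            for t in trees
        ],
    }


def external_tree_urls(stub):
    """Return the newick links recorded in a stub, in order."""
    return [t["newick_url"] for t in stub["external_trees"]]


stub = make_study_stub(
    study_id="example_1",
    reference="Placeholder citation",
    trees=[
        {
            "tree_id": "tree1",
            "newick_url": "https://example.org/trees/tree1.tre",
            "ingroup": "node5",
        }
    ],
)
print(json.dumps(stub, indent=2))
```

A curator-facing tool could load only this stub and fetch individual newicks on demand, which would sidestep the RAM problem for huge studies.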

josephwb · 2016-09-23T10:59:54Z

Probably not relevant, but most of my trees are tiny (there are just a lot of them):

snacktavish · 2016-09-23T17:19:54Z

+1 to the NexSON stub idea, especially for the published large trees. And maybe a fully separate repo for the automatically generated trees that haven't been peer reviewed. I'll think on this!

jar398 · 2016-09-23T19:14:44Z

S3 storage was $0.03 / GB / month last I checked. The command-line tool for
reading and writing is called 's3cmd'. I think the files are readable via
HTTP, but I'm not sure how.
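
On the HTTP question, a sketch under assumptions: if an object in a bucket is made publicly readable, it can be fetched with a plain GET from a URL of the form `https://<bucket>.s3.amazonaws.com/<key>`. The bucket and key names below are invented for illustration, and private objects would need signed requests instead.

```python
from urllib.parse import quote
from urllib.request import urlopen


def s3_public_url(bucket, key):
    """Build the virtual-hosted-style URL for a publicly readable S3 object."""
    # quote() percent-encodes characters such as spaces but keeps "/" intact.
    return "https://{}.s3.amazonaws.com/{}".format(bucket, quote(key))


def fetch_newick(bucket, key, timeout=30):
    """Download a newick string over HTTP; works only for public objects."""
    with urlopen(s3_public_url(bucket, key), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical bucket and key, shown only to illustrate the URL shape.
print(s3_public_url("my-tree-bucket", "study_1/tree 1.tre"))
```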

jar398 · 2016-10-05T20:56:29Z

Perhaps of interest: https://help.github.com/articles/about-git-large-file-storage/

jar398 · 2016-10-06T23:07:05Z

A use case from @bomeara: OpenTreeOfLife/opentree#788

jar398 mentioned this issue Oct 3, 2016

Better storage for supporting files #114

Open
