Development: Meetings: 2017 03 10th Friday
Tony E Lewis edited this page Sep 12, 2017
- Christine
- Ian
- Natalie
- Sayoni
- Tony
- It'd be worth Tony putting some effort into ensuring the new GeMMA code runs on Legion as well as on the CS cluster.
- The code must remain flexible enough to build a tree from any set of starting clusters. This would allow future work to potentially subgroup superfamilies, eg according to MDA, SSG etc.
- We like the tree benchmark based on average evalue. Sayoni would also be interested to see comparisons of evalue distributions.
- We like the idea of a benchmark that assesses the number of EC codes within a FunFam. We hope that this would be more meaningful than the tree benchmark but faster/simpler than other function benchmarks.
- Other benchmarks could use the SFLD set and look at CSA residues.
- We think it could be good to focus on three superfamilies: one small, one medium and one large.
- Sayoni has emailed out these key benchmarking superfamilies that were mentioned in the meeting:
  - HUPs, 3.40.50.620
  - TPPs, 3.40.50.970
  - Enolase, 3.20.20.120 and 3.30.390.10
- If we find ourselves trying to understand what's going on in trees, it may be worth investing in reusable ways of visualising them (eg writing trees out in Newick format and using standard tree viewers: Ubuntu has several, eg figtree, treeviewx and njplot, and there are many more on the Wikipedia page "List of phylogenetic tree visualization software").
- Since Natalie and Sayoni are likely to be responsible for running GeMMA in the future, Tony should work with them to ensure they find the new code usable.
- We won't pursue the overlap issue further for now.
- We are interested in the idea of not realigning all levels of sub-clusters, and think it could be of value to Sayoni's FunFHMMer protocol.
- Either way, we think it definitely makes sense for GeMMA to provide alignments for each of the nodes in the final tree as part of the output because otherwise Sayoni's FunFHMMer has to waste time regenerating the exact same alignments.
- We're open to investigating whether we can get closer to the "pure greedy" tree by avoiding the evalue intervals (eg 1e-15 to 1e-40).
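If we do go down the route of writing trees out in Newick format, here's a minimal sketch of an exporter in Python. It assumes a hypothetical `Node` class holding a name, a list of children and an optional branch length; GeMMA's real tree representation may well differ. It walks the tree iteratively so that very deep, unbalanced trees don't hit Python's recursion limit, and it quotes any label containing characters that Newick treats specially.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Characters that can't appear in an unquoted Newick label.
NEWICK_SPECIAL = set("()[]':;, \t\n")


@dataclass
class Node:
    """Hypothetical tree node: not GeMMA's actual data structure."""
    name: str = ""
    children: List["Node"] = field(default_factory=list)
    length: Optional[float] = None  # optional branch length


def quote_label(label: str) -> str:
    """Wrap a label in single quotes (doubling any inner quotes) if needed."""
    if any(ch in NEWICK_SPECIAL for ch in label):
        return "'" + label.replace("'", "''") + "'"
    return label


def to_newick(root: Node) -> str:
    """Serialise the tree rooted at `root` to a Newick string ending in ';'."""
    rendered: Dict[int, str] = {}  # id(node) -> Newick text for its subtree
    stack = [(root, False)]
    # Iterative post-order traversal: a node is rendered only after all of
    # its children have been rendered.
    while stack:
        node, children_done = stack.pop()
        if node.children and not children_done:
            stack.append((node, True))
            for child in node.children:
                stack.append((child, False))
            continue
        text = ""
        if node.children:
            # Keep children in their original order.
            text = "(" + ",".join(rendered.pop(id(c)) for c in node.children) + ")"
        text += quote_label(node.name)
        if node.length is not None:
            text += f":{node.length:g}"
        rendered[id(node)] = text
    return rendered[id(root)] + ";"


if __name__ == "__main__":
    tree = Node(children=[Node("n1", [Node("c1"), Node("c2")]), Node("c3")])
    print(to_newick(tree))  # ((c1,c2)n1,c3);
```

The resulting string can be written to a file and opened in any standard Newick viewer.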
After leaving Christine's office, the rest of us discussed the FunFam renumbering issue:
- We think that it will be useful/necessary to be able to map numberings from old sets of FunFams to new so that largely unchanged FunFams keep their numbers.
- We agreed that it'd be good if Tony is able to write a tool to do this, but we don't consider it an essential part of the remit of this work.
- We think it'd be good to avoid confusion between the working IDs that we use before this number-mapping and the official numbers that we use from then on. So we propose to use working IDs like "working_378" to avoid confusion with the final "378". This should only affect Tony's and Sayoni's code because the number-mapping can be performed after FunFHMMer.
- Later, Ian and I discussed the idea of a one-off reset to the numbering, ie for one CATH release, renumber the FunFams from 1 upwards (perhaps largest first?) and provide a mapping from the IDs in the previous release.
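A possible shape for the number-mapping tool, as a rough Python sketch only. Nothing here was agreed in the meeting: it assumes FunFams can be compared by their member sets (eg domain IDs), swaps in Jaccard similarity as the overlap measure, and uses a hypothetical `min_overlap` threshold of 0.5. Old numbers are reused greedily, best overlap first, and each at most once; unmatched FunFams get fresh numbers above the old maximum, largest first.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two member sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def map_numbers(
    old: Dict[int, Set[str]],
    new: Dict[str, Set[str]],
    min_overlap: float = 0.5,  # hypothetical threshold
) -> Dict[str, int]:
    """Assign official numbers to new FunFams keyed by working ID.

    A new FunFam inherits an old number if its members overlap that old
    FunFam by at least min_overlap (Jaccard); each old number is reused at
    most once. Other FunFams get fresh numbers above the old maximum,
    largest first (ties broken by working ID).
    """
    # Keep every (working ID, old number) pair that meets the threshold.
    candidates: List[Tuple[float, str, int]] = []
    for work_id, members in new.items():
        for number, old_members in old.items():
            score = jaccard(members, old_members)
            if score >= min_overlap:
                candidates.append((score, work_id, number))

    # Greedily accept the best-scoring pairs first.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    assigned: Dict[str, int] = {}
    used_numbers: Set[int] = set()
    for _score, work_id, number in candidates:
        if work_id not in assigned and number not in used_numbers:
            assigned[work_id] = number
            used_numbers.add(number)

    # Number the unmatched FunFams, largest first.
    next_number = max(old, default=0) + 1
    unmatched = sorted(
        (w for w in new if w not in assigned),
        key=lambda w: (-len(new[w]), w),
    )
    for work_id in unmatched:
        assigned[work_id] = next_number
        next_number += 1
    return assigned


if __name__ == "__main__":
    old = {1: {"a", "b", "c"}, 2: {"d", "e"}}
    new = {
        "working_1": {"a", "b", "c", "x"},
        "working_2": {"f", "g", "h"},
        "working_3": {"d"},
    }
    print(map_numbers(old, new))
```

In the example, "working_1" keeps number 1 (overlap 0.75), "working_3" keeps number 2 (overlap 0.5) and "working_2" gets the fresh number 3. Calling it with an empty `old` gives the one-off reset from the last bullet: numbers run from 1 upwards, largest FunFam first.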
TBD
At the meeting, we may wish to briefly re-address the overlap issue.
(Originally scheduled for 16:30, Tuesday 14th March, but then cancelled)