Split gather #502
now to update to make output cleaner
openfecli/commands/gather.py
Outdated
@@ -59,10 +200,20 @@ def legacy_get_type(res_fn):
    type=click.Path(dir_okay=True, file_okay=False,
                    path_type=pathlib.Path),
    required=True)
@click.option(
    '--report',
    type=click.Choice(['dg', 'ddg', 'leg'], case_sensitive=False),
I'm very open to other names here, both for --report and for dg/ddg/leg (maybe abs/rel/leg?)
For me, dg/ddg are very intuitive. "Leg" was less intuitive for me, but I'm not sure what would be better or how others felt about it. Maybe "legs" could be better, since then you expect more than one?
Regarding precision of the values: in the Mobleylab our style guide was to always report uncertainties with only one significant digit of precision, so e.g. 17.418 ± 0.058 would become 17.42 ± 0.06, while 17.418 ± 0.008 would remain 17.418 ± 0.008. I'm not sure what the most common way of handling this is, though.
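A minimal sketch of the one-significant-digit convention described above, using only the standard library. The function name `format_estimate` is hypothetical and not part of openfe; the sketch simply rounds the uncertainty to its leading digit and the value to the same decimal place.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def format_estimate(value: float, uncertainty: float) -> str:
    """Round the uncertainty to one significant digit and the value to match.

    Hypothetical helper for illustration; not an openfe function.
    """
    if not math.isfinite(uncertainty) or uncertainty <= 0:
        # Nothing sensible to round against, so report the raw numbers.
        return f"{value} ± {uncertainty}"
    # Decimal position of the leading digit of the uncertainty,
    # e.g. 0.058 -> -2, 0.008 -> -3.
    exponent = math.floor(math.log10(uncertainty))
    quantum = Decimal(1).scaleb(exponent)
    unc = Decimal(str(uncertainty)).quantize(quantum, rounding=ROUND_HALF_UP)
    val = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    return f"{val:f} ± {unc:f}"


print(format_estimate(17.418, 0.058))  # 17.42 ± 0.06
print(format_estimate(17.418, 0.008))  # 17.418 ± 0.008
```

One known limitation of this sketch: an uncertainty that rounds up into the next decade (for example 0.096) is printed with two digits.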
Agreed, "leg" (or even "legs") is rather undescriptive, especially to non-English speakers.
I'm not sure what a better word would be. Maybe "individual"?
Re: uncertainties, I don't have strong feelings, but I would at least advise reporting the uncertainty at the same level of precision as the reported number (i.e. the number of significant figures you truncate at indicates an implicit view of the uncertainty of your results, so 17.14 has an implicit minimum error of 0.01).
    writer.writerow([ligA, ligB, DDGhyd, hyd_unc])


def _write_raw_dg(legs, writer):
    writer.writerow(["leg", "ligand_i", "ligand_j", "DG(i->j) (kcal/mol)",
I think in the optimal case I would like this to write out all the individual replicas. Any thoughts on the necessary changes to results to ensure this? (I suspect it's a method we'd have to add at the gufe level)
This may be protocol-specific. Our default protocol has, in its outputs dict, outputs["unit_estimate"], which we could extract. But we make no promise that every unit will have this. Indeed, it doesn't make sense to do so: if you used a separate parametrization unit, there would be no meaning to that unit having a "unit_estimate" result.
This might be something we need to add to the abstract class - a method for getting a breakdown by repeat.
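To make the idea in this thread concrete, here is a rough sketch of what a per-repeat breakdown on an abstract result class could look like. Every name here (`ResultSketch`, `SimpleResult`, `get_repeat_estimates`) is hypothetical and is not the gufe API; the only detail taken from the discussion is the "unit_estimate" key, which not every unit is expected to have. Averaging the repeats is also just an assumption for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any


class ResultSketch(ABC):
    """Hypothetical stand-in for a protocol result; NOT the gufe API."""

    def __init__(self, unit_outputs: list[dict[str, Any]]):
        # One outputs dict per unit, as described in the thread.
        self.unit_outputs = unit_outputs

    @abstractmethod
    def get_estimate(self) -> float:
        """Return the combined estimate."""

    def get_repeat_estimates(self) -> list[float]:
        """Per-repeat breakdown; protocols opt in by overriding this."""
        raise NotImplementedError("this protocol has no per-repeat breakdown")


class SimpleResult(ResultSketch):
    """Toy result whose units may or may not carry a 'unit_estimate'."""

    def get_repeat_estimates(self) -> list[float]:
        # Skip units (e.g. a parametrization unit) with no estimate.
        return [out["unit_estimate"] for out in self.unit_outputs
                if "unit_estimate" in out]

    def get_estimate(self) -> float:
        # Assumption for illustration only: plain mean over repeats.
        values = self.get_repeat_estimates()
        return sum(values) / len(values)


result = SimpleResult([{"unit_estimate": 1.0},
                       {"setup": "parametrization"},
                       {"unit_estimate": 3.0}])
print(result.get_repeat_estimates())
```

Making the breakdown an overridable method rather than an abstract one keeps protocols without a meaningful per-repeat value valid, which matches the concern raised above.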
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
openfecli/commands/gather.py
Outdated
    default="dg", show_default=True,
    help=(
        "What data to report. 'dg' gives maximum-likelihood estimate of "
        "asbolute deltaG, 'ddg' gives delta-delta-G, and 'leg' gives the "
Suggested change:
-        "asbolute deltaG, 'ddg' gives delta-delta-G, and 'leg' gives the "
+        "asbolute deltaG, 'ddg' gives delta-delta-G, and 'legs' gives the "
ddg_legs? environments? transformations? raw?
@hannahbaumann @RiesBen please make a choice
I vote for dg_raw, though it's not a strong preference.
I've updated this to use the names
Since this is a PR into another PR, and tests won't trigger until it is in that PR, I might merge this without further review. I will leave this open for review for at least 18 hours, merging no earlier than Wed 02 Aug 13:00 GMT (08:00 my local).
lgtm!
This is a PR into #495. This is a major refactor of that PR, so I wanted to give some space for this to be reviewed separately.
Big changes:
- openfe --gather dg/ddg/leg, with dg (absolute estimates from MLE) as default. Each table has its own set of columns.
- Precision of reported values: a result could previously be reported as 17.0 ± 0.058 when our actual estimate was at 17.418 (so the actual estimate was well outside our reported error bars!). That now reports as 17.418 ± 0.058.
- Also, although we still use the same TSV format, we now write it with tools from the stdlib csv module, which will make it much easier to provide other approaches in the future.
Example of output from running these commands now
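For readers unfamiliar with the stdlib approach mentioned in the list above, a minimal sketch of writing a tab-separated table with `csv.writer` follows. The helper name `write_table` is hypothetical, the first four column names are taken from the diff in this PR, the fifth column is an assumption, and the row values are placeholders rather than real results.

```python
import csv
import io


def write_table(header, rows, stream):
    """Write a header and rows as TSV using the stdlib csv module."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Column names after the fourth are an assumption for illustration.
header = ["leg", "ligand_i", "ligand_j", "DG(i->j) (kcal/mol)",
          "uncertainty (kcal/mol)"]
# Placeholder values only; these are not real simulation results.
rows = [("solvent", "lig_a", "lig_b", "1.0", "0.1")]

buffer = io.StringIO()
write_table(header, rows, buffer)
print(buffer.getvalue(), end="")
```

Because only the `delimiter` argument ties this to TSV, switching to another delimited format would be a one-argument change.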
Biggest concern is that right now, if you combine RBFE and RHFE results, you can't tell which are which in the DDG setup. I think I'd like to completely disallow combining those in one analysis; I'm not entirely sure that there's a good use case for that in a single output file.
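One possible way to enforce that restriction is sketched below. This is not the code in this PR: the function names are hypothetical, and the leg type labels ("complex", "solvent", "vacuum") and their mapping to RBFE and RHFE are assumptions made for illustration.

```python
def classify_edge(leg_types):
    """Classify one ligand pair from the leg types found for it."""
    legs = set(leg_types)
    # Assumed mapping: binding uses complex + solvent legs,
    # hydration uses solvent + vacuum legs.
    if legs == {"complex", "solvent"}:
        return "RBFE"
    if legs == {"solvent", "vacuum"}:
        return "RHFE"
    raise ValueError(f"unrecognized combination of legs: {sorted(legs)}")


def check_single_kind(edges):
    """Return the single calculation kind, or raise if kinds are mixed.

    `edges` maps a (ligand_i, ligand_j) pair to its list of leg types.
    """
    kinds = {classify_edge(legs) for legs in edges.values()}
    if len(kinds) > 1:
        raise ValueError(
            f"cannot combine {sorted(kinds)} results in one analysis")
    return kinds.pop() if kinds else None


edges = {("lig_a", "lig_b"): ["complex", "solvent"],
         ("lig_b", "lig_c"): ["solvent", "complex"]}
print(check_single_kind(edges))
```

Failing early with an explicit error avoids producing a DDG table in which binding and hydration rows cannot be told apart.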
Developers certificate of origin