Split gather #502

dwhswenson · 2023-07-24T19:45:47Z

This is a PR into #495. This is a major refactor of that PR, so I wanted to give some space for this to be reviewed separately.

Big changes:

Instead of trying to dump everything into one table, we have openfe --gather dg/ddg/leg, with dg (absolute estimates from MLE) as default. Each table has its own set of columns.
Estimates are now reported to the same precision as the uncertainty estimates. Previously, we would report e.g, 17.0 ± 0.058 when our actual estimate was at 17.418 (so the actual estimate was well outside our reported error bars!) Now that reports as 17.418 ± 0.058.

Also, although we still use the same TSV format, we now do with tools from the stdlib csv, which will make it much easier to provide other approaches in the future.

Example of output from running these commands now

(openfe) Yvette:...tmp-inspect-results/results dwhs$ openfe gather results
ligand	DG(MLE) (kcal/mol)	uncertainty (kcal/mol)
lig_ejm_31	-0.086	0.046
lig_ejm_42	0.67	0.11
lig_ejm_46	-0.977	0.054
lig_ejm_47	-0.06	0.14
lig_ejm_48	0.528	0.092
lig_ejm_50	0.914	0.061
lig_ejm_43	2.02	0.18
lig_jmc_23	-0.682	0.095
lig_jmc_27	-1.08	0.11
lig_jmc_28	-1.250	0.076
(openfe) Yvette:...tmp-inspect-results/results dwhs$ openfe gather --report ddg results
ligand_i	ligand_j	DDG(i->j) (kcal/mol)	uncertainty (kcal/mol)
lig_ejm_31	lig_ejm_42	0.76	0.13
lig_ejm_31	lig_ejm_46	-0.891	0.065
lig_ejm_31	lig_ejm_47	0.02	0.15
lig_ejm_31	lig_ejm_48	0.614	0.089
lig_ejm_31	lig_ejm_50	1.000	0.044
lig_ejm_42	lig_ejm_43	1.35	0.16
lig_ejm_46	lig_jmc_23	0.295	0.087
lig_ejm_46	lig_jmc_27	-0.10	0.10
lig_ejm_46	lig_jmc_28	-0.273	0.060
(openfe) Yvette:...tmp-inspect-results/results dwhs$ openfe gather --report leg results
leg	ligand_i	ligand_j	DG(i->j) (kcal/mol)	uncertainty (kcal/mol)
complex	lig_ejm_31	lig_ejm_42	-14.95	0.12
solvent	lig_ejm_31	lig_ejm_42	-15.707	0.031
complex	lig_ejm_31	lig_ejm_46	-40.749	0.036
solvent	lig_ejm_31	lig_ejm_46	-39.858	0.054
complex	lig_ejm_31	lig_ejm_47	-27.81	0.13
solvent	lig_ejm_31	lig_ejm_47	-27.829	0.060
complex	lig_ejm_31	lig_ejm_48	-16.141	0.085
solvent	lig_ejm_31	lig_ejm_48	-16.755	0.026
complex	lig_ejm_31	lig_ejm_50	-57.333	0.038
solvent	lig_ejm_31	lig_ejm_50	-58.333	0.024
complex	lig_ejm_42	lig_ejm_43	-18.92	0.15
solvent	lig_ejm_42	lig_ejm_43	-20.277	0.026
complex	lig_ejm_46	lig_jmc_23	17.418	0.058
solvent	lig_ejm_46	lig_jmc_23	17.123	0.064
complex	lig_ejm_46	lig_jmc_27	15.813	0.086
solvent	lig_ejm_46	lig_jmc_27	15.914	0.053
complex	lig_ejm_46	lig_jmc_28	23.140	0.037
solvent	lig_ejm_46	lig_jmc_28	23.413	0.047

Biggest concern is that right now, if you combine RBFE and and RHFE results, you can't tell which are which in the DDG setup. I think I'd like to completely disallow combining those in one analysis; I'm not entirely sure that there's a good use case for that in a single output file.

Developers certificate of origin

I certify that this contribution is covered by the MIT License here and the Developer Certificate of Origin at https://developercertificate.org/.

now to update to make output cleaner

dwhswenson · 2023-07-24T19:54:58Z

openfecli/commands/gather.py

@@ -59,10 +200,20 @@ def legacy_get_type(res_fn):
                type=click.Path(dir_okay=True, file_okay=False,
                                         path_type=pathlib.Path),
                required=True)
+@click.option(
+    '--report',
+    type=click.Choice(['dg', 'ddg', 'leg'], case_sensitive=False),


I'm very open to other names here, both for --report and for dg/ddg/leg (maybe abs/rel/leg?)

For me, dg/ddg are very intuitive. "Leg" was less intuitive for me, but I'm not sure what might be better, how others felt about it. Maybe "legs" could be better, since then you expect more than one?

Regarding precision of the values: In the Mobleylab we had the style guide to always report uncertainties with only one significant digit of precision, so e.g. 17.418 ± 0.058 would become 17.42 ± 0.06 or 17.418 ± 0.008 would remain 17.418 ± 0.008,... I'm not sure what the most common way of handling this is though.

Agreed "leg" (or even "legs") is rather undescriptive, especially to non-english speakers.

I'm not sure what a better word would be, "individual"?

Re: uncertainties, I don't have strong feelings but I definitely would at least advise for setting the uncertainty to the same level of precision as the reported number (i.e. the amount of significant figures you truncate at indicates an implicit view of the uncertainty of your results, so 17.14 has an implicit minimum error of 0.01).

openfecli/commands/gather.py

IAlibay · 2023-07-26T10:16:54Z

openfecli/commands/gather.py

+            writer.writerow([ligA, ligB, DDGhyd, hyd_unc])
+
+def _write_raw_dg(legs, writer):
+    writer.writerow(["leg", "ligand_i", "ligand_j", "DG(i->j) (kcal/mol)",


I think in the optimal case I would like this to write out all the individual replicas. Any thoughts on the necessary changes to results to ensure this? (I suspect it's a method we'd have to add at the gufe level)

This may be protocol-specific. Our default protocol has, in its outputs dict, outputs["unit_estimate"], which we could extract. But we make no promise that every unit will have this. Indeed, it doesn't make sense to do so: if you used a separate parametrization unit, there would be no meaning to that unit having a "unit_estimate" result.

This might be something we need to add to the abstract class - a method for getting a breakdown by repeat.

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>

IAlibay · 2023-07-31T15:26:28Z

openfecli/commands/gather.py

+    default="dg", show_default=True,
+    help=(
+        "What data to report. 'dg' gives maximum-likelihood estimate of "
+        "asbolute deltaG,  'ddg' gives delta-delta-G, and 'leg' gives the "


Suggested change

"asbolute deltaG, 'ddg' gives delta-delta-G, and 'leg' gives the "

"asbolute deltaG, 'ddg' gives delta-delta-G, and 'legs' gives the "

ddg_legs? environments? transformations? raw?

@hannahbaumann @RiesBen please make a choice

I vote for dg_raw, though it's not a strong preference.

…split-gather

…ther

… split-gather

dwhswenson · 2023-08-01T18:49:34Z

I've updated this to use the names dg (default), ddg, and dg-raw. It'll still need more updates to tests, but those will only be triggered for CI once this is merged into #495.

Since this is a PR into another PR, and tests won't trigger until it is in that PR, I might merge this without further review. I will leave this open for review for at least 18 hours, merging no earlier than Wed 02 Aug 13:00 GMT (08:00 my local).

IAlibay

lgtm!

dwhswenson added 4 commits July 24, 2023 09:58

rough draft refactor gather

f7c1860

version that seems to work

c368ca9

now to update to make output cleaner

improve estimate/uncertainty rounding/reporting

7d6887a

update to simplify output tables

5d181e5

dwhswenson commented Jul 24, 2023

View reviewed changes

dwhswenson requested review from RiesBen, IAlibay and hannahbaumann July 24, 2023 19:57

IAlibay requested changes Jul 26, 2023

View reviewed changes

dwhswenson and others added 2 commits July 27, 2023 14:17

use 1 digit in uncertainty; leg => legs

b3591f2

Apply suggestions from code review

515ad62

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>

IAlibay requested changes Jul 31, 2023

View reviewed changes

IAlibay added this to the Release 0.12.0 milestone Jul 31, 2023

dwhswenson added 7 commits July 31, 2023 10:58

Tuple => tuple

16b2a34

Merge branch 'split-gather' of github.com:OpenFreeEnergy/openfe into …

352c972

…split-gather

add tests for uncertainty calc stuff

cc2feca

Add HyphenAwareChoice

bf0661a

Merge branch 'main' of github.com:OpenFreeEnergy/openfe into split-ga…

c7a49a7

…ther

clean up a bit of gather

d13e175

Merge branch 'MLE_on_gather' of github.com:OpenFreeEnergy/openfe into…

22fdbb5

… split-gather

dwhswenson requested a review from IAlibay August 1, 2023 18:50

IAlibay approved these changes Aug 2, 2023

View reviewed changes

dwhswenson merged commit dbe79d2 into MLE_on_gather Aug 2, 2023

dwhswenson deleted the split-gather branch August 2, 2023 14:15

dwhswenson mentioned this pull request Aug 4, 2023

Adds MLE estimates on gather command #495

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split gather #502

Split gather #502

dwhswenson commented Jul 24, 2023

dwhswenson Jul 24, 2023

hannahbaumann Jul 25, 2023 •

edited

Loading

IAlibay Jul 26, 2023

IAlibay Jul 26, 2023

IAlibay Jul 26, 2023

dwhswenson Aug 1, 2023

IAlibay Aug 2, 2023

IAlibay Jul 31, 2023

hannahbaumann Aug 1, 2023

dwhswenson commented Aug 1, 2023 •

edited

Loading

IAlibay left a comment

	"asbolute deltaG, 'ddg' gives delta-delta-G, and 'leg' gives the "
	"asbolute deltaG, 'ddg' gives delta-delta-G, and 'legs' gives the "

Split gather #502

Split gather #502

Conversation

dwhswenson commented Jul 24, 2023

Developers certificate of origin

Choose a reason for hiding this comment

hannahbaumann Jul 25, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dwhswenson commented Aug 1, 2023 • edited Loading

IAlibay left a comment

Choose a reason for hiding this comment

hannahbaumann Jul 25, 2023 •

edited

Loading

dwhswenson commented Aug 1, 2023 •

edited

Loading