Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split gather #502

Merged
merged 13 commits into from
Aug 2, 2023
Merged

Split gather #502

merged 13 commits into from
Aug 2, 2023

Conversation

dwhswenson
Copy link
Member

This is a PR into #495. This is a major refactor of that PR, so I wanted to give some space for this to be reviewed separately.

Big changes:

  • Instead of trying to dump everything into one table, we have openfe --gather dg/ddg/leg, with dg (absolute estimates from MLE) as default. Each table has its own set of columns.
  • Estimates are now reported to the same precision as the uncertainty estimates. Previously, we would report e.g, 17.0 ± 0.058 when our actual estimate was at 17.418 (so the actual estimate was well outside our reported error bars!) Now that reports as 17.418 ± 0.058.

Also, although we still use the same TSV format, we now do with tools from the stdlib csv, which will make it much easier to provide other approaches in the future.

Example of output from running these commands now
(openfe) Yvette:...tmp-inspect-results/results dwhs$ openfe gather results
ligand	DG(MLE) (kcal/mol)	uncertainty (kcal/mol)
lig_ejm_31	-0.086	0.046
lig_ejm_42	0.67	0.11
lig_ejm_46	-0.977	0.054
lig_ejm_47	-0.06	0.14
lig_ejm_48	0.528	0.092
lig_ejm_50	0.914	0.061
lig_ejm_43	2.02	0.18
lig_jmc_23	-0.682	0.095
lig_jmc_27	-1.08	0.11
lig_jmc_28	-1.250	0.076
(openfe) Yvette:...tmp-inspect-results/results dwhs$ openfe gather --report ddg results
ligand_i	ligand_j	DDG(i->j) (kcal/mol)	uncertainty (kcal/mol)
lig_ejm_31	lig_ejm_42	0.76	0.13
lig_ejm_31	lig_ejm_46	-0.891	0.065
lig_ejm_31	lig_ejm_47	0.02	0.15
lig_ejm_31	lig_ejm_48	0.614	0.089
lig_ejm_31	lig_ejm_50	1.000	0.044
lig_ejm_42	lig_ejm_43	1.35	0.16
lig_ejm_46	lig_jmc_23	0.295	0.087
lig_ejm_46	lig_jmc_27	-0.10	0.10
lig_ejm_46	lig_jmc_28	-0.273	0.060
(openfe) Yvette:...tmp-inspect-results/results dwhs$ openfe gather --report leg results
leg	ligand_i	ligand_j	DG(i->j) (kcal/mol)	uncertainty (kcal/mol)
complex	lig_ejm_31	lig_ejm_42	-14.95	0.12
solvent	lig_ejm_31	lig_ejm_42	-15.707	0.031
complex	lig_ejm_31	lig_ejm_46	-40.749	0.036
solvent	lig_ejm_31	lig_ejm_46	-39.858	0.054
complex	lig_ejm_31	lig_ejm_47	-27.81	0.13
solvent	lig_ejm_31	lig_ejm_47	-27.829	0.060
complex	lig_ejm_31	lig_ejm_48	-16.141	0.085
solvent	lig_ejm_31	lig_ejm_48	-16.755	0.026
complex	lig_ejm_31	lig_ejm_50	-57.333	0.038
solvent	lig_ejm_31	lig_ejm_50	-58.333	0.024
complex	lig_ejm_42	lig_ejm_43	-18.92	0.15
solvent	lig_ejm_42	lig_ejm_43	-20.277	0.026
complex	lig_ejm_46	lig_jmc_23	17.418	0.058
solvent	lig_ejm_46	lig_jmc_23	17.123	0.064
complex	lig_ejm_46	lig_jmc_27	15.813	0.086
solvent	lig_ejm_46	lig_jmc_27	15.914	0.053
complex	lig_ejm_46	lig_jmc_28	23.140	0.037
solvent	lig_ejm_46	lig_jmc_28	23.413	0.047

Biggest concern is that right now, if you combine RBFE and and RHFE results, you can't tell which are which in the DDG setup. I think I'd like to completely disallow combining those in one analysis; I'm not entirely sure that there's a good use case for that in a single output file.

Developers certificate of origin

@@ -59,10 +200,20 @@ def legacy_get_type(res_fn):
type=click.Path(dir_okay=True, file_okay=False,
path_type=pathlib.Path),
required=True)
@click.option(
'--report',
type=click.Choice(['dg', 'ddg', 'leg'], case_sensitive=False),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very open to other names here, both for --report and for dg/ddg/leg (maybe abs/rel/leg?)

Copy link
Contributor

@hannahbaumann hannahbaumann Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, dg/ddg are very intuitive. "Leg" was less intuitive for me, but I'm not sure what might be better, how others felt about it. Maybe "legs" could be better, since then you expect more than one?

Regarding precision of the values: In the Mobleylab we had the style guide to always report uncertainties with only one significant digit of precision, so e.g. 17.418 ± 0.058 would become 17.42 ± 0.06 or 17.418 ± 0.008 would remain 17.418 ± 0.008,... I'm not sure what the most common way of handling this is though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed "leg" (or even "legs") is rather undescriptive, especially to non-english speakers.

I'm not sure what a better word would be, "individual"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: uncertainties, I don't have strong feelings but I definitely would at least advise for setting the uncertainty to the same level of precision as the reported number (i.e. the amount of significant figures you truncate at indicates an implicit view of the uncertainty of your results, so 17.14 has an implicit minimum error of 0.01).

openfecli/commands/gather.py Outdated Show resolved Hide resolved
openfecli/commands/gather.py Show resolved Hide resolved
openfecli/commands/gather.py Show resolved Hide resolved
openfecli/commands/gather.py Show resolved Hide resolved
openfecli/commands/gather.py Show resolved Hide resolved
writer.writerow([ligA, ligB, DDGhyd, hyd_unc])

def _write_raw_dg(legs, writer):
writer.writerow(["leg", "ligand_i", "ligand_j", "DG(i->j) (kcal/mol)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the optimal case I would like this to write out all the individual replicas. Any thoughts on the necessary changes to results to ensure this? (I suspect it's a method we'd have to add at the gufe level)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be protocol-specific. Our default protocol has, in its outputs dict, outputs["unit_estimate"], which we could extract. But we make no promise that every unit will have this. Indeed, it doesn't make sense to do so: if you used a separate parametrization unit, there would be no meaning to that unit having a "unit_estimate" result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be something we need to add to the abstract class - a method for getting a breakdown by repeat.

dwhswenson and others added 2 commits July 27, 2023 14:17
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
default="dg", show_default=True,
help=(
"What data to report. 'dg' gives maximum-likelihood estimate of "
"asbolute deltaG, 'ddg' gives delta-delta-G, and 'leg' gives the "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"asbolute deltaG, 'ddg' gives delta-delta-G, and 'leg' gives the "
"asbolute deltaG, 'ddg' gives delta-delta-G, and 'legs' gives the "

ddg_legs? environments? transformations? raw?

@hannahbaumann @RiesBen please make a choice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for dg_raw, though it's not a strong preference.

@IAlibay IAlibay added this to the Release 0.12.0 milestone Jul 31, 2023
@dwhswenson
Copy link
Member Author

dwhswenson commented Aug 1, 2023

I've updated this to use the names dg (default), ddg, and dg-raw. It'll still need more updates to tests, but those will only be triggered for CI once this is merged into #495.

Since this is a PR into another PR, and tests won't trigger until it is in that PR, I might merge this without further review. I will leave this open for review for at least 18 hours, merging no earlier than Wed 02 Aug 13:00 GMT (08:00 my local).

@dwhswenson dwhswenson requested a review from IAlibay August 1, 2023 18:50
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@dwhswenson dwhswenson merged commit dbe79d2 into MLE_on_gather Aug 2, 2023
@dwhswenson dwhswenson deleted the split-gather branch August 2, 2023 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants