Add draft of "yank analyze cluster" CLI #1020

jchodera · 2018-06-18T07:08:43Z

This PR adds an early experimental version of the yank analyze cluster CLI option needed by @steven-albanese for analyzing Id1 data.

The algorithm:

1. Compute per-snapshot weights (using MBAR) representing the relative weight of each snapshot in the fully interacting state
2. Cluster the remaining snapshots
3. Assign relative populations to the clusters
4. Sort clusters by population, writing only the most populous clusters
5. Sample representative snapshots from the clusters in proportion to their weights, writing out PDB files
6. Write out cluster populations
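
For readers following along, the population and sampling steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code in this PR: it assumes the per-snapshot log weights (from MBAR) and the cluster labels are already available as arrays, and the function names are hypothetical. It also assumes the threshold is a cumulative population fraction, which the help text does not confirm.

```python
import numpy as np

def cluster_populations(log_weights, labels):
    """Sum normalized snapshot weights per cluster, sorted most populous first."""
    log_weights = np.asarray(log_weights, dtype=float)
    labels = np.asarray(labels)
    # Normalize in log space so large log weights do not overflow.
    weights = np.exp(log_weights - log_weights.max())
    weights /= weights.sum()
    cluster_ids = np.unique(labels)
    populations = np.array([weights[labels == c].sum() for c in cluster_ids])
    order = np.argsort(-populations, kind="stable")
    return cluster_ids[order], populations[order], weights

def select_clusters(populations, threshold=0.95):
    """Number of leading clusters needed to reach the cumulative threshold.

    ASSUMPTION: threshold is a cumulative population fraction.
    """
    cumulative = np.cumsum(populations)
    n_keep = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n_keep, len(populations))

def sample_representatives(weights, labels, cluster_id, nsnapshots=5, rng=None):
    """Draw snapshot indices from one cluster in proportion to their weights."""
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)
    labels = np.asarray(labels)
    members = np.flatnonzero(labels == cluster_id)
    p = weights[members] / weights[members].sum()
    # Sampling without replacement cannot return more snapshots than have nonzero weight.
    n = min(nsnapshots, int(np.count_nonzero(p)))
    return rng.choice(members, size=n, replace=False, p=p)
```

Sampling without replacement keeps the written PDB files distinct; sampling with replacement would be the alternative if duplicates are acceptable.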

The code could use a lot of refactoring. It currently accesses the complex.nc NetCDF directly, but this could be greatly simplified as we reorganize and refactor the MultiStateAnalyzer to expose smaller operational chunks via an API.

jchodera · 2018-07-20T17:46:08Z

@Lnaden and @andrrizzi: Maybe you can pull in the latest changes from master since this was opened and review this at some point?

jchodera · 2018-07-20T17:48:28Z

Alan Graves suggests we add an option to cluster the whole receptor+ligand complex coordinates, rather than just the ligand coordinates, in case the ligand and protein change conformation together.
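
To illustrate what that option would change, here is a rough sketch in which the atoms used for the RMSD metric are a parameter, so the same clustering routine can run on ligand atoms only or on the whole complex. It uses a simple greedy leader clustering with an RMSD cutoff, which is a stand-in and not necessarily the clustering method used in this PR. It assumes the frames are already aligned on the receptor and that coordinates are in nanometers; all names are hypothetical.

```python
import numpy as np

def rmsd(x, y, atom_indices):
    """RMSD between two frames over the selected atoms (no superposition)."""
    diff = x[atom_indices] - y[atom_indices]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def leader_cluster(frames, atom_indices, cutoff=0.3):
    """Greedy leader clustering.

    Each frame joins the first existing cluster whose center is within
    `cutoff` RMSD; otherwise it becomes the center of a new cluster.
    """
    centers = []  # frame index of each cluster center
    labels = np.empty(len(frames), dtype=int)
    for i, frame in enumerate(frames):
        for k, c in enumerate(centers):
            if rmsd(frame, frames[c], atom_indices) < cutoff:
                labels[i] = k
                break
        else:
            centers.append(i)
            labels[i] = len(centers) - 1
    return labels, centers
```

Passing ligand indices reproduces ligand-only clustering, while passing ligand plus receptor indices separates snapshots in which the protein changes conformation even if the ligand pose is unchanged.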

jchodera · 2018-07-20T17:49:34Z

Here's the CLI help for @sgill2:

yank analyze cluster --refpdb=REFPDB --complexnetcdf=FILEPATH [--prefix=PREFIX] [--filter=FILTERDIST] [--cutoff=CUTOFFDIST] [--nsnapshots=NSNAPSHOTS] [--threshold=THRESHOLD] [-v | --verbose]
 
Cluster Required Arguments:
  --refpdb=REFPDB               Reference PDB filename for solvated complex
  --complexnetcdf=FILEPATH      Path to the complex analysis NetCDF file
  --prefix=PREFIX               Prefix to use for output cluster PDB files and populations (default: cluster)
  --filter=FILTERDIST           Discard snapshots where the ligand is farther than this minimum heavy atom distance from the protein, in nanometers (default: 0.3)
  --cutoff=CUTOFFDIST           Heavy-atom RMSD separation between clusters, in nanometers (default: 0.3)
  --nsnapshots=NUM_SNAPSHOTS    Number of snapshots per cluster to write (default: 5)
  --cluster_filter_threshold=THRESHOLD    Threshold to use for which clusters to include (default: 0.95)
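
As a rough picture of what the --filter option describes, the sketch below discards snapshots whose minimum ligand-protein heavy-atom distance exceeds the filter distance. It is an illustration under assumptions, not the PR's implementation: the index arrays are assumed to already contain only heavy atoms, coordinates are assumed to be in nanometers, periodic boundary conditions are ignored, and the function names are hypothetical.

```python
import numpy as np

def min_distance(frame, ligand_indices, protein_indices):
    """Smallest ligand-protein interatomic distance in one frame."""
    lig = frame[ligand_indices]
    prot = frame[protein_indices]
    # Pairwise distance matrix of shape (n_ligand, n_protein).
    d = np.linalg.norm(lig[:, None, :] - prot[None, :, :], axis=-1)
    return float(d.min())

def filter_snapshots(frames, ligand_indices, protein_indices, filter_dist=0.3):
    """Indices of frames where the ligand is within filter_dist of the protein."""
    keep = [
        i for i, frame in enumerate(frames)
        if min_distance(frame, ligand_indices, protein_indices) <= filter_dist
    ]
    return np.array(keep, dtype=int)
```

The surviving indices would then be the snapshots passed on to clustering.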

steven-albanese

I can't really comment on the code itself, but I was able to use this for the Id1 project. I don't think I ran into any issues or bugs, and the results looked reasonable to me.

jchodera · 2018-07-30T17:19:15Z

Thanks! Would be great to get this into the next feature release!

Add draft of "yank analyze cluster"

33eaed9

jchodera requested review from Lnaden and steven-albanese June 18, 2018 07:08

steven-albanese approved these changes Jul 30, 2018
