Add draft of "yank analyze cluster" CLI #1020

Open
wants to merge 1 commit into
base: master

jchodera
Member

This PR adds an early experimental version of the yank analyze cluster CLI option needed by @steven-albanese for analyzing Id1 data.

The algorithm:

  • Compute per-snapshot weights (using MBAR) representing the relative weight of each snapshot in the fully interacting state
  • Cluster the remaining snapshots (those not discarded by the --filter distance check)
  • Assign relative populations to the clusters
  • Sort clusters by population, writing only the most populous clusters
  • Sample representative snapshots from each cluster in proportion to their weights, writing out PDB files
  • Write out cluster populations
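
The population, sorting, and sampling steps above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names are hypothetical, the per-snapshot log weights are assumed to have already been computed (e.g. by MBAR), the cluster labels are assumed to come from whatever clustering method is used, and the threshold is assumed to mean a cumulative population fraction.

```python
import numpy as np

def cluster_populations(log_weights, labels):
    """Sum normalized snapshot weights within each cluster (hypothetical helper)."""
    log_weights = np.asarray(log_weights, dtype=float)
    labels = np.asarray(labels)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(log_weights - log_weights.max())
    weights /= weights.sum()
    cluster_ids = np.unique(labels)
    populations = np.array([weights[labels == c].sum() for c in cluster_ids])
    return weights, cluster_ids, populations

def select_top_clusters(cluster_ids, populations, threshold=0.95):
    """Keep the most populous clusters until their cumulative population
    reaches `threshold` (ASSUMPTION: this is how the threshold is interpreted)."""
    order = np.argsort(populations)[::-1]  # most populous first
    cumulative = np.cumsum(populations[order])
    n_keep = min(int(np.searchsorted(cumulative, threshold)) + 1, len(order))
    keep = order[:n_keep]
    return cluster_ids[keep], populations[keep]

def sample_snapshots(weights, labels, cluster_id, nsnapshots=5, rng=None):
    """Draw snapshot indices from one cluster with probability proportional
    to their weights, without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    members = np.flatnonzero(np.asarray(labels) == cluster_id)
    p = weights[members] / weights[members].sum()
    # Cannot draw more distinct snapshots than have nonzero probability.
    n = min(nsnapshots, int(np.count_nonzero(p)))
    return rng.choice(members, size=n, replace=False, p=p)
```

The sampled indices would then be used to pull frames from the trajectory and write them out as PDB files, which is omitted here.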

The code could use a lot of refactoring. It currently accesses the complex.nc NetCDF directly, but this could be greatly simplified as we reorganize and refactor the MultiStateAnalyzer to expose smaller operational chunks via an API.

@jchodera
Member Author

@Lnaden and @andrrizzi: Maybe you can pull in the latest changes made to master since this was opened and review this at some point?

@jchodera
Member Author

Alan Graves suggests we add an option to cluster the whole receptor+ligand complex coordinates, rather than just the ligand coordinates, in case the ligand and protein change conformation together.

@jchodera
Member Author

jchodera commented Jul 20, 2018

Here's the CLI help for @sgill2:

yank analyze cluster --refpdb=REFPDB --complexnetcdf=FILEPATH [--prefix=PREFIX] [--filter=FILTERDIST] [--cutoff=CUTOFFDIST] [--nsnapshots=NSNAPSHOTS] [--threshold=THRESHOLD] [-v | --verbose]
 
Cluster Required Arguments:
  --refpdb=REFPDB               Reference PDB filename for solvated complex
  --complexnetcdf=FILEPATH      Path to the complex analysis NetCDF file
  --prefix=PREFIX               Prefix to use for output cluster PDB files and populations (default: cluster)
  --filter=FILTERDIST           Discard snapshots where the ligand is farther than this minimum heavy atom distance from the protein, in nanometers (default: 0.3)
  --cutoff=CUTOFFDIST           Heavy-atom RMSD separation between clusters, in nanometers (default: 0.3)
  --nsnapshots=NUM_SNAPSHOTS    Number of snapshots per cluster to write (default: 5)
  --cluster_filter_threshold=THRESHOLD    Threshold to use for which clusters to include (default: 0.95)
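
To illustrate what the --filter option describes, here is a rough sketch of a minimum heavy-atom distance filter. It is not taken from this PR: the function names and array layout are hypothetical, coordinates are assumed to be NumPy arrays in nanometers containing heavy atoms only, and periodic boundary conditions are ignored for simplicity.

```python
import numpy as np

def min_heavy_atom_distance(ligand_xyz, protein_xyz):
    """Smallest ligand-protein heavy-atom distance for one snapshot, in nm.

    ligand_xyz:  (n_ligand_atoms, 3) array
    protein_xyz: (n_protein_atoms, 3) array
    ASSUMPTION: no periodic imaging is applied.
    """
    diff = ligand_xyz[:, None, :] - protein_xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min()

def filter_snapshots(ligand_traj, protein_traj, filter_dist=0.3):
    """Return indices of snapshots to keep: those where the ligand is NOT
    farther than `filter_dist` (nm) from the protein."""
    keep = [
        i
        for i, (lig, prot) in enumerate(zip(ligand_traj, protein_traj))
        if min_heavy_atom_distance(lig, prot) <= filter_dist
    ]
    return np.array(keep, dtype=int)
```

With the default of 0.3 nm, a snapshot is kept only if at least one ligand heavy atom lies within 0.3 nm of a protein heavy atom.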

Member

@steven-albanese left a comment

I can't really comment on the code itself, but I was able to use this for the Id1 project. I don't think I ran into any issues or bugs, and the results looked reasonable to me.

@jchodera
Member Author

Thanks! Would be great to get this into the next feature release!

2 participants