Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smturzo/rosetta gamess doc update #68

Open
wants to merge 47 commits into
base: vmullig/rosetta_gamess_documentation
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
7705cc0
Adding documentation for RosettaQM
smturzo May 1, 2024
bed9fb3
Writing tutorial about SFXN
smturzo May 2, 2024
b3a75c0
Taking over GeomOpt tutorial from Noora
smturzo May 3, 2024
d875b99
GeomOpt Tutorial
smturzo May 3, 2024
1af03df
Made figure for geomopt, need to make figure for GeomOpt region 2.
smturzo May 3, 2024
fc0218a
GeometryOptimizationRosettaQM.md
smturzo May 3, 2024
3edf558
GeometryOptimizationRosettaQM.md update
smturzo May 3, 2024
0b65e55
GeometryOptimizationRosettaQM.md update
smturzo May 3, 2024
c069698
GeometryOptimizationRosettaQM.md update
smturzo May 3, 2024
128edc1
GeometryOptimizationRosettaQM.md update
smturzo May 3, 2024
544d653
Updating geomopt tutorial
smturzo May 3, 2024
64c9460
Updating geomopt tutorial
smturzo May 3, 2024
f25b4e1
Updating geomopt tutorial
smturzo May 3, 2024
61ecfee
Updating geomopt tutorial
smturzo May 3, 2024
3121837
Updating geomopt tutorial
smturzo May 3, 2024
fbb1d8b
Updating geomopt tutorial
smturzo May 3, 2024
927bc58
Updating geomopt tutorial
smturzo May 3, 2024
a303a3c
Updating geomopt tutorial
smturzo May 3, 2024
0e60daa
Updating geomopt tutorial
smturzo May 3, 2024
ee618eb
Updating geomopt tutorial
smturzo May 3, 2024
087d209
Updating geomopt tutorial
smturzo May 3, 2024
eda5436
Updating geomopt tutorial
smturzo May 3, 2024
332e288
Updating geomopt tutorial
smturzo May 3, 2024
e4b6e21
Updating geomopt tutorial
smturzo May 3, 2024
7e2fe2f
Updating geomopt tutorial
smturzo May 3, 2024
c8fb86d
Updating geomopt tutorial
smturzo May 3, 2024
61483d1
Updating geomopt tutorial
smturzo May 3, 2024
40dc268
Updating geomopt tutorial
smturzo May 3, 2024
9d4dd77
Updating geomopt tutorial
smturzo May 3, 2024
46889f8
Updating geomopt tutorial
smturzo May 3, 2024
89b9a4c
Updating geomopt tutorial
smturzo May 3, 2024
952592a
Updating geomopt tutorial
smturzo May 3, 2024
14fce4b
markdown format issue brrrrr
smturzo May 3, 2024
babd0b0
markdown format issue brrrrr
smturzo May 3, 2024
4fad911
markdown format issue brrrrr
smturzo May 3, 2024
a193fbe
markdown format issue brrrrr
smturzo May 3, 2024
4a44cb1
markdown format issue brrrrr
smturzo May 3, 2024
2f1c86a
markdown format issue brrrrr
smturzo May 3, 2024
1cb98fd
markdown format issue brrrrr
smturzo May 3, 2024
a71bf3c
markdown format issue brrrrr
smturzo May 3, 2024
2ad77c1
markdown format issue brrrrr
smturzo May 3, 2024
dc4fafd
markdown format issue brrrrr
smturzo May 3, 2024
dbe4c10
markdown format issue brrrrr
smturzo May 3, 2024
63d63af
markdown format issue brrrrr
smturzo May 3, 2024
a8eb63b
markdown format issue brrrrr
smturzo May 3, 2024
257e2c0
markdown format issue brrrrr
smturzo May 3, 2024
d58ab59
Updated ScoreFunction dc
smturzo May 8, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added images/GeometryOptimizationRosettaQM_region1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/GeometryOptimizationRosettaQM_region2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/region1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Empty file.
55 changes: 55 additions & 0 deletions rosetta_basics/scoring/ScoreFunction.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# Score Function (SCOREFXNS)
Documentation created by SM Bargeen Alam Turzo (bturzo@flatironinstitute.org), Flatiron Institute on 05/02/2024

<i>Note: This page documents the explanation of the ```SCOREFXNS``` in RosetttaScripts. And specifically it dives into how to set up a RosettaQM (Quantum Mechanics) Score Function.</i>

<b><i>Also note: The ```RosettaQM``` is currently unpublished. If you use this in your project, please contact Vikram Mulligan (vmulligan@flatironinstitute.org) to list all the others of RosettaQM in your project.</i></b>

[[_TOC_]]

## Purpose and algorithm
The Rosetta score function approximates the energy of a biomolecular conformation. The score function is generally composed of a linear combination of many energy terms (called score terms).
Each score term corresponds to some physical and/or experimental property of the biomolecule.

The score function is generally used in many application of biomolecular proptocol in Rosetta. Use cases could be in protein structure prediction, design, conformational sampling etc.

Therefore a score function can be used in conjunction with any existing design algorithm that uses the packer (including the [[FastDesign|FastDesignMover]] and [[PackRotamers|PackRotamersMover]] movers, or the [[fixbb]] application).
## User control

### RosettaScripts control
The initial block of a score function in RosettaScripts looks something like this:

```xml
<SCOREFXNS>
<ScoreFunction name="r15" weights="ref2015_cart.wts"/>
</SCOREFXNS>
```
In the above example RosettaScripts block, we define the main Score Function block as ```<SCOREFXNS> ... </SCOREFXNS>``` and then within this block we define the rosetta default scorefunction ref2015 with the sub-block ```<ScoreFunction>```
In the ``` <ScoreFunction> ``` sub-block we have the first tag ``` name ``` which is used to give this score function a name that we can use within our full protocol later on. In this case we are calling it ``` r15 ```.
The next tag is the ```weights``` tag. This is important and this is where we are telling which weights file to pick. Here the ```ref2015_cart.wts``` refers to the ref2015 weights file. More weights file can be found in this part of your Rosetta directory ``` main/database/scoring/weights ```.

Similarly you can also create a RosettaQM scorefunction with slightly more options. A template for setting up RosettaQM scorefunction is shown below:

```xml
<SCOREFXNS>
<ScoreFunction name="sfxn_qm" weights="empty" >
<Reweight scoretype="gamess_qm_energy" weight="1.0"/>
<Set gamess_electron_correlation_treatment="HF"
gamess_ngaussian="3" gamess_basis_set="N21"
gamess_npfunc="1" gamess_ndfunc="1"
gamess_threads="4" gamess_use_scf_damping="true"
gamess_use_smd_solvent="true" gamess_max_scf_iterations="50"
gamess_multiplicity="1" />
</ScoreFunction>
</SCOREFXNS>
```
In the above xml block. We are now defining a ScoreFunction block named `sfxn_qm` where we first assign an empty weights, so that no Rosetta energy weights is passed and then we reweight it with the option "gamess_qm_energy" that is energy method defined for the QM software package GAMESS (installed externally by the user).
Next we set up the QM method (Hartree Fock [HF]) with the basis set 3-21gand to run that in parallel over 4 threads. We are using the smd solvation model and a max SCF iterations of 50. Here the multiplicity is set to 1.
All these options are system dependent and the above template is an example of what needs to be done to set up a QM scorefunction. This SHOULD NOT be used as default settings to run RosettaQM.

To get a list of all the available tags and options for ScoreFunction you can excute the rosettascripts executable with the `-info ScoreFunction` flag. So it will look something like this:

``` rosetta_scripts.linuxgccrelease -info ScoreFunction ```


[[include:sfxn_loader_ScoreFunction_type]]
Original file line number Diff line number Diff line change
@@ -0,0 +1,273 @@
# Geometry Optimization using RosettaQM

## Summary
In this tutorial, we are going to give an example of geometry optimization for a protein.

## Tutorial
In this case, we have part of a zinc-finger protein. We want to optimize the geometry of protein:
* in proximity of zinc, with quantum mechanics,
* its neighbourhood with a lower level of theory quantum mechanics,
* and the rest of protein with Rosetta's score function.

<figure align="center">
<img src="../../../images/GeometryOptimizationRosettaQM_region1.png" alt="drawing" width="600"/>
<figcaption>Figure 1. Atomistic view of region 1 is shown, where the zinc atom is surrounded by two cystein residues and two histidine residues.</figcaption>
</figure>

Therefore, the protein will be seperated into three regions, which will be called `qm_region1`, `qm_region2` and `region3` (that will not be explicitly defined), respectively.

Then, different score function will be applied to each reigon with capping rules.

And then finally geometry optimization will be applied based on the regions and the capping rules applied for each region.

Residue selectors ae used to specify each region.qm_region1 is selected by residue numbers, and qm_region2 is selected by Neighborhood selector.

This selector compares the distance between beta carbons of selection (in this case, qm_region1).
If the distance is less than or equal to the threshold (in this case, 6.0A) it selects that residue.
This is selected with the tag `distance` and setting to 6.0 (units for this tag are in angstroms).
Setting the include_focus_in_subset tag to false means that qm_region2 excludes qm_region1. Figure 2. highights qm_region1 with orange, and qm_region2 with red and the remaining region (region3) with blue.
Note that `region3` is not defined because the MultiScoreFunction will automatically define what is remaining and define that as region3.


Therefore, the residue selector block looks like this:

```xml
<RESIDUE_SELECTORS>
<!-- Define regions of interest -->
<!-- qm_region1 is the region where the most computationally expensive calculation is going to take place. -->
<Index name="qm_region1" resnums="8,11,24,28,29"/>
<!-- qm_region2 is the region where the second most computationally expensive calculation is going to take place. -->
<!-- Within the Neighborhood qm_region2 defines the region that is within 6.0 from qm_region1 and not including the atoms in qm_region1 (which is done using the tag include_focus_in_subset -->
<Neighborhood name="qm_region2" selector="qm_region1" distance="6.0" include_focus_in_subset="false" />
</RESIDUE_SELECTORS>
```

<figure align="center">
<img src="../../../images/GeometryOptimizationRosettaQM_region2.png" alt="drawing" width="600"/>
<figcaption>Figure 2. Different regions of the multiscore function are shown here. qm_region1 defined with qm_hf is shown in orange. qm_region2 defined with qm_hf3c_fmo is shown in red and remaining region defined with Ref2015 is shown in blue. </figcaption>
</figure>
For each region, a different method defined within the ScoreFunction of SCOREFXNS of RosettaScripts needs to be used.
The following table gives a summary of that.
<br />

| Region's name | Score function |
|-----------------------------------------|------------------|
| qm_region1 (orange) | HF/3-21G |
| qm_region2 (red) | HF-3c/FMO |
| region3 (blue, not explicitly defined) | Rosetta ref2015 |

<br />
<br />

The following SCOREFXNS shows how each of the ScoreFunction block is set up for each region and how the MultiScoreFunction blocks are set up. To learn more about:
- ScoreFunction block: [Go here]
- MultiScoreFunction block: [Go here]

```xml
<SCOREFXNS>
<!-- In order to do multi-scale modeling, we need to set up the different score function that will used for the different regions of your system. -->

<!-- Here we define ref15 aka Rosetta's score function -->
<ScoreFunction name="r15" weights="ref2015_cart.wts"/>
<!-- Here we define the score function with RosettaQM setting for the most rigorous part aka qm_region1 -->
<!-- Although note that we don't mention qm_region1 in our score function. That is later.-->
<ScoreFunction name="qm_hf" weights="empty" >
<Reweight scoretype="gamess_qm_energy" weight="1.0" />
<Set gamess_electron_correlation_treatment="HF"
gamess_ngaussian="3" gamess_basis_set="N21"
gamess_npfunc="1" gamess_ndfunc="1"
gamess_threads="4" gamess_use_scf_damping="true"
gamess_use_smd_solvent="true" gamess_max_scf_iterations="%%scf_iter%%"
gamess_multiplicity="1" />
</ScoreFunction>

<!-- Here we define the score function with RosettaQM setting for the second rigorous part aka qm_region2 -->
<!-- Although note that we don't mention qm_region2 in our score function. That is later.-->
<!-- Also note that here we are using hybrid molecular orbital approx (HMO) -->
<ScoreFunction name="qm_hf3c_fmo" weights="empty" >
<Reweight scoretype="gamess_qm_energy" weight="1.0" />
<Set gamess_electron_correlation_treatment="SE"
gamess_basis_set="HF-3C" gamess_use_smd_solvent="true"
gamess_use_scf_damping="true"
gamess_use_h_bond_length_constraints="true"
gamess_h_bond_length_constraint_force="10"
gamess_threads="4" gamess_max_scf_iterations="%%scf_iter%%"
gamess_multiplicity="1" gamess_fmo_calculation="true" gamess_hybrid_molecular_orbital="HF-3c"
gamess_hybrid_molecular_orbital_file="/mnt/home/bturzo/ceph/Applications/gamess/tools/fmo/HMO/HMOs.txt"
gamess_max_fmo_monomer_scf_iterations="%%scf_iter%%" />
</ScoreFunction>

<!-- In the MucltiScoreFunction block is where you tell which region is going to be treated with which level calculation-->
<!-- Since this a onion layer style calculation dump_pdbs="true" option will dump out all the layers of system that has been defined-->
<MultiScoreFunction name="msfxn" dump_pdbs="true" >
<!-- SimpleCombinationRule is how to combine the region. And how to subtract of the energies from each region in order avoid double counting-->
<!-- There are other more complicated rules but that will not be discussed here and is a treat for another tutorial-->
<SimpleCombinationRule />
<!-- CappedBondResolutionRule by default does sensible capping, other capping rules available and can be passed with options-->
<Region scorefxn="qm_hf" residue_selector="qm_region1" >
<CappedBondResolutionRule/>
</Region>
<!--Same CappedBondResolutionRule for qm_region2-->
<Region scorefxn="qm_hf3c_fmo" residue_selector="qm_region2">
<CappedBondResolutionRule/>
</Region>
<!-- Whatever region is left after qm_region2 automatically gets defined to region3 and is now scored with r15 (as in rosetta score function) -->
<Region scorefxn="r15" >
<CappedBondResolutionRule/>
</Region>
</MultiScoreFunction>

</SCOREFXNS>
```

Next we are going to set up TASKOPERATIONS block that we will later use to do a quick FastRelax (defined in the MOVERS block) in Rosetta before QM geometry optimization.

For mone on how to setup:
- TASKOPERATIONS: [Go here]
- IncludeCurrent: [Go here]
- ExtraRotamersGeneric: [Go here]

```xml
<TASKOPERATIONS>
<IncludeCurrent name="include_current_rotamer" />
<!-- EXTRA ROTAMERS GENERIC during fast relax ex1 and ex2 adjust number of extra chi cuttoff-->
<ExtraRotamersGeneric name="extra_sample_rotamers" ex1="1" ex2="1" ex1aro="1" ex2aro="1"/>
</TASKOPERATIONS>
```


Next we are going to set up the `MOVERS` block. This is where we will call the `FastRelax`. We will also call the `GamessQMGeometryOptimizationMover` and pass the `MultiScoreFunction` that we named `msfxn` to the mover. We do this as follows:
For more on how to setup:
- `FastRelax`: [Go here]
- `GamessQMGeometryOptimizationMover`: [Go here]

```xml
<MOVERS>
<FastRelax name="fast_relax" scorefxn="r15" disable_design="true" task_operations="include_current_rotamer,extra_sample_rotamers" repeats="100" relaxscript="default" min_type="lbfgs_armijo_nonmonotone"/>
<GamessQMGeometryOptimizationMover name="qm_geo_opt_msfxn" gamess_threads="4" msfxn_name="msfxn" msfxn_classical_cartmin="true" />
</MOVERS>
```

Next we are goint to set up the `PROTOCOLS` block. In this block we are calling the `FastRelax` and then the `GamessQMGeometryOptimizationMover` sequentially.
So once `FastRelax` is done Rosetta will pass the Rosetta relaxed structure to `GamessQMGeometryOptimizationMover` for QM + RosettaMM geometry optimization.

```xml
<PROTOCOLS>
<Add mover="fast_relax" />
<Add mover="qm_geo_opt_msfxn"/>
</PROTOCOLS>
```

The full RosettaScripts looks like this:

```xml

<ROSETTASCRIPTS>
<RESIDUE_SELECTORS>
<!-- Define regions of interest -->
<!-- qm_region1 is the region where the most computationally expensive calculation is going to take place. -->
<Index name="qm_region1" resnums="8,11,24,28,29"/>
<!-- qm_region2 is the region where the second most computationally expensive calculation is going to take place. -->
<!-- Within the Neighborhood qm_region2 defines the region that is within 6.0A from qm_region1 and not including the atoms in qm_region1 (which is done using the tag include_focus_in_subset -->
<Neighborhood name="qm_region2" selector="qm_region1" distance="6.0" include_focus_in_subset="false" />
</RESIDUE_SELECTORS>
<SCOREFXNS>
<!-- In order to do multi-scale modeling, we need to set up the different score function that will used for the different regions of your system. -->

<!-- Here we define ref15 aka Rosetta's score function -->
<ScoreFunction name="r15" weights="ref2015_cart.wts"/>
<!-- Here we define the score function with RosettaQM setting for the most rigorous part aka qm_region1 -->
<!-- Although note that we don't mention qm_region1 in our score function. That is later.-->
<ScoreFunction name="qm_hf" weights="empty" >
<Reweight scoretype="gamess_qm_energy" weight="1.0" />
<Set gamess_electron_correlation_treatment="HF"
gamess_ngaussian="3" gamess_basis_set="N21"
gamess_npfunc="1" gamess_ndfunc="1"
gamess_threads="4" gamess_use_scf_damping="true"
gamess_use_smd_solvent="true" gamess_max_scf_iterations="%%scf_iter%%"
gamess_multiplicity="1" />
</ScoreFunction>

<!-- Here we define the score function with RosettaQM setting for the second rigorous part aka qm_region2 -->
<!-- Although note that we don't mention qm_region2 in our score function. That is later.-->
<!-- Also note that here we are using hybrid molecular orbital approx (HMO) -->
<ScoreFunction name="qm_hf3c_fmo" weights="empty" >
<Reweight scoretype="gamess_qm_energy" weight="1.0" />
<Set gamess_electron_correlation_treatment="SE"
gamess_basis_set="HF-3C" gamess_use_smd_solvent="true"
gamess_use_scf_damping="true"
gamess_use_h_bond_length_constraints="true"
gamess_h_bond_length_constraint_force="10"
gamess_threads="4" gamess_max_scf_iterations="%%scf_iter%%"
gamess_multiplicity="1" gamess_fmo_calculation="true" gamess_hybrid_molecular_orbital="HF-3c"
gamess_hybrid_molecular_orbital_file="/mnt/home/bturzo/ceph/Applications/gamess/tools/fmo/HMO/HMOs.txt"
gamess_max_fmo_monomer_scf_iterations="%%scf_iter%%" />
</ScoreFunction>

<!-- In the MucltiScoreFunction block is where you tell which region is going to be treated with which level calculation-->
<!-- Since this a onion layer style calculation dump_pdbs="true" option will dump out all the layers of system that has been defined-->
<MultiScoreFunction name="msfxn" dump_pdbs="true" >
<!-- SimpleCombinationRule is how to combine the region. And how to subtract of the energies from each region in order avoid double counting-->
<!-- There are other more complicated rules but that will not be discussed here and is a treat for another tutorial-->
<SimpleCombinationRule />
<!-- CappedBondResolutionRule by default does sensible capping, other capping rules available and can be passed with options-->
<Region scorefxn="qm_hf" residue_selector="qm_region1" >
<CappedBondResolutionRule/>
</Region>
<!--Same CappedBondResolutionRule for qm_region2-->
<Region scorefxn="qm_hf3c_fmo" residue_selector="qm_region2">
<CappedBondResolutionRule/>
</Region>
<!-- Whatever region is left after qm_region2 automatically gets defined to region3 and is now scored with r15 (as in rosetta score function) -->
<Region scorefxn="r15" >
<CappedBondResolutionRule/>
</Region>
</MultiScoreFunction>

</SCOREFXNS>
<TASKOPERATIONS>
<IncludeCurrent name="include_current_rotamer" />
<!-- EXTRA ROTAMERS GENERIC during fast relax ex1 and ex2 adjust number of extra chi cuttoff-->
<ExtraRotamersGeneric name="extra_sample_rotamers" ex1="1" ex2="1" ex1aro="1" ex2aro="1"/>
</TASKOPERATIONS>
<SIMPLE_METRICS>
</SIMPLE_METRICS>
<MOVERS>
<FastRelax name="fast_relax" scorefxn="r15" disable_design="true" task_operations="include_current_rotamer,extra_sample_rotamers" repeats="100" relaxscript="default" min_type="lbfgs_armijo_nonmonotone"/>
<GamessQMGeometryOptimizationMover name="qm_geo_opt_msfxn" gamess_threads="4" msfxn_name="msfxn" msfxn_classical_cartmin="true" />
</MOVERS>
<PROTOCOLS>
<Add mover="fast_relax" />
<Add mover="qm_geo_opt_msfxn"/>
</PROTOCOLS>
</ROSETTASCRIPTS>

```


In order to run this we will run it with `rosetta_scripts.cxx11thread.linuxgccrelease` executable found in the bin folder (once Rosetta has been compiled) with the following flags:
```
-in:file:fullatom
-in:auto_setup_metals true
-relax:constrain_relax_to_start_coords true
-coord_constrain_sidechains false
-ramp_constraints false
-in:file:s ./ZF_Cys2His2_0001_0001.pdb
-in:file:native ./ZF_Cys2His2_0001_0001.pdb
-out:prefix ./geom_opt_
-out:file:scorefile ./geom_opt.sc
-parser:protocol ./geom_opt2.xml
-parser:script_vars scf_iter=50
-rosetta_GAMESS_bridge_temp_directory ./temp/
-quantum_mechanics:GAMESS:gamess_error_on_nonideal_bond_length_after_opt false
-clean_rosetta_GAMESS_bridge_temp_directory false
-GAMESS:gamess_qm_energy_geo_opt true
-GAMESS:default_max_scf_iterations 50
-GAMESS:GAMESS_path ./gamess_openmp_2023_09/
-GAMESS:GAMESS_executable_version 00
```
So we can save the above flags in `flags file` called `QMGEOMOPT.flags` and excute the command as such:
```
rosetta_scripts.cxx11thread.linuxgccrelease QMGEOMOPT.flags
```