label atoms in output pdb using bfactor column #483
0.25 for unique A, 0.50 for core, 0.75 for unique B
# TODO: ??? Protein & crystals == 1.0
# cofactor = ???
Cofactors should be treated the same way as protein.
It's unfortunate we have to do this but yeah.
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #483 +/- ##
==========================================
- Coverage 91.99% 91.63% -0.37%
==========================================
Files 106 110 +4
Lines 6324 6542 +218
==========================================
+ Hits 5818 5995 +177
- Misses 506 547 +41
@@ -527,10 +527,29 @@ def run(self, *, dry=False, verbose=True,
)

# b. Write out a PDB containing the subsampled hybrid state
A_atoms = {hybrid_factory.old_to_hybrid_atom_map[i]
@IAlibay Is it worth adding this to HTF? Or is it already there and I missed it?
I think everything you need should already be there under:
openfe/openfe/protocols/openmm_rfe/_rfe_utils/relative.py
Lines 439 to 442 in 987009e
self._atom_classes = {'unique_old_atoms': set(),
                      'unique_new_atoms': set(),
                      'core_atoms': set(),
                      'environment_atoms': set()}
So you'd be looking for _atom_classes['unique_new_atoms'], _atom_classes['unique_old_atoms'], and _atom_classes['core_atoms'].
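A minimal sketch of reading the three ligand atom classes out of a dictionary shaped like the `_atom_classes` attribute quoted above. `FakeFactory` is a hypothetical stand-in, not the real hybrid topology factory, and the atom indices are made up for illustration.

```python
class FakeFactory:
    """Hypothetical stand-in that only exposes an `_atom_classes` dict."""

    def __init__(self):
        # Same four keys as in the quoted snippet; the indices are invented.
        self._atom_classes = {
            'unique_old_atoms': {0, 1},
            'unique_new_atoms': {5},
            'core_atoms': {2, 3, 4},
            'environment_atoms': {6, 7, 8},
        }


hybrid_factory = FakeFactory()

# Sets are unordered, so sort them to get reproducible index lists.
A_unique = sorted(hybrid_factory._atom_classes['unique_old_atoms'])
AB_core = sorted(hybrid_factory._atom_classes['core_atoms'])
B_unique = sorted(hybrid_factory._atom_classes['unique_new_atoms'])
```

Because the classes are stored as sets, no manual set arithmetic on the old/new atom maps is needed to separate unique atoms from the core.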
A_unique = list(hybrid_factory._atom_classes['unique_old_atoms'])
AB_core = list(hybrid_factory._atom_classes['core_atoms'])
B_unique = list(hybrid_factory._atom_classes['unique_new_atoms'])
protein = []  # TODO
This isn't where you should be tracking your protein atoms imho. It's a PDB output, so you have residue names; those should be used.
As previously mentioned - tracking crystallographic waters is also not an amazing idea since water exchange is a thing.
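As a rough illustration of the residue-name approach, the sketch below classifies residues by name alone. The residue-name sets and the `classify_residue` helper are assumptions for illustration and are not part of the PR; a real workflow would likely need more names (protonation variants, caps, ions, cofactors).

```python
# Assumed, non-exhaustive residue-name sets (the 20 standard amino acids
# and two common water names). These are illustrative, not from the PR.
PROTEIN_RESNAMES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
WATER_RESNAMES = {"HOH", "WAT"}


def classify_residue(resname):
    """Return 'protein', 'water', or 'other' based on the residue name only."""
    name = resname.strip().upper()
    if name in PROTEIN_RESNAMES:
        return "protein"
    if name in WATER_RESNAMES:
        return "water"
    return "other"


labels = [classify_residue(r) for r in ["ALA", "HOH", "UNK"]]
```

Note that this cannot distinguish crystallographic waters from added solvent, since both carry a water residue name.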
Yes, but even if water exchange is a thing, it's useful to know the origin of the water molecules: which came from addSolvent and which from the initial protein prep. The point here is to be able to understand how each of the Components that were added is propagated into the topology.
A_unique = list(A_atoms - B_atoms)
AB_core = list(A_atoms & B_atoms)
B_unique = list(B_atoms - A_atoms)
A_unique = list(hybrid_factory._atom_classes['unique_old_atoms'])
I'm going to circle back on a similar comment you made last time re: extra assignments. Why do the assignment? Could you not just do the list call in the np.in1d?
sure, now it's a hideously long line that isn't easy to read
You could just... break up the line? formatting is a thing
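One way the inline `list` call could be laid out across several lines is sketched below. It uses `np.isin`, the NumPy function that supersedes `np.in1d`, and the `atom_classes` dict and `selected_indices` array are hypothetical stand-ins with made-up values.

```python
import numpy as np

# Hypothetical stand-in for hybrid_factory._atom_classes.
atom_classes = {
    'unique_old_atoms': {0, 1},
    'core_atoms': {2, 3, 4},
    'unique_new_atoms': {5},
}

# Hypothetical indices of the atoms written to the output PDB.
selected_indices = np.array([0, 2, 5, 7])

# The set is converted with list() inline, with no intermediate variable.
# NumPy's membership test does not handle a raw set as the second argument.
in_core = np.isin(
    selected_indices,
    list(atom_classes['core_atoms']),
)
```

Here `in_core` is a boolean mask over `selected_indices`, and the line break inside the call keeps each argument on its own line.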
Our output PDB topology made it hard to distinguish between the contributions from different ligands. This (mis)uses the bfactor column to label atoms as:
0.25 for unique A
0.50 for core
0.75 for unique B
"Selecting" A/B is then possible in downstream tools.
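The labelling scheme can be sketched with a plain NumPy array. The atom indices are invented, and the default of 0.0 for all remaining atoms is an assumption; the PR leaves the value for protein, crystal waters, and cofactors as an open TODO.

```python
import numpy as np

n_atoms = 9
# Hypothetical index lists for the three ligand atom classes.
A_unique, AB_core, B_unique = [0, 1], [2, 3, 4], [5]

# Assumed default of 0.0 for every atom that is not part of either ligand.
bfactors = np.zeros(n_atoms)
bfactors[A_unique] = 0.25
bfactors[AB_core] = 0.50
bfactors[B_unique] = 0.75

# Downstream "selection": ligand A is unique A plus core, ligand B is core
# plus unique B. The three labels are exactly representable as floats, so
# an exact membership test is safe here.
ligand_A = np.flatnonzero(np.isin(bfactors, [0.25, 0.50]))
ligand_B = np.flatnonzero(np.isin(bfactors, [0.50, 0.75]))
```

In a visualisation or analysis tool, the same idea would typically be expressed as a selection on the b-factor (temperature factor) field of the output PDB.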