Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

label atoms in output pdb using bfactor column #483

Closed
wants to merge 5 commits into from

Conversation

richardjgowers
Copy link
Contributor

@richardjgowers richardjgowers commented Jul 10, 2023

our output PDB topology made it hard to distinguish between the contributions from different ligands. This (mis)uses the bfactor column to label as:

0.25 for unique A
0.50 for core
0.75 for unique B

Then "selecting" A/B is possible in downstream tools

0.25 for unique A
0.50 for core
0.75 for unique B
Comment on lines 541 to 542
# TODO: ??? Protein & crystals == 1.0
# cofactor = ???
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cofactors should be treated the same was as protein

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate we have to do this but yeah.

@codecov
Copy link

codecov bot commented Jul 10, 2023

Codecov Report

Patch coverage: 71.93% and project coverage change: -0.37 ⚠️

Comparison is base (9e6766e) 91.99% compared to head (0b42f8d) 91.63%.

❗ Current head 0b42f8d differs from pull request most recent head 0494b9d. Consider uploading reports for the commit 0494b9d to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #483      +/-   ##
==========================================
- Coverage   91.99%   91.63%   -0.37%     
==========================================
  Files         106      110       +4     
  Lines        6324     6542     +218     
==========================================
+ Hits         5818     5995     +177     
- Misses        506      547      +41     
Impacted Files Coverage Δ
openfe/utils/handle_trajectories.py 0.00% <0.00%> (ø)
openfecli/tests/commands/test_gather.py 100.00% <ø> (ø)
openfe/tests/protocols/test_openmmutils.py 96.72% <78.57%> (-3.28%) ⬇️
openfecli/commands/gather.py 92.04% <88.46%> (+6.12%) ⬆️
openfe/protocols/openmm_rfe/_rfe_utils/relative.py 82.98% <95.23%> (+0.92%) ⬆️
openfe/protocols/openmm_rfe/equil_rfe_methods.py 94.14% <100.00%> (+4.61%) ⬆️
openfe/protocols/openmm_rfe/equil_rfe_settings.py 86.82% <100.00%> (+5.54%) ⬆️
...tests/protocols/test_openmm_equil_rfe_protocols.py 89.79% <100.00%> (+0.21%) ⬆️
openfe/tests/protocols/test_rfe_tokenization.py 97.14% <100.00%> (ø)
openfe/tests/utils/test_remove_oechem.py 100.00% <100.00%> (ø)
... and 4 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -527,10 +527,29 @@ def run(self, *, dry=False, verbose=True,
)

# b. Write out a PDB containing the subsampled hybrid state
A_atoms = {hybrid_factory.old_to_hybrid_atom_map[i]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IAlibay Is it worth adding this to HTF? Or is it already there and I missed it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think everything you need should already be there under:

self._atom_classes = {'unique_old_atoms': set(),
'unique_new_atoms': set(),
'core_atoms': set(),
'environment_atoms': set()}

So you'd be looking for _atom_classes['unique_new_atoms'] and _atom_classes['unique_old_atoms'] and _atom_classes['core_atoms']

A_unique = list(hybrid_factory._atom_classes['unique_old_atoms'])
AB_core = list(hybrid_factory._atom_classes['core_atoms'])
B_unique = list(hybrid_factory._atom_classes['unique_new_atoms'])
protein = [] # TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't where you should be tracking your protein atoms imho. It's a PDB output so you have residue names, those should be used.

As previously mentioned - tracking crystallographic waters is also not an amazing idea since water exchange is a thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but even if water exchange is a thing, it's useful to know the origin of the water molecules, which were from addSolvent and which from the initial protein prep. The point here is to be able to understand how each of the Components that were added are propagated into the topology.

A_unique = list(A_atoms - B_atoms)
AB_core = list(A_atoms & B_atoms)
B_unique = list(B_atoms - A_atoms)
A_unique = list(hybrid_factory._atom_classes['unique_old_atoms'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to circle back on a similar comment you made last time re: extra assignments - Why do the assignment? Could you not just do the list call in the np.in1d?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, now it's a hideously long line that isn't easy to read

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just... break up the line? formatting is a thing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants