Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add O-Mannosylation to Sails #2

Merged
merged 14 commits into from
Aug 9, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added docs/Writerside/images/example-snfg.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
29 changes: 16 additions & 13 deletions docs/Writerside/topics/Automated-Glycan-Building.topic
Original file line number Diff line number Diff line change
Expand Up @@ -30,9 +30,9 @@
</code-block>

<p>Advanced Example:</p>
Attempt to build N-glycans into ABCD.cif with structure factors.
Attempt to build O-mannoses into ABCD.cif with structure factors.
<code-block lang="shell">
sails -pdbin ABCD.cif -mtzin ABCD-sf.cif.gz -cycles 4 -colin-fo F,SIGF -nglycan
sails -pdbin ABCD.cif -mtzin ABCD-sf.cif.gz -type o-mannosylate -cycles 1 -colin-fo F,SIGF
</code-block>
</chapter>

Expand All @@ -51,30 +51,33 @@
</def>
<def title="-pdbout">
Path to output CIF file.
<p>Default: sails-model-out.cif</p>
<p>Default: <code>sails-model-out.cif</code></p>
</def>
<def title="-mtzout">
Path to output MTZ file.
<p>Default: sails-refln-out.mtz</p>
<p>Default: <code>sails-refln-out.mtz</code></p>
</def>
<def title="-logout">
Path to output MTZ file.
<p>Default: sails-log.json</p>
<p>Default: <code>sails-log.json</code></p>
</def>
<def title="-cycles">
Number of internal cycles to run.
<p>Default: 2</p>
<p>Default: <code>2</code></p>
</def>
<def title="-n-glycan">
Build N-glycans into the given structure
<p>Default Option</p>
<def title="-type">
Type of glycosylation
<p>Options: </p>
<list>
<li><code>n-glycosylate</code></li>
<li><code>c-glycosylate</code></li>
<li><code>o-mannosylate</code></li>
</list>
<p>Default: <code>n-glycosylate</code></p>
</def>
<def title="-c-glycan">
Build C-glycans (tryptophan mannosylation) into the given structure
</def>
<def title="-colin-fo">
Comma separated SF observations and associated uncertainty.
<p>Default: FP,SIGFP</p>
<p>Default: <code>FP,SIGFP</code></p>
</def>
<def title="-colin-fwt">
Comma separated SF weight and associated phase.
Expand Down
8 changes: 7 additions & 1 deletion docs/Writerside/topics/Generate-SNFG-Diagrams.topic
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,15 @@
title="Generate SNFG Diagrams" id="Generate-SNFG-Diagrams" help-id="Generate-SNFG-Diagrams">

<p>
Sails provides a fast utility tool to generate SNFG diagrams using Sails' internal diagram engine. A specific
Sails provides a fast utility tool to generate Symbol Nomeclature for Glycans Diagram (SNFG) diagrams using Sails' internal diagram engine. A specific
site can be specified with -chain and -seqid parameters, or, all the detectable SNFGs will be generated.
</p>

<p>
Example SNFG Diagram for a high mannose glycan
<img src="example-snfg.png" alt="Symbol Nomeclature for Glycans Diagram for a high mannose structure"/>
</p>

<chapter title="Command" id="command">
<p>Syntax:</p>

Expand All @@ -27,6 +32,7 @@
sails-snfg -model built.cif --all -snfgout snfgs
</code-block>
</chapter>


<chapter title="Options" id="options">
<p>Describe what each option is used for:</p>
Expand Down
2 changes: 2 additions & 0 deletions package/src/bindings/python_sails.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,8 @@ NB_MODULE(sails_module, m) {

m.def("n_glycosylate_from_objects", &n_glycosylate, "structure"_a, "mtz"_a, "cycles"_a, "resource_dir"_a, "verbose"_a);
m.def("c_glycosylate_from_objects", &c_glycosylate, "structure"_a, "mtz"_a, "cycles"_a, "resource_dir"_a, "verbose"_a);
m.def("o_mannosylate_from_objects", &o_mannosylate, "structure"_a, "mtz"_a, "cycles"_a, "resource_dir"_a, "verbose"_a);

m.def("test_snfg", &test);

m.def("get_snfg", &get_snfg, "chain"_a, "seqid"_a , "structure"_a, "resource_dir"_a);
Expand Down
4 changes: 2 additions & 2 deletions package/src/cpp/sails-density.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -170,7 +170,7 @@ void Sails::Density::recalculate_map(gemmi::Structure &structure) {
new_mtz.add_column("FWT", 'F', -1, -1, true);
new_mtz.add_column("PHWT", 'P', -1, -1, true);
new_mtz.add_column("DELFWT", 'F', -1, -1, true);
new_mtz.add_column("PHDELWT", 'P', -1, -1, true);;
new_mtz.add_column("PHDELWT", 'P', -1, -1, true);
new_mtz.set_data(recalculated_data.data(), recalculated_data.size());
new_mtz.ensure_asu();

Expand Down Expand Up @@ -217,7 +217,7 @@ void Sails::Density::calculate_po_pc_map(gemmi::Structure &structure) {
new_mtz.add_dataset("SAILS");
new_mtz.add_base();
new_mtz.add_column("FDIFFCALC", 'F', -1, -1, true);
new_mtz.add_column("PDIFFCALC", 'P', -1, -1, true);;
new_mtz.add_column("PDIFFCALC", 'P', -1, -1, true);
new_mtz.set_data(recalculated_data.data(), recalculated_data.size());
new_mtz.ensure_asu();

Expand Down
8 changes: 3 additions & 5 deletions package/src/cpp/sails-linkage.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -134,15 +134,13 @@ void Sails::Model::remove_leaving_atom(Sails::LinkageData &data, gemmi::Residue
void Sails::Model::add_sugar_to_structure(const Sugar *terminal_sugar, SuperpositionResult &favoured_addition,
ChainType &chain_type) {
int chain_idx = terminal_sugar->site.chain_idx;
int residue_idx = terminal_sugar->site.residue_idx;

if (chain_type == protein) {
const size_t last_chain_idx = structure->models[terminal_sugar->site.model_idx].chains.size();
chain_idx = static_cast<int>(last_chain_idx);
gemmi::Chain chain;
chain.name = Utils::get_next_string(
structure->models[terminal_sugar->site.model_idx].chains[last_chain_idx - 1].name);
residue_idx = -1;
structure->models[terminal_sugar->site.model_idx].chains.emplace_back(chain);
}

Expand Down Expand Up @@ -220,8 +218,8 @@ std::optional<Sails::SuperpositionResult> Sails::Model::add_residue(
auto library_monomer = get_monomer(data.acceptor, true);

if (!library_monomer.has_value()) { throw std::runtime_error("Could not get required monomer, "
"ensure that CCP4 monomer library is sourced."
"If you have a local monomer library, ensure that you"
"ensure that CCP4 monomer library is sourced. "
"If you have a local monomer library, ensure that you "
"have CLIBD set"); }
auto reference_library_monomer = gemmi::Residue(library_monomer.value());

Expand Down Expand Up @@ -338,7 +336,7 @@ void Sails::Model::extend_if_possible(Density &density, bool debug, ChainType &c
if (!opt_result.has_value()) {
if (debug) print_rejection_log();
continue;
};
}

possible_additions[data.donor_number].emplace_back(opt_result.value());

Expand Down
43 changes: 31 additions & 12 deletions package/src/cpp/sails-sequence.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,28 +4,25 @@
#include "../include/sails-sequence.h"

Sails::Glycosites Sails::find_n_glycosylation_sites(const gemmi::Structure &structure) {

std::vector<Glycosite> glycosites;
auto models = structure.children();
for (int model_idx = 0; model_idx < models.size(); model_idx++) {

auto chains = models[model_idx].children();
for (int chain_idx = 0; chain_idx < chains.size(); chain_idx++) {

auto residues = chains[chain_idx].children();

// if there are less than three residues, skip
if (residues.size() < 3) continue;

for (int residue_idx = 0; residue_idx < residues.size() - 2; residue_idx++) {
char first = gemmi::find_tabulated_residue(residues[residue_idx].name).one_letter_code;
if (first != 'N') {continue;}
if (first != 'N') { continue; }

char third = gemmi::find_tabulated_residue(residues[residue_idx + 2].name).one_letter_code;
if (third != 'S' && third != 'T') {continue;}
if (third != 'S' && third != 'T') { continue; }

char second = gemmi::find_tabulated_residue(residues[residue_idx + 1].name).one_letter_code;
if (second == 'P') {continue;}
if (second == 'P') { continue; }

glycosites.emplace_back(model_idx, chain_idx, residue_idx);
}
Expand All @@ -39,24 +36,46 @@ Sails::Glycosites Sails::find_c_glycosylation_sites(const gemmi::Structure &stru
std::vector<Glycosite> glycosites;
auto models = structure.children();
for (int model_idx = 0; model_idx < models.size(); model_idx++) {

auto chains = models[model_idx].children();
for (int chain_idx = 0; chain_idx < chains.size(); chain_idx++) {

auto residues = chains[chain_idx].children();
// if there are less than three residues, skip
if (residues.size() < 4) continue;

for (int residue_idx = 0; residue_idx < residues.size() - 3; residue_idx++) {

char first = gemmi::find_tabulated_residue(residues[residue_idx].name).one_letter_code;
if (first != 'W') {continue;}
if (first != 'W') { continue; }

char fourth = gemmi::find_tabulated_residue(residues[residue_idx + 3].name).one_letter_code;
if (fourth != 'W') {continue;}
if (fourth != 'W') { continue; }

glycosites.emplace_back(model_idx, chain_idx, residue_idx);
glycosites.emplace_back(model_idx, chain_idx, residue_idx+3);
glycosites.emplace_back(model_idx, chain_idx, residue_idx + 3);
}
}
}

return glycosites;
}

Sails::Glycosites Sails::find_o_mannosylation_sites(const gemmi::Structure &structure,
std::map<Glycosite, double> &solvent_accessibility_map) {
std::vector<Glycosite> glycosites;
auto models = structure.children();
for (int model_idx = 0; model_idx < models.size(); model_idx++) {
auto chains = models[model_idx].children();
for (int chain_idx = 0; chain_idx < chains.size(); chain_idx++) {
auto residues = chains[chain_idx].children();
// if there are less than three residues, skip
if (residues.size() < 4) continue;
for (int residue_idx = 0; residue_idx < residues.size() - 3; residue_idx++) {
std::string name = residues[residue_idx].name;
if (name != "SER" && name != "THR") continue;
Glycosite g = {model_idx, chain_idx, residue_idx};
double solvent_accessibility = solvent_accessibility_map[g];
if (solvent_accessibility > 10) {
glycosites.emplace_back(g);
}
}
}
}
Expand Down
9 changes: 0 additions & 9 deletions package/src/cpp/sails-telemetry.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -65,13 +65,4 @@ std::optional<std::string> Sails::Telemetry::format_log(gemmi::Structure *struct
return stream.str();
}
return std::nullopt;

// for (const auto &[cycle, entries]: log) {
// std::cout << "Cycle " << cycle << "\n";
// for (const auto &entry: entries) {
// std::cout << "Added " << entry.residue_id << "\t" << entry.rscc_score << "\t" << entry.rsr_score << "\t"
// << entry.dds_score << "\n";
// }
// std::cout << std::endl;
// }
}
2 changes: 1 addition & 1 deletion package/src/cpp/sails-topology.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ void Sails::Topology::find_residue_near_donor(Glycosite &glycosite, Glycan &glyc
// }
// std::cout << std::endl;
continue;
};
}

// Add the sugar, and then linkage
// This is required to ensure the sugar objects live until the Glycan goes out of scope.
Expand Down
54 changes: 31 additions & 23 deletions package/src/cpp/sails.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
#include <chrono>
#include <iostream>
#include <src/include/sails-gemmi-bindings.h>
#include <src/include/sails-solvent.h>


void print_rejection_dds(const Sails::Glycosite& s1, const Sails::Glycosite& s2, gemmi::Structure* structure, float score) {
Expand All @@ -30,9 +31,10 @@ void print_removal_rscc(const gemmi::Residue& residue, float rscc) {
std::cout << "Removing " << Sails::Utils::format_residue_key(&residue) << " because of low RSCC =" << rscc << std::endl;
}

void remove_erroneous_sugars(gemmi::Structure *structure, Sails::Density *density, Sails::Glycan *glycan, bool debug) {
constexpr float rscc_threshold = 0.5;
constexpr float dds_threshold = 1.1;
void remove_erroneous_sugars(gemmi::Structure *structure, Sails::Density *density, Sails::Glycan *glycan, bool strict,
bool debug) {
const float rscc_threshold = strict ? 0.65: 0.5;
const float dds_threshold = strict ? 1.0: 1.1;

std::vector<Sails::Sugar *> to_remove;
for (const auto &[fst, snd]: *glycan) {
Expand Down Expand Up @@ -96,8 +98,8 @@ void check_spacegroup(gemmi::Mtz* mtz, gemmi::Structure* structure) {
if (mtz->spacegroup_name.empty()) mtz->spacegroup_name = structure->spacegroup_hm;
}

Sails::Output run_cycle(Sails::Glycosites& glycosites, gemmi::Structure &structure, Sails::MTZ &sails_mtz, int cycles, std::string &
resource_dir, bool verbose) {
Sails::Output run_cycle(Sails::Glycosites &glycosites, gemmi::Structure &structure, Sails::MTZ &sails_mtz, int cycles,
std::string &resource_dir, bool strict, bool verbose) {

std::string data_file = resource_dir + "/data.json";
Sails::JSONLoader loader = {data_file};
Expand Down Expand Up @@ -153,7 +155,7 @@ Sails::Output run_cycle(Sails::Glycosites& glycosites, gemmi::Structure &structu

// std::cout << "Attempting removal at " << Sails::Utils::format_residue_from_site(glycosite, &structure) << std::endl;
Sails::Glycan old_glycan = glycan;
remove_erroneous_sugars(&structure, &density, &glycan, verbose);
remove_erroneous_sugars(&structure, &density, &glycan, strict, verbose);

topology.set_structure(&structure); // need to update neighbor search after removing n residues
Sails::Glycan new_glycan = topology.find_glycan_topology(glycosite);
Expand Down Expand Up @@ -189,13 +191,21 @@ Sails::Output run_cycle(Sails::Glycosites& glycosites, gemmi::Structure &structu
Sails::Output n_glycosylate(gemmi::Structure &structure, Sails::MTZ &sails_mtz, int cycles, std::string &resource_dir,
bool verbose) {
auto glycosites = Sails::find_n_glycosylation_sites(structure);
return run_cycle(glycosites, structure, sails_mtz, cycles, resource_dir, verbose);
return run_cycle(glycosites, structure, sails_mtz, cycles, resource_dir, false, verbose);
}

Sails::Output c_glycosylate(gemmi::Structure &structure, Sails::MTZ &sails_mtz, int cycles, std::string &resource_dir,
bool verbose) {
auto glycosites = Sails::find_c_glycosylation_sites(structure);
return run_cycle(glycosites, structure, sails_mtz, cycles, resource_dir, verbose);
return run_cycle(glycosites, structure, sails_mtz, cycles, resource_dir, false, verbose);
}

Sails::Output o_mannosylate(gemmi::Structure &structure, Sails::MTZ &sails_mtz, int cycles, std::string &resource_dir,
bool verbose) {
Sails::SolventAccessibility sa = Sails::SolventAccessibility(&structure);
Sails::SolventAccessibility::SolventAccessibilityMap sa_map = sa.calculate_solvent_accessibility();
auto glycosites = Sails::find_o_mannosylation_sites(structure, sa_map);
return run_cycle(glycosites, structure, sails_mtz, cycles, resource_dir, false, verbose);
}

std::string get_snfg(std::string chain, int seqid, gemmi::Structure& structure, std::string& resource_dir) {
Expand Down Expand Up @@ -244,28 +254,26 @@ std::map<std::string, std::string> get_all_snfgs(gemmi::Structure& structure, st
}

void test() {
const std::string path = "testing/test_data/5fji/5FJI.cif";
const std::string mtz_path = "testing/test_data/5fji/5fji.mtz";

const std::string path = "testing/test_data/4ax7/4AX7_deglycosylated.cif";
const std::string mtz_path = "testing/test_data/4ax7/4AX7.mtz";
gemmi::Mtz mtz = gemmi::read_mtz_file(mtz_path);
auto smtz = Sails::form_sails_mtz(mtz, "FP", "SIGFP");
gemmi::Structure structure = gemmi::read_structure_file(path);

std::string data_file = "package/src/sails/data/data.json";
Sails::JSONLoader loader = {data_file};
Sails::ResidueDatabase residue_database = loader.load_residue_database();

auto snfg = Sails::SNFG(&structure, &residue_database);
Sails::Topology topology = {&structure, residue_database};
auto glycosites = Sails::find_n_glycosylation_sites(structure);
Sails::Density density = Sails::Density(mtz);
density.load_hkl("FP", "SIGFP");
density.recalculate_map(structure);

for (auto& site: glycosites) {
// auto site = glycosites[4];
auto glycan = topology.find_glycan_topology(site);
if (glycan.empty()) continue;
std::string snfg_path = "snfgs/" + Sails::Utils::format_residue_from_site(site, &structure) + ".svg";
std::ofstream f(snfg_path);
f << snfg.create_snfg(glycan, site);
f.close();
}

// auto o = find_o_mannosylation_sites(structure, sa_map);
// std::string a = "package/src/sails/data";
// auto output = run_cycle(o, structure, smtz, 1, a, true, true);
// Sails::Utils::save_structure_to_file(output.structure, "o-mannose-strict.cif");
// std::cout << output.log << std::endl;
}

// testbed
Expand Down
2 changes: 1 addition & 1 deletion package/src/include/sails-glycan.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ namespace Sails {
acceptor_atom(acceptor_atom) {

std::string donor_number_s = {donor_atom[donor_atom.size()-1]};
donor_number = std::stoi(donor_number_s);
donor_number = std::isdigit(donor_number_s[0]) ? std::stoi(donor_number_s): 1;
}

Sugar *donor_sugar;
Expand Down
4 changes: 4 additions & 0 deletions package/src/include/sails-sequence.h
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,10 @@ namespace Sails {
* If no C-glycosylation sites are found, an empty vector is returned.
*/
Glycosites find_c_glycosylation_sites(const gemmi::Structure &structure);



Glycosites find_o_mannosylation_sites(const gemmi::Structure& structure, std::map<Glycosite, double>& solvent_accessibility_map);
}

#endif //SAILS_SAILS_SEQUENCE_H
2 changes: 1 addition & 1 deletion package/src/include/sails-solvent.h
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ namespace Sails {
*
* @return A map containing the solvent accessibility value for each glycosite.
*/
std::map<Glycosite, double> calculate_solvent_accessibility();
[[nodiscard]] std::map<Glycosite, double> calculate_solvent_accessibility();

/**
* @brief Calculates the bounding box of the structure.
Expand Down
Loading