Definitional Hashes

ChemMitch edited this page Mar 1, 2022 · 2 revisions

[For technical and scientific users]
Each substance has a definitional hash: a concise, machine-readable definition of the factors that make that substance unique. Two substances that share the same definitional hash are probably equivalent.

The definitional hash consists of a collection of DefinitionalElement objects. Each one is a key/value/layer triplet:

  • Key – what the factor is
  • Value – the factor's specific value for one substance
  • Layer – how central the factor is to our thinking about this substance (currently, either 1 or 2)
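
As a rough illustration of the triplet structure, the sketch below models a definitional element in Python and compares two substances layer by layer. It is not the actual implementation: the class layout, the `layer_digest` helper, the key names, the placeholder values, the choice of SHA-1, and the assumption that a layer-2 comparison also includes the layer-1 elements are all made up for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionalElement:
    key: str    # what the factor is
    value: str  # the factor's value for one substance
    layer: int  # 1 or 2


def layer_digest(elements, max_layer):
    """Digest every element at or below max_layer, independent of input order."""
    lines = sorted(
        f"{e.layer}|{e.key}|{e.value}" for e in elements if e.layer <= max_layer
    )
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


# Two hypothetical chemicals with the same connectivity but different stereochemistry.
substance_a = [
    DefinitionalElement("structure.stereoInsensitiveHash", "HASH_PLACEHOLDER", 1),
    DefinitionalElement("structure.stereochemistry", "ABSOLUTE", 2),
]
substance_b = [
    DefinitionalElement("structure.stereoInsensitiveHash", "HASH_PLACEHOLDER", 1),
    DefinitionalElement("structure.stereochemistry", "RACEMIC", 2),
]

same_at_layer_1 = layer_digest(substance_a, 1) == layer_digest(substance_b, 1)
same_at_layer_2 = layer_digest(substance_a, 2) == layer_digest(substance_b, 2)
```

Under these assumptions the two substances match at layer 1 and differ at layer 2, which is the kind of "probably related, but not identical" result the layers are meant to expose.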

Each substance type has its own keys that define its definitional hash.

Chemical:

  • Stereo insensitive hash - a hash of the chemical structure that ignores stereochemistry (layer 1)
  • Exact hash - a hash of the chemical structure that includes stereochemistry (layer 2)
  • Stereochemistry - (Absolute|Achiral|Epimeric|Mixed|Racemic|Unknown) (layer 2)
  • Optical Activity - (+|-|+/-|Unspecified|None) (layer 2)
  • Moieties (for each fragment, the Stereo insensitive hash, Exact hash, Stereochemistry, and Optical Activity are repeated, plus the fragment's amount)

Mixture:

  • ID of each component (layer 1)
  • Type (any/all...) of each component (layer 2)
  • Parent substance ID (layer 2)
  • Modifications (structural, agent and physical; see explanation, below) (layer 2)

Nucleic Acid:

  • Sequence of bases in each subunit (layer 1)
  • Linkages (including sites) (layer 2)
  • Sugars (sugar identity and sites) (layer 2)
  • Modifications (structural, agent and physical; see explanation, below) (layer 2)

Polymer:

  • Monomer IDs (layer 1)
  • Monomer amounts (layer 2)
  • Structural units' exact structure hash (layer 1)
  • Structural units' amounts (layer 2)
  • Modifications (structural, agent and physical; see explanation, below) (layer 2)
  • Properties (only when flagged as 'defining') (name/value pairs for the property itself and for associated property parameters) (layer 2)

Protein:

  • Subunits (after subunits are ordered canonically, index, amino acid sequence and length for each subunit) (layer 1)
  • Glycosylation sites (O, N and C; the sites and the type of glycosylation) (layer 2)
  • Disulfide links (layer 2)
  • Other links (layer 2)
  • Modifications (structural, agent and physical; see explanation, below) (layer 2)
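
The canonical ordering of subunits matters because the same protein can be registered with its chains listed in a different order. The sketch below shows one way the layer-1 subunit elements could be produced so that the result does not depend on input order. The ordering rule used here (shortest sequence first, then alphabetical), the `protein_subunit_elements` function, and the key names are assumptions for illustration only; this page does not specify the actual canonical order.

```python
def protein_subunit_elements(sequences):
    """Return (key, value, layer) tuples for a protein's subunits.

    Assumed canonical order: by sequence length, then alphabetically.
    """
    ordered = sorted(sequences, key=lambda seq: (len(seq), seq))
    elements = []
    for index, seq in enumerate(ordered, start=1):
        prefix = f"subunit[{index}]"  # hypothetical key naming
        elements.append((f"{prefix}.sequence", seq, 1))
        elements.append((f"{prefix}.length", str(len(seq)), 1))
    return elements


# The same two chains supplied in different orders yield identical elements.
first = protein_subunit_elements(["GIVEQ", "FV"])
second = protein_subunit_elements(["FV", "GIVEQ"])
```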

Specified substance group 1:

  • Constituent IDs (layer 1)
  • Constituent roles (layer 2)
  • Constituent amounts (layer 2)
  • Modifications (structural, agent and physical; see explanation, below) (layer 2)
  • Properties (only when flagged as 'defining') (name/value pairs for the property itself and for associated property parameters) (layer 2)

Structurally diverse:

  • Parent substance ID (layer 1)
  • Family, genus, species values (layer 1)
  • Author (layer 2)
  • Part (layer 1)
  • Part location (layer 2)
  • Source material class (layer 1)
  • Source material type (layer 1)
  • Fraction name (layer 1)
  • Fraction material type (layer 1)
  • Infraspecific type (layer 2)
  • Infraspecific name (layer 2)
  • Modifications (structural, agent and physical; see explanation, below) (layer 2)
  • Properties (only when flagged as 'defining') (name/value pairs for the property itself and for associated property parameters) (layer 2)

Concept:

  • Primary name (layer 1)

Modifications (a type of collection common to several substance types):

  • For agent modifications, the ID of the agent substance (layer 2)
  • Amount (layer 2)
  • For physical modifications, the modification group (layer 2)
  • For physical and structural modifications, the modification role (layer 2)
  • For structural modifications, the residue modified (layer 2)
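
The rules above depend on the kind of modification, so a small sketch may make the branching easier to follow. The dictionary fields (`kind`, `agent_substance_id`, `group`, `role`, `residue_modified`, `amount`), the key naming, and the sample values are hypothetical and not taken from the real data model. The sketch also assumes that the amount is recorded for any kind of modification whenever it is present.

```python
def modification_elements(mod):
    """Return layer-2 (key, value, layer) tuples for one modification."""
    kind = mod["kind"]  # "agent", "physical" or "structural"
    elements = []

    def add(field):
        # Only fields that are actually populated contribute an element.
        if mod.get(field) is not None:
            elements.append((f"modifications.{kind}.{field}", str(mod[field]), 2))

    if kind == "agent":
        add("agent_substance_id")
    if kind == "physical":
        add("group")
    if kind in ("physical", "structural"):
        add("role")
    if kind == "structural":
        add("residue_modified")
    add("amount")
    return elements


structural = modification_elements({
    "kind": "structural",
    "role": "EXAMPLE ROLE",
    "residue_modified": "1_5",
    "amount": "1 mol",
    "agent_substance_id": "ignored-for-structural",
})
```

In this example the agent substance ID is ignored because the modification is structural, and every element that is produced lands in layer 2.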

[Under construction]