Definitional Hashes
ChemMitch edited this page Mar 1, 2022 · 2 revisions
[For technical and scientific users]
Each substance has a definitional hash: a concise, machine-readable definition of the factors that make the substance unique. The idea is that two substances that share the same definitional hash are probably equivalent.
The definitional hash consists of a collection of DefinitionalElement objects (key/value/layer triplets):
- Key – what the factor is
- Value – the factor's specific value for one substance
- Layer – how central the factor is to our thinking about this substance (currently, either 1 or 2)
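As a rough illustration of the triplet structure described above, the following Python sketch models a definitional element and derives one digest per layer. The class name mirrors DefinitionalElement, but the field names, the key strings, the SHA-1 digest, and the sort-then-hash scheme are all assumptions made for illustration. This is not the actual implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionalElement:
    """Hypothetical stand-in for one key/value/layer triplet."""
    key: str
    value: str
    layer: int


def layer_digest(elements, max_layer):
    """Digest every element whose layer is <= max_layer (assumed scheme)."""
    # Sorting makes the digest independent of the order of the input.
    selected = sorted((e.key, e.value) for e in elements if e.layer <= max_layer)
    h = hashlib.sha1()
    for key, value in selected:
        h.update(f"{key}={value}\n".encode("utf-8"))
    return h.hexdigest()


def compare(a, b):
    """Report the deepest layer on which two element collections agree."""
    if layer_digest(a, 2) == layer_digest(b, 2):
        return "full match"
    if layer_digest(a, 1) == layer_digest(b, 1):
        return "layer 1 match only"
    return "no match"


# Two hypothetical substances that differ only in a layer 2 element.
a = [
    DefinitionalElement("structure.stereoInsensitiveHash", "ABC", 1),
    DefinitionalElement("structure.stereochemistry", "ABSOLUTE", 2),
]
b = [
    DefinitionalElement("structure.stereoInsensitiveHash", "ABC", 1),
    DefinitionalElement("structure.stereochemistry", "RACEMIC", 2),
]
result = compare(a, b)
```

Under these assumptions, substances that agree on every element are probable equivalents, while a match on layer 1 alone suggests closely related but distinct substances.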
Each substance type has its own keys that define its definitional hash.
- Stereo insensitive hash - a hash of the chemical structure that ignores stereochemistry (layer 1)
- Exact hash - a hash of the chemical structure that includes stereochemistry (layer 2)
- Stereochemistry - (Absolute|Achiral|Epimeric|Mixed|Racemic|Unknown) (layer 2)
- Optical Activity - (+|-|+/-|Unspecified|None) (layer 2)
- Moieties (repeat Stereo insensitive hash, Exact hash, Stereochemistry, and Optical Activity, plus amount for each fragment).
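The structure-based keys listed above could be assembled along the following lines. This is a minimal sketch: the key strings and dictionary field names are hypothetical, the structure hashes are assumed to be computed elsewhere, and two further assumptions are made because the list does not state them. Moiety elements are given the same layers as their whole-structure counterparts, and the moiety amount is placed in layer 2.

```python
from typing import NamedTuple


class Element(NamedTuple):
    """Hypothetical key/value/layer triplet."""
    key: str
    value: str
    layer: int


def structure_elements(prefix, s):
    """Emit the four structure-level elements for one structure or fragment."""
    return [
        Element(f"{prefix}.stereoInsensitiveHash", s["stereo_insensitive_hash"], 1),
        Element(f"{prefix}.exactHash", s["exact_hash"], 2),
        Element(f"{prefix}.stereochemistry", s["stereochemistry"], 2),
        Element(f"{prefix}.opticalActivity", s["optical_activity"], 2),
    ]


def chemical_elements(structure, moieties):
    """Elements for the whole structure, then for each moiety."""
    elements = structure_elements("structure", structure)
    # Sort moieties by exact hash so input order does not matter (assumption).
    for i, m in enumerate(sorted(moieties, key=lambda m: m["exact_hash"])):
        prefix = f"moiety[{i}]"
        elements += structure_elements(prefix, m)
        # Layer of the moiety amount is an assumption.
        elements.append(Element(f"{prefix}.amount", m["amount"], 2))
    return elements


# Placeholder values, not real hashes.
whole = {
    "stereo_insensitive_hash": "H1",
    "exact_hash": "H1-EXACT",
    "stereochemistry": "ACHIRAL",
    "optical_activity": "None",
}
fragment = dict(whole, amount="1")
elements = chemical_elements(whole, [fragment])
```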
- ID of each component (layer 1)
- Type (any/all...) of each component (layer 2)
- Parent substance ID (layer 2)
- Modifications (structural, agent and physical; see explanation, below) (layer 2)
- Sequence of bases in each subunit (layer 1)
- Linkages (including sites) (layer 2)
- Sugars (sugar identity and sites) (layer 2)
- Modifications (structural, agent and physical; see explanation, below) (layer 2)
- Monomer IDs (layer 1)
- Monomer amounts (layer 2)
- Structural units' exact structure hash (layer 1)
- Structural units' amounts (layer 2)
- Modifications (structural, agent and physical; see explanation, below) (layer 2)
- Properties (only when flagged as 'defining') (name/value pairs for the property itself and for associated property parameters) (layer 2)
- Subunits (after subunits are ordered canonically, index, amino acid sequence and length for each subunit) (layer 1)
- Glycosylation sites (O, N and C; the sites and the type of glycosylation) (layer 2)
- Disulfide links (layer 2)
- Other links (layer 2)
- Modifications (structural, agent and physical; see explanation, below) (layer 2)
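The subunit entry above depends on a canonical ordering, so that the same set of chains always yields the same elements regardless of how they were entered. The sketch below illustrates the idea only. The ordering rule used here (shorter sequences first, ties broken alphabetically) and the key strings are assumptions, since the page does not specify them.

```python
def subunit_elements(sequences):
    """Emit index, sequence and length elements for canonically ordered subunits."""
    # Assumed canonical order: by length, then alphabetically by sequence.
    ordered = sorted(sequences, key=lambda s: (len(s), s))
    elements = []
    for index, seq in enumerate(ordered, start=1):
        # Subunit elements are layer 1 according to the list above.
        elements.append((f"subunit[{index}].sequence", seq, 1))
        elements.append((f"subunit[{index}].length", str(len(seq)), 1))
    return elements


# The same two hypothetical chains entered in different orders.
first = subunit_elements(["GIVEQ", "FVN"])
second = subunit_elements(["FVN", "GIVEQ"])
```

Because the ordering is applied before indexing, `first` and `second` are identical, which is what allows the resulting hash to be compared across records.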
- Constituent IDs (layer 1)
- Constituent roles (layer 2)
- Constituent amounts (layer 2)
- Modifications (structural, agent and physical; see explanation, below) (layer 2)
- Properties (only when flagged as 'defining') (name/value pairs for the property itself and for associated property parameters) (layer 2)
- Parent substance ID (layer 1)
- Family, genus, species values (layer 1)
- Author (layer 2)
- Part (layer 1)
- Part location (layer 2)
- Source material class (layer 1)
- Source material type (layer 1)
- Fraction name (layer 1)
- Fraction material type (layer 1)
- Infraspecific type (layer 2)
- Infraspecific name (layer 2)
- Modifications (structural, agent and physical; see explanation, below) (layer 2)
- Properties (only when flagged as 'defining') (name/value pairs for the property itself and for associated property parameters) (layer 2)
- Primary name (layer 1)
- For agent modifications, the ID of the agent substance (layer 2)
- Amount (layer 2)
- For physical modifications, the modification group (layer 2)
- For physical and structural modifications, the modification role (layer 2)
- For structural modifications, the residue modified (layer 2)
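The modification rules above can be read as a small dispatch on the modification type. The following sketch shows one way to express that, assuming a modification is available as a plain dictionary. The dictionary field names and key strings are hypothetical, and treating the amount as optional for every modification type is an assumption.

```python
def modification_elements(mod):
    """Return (key, value, layer) triplets for one modification."""
    kind = mod["kind"]  # "agent", "physical" or "structural"
    pairs = []
    if kind == "agent":
        pairs.append(("agentSubstanceId", mod["agent_substance_id"]))
    if kind == "physical":
        pairs.append(("group", mod["group"]))
    if kind in ("physical", "structural"):
        pairs.append(("role", mod["role"]))
    if kind == "structural":
        pairs.append(("residueModified", mod["residue_modified"]))
    if "amount" in mod:
        pairs.append(("amount", mod["amount"]))
    # Every modification element listed above is layer 2.
    return [(f"modifications.{kind}.{name}", str(value), 2) for name, value in pairs]


# A hypothetical structural modification.
structural = modification_elements({
    "kind": "structural",
    "role": "AMINO ACID SUBSTITUTION",
    "residue_modified": "1_5",
    "amount": "1",
})
```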
[Under construction]