This page describes the changes that are required to modify the CSR data model.
The data model is defined as a data class in csr/csr.py. In order to add a new entity, a new class has to be defined.
Classes are defined using pydantic, a library that validates data (such as JSON documents) and converts it into Python objects based on Python type annotations.
Every new entity must have a non-nullable identity defined. It also has to be part of the SubjectEntity union defined in csr.py. If there is a relation between the new entity and other entities, it has to be specified as the value of the references key in the Schema dictionary.
For more details see official pydantic documentation.
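As a rough illustration of the steps above, the following sketch defines a new entity with a required identity and adds it to a union. It is not taken from csr/csr.py: the Treatment entity, its fields, and the simplified Individual class are hypothetical, and it assumes a pydantic version that provides Field. The identity and references metadata that the project attaches to fields is left out, because the way it is declared depends on the pydantic version in use.

```python
from typing import Optional, Union

from pydantic import BaseModel, Field, ValidationError


class Individual(BaseModel):
    # Simplified stand-in for an existing entity; only the identity is shown.
    individual_id: str = Field(..., min_length=1)


class Treatment(BaseModel):
    # Hypothetical new entity, not part of the actual CSR data model.
    # Non-nullable identity: '...' marks the field as required and
    # min_length=1 rejects empty strings.
    treatment_id: str = Field(..., min_length=1)
    # Identifier of the related Individual. In csr/csr.py the relation would
    # also be declared through the references key; that is omitted here.
    individual_id: str = Field(..., min_length=1)
    # Optional property that may be missing from the source data.
    drug_name: Optional[str] = None


# Hypothetical union; in csr.py the new class would be added to the
# existing SubjectEntity union instead of redefining it.
SubjectEntity = Union[Individual, Treatment]

# A valid record is converted into a Python object.
treatment = Treatment(treatment_id='T1', individual_id='P1')

# An empty identity is rejected during validation.
try:
    Treatment(treatment_id='', individual_id='P1')
except ValidationError as error:
    print(error)
```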
If a property of the new entity requires some additional calculation, e.g. its value has to be derived from other input values, the calculation logic should be added to sources2csr/derived_values.py.
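A derived value of this kind could look like the sketch below, which counts diagnoses per individual. The function name, the diagnosis_count field, and the use of plain dataclasses instead of the pydantic classes are assumptions made to keep the example self-contained; the actual logic in sources2csr/derived_values.py may be organised differently.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Individual:
    individual_id: str
    # Hypothetical derived property; None means "not provided in the sources".
    diagnosis_count: Optional[int] = None


@dataclass
class Diagnosis:
    diagnosis_id: str
    individual_id: str


def add_diagnosis_counts(individuals: List[Individual],
                         diagnoses: List[Diagnosis]) -> List[Individual]:
    """Fill in diagnosis_count from the diagnoses unless a value is already set."""
    counts = Counter(diagnosis.individual_id for diagnosis in diagnoses)
    for individual in individuals:
        if individual.diagnosis_count is None:
            # Counter returns 0 for individuals without any diagnosis.
            individual.diagnosis_count = counts[individual.individual_id]
    return individuals
```

Keeping a value that was already provided in the sources, rather than overwriting it, is a design choice of this sketch and not necessarily the behaviour of the project.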
Changes added to the code should be well-covered by tests (test coverage should not be decreased). In order to test reading of the source data and mapping it to CSR, extend tests/sources2csr/test_sources2csr.py. In order to test CSR to tranSMART mapping, extend tests/csr2transmart/test_csr_mapper.py. All new test data sets should be added to test_data folder.
Remember to update the section about data model in README.rst.
When adding fields to an existing entity, update the corresponding entity class in csr/csr.py.
When changing csr/csr.py, the tests for both the mapping from sources to CSR and the mapping from CSR to tranSMART have to be changed or extended - see "Extending tests and documentation" section above.
Some entities can be left out if no data is available for them. To do this, assign an empty "attributes" array to the entity in sources_config.json, one of the configuration files of sources2csr.
For example:
{
    "entities": {
        "Biosource": {
            "attributes": []
        }
    }
}
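To double-check which entities a configuration leaves out, a small helper along the following lines could be used. The omitted_entities function is hypothetical and is not how sources2csr itself reads the file; it only relies on the structure shown in the example above.

```python
import json

# Same structure as the sources_config.json fragment shown above.
config_text = """
{
    "entities": {
        "Biosource": {
            "attributes": []
        }
    }
}
"""


def omitted_entities(config: dict) -> list:
    """Return the names of entities configured with an empty attributes array."""
    return [name
            for name, entity in config.get("entities", {}).items()
            if entity.get("attributes") == []]


config = json.loads(config_text)
print(omitted_entities(config))  # prints ['Biosource']
```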