This ticket lists improvements to the Variant data model, mainly to the StudyEntry model, proposed for the next major version. Some of these changes break compatibility with previous versions.
1 - Replace List<List<String>> samplesData with List<SampleEntry> samples
The current implementation of the variant data model makes the following difficult:
/**
 * New data model
 */
SampleEntry {
    String sampleId; // Optional
    Integer fileIndex; // Mandatory if files is not excluded
    List<String> data; // Mandatory
}
StudyEntry {
    ...
    List<String> sampleDataKeys;
    List<SampleEntry> samples;
    ...
}
Implementation notes
A few important implementation notes:
fileIndex points to the files array and is mandatory unless files are excluded
The first value in the samples.data list is always the genotype field (GT), even in somatic studies
The samples.data values are sorted following the sampleDataKeys field in the StudyEntry (see below)
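The notes above can be illustrated with a minimal sketch. The classes `SampleEntrySketch` and `StudyEntrySketch` and the helper `getSampleData` are hypothetical stand-ins written for this example, not the real model classes. They only show how a value in `samples.data` could be resolved through its position in `sampleDataKeys`.

```java
import java.util.List;

// Hypothetical stand-in for SampleEntry; not the real model class.
class SampleEntrySketch {
    final String sampleId;     // optional
    final Integer fileIndex;   // position in the study's files array
    final List<String> data;   // values, in the same order as sampleDataKeys

    SampleEntrySketch(String sampleId, Integer fileIndex, List<String> data) {
        this.sampleId = sampleId;
        this.fileIndex = fileIndex;
        this.data = data;
    }
}

// Hypothetical stand-in for StudyEntry, reduced to the two fields discussed here.
class StudyEntrySketch {
    final List<String> sampleDataKeys;
    final List<SampleEntrySketch> samples;

    StudyEntrySketch(List<String> sampleDataKeys, List<SampleEntrySketch> samples) {
        this.sampleDataKeys = sampleDataKeys;
        this.samples = samples;
    }

    // Returns the value stored under `key` for the sample at `sampleIdx`,
    // or null when the key is unknown or the sample has no value at that position.
    String getSampleData(int sampleIdx, String key) {
        int keyIdx = sampleDataKeys.indexOf(key);
        if (keyIdx < 0) {
            return null;
        }
        List<String> data = samples.get(sampleIdx).data;
        return keyIdx < data.size() ? data.get(keyIdx) : null;
    }
}
```

With `sampleDataKeys = [GT, DP]` and a sample whose data is `[0/1, 30]`, calling `getSampleData(0, "DP")` on this sketch resolves the key to index 1 and reads the second value of that sample.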
2 - Rename StudyEntry.format to StudyEntry.sampleDataKeys
The field name format was taken from the VCF file specification and, unless you have experience with VCF files, it is hard to guess its content from the name. It is renamed to sampleDataKeys, and it specifies the keys of the samples.data array.
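3 - Add Issues to StudyEntry
You can follow this at #177
4 - Rename FileEntry.attributes to FileEntry.data
This change is made to be consistent with the name of the SampleEntry.data field.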
5 - Replace map<VariantStats> with array<VariantStats>
To make the data model more homogeneous, replace the map of cohortId -> VariantStats with a list of VariantStats. This requires adding an id field to the VariantStats model.
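As a rough sketch of what this change implies for existing data, the map key can be copied into the new id field while building the list. `VariantStatsSketch`, `StatsMigration`, and the `alleleCount` field are hypothetical names used only for this illustration; the real VariantStats model has its own fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for VariantStats; not the real model class.
class VariantStatsSketch {
    String id;               // new field: the cohort id that used to be the map key
    final int alleleCount;   // placeholder for the real statistics

    VariantStatsSketch(int alleleCount) {
        this.alleleCount = alleleCount;
    }
}

class StatsMigration {
    // Converts the old cohortId -> stats map into a list, storing the key in stats.id.
    // The list follows the iteration order of the given map.
    static List<VariantStatsSketch> toList(Map<String, VariantStatsSketch> byCohort) {
        List<VariantStatsSketch> list = new ArrayList<>(byCohort.size());
        for (Map.Entry<String, VariantStatsSketch> entry : byCohort.entrySet()) {
            VariantStatsSketch stats = entry.getValue();
            stats.id = entry.getKey();
            list.add(stats);
        }
        return list;
    }

    // Linear lookup by id, replacing the former map.get(cohortId); returns null if absent.
    static VariantStatsSketch find(List<VariantStatsSketch> list, String id) {
        for (VariantStatsSketch stats : list) {
            if (id.equals(stats.id)) {
                return stats;
            }
        }
        return null;
    }
}
```

One trade-off of the list form is that lookups by cohort become a linear scan, which is usually acceptable when a study has only a handful of cohorts.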
6 - Remove unused hgvs field in Variant
The hgvs field was added to VariantAnnotation, so the one in Variant has not been used for a long time.
7 - Replace FileEntry.call string with record
Instead of having a single String with the variant and the alleleIdx separated by a colon, replace it with a small model with two fields:
Unlike the previous string call field, the new "call.variantId" field starts with the chromosome. The correct way to parse it is with new Variant(call.getVariantId())
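A minimal sketch of how an old call string might be migrated, under stated assumptions: the old value has the shape `<variant>:<alleleIdx>`, the variant part is everything before the last colon, and prefixing the chromosome is enough to obtain the new variantId. The class `OriginalCallSketch`, the method `fromLegacy`, and the field name `alleleIndex` are hypothetical; only `variantId` is named in the text above.

```java
// Hypothetical stand-in for the new call record; not the real model class.
class OriginalCallSketch {
    final String variantId;   // starts with the chromosome
    final int alleleIndex;    // assumed name for the alleleIdx part of the old string

    OriginalCallSketch(String variantId, int alleleIndex) {
        this.variantId = variantId;
        this.alleleIndex = alleleIndex;
    }

    // Splits the old string on its LAST colon, because the variant part
    // may itself contain colons, then prefixes the chromosome.
    static OriginalCallSketch fromLegacy(String chromosome, String legacyCall) {
        int sep = legacyCall.lastIndexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException(
                    "Expected '<variant>:<alleleIdx>' but got: " + legacyCall);
        }
        String variant = legacyCall.substring(0, sep);
        int alleleIndex = Integer.parseInt(legacyCall.substring(sep + 1));
        return new OriginalCallSketch(chromosome + ":" + variant, alleleIndex);
    }
}
```

The resulting variantId is what would then be handed to new Variant(...), as described above; that constructor is not reproduced in this sketch.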
Tasks
1 - StudyEntry.samplesData to StudyEntry.samples
2 - StudyEntry.format to StudyEntry.sampleDataKeys
3 - Add StudyEntry.issues
4 - FileEntry.attributes to FileEntry.data
5 - map<VariantStats> to array<VariantStats>
6 - Remove hgvs
7 - Replace string FileEntry.call with specific record