-
Overall I agree with this proposal, especially with the introduction of
The Property Value syntax is a bit ugly, but that's a limitation of JSON-LD...
For human labels, can we add the
{
"name": "label",
"description": "Human-readable label",
"@type": "ml:Field",
"dataType": ["sc:Text", "sc:name"]
}
For the
{
"name": "embarkation",
"description": "Port of Embarkation (C: Cherbourg, Q: Queenstown, S: Southampton, ?: Unknown).",
"@type": "ml:Field",
"dataType": ["sc:Text", "ml:Enum"],
"source": "#{passengers-table/embarked}",
"ml:enum": "#{embarkation_enum/key}"
}
Technically one could just use the actual type of the data without
On the naming of the
Maybe if we have
-
I like the overall approach, as we discussed offline. I'm not convinced it's necessary to define "ml:Enum" as a data type. Also, the semantics of the "ml:enum" property are very close to those of the "references" property used for joins. What does it buy us to define these explicitly?
-
Good suggestions, I think! I do want to comment on the verbosity, though. Previously we had
Now we have a list of
We've discussed a possible way to alleviate the problem before: a simpler Croissant view, convertible to schema.org-compliant JSON-LD. Do we want to go that route? If not, what do you think about the verbosity?
-
Just to clarify the alternative proposal:
So your example becomes:
{
"name": "embarkation",
"description": "Port of Embarkation (C: Cherbourg, Q: Queenstown, S: Southampton, ?: Unknown).",
"@type": "ml:Field",
"dataType": "sc:Text",
"source": "#{passengers-table/embarked}",
"references": "#{embarkation_enum/key}"
}
with the RecordSet of embarkation_enum defined the same way as in your example. I don't think it's very likely, but if needed, we can have multiple 'references' defined for the same field.
-
Handle enums in Croissant
Problem
The Titanic dataset declares enums to:
For instance, in the original dataset, the embarkation is one of the strings C, Q, S or U. A human would like to have the semantic translations: Cherbourg, Queenstown, Southampton or Unknown. A machine would like to have the semantic meaning using https://www.wikidata.org/wiki/Q3667188, https://www.wikidata.org/wiki/Q733093, https://www.wikidata.org/wiki/Q79848 and https://www.wikidata.org/wiki/Q24238356.
The current approach: we declare a record set, and we join based on this record set. Currently, we do not have clear guidelines on how to do this. The goal of this discussion is to propose and discuss guidelines.
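To make the "declare a record set and join on it" approach concrete, here is a minimal Python sketch. It is not Croissant code: the `join_on_key` helper, the passenger rows, and the `label`/`url` column names are hypothetical, and the pairing of each key with a Wikidata URL is assumed from the order in which they are listed above.

```python
# Enum record set: one row per key. The key -> label -> URL pairing is an
# assumption based on the order in which the values are listed above.
EMBARKATION_ENUM = [
    {"key": "C", "label": "Cherbourg", "url": "https://www.wikidata.org/wiki/Q3667188"},
    {"key": "Q", "label": "Queenstown", "url": "https://www.wikidata.org/wiki/Q733093"},
    {"key": "S", "label": "Southampton", "url": "https://www.wikidata.org/wiki/Q79848"},
    {"key": "U", "label": "Unknown", "url": "https://www.wikidata.org/wiki/Q24238356"},
]

# Hypothetical passenger rows; only the "embarked" column comes from the discussion.
PASSENGERS = [
    {"passenger": "passenger-1", "embarked": "S"},
    {"passenger": "passenger-2", "embarked": "C"},
]


def join_on_key(rows, enum_rows, column):
    """Left-join rows with the enum record set on rows[column] == enum["key"]."""
    index = {enum_row["key"]: enum_row for enum_row in enum_rows}
    joined = []
    for row in rows:
        # Rows whose value is not a known key get None for label and url.
        match = index.get(row[column], {})
        joined.append({**row, "label": match.get("label"), "url": match.get("url")})
    return joined


joined = join_on_key(PASSENGERS, EMBARKATION_ENUM, "embarked")
for row in joined:
    print(row["embarked"], row["label"], row["url"])
```

The raw `embarked` value stays in each joined row; the human label and the machine-readable URL are added next to it rather than replacing it.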
Solution
After discussion, the solution is: #52 (reply in thread)
Proposal
male->0 and female->1 doesn't bring any semantic meaning, and should be dropped. However, the fact that the dataset declares two genders (https://www.wikidata.org/wiki/Q6581097 and https://www.wikidata.org/wiki/Q6581072)
In particular, enums do not change the value output by Croissant (like in the example of male/female).
ml:Enum data type. The source is the actual source.
ml:enum references the column of a record set that corresponds to the value in the source.
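As a rough illustration of the proposal, the sketch below resolves an `ml:enum` reference to a record-set column while leaving the emitted value untouched. This is an assumption-laden sketch, not the Croissant implementation: `parse_reference`, `read_field`, and the `label` column are hypothetical names, and the `#{record_set/column}` pattern is inferred only from the examples in this thread.

```python
import re

# Matches references of the form "#{record_set/column}", as seen in the examples.
REFERENCE = re.compile(r"^#\{([^/{}]+)/([^/{}]+)\}$")


def parse_reference(reference):
    """Split "#{record_set/column}" into (record_set, column)."""
    match = REFERENCE.match(reference)
    if match is None:
        raise ValueError(f"Not a '#{{record_set/column}}' reference: {reference!r}")
    return match.group(1), match.group(2)


# Field taken from the discussion (description omitted for brevity).
FIELD = {
    "name": "embarkation",
    "@type": "ml:Field",
    "dataType": ["sc:Text", "ml:Enum"],
    "source": "#{passengers-table/embarked}",
    "ml:enum": "#{embarkation_enum/key}",
}

# Hypothetical in-memory record sets; the "label" column is an assumption.
RECORD_SETS = {
    "embarkation_enum": [
        {"key": "C", "label": "Cherbourg"},
        {"key": "Q", "label": "Queenstown"},
        {"key": "S", "label": "Southampton"},
        {"key": "U", "label": "Unknown"},
    ]
}


def read_field(raw_values, field, record_sets):
    """Yield (value, enum_row); the value is the unchanged source value."""
    record_set, column = parse_reference(field["ml:enum"])
    by_key = {row[column]: row for row in record_sets[record_set]}
    for value in raw_values:
        # The enum row is side information only; it never replaces the value.
        yield value, by_key.get(value)


pairs = list(read_field(["S", "C"], FIELD, RECORD_SETS))
print(pairs)
```

The point of the design is visible in `read_field`: the enum only attaches meaning to a value, so a consumer that ignores `ml:enum` still reads the same data.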