This living document specifies the intermediate format used between modules to represent questions. It is divided into two parts: the data model, with a mathematics-like notation, and the canonical serialization in JSON.
This document tries to use the same terminology as the RDF specifications.
The PPP queries/questions are trees, where leaves are values and nodes are operators.
We describe here the different kinds of possible values.
A resource represents something in the universe. It may be a person denoted by their name, a date, a location, etc. Its notation is just a string like `Douglas Adams`, `Peru`, `true` or `2014`.
A list is an ordered collection of resources without duplicates. Its notation is a comma-separated list between brackets like `[Foo, Bar]`. We also identify a list containing a single element with the element itself, so `[Foo]` may also be written `Foo`.
bool is the set of the two resources true and false, representing the two booleans. They are written `true` and `false`.
A missing represents the target of the query. Its notation is `?`.
A triple, as in RDF, is a structure composed of three elements:
- the subject, what the triple refers to
- the predicate, that denotes the relationship between the subject and the object
- the object, what property of the subject the triple refers to
The notation for a triple is `(subject, predicate, object)`.
A triple `(a, b, c)` where `a`, `b` and `c` are resources is true if, and only if, the associated statement is true. For example `(Douglas Adams, instance of, human)` is true because "Douglas Adams is an instance of human" is a true statement, but `(Douglas Adams, instance of, flower)` is false because Douglas Adams is definitely not a flower.
We also define a subset of resources called properties such that `b` is a property if, and only if, `b` may be seen as a relation between two resources (i.e. may be used as the predicate of a meaningful triple). For example `birth date` is a property but `Douglas Adams` is not.
We use triple formalism with some operators:
Full triple is a function of list × list × list → bool written `(la, lb, lc)` that returns `true` if, and only if, for all `a ∈ la` and `c ∈ lc`, there exists `b ∈ lb` such that `(a, b, c)` is true.
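The definition of the full triple can be sketched in Python as follows. This is only an illustration under stated assumptions: the fact base `FACTS`, the helper `as_list` and the function name `full_triple` are hypothetical and not part of the data model.

```python
# Hypothetical in-memory fact base: a set of (subject, predicate, object) tuples.
FACTS = {
    ("Douglas Adams", "instance of", "human"),
    ("George Washington", "instance of", "human"),
}

def as_list(x):
    """Treat a single resource as the one-element list containing it."""
    return x if isinstance(x, list) else [x]

def full_triple(la, lb, lc, facts=FACTS):
    """True iff for all a in la and c in lc, some b in lb makes (a, b, c) true."""
    la, lb, lc = as_list(la), as_list(lb), as_list(lc)
    return all(
        any((a, b, c) in facts for b in lb)
        for a in la
        for c in lc
    )
```

For instance, `full_triple(["Douglas Adams", "George Washington"], "instance of", "human")` holds with this fact base, while replacing `human` by `flower` makes it false.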
The aim of triples with a hole is to retrieve information. They are functions of list × list → list. We define three of them:
- missing object triple, written `(la, lb, ?)`, that returns the list of resources `c` such that there exist `a ∈ la` and `b ∈ lb` with `(a, b, c)` true.
- missing subject triple, written `(?, lb, lc)`, that returns the list of resources `a` such that there exist `b ∈ lb` and `c ∈ lc` with `(a, b, c)` true.
- missing predicate triple, written `(la, ?, lc)`, that returns the list of predicates `b` such that there exist `a ∈ la` and `c ∈ lc` with `(a, b, c)` true.
Example: a triple generated from the question “What is the birth date of George Washington?” could be: `(George Washington, birth date, ?)`.
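As a rough sketch, the triples with a hole can be evaluated against a small fact base like this. The fact base `FACTS` and the function names are hypothetical, chosen for illustration only; a real module would query its own data source.

```python
# Toy fact base (illustrative data only).
FACTS = [
    ("George Washington", "birth date", "1732-02-22"),
    ("George Washington", "instance of", "human"),
    ("Theodore Roosevelt", "birth date", "1858-10-27"),
]

def dedupe(items):
    """Lists in the data model contain no duplicates."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

def missing_object(la, lb, facts=FACTS):
    """(la, lb, ?): every c such that (a, b, c) is true for some a, b."""
    return dedupe(c for (a, b, c) in facts if a in la and b in lb)

def missing_subject(lb, lc, facts=FACTS):
    """(?, lb, lc): every a such that (a, b, c) is true for some b, c."""
    return dedupe(a for (a, b, c) in facts if b in lb and c in lc)

def missing_predicate(la, lc, facts=FACTS):
    """(la, ?, lc): every b such that (a, b, c) is true for some a, c."""
    return dedupe(b for (a, b, c) in facts if a in la and c in lc)
```

With this data, `missing_object(["George Washington"], ["birth date"])` plays the role of `(George Washington, birth date, ?)`.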
There are some operators that manipulate lists. These operators do not preserve order.
The inverse is an operator of property → property such that `inverse(p)` is an inverse property of `p`, i.e. a property such that for all `a`, `c`, `(a, p, c)` ↔ `(c, inverse(p), a)`. We generalize this operator to list → list (it returns an inverse property for each property in the input list).
Example: We may have `has parent = inverse(has child)`.
The inverse operator is useful when the same statement may be written in either direction in the database. For example, "`a` is a part of `c`" may be encoded as `(a, part of, c)` or `(c, member, a)`. With inverse, in order to retrieve all parts of `c` we may use this single triple: `(?, part of ∪ inverse(member), c) = (c, member ∪ inverse(part of), ?)`.
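The equality between the two triples can be checked on a toy example. The sketch below assumes a hypothetical fact base and represents `inverse(p)` as the tuple `("inverse", p)`; both choices are made here for illustration and are not prescribed by the data model.

```python
# Toy fact base (hypothetical data): the same kind of statement stored both ways.
FACTS = {
    ("wheel", "part of", "car"),   # encoded as (a, part of, c)
    ("car", "member", "engine"),   # encoded as (c, member, a)
}

def inverse(p):
    """Mark a property as inverted; the tuple form is a choice made for this sketch."""
    return ("inverse", p)

def holds(a, pred, c):
    """(a, inverse(p), c) is true exactly when (c, p, a) is true."""
    if isinstance(pred, tuple) and pred[0] == "inverse":
        return (c, pred[1], a) in FACTS
    return (a, pred, c) in FACTS

def all_resources():
    return sorted({r for (s, _, o) in FACTS for r in (s, o)})

def missing_subject(preds, c):
    return [a for a in all_resources() if any(holds(a, p, c) for p in preds)]

def missing_object(a, preds):
    return [c for c in all_resources() if any(holds(a, p, c) for p in preds)]

# (?, part of ∪ inverse(member), car)
parts_via_subject = missing_subject(["part of", inverse("member")], "car")
# (car, member ∪ inverse(part of), ?)
parts_via_object = missing_object("car", ["member", inverse("part of")])
```

Both queries find the wheel and the engine, whichever way each fact was stored.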
The union is an operator of list⁺ → list that returns the union of the lists. Its notation is the infix `∪`, like `l1 ∪ l2 ∪ l3`.
The intersection is an operator of list⁺ → list that returns the intersection of the lists. Its notation is the infix `∩`.
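A minimal Python sketch of these two operators on duplicate-free lists follows. The function names are hypothetical, and the output order (first occurrence wins) is just one possible choice, since the data model does not require these operators to preserve order.

```python
def union(*lists):
    """Union of the lists, keeping the first occurrence of each resource."""
    result = []
    for l in lists:
        for item in l:
            if item not in result:
                result.append(item)
    return result

def intersection(*lists):
    """Resources present in every list, without duplicates."""
    first, *rest = lists
    result = []
    for item in first:
        if item not in result and all(item in l for l in rest):
            result.append(item)
    return result
```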
Sort is an operator of list × resource → list, written `sort(l, a)`, which sorts the list `l` in increasing order according to the predicate `a`. A `default` predicate can be used when the question does not contain information about it.
Example: `sort([Theodore Roosevelt, George Washington], birth date)` returns `[George Washington, Theodore Roosevelt]`.
Example of `default`: The question "Who is the first president of France" may be formalized as `sort((France, president, ?), default)`.
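The first example can be reproduced with a short Python sketch. The lookup table `PROPERTY_VALUES` is a hypothetical stand-in for whatever data source a module uses to resolve a predicate; ISO 8601 dates are used so that plain string comparison gives chronological order.

```python
# Hypothetical lookup table: predicate -> {resource: value of that predicate}.
PROPERTY_VALUES = {
    "birth date": {
        "George Washington": "1732-02-22",
        "Theodore Roosevelt": "1858-10-27",
    },
}

def sort(l, predicate):
    """Sort the list in increasing order of the value of `predicate`."""
    values = PROPERTY_VALUES[predicate]
    return sorted(l, key=lambda resource: values[resource])
```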
nth is an operator of integer × list → resource, written `nth(i, l)`, that returns the ith element of the list `l` (the first element of the list is at position 0). If `i < 0` then it returns the ith element from the end (`nth(-1, l)` returns the last element of the list).
Example: `nth(0, [George Washington, Theodore Roosevelt])` returns `George Washington` and `nth(-1, [George Washington, Theodore Roosevelt])` returns `Theodore Roosevelt`.
We also define the deprecated aliases `first(l) = nth(0, l)` and `last(l) = nth(-1, l)`.
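Python's list indexing happens to follow the same convention (zero-based, negative positions counted from the end), so a sketch of `nth` and its aliases is very short. The behaviour for an out-of-range position is not specified above; in this sketch Python simply raises `IndexError`.

```python
def nth(i, l):
    """Element at position i; negative positions count from the end."""
    return l[i]

def first(l):
    """Deprecated alias for nth(0, l)."""
    return nth(0, l)

def last(l):
    """Deprecated alias for nth(-1, l)."""
    return nth(-1, l)
```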
There are some operators that manipulate bool:
The and is an operator of bool⁺ → bool that returns true if the conjunction of the parameters is true. Its notation is the infix `∧`.
The or is an operator of bool⁺ → bool that returns true if the disjunction of the parameters is true. Its notation is the infix `∨`.
The not is an operator of bool → bool that returns the negation of the parameter. Its notation is the prefix `¬`.
The exists is an operator of list → bool that returns true if, and only if, there exists a resource in the list (i.e. the list is not empty). Its notation is the prefix `∃`.
Example: The question "Is there a pink bird?" may be formalized as `∃ (?, instance of, bird) ∩ (?, color, pink)`.
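The boolean operators map directly onto Python built-ins. In the sketch below the function names and the two result lists are hypothetical: the lists stand in for what the two triples with a hole might return.

```python
def exists(l):
    """True iff the list is not empty."""
    return len(l) > 0

def and_(*values):
    return all(values)

def or_(*values):
    return any(values)

def not_(value):
    return not value

# Hypothetical results of the two triples with a hole.
birds = ["flamingo", "crow"]          # stands for (?, instance of, bird)
pink_things = ["flamingo", "rose"]    # stands for (?, color, pink)

# Intersection of the two lists, then the exists operator.
pink_birds = [r for r in birds if r in pink_things]
is_there_a_pink_bird = exists(pink_birds)
```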
A sentence represents a full question encoded as a string. Its notation is a string between quotation marks like `"Who are you?"`. It may only be the root of the question tree.
It is possible to add type information to resource and missing nodes.
Example: If we choose "time" as the range of the missing node `?` in the triple `(George Washington, birth, ?)`, this triple with a hole can only return time points.
Example: If we choose "musician" as the range of the resource node `Michele Smith`, we state that this resource is the musician Michele Smith and not the author who has the same name.
We provide a canonical representation of the data model in JSON.
Each node of the query tree is encoded as a JSON object with an attribute `type` that determines the kind of node and its other attributes.
The `type` attribute has the same value as the name of the node type in the data model.
Here are the serializations for the possible nodes:
The `resource` serialization has three primary attributes:
- `value`, a string representation of the resource (for interoperability).
- `value-type` (optional), which adds information about the type of the entity. Each module can use its own types or use the basic types specified just after. Default: `string`.
- `range` (optional), used to carry type information as specified by the typing extension of the data model.
There may be additional attributes depending on the `value-type`.
Simple example:
{"type": "resource", "value": "George Washington"}
Example with a `value-type` for the bool `true`:
{"type": "resource", "value": "true", "value-type":"boolean"}
Example with a `value-type` and additional attributes:
{"type": "resource", "value": "1111-11-11", "value-type":"time", "calendar":"julian"}
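A small helper can build such nodes consistently. The sketch below is only one way to do it: the function name `resource` and its keyword arguments are hypothetical, and it assumes that a `value-type` equal to the default `string` may simply be omitted.

```python
import json

def resource(value, value_type="string", **extra):
    """Build a resource node; extra keyword arguments become additional attributes."""
    node = {"type": "resource", "value": value}
    if value_type != "string":   # "string" is the default, so it is left out
        node["value-type"] = value_type
    node.update(extra)
    return node

simple = resource("George Washington")
boolean = resource("true", "boolean")
date = resource("1111-11-11", "time", calendar="julian")

# json.dumps produces the text that would be exchanged between modules.
serialized = json.dumps(date)
```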
The basic value types are inspired by the XML Schema ones.
`string`: a simple string. `value` may be any string.
Additional attributes:
- `language` (optional), the language code in which the `value` is written. It should follow RFC 4646.
`boolean`: a boolean. `value` should be in {"true", "false", "1", "0"}.
`time`: a point in time. `value` should match ISO 8601.
Consider using `resource-jsonld` instead, which is more powerful (see the third example).
Additional attributes:
- `calendar` (optional), the calendar of the date. Default: `gregorian`.
`math-latex`: a mathematical formula. `value` is a human-readable string representation of the formula and `latex` is the formula written in LaTeX without the surrounding `$ $`. For instance:
{"type": "resource", "value": "1/2", "latex": "\\frac{1}{2}", "value-type": "math-latex"}
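Because the backslash is the escape character inside JSON strings, a LaTeX backslash has to appear doubled in the serialized text. The following Python sketch, which uses only the standard `json` module, shows that an encoder takes care of this and what happens when a single backslash is written by hand.

```python
import json

node = {
    "type": "resource",
    "value": "1/2",
    "latex": r"\frac{1}{2}",   # raw string: one backslash followed by "frac"
    "value-type": "math-latex",
}

encoded = json.dumps(node)      # the encoder doubles the backslash
decoded = json.loads(encoded)   # decoding restores the original formula

# A single backslash in the JSON text is read as the escape \f (form feed),
# so the formula is silently corrupted.
mangled = json.loads(r'{"latex": "\frac{1}{2}"}')["latex"]
```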
`resource-jsonld`: a resource described in JSON-LD. `value` is a human-readable string representation and `graph` is the JSON-LD graph describing the resource.
The JSON-LD graph must have as root a node object describing the resource. It should use schema.org vocabulary as much as possible in order to increase interoperability between modules and should not contain blank node identifiers.
The graph may be compacted in order to reduce the size of communications. The context `http://schema.org` should be used for that.
You must not use the schema:url property but the `@id` keyword instead.
In order to return literals like dates, you may use as the root node of the JSON-LD tree a resource whose `@type` is the type of your literal and whose rdf:value (IRI: `http://www.w3.org/1999/02/22-rdf-syntax-ns#value`) is the literal itself.
Note: You must use as value of schema:sameAs only URIs that identify the exact same resource as the current one. For example, you can state that Douglas Adams is schema:sameAs http://wikidata.org/entity/Q42 but not schema:sameAs http://en.wikipedia.org/wiki/Douglas_Adams, because the latter is the URI of an article about Douglas Adams and not a URI for Douglas Adams himself.
{
    "type": "resource",
    "value-type": "resource-jsonld",
    "value": "Douglas Adams",
    "graph": {
        "@context": "http://schema.org",
        "@type": "Person",
        "name": {"@value": "Douglas Adams", "@language": "en"},
        "description": [
            {"@value": "English writer and humorist", "@language": "en"},
            {"@value": "écrivain anglais de science-fiction", "@language": "fr"}
        ],
        "sameAs": "http://www.wikidata.org/entity/Q42",
        "image": {
            "@type": "ImageObject",
            "contentUrl": "//upload.wikimedia.org/wikipedia/commons/c/c0/Douglas_adams_portrait_cropped.jpg",
            "name": "Douglas adams portrait cropped.jpg"
        },
        "potentialAction": {
            "@type": "ViewAction",
            "name": [{"@value": "View on Wikidata", "@language": "en"}, {"@value": "Voir sur Wikidata", "@language": "fr"}],
            "image": "//upload.wikimedia.org/wikipedia/commons/f/ff/Wikidata-logo.svg",
            "target": "//www.wikidata.org/wiki/Q42"
        }
    }
}
{
    "type": "resource",
    "value-type": "resource-jsonld",
    "value": "Lyon",
    "graph": {
        "@context": "http://schema.org",
        "@type": "GeoCoordinates",
        "latitude": "45.72",
        "longitude": "4.82",
        "@reverse": {
            "geo": {
                "@type": "Place",
                "name": "Lyon",
                "sameAs": "http://www.wikidata.org/entity/Q456"
            }
        }
    }
}
{
    "type": "resource",
    "value-type": "resource-jsonld",
    "value": "December 23, 1790",
    "graph": {
        "@context": "http://schema.org",
        "@type": "Date",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#value": {
            "@value": "1790-12-23",
            "@type": "Date"
        },
        "@reverse": {
            "birthDate": {
                "@type": "Person",
                "name": "Jean-François Champollion",
                "sameAs": "http://www.wikidata.org/entity/Q260"
            }
        }
    }
}
The `list` serialization has only one attribute, `list`, which is an array that stores the serializations of the list elements.
Example: The serialization of `[George Washington, Theodore Roosevelt]` is:
{"type": "list", "list":[{"type": "resource", "value": "George Washington"}, {"type": "resource", "value": "Theodore Roosevelt"}]}
A list with only one element may also be serialized as the serialization of the element.
Example: The serialization of `[George Washington]` may be:
{"type": "resource", "value": "George Washington"}
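Since a one-element list and its element are interchangeable, code that consumes the serialization has to accept both forms. A possible way to handle this in Python is sketched below; the function names `elements` and `serialize_list` are hypothetical.

```python
def elements(node):
    """Return the element nodes of a list node, or [node] for any other node."""
    if node["type"] == "list":
        return node["list"]
    return [node]

def serialize_list(items):
    """Serialize element nodes, using the short form for a single element."""
    if len(items) == 1:
        return items[0]
    return {"type": "list", "list": items}

washington = {"type": "resource", "value": "George Washington"}
roosevelt = {"type": "resource", "value": "Theodore Roosevelt"}
both = serialize_list([washington, roosevelt])
```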
The `missing` serialization has one possible attribute:
- `range` (optional), used to carry type information as specified by the typing extension of the data model.
Example:
{"type": "missing", "range":"book"}
The different kinds of triples all use the same serialization, with their three members as attributes: `subject`, `predicate` and `object`.
Example: the serialization of the triple `(George Washington, birth date, ?)` is:
{
    "type": "triple",
    "subject": {"type": "resource", "value": "George Washington"},
    "predicate": {"type": "resource", "value": "birth date"},
    "object": {"type": "missing"}
}
There is also an optional fourth attribute, `inverse-predicate`, to encode triples like `(Barack Obama, residence ∪ inverse(inhabitant), ?)`:
{
    "type": "triple",
    "subject": {"type": "resource", "value": "Barack Obama"},
    "predicate": {"type": "resource", "value": "residence"},
    "inverse-predicate": {"type": "resource", "value": "inhabitant"},
    "object": {"type": "missing"}
}
Please note that the previous triple is equivalent to:
{
    "type": "triple",
    "subject": {"type": "missing"},
    "predicate": {"type": "resource", "value": "inhabitant"},
    "inverse-predicate": {"type": "resource", "value": "residence"},
    "object": {"type": "resource", "value": "Barack Obama"}
}
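The equivalence amounts to swapping `subject` with `object` and `predicate` with `inverse-predicate`. A minimal Python sketch follows; the function name `reverse_triple` is hypothetical, and the sketch assumes that both `predicate` and `inverse-predicate` are present in the input.

```python
def reverse_triple(triple):
    """Return the equivalent triple read in the other direction.

    Assumes both "predicate" and "inverse-predicate" are present.
    """
    return {
        "type": "triple",
        "subject": triple["object"],
        "predicate": triple["inverse-predicate"],
        "inverse-predicate": triple["predicate"],
        "object": triple["subject"],
    }

residence_of_obama = {
    "type": "triple",
    "subject": {"type": "resource", "value": "Barack Obama"},
    "predicate": {"type": "resource", "value": "residence"},
    "inverse-predicate": {"type": "resource", "value": "inhabitant"},
    "object": {"type": "missing"},
}

reversed_form = reverse_triple(residence_of_obama)
```

Applying `reverse_triple` twice gives back the original triple.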
Operators that take several parameters of the same kind, such as union, are serialized with only one parameter, `list`, which is an array containing the operator parameters.
Example: the serialization of the query `[George Washington] ∪ [Theodore Roosevelt]` is:
{
    "type": "union",
    "list": [{"type": "resource", "value": "George Washington"}, {"type": "resource", "value": "Theodore Roosevelt"}]
}
The `sort` serialization has two parameters:
- `list`, the input node.
- `predicate`, the predicate used to sort the list.
Example: the serialization of the query `sort([Theodore Roosevelt, George Washington], birth date)` is:
{
    "type": "sort",
    "list": {"type":"list", "list":[{"type": "resource", "value": "Theodore Roosevelt"}, {"type": "resource", "value": "George Washington"}]},
    "predicate": {"type": "resource", "value": "birth date"}
}
The `nth` serialization has two parameters:
- `list`, the input node.
- `position`, the position of the requested element.
Example: the serialization of the query `nth(2, [George Washington, Theodore Roosevelt])` is:
{
    "type": "nth",
    "position": 2,
    "list": {"type":"list", "list":[{"type": "resource", "value": "George Washington"}, {"type": "resource", "value": "Theodore Roosevelt"}]}
}
Operators that take a single list, such as first, have only one parameter, `list`, which is the input list.
Example: the serialization of the query `first([George Washington, Theodore Roosevelt])` is:
{
    "type": "first",
    "list": {"type":"list", "list":[{"type": "resource", "value": "George Washington"}, {"type": "resource", "value": "Theodore Roosevelt"}]}
}
The `sentence` serialization has only one attribute, `value`, which contains the sentence.
Example: The serialization of `"Who is George Washington?"` is:
{"type": "sentence", "value": "Who is George Washington?"}
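To tie the serialization back to the data model, here is a rough Python sketch that walks a serialized tree and prints it in the mathematical notation. It is an illustration under assumptions: the function name `to_notation` is hypothetical, only the node types shown in this document are handled, and type information such as `range` and `value-type` is ignored.

```python
INFIX = {"union": "∪", "intersection": "∩"}

def to_notation(node):
    """Render a serialized node in the data model's notation."""
    kind = node["type"]
    if kind == "resource":
        return node["value"]
    if kind == "missing":
        return "?"
    if kind == "sentence":
        return '"' + node["value"] + '"'
    if kind == "list":
        return "[" + ", ".join(to_notation(n) for n in node["list"]) + "]"
    if kind == "triple":
        predicate = to_notation(node["predicate"])
        if "inverse-predicate" in node:
            predicate += " ∪ inverse(" + to_notation(node["inverse-predicate"]) + ")"
        return "(%s, %s, %s)" % (
            to_notation(node["subject"]), predicate, to_notation(node["object"]))
    if kind in INFIX:
        return (" " + INFIX[kind] + " ").join(to_notation(n) for n in node["list"])
    if kind == "sort":
        return "sort(%s, %s)" % (to_notation(node["list"]), to_notation(node["predicate"]))
    if kind == "nth":
        return "nth(%d, %s)" % (node["position"], to_notation(node["list"]))
    if kind in ("first", "last"):
        return "%s(%s)" % (kind, to_notation(node["list"]))
    raise ValueError("unsupported node type: " + kind)

birth_date_query = {
    "type": "triple",
    "subject": {"type": "resource", "value": "George Washington"},
    "predicate": {"type": "resource", "value": "birth date"},
    "object": {"type": "missing"},
}
rendered = to_notation(birth_date_query)
```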