Document under construction.
This document describes the formal semantics of Shape Expressions (ShEx) schemas. The semantics is defined on a slight generalization of ShEx schemas, called the shapes language. The generalization allows for a more succinct abstract syntax on which the semantics of the shapes language is defined. In the generalization,
The Shape Expressions language is expected to remain stable with the exception of:
Shape expressions are defined using terms from RDF semantics:
The following sets access the elements of an RDF graph
Additionally, we define
Consider the RDF graph
PREFIX ex: http://ex.example/# PREFIX foaf: http://xmlns.com/foaf/ PREFIX xsd: http://www.w3.org/2001/XMLSchema# inst:Issue1 ex:state ex:unassigned ; ex:reportedBy inst:User2 . inst:User2 foaf:name "Bob Smith" ; foaf:mbox <mailto:bob@example.org> .
The set
{ inst:Issue1, ex:unassigned, inst:User2, "Bob Smith", <mailto:bob@example.org> }
The set
{ ex:state, ex:reportedBy, foaf:name, foaf:mbox }
The set
{ ^ex:state, ^ex:reportedBy, ^foaf:name, ^foaf:mbox }
The set
inst:User2 foaf:name "Bob Smith" . inst:User2 foaf:mbox <mailto:bob@example.org> . inst:User2 ^ex:reportedBy inst:Issue1 .
A shapes schema defines a set of named constraints, called shapes. The name of a shape is also called shape label.
A shapes schema
The structure of a shapes schema is captured by the following abstract syntax.
A typing is a set of node-label associations of the form
Formally, let
Consider a shapes schema
An example of a typing of the graph
The
A shape expression is a Boolean combination of two kinds of atomic components:
Let
Let
Examples of value expressions are
The semantics of a shape expression is defined relative to a typing.
Let
Let
Whatever the value of
If
Triple expressions are used for defining neighbourhood constraints within a shape expression.
A triple expression is matched by a set of triples that come from the neighbourhood of a node in an RDF graph.
An atomic triple expression can be
Triple expressions allow for two binary operators, the some-of operator, denoted
Finally, a triple expression can be repeated by specifying a minimal cardinality that is a natural number, and a maximal cardinality that is a natural number of an unbounded value denoted as a star
Moving the mouse over the elements of the list below will highlight the corresponding elements in the triple expression.
# triple expression of the shapes language
{rdf:type} <SL_1> ;
{foaf:name} <SL_2> ;
( {foaf:mbox} <SL_3> {1;3} | {foaf:adress} <SL_3> ) ;
{^ex:leader} <SL_4> ;
C{rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox, ^ex:leader} <SL_all> *
Where C{rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox} = (Prop ∪ InvProp) \ {rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox}, that is, the complement of the set {rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox} within the set of all properties and inverse properties.
A triple expression is matched by a set of triples and inverse triples.
In practice, the set will be obtained as the neighbourhood of some node in an RDF graph, or will be a subset of such neighbourhood.
Such set is composed of triples of the form
For a set of triples or inverse triples
Example of matching a triple constraint.
# if typing is such that (:n1, <SL_1>) belongs to typing (:n2, <SL_2>) does not belong to typing # then typing, {(:n :p :n1)} ⊧ {:p} <SL_1> typing, {(:n :p :n1)} ⊧ {:p, :q} <SL_1> typing, {(:n :p :n2)} ⊭ {:p} <SL_2>
Examples of satisfying an each-of triple expression.
# if typing is such that it contains the following node-label associations (:n1, <SL_1>), (:n2, <SL_2>), (:n3, <SL_1>) # then typing, {(:n :p :n1), (:n :p :n2)} ⊧ {:p} <SL_1> ; {:q} <SL_2> typing, {(:n :p :n1), (:n :p :n3)} ⊧ {:p} <SL_1> ; {:p} <SL_1> typing, {(:n :p :n1)} ⊭ {:p} <SL_1> ; {:p} <SL_1>
Examples of satisfying a some-of triple expression.
# if typing is such that it contains the following node-label associations (:n1, <SL_1>), (:n2, <SL_2>), (:n3, <SL_1>) # then typing, {(:n :p :n1)} ⊧ {:p} <SL_1> | {:q} <SL_2> typing, {(:n :p :n1), (:n :p :n2)} ⊭ {:p} <SL_1> | {:q} <SL_2>
Examples of satisfying a triple expression with some-of and each-of operators.
# if typing is such that it contains the following node-label associations (:n1, <SL_1>), (:n2, <SL_2>), (:n3, <SL_1>) # then typing, {(:n :p :n1)} ⊧ {:p} <SL_1> | {:q} <SL_2> typing, {(:n :p :n1), (:n :p :n2)} ⊭ {:p} <SL_1> | {:q} <SL_2>
Consider the following triple expression and RDF graph.
# graph G
ex:lp rdf:type foaf:Person;
foaf:name "Louisa Pavlova";
foaf:mbox "louisa@shex.org";
foaf:mbox "lpavlova@jobs.com";
ex:memberOf ex:gr2 .
ex:gr2 ex:leader ex:lp .
ex:bug ex:reportedBy ex:lp .
# typing is such that it contains the following node-label associations
(n, <SL_all>) for all node n in G
(foaf:Person, <SL_Person>)
("Louisa Pavlova", <SL_string>)
("louisa@shex.org", <SL_mbox_string>)
("lpavlova@jobs.com", <SL_mbox_string>)(ex:gr2, <SL_Group>)
# triple expression TExpr
{rdf:type} <SL_Person> ;
{foaf:name} <SL_string> ;
( {foaf:mbox} <SL_mbox_string> {1;3} | {foaf:adress} <SL_adr> ) ;
{^ex:leader} <SL_Group> ;
C{rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox, ^ex:leader} <SL_all> *
# neigh(G, ex:lp), the neighbourhood of ex:lp in G
ex:lp rdf:type foaf:Person .
ex:lp foaf:name "Louisa Pavlova" .
ex:lp foaf:mbox "louisa@shex.org" .
ex:lp foaf:mbox "lpavlova@jobs.com" .ex:lp ^ex:leader ex:gr2
ex:lp ex:memberOf ex:gr2
ex:lp ^ex:reportedBy ex:bug
Then typing, neigh(G, ex:lp) ⊢ TExpr. Move the mouse over the triples in neigh(G, ex:lp) to see which triples contribute to satisfy which of the sub-expressions of TExpr.
A
As we can see from the syntax, shape expressions can refer other shape definitions through a
We start by defining the dependency graph for a shapes schema, based on which we will define the stratification.
For a schema
We say that the shapes schema
The following is an example of a shapes schema and its dependency graph.
# a shapes schema <SL_all>:= IRI ∪ RDFLiteral ∪ {blank} <SL_1> := NOT ({:c} <SL_5>) <SL_2> := {:a} <SL_4> ; {:c} <SL_5> <SL_3> := NOT ({:c} <SL_all>) AND {:a} <SL_2> <SL_4> := {:a} <SL_3> <SL_5> := {:c} <SL_all>
# the edges of the dependency graph # ---> is a negative dependency edge # -+-> is a positive dependency edge <SL_1> ---> <SL_5> <SL_2> -+-> <SL_4> <SL_2> -+-> <SL_5> <SL_3> ---> <SL_all> <SL_3> -+-> <SL_2> <SL_4> -+-> <SL_3> <SL_5> -+-> <SL_all>
If a schema is with stratified negation, then there exists a partition of the shape labels of the schema such that there does not exist a negative dependency edge between the shape labels of the same part of the partition. Moreover, the parts of the partition can be ranked in a way that all negative dependency edges always go from a partition with a higher rank to a partition with a lower rank. Such partitioning is called a stratification.
More formally, if
A stratification guarantees the following property: the definition of a shape label can depend negatively only on shape labels that are in strictly lower stratums, and can depend positively only on shape labels that are on the same stratum or on a lower stratum. In particular, every circular dependency (the definition of a shape label depends on itself) enforces that all the intermediate shape labels are on the same stratum.
A stratification for the schema on the previous example is:
strat(<SL_2>) = 3 strat(<SL_3>) = 3 strat(<SL_4>) = 3 strat(<SL_1>) = 2 strat(<SL_5>) = 1 strat(<SL_all>) = 1
The shape labels <SL_2>, <SL_3>, <SL_4> are on the same stratum. One can see that there is a circular dependency between these three shape labels. The two negative dependencies in that schema go from <SL_1> to <SL_5> (from stratum 2 to stratum 1), and from <SL_3> to <SL_all> (from stratum 3 to stratum 1).
Another possible stratification for that schema is to have only two stratums, and <SL_1> on the same stratum as <SL_2>, <SL_3>, <SL_4>.
Remark that it is easy to check whether a schema is with stratified negation: it is enough to construct the dependency graph, and check that it does not contain any cycle that goes through a negative dependency.
A schema with stratified negation admits a unique typing defined here after. This typing is used for defining the semantics of shapes schemas: we say that a node in an RDF graph satisfies the constraint of a shape label if and only if the corresponding node-label association belongs to this particular typing. This typing is maximal as it forces to add in the typing every provable node-label association.
Let
It follows from this definition that
Then we define a sequence of typings
Note that because
The typing
TODO: an example of the sequence of typings
Every shape label from the ShEx schema is used as shape label in the shapes schema, which definition is the compilation of the definition from the ShEx schema.
Every shape expression that appears in a triple constraint of the ShEx schema yields a fresh shape label having the same definition in the shapes schema. This does not apply to shape expressions that are references.
The translation of a CLOSED triple expression in the ShEx schema translates almost directly to a triple expression in the shapes schema, exept that predicates are replaced with singleton sets, and value expressions are replaced with the corresponding freshly introduced shape labels
# ShEx schema
<IssueShape>
CLOSED {
ex:state [ex:unassigned ex:assigned] ;
ex:reportedBy @<UserShape> ;
ex:reportedOn xsd:dateTime OR xsd:date ;
ex:related @<IssueShape>* ;
^ex:related @<IssueShape>*
}
<UserShape> {
PATTERN "http://example.org/User#.*"
}
# shapes schema
<IssueShape> :=
{
{ex:state} <SL_1> ;
{ex:reportedBy} <UserShape>
{ex:reportedOn} <SL_3> ;
{ex:related} <IssueShape>* ;
{^ex:related} <IssueShape>*
}
<UserShape> {
Set(PATTERN "http://example.org/User#.*")
}
<SL_1> := {ex:unassigned, ex:assigned}
<SL_3> := xsd:dateTime OR xsd:date
As in the previous example, the value expressions of the ShEx schema yield shape labels with the same definition. Here, the resulting fresh shape labels are <SL_5>, <SL_6>, <SL_7>, <SL_8>.
The triple expression from the ShEx schema is translated into a triple expression in the shapes schema. One part of that triple expression is a direct translation of the triple expression from the ShEx schema.
The EXTRA modifier results in additional triple constraints in the shapes schema. There is one such triple constraint with * cardinality for every extra property. For every extra property, the value expression is the negation of the value expressions that occur with the same property.
# ShEx schema
<UserShape>
CLOSED
EXTRA rdf:type foaf:mbox {
rdf:type [foaf:Person] ;
rdf:type [ex:User] ;
(
foaf:name xsd:string
|
foaf:givenName xsd:string+ ;
foaf:familyName xsd:string
);
foaf:mbox IRI
}
# shapes schema
<SL_5> := {foaf:Person}
<SL_6> := {ex:User}
<SL_7> := xsd:string
<SL_8> := IRI
<SL_UserShape_Extra_rdftype> := NOT{foaf:Person} AND NOT{ex:User}
<SL_UserShape_Extra_foafmxbox> := NOT IRI
<UserShape> :=
{
{rdf:type} <SL_5> ;
{rdf:type} <SL_6> ;
(
{foaf:name} <SL_7>
|
{foaf:givenName} <SL_7>+ ;
{foaf:familyName} <SL_7>
);
{foaf:mbox} <SL_8> ;
{rdf:type} <SL_UserShape_Extra_rdftype> * ;
{foaf:mbox} <SL_UserShape_Extra_foafmbox> * ;
}
As for the case with EXTRA properties, the shape expression from the ShEx schema is tranlated into a triple expression with an additional triple constraint with * cardinality. The set of properties of the latter is the set of all properties that do not appear in the shape expression. The value expression accepts all values.
# ShEx schema
<SomeShape>
{
ex:p xsd:int* ;
( ex:q xsd:int | ex:r IRI )?
}
# shapes schema
<SL_9> := xsd:int
<SL_10> := IRI
<SL_all_values> := IRI ∪ RDFLiteral ∪ {blank}
<SomeShape> :=
{
{ex:p} <SL_9>* ;
( {ex:q} <SL_9> | {ex:r} <SL_10> )? ;
PropSet <SL_all_values>*
}
# with PropSet = (Prop ∪ InvProp) \ {ex:p, ex:q, ex:r}
The shape label <EmployeeShape> is defined as a conjunction of node constrints and triple expressions. In the shapes schema, this yields a ShapeAnd shape expression with one conjunct per conjunct in the ShEx schema.
# ShEx schema
<EmployeeShape>
PATTERN "^http:/example.org/.*"
CLOSED {
foaf:phone IRI*;
foaf:mbox IRI
}
AND
CLOSED {
( foaf:phone PATTERN "^tel:\\+33" ;
foaf:mbox PATTERN "\\.fr$" )? ;
( foaf:phone PATTERN "^tel:\\+44" ;
foaf:mbox PATTERN "\\.uk$")?
}
# shapes schema
<SL_11> := Set(PATTERN "^tel:\\+33")
<SL_12> := Set(PATTERN "\\.fr$")
<SL_13> := Set(PATTERN "^tel:\\+44")
<SL_14> := Set(PATTERN "\\.uk$")
<SL_15> := IRI
<EmployeeShape> := {
Set(PATTERN "^http:/example.org/.*")
AND
{foaf:phone} <SL_15>* ;
{foaf:mbox} <SL_15>
AND
( {foaf:phone} <SL_11> ;
{foaf:mbox} <SL_12> )? ;
( {foaf:phone} <SL_13> ;
{foaf:mbox} <SL_14> )?
}
The negated triple constraint is treated as if its property was an extra property. The corresponding triple constraint in the translation has a {0;0} cardinality.
# ShEx schema
<SomeShape>
CLOSED {
ex:p xsd:int* ;
( ! ex:q xsd:int
| ex:r IRI )?
}
# shapes schema
<SL_9> := xsd:int
<SL_10> := IRI
<SL_Extra> := NOT xsd:int
<SomeShape> := {
{ex:p} <SL_9>* ;
( {ex:q} <SL_9>{0;0}
| {ex:r} <SL_10> )? ;
{ex:q} <SL_Extra>*
}
Let
Formally, let
Remark that the validation problem does not necessarily require to compute
The abstract syntax of ShEx schemas is given by the following rules. All non-terminals of the abstract syntax are terminated by an underscore (_), this in order to avoid confusion with the non-terminals of the abstract syntax of the shapes language.
A shape expressions schema defines a set of shape labels, together with their definition.
TODO: explanation of the constructs of a ShEx schema
In this example we illustrate how concrete syntax of ShEx in Turtle format corresponds to the abstract syntax presented above.
Here is a ShEx schema written in Turtle format. The leading numbers are line numbers used for reference.
RREFIX ex: <http://ex.example/#> PREFIX foaf: <http://xmlns.com/foaf/> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 1 <IssueShape> CLOSED EXTRA rdf:type 2 { 3 rdf:type [ex:Issue]; 4 ex:state [ex:unassigned ex:assigned]; 5 6 ex:reportedBy @<UserShape>; 7 ex:reportedOn xsd:dateTime; 8 ( 9 ex:reproducedBy @<EmployeeShape>; 10 ex:reproducedOn xsd:dateTime OR xsd:date 11 )?; 12 ^ex:related @<IssueShape>* 13 } 14 15 <UserShape> PATTERN "^http:/example.org/.*" { 16 ( 17 foaf:name xsd:string 18 | 19 foaf:givenName xsd:string+; 20 foaf:familyName xsd:string 21 ); 22 foaf:mbox IRI 23 } 24 25 <EmployeeShape> { 26 foaf:phone IRI*; 27 foaf:mbox IRI 28 } AND { 29 ( foaf:phone PATTERN "^tel:\\+33"; 30 foaf:mbox PATTERN "\\.fr$" )?; 31 ( foaf:phone PATTERN "^tel:\\+44"; 32 foaf:mbox PATTERN "\\.uk$")? 33 }
We first explain some elements of the concrete syntax. A triple expression is surrounded by curly braces { and }. The square braces delimit sets of values. Parentheses are used as usual for defining the structre of an expression. We use abbreviations for some intervals of min and max cardinality, namely, '+' is the interval [1;*], '*' is the interval [0;*], and '?' is the interval [0;1].
Then, the schema here above defines three shape labels: <IssueShape>, <UserShape>, <EmployeeShape>.
The shape expression that defines the shape label <IssueShape> starts on line 1 after the shape label, and ends with the curly bracket on line 13.
It has an empty list of node constraints, and a single shape definition.
The shape definition has two modifiers,
The triple expression in the shape definition is given between the curly braces on lines 2 and 13. It is an each-of expression composed of six sub-expressions, separated by semicolons. The sub-expressions are as follows:
We explain the different forms of triple constraint that appear in the definition of <IssueShape>, by explaining their property or their value expression.
1 <IssueShape> CLOSED EXTRA rdf:type 2 { 3 rdf:type [ex:Issue]; # the property is rdf:type, # the value expression is a node constraint # that is a singleton set containing the IRI ex:Issue 4 ex:state [ex:unassigned ex:assigned]; # the property is ex:state 5 # the value expression is a set with two IRI, # ex:unassigned and ex:assigned 6 ex:reportedBy @<UserShape>; # the value expression is a ShapeExprRef 7 ex:reportedOn xsd:dateTime; # the value expression is a the set of literal values # that have XSD type xsd:dateTime 8 ( 9 ex:reproducedBy @<EmployeeShape>; 10 ex:reproducedOn xsd:dateTime OR xsd:date # the value expression is DisjunctiveValueExpr, # whose sub-expressions are sets of literal values 11 )?; 12 ex:related @<IssueShape>* 13 }
We now explain a few other elements of the syntax.
15 <UserShape> PATTERN "^http:/example.org/.*" { ... 23 }
The shape expression that defines <UserShape> is composed of one node constraint PATTERN "^http:/example.org/.*" and one shape definition between the curly braces lines 15 to 23. The node constraint PATTERN "^http:/example.org/.*" defines the set of IRI that satisfy a pattern given by a regular expression.
25 <EmployeeShape> { ... 28 } AND { ... 33 }
The shape expression that defines <EmployeeShape> is composed of zero node constrainst, and two shape definitions, separated by an AND.