Document under construction.

This document describes the formal semantics of Shape Expressions (ShEx) schemas. The semantics is defined on a slight generalization of ShEx schemas, called the shapes language. The generalization allows for a more succinct abstract syntax on which the semantics of the shapes language is defined. In the generalization,

The semantics of ShEx schemas is then defined as a translation to the shapes language.

The Shape Expressions language is expected to remain stable with the exception of:

Terminology

Shape expressions are defined using terms from RDF semantics:

The following sets access the elements of an RDF graph G:

Additionally, we define

Consider the RDF graph G given by:

PREFIX ex: http://ex.example/#
PREFIX foaf: http://xmlns.com/foaf/
PREFIX xsd: http://www.w3.org/2001/XMLSchema#

inst:Issue1 
    ex:state      ex:unassigned ;
    ex:reportedBy inst:User2 .

inst:User2
    foaf:name     "Bob Smith" ;
    foaf:mbox     <mailto:bob@example.org> .
         

The set Nodes(G) of the nodes of G is

    { inst:Issue1, ex:unassigned, inst:User2, "Bob Smith", <mailto:bob@example.org> }

The set Props(G) of the properties of G is

    { ex:state, ex:reportedBy, foaf:name, foaf:mbox }

The set InvProps(G) of inverse properties of G is

    { ^ex:state, ^ex:reportedBy, ^foaf:name, ^foaf:mbox }

The set neigh(G, inst:User2), neighbourhood of the node inst:User2 is the set composed of the following triples and inverse triples:

    inst:User2  foaf:name  "Bob Smith" .
    inst:User2  foaf:mbox  <mailto:bob@example.org> .
    inst:User2  ^ex:reportedBy  inst:Issue1 .

The Shapes Language

A shapes schema defines a set of named constraints, called shapes. The name of a shape is also called shape label.

Shapes Schema

A shapes schema Sch is given by a couple Sch = (SLabels, def) where

The structure of a shapes schema is captured by the following abstract syntax.

ShapesSchema ::= (ShapeLabel ShapeExpression)+

Typing of a Graph by a Shapes Schema

A typing is a set of node-label associations of the form (node,T) where node is a node in a graph, and T is a shape label from a schema. A node-label association intuitively indicates that the node node satisfies the constraint given by the definition of the shape label T.

Formally, let Sch = (SLabels, def) be a shapes schema and G be a graph. A typing of G by Sch is a subset of Nodes(G) × SLabels.

Consider a shapes schema Sch = (SLabels, def) with SLabels = {<IssueShape>, <UserShape>, <EmployeeShape>}.

An example of a typing of the graph G by Sch is:

  • typing = { (inst:Issue1, <IssueShape>), (inst:User2, <UserShape>), (inst:User2, <EmployeeShape>) }

Shape Expressions

The def function associates a shape expression with every shape label in a shapes schema.

Abstract Syntax

ShapeExpression ::= AtomicShape
| ShapeAnd
| ShapeOr
| ShapeNot
AtomicShape ::= NodeConstraint | NeighbourhoodConstraint
NodeConstraint ::= a set of IRI or RDFLiteral or the special value blank
NeighbourhoodConstraint ::= TripleExpression
ShapeAnd ::= ShapeExpression 'AND' ShapeExpression
ShapeOr ::= ShapeExpression 'OR' ShapeExpression
ShapeNot ::= 'NOT' ShapeExpression

A shape expression is a Boolean combination of two kinds of atomic components:

  • a node constraint (NodeConstraint) that defines the set of allowed values of a node. In shapes schemas, a node constraint can be an arbitrary set that contains IRIs or RDFLiteral, or the special value blank that stands for any BNode. In practice (in the concrete syntax), a node constraint can be defined using different ways of restraining the type of a node: string facet, numeric facet, node kind, a stem [TODO: reference to the concrete syntax].
  • a neighbourhood constraint (NeighbourhoodConstraint) that defines a constraint on the allowed neighbourhood of a node, that is, the allowed triples that contain this node as subject or object. A neighbourhood constraint is itself defined as a (TripleExpression).

Let U, V, int be node constraints, that is, sets which elements are IRI or RDFLiteral or the special value blank, with

  • U = { ex:unassigned, ex:assigned, ex:unknown }
  • V = { 1, 2, 5 }
  • int the set of all integer values

Let E1, E2 be triple expressions.

Examples of value expressions are

  • int AND (NOT V)
  • U AND (E1 AND (NOT E2))

Semantics

The semantics of a shape expression is defined relative to a typing.

Let Sch = (SLabels, def) be a shapes schema and G be a graph. For a node in Nodes(G), a typing of G by Sch, and a shape expression S, we say that node satisfies S with typing, written typing, node ⊢ S, as defined recursively on the structure of S on the figure above, where:

  • V is a NodeConstraint (in the se-node-constr and se-node-constr-blank rules);
  • E is a TripleExpression (in rule se-neighb-constr);
  • the ⊧ symbol in the premise of the se-neighb-constr rue the relation that defines whether the neighbourhood of a node matches a triple expression, and is defined below;
  • S, S1, S2 are ShapeExpressions;
  • the typing, node ⊬ S in the premise of the se-shape-not rule means that it is impossible to construct a proof for typing, node ⊢ S.

Let U, V, int, E1, E2 be as in the previous example.

Whatever the value of typing, it holds

  • typing, 3 ⊢ int AND (NOT V)

If typing = { (ex:assigned, E1), (ex:assigned, E2), (ex:unknown, E1), (ex:somenode, E1) }, then

  • typing, ex:assigned ⊬ U AND (E1 AND (NOT E2))
  • typing, ex:unknown ⊢ U AND (E1 AND (NOT E2T))
  • typing, ex:somenode ⊬ U AND (E1 AND (NOT E2))

Triple Expressions

Triple expressions are used for defining neighbourhood constraints within a shape expression.

Abstract syntax

TripleExpression ::= EmptyTripleExpr
| TripleConstraint
| OneOfTripleExpr
| EachOfTripleExpr
| RepeatedTripleExpr

EmptyTripleExpr ::= 'EMPTY'
TripleConstraint ::= PropertySet ShapeRef
PropertySet ::= a finite or co-finite subset of Prop ∪ InvProp
ShapeRef ::= a shape label from SLabels

OneOfTripleExpr ::= TripleExpression '|' TripleExpression
EachOfTripleExpr ::= TripleExpression ';' TripleExpression

RepeatedTripleExpr ::= TripleExpression '{' MinCard ';' MaxCard '}'
MinCard ::= a natural number
MaxCard ::= a natural number or '*'

A triple expression is matched by a set of triples that come from the neighbourhood of a node in an RDF graph.

An atomic triple expression can be

  • the empty triple expression, that is matched only by an empty set of triples;
  • a triple constraint, that is matched by a set containing a single triple. A triple constraint is composed of a set of properties or inverse properties, that define the allowed values for the predicate of a triple, as well as a shape label, that is a reference to shape expression to be satisfied by the node in the triple opposite to the focus node.

Triple expressions allow for two binary operators, the some-of operator, denoted |, and the each-of operator, denoted as a semicolon ;.

  • A some-of triple expression is matched by a set of triples if one or the other of its sub-expressions is matched by that set of triples.
  • An each-of triple expression is matched by a set of triples if that set can be split into two disjoint subsets such that one of the subsets matches the first sub-expression of the each-of expression, and the other subset matches the other sub-expression.
These however should not be considered as disjunction and conjunction, as we will see later on when the semantics will be formally defined.

Finally, a triple expression can be repeated by specifying a minimal cardinality that is a natural number, and a maximal cardinality that is a natural number of an unbounded value denoted as a star *. A repeated triple expression TExpr{min;max} is satisfied by a set of triples if that set can be split into k disjoint subsets such that min ≤ k ≤ max, and each of these subsets matches the sub-expression TExpr.

Moving the mouse over the elements of the list below will highlight the corresponding elements in the triple expression.

  • triple constraint (TripleConstraint)
  • some-of triple expression (OneOfTripleExpr)
  • each-of triple expression (EachOfTripleExpr)
  • repeated triple expression (RepeatedTripleExpr)
  • triple constraint with an inverse property
  • triple constraint with a non-singleton co-finite set of properties
  • shape reference (ShapeRef)

# triple expression of the shapes language

{rdf:type} <SL_1> ; {foaf:name} <SL_2> ; ( {foaf:mbox} <SL_3> {1;3} | {foaf:adress} <SL_3> ) ; {^ex:leader} <SL_4> ; C{rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox, ^ex:leader} <SL_all> *

Where C{rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox} = (Prop ∪ InvProp) \ {rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox}, that is, the complement of the set {rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox} within the set of all properties and inverse properties.

Semantics

A triple expression is matched by a set of triples and inverse triples. In practice, the set will be obtained as the neighbourhood of some node in an RDF graph, or will be a subset of such neighbourhood. Such set is composed of triples of the form (node, prop, node') or (node, ^prop, node'), where prop is a property and ^prop is an inverse property.

For a set of triples or inverse triples N from a graph G, a triple expression E, and a typing of G by Sch, we say that N matches E for typing, written typing, Neigh ⊧ E, as defined on the figure above recursively on the structure of E, where

  • in rule te-triple-constr, P L is a triple constraint, with P a subset of Prop ∪ InvProp, and L a shape label in SLabels;
  • min and max are the minimum, respectively maximum cardinality in a RepetedTripleExpr (in rule te-repet);

Example of matching a triple constraint.

# if typing is such that
(:n1, <SL_1>) belongs to typing
(:n2, <SL_2>) does not belong to typing

# then 
typing, {(:n :p :n1)} ⊧ {:p}  <SL_1>
typing, {(:n :p :n1)} ⊧ {:p, :q}  <SL_1>
typing, {(:n :p :n2)} ⊭ {:p} <SL_2>

Examples of satisfying an each-of triple expression.

# if typing is such that it contains the following node-label associations
(:n1, <SL_1>), (:n2, <SL_2>), (:n3, <SL_1>)

# then
typing, {(:n :p :n1), (:n :p :n2)}  ⊧  {:p} <SL_1> ; {:q} <SL_2>
typing, {(:n :p :n1), (:n :p :n3)}  ⊧  {:p} <SL_1> ; {:p} <SL_1>
typing, {(:n :p :n1)}               ⊭  {:p} <SL_1> ; {:p} <SL_1>

Examples of satisfying a some-of triple expression.

# if typing is such that it contains the following node-label associations
(:n1, <SL_1>), (:n2, <SL_2>), (:n3, <SL_1>)

# then
typing, {(:n :p :n1)}               ⊧   {:p} <SL_1> | {:q} <SL_2>
typing, {(:n :p :n1), (:n :p :n2)}  ⊭   {:p} <SL_1> | {:q} <SL_2>

Examples of satisfying a triple expression with some-of and each-of operators.

# if typing is such that it contains the following node-label associations
(:n1, <SL_1>), (:n2, <SL_2>), (:n3, <SL_1>) 

# then
typing, {(:n :p :n1)}                ⊧  {:p} <SL_1> | {:q} <SL_2>
typing, {(:n :p :n1), (:n :p :n2)}   ⊭  {:p} <SL_1> | {:q} <SL_2>

Consider the following triple expression and RDF graph.

# graph G

ex:lp    rdf:type     foaf:Person;
         foaf:name    "Louisa Pavlova";
         foaf:mbox    "louisa@shex.org";
         foaf:mbox    "lpavlova@jobs.com";
         ex:memberOf  ex:gr2 .
ex:gr2   ex:leader    ex:lp .
ex:bug   ex:reportedBy  ex:lp .
# typing is such that it contains the following node-label associations

(n, <SL_all>)  for all node n in G
(foaf:Person, <SL_Person>)
("Louisa Pavlova", <SL_string>)
("louisa@shex.org", <SL_mbox_string>) ("lpavlova@jobs.com", <SL_mbox_string>)
(ex:gr2, <SL_Group>)
# triple expression TExpr

{rdf:type}  <SL_Person> ;
{foaf:name} <SL_string> ;
( {foaf:mbox} <SL_mbox_string> {1;3} | {foaf:adress} <SL_adr> ) ;
{^ex:leader} <SL_Group> ;
C{rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox, ^ex:leader} <SL_all> *
# neigh(G, ex:lp), the neighbourhood of ex:lp in G

ex:lp  rdf:type     foaf:Person .
ex:lp  foaf:name    "Louisa Pavlova" .
ex:lp foaf:mbox "louisa@shex.org" . ex:lp foaf:mbox "lpavlova@jobs.com" .
ex:lp ^ex:leader ex:gr2
ex:lp ex:memberOf ex:gr2 ex:lp ^ex:reportedBy ex:bug

Then typing, neigh(G, ex:lp) ⊢ TExpr. Move the mouse over the triples in neigh(G, ex:lp) to see which triples contribute to satisfy which of the sub-expressions of TExpr.

Correct typing

A typing of G by Sch is called correct if for all node-label association (node, T) in typing, it is the case that typing, neigh(node) ⊢ def(T).

Shapes Schemas with Stratified Negation

As we can see from the syntax, shape expressions can refer other shape definitions through a ShapeRef within a TripleConstraint. In order to avoid undesirable interaction between recursion and the negation in value expressions, we consider shapes schemas with stratified negation only, as defined in the sequel.

We start by defining the dependency graph for a shapes schema, based on which we will define the stratification.

For a schema Sch = (SLabels, def), its dependency graph is a graph which set of nodes is SLabels and that has an edge from the shape label L1 to the shape label L2 if and only if L2 appears in the shape expression def(L1). There are two kinds of edges in the graph:

We say that the shapes schema Sch is with stratified negation if its dependency graph does not contain a cycle that goes through a dep- edge.

The following is an example of a shapes schema and its dependency graph.

# a shapes schema

<SL_all>:= IRI ∪ RDFLiteral ∪ {blank}
<SL_1> := NOT ({:c} <SL_5>)
<SL_2> := {:a} <SL_4> ; {:c} <SL_5> 
<SL_3> := NOT ({:c} <SL_all>) AND {:a} <SL_2>
<SL_4> := {:a} <SL_3>
<SL_5> := {:c} <SL_all>
# the edges of the dependency graph
# ---> is a negative dependency edge
# -+-> is a positive dependency edge

<SL_1> ---> <SL_5>

<SL_2> -+-> <SL_4>
<SL_2> -+-> <SL_5>

<SL_3> ---> <SL_all>
<SL_3> -+-> <SL_2>

<SL_4> -+-> <SL_3>

<SL_5> -+-> <SL_all>

If a schema is with stratified negation, then there exists a partition of the shape labels of the schema such that there does not exist a negative dependency edge between the shape labels of the same part of the partition. Moreover, the parts of the partition can be ranked in a way that all negative dependency edges always go from a partition with a higher rank to a partition with a lower rank. Such partitioning is called a stratification.

More formally, if Sch = (SLabels, def) is a shapes schema with stratified negation, then there exists a natural number k and a partitioning function strat: SLLabels → 1..k that with every shape label from SLabels associates a natural number between 1 and k, called the stratum of that label. Moreover, the partitioning function strat satisfies the following property: for all shape labels L,L' in SLabels, if dep-(L,L'), then strat(L) > strat(L'), and if dep+(L,L'), then strat(L) ≥ strat(L'). The partitioning function strat is called a stratification of Sch.

A stratification guarantees the following property: the definition of a shape label can depend negatively only on shape labels that are in strictly lower stratums, and can depend positively only on shape labels that are on the same stratum or on a lower stratum. In particular, every circular dependency (the definition of a shape label depends on itself) enforces that all the intermediate shape labels are on the same stratum.

A stratification for the schema on the previous example is:

strat(<SL_2>) = 3
strat(<SL_3>) = 3
strat(<SL_4>) = 3

strat(<SL_1>) = 2

strat(<SL_5>) = 1
strat(<SL_all>) = 1

The shape labels <SL_2>, <SL_3>, <SL_4> are on the same stratum. One can see that there is a circular dependency between these three shape labels. The two negative dependencies in that schema go from <SL_1> to <SL_5> (from stratum 2 to stratum 1), and from <SL_3> to <SL_all> (from stratum 3 to stratum 1).

Another possible stratification for that schema is to have only two stratums, and <SL_1> on the same stratum as <SL_2>, <SL_3>, <SL_4>.

Remark that it is easy to check whether a schema is with stratified negation: it is enough to construct the dependency graph, and check that it does not contain any cycle that goes through a negative dependency.

Maximal Typing

A schema with stratified negation admits a unique typing defined here after. This typing is used for defining the semantics of shapes schemas: we say that a node in an RDF graph satisfies the constraint of a shape label if and only if the corresponding node-label association belongs to this particular typing. This typing is maximal as it forces to add in the typing every provable node-label association.

Let Sch = (SLabels, def) be a schema with stratified negation and let strat: TELabels → 1..k be a stratification for Sch. Consider a graph G. We define a sequence Sch1, ...., Schk of sub-schemas of Sch following the stratification strat, as follows. For all i in 1..k, Schi = (SLabelsi, defi) where SLabelsi is the subset of SLabels composed of the shape labels on the stratums 1..i. More formally,

  • SLabelsi = strat-1(1) ∪ ... ∪ strat-1(i),
  • defi is def restricted on SLabelsi.
where strat-1 is the inverse of the stratification function, thus strat-1(i) is the set of all shape labels which stratum is i.

It follows from this definition that Schk is equal to Sch.

Then we define a sequence of typings typing1, ..., typingk such that for all i in 1..k, typing1 is the maximal typing for Schi. Formally,

  • typing1 is the union of all correct typings of G by Sch1;
  • for i in 2..k, typingi is the union of all correct typings of G by Schi that coincide with typingi-1 on the SLabelsi-1.

Note that because Sch is stratified and because of the way the sub-schemas Schi were defined, it is the case that all the typings typingi are correct.

The typing typingk as defined above is called the maximal typing of G by Sch.

TODO: an example of the sequence of typings


Compilation of a ShEx Schema into a Shapes Schema

Examples

Closed shape expression

Every shape label from the ShEx schema is used as shape label in the shapes schema, which definition is the compilation of the definition from the ShEx schema.

Every shape expression that appears in a triple constraint of the ShEx schema yields a fresh shape label having the same definition in the shapes schema. This does not apply to shape expressions that are references.

The translation of a CLOSED triple expression in the ShEx schema translates almost directly to a triple expression in the shapes schema, exept that predicates are replaced with singleton sets, and value expressions are replaced with the corresponding freshly introduced shape labels

# ShEx schema

<IssueShape>
CLOSED { ex:state [ex:unassigned ex:assigned] ; ex:reportedBy @<UserShape> ; ex:reportedOn xsd:dateTime OR xsd:date ; ex:related @<IssueShape>* ; ^ex:related @<IssueShape>* }
<UserShape> { PATTERN "http://example.org/User#.*" }
# shapes schema

<IssueShape> := 
{ {ex:state} <SL_1> ; {ex:reportedBy} <UserShape> {ex:reportedOn} <SL_3> ; {ex:related} <IssueShape>* ; {^ex:related} <IssueShape>* }
<UserShape> { Set(PATTERN "http://example.org/User#.*") } <SL_1> := {ex:unassigned, ex:assigned} <SL_3> := xsd:dateTime OR xsd:date

Closed shape expression with EXTRA modifier

As in the previous example, the value expressions of the ShEx schema yield shape labels with the same definition. Here, the resulting fresh shape labels are <SL_5>, <SL_6>, <SL_7>, <SL_8>.

The triple expression from the ShEx schema is translated into a triple expression in the shapes schema. One part of that triple expression is a direct translation of the triple expression from the ShEx schema.

The EXTRA modifier results in additional triple constraints in the shapes schema. There is one such triple constraint with * cardinality for every extra property. For every extra property, the value expression is the negation of the value expressions that occur with the same property.

# ShEx schema
	      
	      
	      






<UserShape>
CLOSED EXTRA rdf:type foaf:mbox {
rdf:type [foaf:Person] ; rdf:type [ex:User] ; ( foaf:name xsd:string | foaf:givenName xsd:string+ ; foaf:familyName xsd:string ); foaf:mbox IRI
}
# shapes schema

<SL_5> := {foaf:Person}
<SL_6> := {ex:User}
<SL_7> := xsd:string
<SL_8> := IRI

<SL_UserShape_Extra_rdftype> := NOT{foaf:Person} AND NOT{ex:User}
<SL_UserShape_Extra_foafmxbox> := NOT IRI


<UserShape> := 
{
{rdf:type} <SL_5> ; {rdf:type} <SL_6> ; ( {foaf:name} <SL_7> | {foaf:givenName} <SL_7>+ ; {foaf:familyName} <SL_7> ); {foaf:mbox} <SL_8> ;
{rdf:type} <SL_UserShape_Extra_rdftype> * ; {foaf:mbox} <SL_UserShape_Extra_foafmbox> * ;
}

Non closed shape expression

As for the case with EXTRA properties, the shape expression from the ShEx schema is tranlated into a triple expression with an additional triple constraint with * cardinality. The set of properties of the latter is the set of all properties that do not appear in the shape expression. The value expression accepts all values.

# ShEx schema





<SomeShape> 
{ ex:p xsd:int* ; ( ex:q xsd:int | ex:r IRI )? }
# shapes schema

<SL_9>  := xsd:int
<SL_10> := IRI

<SL_all_values>  := IRI ∪ RDFLiteral ∪ {blank}

<SomeShape> := 
{ {ex:p} <SL_9>* ; ( {ex:q} <SL_9> | {ex:r} <SL_10> )? ; PropSet <SL_all_values>* }
# with PropSet = (Prop ∪ InvProp) \ {ex:p, ex:q, ex:r}

Complex shape definition

The shape label <EmployeeShape> is defined as a conjunction of node constrints and triple expressions. In the shapes schema, this yields a ShapeAnd shape expression with one conjunct per conjunct in the ShEx schema.

# ShEx schema

<EmployeeShape>
PATTERN "^http:/example.org/.*"

CLOSED { foaf:phone IRI*; foaf:mbox IRI }
AND
CLOSED { ( foaf:phone PATTERN "^tel:\\+33" ; foaf:mbox PATTERN "\\.fr$" )? ; ( foaf:phone PATTERN "^tel:\\+44" ; foaf:mbox PATTERN "\\.uk$")? }
# shapes schema

<SL_11> := Set(PATTERN "^tel:\\+33")
<SL_12> := Set(PATTERN "\\.fr$")
<SL_13> := Set(PATTERN "^tel:\\+44")
<SL_14> := Set(PATTERN "\\.uk$")
<SL_15> := IRI

<EmployeeShape> := {
Set(PATTERN "^http:/example.org/.*") AND
{foaf:phone} <SL_15>* ; {foaf:mbox} <SL_15>
AND
( {foaf:phone} <SL_11> ; {foaf:mbox} <SL_12> )? ; ( {foaf:phone} <SL_13> ; {foaf:mbox} <SL_14> )?
}

Negated triple expression

The negated triple constraint is treated as if its property was an extra property. The corresponding triple constraint in the translation has a {0;0} cardinality.

# ShEx schema

<SomeShape> 
CLOSED {
    ex:p xsd:int* ;
    ( ! ex:q xsd:int  
      | ex:r IRI )? 
}
# shapes schema

<SL_9>  := xsd:int
<SL_10> := IRI
<SL_Extra> := NOT xsd:int

<SomeShape> := { 
    {ex:p}  <SL_9>* ;
    ( {ex:q}  <SL_9>{0;0} 
      | {ex:r} <SL_10> )? ;

    {ex:q}  <SL_Extra>*
}

Definition of the Compilation

TODO

Validation of a Graph against a ShEx Schema

Let Shex be a ShEx schema with set ShapeNames of shape names. For a graph G, a validation problem is defined as a set val of node-label name associations. That is, val is a subset of Nodes(G) × ShapeNames. Then the validation problem consists in checking whether for every node-label name association (node, sname) in val, it is the case that node satisfies the constraint defined by sname.

Formally, let Sch be the shapes schema that corresponds to Shex, and for every shape name sname in Shex, let valConstr(sname) be the corresponding value constraint name in Sch. Then the answer to the validation problem val is yes if and only if for all (node, sname) in val, it is the case that typing, node ⊢ valConstr(sname), where typing is the maximal typing of G by Sch.

Remark that the validation problem does not necessarily require to compute typing entirely. Only a portion of typing can be computed, starting from val and adding only the node-label associations that are necessary for proving, or disproving, the node-label name associations from val.

Appendix: Abstract Syntax of ShEx Schemas [OBSOLETE]

The abstract syntax of ShEx schemas is given by the following rules. All non-terminals of the abstract syntax are terminated by an underscore (_), this in order to avoid confusion with the non-terminals of the abstract syntax of the shapes language.

ShapeExpressionsSchema_ ::= (ShapeLabel_ ShapeDefinition_)+
ShapeLabel_ ::= an identifier
ShapeDefinition_ ::= NodeConstraint_* ShapeExpressions_?
ShapeExpressions_ ::= ShapeExpression_ ('AND' ShapeExpression_)*
NodeConstraint_ ::= a subset of IRI ∪ RDFLiteral ∪ {blank}

ShapeExpression_ ::= 'CLOSED'? '^CLOSED'? ('EXTRA' ExtraPropSet_)? TripleExpression_
ExtraPropSet_ ::= a set of properties and inverse properties
TripleExpression_ ::= EmptyTripleExpr_
| TripleConstraint_
| OneOfTripleExpr_
| EachOfTripleExpr_
| RepeatedTripleExpr_

EmptyTripleExpr_ ::= 'EMPTY'
TripleConstraint_ ::= (Property_ | InvProperty_) ValueExpression_
Property_ ::= an IRI
InvProperty_ ::= '^'Property_

OneOfTripleExpr_ ::= TripleExpr_ '|' TripleExpr_
EachOfTripleExpr_ ::= TripleExpr_ ';' TripleExpr_

RepeatedTipleExpr_ ::= TripleExpr_ '{' MinCard_ ';' MaxCard_ '}'
MinCard_ ::= a natural number
MaxCard_ ::= a natural number or '*'

ValueExpression_ ::= ValueConstraint_
| ConjunctiveValueExpr_
| DisjunctiveValueExpr_
ValueConstraint_ ::= NodeConstraint_*
| ShapeRef_
ShapeRef_ ::= '@' ShapeLabel_
ConjunctiveValueExpr_ ::= ValueExpression_ 'AND' ValueExpression_
DisjunctiveValueExpr_ ::= ValueExpression_ 'OR' ValueExpression_

A shape expressions schema defines a set of shape labels, together with their definition.

TODO: explanation of the constructs of a ShEx schema

In this example we illustrate how concrete syntax of ShEx in Turtle format corresponds to the abstract syntax presented above.

Here is a ShEx schema written in Turtle format. The leading numbers are line numbers used for reference.


      RREFIX ex:   <http://ex.example/#>
      PREFIX foaf: <http://xmlns.com/foaf/>
      PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
      PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

1     <IssueShape> CLOSED EXTRA rdf:type
2     {                           
3         rdf:type [ex:Issue];
4         ex:state [ex:unassigned ex:assigned]; 
5                                            
6         ex:reportedBy @<UserShape>;   
7         ex:reportedOn xsd:dateTime;         
8         (                                   
9           ex:reproducedBy @<EmployeeShape>;  
10          ex:reproducedOn xsd:dateTime OR xsd:date 
11        )?;
12        ^ex:related @<IssueShape>*            
13    }
14
15    <UserShape> PATTERN "^http:/example.org/.*" {                     
16        (                                   
17           foaf:name xsd:string             
18         |                                  
19           foaf:givenName xsd:string+;      
20           foaf:familyName xsd:string
21        );     
22        foaf:mbox IRI              
23    }
24
25    <EmployeeShape> {        
26        foaf:phone IRI*;          
27        foaf:mbox IRI             
28    } AND {
29        ( foaf:phone PATTERN "^tel:\\+33"; 
30          foaf:mbox PATTERN "\\.fr$" )?;
31        ( foaf:phone PATTERN "^tel:\\+44"; 
32          foaf:mbox PATTERN "\\.uk$")?
33    }

We first explain some elements of the concrete syntax. A triple expression is surrounded by curly braces { and }. The square braces delimit sets of values. Parentheses are used as usual for defining the structre of an expression. We use abbreviations for some intervals of min and max cardinality, namely, '+' is the interval [1;*], '*' is the interval [0;*], and '?' is the interval [0;1].

Then, the schema here above defines three shape labels: <IssueShape>, <UserShape>, <EmployeeShape>.

The shape expression that defines the shape label <IssueShape> starts on line 1 after the shape label, and ends with the curly bracket on line 13. It has an empty list of node constraints, and a single shape definition. The shape definition has two modifiers, CLOSED, and EXTRA. The parameter of the EXTRA modifier is a singleton set that contains the property rdf:type.

The triple expression in the shape definition is given between the curly braces on lines 2 and 13. It is an each-of expression composed of six sub-expressions, separated by semicolons. The sub-expressions are as follows:

  • Four triple constraints on lines 3, 4, 6, 7.
  • Two repeated triple expressions, one on lines 8--11 with cardinality ? = [0;1], another one on line 12 with cardinality * = [0;*].

We explain the different forms of triple constraint that appear in the definition of <IssueShape>, by explaining their property or their value expression.

1     <IssueShape> CLOSED EXTRA rdf:type
2     {                           
3         rdf:type [ex:Issue];                            # the property is rdf:type, 
                                                          # the value expression is a node constraint 
                                                          #    that is a singleton set containing the IRI ex:Issue

4         ex:state [ex:unassigned ex:assigned];           # the property is ex:state
5                                                         # the value expression is a set with two IRI,
                                                          #    ex:unassigned and ex:assigned

6         ex:reportedBy @<UserShape>;                     # the value expression is a ShapeExprRef 

7         ex:reportedOn xsd:dateTime;                     # the value expression is a the set of literal values 
                                                          #    that have XSD type xsd:dateTime
8         (                                   
9           ex:reproducedBy @<EmployeeShape>;  
10          ex:reproducedOn xsd:dateTime OR xsd:date      # the value expression is DisjunctiveValueExpr,
                                                          #    whose sub-expressions are sets of literal values
11        )?;
12        ex:related @<IssueShape>*            
13    }

We now explain a few other elements of the syntax.

15    <UserShape> PATTERN "^http:/example.org/.*" {                     
         ...
23    }

The shape expression that defines <UserShape> is composed of one node constraint PATTERN "^http:/example.org/.*" and one shape definition between the curly braces lines 15 to 23. The node constraint PATTERN "^http:/example.org/.*" defines the set of IRI that satisfy a pattern given by a regular expression.

25    <EmployeeShape> {        
        ...
28    } AND {
        ...
33    }

The shape expression that defines <EmployeeShape> is composed of zero node constrainst, and two shape definitions, separated by an AND.