Document under construction.
This document describes the formal semantics of Shape Expressions (ShEx) schemas. The semantics is defined on a slight generalization of ShEx schemas, called the shapes language. The generalization allows for a more succinct abstract syntax on which the semantics of the shapes language is defined. In the generalization,
The Shape Expressions language is expected to remain stable with the exception of:
Shape expressions are defined using terms from RDF semantics:
	The following sets access the elements of an RDF graph 
Additionally, we define
	  Consider the RDF graph 
PREFIX ex: http://ex.example/#
PREFIX foaf: http://xmlns.com/foaf/
PREFIX xsd: http://www.w3.org/2001/XMLSchema#
inst:Issue1 
    ex:state      ex:unassigned ;
    ex:reportedBy inst:User2 .
inst:User2
    foaf:name     "Bob Smith" ;
    foaf:mbox     <mailto:bob@example.org> .
         
	
	  The set 
    { inst:Issue1, ex:unassigned, inst:User2, "Bob Smith", <mailto:bob@example.org> }
	
	
	  The set 
    { ex:state, ex:reportedBy, foaf:name, foaf:mbox }
	
	
	  The set 
    { ^ex:state, ^ex:reportedBy, ^foaf:name, ^foaf:mbox }
	
	
	  The set 
    inst:User2  foaf:name  "Bob Smith" .
    inst:User2  foaf:mbox  <mailto:bob@example.org> .
    inst:User2  ^ex:reportedBy  inst:Issue1 .
	
      A shapes schema defines a set of named constraints, called shapes. The name of a shape is also called shape label.
	A shapes schema 
The structure of a shapes schema is captured by the following abstract syntax.
	A typing is a set of node-label associations of the form 
	Formally, let 
	  Consider a shapes schema 
	  An example of a typing of the graph 
	The 
A shape expression is a Boolean combination of two kinds of atomic components:
Let 
	  Let 
Examples of value expressions are
The semantics of a shape expression is defined relative to a typing.
	Let 
	
      
	  Let 
	  Whatever the value of 
 If 
Triple expressions are used for defining neighbourhood constraints within a shape expression.
A triple expression is matched by a set of triples that come from the neighbourhood of a node in an RDF graph.
An atomic triple expression can be
	Triple expressions allow for two binary operators, the some-of operator, denoted 
	Finally, a triple expression can be repeated by specifying a minimal cardinality that is a natural number, and a maximal cardinality that is a natural number of an unbounded value denoted as a star 
Moving the mouse over the elements of the list below will highlight the corresponding elements in the triple expression.
# triple expression of the shapes language
{rdf:type}  <SL_1> ;
{foaf:name} <SL_2> ;
( {foaf:mbox} <SL_3> {1;3} | {foaf:adress} <SL_3> ) ;
{^ex:leader} <SL_4> ;
C{rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox, ^ex:leader} <SL_all> *
	  Where C{rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox} = (Prop ∪ InvProp) \ {rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox}, that is, the complement of the set {rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox} within the set of all properties and inverse properties.
	A triple expression is matched by a set of triples and inverse triples.
	In practice, the set will be obtained as the neighbourhood of some node in an RDF graph, or will be a subset of such neighbourhood.
	Such set is composed of triples of the form 
	For a set of triples or inverse triples 
	
      
Example of matching a triple constraint.
# if typing is such that
(:n1, <SL_1>) belongs to typing
(:n2, <SL_2>) does not belong to typing
# then 
typing, {(:n :p :n1)} ⊧ {:p}  <SL_1>
typing, {(:n :p :n1)} ⊧ {:p, :q}  <SL_1>
typing, {(:n :p :n2)} ⊭ {:p} <SL_2>
Examples of satisfying an each-of triple expression.
# if typing is such that it contains the following node-label associations
(:n1, <SL_1>), (:n2, <SL_2>), (:n3, <SL_1>)
# then
typing, {(:n :p :n1), (:n :p :n2)}  ⊧  {:p} <SL_1> ; {:q} <SL_2>
typing, {(:n :p :n1), (:n :p :n3)}  ⊧  {:p} <SL_1> ; {:p} <SL_1>
typing, {(:n :p :n1)}               ⊭  {:p} <SL_1> ; {:p} <SL_1>
Examples of satisfying a some-of triple expression.
# if typing is such that it contains the following node-label associations
(:n1, <SL_1>), (:n2, <SL_2>), (:n3, <SL_1>)
# then
typing, {(:n :p :n1)}               ⊧   {:p} <SL_1> | {:q} <SL_2>
typing, {(:n :p :n1), (:n :p :n2)}  ⊭   {:p} <SL_1> | {:q} <SL_2>
Examples of satisfying a triple expression with some-of and each-of operators.
# if typing is such that it contains the following node-label associations
(:n1, <SL_1>), (:n2, <SL_2>), (:n3, <SL_1>) 
# then
typing, {(:n :p :n1)}                ⊧  {:p} <SL_1> | {:q} <SL_2>
typing, {(:n :p :n1), (:n :p :n2)}   ⊭  {:p} <SL_1> | {:q} <SL_2>
Consider the following triple expression and RDF graph.
# graph G
ex:lp    rdf:type     foaf:Person;
         foaf:name    "Louisa Pavlova";
         foaf:mbox    "louisa@shex.org";
         foaf:mbox    "lpavlova@jobs.com";
         ex:memberOf  ex:gr2 .
ex:gr2   ex:leader    ex:lp .
ex:bug   ex:reportedBy  ex:lp .
# typing is such that it contains the following node-label associations
(n, <SL_all>)  for all node n in G
(foaf:Person, <SL_Person>)
("Louisa Pavlova", <SL_string>)
("louisa@shex.org", <SL_mbox_string>)
("lpavlova@jobs.com", <SL_mbox_string>)(ex:gr2, <SL_Group>)
# triple expression TExpr
{rdf:type}  <SL_Person> ;
{foaf:name} <SL_string> ;
( {foaf:mbox} <SL_mbox_string> {1;3} | {foaf:adress} <SL_adr> ) ;
{^ex:leader} <SL_Group> ;
C{rdf:type, foaf:name, foaf:givenName, foaf:familyName, foaf:mbox, ^ex:leader} <SL_all> *
# neigh(G, ex:lp), the neighbourhood of ex:lp in G
ex:lp  rdf:type     foaf:Person .
ex:lp  foaf:name    "Louisa Pavlova" .
ex:lp  foaf:mbox    "louisa@shex.org" .
ex:lp  foaf:mbox    "lpavlova@jobs.com" .ex:lp  ^ex:leader   ex:gr2
ex:lp  ex:memberOf  ex:gr2
ex:lp  ^ex:reportedBy  ex:bug
Then typing, neigh(G, ex:lp) ⊢ TExpr. Move the mouse over the triples in neigh(G, ex:lp) to see which triples contribute to satisfy which of the sub-expressions of TExpr.
	A 
	As we can see from the syntax, shape expressions can refer other shape definitions through a 
We start by defining the dependency graph for a shapes schema, based on which we will define the stratification.
	For a schema 
	We say that the shapes schema 
The following is an example of a shapes schema and its dependency graph.
# a shapes schema
<SL_all>:= IRI ∪ RDFLiteral ∪ {blank}
<SL_1> := NOT ({:c} <SL_5>)
<SL_2> := {:a} <SL_4> ; {:c} <SL_5> 
<SL_3> := NOT ({:c} <SL_all>) AND {:a} <SL_2>
<SL_4> := {:a} <SL_3>
<SL_5> := {:c} <SL_all>
# the edges of the dependency graph # ---> is a negative dependency edge # -+-> is a positive dependency edge <SL_1> ---> <SL_5> <SL_2> -+-> <SL_4> <SL_2> -+-> <SL_5> <SL_3> ---> <SL_all> <SL_3> -+-> <SL_2> <SL_4> -+-> <SL_3> <SL_5> -+-> <SL_all>
If a schema is with stratified negation, then there exists a partition of the shape labels of the schema such that there does not exist a negative dependency edge between the shape labels of the same part of the partition. Moreover, the parts of the partition can be ranked in a way that all negative dependency edges always go from a partition with a higher rank to a partition with a lower rank. Such partitioning is called a stratification.
	More formally, if 
A stratification guarantees the following property: the definition of a shape label can depend negatively only on shape labels that are in strictly lower stratums, and can depend positively only on shape labels that are on the same stratum or on a lower stratum. In particular, every circular dependency (the definition of a shape label depends on itself) enforces that all the intermediate shape labels are on the same stratum.
A stratification for the schema on the previous example is:
strat(<SL_2>) = 3 strat(<SL_3>) = 3 strat(<SL_4>) = 3 strat(<SL_1>) = 2 strat(<SL_5>) = 1 strat(<SL_all>) = 1
The shape labels <SL_2>, <SL_3>, <SL_4> are on the same stratum. One can see that there is a circular dependency between these three shape labels. The two negative dependencies in that schema go from <SL_1> to <SL_5> (from stratum 2 to stratum 1), and from <SL_3> to <SL_all> (from stratum 3 to stratum 1).
Another possible stratification for that schema is to have only two stratums, and <SL_1> on the same stratum as <SL_2>, <SL_3>, <SL_4>.
Remark that it is easy to check whether a schema is with stratified negation: it is enough to construct the dependency graph, and check that it does not contain any cycle that goes through a negative dependency.
A schema with stratified negation admits a unique typing defined here after. This typing is used for defining the semantics of shapes schemas: we say that a node in an RDF graph satisfies the constraint of a shape label if and only if the corresponding node-label association belongs to this particular typing. This typing is maximal as it forces to add in the typing every provable node-label association.
	Let 
	It follows from this definition that 
	Then we define a sequence of typings 
	Note that because 
	The typing 
TODO: an example of the sequence of typings
Every shape label from the ShEx schema is used as shape label in the shapes schema, which definition is the compilation of the definition from the ShEx schema.
Every shape expression that appears in a triple constraint of the ShEx schema yields a fresh shape label having the same definition in the shapes schema. This does not apply to shape expressions that are references.
The translation of a CLOSED triple expression in the ShEx schema translates almost directly to a triple expression in the shapes schema, exept that predicates are replaced with singleton sets, and value expressions are replaced with the corresponding freshly introduced shape labels
# ShEx schema
<IssueShape>
CLOSED {                 
    ex:state      [ex:unassigned ex:assigned] ; 
    ex:reportedBy @<UserShape> ;   
    ex:reportedOn xsd:dateTime OR xsd:date ;
    ex:related    @<IssueShape>* ;
    ^ex:related   @<IssueShape>*
}
<UserShape> {
    PATTERN "http://example.org/User#.*"
}
# shapes schema
<IssueShape> := 
{
    {ex:state}      <SL_1> ;
    {ex:reportedBy} <UserShape>
    {ex:reportedOn} <SL_3> ;
    {ex:related}    <IssueShape>*  ;
    {^ex:related}   <IssueShape>*
}
<UserShape> {
    Set(PATTERN "http://example.org/User#.*")
}
<SL_1>  := {ex:unassigned, ex:assigned}
<SL_3>  := xsd:dateTime OR xsd:date
	    As in the previous example, the value expressions of the ShEx schema yield shape labels with the same definition. Here, the resulting fresh shape labels are <SL_5>, <SL_6>, <SL_7>, <SL_8>.
The triple expression from the ShEx schema is translated into a triple expression in the shapes schema. One part of that triple expression is a direct translation of the triple expression from the ShEx schema.
The EXTRA modifier results in additional triple constraints in the shapes schema. There is one such triple constraint with * cardinality for every extra property. For every extra property, the value expression is the negation of the value expressions that occur with the same property.
# ShEx schema
	      
	      
	      
<UserShape>
CLOSED 
EXTRA rdf:type foaf:mbox {
    rdf:type [foaf:Person] ;
    rdf:type [ex:User] ;
    (                                   
       foaf:name xsd:string 
     |                                  
       foaf:givenName xsd:string+ ;      
       foaf:familyName xsd:string
    );
    foaf:mbox IRI
}
# shapes schema
<SL_5> := {foaf:Person}
<SL_6> := {ex:User}
<SL_7> := xsd:string
<SL_8> := IRI
<SL_UserShape_Extra_rdftype> := NOT{foaf:Person} AND NOT{ex:User}
<SL_UserShape_Extra_foafmxbox> := NOT IRI
<UserShape> := 
{
    {rdf:type}  <SL_5> ;
    {rdf:type}  <SL_6> ;
    (
        {foaf:name}       <SL_7>
    |
        {foaf:givenName}  <SL_7>+ ;
        {foaf:familyName} <SL_7>
    );
    {foaf:mbox} <SL_8> ;
    {rdf:type}  <SL_UserShape_Extra_rdftype> * ;
    {foaf:mbox} <SL_UserShape_Extra_foafmbox> * ;
}
As for the case with EXTRA properties, the shape expression from the ShEx schema is tranlated into a triple expression with an additional triple constraint with * cardinality. The set of properties of the latter is the set of all properties that do not appear in the shape expression. The value expression accepts all values.
# ShEx schema
<SomeShape> 
{
    ex:p xsd:int* ;
    ( ex:q xsd:int | ex:r IRI )? 
}
# shapes schema
<SL_9>  := xsd:int
<SL_10> := IRI
<SL_all_values>  := IRI ∪ RDFLiteral ∪ {blank}
<SomeShape> := 
{
    {ex:p}  <SL_9>* ;
    ( {ex:q}  <SL_9> | {ex:r} <SL_10> )? ;
    PropSet  <SL_all_values>*    
}
# with PropSet = (Prop ∪ InvProp) \ {ex:p, ex:q, ex:r}
The shape label <EmployeeShape> is defined as a conjunction of node constrints and triple expressions. In the shapes schema, this yields a ShapeAnd shape expression with one conjunct per conjunct in the ShEx schema.
# ShEx schema
<EmployeeShape>
PATTERN "^http:/example.org/.*"
CLOSED {        
    foaf:phone IRI*;          
    foaf:mbox IRI             
}
AND 
CLOSED {
    ( foaf:phone PATTERN "^tel:\\+33" ; 
      foaf:mbox  PATTERN "\\.fr$" )?   ;
    ( foaf:phone PATTERN "^tel:\\+44" ; 
      foaf:mbox  PATTERN "\\.uk$")?
}
# shapes schema
<SL_11> := Set(PATTERN "^tel:\\+33")
<SL_12> := Set(PATTERN "\\.fr$")
<SL_13> := Set(PATTERN "^tel:\\+44")
<SL_14> := Set(PATTERN "\\.uk$")
<SL_15> := IRI
<EmployeeShape> := {
    Set(PATTERN "^http:/example.org/.*")
    AND 
    {foaf:phone} <SL_15>* ;
    {foaf:mbox}  <SL_15>
    AND 
    ( {foaf:phone} <SL_11> ;
      {foaf:mbox}  <SL_12> )? ;
    ( {foaf:phone} <SL_13> ;
      {foaf:mbox}  <SL_14> )?
}
The negated triple constraint is treated as if its property was an extra property. The corresponding triple constraint in the translation has a {0;0} cardinality.
# ShEx schema
<SomeShape> 
CLOSED {
    ex:p xsd:int* ;
    ( ! ex:q xsd:int  
      | ex:r IRI )? 
}
# shapes schema
<SL_9>  := xsd:int
<SL_10> := IRI
<SL_Extra> := NOT xsd:int
<SomeShape> := { 
    {ex:p}  <SL_9>* ;
    ( {ex:q}  <SL_9>{0;0} 
      | {ex:r} <SL_10> )? ;
    {ex:q}  <SL_Extra>*
}
	  
	Let 
	Formally, let 
	Remark that the validation problem does not necessarily require to compute 
The abstract syntax of ShEx schemas is given by the following rules. All non-terminals of the abstract syntax are terminated by an underscore (_), this in order to avoid confusion with the non-terminals of the abstract syntax of the shapes language.
A shape expressions schema defines a set of shape labels, together with their definition.
TODO: explanation of the constructs of a ShEx schema
In this example we illustrate how concrete syntax of ShEx in Turtle format corresponds to the abstract syntax presented above.
Here is a ShEx schema written in Turtle format. The leading numbers are line numbers used for reference.
      RREFIX ex:   <http://ex.example/#>
      PREFIX foaf: <http://xmlns.com/foaf/>
      PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
      PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
1     <IssueShape> CLOSED EXTRA rdf:type
2     {                           
3         rdf:type [ex:Issue];
4         ex:state [ex:unassigned ex:assigned]; 
5                                            
6         ex:reportedBy @<UserShape>;   
7         ex:reportedOn xsd:dateTime;         
8         (                                   
9           ex:reproducedBy @<EmployeeShape>;  
10          ex:reproducedOn xsd:dateTime OR xsd:date 
11        )?;
12        ^ex:related @<IssueShape>*            
13    }
14
15    <UserShape> PATTERN "^http:/example.org/.*" {                     
16        (                                   
17           foaf:name xsd:string             
18         |                                  
19           foaf:givenName xsd:string+;      
20           foaf:familyName xsd:string
21        );     
22        foaf:mbox IRI              
23    }
24
25    <EmployeeShape> {        
26        foaf:phone IRI*;          
27        foaf:mbox IRI             
28    } AND {
29        ( foaf:phone PATTERN "^tel:\\+33"; 
30          foaf:mbox PATTERN "\\.fr$" )?;
31        ( foaf:phone PATTERN "^tel:\\+44"; 
32          foaf:mbox PATTERN "\\.uk$")?
33    }
	We first explain some elements of the concrete syntax. A triple expression is surrounded by curly braces { and }. The square braces delimit sets of values. Parentheses are used as usual for defining the structre of an expression. We use abbreviations for some intervals of min and max cardinality, namely, '+' is the interval [1;*], '*' is the interval [0;*], and '?' is the interval [0;1].
Then, the schema here above defines three shape labels: <IssueShape>, <UserShape>, <EmployeeShape>.
	  The shape expression that defines the shape label <IssueShape> starts on line 1 after the shape label, and ends with the curly bracket on line 13.
	  It has an empty list of node constraints, and a single shape definition.
	  The shape definition has two modifiers, 
The triple expression in the shape definition is given between the curly braces on lines 2 and 13. It is an each-of expression composed of six sub-expressions, separated by semicolons. The sub-expressions are as follows:
We explain the different forms of triple constraint that appear in the definition of <IssueShape>, by explaining their property or their value expression.
1     <IssueShape> CLOSED EXTRA rdf:type
2     {                           
3         rdf:type [ex:Issue];                            # the property is rdf:type, 
                                                          # the value expression is a node constraint 
                                                          #    that is a singleton set containing the IRI ex:Issue
4         ex:state [ex:unassigned ex:assigned];           # the property is ex:state
5                                                         # the value expression is a set with two IRI,
                                                          #    ex:unassigned and ex:assigned
6         ex:reportedBy @<UserShape>;                     # the value expression is a ShapeExprRef 
7         ex:reportedOn xsd:dateTime;                     # the value expression is a the set of literal values 
                                                          #    that have XSD type xsd:dateTime
8         (                                   
9           ex:reproducedBy @<EmployeeShape>;  
10          ex:reproducedOn xsd:dateTime OR xsd:date      # the value expression is DisjunctiveValueExpr,
                                                          #    whose sub-expressions are sets of literal values
11        )?;
12        ex:related @<IssueShape>*            
13    }
	We now explain a few other elements of the syntax.
15    <UserShape> PATTERN "^http:/example.org/.*" {                     
         ...
23    }
	
	The shape expression that defines <UserShape> is composed of one node constraint PATTERN "^http:/example.org/.*" and one shape definition between the curly braces lines 15 to 23. The node constraint PATTERN "^http:/example.org/.*" defines the set of IRI that satisfy a pattern given by a regular expression.
25    <EmployeeShape> {        
        ...
28    } AND {
        ...
33    }
	The shape expression that defines <EmployeeShape> is composed of zero node constrainst, and two shape definitions, separated by an AND.