Abstract

ShEx is a schema language for RDF graphs. It constrains both the structure of a graph and the lexical forms of its literals. ShExPath addresses elements in a ShEx schema. This can be used to anchor validation results, identify regions of RDF graphs, or tie external annotations to elements in a schema.

Scope

ShExPath identifies Shape Expressions and Triple Expressions. It does not, on its own, identify portions of Node Constraints.

Document Conventions

This specification describes the structure of ShEx schemas in terms of ShExJ. Elements in the ShExJ are addressed in JavaScript notation, e.g. S.shapes[N].

Data Model

A ShExPath is a Unicode string which defines a traversal of a Shape Expressions schema.

An item is one of these elements from a Shape Expressions schema:

A value is a sequence of items.

This specification uses XPath's notion of item, value, step and singleton to leverage shared understanding. Would other terms be more useful?

Is a value a set?

A singleton is a sequence of exactly one item.

Do we need some way to say that a singleton is expected? This would be useful at the end of a path, to say that the ShExPath should identify a single item, but also earlier in a path for debugging.

A context label identifies the expected ShExJ type of each item in a value. The context labels are: ShapeAnd | ShapeOr | ShapeNot | NodeConstraint | Shape | EachOf | OneOf | TripleConstraint.

An index is either an integer i or an RDF node N (an IRI or blank node label) with an optional integer Ni. Integer indexes are 1-based accessors to array elements in a schema. Out-of-range indexes are ignored (they do not contribute any new items to a result value). The function evaluateIndex maps a value to a new value:

ShExPath indexes are 1-based, following XPath's precedent. Change to 0-based?
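The integer case of evaluateIndex can be sketched in Python as follows. This is a minimal sketch, assuming ShExJ objects are plain dicts and that an integer index selects from the array each ShExJ type holds (shapes for a Schema, shapeExprs for ShapeAnd and ShapeOr, expressions for EachOf and OneOf, and, as an assumption drawn from the /@1/2 example, the components of a Shape's triple expression). The names children and evaluate_index are hypothetical, and label indexes (N, Ni) are not covered.

```python
from typing import List

Item = dict          # a ShExJ object, e.g. {"type": "Shape", ...}
Value = List[Item]   # a value is a sequence of items


def children(item: Item) -> List[Item]:
    """Return the array that an integer index selects from (assumed mapping)."""
    kind = item.get("type")
    if kind == "Schema":
        return item.get("shapes", [])
    if kind in ("ShapeAnd", "ShapeOr"):
        return item["shapeExprs"]
    if kind in ("EachOf", "OneOf"):
        return item["expressions"]
    if kind == "Shape" and "expression" in item:
        # Assumption: an index on a Shape addresses the components of its
        # triple expression, as in the /@1/2 example.
        return children(item["expression"])
    return []


def evaluate_index(i: int, value: Value) -> Value:
    """Apply a 1-based integer index to every item in value.

    Out-of-range indexes (including 0 and negatives) contribute no items.
    """
    result: Value = []
    for item in value:
        array = children(item)
        if 1 <= i <= len(array):
            result.append(array[i - 1])
    return result


schema = {"type": "Schema", "shapes": [{"type": "Shape", "id": "#A"},
                                       {"type": "Shape", "id": "#B"}]}
print(evaluate_index(2, [schema]))   # [{'type': 'Shape', 'id': '#B'}]
print(evaluate_index(5, [schema]))   # []
```

Because each input item is indexed independently, applying an index to a value of n items yields at most n items.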

A ShExPath is divided into steps (see StepExpr [@@ add link when grammar is markup] in the grammar). The initial step may be the character /. All following steps are separated by / and include an optional context label followed by a required index.
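Splitting a ShExPath into steps needs slightly more care than splitting on every /, because an IRIREF such as <http://example.org/s#S> may itself contain slashes. The Python sketch below assumes that a / inside <...> or after a backslash (as allowed by PN_LOCAL_ESC) never separates steps, and that a context label is separated from its index by whitespace, as in the examples in this document. The names split_steps and parse_step are hypothetical.

```python
from typing import List, Optional, Tuple

CONTEXT_LABELS = ("ShapeAnd", "ShapeOr", "ShapeNot", "NodeConstraint", "Shape",
                  "EachOf", "OneOf", "TripleConstraint")


def split_steps(path: str) -> Tuple[bool, List[str]]:
    """Return (is_absolute, steps), splitting only on top-level "/" separators."""
    absolute = path.startswith("/")
    body = path[1:] if absolute else path
    steps: List[str] = []
    current: List[str] = []
    in_iri = False
    escaped = False
    for ch in body:
        if escaped:                       # character after "\" is kept literally
            current.append(ch)
            escaped = False
        elif ch == "\\" and not in_iri:   # start of a PN_LOCAL_ESC escape
            current.append(ch)
            escaped = True
        elif ch == "<":
            in_iri = True
            current.append(ch)
        elif ch == ">":
            in_iri = False
            current.append(ch)
        elif ch == "/" and not in_iri:    # a real step separator
            steps.append("".join(current))
            current = []
        else:
            current.append(ch)
    steps.append("".join(current))
    return absolute, steps


def parse_step(step: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a step into (context label or None, index or None)."""
    parts = step.split(None, 1)
    if parts and parts[0] in CONTEXT_LABELS:
        return parts[0], (parts[1].strip() if len(parts) > 1 else None)
    return None, step
```

For example, split_steps("/@<http://example.org/s#S>/2") keeps the IRI whole and yields the two steps "@<http://example.org/s#S>" and "2".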

evaluateShExPath takes as arguments a ShExPath P, a schema S and an initial value V. It iteratively calls evaluateStep with each step and either the initial value (for the first step) or the result of the previous invocation of evaluateStep.
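The iteration that evaluateShExPath performs is a left fold over the steps. A minimal Python sketch, assuming the steps have already been split out of P and that the caller supplies the step function; the names and this calling convention are illustrative, not part of this specification:

```python
from functools import reduce
from typing import Any, Callable, List, Sequence

Value = List[Any]
StepFn = Callable[[Any, dict, Value], Value]


def evaluate_shexpath(steps: Sequence[Any], schema: dict, initial: Value,
                      evaluate_step: StepFn) -> Value:
    """Call evaluate_step once per step, feeding each result into the next call.

    The first call receives `initial`; every later call receives the value
    returned by the previous call. An empty step list returns `initial`.
    """
    return reduce(lambda value, step: evaluate_step(step, schema, value),
                  steps, initial)


# Usage with a toy step function that records the steps it sees:
trace = evaluate_shexpath(["@1", "2"], {}, [], lambda step, s, v: v + [step])
print(trace)   # ['@1', '2']
```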

evaluateStep(P, S, V) is a function that takes as arguments a ShExPath P, a schema S and a value V, and produces a new value. The steps are evaluated as follows:

Step           Action
/              value = the list of shapes in the schema.
context label  each item in value is tested against the context label. It is a fatal error if the ShExJ type of an item does not match the context label. This does not change the value.
index          value = evaluateIndex(index, value)
empty string   the evaluation is terminated; the result is value.

Is a context label mismatch an error or just a filter?
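The step table can be sketched as a single Python function. This is an illustrative sketch, not a normative algorithm: it assumes a step has already been parsed into "/", "" or a (context_label, index) pair, and it takes the index function as a parameter rather than fixing one. ShExPathError and evaluate_step are hypothetical names. On the open question above, it follows the table's current wording and raises on a mismatch; a filtering variant would instead keep only the matching items.

```python
from typing import Callable, List, Optional, Tuple, Union

Value = List[dict]
Step = Union[str, Tuple[Optional[str], Optional[str]]]


class ShExPathError(Exception):
    """Fatal error: an item's ShExJ type does not match a context label."""


def evaluate_step(step: Step, schema: dict, value: Value,
                  evaluate_index: Callable[[str, Value], Value]) -> Value:
    """Apply one step, following the step/action table.

    step is "/" (select the schema's shapes), "" (end of path), or a
    (context_label, index) pair in which either part may be None.
    """
    if step == "/":
        return list(schema.get("shapes", []))
    if step == "":
        # End of the path: the caller stops here and returns value unchanged.
        return value
    context, index = step
    if context is not None:
        # Context test: checks every item, never changes the value.
        for item in value:
            if item.get("type") != context:
                raise ShExPathError(
                    f"expected {context}, found {item.get('type')}")
    if index is not None:
        value = evaluate_index(index, value)
    return value
```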

Should evaluation advance through a ShapeAnd automatically? This requires a more aggressive search, but may be worth it. It would be useful when a shape expression is a ShapeAnd of a node constraint and a shape:

<#UserShape> IRI /User\?id=[0-9]+/ {
  foaf:mbox IRI
}
        
Compare @<#UserShape>/2/foaf:mbox with @<#UserShape>/foaf:mbox.

Issue Example

This schema excerpt describes an issue that might appear in an issue tracking system.

<#IssueShape> {
  :name xsd:string MinLength 4;
  :category ["bug" "feature request"];
  :postedBy @<#UserShape>;
  :processing {
    :reproduced [true false];
    :priority xsd:integer
  }?
}

<#UserShape> IRI /User\?id=[0-9]+/ {
  (  foaf:name xsd:string
   | foaf:givenName xsd:string +;
     foaf:familyName xsd:string
  );
  foaf:mbox IRI
}
      

Simple Access

Elements can be addressed either by label or by index. For shape expressions, the label is the name of the shape expression. Shape expression labels and indexes are prefixed by "@". For triple constraints, the label is the predicate of that triple constraint. Elements of triple expressions may be selected by index within the triple expression.

ShExPath                   Value
/@<#IssueShape>/:category  the :category constraint in the #IssueShape shape, addressed by the name of the shape expression followed by the name of the :category property.
/@<#IssueShape>/2          the :category constraint in the #IssueShape shape, addressed by the name of the shape expression followed by the index of the :category property.
/@1/2                      the :category constraint in the #IssueShape shape, addressed by the index of the shape expression followed by the index of the :category property.
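The equivalence of these three paths can be checked against a hand-written, heavily simplified ShExJ rendering of #IssueShape. The Python sketch assumes ShExJ 2.0-style shapes that carry their label in id, abbreviates IRIs to the prefixed names used above, omits value expressions, and handles only one shape step followed by one triple step. resolve_shape and resolve_triple are hypothetical helpers.

```python
from typing import Optional

ISSUE_SCHEMA = {
    "type": "Schema",
    "shapes": [{
        "type": "Shape",
        "id": "#IssueShape",
        "expression": {"type": "EachOf", "expressions": [
            {"type": "TripleConstraint", "predicate": ":name"},
            {"type": "TripleConstraint", "predicate": ":category"},
            {"type": "TripleConstraint", "predicate": ":postedBy"},
            {"type": "TripleConstraint", "predicate": ":processing",
             "min": 0, "max": 1},
        ]},
    }],
}


def resolve_shape(schema: dict, token: str) -> Optional[dict]:
    """Resolve "@N" (1-based) or "@<label>" against the schema's shapes."""
    key = token[1:]                                    # drop the leading "@"
    shapes = schema["shapes"]
    if key.isdigit():
        n = int(key)
        return shapes[n - 1] if 1 <= n <= len(shapes) else None
    label = key[1:-1] if key.startswith("<") else key  # strip <...>
    return next((s for s in shapes if s.get("id") == label), None)


def resolve_triple(shape: dict, token: str) -> Optional[dict]:
    """Resolve a predicate name or a 1-based index within the shape's EachOf."""
    exprs = shape["expression"]["expressions"]
    if token.isdigit():
        n = int(token)
        return exprs[n - 1] if 1 <= n <= len(exprs) else None
    return next((te for te in exprs if te.get("predicate") == token), None)


# The three ShExPaths in the table, resolved step by step:
by_label = resolve_triple(resolve_shape(ISSUE_SCHEMA, "@<#IssueShape>"), ":category")
by_mixed = resolve_triple(resolve_shape(ISSUE_SCHEMA, "@<#IssueShape>"), "2")
by_index = resolve_triple(resolve_shape(ISSUE_SCHEMA, "@1"), "2")
```

All three resolve to the same :category TripleConstraint object.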

Nested Expressions

A traversal through a schema, for instance the path followed in a validation, can be expressed as a ShExPath. Such a ShExPath can include shape references in value expressions. These can be appended to the triple expression path with a / separator.

ShExPath                                           Value
/@<#IssueShape>/:postedBy/@<#UserShape>/foaf:mbox  the foaf:mbox constraint in the #UserShape shape, reached through the :postedBy constraint in the #IssueShape shape.
/@1/3/@2/2                                         the same path, but with indexes.

Context Tests

For added clarity and confidence, the ShExJ type of any addressable shape expression or triple expression can be tested. If a type is specified and does not match the corresponding type in the schema, the path is invalid.

Shape Expressions

The axes ShapeAnd, ShapeOr, ShapeNot, NodeConstraint and Shape may be used to specify the expected shape expression type.

ShExPath                             Value
/@<#UserShape>/ShapeAnd 2/foaf:mbox  the foaf:mbox constraint in #UserShape's shape. Note that IRI /User\?id=[0-9]+/ {...} compiles to a ShapeAnd whose first component is a NodeConstraint and whose second is a Shape.
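The compiled structure can be written out as ShExJ. The Python dict below is a hand-written, abbreviated sketch of #UserShape (prefixed names instead of full IRIs, a ShExJ 2.0-style id, most value expressions omitted). check_context is a hypothetical helper showing how the ShapeAnd context test and the index 2 select the Shape before foaf:mbox is looked up.

```python
USER_SHAPE = {
    "type": "ShapeAnd",
    "id": "#UserShape",
    "shapeExprs": [
        # IRI /User\?id=[0-9]+/  becomes a NodeConstraint
        {"type": "NodeConstraint", "nodeKind": "iri", "pattern": "User\\?id=[0-9]+"},
        # { ... }  becomes a Shape whose expression is an EachOf
        {"type": "Shape", "expression": {
            "type": "EachOf",
            "expressions": [
                {"type": "OneOf", "expressions": [
                    {"type": "TripleConstraint", "predicate": "foaf:name"},
                    {"type": "EachOf", "expressions": [
                        {"type": "TripleConstraint", "predicate": "foaf:givenName",
                         "min": 1, "max": -1},
                        {"type": "TripleConstraint", "predicate": "foaf:familyName"},
                    ]},
                ]},
                {"type": "TripleConstraint", "predicate": "foaf:mbox",
                 "valueExpr": {"type": "NodeConstraint", "nodeKind": "iri"}},
            ],
        }},
    ],
}


def check_context(item: dict, label: str) -> dict:
    """Raise if the item's ShExJ type is not the expected context label."""
    if item["type"] != label:
        raise ValueError(f"expected {label}, found {item['type']}")
    return item


# /@<#UserShape>/ShapeAnd 2/foaf:mbox
shape = check_context(USER_SHAPE, "ShapeAnd")["shapeExprs"][2 - 1]
mbox = next(te for te in shape["expression"]["expressions"]
            if te.get("predicate") == "foaf:mbox")
```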

Triple Expressions

The axes EachOf, OneOf and TripleConstraint may be used to specify the expected expression type.

ShExPath                            Value
/@1/ShapeAnd 2/EachOf 2             the :category constraint in the #IssueShape shape, explicitly labeling the index axes.
/@<#UserShape>/2/EachOf 1/OneOf 2   the EachOf containing foaf:givenName and foaf:familyName in the #UserShape shape.
/@<#UserShape>/2/EachOf 1/EachOf 2  invalid path for the given schema.

Relative ShExPaths

Invocation of evaluateShExPath includes an initial value. A ShExPath starting with / forces the context to be the schema; when the value is a schema, indexes access entries in its ShExJ shapes property. The application may provide a different initial value. A context path is a ShExPath which, when evaluated against the schema, produces an equivalent value. For instance, validation results MAY be reported using relative ShExPaths.

Context          ShExPath   Value
/@<#IssueShape>  :category  the :category constraint in the #IssueShape shape.

Relative ShExPaths can be concatenated to their context with a / separator.
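Concatenation is plain string joining. A small Python sketch, where the function name concat_shexpath and the choice to reject an absolute right-hand side are assumptions rather than rules stated here:

```python
def concat_shexpath(context: str, relative: str) -> str:
    """Append a relative ShExPath to its context path with one "/" separator."""
    if relative.startswith("/"):
        # Assumption: only relative paths may be appended to a context.
        raise ValueError("an absolute ShExPath cannot be appended to a context")
    if not relative:
        return context
    if context.endswith("/"):           # e.g. the bare schema context "/"
        return context + relative
    return context + "/" + relative


print(concat_shexpath("/@<#IssueShape>", ":category"))   # /@<#IssueShape>/:category
```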

Disambiguation

If more than one triple constraint has the same predicate, they can be indexed by the order they would be encountered in a depth-first search. If the index is omitted, it is assumed to be 1.

<BPObs>  {
  :component { :code ["systolic"]; :value xsd:double };
  :component { :code ["diastolic"]; :value xsd:double };
  :component { :code ["posture"]; :value @<Postures> }?;
}
      
ShExPath                Value
/@<BPObs>/:component 3  the third :component triple expression (the one expecting a code of "posture").
/@<BPObs>/:component    the first :component triple expression (the one expecting a code of "systolic").
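The depth-first numbering can be sketched in Python over a simplified ShExJ rendering of <BPObs>. This sketch assumes the search walks only EachOf and OneOf expressions in document order and does not descend into nested value expressions, since the text above does not say whether it should. triple_constraints, select_by_predicate, component and code_of are hypothetical helpers.

```python
from typing import Iterator, Optional


def triple_constraints(expr: dict) -> Iterator[dict]:
    """Yield TripleConstraints in depth-first, document order.

    Assumption: nested value expressions (inline shapes) are not searched.
    """
    if expr.get("type") == "TripleConstraint":
        yield expr
    for sub in expr.get("expressions", []):
        yield from triple_constraints(sub)


def select_by_predicate(expr: dict, predicate: str, n: int = 1) -> Optional[dict]:
    """Return the n-th (1-based) TripleConstraint on predicate; n defaults to 1."""
    matches = [tc for tc in triple_constraints(expr) if tc["predicate"] == predicate]
    return matches[n - 1] if 1 <= n <= len(matches) else None


def component(code: str, optional: bool = False) -> dict:
    """Build a simplified :component constraint whose inline shape fixes :code."""
    tc = {"type": "TripleConstraint", "predicate": ":component",
          "valueExpr": {"type": "Shape", "expression": {"type": "EachOf", "expressions": [
              {"type": "TripleConstraint", "predicate": ":code",
               "valueExpr": {"type": "NodeConstraint", "values": [{"value": code}]}},
              {"type": "TripleConstraint", "predicate": ":value"},
          ]}}}
    if optional:
        tc.update(min=0, max=1)          # the trailing "?" cardinality
    return tc


def code_of(tc: dict) -> str:
    """Read back the fixed :code value from a component() constraint."""
    return tc["valueExpr"]["expression"]["expressions"][0]["valueExpr"]["values"][0]["value"]


BP_OBS = {"type": "EachOf", "expressions": [
    component("systolic"), component("diastolic"), component("posture", optional=True)]}

third = select_by_predicate(BP_OBS, ":component", 3)   # /@<BPObs>/:component 3
first = select_by_predicate(BP_OBS, ":component")      # /@<BPObs>/:component
```

Under this assumption, a path such as /@<BPObs>/:code selects nothing, because :code only occurs inside the inline value shapes.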

Grammar

[Try it in yacker], e.g. this nested validation result test.

ShExPathExpr             ::= AbsolutePathExpr | RelativePathExpr
AbsolutePathExpr         ::= "/" RelativePathExpr
RelativePathExpr         ::= StepExpr ("/" StepExpr)*
StepExpr                 ::= ContextTest? ExprIndex | ContextTest
ContextTest              ::= ShapeExprContext | TripleExprContext
ShapeExprContext         ::= "ShapeAnd" | "ShapeOr" | "ShapeNot"
                           | "NodeConstraint" | "Shape"
TripleExprContext        ::= "EachOf" | "OneOf" | "TripleConstraint"
ExprIndex                ::= ShapeExprIndex | TripleExprIndex
ShapeExprIndex           ::= "@" (INTEGER | ShapeExprLabel)
ShapeExprLabel           ::= iri | BLANK_NODE_LABEL
TripleExprIndex          ::= INTEGER | TripleExprLabel
TripleExprLabel          ::= (iri | BLANK_NODE_LABEL) INTEGER?
[136s]  iri              ::= IRIREF | prefixedName
[137s]  prefixedName     ::= PNAME_LN | PNAME_NS

@terminals
[18t]   IRIREF           ::= '<' ([^#x00-#x20<>\"{}|^`\\] | UCHAR)* '>'
[140s]  PNAME_NS         ::= PN_PREFIX? ':'
[141s]  PNAME_LN         ::= PNAME_NS PN_LOCAL
[142s]  BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
[19t]   INTEGER          ::= [+-]? [0-9]+
[26t]   UCHAR            ::= '\\u' HEX HEX HEX HEX
                           | '\\U' HEX HEX HEX HEX HEX HEX HEX HEX
[160s]  ECHAR            ::= '\\' [tbnrf\\\"\']
[164s]  PN_CHARS_BASE    ::= [A-Z] | [a-z]
                           | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF]
                           | [#x0370-#x037D] | [#x037F-#x1FFF]
                           | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                           | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
                           | [#x10000-#xEFFFF]
[165s]  PN_CHARS_U       ::= PN_CHARS_BASE | '_'
[167s]  PN_CHARS         ::= PN_CHARS_U | '-' | [0-9]
                           | [#x00B7] | [#x0300-#x036F] | [#x203F-#x2040]
[168s]  PN_PREFIX        ::= PN_CHARS_BASE ((PN_CHARS | '.')* PN_CHARS)?
[169s]  PN_LOCAL         ::= (PN_CHARS_U | ':' | [0-9] | PLX)
                             ((PN_CHARS | '.' | ':' | PLX)*
                             (PN_CHARS | ':' | PLX))?
[170s]  PLX              ::= PERCENT | PN_LOCAL_ESC
[171s]  PERCENT          ::= '%' HEX HEX
[172s]  HEX              ::= [0-9] | [A-F] | [a-f]
[173s]  PN_LOCAL_ESC     ::= '\\' ('_' | '~' | '.' | '-' | '!' | '$' | '&' | "'"
                           | '(' | ')' | '*' | '+' | ',' | ';' | '=' | '/' | '?'
                           | '#' | '@' | '%')
        

This grammar does not ensure that an absolute path starts with a ShapeExprLabel and ShapeExprContext, which would require duplicating the RelativePathExpr and StepExpr productions. Nor does it ensure the striping (alternation) between shape expression labels, triple expression labels, and any nested shape expression labels in triple expression value expressions.