The Shape Expressions (ShEx) language describes RDF nodes and graph structures. A node constraint describes an RDF node (IRI, blank node or literal) and a shape describes the triples involving nodes in an RDF graph. The ShapeMap language associates RDF nodes with ShEx shapes. These associations can be used to state candidate shape maps as an input to the validation process. They can be the output of a validation process, where the ShEx engine reports the conformance of RDF nodes with respect to ShEx shapes.
This document defines the ShapeMap language. See the Shape Expressions Primer for an introduction to ShEx validation and the Shape Expressions Language for a formal definition of ShEx.
This document has been developed by the Shape Expressions Community Group.
This version is an initial editor's proposal to the CG.
This document assumes an understanding of the ShEx notation and terminology.
ShapeMap uses the following terms from RDF Semantics [[!rdf11-mt]]:
Conformance criteria are relevant to authors and authoring tool implementers. As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
ShapeMap: a finite set of shape associations. Each shape association has at least two members: a node and a shape, and when used for the result of validation, may have any of status, reason, or appInfo:
"START" for the start shape expression.
In this document, these members can be addressed with a '.' operator. For instance, a shape association A would have an A.node member.
If the status member is absent, the status is assumed to be "conformant". The reason and appInfo members may also be absent but have no default value.
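The members described above can be pictured as a small record type. The following is a minimal sketch, not part of the ShapeMap definition: the class name `ShapeAssociation`, the field name `app_info`, the plain-string representation of nodes and shapes, and the example IRI are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional

START = "START"  # marker for the schema's start shape expression


@dataclass(frozen=True)
class ShapeAssociation:
    """Hypothetical in-memory form of one shape association."""
    node: str                     # RDF node (or a triple pattern in a query ShapeMap)
    shape: str                    # shape label, or START
    status: str = "conformant"    # assumed value when the member is absent
    reason: Optional[str] = None  # may be absent; no default value
    app_info: Any = None          # may be absent; no default value


# Members are addressed with the '.' operator, as in A.node.
assoc = ShapeAssociation(node="<http://data.example/#n1>", shape=START)
print(assoc.node, assoc.status)
```

Making the record immutable (`frozen=True`) also makes it hashable, which is convenient because a ShapeMap is a set of associations.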
A triple pattern has a subject pattern, predicate IRI and object pattern.
A focus selector identifies the slot (subject or object) to be validated. A wildcard indicates that the slot may hold any value. A triple pattern has exactly one focus selector. A triple pattern maps to a SPARQL triple pattern with the following restrictions:
- Any element of V (the set of variables) is either a fresh variable or a known token to identify the focus node.
- The predicate is an IRI (an element of I in the SPARQL definitions).
A query ShapeMap is a ShapeMap in which each shape association has only the members node and shape. The node member may directly identify an RDF node or it may select a set of RDF nodes.
A fixed ShapeMap is a query ShapeMap in which each node is an RDF node. The ShEx validation process takes as input a fixed ShapeMap.
A result ShapeMap is a fixed ShapeMap with the addition of optional members status, reason and appInfo.
No two shape associations in a ShapeMap may have the same combination of node and shape.
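The distinction between query and fixed ShapeMaps, and the uniqueness constraint on (node, shape) pairs, can be checked mechanically. This is an illustrative sketch only: `TriplePattern`, `Assoc`, `is_fixed` and `has_unique_pairs` are hypothetical names, and RDF nodes are represented as plain strings.

```python
from typing import Iterable, NamedTuple, Union


class TriplePattern(NamedTuple):
    subject: str
    predicate: str
    object: str


class Assoc(NamedTuple):
    node: Union[str, TriplePattern]  # an RDF node, or a pattern selecting nodes
    shape: str


def is_fixed(shape_map: Iterable[Assoc]) -> bool:
    """A fixed ShapeMap is a query ShapeMap whose nodes are all RDF nodes."""
    return all(not isinstance(a.node, TriplePattern) for a in shape_map)


def has_unique_pairs(shape_map: Iterable[Assoc]) -> bool:
    """No two associations may share the same combination of node and shape."""
    pairs = [(a.node, a.shape) for a in shape_map]
    return len(pairs) == len(set(pairs))


query_map = [
    Assoc("<http://data.example/#n1>", "<http://schema.example/S>"),
    Assoc(TriplePattern("FOCUS", "<http://data.example/p>", "_"), "START"),
]
fixed_map = [Assoc("<http://data.example/#n1>", "<http://schema.example/S>")]
```

Here `query_map` is a query ShapeMap but not a fixed one, because its second association selects nodes through a triple pattern.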
ShapeMaps are designed to express the goal or the result of validating an RDF node against a ShEx schema:
A query ShapeMap is converted to a fixed ShapeMap to be used as the input to the validation process.
This process takes as input a query ShapeMap and a graph and produces a fixed ShapeMap.
For a query ShapeMap Q and a graph G, for each shape association A in Q:
- If A.node is an RDF node, A is in the fixed ShapeMap.
- If A.node is a triple pattern, let P be a SPARQL Triple Pattern where
  - if A's subject is a focus selector or a wildcard, P's subject is a fresh variable, otherwise P's subject is A's subject.
  - P's predicate is A's predicate.
  - if A's object is a focus selector or a wildcard, P's object is a fresh variable, otherwise P's object is A's object.
  For each triple T in G which triplesMatches P, the fixed ShapeMap has a shape association F where F.shape = A.shape and
  - if A's subject is a focus selector, F.node is T's subject.
  - if A's object is a focus selector, F.node is T's object.
A triple T triplesMatches a pattern P if for every term in P (subject, predicate and object) which is not a focus selector or a wildcard, the corresponding term in T is the same RDF term.
A fixed ShapeMap is a set; if the same shape association is produced multiple times, it appears in the fixed ShapeMap only once.
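The resolution steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a reference implementation: RDF terms are plain strings, a graph is a set of 3-tuples, a shape association is a `(node, shape)` pair, and the names `FOCUS`, `WILDCARD`, `TriplePattern`, `triples_matches` and `resolve` are hypothetical. Instead of building a SPARQL triple pattern with fresh variables, the sketch treats focus selectors and wildcards as "match anything" slots, which has the same effect for a single pattern.

```python
from typing import Iterable, NamedTuple, Set, Tuple, Union

FOCUS = "FOCUS"   # focus selector token
WILDCARD = "_"    # wildcard token

Triple = Tuple[str, str, str]


class TriplePattern(NamedTuple):
    subject: str
    predicate: str
    object: str


Node = Union[str, TriplePattern]
Assoc = Tuple[Node, str]  # (node, shape)


def triples_matches(t: Triple, p: TriplePattern) -> bool:
    """Every pattern term that is not a focus selector or wildcard must equal the triple's term."""
    return all(pt in (FOCUS, WILDCARD) or pt == tt for pt, tt in zip(p, t))


def resolve(query: Iterable[Assoc], graph: Iterable[Triple]) -> Set[Tuple[str, str]]:
    """Convert a query ShapeMap into a fixed ShapeMap for the given graph."""
    triples = list(graph)
    fixed: Set[Tuple[str, str]] = set()  # a set, so repeated associations collapse
    for node, shape in query:
        if not isinstance(node, TriplePattern):
            fixed.add((node, shape))     # RDF nodes pass through unchanged
            continue
        for t in triples:
            if triples_matches(t, node):
                if node.subject == FOCUS:
                    fixed.add((t[0], shape))
                if node.object == FOCUS:
                    fixed.add((t[2], shape))
    return fixed


graph = {
    ("<s1>", "<p>", "<o1>"),
    ("<s2>", "<p>", "<o2>"),
    ("<s3>", "<q>", "<o1>"),
}
query = [
    ("<n>", "<S>"),
    (TriplePattern(FOCUS, "<p>", WILDCARD), "<T>"),
    (TriplePattern("<s3>", "<q>", FOCUS), "<U>"),
]
fixed = resolve(query, graph)
```

With this input, the first association is copied as-is, the second selects the subjects of both `<p>` triples, and the third selects the object of the single `<q>` triple.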
ShapeMaps can be easily transmitted and understood with a specialized syntax.
A query ShapeMap can include shape associations with both RDF nodes and triple patterns.
In the predicate position of a triple pattern, the token "a" denotes the rdf:type (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>) property.
The ShapeMap grammar includes the expression of IRIs with prefix declarations and relative IRIs whose resolution depends on a resolution context.
A resolution context is a base IRI and a map of prefix to namespace IRI.
Though it is common practice to resolve shape references against a resolution context found in the schema and node references against a resolution context found in the data (e.g. Turtle prefixes), this specification does not specify that behavior.
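A resolution context can be modelled as a base IRI plus a prefix map. The sketch below is an assumption-laden illustration: the class `ResolutionContext`, its `resolve` method, and the example IRIs are hypothetical, and relative IRI resolution is delegated to Python's `urllib.parse.urljoin`. Using one context for nodes and another for shapes follows the common practice mentioned above, which this specification does not mandate.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import urljoin


@dataclass
class ResolutionContext:
    """A base IRI and a map of prefix to namespace IRI."""
    base: str
    prefixes: Dict[str, str] = field(default_factory=dict)

    def resolve(self, term: str) -> str:
        """Return the absolute IRI for '<relative-or-absolute>' or 'prefix:local'."""
        if term.startswith("<") and term.endswith(">"):
            return urljoin(self.base, term[1:-1])
        prefix, sep, local = term.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        raise ValueError(f"cannot resolve {term!r}")


# Hypothetical contexts: one taken from the data, one from the schema.
data_ctx = ResolutionContext("http://data.example/base/")
schema_ctx = ResolutionContext(
    "http://schema.example/", {"ex": "http://schema.example/ns#"}
)

node_iri = data_ctx.resolve("<#n1>")
shape_iri = schema_ctx.resolve("ex:Issue")
```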
Production numbers followed by a letter correspond to productions in other grammars:
[1 ] | shapeMap | ::= | shapeAssociation (',' shapeAssociation)* |
[2 ] | shapeAssociation | ::= | nodeSpec shapeSpec |
[3 ] | nodeSpec | ::= | objectTerm | triplePattern |
[4 ] | subjectTerm | ::= | iri | BLANK_NODE_LABEL |
[5 ] | objectTerm | ::= | subjectTerm | literal |
[6 ] | triplePattern | ::= | '{' "FOCUS" predicate (objectTerm | '_') '}' |
[7 ] | shapeSpec | ::= | '@' (iri | "START") | AT_START |
[13t] | literal | ::= | rdfLiteral | numericLiteral | booleanLiteral |
[16t] | numericLiteral | ::= | INTEGER | DECIMAL | DOUBLE |
[65x] | rdfLiteral | ::= | langString | string ("^^" iri)? |
[134s] | booleanLiteral | ::= | "true" | "false" |
[135s] | string | ::= | STRING_LITERAL1 | STRING_LITERAL_LONG1 |
[66x] | langString | ::= | LANG_STRING_LITERAL1 | LANG_STRING_LITERAL_LONG1 |
[4 ] | predicate | ::= | iri | RDF_TYPE |
[136s] | iri | ::= | IRIREF |
Terminals
Text is matched against the longest matching terminal. The PASSED TOKENS below may appear between any terminals or literal strings which appear in the grammar above.
[18t] | <IRIREF> | ::= | "<" ([^#0000- <>\"{}|^`\\] | UCHAR)* ">" |
[142s] | <BLANK_NODE_LABEL> | ::= | "_:" (PN_CHARS_U | [0-9]) ((PN_CHARS | ".")* PN_CHARS)? |
[16] | <RDF_TYPE> | ::= | "a" |
[17] | <AT_START> | ::= | "@START" |
The <AT_START> terminal has precedence over LANGTAG.
[145s] | <LANGTAG> | ::= | "@" ([a-zA-Z])+ ("-" ([a-zA-Z0-9])+)* |
[19t] | <INTEGER> | ::= | [+-]? [0-9]+ |
[20t] | <DECIMAL> | ::= | [+-]? [0-9]* "." [0-9]+ |
[21t] | <DOUBLE> | ::= | [+-]? ([0-9]+ "." [0-9]* EXPONENT | "."? [0-9]+ EXPONENT) |
[155s] | <EXPONENT> | ::= | [eE] [+-]? [0-9]+ |
[156s] | <STRING_LITERAL1> | ::= | "'" ([^'\\\n\r] | ECHAR | UCHAR)* "'" |
[157s] | <STRING_LITERAL2> | ::= | '"' ([^\"\\\n\r] | ECHAR | UCHAR)* '"' |
[158s] | <STRING_LITERAL_LONG1> | ::= | "'''" ( ("'" | "''")? ([^\\'\\] | ECHAR | UCHAR) )* "'''" |
[159s] | <STRING_LITERAL_LONG2> | ::= | '"""' ( ('"' | '""')? ([^\"\\] | ECHAR | UCHAR) )* '"""' |
[73x] | <LANG_STRING_LITERAL1> | ::= | "'" ([^'\\\n\r] | ECHAR | UCHAR)* "'" LANGTAG |
[74x] | <LANG_STRING_LITERAL2> | ::= | '"' ([^\"\\\n\r] | ECHAR | UCHAR)* '"' LANGTAG |
[75x] | <LANG_STRING_LITERAL_LONG1> | ::= | "'''" ( ("'" | "''")? ([^\\'\\] | ECHAR | UCHAR) )* "'''" LANGTAG |
[76x] | <LANG_STRING_LITERAL_LONG2> | ::= | '"""' ( ('"' | '""')? ([^\"\\] | ECHAR | UCHAR) )* '"""' LANGTAG |
[26t] | <UCHAR> | ::= | "\\u" HEX HEX HEX HEX |
[160s] | <ECHAR> | ::= | "\\" [tbnrf\\\"\\'] |
[164s] | <PN_CHARS_BASE> | ::= | [A-Z] | [a-z] |
[165s] | <PN_CHARS_U> | ::= | PN_CHARS_BASE | "_" |
[167s] | <PN_CHARS> | ::= | PN_CHARS_U | "-" | [0-9] |
[168s] | <PN_PREFIX> | ::= | PN_CHARS_BASE ( (PN_CHARS | ".")* PN_CHARS )? |
[169s] | <PN_LOCAL> | ::= | (PN_CHARS_U | ":" | [0-9] | PLX) ( (PN_CHARS | "." | ":" | PLX)* (PN_CHARS | ":" | PLX) )? |
[170s] | <PLX> | ::= | PERCENT | PN_LOCAL_ESC |
[171s] | <PERCENT> | ::= | "%" HEX HEX |
[172s] | <HEX> | ::= | [0-9] | [A-F] | [a-f] |
[173s] | <PN_LOCAL_ESC> | ::= | "\\" ( "_" | "~" | "." | "-" | "!" | "$" | "&" | "'" | "(" | ")" | "*" | "+" | "," | ";" | "=" | "/" | "?" | "#" | "@" | "%" ) |
PASSED TOKENS | ::= | [ \t\r\n]+ |
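To show how productions [1] to [7] fit together, the following is a rough sketch of a parser for a subset of the grammar above. It is an illustration, not a conforming implementation: it handles only IRIREF and BLANK_NODE_LABEL terms (no literals, so LANGTAG and its interaction with AT_START are left out), ignores UCHAR escapes, and performs no IRI resolution. The names `tokenize` and `parse_shape_map`, the tuple output format, and the example IRIs are assumptions.

```python
import re

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Order matters: BNODE before the '_' wildcard, AT_START before a bare '@'.
TOKEN_SPEC = [
    ("IRIREF", r'<[^\x00-\x20<>"{}|^`\\]*>'),
    ("BNODE", r"_:[A-Za-z0-9_](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?"),
    ("AT_START", r"@START"),
    ("WORD", r"FOCUS|START|a"),
    ("PUNCT", r"[{},@_]"),
    ("WS", r"[ \t\r\n]+"),  # PASSED TOKENS
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def tokenize(text):
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        if m.lastgroup != "WS":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out


def parse_shape_map(text):
    """Return a list of (nodeSpec, shapeSpec) pairs."""
    toks, i = tokenize(text), 0

    def peek():
        return toks[i] if i < len(toks) else (None, None)

    def take(kind=None, value=None):
        nonlocal i
        k, v = peek()
        if (kind and k != kind) or (value and v != value):
            raise SyntaxError(f"expected {value or kind}, got {v!r}")
        i += 1
        return v

    def term():  # subset of objectTerm: iri | BLANK_NODE_LABEL
        if peek()[0] in ("IRIREF", "BNODE"):
            return take()
        raise SyntaxError(f"expected a node, got {peek()[1]!r}")

    def node_spec():  # [3] and [6]
        if peek() != ("PUNCT", "{"):
            return term()
        take()
        take("WORD", "FOCUS")
        if peek() == ("WORD", "a"):
            take()
            pred = RDF_TYPE
        else:
            pred = take("IRIREF")
        obj = take() if peek() == ("PUNCT", "_") else term()
        take("PUNCT", "}")
        return ("FOCUS", pred, obj)

    def shape_spec():  # [7]
        if peek()[0] == "AT_START":
            take()
            return "START"
        take("PUNCT", "@")
        if peek() == ("WORD", "START"):
            take()
            return "START"
        return take("IRIREF")

    result = [(node_spec(), shape_spec())]  # [1] and [2]
    while peek() == ("PUNCT", ","):
        take()
        result.append((node_spec(), shape_spec()))
    if i != len(toks):
        raise SyntaxError(f"unexpected trailing input {peek()[1]!r}")
    return result


parsed = parse_shape_map(
    "<http://data.example/#n1>@<http://schema.example/S>, "
    "{FOCUS a <http://schema.example/Issue>}@START, "
    "_:b1 @ START"
)
```

Listing `BNODE` ahead of the single-character tokens is what keeps `_:b1` from being read as a wildcard followed by stray input; a fuller implementation would apply the longest-match rule across all terminals instead of relying on alternation order.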