The Shape Expressions (ShEx) language describes RDF nodes and graph structures. A node constraint describes an RDF node (IRI, blank node or literal) and a shape describes the triples involving nodes in an RDF graph. The ShapeMap language associates RDF nodes with ShEx shapes. These associations can serve as input to the validation process, stating which nodes are candidates for which shapes. They can also be the output of a validation process, where the ShEx engine reports the conformance of RDF nodes with respect to ShEx shapes.

This document defines the ShapeMap language. See the Shape Expressions Primer for an introduction to ShEx validation and the Shape Expressions Language for a formal definition of ShEx.

This document has been developed by the Shape Expressions Community Group.

This version is an initial editor's proposal to the CG.

Notation and Terminology

This document assumes an understanding of the ShEx notation and terminology.

ShapeMap uses the following terms from RDF semantics [[!rdf11-mt]]:

Conformance criteria are relevant to authors and authoring tool implementers. As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

ShapeMap structure

ShapeMap: a finite set of shape associations. Each shape association has at least two members: a node and a shape, and when used for the result of validation, may have any of status, reason, or appInfo:

In this document, these members can be addressed with a '.' operator. For instance, a shape association A would have an A.node member.

If the status member is absent, the status is assumed to be "conformant". The reason and appInfo members may also be absent but have no default value.
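The members described above can be modelled as a small record type. The following is a minimal sketch, not a normative data model: the class name `ShapeAssociation`, the field name `app_info`, and the choice to represent nodes and shapes as plain strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ShapeAssociation:
    """Illustrative model of one shape association (names are hypothetical)."""
    node: str                    # RDF node, written here as a string such as "<http://a.example/n1>"
    shape: str                   # shape label IRI, or "START"
    status: str = "conformant"   # an absent status is assumed to be "conformant"
    reason: Optional[str] = None # no default value, so absence is modelled as None
    app_info: Any = None         # no default value, so absence is modelled as None


# Members are addressed with the '.' operator, as in A.node.
A = ShapeAssociation(node="<http://a.example/n1>", shape="<http://a.example/S1>")
print(A.node)
print(A.status)  # conformant
```

Modelling an absent reason or appInfo as `None` is only one option; an implementation could equally omit the key from a JSON object.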

A triple pattern has a subject pattern, predicate IRI and object pattern.

A focus selector identifies the slot (subject or object) to be validated. A wildcard indicates that the slot may hold any value. A triple pattern has exactly one focus selector. A triple pattern maps to a SPARQL triple pattern with the following restrictions:

A query ShapeMap is a ShapeMap in which each shape association has only the members node and shape. The node property may directly identify an RDF node or it may select a set of RDF nodes.

A fixed ShapeMap is a query ShapeMap in which each node is an RDF node. The ShEx validation process takes as input a fixed ShapeMap.

A result ShapeMap is a fixed ShapeMap with the addition of optional members status, reason and appInfo.

No two shape associations in a ShapeMap may have the same combination of node and shape.
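The uniqueness constraint can be checked mechanically. The sketch below assumes, purely for illustration, that a ShapeMap is held as a list of dictionaries with `node` and `shape` keys; the function name `duplicate_pairs` is hypothetical.

```python
from collections import Counter


def duplicate_pairs(shape_map):
    """Return the (node, shape) combinations that occur more than once."""
    counts = Counter((a["node"], a["shape"]) for a in shape_map)
    return sorted(pair for pair, n in counts.items() if n > 1)


well_formed = [
    {"node": "<n1>", "shape": "<S1>"},
    {"node": "<n1>", "shape": "<S2>"},  # same node, different shape: allowed
]
ill_formed = well_formed + [
    {"node": "<n1>", "shape": "<S1>", "status": "conformant"},  # repeats (<n1>, <S1>)
]

print(duplicate_pairs(well_formed))
print(duplicate_pairs(ill_formed))
```

Only the node and shape members take part in the comparison, so two associations that differ only in status still count as duplicates.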

ShapeMap usage

ShapeMaps are designed to express the goal or the result of validating an RDF node against a ShEx schema:

A query ShapeMap is converted to a fixed ShapeMap to be used as the input to the validation process. This process takes as input a query ShapeMap and a graph and produces a fixed ShapeMap. For a query ShapeMap Q and a graph G, for each shape association A in Q:

A fixed ShapeMap is a set; if the conversion produces the same shape association more than once, it appears in the fixed ShapeMap only once.
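One plausible reading of this conversion can be sketched as follows. It assumes that an association whose node is already an RDF node is carried over unchanged, and that an association whose node is a triple pattern contributes one association per graph node found in the focus position of a matching triple. The graph is modelled as a set of string triples, and `FOCUS`, `WILDCARD`, `TriplePattern`, `matching_nodes` and `to_fixed` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

FOCUS = "FOCUS"   # hypothetical sentinel for the focus selector
WILDCARD = "_"    # hypothetical sentinel for the wildcard


@dataclass(frozen=True)
class TriplePattern:
    subject: str
    predicate: str
    object: str

    def __post_init__(self):
        # A triple pattern has exactly one focus selector.
        if (self.subject == FOCUS) == (self.object == FOCUS):
            raise ValueError("exactly one of subject/object must be FOCUS")


def matching_nodes(pattern, graph):
    """Yield the nodes in the focus slot of every triple matching the pattern."""
    focus_is_subject = pattern.subject == FOCUS
    constraint = pattern.object if focus_is_subject else pattern.subject
    for s, p, o in graph:
        if p != pattern.predicate:
            continue
        focus, other = (s, o) if focus_is_subject else (o, s)
        if constraint == WILDCARD or constraint == other:
            yield focus


def to_fixed(query, graph):
    """Convert a query ShapeMap (list of (node, shape)) into a fixed ShapeMap (a set)."""
    fixed = set()
    for node, shape in query:
        if isinstance(node, TriplePattern):
            fixed.update((n, shape) for n in matching_nodes(node, graph))
        else:
            fixed.add((node, shape))  # assumed: RDF nodes are carried over as-is
    return fixed


G = {("<n1>", "<p>", "<o1>"), ("<n2>", "<p>", "<o1>"), ("<n3>", "<q>", "<o1>")}
Q = [("<n1>", "<S>"), (TriplePattern(FOCUS, "<p>", WILDCARD), "<S>")]
fixed = to_fixed(Q, G)
print(sorted(fixed))
```

Here `<n1>` is selected twice, once directly and once through the triple pattern, but because the result is a set it appears only once.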

Query and Fixed ShapeMap syntax

A specialized text syntax makes ShapeMaps easy to transmit and to read.

A query ShapeMap can include shape associations with both RDF nodes and triple patterns.

ShapeMap grammar

The ShapeMap grammar includes the expression of IRIs with prefix declarations and relative IRIs whose resolution depends on a resolution context. A resolution context is a base IRI and a map of prefix to namespace IRI. Though it is common practice to resolve shape references against a resolution context found in the schema and node references against a resolution context found in the data (e.g. Turtle prefixes), this specification does not specify that behavior.

Production numbers followed by a letter correspond to productions in other grammars:

[1 ]    shapeMap    ::=   shapeAssociation (',' shapeAssociation)*
[2 ]    shapeAssociation    ::=   nodeSpec shapeSpec
[3 ]    nodeSpec    ::=   objectTerm | triplePattern
[4 ]    subjectTerm    ::=   iri | BLANK_NODE_LABEL
[5 ]    objectTerm    ::=   subjectTerm | literal
[6 ]    triplePattern    ::=     '{' "FOCUS" predicate (objectTerm | '_') '}'
                               | '{' (subjectTerm | '_') predicate "FOCUS" '}'
[7 ]    shapeSpec    ::=   '@' (iri | "START") | AT_START
[13t]    literal    ::=    rdfLiteral | numericLiteral | booleanLiteral
[16t]    numericLiteral    ::=    INTEGER | DECIMAL | DOUBLE
[65x]    rdfLiteral    ::=    langString | string ("^^" iri)?
[134s]    booleanLiteral    ::=    "true" | "false"
[135s]    string    ::=       STRING_LITERAL1 | STRING_LITERAL_LONG1
                            | STRING_LITERAL2 | STRING_LITERAL_LONG2
[66x]    langString    ::=       LANG_STRING_LITERAL1 | LANG_STRING_LITERAL_LONG1
                               | LANG_STRING_LITERAL2 | LANG_STRING_LITERAL_LONG2
[4 ]    predicate    ::=   iri | RDF_TYPE
[136s]    iri    ::=    IRIREF
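To make productions [1] to [7] more concrete, the following sketch parses a deliberately restricted subset of the grammar: node and shape terms are limited to a simplified IRIREF, so blank nodes, literals, escapes and comments are not handled. It is an illustration written for this document, not a conforming parser, and the names `tokenize` and `parse_shape_map` and the tuple-based result are assumptions.

```python
import re

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

TOKEN = re.compile(r"""
    \s+                        |  # whitespace, skipped
    (?P<iri><[^<>\s]*>)        |  # simplified IRIREF (no escapes)
    (?P<start>@START)          |  # AT_START is tried before a bare '@'
    (?P<punct>[@{},_])         |
    (?P<word>FOCUS|START|a)\b
""", re.VERBOSE)


def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        if m.lastgroup is not None:      # None means whitespace
            tokens.append(m.group(m.lastgroup))
        pos = m.end()
    return tokens


def parse_shape_map(text):
    """Return a list of (nodeSpec, shapeSpec); a triple pattern is a 3-tuple."""
    tokens, i = tokenize(text), 0

    def peek():
        return tokens[i] if i < len(tokens) else None

    def take(expected=None):
        nonlocal i
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        i += 1
        return tok

    def term(allow_wildcard):
        tok = take()
        if tok.startswith("<") or (allow_wildcard and tok == "_"):
            return tok
        raise ValueError(f"expected a term, got {tok!r}")

    def predicate():                     # predicate ::= iri | RDF_TYPE
        tok = take()
        if tok == "a":
            return RDF_TYPE
        if tok.startswith("<"):
            return tok
        raise ValueError(f"expected a predicate, got {tok!r}")

    def node_spec():                     # nodeSpec ::= objectTerm | triplePattern
        if peek() != "{":
            return term(False)
        take("{")
        if peek() == "FOCUS":
            take("FOCUS")
            pattern = ("FOCUS", predicate(), term(True))
        else:
            s, p = term(True), predicate()
            take("FOCUS")
            pattern = (s, p, "FOCUS")
        take("}")
        return pattern

    def shape_spec():                    # shapeSpec ::= '@' (iri | "START") | AT_START
        tok = take()
        if tok == "@START":
            return "START"
        if tok == "@":
            tok = take()
            if tok == "START" or tok.startswith("<"):
                return tok
        raise ValueError(f"expected a shapeSpec, got {tok!r}")

    result = [(node_spec(), shape_spec())]
    while peek() == ",":
        take(",")
        result.append((node_spec(), shape_spec()))
    if peek() is not None:
        raise ValueError(f"trailing input: {peek()!r}")
    return result


parsed = parse_shape_map("<http://a.example/n1>@<http://a.example/S1>, {FOCUS a _}@START")
print(parsed)
```

The first association pairs an RDF node with a shape label; the second selects every subject of an `rdf:type` triple and pairs it with the start shape.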

Terminals

Text is matched against the longest matching terminal. The PASSED TOKENS below may appear between any terminals or literal strings which appear in the grammar above.

[18t]    <IRIREF>    ::=    "<" ([^#0000- <>\"{}|^`\\] | UCHAR)* ">"
[142s]    <BLANK_NODE_LABEL>    ::=    "_:" (PN_CHARS_U | [0-9]) ((PN_CHARS | ".")* PN_CHARS)?
[16]    <RDF_TYPE>    ::=    "a"
[17]    <AT_START>    ::=    "@START"
The <AT_START> terminal has precedence over <LANGTAG>.
[145s]    <LANGTAG>    ::=    "@" ([a-zA-Z])+ ("-" ([a-zA-Z0-9])+)*
[19t]    <INTEGER>    ::=    [+-]? [0-9]+
[20t]    <DECIMAL>    ::=    [+-]? [0-9]* "." [0-9]+
[21t]    <DOUBLE>    ::=    [+-]? ([0-9]+ "." [0-9]* EXPONENT | "."? [0-9]+ EXPONENT)
[155s]    <EXPONENT>    ::=    [eE] [+-]? [0-9]+
[156s]    <STRING_LITERAL1>    ::=    "'" ([^'\\\n\r] | ECHAR | UCHAR)* "'"
[157s]    <STRING_LITERAL2>    ::=    '"' ([^\"\\\n\r] | ECHAR | UCHAR)* '"'
[158s]    <STRING_LITERAL_LONG1>    ::=    "'''" ( ("'" | "''")? ([^\\'\\] | ECHAR | UCHAR) )* "'''"
[159s]    <STRING_LITERAL_LONG2>    ::=    '"""' ( ('"' | '""')? ([^\"\\] | ECHAR | UCHAR) )* '"""'
[73x]    <LANG_STRING_LITERAL1>    ::=    "'" ([^'\\\n\r] | ECHAR | UCHAR)* "'" LANGTAG
[74x]    <LANG_STRING_LITERAL2>    ::=    '"' ([^\"\\\n\r] | ECHAR | UCHAR)* '"' LANGTAG
[75x]    <LANG_STRING_LITERAL_LONG1>    ::=    "'''" ( ("'" | "''")? ([^\\'\\] | ECHAR | UCHAR) )* "'''" LANGTAG
[76x]    <LANG_STRING_LITERAL_LONG2>    ::=    '"""' ( ('"' | '""')? ([^\"\\] | ECHAR | UCHAR) )* '"""' LANGTAG
[26t]    <UCHAR>    ::=       "\\u" HEX HEX HEX HEX
                            | "\\U" HEX HEX HEX HEX HEX HEX HEX HEX
[160s]    <ECHAR>    ::=    "\\" [tbnrf\\\"\\']
[164s]    <PN_CHARS_BASE>    ::=       [A-Z] | [a-z]
                                     | [#00C0-#00D6] | [#00D8-#00F6] | [#00F8-#02FF]
                                     | [#0370-#037D] | [#037F-#1FFF]
                                     | [#200C-#200D] | [#2070-#218F] | [#2C00-#2FEF]
                                     | [#3001-#D7FF] | [#F900-#FDCF] | [#FDF0-#FFFD]
                                     | [#10000-#EFFFF]
[165s]    <PN_CHARS_U>    ::=    PN_CHARS_BASE | "_"
[167s]    <PN_CHARS>    ::=       PN_CHARS_U | "-" | [0-9]
                                | [#00B7] | [#0300-#036F] | [#203F-#2040]
[168s]    <PN_PREFIX>    ::=    PN_CHARS_BASE ( (PN_CHARS | ".")* PN_CHARS )?
[169s]    <PN_LOCAL>    ::=    (PN_CHARS_U | ":" | [0-9] | PLX) ( (PN_CHARS | "." | ":" | PLX)* (PN_CHARS | ":" | PLX) )?
[170s]    <PLX>    ::=    PERCENT | PN_LOCAL_ESC
[171s]    <PERCENT>    ::=    "%" HEX HEX
[172s]    <HEX>    ::=    [0-9] | [A-F] | [a-f]
[173s]    <PN_LOCAL_ESC>    ::=    "\\" ( "_" | "~" | "." | "-" | "!" | "$" | "&" | "'" | "(" | ")" | "*" | "+" | "," | ";" | "=" | "/" | "?" | "#" | "@" | "%" )
    PASSED TOKENS    ::=       [ \t\r\n]+
                             | "#" [^\r\n]*
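The longest-match rule and the precedence of <AT_START> over <LANGTAG> can be illustrated with a handful of the terminals transcribed as Python regular expressions. This is a sketch covering only the numeric terminals, AT_START and LANGTAG; the function name `longest_terminal` and the tie-breaking by dictionary order are assumptions of the example.

```python
import re

EXPONENT = r"[eE][+-]?[0-9]+"

# Order matters only for ties: AT_START is listed before LANGTAG so that it
# wins when both match the same text, mirroring the stated precedence.
TERMINALS = {
    "INTEGER": re.compile(r"[+-]?[0-9]+"),
    "DECIMAL": re.compile(r"[+-]?[0-9]*\.[0-9]+"),
    "DOUBLE": re.compile(rf"[+-]?(?:[0-9]+\.[0-9]*{EXPONENT}|\.?[0-9]+{EXPONENT})"),
    "AT_START": re.compile(r"@START"),
    "LANGTAG": re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*"),
}


def longest_terminal(text, pos=0):
    """Return (terminal name, matched text) for the longest match at pos, or None."""
    best = None
    for name, rx in TERMINALS.items():
        m = rx.match(text, pos)
        # Strict '>' keeps the earlier terminal when two matches have equal length.
        if m and (best is None or m.end() > best[1]):
            best = (name, m.end())
    return None if best is None else (best[0], text[pos:best[1]])


for sample in ["42", "4.2", "4.2e3", "@START", "@en-US"]:
    print(sample, longest_terminal(sample))
```

With this ordering, `4.2` is read as one DECIMAL rather than an INTEGER followed by further input, and `@START` is read as AT_START even though it also matches the LANGTAG pattern.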