
ShEx

Testing data with the Shape Expressions Language

WikidataCon-2017
28 October 2017

http://shexspec.github.io/talks/2017/10-28-wikidatacon/

Shape Expressions (ShEx)

[Figure: RDF graph of <samples9298996>, typed bf:Text and bf:Work, with bf:title "Oliver Twist.", bf:class <http://id.loc.gov/…/PZ3> (a bf:LCC labeled "PZ3.D55O165PR4567"), and a bf:Person creator labeled "Dickens, Charles, 1812-1870."]

RDF Data as graph and as text

[Figure: the graph form of the Turtle below. <samples9298996> has rdf:type arcs to bf:Text and bf:Work, a bf:title arc to "Oliver Twist.", a bf:class arc to <http://id.loc.gov/…/PZ3>, and a bf:creator arc to a blank node typed bf:Person.]

<samples9298996>
  rdf:type bf:Text ;
  rdf:type bf:Work ;
  bf:title "Oliver Twist." ;
  bf:class <id.loc.gov/…/PZ3> ;
  bf:creator [
    rdf:type bf:Person ;
    bf:label "Dickens, Charles, 1812-1870." ;
  ] .

<id.loc.gov/…/PZ3>
  rdf:type bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

ShEx Model: RDF Data is tested

<samples9298996>
  rdf:type bf:Text ;
  rdf:type bf:Work ;
  bf:title "Oliver Twist." ;
  bf:class <id.loc.gov/…/PZ3> ;
  bf:creator [
    rdf:type bf:Person ;
    bf:label "Dickens, Charles, 1812-1870." ;
  ] .
<id.loc.gov/…/PZ3>
  rdf:type bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

ShEx Model: ...against a ShEx Schema

<Work> EXTRA rdf:type {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
  bf:creator @<Person> OR @<Organization> + ;
  bf:derivedFrom IRI * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
AND
  EXTRA rdf:type {
    rdf:type [bf:LCC] ? ;
    bf:label LITERAL ;
  }

ShEx Model: ...to produce a Validation Result


validating samples9298996bad as Work:
   validating http://...oliverTwist:
      Error validating http://...oliverTwist
      as nodeKind literal:
        iri found when literal expected

ShEx Model: ...on the basis of a Shape Map

inst:Alice @ school:Enrollee,
inst:Bob @ school:Enrollee,
inst:Claire @ school:Enrollee,
inst:Don @ school:Enrollee
{FOCUS, foaf:age, _} @ school:Enrollee
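
The last shape map entry selects its focus nodes by triple pattern instead of naming them. A minimal Python sketch of that expansion follows, assuming triples are held as plain string tuples; the sample triples and the `resolve_pattern` helper are hypothetical illustrations, not part of any ShEx implementation.

```python
# Triples are (subject, predicate, object) tuples of strings (illustrative data).
TRIPLES = [
    ("inst:Alice", "foaf:age", "21"),
    ("inst:Bob", "foaf:age", "34"),
    ("inst:Claire", "foaf:name", "Claire"),
]

FOCUS, WILDCARD = "FOCUS", "_"


def resolve_pattern(triples, subject, predicate, obj, shape):
    """Expand one triple-pattern entry into explicit (node, shape) pairs."""
    pairs = []
    for s, p, o in triples:
        if p != predicate:
            continue
        if subject == FOCUS and obj in (WILDCARD, o):
            node = s  # the focus node is the subject of the matching triple
        elif obj == FOCUS and subject in (WILDCARD, s):
            node = o  # the focus node is the object of the matching triple
        else:
            continue
        if (node, shape) not in pairs:  # keep each pair once, in first-seen order
            pairs.append((node, shape))
    return pairs


fixed_map = resolve_pattern(TRIPLES, FOCUS, "foaf:age", WILDCARD, "school:Enrollee")
print(fixed_map)
```

Only nodes that carry a foaf:age arc are selected, so inst:Claire is left out of the resulting node/shape pairs.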

A ShEx Schema prescribes the "shape" of RDF Data

<Work> EXTRA rdf:type {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
  bf:creator @<Person> OR @<Organization> + ;
  bf:derivedFrom IRI * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
AND
  EXTRA rdf:type {
    rdf:type [bf:LCC] ? ;
    bf:label LITERAL ;
  }
A resource in the RDF data matching the "Work" shape:

A ShEx Schema prescribes the "shape" of RDF Data

<Work> EXTRA rdf:type {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
  bf:creator @<Person> OR @<Organization> + ;
  bf:derivedFrom IRI * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
AND
  EXTRA rdf:type {
    rdf:type [bf:LCC] ? ;
    bf:label LITERAL ;
  }
A resource matching the "Classification" shape:

A ShEx Schema is tested against RDF Data

<Work> EXTRA rdf:type {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
  bf:creator @<Person> OR @<Organization> + ;
  bf:derivedFrom IRI * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
AND
  EXTRA rdf:type {
    rdf:type [bf:LCC] ? ;
    bf:label LITERAL ;
  }
<samples9298996>
  rdf:type bf:Text ;
  rdf:type bf:Work ;
  bf:title "Oliver Twist." ;
  bf:class <id.loc.gov/…/PZ3> ;
  bf:creator [
    rdf:type bf:Person ;
    bf:label "Dickens, Charles, 1812-1870." ;
  ] .

<id.loc.gov/…/PZ3>
  rdf:type bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

Shapes can specify use of properties

A <Work> must have exactly one bf:title with value LITERAL:

<Work> {
  bf:title LITERAL ;
}
<samples9298996>
  bf:title "Oliver Twist." .

<samples9298996bad>
  bf:title <http://...oliverTwist> .

Shape map and validation result:
<samples9298996>@<Work>
<samples9298996bad>@!<Work>
validating samples9298996bad as Work:
   validating http://...oliverTwist:
      Error validating http://...oliverTwist
      as nodeKind literal:
        iri found when literal expected

  • one passed, one failed.
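
The check behind this slide can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how a ShEx processor is implemented: the `IRI` and `Literal` classes and the `check_title` helper are hypothetical, and the graph is a plain list of tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


TRIPLES = [
    ("samples9298996", "bf:title", Literal("Oliver Twist.")),
    ("samples9298996bad", "bf:title", IRI("http://...oliverTwist")),
]


def check_title(triples, node):
    """Return error strings for `bf:title LITERAL`; an empty list means conformance."""
    values = [o for s, p, o in triples if s == node and p == "bf:title"]
    errors = []
    if len(values) != 1:  # no cardinality marker means exactly one
        errors.append(f"expected exactly 1 bf:title, found {len(values)}")
    for value in values:
        if not isinstance(value, Literal):
            errors.append("iri found when literal expected")
    return errors


print(check_title(TRIPLES, "samples9298996"))
print(check_title(TRIPLES, "samples9298996bad"))
```

The first node yields no errors and the second yields one, mirroring the "one passed, one failed" outcome above.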

Shapes can specify types

Permit data to carry an optional rdf:type arc identifying it as a bf:Work:

<Work> {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
}
<samples9298996> a bf:Work ;
  bf:title "Oliver Twist." .

<samples9298996b> a bf:Work ;
  bf:title "Oliver Twist." .

<samples9298996bad> a bf:Krow ;
  bf:title "Oliver Twist" .

Shape map and validation result:
<samples9298996>@<Work>
<samples9298996b>@<Work>
<samples9298996bad>@!<Work>
validating samples9298996bad as Work:
    validating http://bibframe.org/vocab/Krow:
      Error validating http://bibframe.org/vocab/Krow
      as values [<http://bibframe.org/vocab/Work>]:
        value http://bibframe.org/vocab/Krow not found in set
        [<http://bibframe.org/vocab/Work>]
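
The `?` marker and the value set `[bf:Work]` combine two tests: how many rdf:type arcs are present, and whether each value is in the allowed set. A rough Python sketch follows, assuming the usual reading of the ShExC cardinality markers; the `check_value_set` helper is hypothetical.

```python
import math

# ShExC cardinality markers as (min, max) bounds; no marker means exactly one.
CARDINALITY = {"": (1, 1), "?": (0, 1), "*": (0, math.inf), "+": (1, math.inf)}


def check_value_set(values, allowed, marker=""):
    """Check one triple constraint's values against a value set and a cardinality."""
    low, high = CARDINALITY[marker]
    errors = [f"value {v} not found in set" for v in values if v not in allowed]
    if not low <= len(values) <= high:
        errors.append(f"found {len(values)} values, expected {low}..{high}")
    return errors


print(check_value_set(["bf:Work"], {"bf:Work"}, "?"))  # type arc present and allowed
print(check_value_set([], {"bf:Work"}, "?"))           # type arc absent, still fine
print(check_value_set(["bf:Krow"], {"bf:Work"}, "?"))  # type arc present but not allowed
```

Because the shape does not declare `EXTRA rdf:type`, every rdf:type arc on the node has to pass this check, which is why the bf:Krow sample fails.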

Shapes can reference other shapes

Add a reference to another shape:

<Work> {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
}

<Classification> {
  rdf:type [bf:LCC] ? ;
  bf:label LITERAL ;
}
<samples9298996>
  bf:title "Oliver Twist." ;
  bf:class <http://id.loc.gov/…/PZ3> .
<http://id.loc.gov/…/PZ3> a bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

<samples9298996b>
  bf:title "Oliver Twist." ;
  bf:class [ bf:label "PZ3.D55O165PR4567" ].

<samples9298996bad>
  bf:title "Oliver Twist." ;
  bf:class [ a bf:LCD ; bf:label "PZ3.D55O165PR4567" ].

Shape map and validation result:
<samples9298996>@<Work>
<samples9298996b>@<Work>
<samples9298996bad>@!<Work>
validating samples9298996bad as Work:
    validating _:b1:
      validating http://bibframe.org/vocab/LCD:
      Error validating http://bibframe.org/vocab/LCD
      as values [<http://bibframe.org/vocab/LCC>]:
       value <http://bibframe.org/vocab/LCD> not found
       in set [<http://bibframe.org/vocab/LCC>]

Shapes can specify value sets

Or a reference to an authority:

<Work> {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
<samples9298996>
  bf:title "Oliver Twist." ;
  bf:class <http://id.loc.gov/…/PZ3> .

<samples9298996bad>
  bf:title "Oliver Twist." ;
  bf:class <http://id.loc.gov/authorities/PZ3> .

<samples9298996badb>
  bf:title "Oliver Twist." ;
  bf:class [ bf:label "PZ3.D55O165PR4567" ].

Shape map and validation results:
<samples9298996>@<Work>
<samples9298996bad>@!<Work>
validating samples9298996bad as Work:
    validating <http://id.loc.gov/authorities/PZ3>:
      NodeConstraintError: expected to match
      [<http://id.loc.gov/authorities/classification/>~]
<samples9298996badb>@!<Work>
validating samples9298996badb as Work:
    validating _:b0:
      NodeConstraintError: expected to match
      [<http://id.loc.gov/authorities/classification/>~]
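
An IRI stem such as `[<...>~]` accepts any IRI that begins with the given prefix. A minimal Python sketch of that test follows, using the stem printed in the error output above; the `matches_stem` helper and the passing IRI are illustrative assumptions, and blank nodes are represented here by their `_:` labels.

```python
STEM = "http://id.loc.gov/authorities/classification/"


def matches_stem(node, stem=STEM):
    """True when the node is an IRI that starts with the stem."""
    if node.startswith("_:"):  # blank nodes have no IRI, so they can never match
        return False
    return node.startswith(stem)


for node in (
    "http://id.loc.gov/authorities/classification/PZ3",  # assumed full form of a passing IRI
    "http://id.loc.gov/authorities/PZ3",                 # outside the stem
    "_:b0",                                              # blank node
):
    print(node, matches_stem(node))
```

The second and third nodes correspond to the two failing samples: one IRI lies outside the authority's namespace and the other value is a blank node.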

Shapes can combine constraints

Or a reference to a structure identified by an authority:

<Work> {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
  AND {
    rdf:type [bf:LCC] ? ;
    bf:label LITERAL ;
  }
<samples9298996>
  bf:title "Oliver Twist." ;
  bf:class <http://id.loc.gov/…/PZ3> .
<http://id.loc.gov/…/PZ3> a bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

<samples9298996bad>
  bf:title "Oliver Twist." ;
  bf:class <http://id.loc.gov/authorities/PZ3> .
<http://id.loc.gov/authorities/PZ3> a bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

<samples9298996badb>
  bf:title "Oliver Twist." ;
  bf:class <http://id.loc.gov/…/PZ3999> .

Shape map and validation results:
<samples9298996>@<Work>
<samples9298996bad>@!<Work>
validating samples9298996bad as Work:
    validating <http://id.loc.gov/authorities/PZ3>:
      validating <http://id.loc.gov/authorities/PZ3> as Classification:
      NodeConstraintError: expected to match [<http://id.loc.gov/…/>~]
<samples9298996badb>@!<Work>
validating samples9298996badb as Work:
    validating <http://id.loc.gov/…/PZ3999>:
      validating <http://id.loc.gov/…/PZ3999> as Classification:
      Missing property: <http://bibframe.org/vocab/label>

Shapes can specify choices

You may accept multiple forms of creator.

<Work> {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class . * ;
  bf:creator @<Person> OR @<Organization> + ;
}

<Person> {
  rdf:type [bf:Person] ? ;
  bf:label LITERAL ;
}

<Organization> EXTRA rdf:type {
  rdf:type [bf:Organization] ;
  bf:label LITERAL ;
  org:member @<Person> OR @<Organization> *
}
<samples9298996>
  bf:title "Oliver Twist." ;
  bf:creator [ rdf:type bf:Person ;
    bf:label "Dickens, Charles, 1812-1870."
  ] .

<wp-55-45-1>
  bf:title "Nixon blasts 'false charges'" ;
  bf:creator <WP> .

<WP> rdf:type bf:Organization , bf:Newpaper ;
  bf:label "Washington Post" ;
  org:member <BobWoodward>, <CarlBernstein> .

<BobWoodward> bf:label "Bob Woodward" .
<CarlBernstein> bf:label "Carl Bernstein" .

Shape map:
<samples9298996>@<Work>
<wp-55-45-1>@<Work>
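
The `@<Person> OR @<Organization>` choice is recursive, because an organization's members are themselves persons or organizations. The Python sketch below hand-codes the two shapes to show that recursion. It is a simplification under stated assumptions: the helper names are hypothetical, the graph is a plain dictionary, and a node already being checked further up the call chain is assumed to conform so that cyclic memberships terminate.

```python
class Literal(str):
    """Marks a string as an RDF literal rather than a node identifier."""


GRAPH = {
    "WP": {
        "rdf:type": ["bf:Organization", "bf:Newpaper"],  # second type as spelled in the sample data
        "bf:label": [Literal("Washington Post")],
        "org:member": ["BobWoodward", "CarlBernstein"],
    },
    "BobWoodward": {"bf:label": [Literal("Bob Woodward")]},
    "CarlBernstein": {"bf:label": [Literal("Carl Bernstein")]},
}


def one_literal_label(node):
    labels = GRAPH.get(node, {}).get("bf:label", [])
    return len(labels) == 1 and isinstance(labels[0], Literal)


def is_person(node):
    # rdf:type [bf:Person] ? without EXTRA: no type arc, or exactly bf:Person.
    types = GRAPH.get(node, {}).get("rdf:type", [])
    return one_literal_label(node) and types in ([], ["bf:Person"])


def is_organization(node, visiting=frozenset()):
    if node in visiting:  # cyclic membership: assume the outer check decides
        return True
    arcs = GRAPH.get(node, {})
    return (
        arcs.get("rdf:type", []).count("bf:Organization") == 1  # EXTRA tolerates other types
        and one_literal_label(node)
        and all(is_creator(m, visiting | {node}) for m in arcs.get("org:member", []))
    )


def is_creator(node, visiting=frozenset()):
    """The OR: a creator conforms to <Person> or to <Organization>."""
    return is_person(node) or is_organization(node, visiting)


print(is_creator("WP"), is_creator("BobWoodward"))
```

<WP> conforms as an organization even though it carries a second type, because `EXTRA rdf:type` tolerates type arcs that do not match the value set.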

Diseases in Wikidata

# Shape Expression for Diseases in Wikidata
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX pr:  <http://www.wikidata.org/prop/reference/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX do: <http://purl.obolibrary.org/obo/DOID_>

start = @<wikidata-disease>

<wikidata-disease> {
  p:P31 { # instance of disease
    ps:P31 [ wd:Q12136 ]; # instance of disease
    $<has-do-reference> prov:wasDerivedFrom @<do-reference>;
  } ;
  p:P279 { # subclass of
    ps:P279 @<wikidata-disease>;
    &<has-do-reference>
  } * ;
  p:P2888 EXTRA prov:wasDerivedFrom { # exact match
    ps:P2888 [ do:~ ];
    prov:wasDerivedFrom @<do-reference> ?
  } + ;
}

<do-reference> {
  # stated in
  pr:P248 @<version-disease-ontology> ;
  # retrieved
  pr:P813 xsd:dateTime ;
  # Disease ontology ID
  pr:P699 @<disease-ontology-id> ;
}

<disease-ontology-id> LITERAL /^DOID:[0-9]+$/

<version-disease-ontology> {
  # edition or translation of Disease Ontology
  p:P629 { ps:P629 [ wd:Q5282129 ] } ;
}
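
The `<disease-ontology-id>` shape constrains a literal with a regular expression. The same pattern can be tried in Python as a rough sketch; the `is_disease_ontology_id` helper and the sample values are illustrative only. `re.fullmatch` is used because Python's `$` would also accept a trailing newline.

```python
import re

# Same pattern as the schema's /^DOID:[0-9]+$/, anchored via fullmatch.
DOID_PATTERN = re.compile(r"DOID:[0-9]+")


def is_disease_ontology_id(value):
    """True when the literal's lexical form looks like a Disease Ontology ID."""
    return DOID_PATTERN.fullmatch(value) is not None


for candidate in ("DOID:162", "DOID_162", "DOID:", "doid:162"):
    print(candidate, is_disease_ontology_id(candidate))
```

Only the first candidate has the `DOID:` prefix followed by at least one digit.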
     

Wikidata items on Cancer should have an NCI Thesaurus ID

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX pr: <http://www.wikidata.org/prop/reference/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>

start = @<wikidata_item>

<wikidata_item> {
  p:P1748 {
    ps:P1748 LITERAL ;
    prov:wasDerivedFrom @<reference>
  }+
}

<reference> {
  pr:P248  IRI ;
  pr:P813  xsd:dateTime ;
  pr:P699  LITERAL
}
     
Endpoint: https://query.wikidata.org/bigdata/namespace/wdq/sparql

Query:
SELECT ?item ?itemLabel
WHERE
{ ?item wdt:P279* wd:Q12078 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} LIMIT 10
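
Outside the demo page, the same query can be sent to the endpoint as an HTTP GET request. The Python sketch below only builds the request URL and does not contact the server. It assumes the endpoint takes the query in a `query` parameter and returns JSON when `format=json` is supplied; `build_request_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

QUERY = """SELECT ?item ?itemLabel
WHERE
{ ?item wdt:P279* wd:Q12078 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} LIMIT 10"""


def build_request_url(endpoint, query):
    """Percent-encode the query and append it to the endpoint URL."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = build_request_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

Each ?item returned by the query would then become a focus node to validate against the <wikidata_item> shape.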
     
Try it!
Click “Wikidata item on Cancer should have a NCI Thesaurus ID” on the left,
then “Get all Wikidata items on Cancers (SPARQL)” on the right.

Three interchangeable concrete syntaxes

Open-source implementations

Shapes Constraint Language (SHACL)

ShEx and SHACL compared

Join the ShEx Community!

W3C Shape Expressions Community Group