
ShEx

Testing data with the Shape Expressions Language

SWAT4HCLS-2017
28 October 2017

http://shexspec.github.io/talks/2017/12-04-swat4hcls-ericp/

Shape Expressions (ShEx)

[Figure: RDF graph of <samples9298996>, typed bf:Text and bf:Work, with bf:title "Oliver Twist.", a bf:class link to <http://id.loc.gov/…/PZ3> (typed bf:LCC, bf:label "PZ3.D55O165PR4567") and a blank-node bf:creator (typed bf:Person, bf:label "Dickens, Charles, 1812-1870.")]

RDF Data as graph and as text

[Figure: the RDF graph for <samples9298996>; the same data follows as Turtle.]
<samples9298996>
  rdf:type bf:Text ;
  rdf:type bf:Work ;
  bf:title "Oliver Twist." ;
  bf:class <id.loc.gov/…/PZ3> ;
  bf:creator [
    rdf:type bf:Person ;
    bf:label "Dickens, Charles, 1812-1870." ;
  ] .

<id.loc.gov/…/PZ3>
  rdf:type bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

ShEx Model: RDF Data is tested

<samples9298996>
  rdf:type bf:Text ;
  rdf:type bf:Work ;
  bf:title "Oliver Twist." ;
  bf:class <id.loc.gov/…/PZ3> ;
  bf:creator [
    rdf:type bf:Person ;
    bf:label "Dickens, Charles, 1812-1870." ;
  ] .
<id.loc.gov/…/PZ3>
  rdf:type bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

ShEx Model: ...against a ShEx Schema

<Work> EXTRA rdf:type {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
  bf:creator @<Person> OR @<Organization> + ;
  bf:derivedFrom IRI * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
AND
  EXTRA rdf:type {
    rdf:type [bf:LCC] ? ;
    bf:label LITERAL ;
  }

ShEx Model: ...to produce a Validation Result


validating samples9298996bad as Work:
   validating http://...oliverTwist:
      Error validating http://...oliverTwist
      as nodeKind literal:
        iri found when literal expected

ShEx Model: ...on the basis of a Shape Map

inst:Alice @ school:Enrollee,
inst:Bob @ school:Enrollee,
inst:Claire @ school:Enrollee,
inst:Don @ school:Enrollee,
{FOCUS foaf:age _} @ school:Enrollee
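The last entry of the shape map above selects focus nodes with a triple pattern instead of naming them. The following is a minimal Python sketch of how such a pattern could be expanded into explicit node/shape pairs. The names `FOCUS`, `WILDCARD` and `select_focus_nodes`, and the age and name values in the sample graph, are hypothetical and serve only as illustration; this is not the code of any ShEx implementation.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    s: str
    p: str
    o: str

FOCUS = object()   # marks the position whose value becomes the focus node
WILDCARD = None    # matches any term, like "_" in the shape map

def select_focus_nodes(graph, pattern):
    """Return the sorted nodes selected by one (subject, predicate, object) pattern."""
    s_pat, p_pat, o_pat = pattern
    found = set()
    for t in graph:
        if p_pat is not WILDCARD and t.p != p_pat:
            continue
        if s_pat is FOCUS:
            if o_pat is WILDCARD or t.o == o_pat:
                found.add(t.s)
        elif o_pat is FOCUS:
            if s_pat is WILDCARD or t.s == s_pat:
                found.add(t.o)
    return sorted(found)

# Hypothetical sample data: only Alice and Bob have a foaf:age arc.
graph = [
    Triple("inst:Alice", "foaf:age", "19"),
    Triple("inst:Bob", "foaf:age", "20"),
    Triple("inst:Claire", "foaf:name", "Claire"),
]

# Expand the pattern into explicit (node, shape) pairs.
fixed_map = [
    (node, "school:Enrollee")
    for node in select_focus_nodes(graph, (FOCUS, "foaf:age", WILDCARD))
]
print(fixed_map)
```

Each resulting pair can then be validated exactly like the explicitly listed entries.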

Problem Statement

Useful data needs consistent structure:

Detect and correct errors:

@prefix : <http://www.w3.org/2012/12/rdf-val/SOTA-ex#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<issue7> a :Issue , :SecurityIssue ;
    :state :unassigned ;
    :reportedBy <user6> , <user2> ; # cardinality 1
    :reportedOn "2012-12-31T23:57:00"^^xsd:dateTime ;
    :assignedTo <user2>, <user1> ;
    :assignedOn "2012-11-31T23:57:00"^^xsd:dateTime ;
                       # assigned before being reported
    :related <issue4>, <issue3>, <issue2> .
                       # referenced issues not included

<issue4> # a ???         missing type arc
    :state :unsinged ; # misspelled
    # :reportedBy ??? -  missing
    :reportedOn "2012-12-31T23:57:00"^^xsd:dateTime .

<user2> a foaf:Person ;
    foaf:givenName "Alice" ;
    foaf:familyName "Smith" ;
    foaf:phone <tel:+1.555.222.2222> ;
    foaf:mbox <mailto:alice@example.com> .

<user6> a foaf:Agent ; # should be foaf:Person
    foaf:givenName "Bob" ; # foaf:familyName "???" - missing
    foaf:phone <tel:+.555.222.2222> ; # malformed tel: URL
    foaf:mbox <mailto:alice@example.com> .

A ShEx Schema prescribes the "shape" of RDF Data

<Work> EXTRA rdf:type {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
  bf:creator @<Person> OR @<Organization> + ;
  bf:derivedFrom IRI * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
AND
  EXTRA rdf:type {
    rdf:type [bf:LCC] ? ;
    bf:label LITERAL ;
  }
A resource in the RDF data matching the "Work" shape:

A resource matching the "Classification" shape:

A ShEx Schema is tested against RDF Data

<Work> EXTRA rdf:type {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
  bf:creator @<Person> OR @<Organization> + ;
  bf:derivedFrom IRI * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
AND
  EXTRA rdf:type {
    rdf:type [bf:LCC] ? ;
    bf:label LITERAL ;
  }
<samples9298996>
  rdf:type bf:Text ;
  rdf:type bf:Work ;
  bf:title "Oliver Twist." ;
  bf:class <id.loc.gov/…/PZ3> ;
  bf:creator [
    rdf:type bf:Person ;
    bf:label "Dickens, Charles, 1812-1870." ;
  ] .

<id.loc.gov/…/PZ3>
  rdf:type bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

Shapes can specify use of properties

A <Work> must have exactly one bf:title with value LITERAL:

<Work> {
  bf:title LITERAL ;
}
<samples9298996>
  bf:title "Oliver Twist." .

<samples9298996bad>                 
  bf:title <http://...oliverTwist> .
<samples9298996>@<Work>
<samples9298996bad>@!<Work>
validating samples9298996bad as Work:
   validating http://...oliverTwist:
      Error validating http://...oliverTwist
      as nodeKind literal:
        iri found when literal expected

  • one passed, one failed.
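The check behind this result can be sketched in a few lines of Python. This is an illustrative approximation under simple assumptions, not the validator's actual code: triples are plain tuples, literals are wrapped in a hypothetical `Literal` class, and anything else is treated as an IRI.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper marking a value as an RDF literal."""
    value: str

def check_work(node, graph):
    """Return a list of error strings; an empty list means the node conforms."""
    titles = [o for (s, p, o) in graph if s == node and p == "bf:title"]
    errors = []
    # Default cardinality in ShEx is exactly one.
    if len(titles) != 1:
        errors.append(f"expected exactly 1 bf:title, found {len(titles)}")
    # Node kind constraint: the value must be a literal.
    for value in titles:
        if not isinstance(value, Literal):
            errors.append(f"{value}: iri found when literal expected")
    return errors

graph = [
    ("samples9298996", "bf:title", Literal("Oliver Twist.")),
    ("samples9298996bad", "bf:title", "http://...oliverTwist"),
]

print(check_work("samples9298996", graph))
print(check_work("samples9298996bad", graph))
```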

Shapes can specify types

Permit some data to have a type arc identifying it as a <Work>:

<Work> {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
}
<samples9298996> a bf:Work ;
  bf:title "Oliver Twist." .

<samples9298996b>
  bf:title "Oliver Twist." .

<samples9298996bad> a bf:Krow ;
  bf:title "Oliver Twist" .    
<samples9298996>@<Work>
<samples9298996b>@<Work>
<samples9298996bad>@!<Work>
validating samples9298996bad as Work:
    validating http://bibframe.org/vocab/Krow:
      Error validating http://bibframe.org/vocab/Krow
      as values [<http://bibframe.org/vocab/Work>]:
        value http://bibframe.org/vocab/Krow not found in set
        [<http://bibframe.org/vocab/Work>]

Shapes can reference other shapes

Add a reference to another shape:

<Work> {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
}

<Classification> {
  rdf:type [bf:LCC] ? ;
  bf:label LITERAL ;
}
<samples9298996>
  bf:title "Oliver Twist." ;
  bf:class <http://id.loc.gov/…/PZ3> .
<http://id.loc.gov/…/PZ3> a bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

<samples9298996b>
  bf:title "Oliver Twist." ;
  bf:class [ bf:label "PZ3.D55O165PR4567" ].

<samples9298996bad>                                    
  bf:title "Oliver Twist." ;                           
  bf:class [ a bf:LCD ; bf:label "PZ3.D55O165PR4567" ].
<samples9298996>@<Work>
<samples9298996b>@<Work>
<samples9298996bad>@!<Work>
validating samples9298996bad as Work:
    validating _:b1:
      validating http://bibframe.org/vocab/LCD:
      Error validating http://bibframe.org/vocab/LCD
      as values [<http://bibframe.org/vocab/LCC>]:
       value <http://bibframe.org/vocab/LCD> not found
       in set [<http://bibframe.org/vocab/LCC>]

Shapes can specify value sets

Or a reference to an authority:

<Work> {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
<samples9298996>
  bf:title "Oliver Twist." ;
  bf:class <http://id.loc.gov/…/PZ3> .

<samples9298996bad>                             
  bf:title "Oliver Twist." ;                    
  bf:class <http://id.loc.gov/authorities/PZ3> .
                                                
<samples9298996badb>                            
  bf:title "Oliver Twist." ;                    
  bf:class [ bf:label "PZ3.D55O165PR4567" ].    
<samples9298996>@<Work>
<samples9298996bad>@!<Work>
validating samples9298996bad as Work:
    validating <http://id.loc.gov/authorities/PZ3>:
      NodeConstraintError: expected to match
      [<http://id.loc.gov/authorities/classification/>~]
<samples9298996badb>@!<Work>
validating samples9298996badb as Work:
    validating _:b0:
      NodeConstraintError: expected to match
      [<http://id.loc.gov/authorities/classification/>~]
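An IRI stem in a value set amounts to a prefix test on IRIs. A minimal Python sketch follows, assuming nodes are modelled as hypothetical (kind, value) pairs. The stem is the one printed in the error messages above; the conforming IRI is made up for illustration.

```python
def matches_iri_stem(node, stem):
    """True when node is an IRI that begins with stem (the "~" in [<stem>~])."""
    kind, value = node
    return kind == "iri" and value.startswith(stem)

STEM = "http://id.loc.gov/authorities/classification/"

candidates = {
    "in authority": ("iri", STEM + "EXAMPLE"),   # hypothetical IRI under the stem
    "wrong path": ("iri", "http://id.loc.gov/authorities/PZ3"),
    "blank node": ("bnode", "_:b0"),
}

results = {label: matches_iri_stem(node, STEM) for label, node in candidates.items()}
print(results)
```

A blank node fails regardless of its label, because a stem can only match IRIs.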

Shapes can combine constraints

Or a reference to a structure identified by an authority:

<Work> {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class @<Classification> * ;
}

<Classification>
  [<http://id.loc.gov/…/>~]
  AND {
    rdf:type [bf:LCC] ? ;
    bf:label LITERAL ;
  }
<samples9298996>
  bf:title "Oliver Twist." ;
  bf:class <http://id.loc.gov/…/PZ3> .
<http://id.loc.gov/…/PZ3> a bf:LCC ;
  bf:label "PZ3.D55O165PR4567" .

<samples9298996bad>                             
  bf:title "Oliver Twist." ;                    
  bf:class <http://id.loc.gov/authorities/PZ3> .
<http://id.loc.gov/authorities/PZ3> a bf:LCC ;  
  bf:label "PZ3.D55O165PR4567" .                
                                                
<samples9298996badb>                            
  bf:title "Oliver Twist." ;                    
  bf:class <http://id.loc.gov/…/PZ3999> .       
<samples9298996>@<Work>
<samples9298996bad>@!<Work>
validating samples9298996bad as Work:
    validating <http://id.loc.gov/authorities/PZ3>:
      validating <http://id.loc.gov/authorities/PZ3> as Classification:
      NodeConstraintError: expected to match [<http://id.loc.gov/…/>~]
<samples9298996badb>@!<Work>
validating samples9298996badb as Work:
    validating <http://id.loc.gov/…/PZ3999>:
      validating <http://id.loc.gov/…/PZ3999> as Classification:
      Missing property: <http://bibframe.org/vocab/label>

Shapes can specify choices

You may accept multiple forms of creator.

<Work> {
  rdf:type [bf:Work] ? ;
  bf:title LITERAL ;
  bf:class . * ;
  bf:creator @<Person> OR @<Organization> + ;
}

<Person> {
  rdf:type [bf:Person] ? ;
  bf:label LITERAL ;
}

<Organization> EXTRA rdf:type {
  rdf:type [bf:Organization] ;
  bf:label LITERAL ;
  org:member @<Person> OR @<Organization> *
}
<samples9298996>
  bf:title "Oliver Twist." ;
  bf:creator [ rdf:type bf:Person ;
    bf:label "Dickens, Charles, 1812-1870."
  ] .

<wp-55-45-1>
  bf:title "Nixon blasts 'false charges'" ;
  bf:creator <WP> .

<WP> rdf:type bf:Organization , bf:Newspaper ;
  bf:label "Washington Post" ;
  org:member <BobWoodward>, <CarlBernstein> .

<BobWoodward> bf:label "Bob Woodward" .
<CarlBernstein> bf:label "Carl Bernstein" .
<samples9298996>@<Work>
<wp-55-45-1>@<Work>

(DRY)

see the slides: https://www.slideshare.net/jelabra/shex-by-example/7


ShEx 2.1

IMPORT

IMPORT <User.shex>

:Employee {
  &:name ;
  schema:worksFor @:Company
}
 
:Company {
  schema:employee @:Employee ;
  schema:founder  @:Person ;
}
PREFIX : <http://a.example/ns#>
PREFIX schema: <http://schema.org/>

:Person { 
   $:name ( schema:name . 
          | schema:givenName . ; schema:familyName .
          ) ;
   schema:email .               
}

Inheritance

Start with a general rule about a structure:

<ObservationShape> {
  …
  :component {
    :code . ;
    :value .
  } *
}

An observation contains N components, each with a code and a value.

Inheritance

Use that structure in a derivative shape:

<ObservationShape> {
  …
  :component {
    :code . ;
    :value .
  } *
}
<BP> -<ObservationShape> {
  …
  :component {
    :code [ "systolic" ] ;
    :value .
  };
  :component {
    :code [ "diastolic" ] ;
    :value .
  }
}

A BP observation contains 2 components with codes for systolic and diastolic.

Inheritance

Use that structure in turn in an extended shape:

<ObservationShape> {
  …
  :component {
    :code . ;
    :value .
  } *
}
<BP> -<ObservationShape> {
  …
  :component {
    :code [ "systolic" ] ;
    :value .
  };
  :component {
    :code [ "diastolic" ] ;
    :value .
  }
}
<PostureBP> &<BP> {
  …
  :component {
    :code [ "posture" ] ;
    :value .
  };
}

A PostureBP observation has an additional component with a code for posture.

Severity

HIGH { value: 1, low: 1, high: 1 }
LOW { value: 1, low: 0, high: 1 }

<EventShape> {
  HIGH schema:startDate @<Date> OR @<DateTime> ;
  LOW  schema:endDate @<Date> OR @<DateTime> ;
  LOW  schema:actor @<Person>
}

<Person> {
  LOW schema:name xsd:string
}

Diseases in Wikidata

# Shape Expression for Diseases in Wikidata
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX pr:  <http://www.wikidata.org/prop/reference/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX do: <http://purl.obolibrary.org/obo/DOID_>

start = @<wikidata-disease>

<wikidata-disease> {
  p:P31 { # instance of disease
    ps:P31 [ wd:Q12136 ]; # instance of disease
    $<has-do-reference> prov:wasDerivedFrom @<do-reference>;
  } ;
  p:P279 { # subclass of
    ps:P279 @<wikidata-disease>;
    &<has-do-reference>
  } * ;
  p:P2888 EXTRA prov:wasDerivedFrom { # exact match
    ps:P2888 [ do:~ ];
    prov:wasDerivedFrom @<do-reference> ?
  } + ;
}

<do-reference> {
  # stated in
  pr:P248 @<version-disease-ontology> ;
  # retrieved
  pr:P813 xsd:dateTime ;
  # Disease ontology ID
  pr:P699 @<disease-ontology-id> ;
}

<disease-ontology-id> LITERAL /^DOID:[0-9]+$/

<version-disease-ontology> {
  # edition or translation of Disease Ontology
  p:P629 { ps:P629 [ wd:Q5282129 ] } ;
}
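The <disease-ontology-id> shape above constrains a literal with a pattern facet. As a rough Python illustration of what that facet accepts, the sketch below tests lexical forms against the same expression. The function name and the sample identifiers are hypothetical, and Python's `re` dialect is only an approximation of the regular expression semantics ShEx uses.

```python
import re

# Same expression as /^DOID:[0-9]+$/; fullmatch supplies the anchoring.
DOID_PATTERN = re.compile(r"DOID:[0-9]+")

def is_disease_ontology_id(lexical_form):
    """True when the whole literal is 'DOID:' followed by one or more digits."""
    return DOID_PATTERN.fullmatch(lexical_form) is not None

for sample in ["DOID:162", "DOID:", "doid:162", "DOID:162a"]:
    print(sample, is_disease_ontology_id(sample))
```

`fullmatch` is used rather than `search` with `^` and `$` because Python's `$` also matches just before a trailing newline.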
     

Wikidata items on Cancer should have an NCI Thesaurus ID

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX pr: <http://www.wikidata.org/prop/reference/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>

start = @<wikidata_item>

<wikidata_item> {
  p:P1748 {
    ps:P1748 LITERAL ;
    prov:wasDerivedFrom @<reference>
  }+
}

<reference> {
  pr:P248  IRI ;
  pr:P813  xsd:dateTime ;
  pr:P699  LITERAL
}
     
Endpoint: https://query.wikidata.org/bigdata/namespace/wdq/sparql

Query: SELECT ?item ?itemLabel
WHERE
{ ?item wdt:P279* wd:Q12078 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} LIMIT 10
     
Try it!
Click “Wikidata item on Cancer should have a NCI Thesaurus ID” on the left,
then “Get all Wikidata items on Cancers (SPARQL)” on the right.

FHIR MedicationRequest

<MedicationRequest> {
    :status @<code> AND
        {fhir:value ["active" "on-hold" "cancelled" "completed"]}?;
    :intent @<code> AND
        {fhir:value ["proposal" "plan" "order" "instance-order"]};
    ( :medicationCodeableConcept @<CodeableConcept>  |
      :medicationReference @<MedicationReference> );
    :subject ( @<PatientReference> OR @<GroupReference> );
    :requester @<MedicationRequest.requester>;
    :reasonCode @<CodeableConcept>*;
    :reasonReference
    ( { fhir:link @<Condition> } OR
      { fhir:link @<Observation> } )*;
    :dispenseRequest {
        :dispenseRequest.numberOfRepeatsAllowed @<positiveInt>?;
        :dispenseRequest.quantity @<SimpleQuantity>?;
        :dispenseRequest.performer @<Reference>? };
    :substitution @<MedicationRequest.substitution>?;
}

# Any restrictions on medication substitution
<MedicationRequest.substitution> CLOSED {
    fhir:Element.id @<string>?;             # xml:id (or equivalent in JSON)
    fhir:Element.extension @<Extension>*;   # Additional Content defined by 
                                            # implementations 
    fhir:BackboneElement.modifierExtension @<Extension>*;  # Extensions that cannot be ignored
    :substitution.allowed @<boolean>;  # Whether substitution is allowed or 
                                            # not 
    :substitution.reason @<CodeableConcept>?;  # Why should (not) substitution be 
                                            # made 
    fhir:index xsd:integer?                 # Relative position in a list
}

# Who/What requested the Request
<MedicationRequest.requester> CLOSED {
    fhir:Element.id @<string>?;             # xml:id (or equivalent in JSON)
    fhir:Element.extension @<Extension>*;   # Additional Content defined by 
                                            # implementations 
    fhir:BackboneElement.modifierExtension @<Extension>*;  # Extensions that cannot be ignored
    :requester.agent  # Who ordered the initial 
                                            # medication(s) 
    (   @<PractitionerReference> OR
        @<OrganizationReference> OR
        @<PatientReference> OR
        @<RelatedPersonReference> OR
        @<DeviceReference>
    );
    :requester.onBehalfOf @<Reference>?;  # Organization agent is acting for
    fhir:index xsd:integer?                 # Relative position in a list
}

Three interchangeable concrete syntaxes

Open-source implementations

On-line Tools

Shapes Constraint Language (SHACL)

ShEx and SHACL compared

expressivity

compare to

User example

A User with a name and mbox

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

<UserShape> {
    (foaf:name LITERAL
     | foaf:givenName LITERAL+ ;
       foaf:familyName LITERAL) ;
    foaf:mbox IRI
}

compare with other schema languages...

RelaxNG Compact Syntax

    (element foaf:name { xsd:string }
     | (element foaf:givenName { xsd:string }+,
        element foaf:familyName { xsd:string })),
    element foaf:mbox { xsd:anyURI }

Regex

(N|(G+F))M
NM
GFM
GGGFM
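The regex analogy can be run directly. In the Python sketch below, each property of a node is encoded as one letter (N, G, F, M as above) and the resulting string is matched against the expression. Because triples in a graph are unordered while a regex is not, the sketch first sorts the letters into the order the expression uses; that canonicalisation step, the skipping of unlisted properties, and all names in the code are assumptions made for this illustration.

```python
import re

USER_PATTERN = re.compile(r"(N|(G+F))M")

LETTERS = {
    "foaf:name": "N",
    "foaf:givenName": "G",
    "foaf:familyName": "F",
    "foaf:mbox": "M",
}
ORDER = "NGFM"

def matches_user_shape(predicates):
    """Encode a node's predicates as letters and match them against the pattern."""
    # Predicates not mentioned by the shape are ignored, as in an open shape.
    letters = sorted((LETTERS[p] for p in predicates if p in LETTERS), key=ORDER.index)
    return USER_PATTERN.fullmatch("".join(letters)) is not None

print(matches_user_shape(["foaf:mbox", "foaf:name"]))
print(matches_user_shape(["foaf:givenName", "foaf:givenName", "foaf:familyName", "foaf:mbox"]))
print(matches_user_shape(["foaf:name", "foaf:givenName", "foaf:familyName", "foaf:mbox"]))
```

The third call mixes both name forms, which the choice in the expression rules out.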

W3C XML Schema

  <xs:complexType name="UserContent">
    <xs:sequence>
      <xs:choice>
        <xs:element name="name" type="xs:string"/>
        <xs:sequence>
          <xs:element maxOccurs="unbounded" name="givenName" type="xs:string"/>
          <xs:element name="familyName" type="xs:string"/>
        </xs:sequence>
      </xs:choice>
      <xs:element name="mbox" type="xs:anyURI"/>
    </xs:sequence>
  </xs:complexType>

Existing data

future work

User-oriented views of ShEx schemas?

FHIR MedicationRequest logical table

FHIR MedicationRequest (massaged)

 <MedicationRequest> {
   :status                                @<code>;
   :intent                                @<code>;
   (
     :medicationCodeableConcept           @<CodeableConcept>  |
     :medicationReference                 @<MedicationRef> );
   :subject                               ( @<PatientRef> OR @<GroupRef> );
   :requester {
     :requester.agent                     ( @<PractitionerRef> OR @<PatientRef> );
     :requester.onBehalfOf                @<Ref>? };
   :reasonCode                            @<CodeableConcept>*;
   :reasonReference                       ( { fhir:link @<Condition> } OR { fhir:link @<Observation> } )*;
   :dispenseRequest {                    # backbone element
     :dispenseRequest.numberOfRepeatsAllowed @<positiveInt>?;
     :dispenseRequest.quantity           @<SimpleQuantity>?;
     :dispenseRequest.performer          @<Reference>? };
}

Join the ShEx Community!

W3C Shape Expressions Community Group