Provenance and Confidence for NIF annotations¶
NIF 2.1 introduces additional vocabulary to express provenance and confidence information for annotations. This section will present two possible approaches to assign provenance and confidence information to annotations:
- a more compact representation using previously introduced or provided companion properties
- a simpler, but more verbose representation using the generic
nif:provenance
andnif:confidence
properties
For a running example, assume the following short example sentence:
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/nif-21/acq#> .
ex:doc_offset_0_52
a nif:OffsetBasedString, nif:Context ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "52"^^xsd:nonNegativeInteger ;
nif:anchorOf "Apple acquired Metaio, an Augmented Reality company."^^xsd:string .
Provenance and Confidence using Companion Properties¶
Entity Spotting and Linking¶
Sending this NIF document to an entity spotting and linking service, e.g. possibly a future revision of the FREME e-Entity DBpeida Spotlight Service could yield and RDF result similar to:
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpo: <http://dbpedia.org/ontology/> .
@prefix nerd: <http://nerd.eurecom.fr/ontology#> .
@prefix freme-api: <http://api.freme-project.eu/example/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/nif-21/acq#> .
[] a owl:Ontology ;
a nif:OffsetBasedString, nif:Context ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "52"^^xsd:nonNegativeInteger ;
nif:isString "Apple acquired Metaio, an Augmented Reality company."^^xsd:string .
ex:doc_offset_0_5
a nif:OffsetBasedString ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "5"^^xsd:nonNegativeInteger ;
nif:anchorOf "Apple"^^xsd:string ;
nif:referenceContext ex:doc_offset_0_52 ;
# entity spotting information
a nif:EntityOccurrence ;
nif:entityOccurrenceConf "0.7"^^xsd:decimal ;
nif:entityOccurrenceProv freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
# primary entity linking result
itsrdf:taIdentRef dbp:Apple_Inc ;
nif:taIdentConf "0.8"^^xsd:decimal ;
nif:taIdentProv freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
itsrdf:taClassRef nerd:Organization ;
nif:taClassConf "0.95"^^xsd:decimal ;
nif:taClassProv freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
# alternative, less probable entity linking result
nif:annotationUnit [
itsrdf:taIdentRef dbp:Apple_Bank_for_Savings ;
nif:taIdentConf "0.3"^^xsd:decimal ;
nif:taIdentProv freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
itsrdf:taClassRef dbpo:Bank ;
nif:taClassConf "0.9"^^xsd:decimal ;
nif:taClassProv freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
] .
ex:doc_offset_15_21
a nif:OffsetBasedString ;
nif:beginIndex "15"^^xsd:nonNegativeInteger ;
nif:endIndex "21"^^xsd:nonNegativeInteger ;
nif:anchorOf "Metaio"^^xsd:string ;
nif:referenceContext ex:doc_offset_0_52 ;
# entity spotting information
a nif:EntityOccurrence ;
nif:entityOccurrenceConf "0.95"^^xsd:decimal ;
nif:entityOccurrenceProv freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
# entity linking result
itsrdf:taIdentRef dbp:Metaio_GmbH ;
nif:taIdentConf "0.9"^^xsd:decimal ;
nif:taIdentProv freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
itsrdf:taClassRef dbpo:Company ;
nif:taClassConf "0.85"^^xsd:decimal ;
nif:taClassProv freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments .
Note
The current implementations of FREME services do not produce data as described here, thus also
example
is used instead of an acutally valid API version number. The discussed RDF data
should rather be interpreted as suggestion/basis for discussion.
The service introduced two NIF substring resources that were spotted as potential named entities. Each substing resource carries several pieces of annotation information:
- spotting information
- The mere fact that a certain substring has been identified as a (likely) reference
to a named entity. This expressed in NIF 2.1 by assigning the
nif:EnitityOccurrence
class to the substring resource. - entity linking information
- (Candidate) references to Linked Data identifiers for mentioned named
entities or classification or referenced enities into one or several
categories. For referencing, the
itsrdf:taIdentRef
property from ITSRDF is used.
Accroding to the companion properties approach, for each of the used annotating properties and
nif:TextSpanAnnotation
subclasses a pair of specific and related subproperties of
nif:provenance
and nif:confidence
were introduced:
annotation property/class | provenance property | confidence property |
---|---|---|
nif:EntityOccurrence | nif:entityOccurrenceProv | nif:entityOccurrenceConf |
itsrdf:taIdentRef | nif:taIdentProv | nif:taIdentConf |
Provenance properties reference either prov:Agent
or prov:Activity
resources providing
details on either just the annotator (be it man or machine) or also additionally on the annotation
process. An outline for an agent description for our example:
@prefix doap: <http://usefulinc.com/ns/doap#>
@prefix prov: <http://www.w3.org/ns/prov#>
@prefix freme: <http://freme-project.eu/example/>
@prefix freme-api: <http://api.freme-project.eu/example>
freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments
a prov:SoftwareAgent, doap:Version ;
doap:shortdesc "NIF REST API for entity recognition and linking us ing DBPedia Spotlight engine" ;
doap:revision "0.x (example)" .
# [...]
freme:description#project
a doap:Project ;
doap:release freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
doap:vendor freme:description#consortium .
# [...]
Confidence properties provide a numeric measure for the degree of certainty of annotating agent when assigning the annotation as a rational number between 0 and 1.
Obviously only one property assertion for a specific companion property can be made for the same
nif:String
resource without causing ambiguity. Thus, whenever several alternative annotations on
the same aspect are to be expressed, additional nif:AnnotationUnit
resources can be created and
linked to the nif:String
annotated resource. In the example, such an nif:AnnotationUnit
is
used for the alternative, less probable entity link for string ex:doc_offset_0_5
to
dbp:Apple_Bank_for_Savings
. The same nif:AnnotationUnit
resource can be (re-)used to host
multiple annotation statements with provenance and confidence via companion properties, as long as
unequivocalness is ensured.
To ensure possibilities to validate such unambiguity and to ensure that provenance and confidence information using companion properties is completely machine actionable, explicit links between companion properties and their corresponding annotating vocabulary items, as in this excerpt of the current NIF 2.1 Core ontology draft:
nif:EntityOccurrence nif:confidenceProperty nif:entityOccurrenceConf ;
nif:provenanceProperty nif:entityOccurrenceProv .
nif:TermOccurrence nif:confidenceProperty nif:termOccurrenceConf ;
nif:provenanceProperty nif:termOccurrenceProv .
itsrdf:taIdentRef nif:confidenceProperty nif:taIdentConf ;
nif:provenanceProperty nif:taIdentProv .
itsrdf:taClassRef nif:confidenceProperty nif:taClassConf ;
nif:provenanceProperty nif:taClassProv .
Terminology Annotation¶
In a similar way as presented for named entities information about term recognition and referencing can be provided, by APIs like the FREME e-Terminology Service a possible addtion to the result listing in the result listing of the previous section could be:
a nif:OffsetBasedString ;
nif:beginIndex "26"^^xsd:nonNegativeInteger ;
nif:endIndex "43"^^xsd:nonNegativeInteger ;
nif:anchorOf "Augmented Reality"^^xsd:string ;
nif:referenceContext ex:doc_offset_0_52 ;
# entity spotting information
a nif:TermOccurrence ;
nif:termOccurrenceConf "0.7"^^xsd:decimal ;
nif:termOccurrenceProv freme-api:description%2Fe-terminology%2Ftilde ;
# term linking result
itsrdf:termInfoRef <https://term.tilde.com/terms/998795> ;
nif:termInfoConf "0.65"^^xsd:decimal ;
nif:termInfoProv freme-api:description%2Fe-terminology%2Ftilde .
Relation of NIF 2.1. companion properties to ITSRDF properties¶
itsrdf:taConfidence
is very similar to both nif-ann:taIdentConf
and nif-ann:taClassConf
,
but is specified to provide a common confidence value for both the link to a concrete entity reference
and an entity type associated with this entity. Since NIF 2.1 also wanted to be able to express e.g.
output of general entity spotters that also assign classes to spotted entities, but are unable to
conclusively disambiguate them [1], NIF introduced it’s own specialised properties.
In cases when NIF 2.1 is actually to be used to describe term linking output from tools in line with
the ITS premises, itsrdf:taConfidence
and the corresponding itsrdf:taAnnotatorRef
can be use
alternatively to the NIF 2.1 companion properties. The provenance reference for itsrdf:taAnnotatorRef
still should be either prov:Agent
or prov:Activity
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix nif-ann: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-annotation#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpo: <http://dbpedia.org/ontology/> .
@prefix nerd: <http://nerd.eurecom.fr/ontology#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/nif-21/dresden#> .
ex:doc_offset_0_7
a nif:OffsetBasedString ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "7"^^xsd:nonNegativeInteger ;
nif:anchorOf "Dresden"^^xsd:string ;
nif:referenceContext ex:doc_offset_0_62 ;
a nif-ann:EntityOccurrence ;
nif-ann:entityOccurrenceConf "0.88"^^xsd:decimal ;
nif-ann:entityOccurrenceProv ex:simple-spotter-service ;
nif-ann:annotationUnit [
itsrdf:taIdentRef dbp:Dresden ;
itsrdf:taClassRef dbo:City ;
itsrdf:taConfidence "0.80"^^xsd:decimal ;
itsrdf:taAnnotatorRef ex:linker-service
] .
Using Generic Provenance and Confidence Properties¶
Usage of compation properties allows to offer a default value for each annotation aspect and allows
two reduce the number of nif:AnnotationUnit
resources that must be synthesized to prevent
ambiguities. However, they also increase technical complexity for consumption of provenance and
confidence information. Using exclusively the generic nif:provenance
and nif:confidence
properties directly simplifies generation and consumption of this information, at the cost of
additional RDF resources required to express equivalent data. Using only these generic properties to
express the same annotation as discussed in Provenance and Confidence using Companion Properties:
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpo: <http://dbpedia.org/ontology/> .
@prefix nerd: <http://nerd.eurecom.fr/ontology#> .
@prefix freme-api: <http://api.freme-project.eu/example/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/nif-21/acq#> .
[] a owl:Ontology ;
a nif:OffsetBasedString, nif:Context ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "52"^^xsd:nonNegativeInteger ;
nif:isString "Apple acquired Metaio, an Augmented Reality company."^^xsd:string .
ex:doc_offset_0_5
a nif:OffsetBasedString ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "5"^^xsd:nonNegativeInteger ;
nif:anchorOf "Apple"^^xsd:string ;
nif:referenceContext ex:doc_offset_0_52 ;
# entity spotting information
nif:annotationUnit [
a nif:EntityOccurrence ;
nif:confidence "0.7"^^xsd:decimal ;
nif:provenance freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
] ;
# primary entity linking result
nif:annotationUnit [
itsrdf:taIdentRef dbp:Apple_Inc ;
nif:confidence "0.8"^^xsd:decimal ;
nif:provenance freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
] ;
nif:annotationUnit [
itsrdf:taClassRef nerd:Organization ;
nif:confidence "0.95"^^xsd:decimal ;
nif:provenance freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
] ;
# alternative, less probable entity linking result
nif:annotationUnit [
itsrdf:taIdentRef dbp:Apple_Bank_for_Savings ;
nif:confidence "0.3"^^xsd:decimal ;
nif:provenance freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments
] ;
nif:annotationUnit [
itsrdf:taClassRef dbpo:Bank ;
nif:confidence "0.9"^^xsd:decimal ;
nif:provenance freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments
] .
ex:doc_offset_15_21
a nif:OffsetBasedString ;
nif:beginIndex "15"^^xsd:nonNegativeInteger ;
nif:endIndex "21"^^xsd:nonNegativeInteger ;
nif:anchorOf "Metaio"^^xsd:string ;
nif:referenceContext ex:doc_offset_0_52 ;
# entity spotting information
a nif:EntityOccurrence ;
nif:confidence "0.95"^^xsd:decimal ;
nif:provenance freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
# entity linking result
nif:annotationUnit [
itsrdf:taIdentRef dbp:Metaio_GmbH ;
nif:confidence "0.9"^^xsd:decimal ;
nif:provenance freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
] ;
nif:annotationUnit [
itsrdf:taClassRef dbpo:Company ;
nif:confidence "0.85"^^xsd:decimal ;
nif:provenance freme-api:description%2Fe-entity%2Fdbpedia-spotlight%2Fdocuments ;
] .
ex:doc_offset_26_43
a nif:OffsetBasedString ;
nif:beginIndex "26"^^xsd:nonNegativeInteger ;
nif:endIndex "43"^^xsd:nonNegativeInteger ;
nif:anchorOf "Augmented Reality"^^xsd:string ;
nif:referenceContext ex:doc_offset_0_52 ;
# entity spotting information
a nif:TermOccurrence ;
nif:confidence "0.7"^^xsd:decimal ;
nif:provenance freme-api:description%2Fe-terminology%2Ftilde ;
# term linking result
nif:annotationUnit [
itsrdf:termInfoRef <https://term.tilde.com/terms/998795> ;
nif:confidence "0.65"^^xsd:decimal ;
nif:provenance freme-api:description%2Fe-terminology%2Ftilde ;
] .
Note
nif:confidence
and nif:provenance
can only be attatched to nif:AnnotationUnit
instances, not to nif:String
instances directly.
Footnotes
[1] | Think for example for a simple gazeteer-based spotting service. It can easily spot ‘Dresden’ and might contain type data associating occurrences of this string with the category ‘pupulated place’, whilst lacking logic to hazard an informed guess whether it’s the city with that name in Germany, the United Kingdom, the US or Canada. |