Cigs lod docext_kb_20131118
Post on 07-Dec-2014
vision txt2rdf grounding
Can Documents be Linked Data?
Kate Byrne, School of Informatics, University of Edinburgh
CIGS LOD Workshop
18th November 2013
1 The semantic web vision
2 Extracting structured knowledge from free text
3 Respect for authority, or, Why we need ontologies
The semantic web vision
W3C RDF Concepts, 2002 draft
“RDF ... allows anyone to say anything about anything.”
Tim Berners-Lee, 2006
“The day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines, leaving humans to provide the inspiration and intuition.”
Tim Berners-Lee, 2009
“The web as I envisaged it, we have not seen it yet.”
Simple declarative sentences
“In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.”
hobbit − lives in − hole
hole − located in − the ground
hole − does not have − nastiness
hole − has type − hobbit hole
hobbit hole − has characteristic − comfort
A lot of information is in textual form!
Nouns and verbs
subject − predicate − object
hobbit − lives in − hole
hole − located in − the ground
hole − does not have − nastiness
hole − has type − hobbit hole
hobbit hole − has characteristic − comfort
nouns (subjects and objects): hobbit, hole, the ground, nastiness, hobbit hole, comfort
verbs (predicates): lives in, located in, does not have, has type, has characteristic
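One way to make the subject − predicate − object pattern concrete is to hold each statement as a three-element tuple. The Python sketch below is illustrative only: `Triple` and `nouns_and_verbs` are hypothetical names, and the pairing of edges to nodes is an assumption read off the hobbit diagram.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject and object are nodes, predicate is the edge."""
    subject: str
    predicate: str
    object: str


# Triples read off the hobbit passage (hand-written; edge pairing is assumed).
HOBBIT_TRIPLES = [
    Triple("hobbit", "lives in", "hole"),
    Triple("hole", "located in", "the ground"),
    Triple("hole", "does not have", "nastiness"),
    Triple("hole", "has type", "hobbit hole"),
    Triple("hobbit hole", "has characteristic", "comfort"),
]


def nouns_and_verbs(triples):
    """Split triples into node labels (nouns) and edge labels (verbs)."""
    nouns = set()
    verbs = set()
    for t in triples:
        nouns.update((t.subject, t.object))
        verbs.add(t.predicate)
    return nouns, verbs


nouns, verbs = nouns_and_verbs(HOBBIT_TRIPLES)
print(sorted(nouns))
print(sorted(verbs))
```

Because "hole" is the object of one triple and the subject of others, the separate statements join up into a graph.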
1 The semantic web vision
2 Extracting structured knowledge from free text
3 Respect for authority, or, Why we need ontologies
Extracting structured knowledge from free text
fancy NLP processing and RDFisation
Natural Language Processing pipeline
Text documents
Pre-processing: sentence and paragraph split, tokenise, POS tag, multi-word tokens and features
Named Entity Recognition: trained NER model, giving a list of NEs and classes
Relation Extraction: set of NE pairs and features, trained RE model, giving a list of relations and classes
RDF translation: remove unwanted relations, generate triples, attach site ids
Graph of triples
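The flow of the pipeline can be sketched in a few lines of Python. This is a toy under loud assumptions, not the system described in the talk: a small hand-made gazetteer stands in for the trained NER model, a same-sentence rule stands in for the trained RE model, the "remove unwanted relations" step is omitted, and all function names are hypothetical. The sample text is taken from the site 20 record.

```python
import re

# Hypothetical gazetteer standing in for a trained NER model.
GAZETTEER = {
    "sands of breckon": "SITENAME",
    "excavation": "EVENT",
    "1982": "DATE",
}


def split_sentences(text):
    """Pre-processing: naive split on a full stop followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]


def recognise_entities(sentence):
    """NER stand-in: return (text, class) pairs for gazetteer phrases found."""
    lowered = sentence.lower()
    return [(phrase, cls) for phrase, cls in GAZETTEER.items() if phrase in lowered]


def extract_relations(entities):
    """RE stand-in: relate an EVENT to places and dates in the same sentence."""
    predicate_for = {"SITENAME": "hasLocation", "DATE": "hasDate"}
    relations = []
    for ev_text, ev_cls in entities:
        if ev_cls != "EVENT":
            continue
        for text, cls in entities:
            if cls in predicate_for:
                relations.append((ev_text, predicate_for[cls], text))
    return relations


def text_to_triples(text, site_id):
    """Run the stages in order and attach the site id to each related event."""
    triples = []
    for sentence in split_sentences(text):
        relations = extract_relations(recognise_entities(sentence))
        events = sorted({subj for subj, _, _ in relations})
        triples.extend((site_id, "hasEvent", ev) for ev in events)
        triples.extend(relations)
    return triples


sample = (
    "Field survey and excavation, as a response to continual wind and marine "
    "erosion, was carried out at the Sands of Breckon between 1982 and 1983. "
    "Excavation revealed midden deposits of an early Iron Age date."
)
for triple in text_to_triples(sample, "site20"):
    print(triple)
```

The second sentence mentions an excavation but no place or date, so it contributes no triples in this sketch.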
Named entities and relations
site 20
Evidence of a quartz knapping site was found within the confines of the stone circle, and in conjunction with several structures within the inner ring, strongly suggests a domestic site. Besides the quartz implements and corresponding waste, several other artifacts of local origin occurred including a split pebble axe of greenstone with Shetland Early Bronze Age affinities. B Beveridge, 1972.
Field survey and excavation, as a response to continual wind and marine erosion, was carried out at the Sands of Breckon between 1982 and 1983. HP50NW 11.00 was recorded as a stone settings surrounded by occupational debris (Site 22). Excavation revealed midden deposits of an early Iron Age date and a surface scatter of artefacts of mixed dates. The stone settings were tentatively interpreted as the basal stones of long cists. Historic Scotland Archive Project (SW) 2002.
Converting text relations to RDF – 1
site 20
site20 − hasEvent − excavationX
excavationX − hasLocation − SandsOfBreckon
excavationX − hasDate − 1982
Converting text relations to RDF – 2
site20 − hasEvent − excavationX
excavationX − hasLocation − SandsOfBreckon
excavationX − hasDate − 1982
Graph nodes: siteid:site20, sitename:sands+of+breckon, address:hp50nw+11.00, address:breckon, sitetype:stone+settings20w179, event:excavation20w158, event:excavation, date:1982
Graph predicates: :hasEvent, :hasLocation, :hasPeriod, :hasClassn, rdf:type
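The step from text relations to RDF can be sketched in Python as follows. This is an illustration, not the talk's implementation. The `http://example.org/` base namespace, the `rel` prefix and the helper names are assumptions. Only the local names (`siteid:site20`, `event:excavation20w158`, `sitename:sands+of+breckon`, `date:1982`) follow the slide. Lower-casing plus `urllib.parse.quote_plus` is used here to mimic the `+`-for-space style of those identifiers.

```python
from urllib.parse import quote_plus

# Hypothetical base namespace; the slides do not give the real one.
BASE = "http://example.org/"


def local_name(prefix, text):
    """Mint a prefixed local name such as sitename:sands+of+breckon."""
    return f"{prefix}:{quote_plus(text.lower())}"


def to_uri(prefixed):
    """Expand prefix:local into a full URI under the assumed base namespace."""
    prefix, local = prefixed.split(":", 1)
    return f"<{BASE}{prefix}/{local}>"


def to_ntriples(triples):
    """Serialise (subject, predicate, object) prefixed names as N-Triples lines."""
    return "\n".join(
        f"{to_uri(s)} {to_uri(p)} {to_uri(o)} ." for s, p, o in triples
    )


site = local_name("siteid", "site20")
event = "event:excavation20w158"  # event identifier copied from the slide
triples = [
    (site, "rel:hasEvent", event),
    (event, "rel:hasLocation", local_name("sitename", "Sands of Breckon")),
    (event, "rel:hasDate", local_name("date", "1982")),
]
print(to_ntriples(triples))
```

Minting a distinct node for the event (`excavation20w158`) rather than reusing a generic `excavation` node keeps this particular excavation's place and date from being attached to every excavation in the graph.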
1 The semantic web vision
2 Extracting structured knowledge from free text
3 Respect for authority, or, Why we need ontologies
Let’s remind ourselves what’s the point of Linked Data
archaeological site archive
siteid: 47919
site number: NS97SE 16
sitename: Cairnpapple
classification: Cairn, henge
A complex site on the summit of Cairnpapple Hill
excavated by Piggot in 1947...
museum database
objectid: X.EP 167
find spot: Cairnpapple
This stone flake from the cutting edge of a ground stone axehead was found at Cairnpapple in West Lothian. The stone is from...
Graph nodes: :Siteid#site47919, Id#ns97se+16, :Loc/Sitename#cairnpapple, Classn/Sitetype#cairn%20+henge, :Loc/Place#cairnpapple+hill, :Event#excavated47919w10, :Agent/Person#piggot, :Time/Date#1947, :Objectid#x.ep+167, :Classn/Objtype#axe+flake, :Loc/Place#west+lothian
Graph predicates: :hasId, :hasClassn, :hasLocation, :hasEvent, :hasAgent, :hasPeriod, :hasFindSpot
But linking Linked Data is actually pretty hard
Direct link means spotting identical node in separate graph
How? String matching? Clues from context?
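A rough Python sketch of the string-matching option, using the standard library's `difflib.SequenceMatcher`. The helper names and the 0.8 threshold are arbitrary assumptions. The place labels come from the two Cairnpapple records, and assigning "Cairnpapple Hill" to the site archive and "West Lothian" to the museum record is a reading of the slide.

```python
from difflib import SequenceMatcher


def normalise(label):
    """Lower-case and collapse whitespace so trivial differences vanish."""
    return " ".join(label.lower().split())


def similarity(a, b):
    """Ratio in [0, 1]; 1.0 means identical after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_links(left, right, threshold=0.8):
    """Return (left, right, score) pairs whose similarity reaches the threshold."""
    pairs = []
    for a in left:
        for b in right:
            score = similarity(a, b)
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: -p[2])


# Place labels from the site archive record and the museum record.
site_archive_places = ["Cairnpapple", "Cairnpapple Hill"]
museum_places = ["Cairnpapple", "West Lothian"]

links = candidate_links(site_archive_places, museum_places)
for link in links:
    print(link)
```

The exact match is found, but "Cairnpapple Hill" also clears the threshold against "Cairnpapple". Whether the hill and the site should count as the same node is the kind of question that string matching alone cannot settle.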
Using LOD cloud “Authority Nodes” as intermediaries
Grounding local URIs against "authority" nodes is the next big challenge!
Grounding site20 against Monument Thesaurus
Local graph nodes: siteid:site20, sitename:sands+of+breckon, address:breckon, address:hp50nw+11.01+hp+5304+0519, event:excavation20w158, event:excavation, date:1982, sitetype:stone+settings20w179
Monument Thesaurus nodes: sitetype:stone+setting, sitetype:religious+ritual+and+funerary, sitetype:standing+stone, sitetype:stone+circle, sitetype:stone+row, sitetype:
Literals: "stone setting", "An arrangement of two or more standing stones"
Predicates: :hasEvent, :hasLocation, :hasPeriod, :hasClassn, rdf:type, rdfs:label, skos:scopeNote, skos:broader, skos:related, rdfs:subClassOf
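Grounding a local site type against the thesaurus might look like the Python sketch below. It is a guess at the idea, not the method used in the talk. The lookup table is a toy: only the "stone setting" label appears on the slide, and the other labels are assumed from their URIs. The plural stripping is deliberately crude, and using rdf:type as the linking predicate is an assumption.

```python
# Toy slice of the thesaurus: concept URI -> preferred label.
THESAURUS_LABELS = {
    "sitetype:stone+setting": "stone setting",    # label shown on the slide
    "sitetype:standing+stone": "standing stone",  # assumed from the URI
    "sitetype:stone+circle": "stone circle",      # assumed from the URI
    "sitetype:stone+row": "stone row",            # assumed from the URI
}


def singular(word):
    """Crude plural stripping; a real system would use a lemmatiser."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalise(phrase):
    """Lower-case a phrase and strip plural endings word by word."""
    return " ".join(singular(w) for w in phrase.lower().split())


def ground(local_uri, local_label):
    """Return an rdf:type triple linking a local node to a concept, or None."""
    wanted = normalise(local_label)
    for concept, label in THESAURUS_LABELS.items():
        if normalise(label) == wanted:
            return (local_uri, "rdf:type", concept)
    return None


print(ground("sitetype:stone+settings20w179", "stone settings"))
```

Once the local node is tied to `sitetype:stone+setting`, the thesaurus's own label, scope note, and broader and related terms become reachable from site20 without being stored locally.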
Grounding against various authorities/ontologies
Placename authorities: Geonames, OS gazetteer, Pleiades
Period: EH draft ontology
Monument classifications: Seneschal project
Bibliographic: LCSH, FRBR
...hundreds of LOD datasets in the cloud
Informatics projects
Edina “Unlock” service – spatial and temporal grounding
GAP projects – grounding against maps of the ancient world
Unlock Text – find placenames and plot on map
http://unlock.edina.ac.uk/
GapVis interface
http://nrabinowitz.github.com/gapvis/
Questions?