Cigs lod docext_kb_20131118

Post on 07-Dec-2014


DESCRIPTION

Can documents be Linked Data? / Kate Byrne, School of Informatics, University of Edinburgh, CIGS LOD Workshop. Presented at Linked Open Data: current practice in libraries and archives (Cataloguing & Indexing Group in Scotland 3rd Linked Open Data Conference), Edinburgh, 18 Nov 2013

Transcript

vision txt2rdf grounding

Can Documents be Linked Data?

Kate Byrne, School of Informatics, University of Edinburgh

CIGS LOD Workshop

18th November 2013


1 The semantic web vision

2 Extracting structured knowledge from free text

3 Respect for authority, or, Why we need ontologies


The semantic web vision

W3C RDF Concepts, 2002 draft

“RDF ... allows anyone to say anything about anything.”

Tim Berners-Lee, 2006

“The day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines, leaving humans to provide the inspiration and intuition.”

Tim Berners-Lee, 2009

“The web as I envisaged it, we have not seen it yet.”


Simple declarative sentences

“In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.”


hobbit − lives in − hole

hole − located in − the ground

hole − does not have − nastiness

hole − has type − hobbit hole

hobbit hole − has characteristic − comfort
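As a minimal sketch, the passage's graph can be held as a list of (subject, predicate, object) tuples and queried by pattern, with None as a wildcard. This assumes the edges as read from the diagram on this slide and uses plain strings where a real system would use URIs; the names HOBBIT_GRAPH and match are illustrative only (Python 3.9+).

```python
from typing import Optional

# (subject, predicate, object); plain strings stand in for URIs here.
Triple = tuple[str, str, str]

# Edges as read from the diagram on this slide (an interpretation, not source data).
HOBBIT_GRAPH: list[Triple] = [
    ("hobbit", "lives in", "hole"),
    ("hole", "located in", "the ground"),
    ("hole", "does not have", "nastiness"),
    ("hole", "has type", "hobbit hole"),
    ("hobbit hole", "has characteristic", "comfort"),
]


def match(graph: list[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples that fit the pattern; None matches anything."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Everything the passage says about the hole:
for triple in match(HOBBIT_GRAPH, s="hole"):
    print(triple)
```

Each declarative clause becomes one tuple, so questions about the text turn into lookups over the tuples rather than re-reading the prose.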


A lot of information is in textual form!


Nouns and verbs

subject − predicate − object


nouns → subject and object (the nodes)

verbs → predicate (the edges)
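A toy illustration of the nouns-and-verbs idea: nouns become nodes and the verb and preposition words between two nouns become the edge label. This is a sketch only; the tokens are tagged by hand (tag names borrowed from the Universal POS tag set), the sentence is a simplified paraphrase of the opening line, and svo_triples is a hypothetical helper, not part of any real tagger.

```python
# Toy nouns-and-verbs extractor: nouns become nodes (subject/object) and the
# VERB/ADP words between two nouns become the edge (predicate).
# Tags are supplied by hand; a real pipeline would get them from a POS tagger.

def svo_triples(tagged):
    """Link each noun to the next noun, using the VERB/ADP words in between."""
    triples = []
    subject = None
    predicate_words = []
    for word, tag in tagged:
        if tag == "NOUN":
            if subject is not None and predicate_words:
                triples.append((subject, " ".join(predicate_words), word))
            subject = word
            predicate_words = []
        elif subject is not None and tag in ("VERB", "ADP"):
            predicate_words.append(word)
        # other tags (e.g. DET) are ignored
    return triples


# Simplified paraphrase of the opening sentence, hand-tagged.
sentence = [
    ("hobbit", "NOUN"), ("lives", "VERB"), ("in", "ADP"), ("a", "DET"),
    ("hole", "NOUN"), ("located", "VERB"), ("in", "ADP"), ("the", "DET"),
    ("ground", "NOUN"),
]
print(svo_triples(sentence))
# [('hobbit', 'lives in', 'hole'), ('hole', 'located in', 'ground')]
```

A pattern this naive misses negation ("Not a nasty ... hole") and anything more complex than a noun, verb, noun chain.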


1 The semantic web vision

2 Extracting structured knowledge from free text

3 Respect for authority, or, Why we need ontologies


Extracting structured knowledge from free text

fancy NLP processing and RDFisation


Natural Language Processing pipeline

Text documents

Pre-processing: sentence and paragraph split → tokenise → POS tag → multi-word tokens and features

Named Entity Recognition: trained NER model → list of NEs and classes

Relation Extraction: set of NE pairs and features → trained RE model → list of relations and classes

RDF translation: remove unwanted relations → generate triples → attach site ids → graph of triples
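The stages on this slide can be sketched end to end in a few functions. This is a toy under loudly stated assumptions, not the system described in the talk: a hypothetical gazetteer (GAZETTEER) stands in for the trained NER model, hypothetical type-pair rules (RELATION_RULES) stand in for the trained RE model, POS tagging is skipped, and multi-word tokens are grouped during gazetteer lookup.

```python
import re
from itertools import combinations

# Hypothetical gazetteer: a stand-in for the trained NER model.
GAZETTEER = {
    "sands of breckon": "SITENAME",
    "excavation": "EVENT",
    "1982": "DATE",
}

# Hypothetical type-pair rules: a stand-in for the trained RE model.
RELATION_RULES = {
    ("EVENT", "SITENAME"): "hasLocation",
    ("EVENT", "DATE"): "hasDate",
}


def preprocess(text):
    """Sentence split and tokenise with simple regexes."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.findall(r"\w+", s.lower()) for s in sentences if s]


def recognise_entities(tokens):
    """Greedy longest match of (possibly multi-word) gazetteer names."""
    max_len = max(len(name.split()) for name in GAZETTEER)
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in GAZETTEER:
                entities.append((phrase, GAZETTEER[phrase]))
                i += n
                break
        else:
            i += 1
    return entities


def extract_relations(entities):
    """Label every entity pair in a sentence; 'none' when no rule applies."""
    relations = []
    for (a, type_a), (b, type_b) in combinations(entities, 2):
        label = RELATION_RULES.get((type_a, type_b), "none")
        relations.append(((a, type_a), label, (b, type_b)))
    return relations


def to_triples(relations, site_id):
    """Remove unwanted relations, generate triples, and attach the site id."""
    triples = []
    for (a, type_a), label, (b, _) in relations:
        if label == "none":
            continue
        if type_a == "EVENT" and (site_id, "hasEvent", a) not in triples:
            triples.append((site_id, "hasEvent", a))
        triples.append((a, label, b))
    return triples


def pipeline(text, site_id):
    triples = []
    for tokens in preprocess(text):
        entities = recognise_entities(tokens)
        triples.extend(to_triples(extract_relations(entities), site_id))
    return triples


record = ("Field survey and excavation, as a response to continual wind and marine "
          "erosion, was carried out at the Sands of Breckon between 1982 and 1983.")
for t in pipeline(record, "site20"):
    print(t)
```

On this sentence the sketch yields (site20, hasEvent, excavation), (excavation, hasLocation, sands of breckon) and (excavation, hasDate, 1982); the pair (sands of breckon, 1982) gets no rule and is dropped as unwanted.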


Named entities and relations

Evidence of a quartz knapping site was found within the confines of the stone circle, and in conjunction with several structures within the inner ring, strongly suggests a domestic site. Besides the quartz implements and corresponding waste, several other artifacts of local origin occurred including a split pebble axe of greenstone with Shetland Early Bronze Age affinities. B Beveridge, 1972.

Field survey and excavation, as a response to continual wind and marine erosion, was carried out at the Sands of Breckon between 1982 and 1983. HP50NW 11.00 was recorded as a stone settings surrounded by occupational debris (Site 22). Excavation revealed midden deposits of an early Iron Age date and a surface scatter of artefacts of mixed dates. The stone settings were tentatively interpreted as the basal stones of long cists. Historic Scotland Archive Project (SW) 2002.

site 20


Converting text relations to RDF – 1

site 20

site20 − hasEvent − excavationX

excavationX − hasLocation − SandsOfBreckon

excavationX − hasDate − 1982
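One way to write these three relations as RDF is N-Triples: one subject, predicate, object statement per line, each term in angle brackets (or a quoted literal) and ending in a full stop. The sketch assumes a made-up base URI (http://example.org/) and treats all-digit objects as xsd:gYear literals; that is only one possible modelling choice, and the next slide models the date as a node instead.

```python
# Hypothetical namespace: the slides do not give the real base URI.
BASE = "http://example.org/"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"

relations = [
    ("site20", "hasEvent", "excavationX"),
    ("excavationX", "hasLocation", "SandsOfBreckon"),
    ("excavationX", "hasDate", "1982"),
]


def to_ntriples(relations):
    """Serialise (subject, predicate, object) strings as N-Triples lines.

    Objects made only of digits become xsd:gYear literals (an assumption);
    everything else becomes an IRI under BASE.
    """
    lines = []
    for s, p, o in relations:
        obj = f'"{o}"^^<{XSD_GYEAR}>' if o.isdigit() else f"<{BASE}{o}>"
        lines.append(f"<{BASE}{s}> <{BASE}{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(relations))
```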


Converting text relations to RDF – 2

[Figure: the three text relations as an RDF graph.]

Nodes: siteid:site20, sitetype:stone+settings20w179, event:excavation20w158, event:excavation, date:1982, sitename:sands+of+breckon, address:hp50nw+11.00, address:breckon

Edges: :hasClassn, :hasEvent, rdf:type, :hasPeriod, :hasLocation (three edges)
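The URIs on this slide look like lower-cased labels with spaces turned into "+", which is what URL-encoding with Python's urllib.parse.quote_plus produces. A sketch under that assumption: the namespace URIs behind each prefix are hypothetical, and the text relation hasDate is mapped to :hasPeriod as the slide shows.

```python
from urllib.parse import quote_plus

# Hypothetical namespaces for the prefixes seen on the slide.
NAMESPACES = {
    "siteid": "http://example.org/siteid/",
    "event": "http://example.org/event/",
    "sitename": "http://example.org/sitename/",
    "date": "http://example.org/date/",
    "address": "http://example.org/address/",
    "": "http://example.org/ontology#",  # the default ':' prefix
}

# Text relation names mapped to ontology predicates (hasDate becomes :hasPeriod).
PREDICATES = {"hasEvent": "hasEvent", "hasLocation": "hasLocation", "hasDate": "hasPeriod"}


def mint(prefix, label):
    """Lower-case and URL-encode a label: 'Sands of Breckon' -> 'sands+of+breckon'."""
    return NAMESPACES[prefix] + quote_plus(label.lower())


def to_rdf(relations):
    """Turn ((prefix, label), relation, (prefix, label)) into full-IRI triples."""
    return [
        (mint(*subj), NAMESPACES[""] + PREDICATES[rel], mint(*obj))
        for subj, rel, obj in relations
    ]


relations = [
    (("siteid", "site20"), "hasEvent", ("event", "excavation20w158")),
    (("event", "excavation20w158"), "hasLocation", ("sitename", "Sands of Breckon")),
    (("event", "excavation20w158"), "hasDate", ("date", "1982")),
]
for triple in to_rdf(relations):
    print(triple)
print(mint("address", "HP50NW 11.00"))  # http://example.org/address/hp50nw+11.00
```

The same rule reproduces the slide's address:hp50nw+11.00, since quote_plus leaves letters, digits and "." unchanged.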


1 The semantic web vision

2 Extracting structured knowledge from free text

3 Respect for authority, or, Why we need ontologies


Let’s remind ourselves: what’s the point of Linked Data?


[Figure: two source records and their RDF graphs.]

Archaeological site archive: siteid 47919; site number NS97SE 16; sitename Cairnpapple; classification Cairn, henge; text "A complex site on the summit of Cairnpapple Hill excavated by Piggot in 1947..."

Site archive graph nodes: :Siteid#site47919, Id#ns97se+16, Classn/Sitetype#cairn%20+henge, :Loc/Sitename#cairnpapple, :Loc/Place#cairnpapple+hill, :Event#excavated47919w10, :Agent/Person#piggot, :Time/Date#1947

Museum database: objectid X.EP 167; find spot Cairnpapple; text "This stone flake from the cutting edge of a ground stone axehead was found at Cairnpapple in West Lothian. The stone is from..."

Museum graph nodes: :Objectid#x.ep+167, :Classn/Objtype#axe+flake, :Loc/Place#west+lothian

Edges across both graphs: :hasId, :hasClassn, :hasLocation, :hasEvent, :hasAgent, :hasPeriod, :hasFindSpot


But linking Linked Data is actually pretty hard


A direct link means spotting an identical node in a separate graph.

How? String matching? Clues from context?
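String matching is the simplest answer to the question on this slide. A minimal sketch, assuming a node's label can be recovered from the URI fragment after "#": it scores candidate pairs with difflib.SequenceMatcher from the Python standard library. The 0.9 threshold and the helper names are hypothetical.

```python
from difflib import SequenceMatcher
from urllib.parse import unquote_plus

THRESHOLD = 0.9  # hypothetical cut-off for proposing a link


def label_of(node):
    """Recover a comparable label: ':Loc/Place#cairnpapple+hill' -> 'cairnpapple hill'."""
    fragment = node.rsplit("#", 1)[-1]
    return unquote_plus(fragment).strip().lower()


def similarity(a, b):
    """Character-level similarity of two node labels, between 0.0 and 1.0."""
    return SequenceMatcher(None, label_of(a), label_of(b)).ratio()


def candidate_links(nodes, target):
    """Nodes whose label is close enough to the target to propose a link."""
    scored = [(n, similarity(n, target)) for n in nodes]
    return [(n, score) for n, score in scored if score >= THRESHOLD]


# Nodes from the site archive graph, and the museum record's find spot value.
site_nodes = [":Loc/Sitename#cairnpapple", ":Loc/Place#cairnpapple+hill", ":Agent/Person#piggot"]
find_spot = "Cairnpapple"

for node in site_nodes:
    print(f"{similarity(node, find_spot):.2f}  {node}")
print(candidate_links(site_nodes, find_spot))
```

The scores come out as 1.00, 0.81 and 0.12. The hill scores close to the threshold even though it names the hill rather than the site, which is why string matching alone needs clues from context.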


Using LOD cloud “Authority Nodes” as intermediaries


Grounding local URIs against “authority” nodes is the next big challenge!
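The intermediary idea can be sketched as a grouping step: each dataset grounds only its own local nodes against an authority, and any authority node shared by two local nodes links them. Everything named here is hypothetical: the "archive:" and "museum:" prefixes, the museum find-spot node, and the placeholder authority URIs, which are not real gazetteer identifiers.

```python
from collections import defaultdict

# Hypothetical groundings: each dataset links its own local nodes to an authority.
# The authority URIs are placeholders, not real gazetteer identifiers.
GROUNDINGS = {
    "archive:Loc/Sitename#cairnpapple": "http://authority.example/place/cairnpapple",
    "archive:Loc/Place#cairnpapple+hill": "http://authority.example/place/cairnpapple-hill",
    "museum:FindSpot#cairnpapple": "http://authority.example/place/cairnpapple",
    "museum:Loc/Place#west+lothian": "http://authority.example/place/west-lothian",
}


def links_via_authority(groundings):
    """Group local nodes by the authority node they are grounded against.

    Any authority shared by two or more local nodes links those nodes,
    without the datasets ever being compared with each other directly.
    """
    by_authority = defaultdict(list)
    for local, authority in groundings.items():
        by_authority[authority].append(local)
    return {a: sorted(nodes) for a, nodes in by_authority.items() if len(nodes) > 1}


for authority, nodes in links_via_authority(GROUNDINGS).items():
    print(authority, "links", nodes)
```

With many datasets grounded against one authority, each needs one set of links to the authority rather than a separate set of links to every other dataset.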


Grounding site20 against Monument Thesaurus

[Figure: the site 20 graph, with its local site type node linked to the Monument Thesaurus.]

Local nodes: siteid:site20, address:breckon, address:hp50nw+11.01+hp+5304+0519, sitename:sands+of+breckon, date:1982, event:excavation, event:excavation20w158, sitetype:stone+settings20w179

Thesaurus concept linked to sitetype:stone+settings20w179: sitetype:stone+setting, with rdfs:label "stone setting" and skos:scopeNote "An arrangement of two or more standing stones"

Other thesaurus nodes: sitetype:religious+ritual+and+funerary, sitetype:standing+stone, sitetype:stone+circle, sitetype:stone+row, sitetype:

Edges: :hasClassn, :hasEvent, :hasPeriod, :hasLocation, rdf:type, rdfs:label, skos:scopeNote, skos:broader, skos:related, rdfs:subClassOf
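Grounding the local node sitetype:stone+settings20w179 against the thesaurus means finding the concept whose rdfs:label matches it. A minimal sketch, assuming the trailing "20w179" is an instance suffix that can be dropped and that a crude plural-stripping rule is enough; LABELS holds only the concepts named on this slide, and the normalisation rules and helper names are hypothetical.

```python
import re
from urllib.parse import unquote_plus

# Labels (rdfs:label) of thesaurus concepts named on the slide, mapped to their
# prefixed names. Only these few concepts are included; this is not the real thesaurus.
LABELS = {
    "stone setting": "sitetype:stone+setting",
    "standing stone": "sitetype:standing+stone",
    "stone circle": "sitetype:stone+circle",
    "stone row": "sitetype:stone+row",
}


def normalise(local_name):
    """'stone+settings20w179' -> 'stone setting' (hypothetical normalisation rules)."""
    text = unquote_plus(local_name)          # '+' back to spaces
    text = re.sub(r"\d+w\d+$", "", text)     # assumed instance suffix, e.g. '20w179'
    words = text.lower().split()
    # Crude singularisation of the last word only.
    if words and words[-1].endswith("s") and not words[-1].endswith("ss"):
        words[-1] = words[-1][:-1]
    return " ".join(words)


def ground(local_uri):
    """Return the thesaurus concept whose label matches the local node, or None."""
    local_name = local_uri.split(":", 1)[-1]
    return LABELS.get(normalise(local_name))


print(ground("sitetype:stone+settings20w179"))  # sitetype:stone+setting
```

Once grounded, the local node can draw on everything the thesaurus says about the concept, such as its scope note and its broader and related terms.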


Grounding against various authorities/ontologies

Placename authorities: Geonames, OS gazetteer, Pleiades

Period: EH draft ontology

Monument classifications: Seneschal project

Bibliographic: LCSH, FRBR

...hundreds of LOD datasets in the cloud

Informatics projects

Edina “Unlock” service – spatial and temporal grounding

GAP projects – grounding against maps of the ancient world


Unlock Text – find placenames and plot on map

http://unlock.edina.ac.uk/


GapVis interface

http://nrabinowitz.github.com/gapvis/

Questions?
