MultimediaN Pilot E-Culture
E-Culture semantic search pilot

Jun 11, 2015

Guus Schreiber

Seminar, Stanford Medical Informatics, August 2006
Page 1: E-Culture semantic search pilot

MultimediaN Pilot E-Culture

Page 2: E-Culture semantic search pilot

Pilot E-Culture

Partners: VU, UvA, CWI, DEN, ICN

Subproject of MultimediaN, a 16 MEuro project on multimedia technology funded by the Dutch government

Aim: demonstrate added value of Semantic Web techniques for virtual heritage collections

Page 3: E-Culture semantic search pilot

Page 4: E-Culture semantic search pilot

Hypothesis

Semantic Web technology is particularly useful in knowledge-rich domains

or formulated differently

If we cannot show added value in knowledge-rich domains, then it may have no value at all

Page 5: E-Culture semantic search pilot

Use case: painting style

Find paintings of a similar style

KLIMT, Gustav
Portrait of Adele Bloch-Bauer I
1907
Oil and gold on canvas
138 x 138 cm
Austrian Gallery, Vienna

Page 6: E-Culture semantic search pilot

How can we find this other ‘Art nouveau’ painting?

MUNCH, Edvard
The Scream
1893
Oil, tempera and pastel on cardboard
91 x 73.5 cm
National Gallery, Oslo

Page 7: E-Culture semantic search pilot

Issues w.r.t. the use case

Parse annotations to find matches with thesaurus terms
– E.g. match artists to ULAN individuals

Artist-style links
– AAT contains styles; ULAN contains artists, but there is no link between them
  • Learn the link from corpora
  • Derive it from other annotations
– Domain-specific rules/reasoning needed
  • See example in SWRL doc
  • Painters may have painted in multiple styles
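The "derive it from other annotations" option can be illustrated with a minimal sketch. It assumes artwork records that already carry both a creator and a style annotation; the record layout, the function name `derive_artist_styles`, the `min_support` cut-off and the `ex:` identifiers are all hypothetical and are not taken from the pilot. Because painters may have painted in multiple styles, the sketch returns a list of styles per artist rather than a single value.

```python
from collections import Counter, defaultdict

def derive_artist_styles(works, min_support=2):
    """Return {artist: [styles]} keeping styles seen on >= min_support works."""
    counts = defaultdict(Counter)
    for work in works:
        counts[work["creator"]][work["style"]] += 1
    return {
        artist: sorted(s for s, n in styles.items() if n >= min_support)
        for artist, styles in counts.items()
    }

# Made-up records, for illustration only.
works = [
    {"creator": "ex:artistA", "style": "ex:styleX"},
    {"creator": "ex:artistA", "style": "ex:styleX"},
    {"creator": "ex:artistA", "style": "ex:styleY"},
    {"creator": "ex:artistA", "style": "ex:styleY"},
    {"creator": "ex:artistA", "style": "ex:styleZ"},
    {"creator": "ex:artistB", "style": "ex:styleX"},
]

artist_styles = derive_artist_styles(works)
```

The support threshold is one possible guard against a single odd annotation creating a spurious artist-style link; a real system would likely need domain-specific rules on top of it.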

Page 8: E-Culture semantic search pilot

Feature overview (diagram; legend: baseline, enhanced features, new features)

DIGITAL HERITAGE COLLECTIONS, semantic annotation, semantic search

Natural-language processing: automatic annotation, text strings to concepts
Distributed: cultuurwijzer.nl collections, OAI-based access
Reasoning support: time/space reasoning
Web interface: support for web collections
Presentation facilities: semantic presentation, device-specific
Interoperability: XML/RDF/OWL
Scalability: > 10,000,000 triples
Ontologies: WordNet, AAT, TGN, ULAN, Dutch labels
Search strategies: sibling search, semantic distance
Dublin Core: specializations, dumb-down

Page 9: E-Culture semantic search pilot

Architecture

Page 10: E-Culture semantic search pilot

Use of thesauri

RDF/OWL data models of Getty thesauri
– Issues: scope, preserving structure

WordNet: W3C SWBPD work
http://www.w3.org/TR/wordnet-rdf/

Multilingualism
– Dutch version of AAT

Existing collection metadata are parsed to find matches in thesauri (e.g. creator name => ULAN entry)
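A minimal sketch of the "creator name => ULAN entry" step, assuming the thesaurus is available as a mapping from entry identifiers to their name labels. The `ex:` identifiers, the function names and the normalization rules (case folding, accent stripping, ignoring the order of name parts) are illustrative assumptions, not the pilot's actual matching procedure.

```python
import unicodedata

def normalize(name):
    """Lowercase, strip accents and commas, and ignore the order of name parts."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    parts = text.replace(",", " ").lower().split()
    return " ".join(sorted(parts))

def build_label_index(thesaurus):
    """Map each normalized label to the set of entry ids that carry it."""
    index = {}
    for entry_id, labels in thesaurus.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(entry_id)
    return index

def match_creator(name, index):
    """Return candidate entry ids for a creator string (possibly none or several)."""
    return sorted(index.get(normalize(name), set()))

# Hypothetical thesaurus excerpt; the ids are placeholders, not real ULAN ids.
thesaurus = {
    "ex:artist-1": ["Klimt, Gustav", "Gustav Klimt"],
    "ex:artist-2": ["Munch, Edvard"],
}
index = build_label_index(thesaurus)
```

With this index, `match_creator("KLIMT, Gustav", index)` and `match_creator("Edvard Munch", index)` each resolve to one candidate. Returning a list keeps ambiguous names visible instead of silently picking one entry.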

Page 11: E-Culture semantic search pilot

Distributed vs. centralized collection data

Minimal requirement: collection object has image URI
Preference for external metadata, accessed through a protocol such as OAI
In practice, external metadata access is still cumbersome
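For orientation, a minimal sketch of what requesting external metadata could look like, assuming "OAI" here refers to OAI-PMH and that the collection exposes unqualified Dublin Core (`oai_dc`). The base URL is a placeholder and the helper name is hypothetical; only the `verb`, `metadataPrefix` and `set` request parameters come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict harvesting to one set
    return base_url + "?" + urlencode(params)

# Placeholder endpoint, not a real collection server.
url = oai_list_records_url("http://example.org/oai", set_spec="paintings")
```

The sketch only builds the request; fetching, parsing the XML response and following resumption tokens are left out.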

Page 12: E-Culture semantic search pilot

Search strategies

Basic search: keyword-oriented

Advanced search:
– Tweaking default search parameters
– Time-related queries

Faceted search

Relation search
– How are two URIs related?
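One simple way to answer "how are two URIs related?" is a shortest-path search over the triple graph. The following is a sketch under that assumption, not the pilot's implementation; the function name, the `max_hops` limit and the `ex:` identifiers are hypothetical. Triples are followed in both directions, and the result is the chain of triples connecting the two resources.

```python
from collections import deque

def find_relation(triples, start, goal, max_hops=4):
    """Return the shortest chain of triples linking start and goal, or None."""
    neighbours = {}
    for s, p, o in triples:
        # Index every triple under both its subject and its object.
        neighbours.setdefault(s, []).append((o, (s, p, o)))
        neighbours.setdefault(o, []).append((s, (s, p, o)))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue  # do not expand beyond the hop limit
        for nxt, triple in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [triple]))
    return None

# Made-up triples, for illustration only.
triples = [
    ("ex:work1", "ex:creator", "ex:artistA"),
    ("ex:work2", "ex:creator", "ex:artistA"),
    ("ex:work3", "ex:creator", "ex:artistB"),
]
relation = find_relation(triples, "ex:work1", "ex:work2")
```

Here `relation` contains the two `ex:creator` triples, i.e. the two works are related through a shared creator, while `ex:work1` and `ex:work3` have no connection in this data.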

Page 13: E-Culture semantic search pilot

Keyword search with semantic clustering

1. Btree of literals plus Porter stem and metaphone index
2. Find resources with matching labels
   • Default resources are “Work”s
3. Find related resources by one-way graph traversal
   • owl:inverseOf is used
   • Threshold used for constraining search
4. Cluster results (group instances)
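The four steps above can be sketched roughly as follows. This is an illustration under several assumptions, not the pilot's code: a plain dictionary stands in for the Btree, `crude_stem` is a toy stand-in for a real Porter stemmer, the metaphone index is omitted, edge weights in (0, 1] are multiplied along a path and compared with the threshold, results are clustered by the predicate path that led to them, and the `edges` map is assumed to already contain inverse-direction edges (the role owl:inverseOf plays on the slide). All names and data are hypothetical.

```python
from collections import defaultdict

def crude_stem(word):
    """Toy stand-in for a Porter stemmer: strip a few common suffixes."""
    for suffix in ("ings", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(labels):
    """Step 1: map each stemmed label token to the resources labelled with it."""
    index = defaultdict(set)
    for resource, label in labels:
        for token in label.lower().split():
            index[crude_stem(token)].add(resource)
    return index

def search(keyword, index, edges, targets, threshold=0.3):
    """Steps 2-4: match labels, traverse one way, cluster targets by path."""
    clusters = defaultdict(set)
    best = {}  # best score at which each intermediate node was expanded
    matches = sorted(index.get(crude_stem(keyword.lower()), ()))
    stack = [(resource, 1.0, ()) for resource in matches]
    while stack:
        node, score, path = stack.pop()
        if score < threshold:
            continue  # the threshold constrains the search
        if node in targets:
            clusters[path].add(node)  # group instances by how they were reached
            continue
        if best.get(node, 0.0) >= score:
            continue  # already expanded with an equal or better score
        best[node] = score
        for predicate, neighbour, weight in edges.get(node, ()):
            stack.append((neighbour, score * weight, path + (predicate,)))
    return {path: sorted(found) for path, found in clusters.items()}

# Made-up data, for illustration only.
labels = [("ex:style1", "impressionist paintings"), ("ex:work1", "impressionist garden")]
edges = {
    "ex:style1": [("ex:styleOf", "ex:artistA", 0.8)],
    "ex:artistA": [("ex:creatorOf", "ex:work2", 0.8)],
}
targets = {"ex:work1", "ex:work2"}  # the "Work" resources
index = build_index(labels)
results = search("impressionists", index, edges, targets)
```

In this example `ex:work1` lands in the cluster with the empty path (its own label matched), and `ex:work2` lands in a cluster keyed by the path `ex:styleOf`, `ex:creatorOf`. Raising the threshold to 0.7 drops the second cluster, since its path score falls below the cut-off.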

Page 14: E-Culture semantic search pilot

Demonstrator

Page 15: E-Culture semantic search pilot

Search: WordNet patterns that increase recall without sacrificing precision (Hollink)

Page 16: E-Culture semantic search pilot

Triple statistics

Page 17: E-Culture semantic search pilot

Status

4-year project, now in month 18

Short-term goals:
– Adding more ethnological collections
– Location-oriented presentation
– User studies with professional users (museum people) and interested lay persons
– Multi-lingual interface (English, Dutch, Indonesian)

Page 18: E-Culture semantic search pilot

Issues

Getting access to collections is mainly a social process
– There is usually no principled objection to making data, metadata and thesauri publicly available, but it still feels threatening

Cultural heritage is a good area for a Semantic Web “island”:
– lots of domain-specific knowledge
– strong application pull
– enormous amount of existing annotations, which have been built up over centuries

Page 19: E-Culture semantic search pilot

On-line demo
http://e-culture.multimedian.nl