Jan 02, 2016
Rutherford Appleton Laboratory
SKOS
Ecoterm 2006
Alistair Miles
CCLRC Rutherford Appleton Laboratory
Semantic Web Best Practices and Deployment
http://www.w3.org/2004/02/skos Alistair Miles, Ecoterm 2006, slide 2
Reminder: what is it?
• Simple Knowledge Organisation System
• Formal language for representing controlled structured vocabularies (thesauri, classification schemes, … ?)
• Subject metadata & information retrieval …
– ‘this document is about romantic love’.
– ‘this document is about the cure of tuberculosis by x-ray in India in the 1950s’.
• Application of RDF
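As a rough illustration of what "application of RDF" means here, the sketch below writes a tiny vocabulary and one subject statement as RDF triples and serialises them as N-Triples. The property names (skos:prefLabel, skos:broader, dc:subject) come from the SKOS Core and Dublin Core vocabularies; the `http://example.org/` URIs and the labels are invented for illustration, and plain tuples stand in for a real RDF library.

```python
# SKOS Core, RDF and Dublin Core namespaces are real; EX and DOC are made up.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/vocab/"   # hypothetical vocabulary namespace
DOC = "http://example.org/docs/"   # hypothetical document namespace


def uri(value):
    """Format a URI reference in N-Triples syntax."""
    return "<" + value + ">"


def literal(text, lang):
    """Format a language-tagged literal in N-Triples syntax."""
    return '"' + text + '"@' + lang


triples = [
    # Two concepts, each with a preferred label.
    (uri(EX + "emotion"), uri(RDF + "type"), uri(SKOS + "Concept")),
    (uri(EX + "emotion"), uri(SKOS + "prefLabel"), literal("emotion", "en")),
    (uri(EX + "love"), uri(RDF + "type"), uri(SKOS + "Concept")),
    (uri(EX + "love"), uri(SKOS + "prefLabel"), literal("love", "en")),
    # Hierarchical structure: "love" has the broader concept "emotion".
    (uri(EX + "love"), uri(SKOS + "broader"), uri(EX + "emotion")),
    # Subject metadata: "this document is about love".
    (uri(DOC + "1"), uri(DC + "subject"), uri(EX + "love")),
]


def to_ntriples(statements):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(s + " " + p + " " + o + " ." for s, p, o in statements)


print(to_ntriples(triples))
```

Because every concept is identified by a URI, the subject statement can live in a different file, or on a different server, from the vocabulary itself.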
Since Ecoterm 2005 …
• SKOS Core Guide & SKOS Core Vocabulary Specification …
– First Working Draft May 2005
– Second Working Draft October 2005
• Minor changes
• Quick Guide to Publishing a Thesaurus on the Semantic Web …
– First Working Draft May 2005
What comes next … ?
• Life after SWBPD-WG … ?
• Plans for next phase of W3C Semantic Web Activity …
• New WG?
• SKOS W3C Recommendation by end 2007?
• N.B. Not yet approved!
If Rec then …
• What is the scope? What is the fundamental design goal?
• First part of SKOS Rec would be requirements specification.
• Between now and Sept/Oct 2006 … define scope and requirements.
What I’d like to do here …
• Talk about some of the assumptions behind SKOS.
• Sketch some ideas on how to define scope and requirements for SKOS.
• Get your [email protected]
“SKOS: Requirements for Standardization” isegserv.itd.rl.ac.uk/public/skos/press/dc2006/paper.pdf
Brief history of scope …
• 2003-04: SWAD-Europe
– ISO 2788 thesauri
– “Non-standard” thesauri via extensibility e.g. GeMET
– Classification scheme (PACS)
– Multilingual thesauri
– Semantic mapping
• 2004: W3C Glossaries
• 2005: Discussion re “terminologies”
• Subject headings? Gazetteers? Folksonomies? Taxonomies?
Assumptions: purpose …
• Formal representation of controlled structured vocabularies intended for use in information retrieval applications.
Assumptions: workflow …
a) Build a vocabulary
b) Build an index
c) Retrieve
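The three workflow steps can be sketched in a few lines, assuming the simplest possible data structures: a vocabulary as a mapping from concept identifier to label, and an index as a mapping from document to assigned concepts. The identifiers and documents are toy data, not taken from any real system.

```python
# a) Build a vocabulary: concept id -> preferred label (toy data).
vocabulary = {"c1": "love", "c2": "tuberculosis", "c3": "radiotherapy"}

# b) Build an index: document id -> concept ids assigned by an indexer.
index = {
    "doc1": {"c1"},
    "doc2": {"c2", "c3"},
}


# c) Retrieve: find the documents indexed with the concept carrying this label.
def retrieve(label):
    wanted = {cid for cid, pref in vocabulary.items() if pref == label}
    return sorted(doc for doc, concepts in index.items() if concepts & wanted)


print(retrieve("tuberculosis"))
```

Each step only needs to agree with the others on the concept identifiers, which is the point at which a shared representation becomes useful.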
Assumptions: components …
• Vocabulary Development Application
– Something to help build a vocabulary
• Indexing Application
– Something to help build an index
• Retrieval Application
– Something to help retrieve things
• SKOS ultimately designed to support interoperation of these three “key components”.
Proposed scope …
• SKOS is a formal language for representing controlled structured vocabularies intended for use within information retrieval applications.
• SKOS is required to support the interoperation of these three key components.
• I.e. define the requirements for SKOS by describing a set of functionalities that must be enabled.
Other components …
• Vocabulary mapping … ?
• Metadata registries … ?
• … ?
Component specs …
• … first discuss social and technological context, then return to component specs …
Context …
• What is the social and technological context in which controlled structured vocabs are used?
• Assume two basic needs …
– Locate something I already know about.
– Discover something new.
• N.B. a good location service is not necessarily a good discovery service.
– Cf. Google and del.icio.us
Strategies …
• Basic strategies for implementing retrieval services …
1. Statistical text analysis
2. Analysis of user behaviour
3. Index with controlled vocab
• Other strategies …
1. … kos-assisted text analysis?
Cost problem …
• Given that applying controlled structured vocab for retrieval involves significant initial and ongoing investment…
• Given that other strategies are cheaper…
• Huge pressure to drive down cost and increase utility.
• Requirement for seamless integration.
– I.e. controlled vocab is seldom used in isolation; most applications will combine strategies.
Use case …
• Search portal …
• Use combined strategies.
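One way a portal might combine strategies is to blend a free-text score with a controlled-vocabulary score. The sketch below is only an illustration under stated assumptions: the scoring functions are deliberately naive, the 0.5 weight is arbitrary and untuned, and the documents are toy data.

```python
def text_score(query_terms, text):
    """Fraction of query terms that occur in the document text (toy statistic)."""
    words = set(text.lower().split())
    hits = sum(1 for term in query_terms if term.lower() in words)
    return hits / len(query_terms) if query_terms else 0.0


def concept_score(query_concepts, doc_concepts):
    """Fraction of query concepts that an indexer assigned to the document."""
    if not query_concepts:
        return 0.0
    return len(set(query_concepts) & set(doc_concepts)) / len(query_concepts)


def combined_score(query_terms, query_concepts, doc, weight=0.5):
    """Linear blend of both strategies; `weight` is an assumption, not a tuned value."""
    return (weight * text_score(query_terms, doc["text"])
            + (1 - weight) * concept_score(query_concepts, doc["concepts"]))


docs = {
    "doc1": {"text": "x-ray treatment of tuberculosis", "concepts": {"tuberculosis"}},
    "doc2": {"text": "a history of lung disease", "concepts": {"tuberculosis"}},
    "doc3": {"text": "romantic poetry", "concepts": {"love"}},
}

ranked = sorted(
    docs,
    key=lambda d: combined_score(["tuberculosis"], ["tuberculosis"], docs[d]),
    reverse=True,
)
print(ranked)
```

Here doc2 never mentions the query word in its text but is still retrieved because an indexer assigned the concept, which is the kind of added utility the controlled vocabulary has to deliver to justify its cost.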
Component specs …
• Important factors …
• Minimise cost.
– Decentralisation.
– Assistance.
• Maximise “utility”.
– Query expansion.
– Smart ranking.
– Maximise lifetime.
• Use the Semantic Web!
– Situation A. search across many collections, where indexers use same controlled vocab.
– Situation B. search across many collections, where indexers use different controlled vocabs.
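The difference between Situation A and Situation B can be sketched as follows, assuming each collection is a simple mapping from document to assigned concepts. The `vocabA:` and `vocabB:` identifiers and the mapping table are hypothetical; how such mappings should actually be represented is one of the open questions.

```python
# Independently built collections (toy data).
collection_1 = {"doc1": {"vocabA:love"}, "doc2": {"vocabA:health"}}
collection_2 = {"doc3": {"vocabA:love"}}        # Situation A: same vocabulary
collection_3 = {"doc4": {"vocabB:affection"}}   # Situation B: different vocabulary

# Hypothetical mapping from vocabB concepts to equivalent vocabA concepts.
mapping = {"vocabB:affection": "vocabA:love"}


def merge(*collections):
    """Merge several indexes into one; document ids are assumed to be unique."""
    merged = {}
    for collection in collections:
        merged.update(collection)
    return merged


def search(concept, index, mapping=None):
    """Return documents indexed with `concept`, translating via `mapping` if given."""
    mapping = mapping or {}
    results = []
    for doc, concepts in index.items():
        normalised = {mapping.get(c, c) for c in concepts}
        if concept in normalised:
            results.append(doc)
    return sorted(results)


# Situation A: merging is enough.
print(search("vocabA:love", merge(collection_1, collection_2)))
# Situation B: without a mapping doc4 is missed, with a mapping it is found.
print(search("vocabA:love", merge(collection_1, collection_3)))
print(search("vocabA:love", merge(collection_1, collection_3), mapping))
```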
Focus areas …
• Decentralisation requires different models of collaboration and change.
• Representing change is a key factor in keeping a vocab applicable.
• Ranking and scoring well understood for text, less so for controlled index.
• Theory of query expansion? Field trials of query expansion?
• Strategies for providing assistance?
Change and collaboration
• Continuum of collaboration models: centralised <-> decentralised
• Continuum of change management models: continuous <-> discrete
• Decentralisation can reduce cost of development and maintenance
• Change management can ensure continued utility – maximise ROI
• Support for declarative representation of change a requirement for SKOS.
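To make "declarative representation of change" concrete, the sketch below records changes as data rather than as edits made in place, so that any application holding an old version can replay them. The three change kinds (`add`, `relabel`, `deprecate`) and the `Change` record are illustrative assumptions, not a proposed SKOS design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    kind: str        # "add", "relabel" or "deprecate" (illustrative set only)
    concept: str     # concept identifier the change applies to
    label: str = ""  # new label, where relevant


def apply_changes(vocabulary, changes):
    """Return a new version of the vocabulary; the input is left untouched."""
    updated = dict(vocabulary)
    for change in changes:
        if change.kind in ("add", "relabel"):
            updated[change.concept] = change.label
        elif change.kind == "deprecate":
            updated.pop(change.concept, None)
        else:
            raise ValueError("unknown change kind: " + change.kind)
    return updated


v1 = {"c1": "consumption", "c2": "x-ray therapy"}
changes = [
    Change("relabel", "c1", "tuberculosis"),
    Change("add", "c3", "radiotherapy"),
    Change("deprecate", "c2"),
]
v2 = apply_changes(v1, changes)
print(v2)
```

Keeping the change list alongside the vocabulary means an index built against the first version can be migrated, or at least checked, against the second.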
Semantic Web architecture…
• Exploit Semantic Web facility to distribute and merge data.
• However, best practices for publishing data on the Semantic Web need work.
• See “Best Practice Recipes for Publishing RDF Vocabularies” W3C Working Draft (Google “publishing RDF”).
Semantic Web architecture
Direct interaction …
Information retrieval…
• Indexing and query evaluation well understood for text content.
• Less well understood for controlled metadata.
• Query types?
• Query evaluation strategies, e.g. query expansion?
• Ranking?
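A minimal sketch of one possible query evaluation strategy: expand a query concept to all of its narrower concepts, then rank documents by how far down the hierarchy the match was found. The hierarchy and index are toy data, and the `1 / (1 + depth)` score is an arbitrary assumption chosen only to make closer matches rank higher.

```python
# Toy hierarchy: concept -> directly narrower concepts.
narrower = {
    "disease": ["infectious disease"],
    "infectious disease": ["tuberculosis"],
    "tuberculosis": [],
}

# Toy index: document id -> concepts assigned by an indexer.
index = {
    "doc1": {"disease"},
    "doc2": {"tuberculosis"},
    "doc3": {"infectious disease"},
}


def expand(concept):
    """Map the concept and all its narrower concepts to their depth below it."""
    depths = {concept: 0}
    frontier = [concept]
    while frontier:
        current = frontier.pop(0)
        for child in narrower.get(current, []):
            if child not in depths:
                depths[child] = depths[current] + 1
                frontier.append(child)
    return depths


def search(concept):
    """Return matching documents, exact matches first, deeper expansions later."""
    depths = expand(concept)
    scored = []
    for doc, concepts in index.items():
        matched = [depths[c] for c in concepts if c in depths]
        if matched:
            scored.append((1.0 / (1 + min(matched)), doc))
    return [doc for score, doc in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


print(search("disease"))
```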
Assistance for indexers …
• Provide suggestions
– Comparison of labels and annotations
– Machine learning
– Exploit lexical resources
– … ?
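The simplest form of label comparison can be sketched as follows: suggest every concept that has a label occurring as a word in the text being indexed. The concept identifiers and labels are toy data, and matching on whole single words is a simplifying assumption; multi-word labels and stemming are ignored.

```python
# Toy vocabulary: concept id -> preferred and alternative labels.
labels = {
    "c1": ["tuberculosis", "TB", "consumption"],
    "c2": ["x-ray", "radiograph"],
    "c3": ["India"],
}


def suggest_concepts(text):
    """Suggest concepts that have at least one label occurring as a word in the text."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return sorted(
        cid for cid, names in labels.items()
        if any(name.lower() in words for name in names)
    )


print(suggest_concepts("The cure of tuberculosis by x-ray in India in the 1950s."))
```

The suggestions are only candidates; the indexer still decides which of them the document is really about.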
Assistance for mappers …
• Provide suggestions …
– Analysis of labels and annotations
– Exploit lexical resources
– … ?
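For mappers, analysis of labels might start from something as simple as word overlap between the labels of two concepts. The sketch below uses Jaccard similarity over label words; the two vocabularies are toy data, and both the similarity measure and the 0.5 threshold are assumptions chosen for illustration.

```python
def tokens(label_list):
    """Lower-cased set of all words appearing in a concept's labels."""
    return {word.lower() for label in label_list for word in label.split()}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_mappings(vocab_a, vocab_b, threshold=0.5):
    """Return (id_a, id_b, score) candidates at or above the threshold, best first."""
    suggestions = []
    for id_a, labels_a in vocab_a.items():
        for id_b, labels_b in vocab_b.items():
            score = jaccard(tokens(labels_a), tokens(labels_b))
            if score >= threshold:
                suggestions.append((id_a, id_b, score))
    return sorted(suggestions, key=lambda s: -s[2])


vocab_a = {"a1": ["tuberculosis", "TB"], "a2": ["romantic love"]}
vocab_b = {"b1": ["TB", "tuberculosis"], "b2": ["love"], "b3": ["radiology"]}

print(suggest_mappings(vocab_a, vocab_b))
```

The second candidate (a2 to b2) scores lower because "romantic love" is narrower than "love", which hints at why a person still needs to choose the type of mapping.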
Summary
• SKOS: fundamental requirement to support information retrieval using controlled structured vocabularies.
• Define requirements by describing information retrieval functionalities.
• Divide functionalities into:
– Presentation styles
– Query types e.g. compound queries, coordination …
– Query evaluation strategies
• Assumptions:
– Key components
– Semantic Web interaction
– Context – pressure to make vocabularies “profitable”
– … Issues: change, assistance, theory …