Controlled Vocabularies in TELPlus
Antoine ISAAC, Vrije Universiteit Amsterdam
EDLProject Workshop, 22-23 November 2007
Agenda
• TELPlus Context
• Improving subject access
  – 3 sub-tasks
• Services for TEL
TELPlus Context
• Started October 2007
• Running for 27 months
• Content work packages (WPs)
  – OCRing previously digitised material
  – Improving the usability of TEL through OAI-PMH compliance
  – Improving access
  – Integrating services with the TEL portal
  – User personalisation services
  – Extending TEL to Bulgaria & Romania
WP3 – Improving Access
• Task 1: Indexing for usability
  – Review/test state-of-the-art semantic search engines
    • On the content of documents
• Task 2: Improving subject access
• Task 3: FRBR aggregation, search and browsing
  – Create/exploit FRBR metadata repositories
• Task 4: Focus on users
  – Focus groups on prototypes
WP3 Task 2 – Improving Subject Access
• Improving subject access via semantic alignment between subjects
• Search through collections
  – Using metadata
  – In a controlled setting
• Paving the way for enhanced uses
  – Advanced processing mentioned in TELplus needs conceptual structures and links between these structures
    • E.g. clustering
WP3 Task 2 – Improving Subject Access
• Improving subject access via semantic alignment between subjects
• Reference: MACS project
  – Manually built semantic equivalences between Rameau, SWD & LCSH headings (French, German and English subject heading lists)
MACS: Querying Collections
MACS: Query Reformulation Options
WP3 Task 2 – Improving Subject Access
• Improving subject access via semantic alignment between subjects
• Reference: MACS project
  – Manual equivalences between Rameau, SWD and LCSH headings
• Here: an experiment on deploying automatic alignment techniques
  – Determining possible strategies
  – Assessing feasibility and usefulness
  – In the MACS context
WP3.2 Sub-tasks
• 3.2.1. Converting the subjects to a standard representation language
  – Semantic Web format (SKOS)
• 3.2.2. Aligning the vocabularies
  – Semantic correspondences between subjects
• 3.2.3. Deploying the obtained alignment knowledge in the TEL framework
  – E.g. using links to reformulate queries from one subject list to the other
Converting subjects to a standard representation language
Goal: solving syntactic heterogeneity between vocabularies
• Enabling the use of standard tools
  – E.g. for query (re)formulation
• Paving the way for dealing with semantic heterogeneity
  – Definitions of concepts expressed according to a common model
Converting subjects to a standard representation language
Approach: Semantic Web and SKOS
• Semantic Web
  – Knowledge objects as web resources (URIs)
  – Description by linking resources (RDF)
  – Description using shared formal vocabularies (ontologies)
• SKOS
  – A standard Semantic Web model (ontology)
  – For knowledge organization systems (thesauri, subject heading lists…)
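SKOS: Example
(Figure: RDF graph describing an Iconclass concept. Its statements are:)
• http://www.iconclass.nl/s_11F rdf:type skos:Concept
• http://www.iconclass.nl/s_11F skos:prefLabel “the Virgin Mary”@en
• http://www.iconclass.nl/s_11F skos:prefLabel “la Vierge Marie”@fr
• http://www.iconclass.nl/s_11F skos:broader http://www.iconclass.nl/s_11
• http://www.iconclass.nl/s_11F skos:inScheme http://www.iconclass.nl/
• http://www.iconclass.nl/ rdf:type skos:ConceptScheme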
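The Iconclass example on this slide (concept s_11F, “the Virgin Mary”, with broader concept s_11) can be written out as plain RDF triples. The Python sketch below is only an illustration: it uses no RDF library, builds N-Triples lines by hand, and assumes from the flattened figure that the labels, the broader link and the scheme membership all belong to s_11F. Apart from the Iconclass URIs, the only external names are the standard SKOS and RDF namespace URIs.

```python
# Sketch: the Iconclass SKOS example as RDF triples, serialised as N-Triples
# by hand (no RDF library assumed). Namespace URIs are the standard SKOS/RDF ones.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(u):
    """Write a URI reference in N-Triples syntax."""
    return f"<{u}>"


def literal(text, lang):
    """Write a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


scheme = "http://www.iconclass.nl/"
virgin_mary = "http://www.iconclass.nl/s_11F"
parent = "http://www.iconclass.nl/s_11"

# Assumption: every statement except the last one is about s_11F.
triples = [
    (uri(virgin_mary), uri(RDF_TYPE), uri(SKOS + "Concept")),
    (uri(virgin_mary), uri(SKOS + "prefLabel"), literal("the Virgin Mary", "en")),
    (uri(virgin_mary), uri(SKOS + "prefLabel"), literal("la Vierge Marie", "fr")),
    (uri(virgin_mary), uri(SKOS + "broader"), uri(parent)),
    (uri(virgin_mary), uri(SKOS + "inScheme"), uri(scheme)),
    (uri(scheme), uri(RDF_TYPE), uri(SKOS + "ConceptScheme")),
]


def to_ntriples(statements):
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


print(to_ntriples(triples))
```

In practice an RDF toolkit would handle serialisation; the hand-written version just makes the graph structure explicit.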
Converting subjects to a standard representation language: process
• Getting processable versions (e.g. XML) from the vocabulary owners
• Analyzing the models
• Converting to SKOS
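The three conversion steps can be sketched in Python with the standard `xml.etree.ElementTree` module. Everything about the input is hypothetical: the `<heading>`, `<label>` and `<broader>` element names, the `id`/`ref`/`lang` attributes and the `http://example.org/shl/` base URI stand in for whatever a vocabulary owner actually exports. The "analyzing the models" step is reduced here to the assumption that `<broader>` maps onto `skos:broader`.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL export format: element/attribute names are illustrative only.
SAMPLE_XML = """
<headings>
  <heading id="h1"><label lang="fr">Littérature néerlandaise</label></heading>
  <heading id="h2">
    <label lang="fr">Poésie néerlandaise</label>
    <broader ref="h1"/>
  </heading>
</headings>
"""

BASE = "http://example.org/shl/"  # hypothetical base URI for concept identifiers


def convert(xml_text):
    """Turn each <heading> into SKOS-style (subject, property, object) triples."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for heading in root.iter("heading"):
        concept = BASE + heading.get("id")
        triples.append((concept, "rdf:type", "skos:Concept"))
        # Labels keep their language tag, as skos:prefLabel values do.
        for label in heading.findall("label"):
            triples.append((concept, "skos:prefLabel", (label.text, label.get("lang"))))
        # Assumed outcome of the model analysis: <broader> means skos:broader.
        for broader in heading.findall("broader"):
            triples.append((concept, "skos:broader", BASE + broader.get("ref")))
    return triples


for triple in convert(SAMPLE_XML):
    print(triple)
```

Real subject heading formats (e.g. MARC-based authority records) are considerably richer, which is why the analysis step matters before any mapping rules are written.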
Vocabulary Alignment
• Specifying the required alignment format (links)
  – Types of mapping links: equivalence, broader
  – Cardinality: one-to-one, one-to-many
  – Taking the application context (TEL) into account
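A minimal Python sketch of such a link format follows. It assumes the SKOS mapping properties `skos:exactMatch` (equivalence) and `skos:broadMatch` (broader) as link types; these names come from the later SKOS Recommendation and are not prescribed by the slide. The hypothetical `cardinality` helper inspects only the source side: a source concept with one equivalent is reported as one-to-one, with several as one-to-many. All concept identifiers are made up.

```python
from collections import defaultdict
from dataclasses import dataclass

# Link types: ASSUMED to be the SKOS mapping properties.
EQUIVALENCE = "skos:exactMatch"
BROADER = "skos:broadMatch"  # target concept is broader than the source concept


@dataclass(frozen=True)
class Mapping:
    source: str    # concept from the first subject heading list
    relation: str  # EQUIVALENCE or BROADER
    target: str    # concept from the second subject heading list


def cardinality(mappings):
    """Label each source concept's equivalence links as one-to-one or one-to-many.

    Only the source side is inspected; whether a target is shared by
    several sources is not checked here.
    """
    targets = defaultdict(set)
    for m in mappings:
        if m.relation == EQUIVALENCE:
            targets[m.source].add(m.target)
    return {src: "one-to-one" if len(tgts) == 1 else "one-to-many"
            for src, tgts in targets.items()}


# Hypothetical identifiers, for illustration only.
links = [
    Mapping("rameau:A", EQUIVALENCE, "swd:X"),
    Mapping("rameau:B", EQUIVALENCE, "swd:Y"),
    Mapping("rameau:B", EQUIVALENCE, "swd:Z"),
    Mapping("rameau:C", BROADER, "swd:W"),
]
print(cardinality(links))
```

Making the link type an explicit field keeps equivalence and broader links distinguishable later, which matters when an application treats them differently (e.g. automatic versus assisted reformulation).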
Vocabulary Alignment
• Specifying the required alignment format (links)
• Selecting (& running) alignment techniques/tools
  – Inspired by Semantic Web approaches
Vocabulary Alignment Techniques
• Similar to the ontology alignment problem
• Existing approaches for (semi-)automatic ontology alignment
  – Using techniques from linguistics, computer science, statistics
• Problem: current performance does not allow fully automatic alignment
• Problem: the multilingual case
  – Some techniques (e.g. comparing labels as strings) cannot be used
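Potential Technique: Using Background Knowledge
• Using a shared conceptual reference (background knowledge) to find links
(Figure: concepts from two subject heading lists, SHL 1 and SHL 2, e.g. “Calendar” and “Publication”, linked through a shared background knowledge source)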
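One way to use a shared conceptual reference, sketched in Python under loud assumptions: each heading is anchored to a concept in a background hierarchy, and the relation between the two anchors decides the link. The background hierarchy (a calendar as a kind of publication), the anchors and the `infer_links` helper are all hypothetical; in practice, anchoring headings to the background source is itself a matching problem.

```python
# HYPOTHETICAL background knowledge: child -> parent in a shared reference hierarchy.
BACKGROUND_PARENT = {
    "bg:Calendar": "bg:Publication",
    "bg:Publication": "bg:Document",
}

# HYPOTHETICAL anchoring of subject headings to background concepts.
ANCHORS_SHL1 = {"shl1:Calendar": "bg:Calendar"}
ANCHORS_SHL2 = {"shl2:Publication": "bg:Publication"}


def ancestors(concept):
    """All background concepts above `concept`, nearest first."""
    chain = []
    while concept in BACKGROUND_PARENT:
        concept = BACKGROUND_PARENT[concept]
        chain.append(concept)
    return chain


def infer_links(anchors1, anchors2):
    """Derive SHL1 -> SHL2 links from how their anchors relate in the background."""
    links = []
    for c1, a1 in anchors1.items():
        for c2, a2 in anchors2.items():
            if a1 == a2:
                links.append((c1, "skos:exactMatch", c2))
            elif a2 in ancestors(a1):
                links.append((c1, "skos:broadMatch", c2))   # c2 is broader than c1
            elif a1 in ancestors(a2):
                links.append((c1, "skos:narrowMatch", c2))  # c2 is narrower than c1
    return links


print(infer_links(ANCHORS_SHL1, ANCHORS_SHL2))
```

The advantage of this route in the multilingual case is that the comparison happens in the background source, not between labels in different languages.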
Potential Technique: Statistical Alignment
• Using object information (book indexing)
(Figure: books indexed with both SHL 1 and SHL 2 (dually-indexed books) connect, e.g., “Dutch Literature” in one list with “Dutch” in the other)
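Statistical alignment over dually-indexed books can be sketched by comparing, for each pair of headings, the sets of books indexed with them. The Python below uses the Jaccard coefficient plus a minimum number of shared books; both are illustrative choices, not the method fixed by the project, and the book data and `candidate_links` helper are hypothetical.

```python
from itertools import product

# HYPOTHETICAL dually-indexed books: book -> (SHL 1 headings, SHL 2 headings).
BOOKS = {
    "book1": ({"shl1:Dutch literature"}, {"shl2:Dutch"}),
    "book2": ({"shl1:Dutch literature"}, {"shl2:Dutch", "shl2:Poetry"}),
    "book3": ({"shl1:Dutch literature"}, {"shl2:Dutch"}),
    "book4": ({"shl1:Calendars"}, {"shl2:Poetry"}),
}


def extensions(books):
    """For each concept, the set of books indexed with it (per list)."""
    ext1, ext2 = {}, {}
    for book, (subjects1, subjects2) in books.items():
        for s in subjects1:
            ext1.setdefault(s, set()).add(book)
        for s in subjects2:
            ext2.setdefault(s, set()).add(book)
    return ext1, ext2


def jaccard(a, b):
    """Overlap of two book sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)


def candidate_links(books, threshold=0.5, min_support=2):
    """Concept pairs whose book sets overlap enough, best first."""
    ext1, ext2 = extensions(books)
    scored = []
    for (c1, b1), (c2, b2) in product(ext1.items(), ext2.items()):
        if len(b1 & b2) >= min_support and jaccard(b1, b2) >= threshold:
            scored.append((c1, c2, jaccard(b1, b2)))
    return sorted(scored, key=lambda t: -t[2])


print(candidate_links(BOOKS))
```

With `min_support=1`, the single shared book4 would also propose “Calendars” and “Poetry” as a candidate. This is why such scores yield candidates for checking rather than finished mappings.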
Vocabulary Alignment
• Specifying the required alignment format (links)
• Selection (& running) of tool/method
• Evaluation (& cleaning)
  – Considering the application
Evaluation of Alignments
• MACS has produced mappings!
  – Possible gold standard
• But: has MACS produced all mappings?
  – What proportion of the SHLs is covered?
  – Are all indexing strings taken into account?
• Are MACS mappings the only interesting ones?
  – “Serendipity” mappings
    • Concepts that are not equivalent but could bring useful results when added to queries
  – Compensating for indexing variability
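Using MACS mappings as a gold standard would typically mean computing precision and recall of the automatically found mappings against them. The Python sketch below does only that; the mapping pairs are hypothetical. Its docstring spells out the caveat raised on the slide: assuming the reference itself is correct but incomplete, a correct "serendipity" mapping absent from it counts as an error, so measured precision is only a lower bound.

```python
def precision_recall(found, reference):
    """Precision and recall of found mappings against a reference set.

    Assuming the reference (e.g. MACS) is correct but incomplete, correct
    mappings missing from it are counted as errors, so precision is only a
    lower bound, and recall only covers what the reference covers.
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# HYPOTHETICAL mappings as (SHL 1 concept, SHL 2 concept) pairs.
reference = {("r:A", "s:X"), ("r:B", "s:Y"), ("r:C", "s:Z")}
found = {("r:A", "s:X"), ("r:B", "s:Y"), ("r:D", "s:W"), ("r:E", "s:V")}
print(precision_recall(found, reference))
```

Here two of four found mappings are in the reference (precision 0.5) and two of three reference mappings are found (recall 2/3). Whether r:D and r:E are wrong or merely missing from the reference would need manual inspection.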
Evaluation of Alignments
• Several scenarios for using and evaluating alignments
  – Concept-based search
  – Re-indexing
  – Integration of one SHL into the other
  – SHL merging
  – Free-text search
  – Navigation
Evaluation of Alignments
• Several scenarios for using and evaluating alignments
  – Concept-based search
    • Retrieving books indexed with SHL1 using SHL2 concepts
  – Re-indexing
  – Integration of one SHL into the other
  – SHL merging
  – Free-text search
    • Matching user search terms to both SHL1 and SHL2 concepts
  – Navigation
    • Browsing several collections using one SHL structure
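The concept-based search scenario (retrieving books indexed with SHL1 using SHL2 concepts) amounts to rewriting the query through the alignment links before matching. A minimal Python sketch, assuming a simple dictionary of links from SHL2 to SHL1 concepts; the headings, collection and function names are hypothetical.

```python
# HYPOTHETICAL alignment: SHL 2 concept -> linked SHL 1 concepts.
ALIGNMENT_2_TO_1 = {
    "swd:Niederländische Literatur": {"rameau:Littérature néerlandaise"},
}

# HYPOTHETICAL collection indexed with SHL 1 headings only.
COLLECTION_SHL1 = {
    "book1": {"rameau:Littérature néerlandaise"},
    "book2": {"rameau:Calendriers"},
}


def reformulate(query_concepts, alignment):
    """Replace SHL 2 query concepts by the SHL 1 concepts they are linked to."""
    translated = set()
    for concept in query_concepts:
        translated |= alignment.get(concept, set())
    return translated


def search(query_concepts, alignment, collection):
    """Books whose SHL 1 subjects intersect the reformulated query."""
    targets = reformulate(query_concepts, alignment)
    return sorted(book for book, subjects in collection.items() if subjects & targets)


print(search({"swd:Niederländische Literatur"}, ALIGNMENT_2_TO_1, COLLECTION_SHL1))
```

A query concept with no outgoing link simply returns nothing, which is one concrete way in which gaps in the alignment show up as lost recall in this scenario.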
Evaluation of Alignments
• Several settings for a single scenario
  – Fully automatic reformulation vs. assisted reformulation (the system proposes candidates)
• Different evaluation measures
  – Good mappings vs. acceptable ones
  – Number of candidates for reformulation
  – Semantic closeness to the original query
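Two of the measures listed above can be made concrete in a short Python sketch: a strict precision counting only "good" candidates, a lenient one also counting "acceptable" ones, and the average number of candidates proposed per query. The three-level rating scale and the judgement data are hypothetical; semantic closeness to the original query is not modelled here.

```python
from statistics import mean

# HYPOTHETICAL evaluator judgements: query -> rating of each proposed candidate.
JUDGEMENTS = {
    "query1": ["good", "acceptable", "wrong"],
    "query2": ["good"],
}


def evaluate(judgements):
    """Strict/lenient precision over all candidates, plus candidates per query."""
    ratings = [r for per_query in judgements.values() for r in per_query]
    strict = sum(r == "good" for r in ratings) / len(ratings)
    lenient = sum(r in ("good", "acceptable") for r in ratings) / len(ratings)
    avg_candidates = mean(len(per_query) for per_query in judgements.values())
    return {"strict_precision": strict,
            "lenient_precision": lenient,
            "avg_candidates": avg_candidates}


print(evaluate(JUDGEMENTS))
```

The candidate count matters mostly in the assisted setting: a long list may contain good candidates yet still burden the user.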
Vocabulary Alignment
• Specifying the required alignment format (links)
• Selection (& running) of tool/method
• Evaluation (& cleaning)
• Assessment of the approach
  – Effort required, quality, extensibility
Deploying the obtained alignment knowledge in the TEL framework
• Observing the integration of MACS data into TEL
  – Conceptual input for the alignment requirements
• Integration of the obtained alignment into TEL
• Assessment of the alignment integration
  – Technical aspects, usage aspects
Reminder
• Alignment is a difficult problem
• Application-specific alignment is largely unexplored in Semantic Web research
More a feasibility study than a complete solution to the problem
Practical goal: investigate how automatic techniques could help MACS-like initiatives
• Manual mapping is labour-intensive
WP4 – Integrating services with the European Library portal
Theo van Veen (KB)
Tasks:
• Identifying the services that will give users the greatest return
• Creating new services
• Integrating services within TEL…
WP4 – Some Services Mentioned
Preliminary inventory: no official commitment!
Services based on controlled vocabularies:
• Thesaurus and name authority service
  – Providing terms linked to query terms
• Semantic enrichment service
  – Users can annotate search results with terms
• Distance between terms and related terms
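The slide does not say how a distance between terms would be computed. One simple assumption, sketched in Python, is to count the links on the shortest path through a thesaurus graph with breadth-first search, and to treat terms within a chosen distance as related. The link data, the `max_distance` parameter and the function names are hypothetical.

```python
from collections import deque

# HYPOTHETICAL thesaurus links (broader/related), treated as undirected edges.
LINKS = [
    ("Dutch poetry", "Dutch literature"),
    ("Dutch literature", "Literature"),
    ("Dutch literature", "Netherlands"),
]


def build_graph(links):
    """Adjacency sets for an undirected term graph."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, start):
    """Breadth-first search: shortest link count from `start` to every reachable term."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        for neighbour in graph.get(term, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[term] + 1
                queue.append(neighbour)
    return dist


def term_distance(graph, a, b):
    """Shortest-path distance between two terms, or None if unconnected."""
    return distances_from(graph, a).get(b)


def related_terms(graph, term, max_distance=1):
    """Terms within `max_distance` links of `term`, excluding the term itself."""
    return sorted(t for t, d in distances_from(graph, term).items() if 0 < d <= max_distance)


graph = build_graph(LINKS)
print(term_distance(graph, "Dutch poetry", "Netherlands"))
print(related_terms(graph, "Dutch literature"))
```

Alignment links between vocabularies could be added as further edges, so that related terms can come from another subject heading list; weighting link types differently would be a natural refinement.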
WP4 – Some Services Mentioned
Preliminary inventory: no official commitment!
Services based on controlled vocabularies:
• Thesaurus and name authority service
• Semantic enrichment service
• Distance between terms and related terms
Adding more value from controlled vocabularies and alignments between them
Thanks!