Dr. Sven Strobel IATUL 2015 July 9, 2015; Hannover Semantic Retrieval of the TIB|AV-Portal
Aug 14, 2015
Contents
1. TIB|AV-Portal
2. Automatic Video Analysis
3. Named-Entity Recognition
4. Metadata and Retrieval
5. Semantic Retrieval
1. TIB|AV-Portal
av.getinfo.de
Profile
• Free web-based portal for scientific videos from the fields of science and technology
• Automatic video analysis: scene, text, speech and image recognition
Development
• Competence Centre for Non-Textual Materials at TIB in cooperation with Hasso Plattner Institute
• 2011-2014; Launch: April 2014
Target Group
• Teachers, students, researchers
Content
• 2900 videos / 1900 film credits with external links (June 2015)
• Most videos under open access
Focus of the collection
• Videos from the fields of engineering, architecture, chemistry, informatics, mathematics and physics (TIB core subjects)
• Recordings of lectures and conferences, experiments, interviews, animations, simulations etc.
2. Automatic Video Analysis
• DOI assignment: permanent linking / citability
• Scene recognition: time-related video segments
• Text recognition (OCR): full-text search in the OCR transcript
• Speech recognition: full-text search in the speech transcript
• Image recognition: search for image motifs
• Named-entity recognition: linking textual metadata (OCR / speech transcripts) with the GND ontology
3. Named-Entity Recognition
Definition
• Named-entity recognition: linking automatically extracted textual metadata with terms of a knowledge base
Textual Metadata
• OCR transcript
• Speech transcript
Knowledge Base
• 63 000 GND subject headings
• GND subject sections for the 6 TIB core subjects
[Figure: Textual Metadata, speech transcript]
[Figure: Knowledge Base]
Algorithm of Named-Entity Recognition
[Figure: terms from the transcripts are disambiguated using their context and mapped to GND terms (example: GND: Thermodynamik). Result: video segments indexed by GND subject headings.]
Figure is based on slide 37 from Steinmetz, N.; Sack, H.: Cross-Lingual Semantic Mapping of Authority Files. Presentation held at "Semantic Web in Libraries 2013". Hamburg (2013).
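The figure only names the steps "context" and "disambiguate", so the following is a minimal sketch of one common way to do context-based disambiguation, not the portal's actual algorithm. It assumes that each candidate knowledge-base entry carries a set of related terms and picks the candidate that shares the most terms with the words of the video segment. The class `Candidate`, the toy `KNOWLEDGE_BASE` and its entries are invented for illustration and are not real GND records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A knowledge-base entry that a surface form might refer to (hypothetical)."""
    label: str
    related_terms: frozenset


# Toy knowledge base: surface form -> possible entries. All data is invented.
KNOWLEDGE_BASE = {
    "feld": [
        Candidate("Feld (Physik)", frozenset({"ladung", "kraft", "potential"})),
        Candidate("Feld (Landwirtschaft)", frozenset({"ernte", "boden", "saat"})),
    ],
}


def disambiguate(surface_form, context_words):
    """Return the candidate whose related terms overlap most with the context.

    Returns None if the surface form is not in the knowledge base.
    """
    candidates = KNOWLEDGE_BASE.get(surface_form.lower(), [])
    if not candidates:
        return None
    context = {word.lower() for word in context_words}
    return max(candidates, key=lambda c: len(c.related_terms & context))


# Words of one video segment (assumed to come from the speech or OCR transcript).
segment_words = ["Die", "Kraft", "auf", "eine", "Ladung", "im", "Feld"]
best = disambiguate("Feld", segment_words)
print(best.label)  # Feld (Physik)
```

Overlap counting is the simplest possible scoring function. A production system would more likely weight terms or use the structure of the ontology, which the slides do not specify.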
Benefits of Named-Entity Recognition
• Fine-grained descriptions of the video segments enable pinpoint segment-based searches within the video content.
• Linking textual metadata with the GND ontology enables a semantic search.
4. Metadata and Retrieval
Taxonomic Schema
Metadata
• Manual metadata
  - Coarse-grained
  - Highly reliable
  - Search for "classical" metadata (title, author...)
• Automatic metadata
  - Fine-grained
  - Less reliable
  - OCR transcript: keyword-based full-text search in the written content of the video
  - Speech transcript: keyword-based full-text search in the spoken content of the video
  - Automatic indexing
5. Semantic Retrieval
• Semantic search is based on the TIB|AV-Portal knowledge base. Among other things, this knowledge base includes:
  - 63 356 GND subject headings plus synonyms
  - English translations of the GND subject headings from DBpedia, LCSH, MACS and WTI Thesaurus
Textual Query
• When the user enters a search term, all available synonyms and English (or German) translations from the TIB|AV-Portal knowledge base are automatically included in the query.
Example: a query for "Wärmelehre" also finds
• "Thermodynamik" (speech transcript)
• "Thermodynamics" (speech transcript)
• "Thermodynamik" (GND term)
• "Thermodynamics" (manual metadata)
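The query expansion described above can be sketched as follows. This is only an illustration under stated assumptions: the `CONCEPTS` list is a hypothetical stand-in for the knowledge base, and treating "Wärmelehre" as a synonym of "Thermodynamik" with the translation "Thermodynamics" is inferred from the example, not taken from an actual GND record.

```python
# Hypothetical stand-in for the knowledge base: one entry per concept,
# with a preferred label, synonyms and translations. Data is illustrative.
CONCEPTS = [
    {
        "preferred": "Thermodynamik",
        "synonyms": ["Wärmelehre"],
        "translations": ["Thermodynamics"],
    },
]


def expand_query(term):
    """Return the search term followed by all other labels of its concept.

    Matching is case-insensitive. Unknown terms are returned unchanged.
    """
    needle = term.casefold()
    expanded = [term]
    for concept in CONCEPTS:
        labels = [concept["preferred"], *concept["synonyms"], *concept["translations"]]
        if any(label.casefold() == needle for label in labels):
            for label in labels:
                if label.casefold() != needle and label not in expanded:
                    expanded.append(label)
    return expanded


print(expand_query("Wärmelehre"))
# ['Wärmelehre', 'Thermodynamik', 'Thermodynamics']
```

The expanded list would then be sent to the search backend as alternatives, so that a match on any of the labels counts as a hit.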
Semantic faceted search
Refine search results
Facets:
• Subject
• Language
• Author & contributors
• Publisher
• Licence
• Year of Publication
• Person
• Organization
• Image motif
Search index
• Facet terms are terms from GND.
• The search index stores:
  - URI of the GND term
  - ID of the video
  - Position in the video that was assigned to that term
• No keyword-based search but rather an "entity" search
• The search returns videos that contain the selected facet term and highlights the corresponding video segments.
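A minimal sketch of such an entity index, assuming the three stored fields named above. The class `EntityIndex`, the placeholder URI and the choice of seconds as the unit for the position are assumptions made for illustration. The slides do not describe the actual index implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    video_id: str
    position_s: int  # assumption: start of the matching segment, in seconds


class EntityIndex:
    """Maps the URI of a term to the videos and positions where it occurs."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, term_uri, video_id, position_s):
        self._postings[term_uri].append(Hit(video_id, position_s))

    def search(self, term_uri):
        """Group hits by video so that matching segments can be highlighted."""
        by_video = defaultdict(list)
        for hit in self._postings.get(term_uri, []):
            by_video[hit.video_id].append(hit.position_s)
        return {video: sorted(positions) for video, positions in by_video.items()}


# Placeholder URI, not a real GND identifier.
THERMO = "https://example.org/gnd/thermodynamik"

index = EntityIndex()
index.add(THERMO, "video-1", 95)
index.add(THERMO, "video-1", 12)
index.add(THERMO, "video-2", 300)

print(index.search(THERMO))
# {'video-1': [12, 95], 'video-2': [300]}
```

Because the lookup key is the URI of a disambiguated term rather than a string from the transcript, two videos that use different words for the same concept end up under the same key.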
[Figure: Example of a search result]
Benefits
• Improving recall: clicking on a facet includes that term, its synonyms and its translations in the query.
• Improving precision: GND facet terms are disambiguated.
Thank you for your attention!