Xinnovations 2011 - Searching Is Not Always Just Searching

Searching Is Not Just Searching: Explorative Semantic Multimedia Search

Workshop 'Corporate Semantic Web', Xinnovations, Berlin, 19 Sep. 2011

Dr. Harald Sack, Hasso-Plattner-Institut for IT-Systems Engineering, University of Potsdam

Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, Workshop ,Corporate Semantic Web‘, XInnovations 2011, Berlin, 19. Sep. 2011

■ HPI was founded in October 1998 as a public-private partnership
■ HPI research and teaching focus on IT Systems Engineering
■ 10 professors and 100 scientific coworkers
■ 450 Bachelor / Master students
■ HPI is winner of the CHE Ranking 2010

http://hpi.uni-potsdam.de/



■ Research Topics
□ Semantic Web Technologies
□ Ontological Engineering
□ Information Retrieval
□ Multimedia Analysis & Retrieval
□ Social Networking
□ Data/Information Visualization

■ Research Projects

Semantic Technologies & Multimedia Retrieval


Overview
(1) Search in audiovisual media
(2) Semantic multimedia analysis
(3) Explorative semantic multimedia search



Google search...


The World according to Google...







linear result list





Multimedia

Search facets







Open questions:
‣ Is this really what I was looking for...?
‣ Is that all...?
‣ Isn't there more...?
‣ How do I get further...?
‣ Which search terms do I have to choose...?
‣ How do I find out what else there is...?


Google search and multimedia data...


How does Google find multimedia data?






...<a href="/mission_pages/shuttle/shuttlemissions/sts134/multimedia/index.html">

<IMG WIDTH="100" ALT="Close-up view of Endeavour's crew cabin prior to docking with the International Space Station" TITLE="Close-up view of Endeavour's crew cabin prior to docking with the International Space Station" SRC="/images/content/549665main_2011-05-18_1600_100-75.jpg" HEIGHT="75" ALIGN="Bottom" BORDER="0" /></a><p><a href="/mission_pages/shuttle/shuttlemissions/sts134/multimedia/index.html">&rsaquo; STS-134 Multimedia</a></p>

...


‣ Search is based on the link context
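To make "link context" concrete, the following minimal sketch collects the only text a crawler can see around the embedded image in a snippet like the one above: the ALT/TITLE attributes and the anchor text of the surrounding links. The class and function names are hypothetical, and the sketch says nothing about how Google actually indexes images.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collects the text that surrounds linked media in an HTML page."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0
        self.context = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "a":
            self.anchor_depth += 1
        elif tag == "img":
            # ALT and TITLE are the only textual description of the image.
            text = attributes.get("alt") or attributes.get("title")
            if text:
                self.context.append(text)

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Anchor text of links that point to the media page.
        if self.anchor_depth > 0 and data.strip():
            self.context.append(data.strip())


def link_context(html):
    """Return the searchable text fragments found around links and images."""
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    return parser.context


# Shortened variant of the snippet above (attribute values abbreviated).
SNIPPET = (
    '<a href="/multimedia/index.html">'
    '<IMG ALT="Close-up view of the crew cabin" SRC="/images/x.jpg" /></a>'
    '<p><a href="/multimedia/index.html">&rsaquo; STS-134 Multimedia</a></p>'
)
print(link_context(SNIPPET))
```

Everything the search engine knows about the image comes from these few strings, which is why the content of the image itself stays invisible to a purely text-based search.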


How do I search a multimedia archive?


Step 1: Digitization of analog data

Step 2: Annotation with (text-based) metadata



• manual annotation with content-describing, text-based metadata


...can this also be done with automatic methods?


(2) Semantic multimedia analysis



Automated media analysis







[Figure, built up step by step: a news video frame annotated with the results of the individual analysis steps]
• Face Detection
• Overlay Text
• Scene Text
• Logo Detection
• Genre Analysis (classification: studio, indoor; genre: news)
• Audio Mining: structural analysis, automated speech recognition, speaker identification


• Structural Analysis
• Intelligent Character Recognition (ICR)
  • Character/Logo Detection
  • Character Filtering
  • Character Recognition
• Audio Analysis
  • Speaker Detection
  • Automated Speech Recognition (ASR)
• Genre Analysis / Categorization
  • graphic / real
  • indoor / outdoor
  • day / night
  • ...
• Face/Body/Object Detection, Tracking & Clustering



Structural Analysis

• Decomposition of time-based media into coherent subsections that belong together in terms of content

video → scenes → shots → subshots → frames




Shots

• Shot Boundary Detection
• Identification of
  • Hard Cuts
  • Drop Outs
  • Soft Cuts, e.g., Dissolve, Wipe, Cross-Fade, etc.

Analytical Shot Boundary Detection
• Analysis of Luminance/Chrominance Histograms
• Analysis of Edge Distribution
• Analysis of Motion Vectors

Machine Learning
• Classification of Hard/Soft Cuts based on Image Features
• K-Nearest Neighbor
• Random Forest
• Support Vector Machines

[Figures: Histogram Difference Analysis; Motion Vector Analysis]
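As an illustration of histogram difference analysis, here is a minimal sketch that flags a hard cut wherever the luminance histograms of two consecutive frames differ strongly. The bin count, the threshold and the toy frames are assumptions chosen for the example; the slides do not state the parameters used in practice, and soft cuts would need a comparison over longer windows.

```python
def luminance_histogram(frame, bins=8):
    """Normalized histogram of the 8-bit luminance values of one frame.

    A frame is given as a flat list of luminance values in 0..255.
    """
    hist = [0] * bins
    for value in frame:
        hist[value * bins // 256] += 1
    return [count / len(frame) for count in hist]


def histogram_difference(h1, h2):
    """L1 distance of two normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_hard_cuts(frames, threshold=0.5):
    """Return every index i with an assumed hard cut between frame i-1 and i."""
    histograms = [luminance_histogram(frame) for frame in frames]
    return [
        i
        for i in range(1, len(histograms))
        if histogram_difference(histograms[i - 1], histograms[i]) > threshold
    ]


# Toy sequence: three dark frames followed by three bright frames.
dark = [20, 25, 30, 22] * 4
bright = [200, 210, 220, 205] * 4
frames = [dark, dark, dark, bright, bright, bright]
print(detect_hard_cuts(frames))  # [3]
```

The same difference values could also serve as one of the image features for the machine learning classifiers listed above.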






Intelligent Character Recognition

• Preprocessing
  • Character Identification
  • Text Preprocessing
    • Text Filtering
    • Adaptation of script geometry (deskew)
    • Image quality enhancement
• Optical Character Recognition (OCR)
  • Standard OCR software (OCRopus)
• Postprocessing
  • Lexical analysis
  • Statistical / context-based filtering

Example overlay text: "Ermittlungen nach Bombenfunden"


• Preprocessing
  • Character Identification
  • Filtering
    • Local Binary Patterns (LBP)
    • Histogram of Oriented Gradients (HOG)
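The slides name Local Binary Patterns as a filtering feature without showing how it is computed. The sketch below implements the basic 3x3 LBP operator and a histogram over its codes; how such a histogram is then used to separate text from non-text regions (for example as input to a classifier) is an assumption and not taken from the slides.

```python
def lbp_code(image, row, col):
    """8-bit Local Binary Pattern code of the pixel at (row, col).

    Each of the 8 neighbours contributes a 1 bit if it is at least as
    bright as the centre pixel, read clockwise from the top-left neighbour.
    """
    center = image[row][col]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for d_row, d_col in offsets:
        bit = 1 if image[row + d_row][col + d_col] >= center else 0
        code = (code << 1) | bit
    return code


def lbp_histogram(image):
    """Histogram (256 bins) over the LBP codes of all interior pixels."""
    hist = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            hist[lbp_code(image, row, col)] += 1
    return hist


# Toy patch: a bright horizontal stroke above a darker centre pixel.
patch = [
    [9, 9, 9],
    [0, 5, 0],
    [0, 0, 0],
]
print(lbp_code(patch, 1, 1))  # 224 == 0b11100000
```

Because the code depends only on brightness comparisons, it is insensitive to uniform changes in illumination, which makes it a common choice for texture-like structures such as characters.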



Processing steps shown on the example image:
1. Original Image, Bounding Box
2. Advanced Image Enhancement
3. Standard OCR (OCRopus)
4. Context-based Spell Correction
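The final correction step can be approximated with a simple lexicon lookup. The sketch below replaces each OCR token by its closest lexicon entry using Python's standard `difflib`; it covers only the lexical part of the postprocessing and is a stand-in, since the slides do not describe the context-based method itself. The lexicon, the misrecognized input and the cutoff are assumptions for illustration.

```python
import difflib


def correct_token(token, lexicon, cutoff=0.8):
    """Replace an OCR token by its closest lexicon entry, if one is close enough."""
    if token in lexicon:
        return token
    matches = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(ocr_text, lexicon):
    """Correct a whitespace-separated OCR result token by token."""
    return " ".join(correct_token(token, lexicon) for token in ocr_text.split())


# Hypothetical lexicon and a typical OCR confusion ('l' read as '1', 'n' as 'm').
lexicon = ["Ermittlungen", "nach", "Bombenfunden"]
print(correct_text("Ermitt1ungen nach Bombenfumden", lexicon))
# Ermittlungen nach Bombenfunden
```

A context-based variant would additionally weigh candidate words by how likely they are next to their neighbours, which requires word statistics that are not part of this sketch.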



• Result: multimedia data with spatiotemporal annotations

Metadata Extraction


Metadata (e.g. MPEG-7) along the timeline:

...
<Video>
  <TemporalDecomposition>
    <VideoSegment>
      <TextAnnotation>
        <KeywordAnnotation>
          <Keyword>Astronaut</Keyword>
        </KeywordAnnotation>
      </TextAnnotation>
      <MediaTime>
        <MediaTimePoint>T00:05:05:0F25</MediaTimePoint>
        <MediaDuration>PT00H00M31S0N25F</MediaDuration>
      </MediaTime>
      ...
    </VideoSegment>
  </TemporalDecomposition>
</Video>
...
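To show how such a temporal annotation can be consumed, the following sketch parses the fragment above and converts the MPEG-7 time point and duration into seconds. It is a simplification under stated assumptions: the elided parts ("...") are left out, the fragment carries no XML namespaces (real MPEG-7 documents do), and the regular expressions handle only the full notation that appears on the slide.

```python
import re
import xml.etree.ElementTree as ET

FRAGMENT = """
<Video>
  <TemporalDecomposition>
    <VideoSegment>
      <TextAnnotation>
        <KeywordAnnotation>
          <Keyword>Astronaut</Keyword>
        </KeywordAnnotation>
      </TextAnnotation>
      <MediaTime>
        <MediaTimePoint>T00:05:05:0F25</MediaTimePoint>
        <MediaDuration>PT00H00M31S0N25F</MediaDuration>
      </MediaTime>
    </VideoSegment>
  </TemporalDecomposition>
</Video>
"""


def time_point_seconds(text):
    """Convert 'Thh:mm:ss:nFN' (n fractions of 1/N second) into seconds."""
    match = re.fullmatch(r"T(\d+):(\d+):(\d+):(\d+)F(\d+)", text.strip())
    hours, minutes, seconds, fractions, per_second = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + fractions / per_second


def duration_seconds(text):
    """Convert 'PThhHmmMssSnNNF' (n fractions of 1/N second) into seconds."""
    match = re.fullmatch(r"PT(\d+)H(\d+)M(\d+)S(\d+)N(\d+)F", text.strip())
    hours, minutes, seconds, fractions, per_second = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + fractions / per_second


def keyword_segments(xml_text):
    """Return (keyword, start, end) in seconds for every annotated segment."""
    result = []
    for segment in ET.fromstring(xml_text).iter("VideoSegment"):
        start = time_point_seconds(segment.findtext("MediaTime/MediaTimePoint"))
        length = duration_seconds(segment.findtext("MediaTime/MediaDuration"))
        for keyword in segment.iter("Keyword"):
            result.append((keyword.text.strip(), start, start + length))
    return result


print(keyword_segments(FRAGMENT))  # [('Astronaut', 305.0, 336.0)]
```

With annotations in this form, a search for "Astronaut" can jump directly to minute 5:05 of the video instead of returning the video as a whole.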




Metadata (e.g. MPEG-7) within the frame:

...
<SpatialDecomposition>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>Astronaut</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
  <SpatialMask>
    <SubRegion>
      <Polygon>
        <Coords>480 150 620 480</Coords>
      </Polygon>
    </SubRegion>
  </SpatialMask>
  ...
</SpatialDecomposition>
...


But how do the metadata become semantic?



Named Entity Recognition

[Figure, built up step by step: the entity Neil Armstrong and the classes (ontologies) Astronaut, Person, Science Occupation and Employment, connected by "is a" relations]


Video Analysis / Metadata Extraction

Semantic Multimedia Analysis

[Figure: metadata items attached along the video timeline (e.g., person xy, location yz, event abc) are connected by Entity Recognition / Mapping to external data (e.g., bibliographical data, geographical data, encyclopedic data, ...)]


Named Entity Recognition
• Mapping keyterms (text) to semantic entities
• Context Analysis and Disambiguation





Keyterm / User Tag: Jaguar

Semantic Entities (which one is meant?):
• Jaguar (Car)
• Jaguar (Cat)
• Jaguar (OS)
• Jaguar (Aircraft)


... RDF graph to find relations between entities co-occurring in a text, maintaining the hypothesis that disambiguation of co-occurring elements in a text can be obtained by finding connected elements in an RDF graph [7]. In order to regard the special compilation of non-textual data, static and user-generated metadata in audio-visual content, our novel approach combines the use of semantic technologies and Linked Data with linguistic methods.

III. METHOD

According to a study about structure and characteristics of folksonomy tags [8], an average of 83% of user-generated tags are single terms. Also, an average of 82% of the reviewed tags are nouns. Based on these study results, we ignore tag practices such as camel case ("barackObama") and treat tags as subjects or categories describing a resource. As a tag could also be part of a group of nouns representing an entity or a name ("flying machine", "albert einstein"), the tags stored as single words without any given order have to be combined in term groups of two or more terms to find all appropriate entities. Hence, every tag or group of tags within a given context may represent a distinct entity. The term combination process and subsequent mapping of terms and term groups to entities are described in sect. III-B.

To disambiguate ambiguous terms we combine two methods: a co-occurrence analysis of the terms in the context in Wikipedia articles and an analysis of the page link graph of the Wikipedia articles of entity candidates. The scores of both analysis steps are combined into a total score.

A. Context Definition

Metadata exists in a certain context and has to be interpreted according to this context. For tags of audio-visual content we identified two dimensions:

• temporal dimension
• user-centered dimension

In the temporal dimension a context can be defined as the entire video, a segment or a single timestamp in the video. The user-centered dimension classifies a context by how many users created the metadata concerned: only tags by a certain user, or all tags regardless of which user. Fig. 1 shows the combinations of the two dimensions of contexts for metadata in audio-visual content and the interpretation regarding the significance of a context.

Audio-visual content also provides the opportunity to supply spatial information. Thus, tags in the same region of a video frame are considered as related to each other. In the current approach we did not consider this context dimension.

To describe our approach we use a sample context of our test set (see sect. IV). This sample context is composed of tags by only one user at a certain timestamp in the video. The video containing this sample context is a presentation by Dr. Garik Israelian at the TED conference³ entitled "How spectroscopy could reveal alien life"⁴. Our sample context consists of the tags "hubble", "spitzer", "carbon", "dioxide", "methan", "co2", and "water".

Figure 1. Dimensions of context definition in audio-visual content

B. Preprocessing

Term Combination: Our combination algorithm takes all tags of a specified spatio-temporal context (at a certain timestamp/in a certain segment of a video, of a single URL/image) and generates every possible combination of at most three terms of the context in every possible order. In that way we make sure to recover groups of single terms that belong together. We chose to generate combinations of three words to make sure to also hit named entities consisting of more than two words, such as "public key cryptography" or "alberto santos dumont". About 90% of the DBpedia [9] labels consist of at most three words, but less than 5% consist of 4 words. Due to these numbers and performance issues we decided to limit the number of terms to be combined to three. Subsequently in this paper, by terms we will refer to single terms as well as generated term groups. The number c of combinations is calculated by

$c = \sum_{k=1}^{j} \frac{n!}{(n-k)!}$,

where n is the number of tags in the context and j is the maximum number of terms in a combination.

For our sample context containing 7 tags and at most 3 terms in a combination (j = 3), 259 combinations are generated (7 + 42 + 210).
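The count can be checked with a short sketch that generates the ordered term groups for the sample context. It uses `itertools.permutations` as a straightforward way to enumerate "every possible combination in every possible order"; the function names are hypothetical and the paper's own implementation is not shown in the excerpt.

```python
from itertools import permutations
from math import factorial


def count_combinations(n, j=3):
    """c = sum over k = 1..j of n! / (n - k)!"""
    return sum(factorial(n) // factorial(n - k) for k in range(1, min(j, n) + 1))


def term_groups(tags, j=3):
    """All ordered groups of 1..j distinct tags, each joined by a space."""
    return [
        " ".join(group)
        for k in range(1, min(j, len(tags)) + 1)
        for group in permutations(tags, k)
    ]


# Sample context from the text.
tags = ["hubble", "spitzer", "carbon", "dioxide", "methan", "co2", "water"]
groups = term_groups(tags)
print(len(groups), count_combinations(len(tags)))  # 259 259
print("carbon dioxide" in groups)                  # True
```

The group "carbon dioxide" shows why the ordering matters: only one of the two possible orders of the single tags "carbon" and "dioxide" matches an entity label.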

Term Mapping: The terms then have to be mapped to semantic entities. For our approach we use entities of the Linked Open Data Cloud [10], in particular of DBpedia, version 3.5.1.

DBpedia provides labels for the identification of distinct entities in 92 languages. We use English and German as well as Finnish labels, as we noticed that neither the English nor the German labels contain important acronyms as labels, but the Finnish language version does. As tagging users prefer to keep it simple and short [2], resources dealing with "Domain Name System" would rather be tagged with "DNS" than "Domain Name System".

After simple string matching of the terms of the context to DBpedia URIs, the URIs are revised for redirects and ...

³ http://www.ted.com
⁴ http://yovisto.com/play/14415

Context Analysis and Disambiguation
What defines a context in AV data?

• Temporal Coherence
• Spatial Coherence
• Provenance




[Figure: the context dimensions, highlighted step by step: Spatial Dimension, Temporal Dimension, User-centered Dimension]


Statistical Analysis: Co-occurrence Analysis

Keyterm "jaguar" with the context tags: 1956, wheel, rim, steve, mcqueen

Which entity fits the context?
http://dbpedia.org/resource/Jaguar_(Cats)
http://dbpedia.org/resource/Jaguar_(Cars)
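A minimal sketch of the co-occurrence idea: each candidate entity is scored by how many of the context tags also occur in the text of its article. The article snippets below are invented stand-ins for the Wikipedia article texts, and the scoring function (fraction of matching context terms) is an assumption; the actual score used in the approach is not given on the slides.

```python
def cooccurrence_score(context_terms, article_text):
    """Fraction of the context terms that occur in the candidate's article text."""
    words = set(article_text.lower().split())
    hits = [term for term in context_terms if term.lower() in words]
    return len(hits) / len(context_terms)


def disambiguate(context_terms, candidates):
    """Return the candidate URI whose article shares most terms with the context."""
    return max(
        candidates,
        key=lambda uri: cooccurrence_score(context_terms, candidates[uri]),
    )


# Toy article texts, typed in by hand for this example only.
candidates = {
    "http://dbpedia.org/resource/Jaguar_(Cats)":
        "large cat species living in forest and wetland habitats",
    "http://dbpedia.org/resource/Jaguar_(Cars)":
        "car maker whose 1956 models with wire wheel and rim were driven by steve mcqueen",
}
context = ["1956", "wheel", "rim", "steve", "mcqueen"]
print(disambiguate(context, candidates))
# http://dbpedia.org/resource/Jaguar_(Cars)
```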




Semantic Graph Analysis

Keyterm / User Tag: jaguar
Context: 1956, Steve McQueen, rim, wheel

[Figure: the candidate entities Jaguar (Car), Jaguar (Cat) and Jaguar (OS) and the context entities Steve McQueen and 1956 as nodes in the graph of the LOD Cloud]
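The graph-based step can be sketched in the same spirit: a candidate is preferred if it is directly connected to entities from the context. The link graph below is hypothetical and hand-made (it is not real LOD data), and counting direct links is just one simple way to measure connectedness.

```python
# Hypothetical link graph: entity -> set of entities it links to.
LINKS = {
    "Jaguar (Car)": {"Steve McQueen", "1956", "Wheel"},
    "Jaguar (Cat)": {"Felidae", "South America"},
    "Jaguar (OS)": {"Apple", "Mac OS X"},
    "Steve McQueen": {"Jaguar (Car)"},
}


def graph_score(candidate, context_entities, links):
    """Number of context entities directly linked to or from the candidate."""
    outgoing = links.get(candidate, set()) & set(context_entities)
    incoming = {e for e in context_entities if candidate in links.get(e, set())}
    return len(outgoing | incoming)


def best_candidate(candidates, context_entities, links):
    """Return the candidate with the most direct links into the context."""
    return max(candidates, key=lambda c: graph_score(c, context_entities, links))


context_entities = ["Steve McQueen", "1956"]
candidates = ["Jaguar (Car)", "Jaguar (Cat)", "Jaguar (OS)"]
print(best_candidate(candidates, context_entities, LINKS))  # Jaguar (Car)
```

In a combined setup, a score like this would be merged with the co-occurrence score into one total score per candidate; how the two are weighted is not specified in the source.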


(3) Explorative semantic multimedia search



Searching is not always just searching


An example:

I am looking for the novel "Wem die Stunde schlägt" (For Whom the Bell Tolls) by Ernest Hemingway, preferably in the first German-language edition


Wem die Stunde schlägt. - Ernest H E M I N G W A Y. (Stockholm usw., Bermann-Fischer Verlag, 1941) 560 S. 8“

II 1, 2506, 34548



But what do I do if...

...I liked the book 'Wem die Stunde schlägt' and now I don't know what to read next...




Explorative Search
• The user does not know exactly which search string to use
• The answer cannot be found in a single document alone
• The user is not familiar with the topic area being searched
• The user wants a general overview of a topic
• ...

• ...'browsing' instead of 'searching'
• ...finding something by chance
• ...serendipity
• ...gaining an overview
• ...exploring the search space


How do you implement an explorative multimedia search?



Explorative Multimedia Search

[Figure: metadata items attached along the video timeline (e.g., person xy, location yz, event abc) are connected by Entity Recognition / Mapping to external data (e.g., bibliographical data, geographical data, encyclopedic data, ...)]


Data is a precious thing and will last longer than the systems themselves. (Tim Berners-Lee) http://linkeddata.org/

The Web of Data - The Semantic Web



DBpedia - the Semantic Wikipedia

dbpedia:For_Whom_the_Bell_Tolls
http://dbpedia.org/page/For_Whom_the_Bell_Tolls

Which facts about dbpedia:For_Whom_the_Bell_Tolls are relevant?

...use heuristics


dbpedia:For_Whom_the_Bell_Tolls dbpedia-owl:author dbpedia:Ernest_Hemingway

[Figure, built up step by step: further works are linked to dbpedia:Ernest_Hemingway via dbpedia-owl:author; the authors dbpedia:Raymond_Carver, dbpedia:Jack_Kerouac and dbpedia:Jerome_D._Salinger are linked to him via dbpedia-owl:influenced_by; each of the three has a dbpedia-owl:notableWork link]
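The heuristic walk through the graph can be sketched with a handful of triples. The triples below are typed in by hand with the property names shown on the slides; they are illustrative and not retrieved from DBpedia, the work identifiers other than dbpedia:For_Whom_the_Bell_Tolls are assumed, and the direction of dbpedia-owl:influenced_by (author influenced by Hemingway) is an assumption read from the figure.

```python
# Illustrative triples (subject, predicate, object), not fetched from DBpedia.
TRIPLES = [
    ("dbpedia:For_Whom_the_Bell_Tolls", "dbpedia-owl:author", "dbpedia:Ernest_Hemingway"),
    ("dbpedia:The_Old_Man_and_the_Sea", "dbpedia-owl:author", "dbpedia:Ernest_Hemingway"),
    ("dbpedia:Jack_Kerouac", "dbpedia-owl:influenced_by", "dbpedia:Ernest_Hemingway"),
    ("dbpedia:Jerome_D._Salinger", "dbpedia-owl:influenced_by", "dbpedia:Ernest_Hemingway"),
    ("dbpedia:Jack_Kerouac", "dbpedia-owl:notableWork", "dbpedia:On_the_Road"),
    ("dbpedia:Jerome_D._Salinger", "dbpedia-owl:notableWork", "dbpedia:The_Catcher_in_the_Rye"),
]


def objects(subject, predicate):
    """All objects o with a triple (subject, predicate, o)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s with a triple (s, predicate, obj)."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def suggest(book):
    """Heuristic: other works by the same author first, then the notable
    works of authors who were influenced by that author."""
    suggestions = []
    for author in objects(book, "dbpedia-owl:author"):
        suggestions += [w for w in subjects("dbpedia-owl:author", author) if w != book]
        for follower in subjects("dbpedia-owl:influenced_by", author):
            suggestions += objects(follower, "dbpedia-owl:notableWork")
    return suggestions


print(suggest("dbpedia:For_Whom_the_Bell_Tolls"))
```

This answers the "what should I read next" question without the user ever formulating a search string: the suggestions follow from a fixed choice of properties that are considered relevant for books.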



What could an explorative semantic multimedia search look like...?



http://mediaglobe.yovisto.com:8080




Semantic Search Technologies: Explorative Search in Audiovisual Data

J. Waitelonis, H. Sack, Z. Kramer, J. Hercher: Semantically Enabled Exploratory Video Search, in Proc. of Semantic Search Workshop (SemSearch10) at the 19th Int. World Wide Web Conference (WWW2010), 26-30 April 2010, Raleigh, NC, USA, 2010.

http://km.aifb.kit.edu/ws/semsearch10/Files/video.pdf




http://km.aifb.kit.edu/ws/semsearch10


http://www2010.org/www/




http://bit.ly/SeMEX

http://mediaglobe.yovisto.com:8080/mggui/%23start


















Contact:
Dr. Harald Sack
Hasso-Plattner-Institut für Softwaresystemtechnik
Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam

Homepage: http://www.hpi.uni-potsdam.de/meinel/team/sack.html, http://www.yovisto.com/
Blog: http://moresemantic.blogspot.com/
E-Mail: [email protected] [email protected]
lysander07 / biblionomicon / yovisto

Thank you for your attention!




