www.petamedia.eu
IRP of Special Interest Group 2 - Leader: TU Berlin
Tools for Tag Generation

Introduction

The aim of this integrative research project (IRP) is the generation of tags and metadata using signal processing and/or users' annotations. This IRP deals with algorithms for key frame extraction and video shot clustering that enable users to tag videos more easily.

[Workflow diagram - modules and partners: Database (TUB); Annotation (TUB, TUD); Shot/subshot boundary detection (TUB); Visual quality (EPFL); Key framing (TUB); Text detection / feature extraction (TUB); Video clustering (TUB, EPFL, TUD, QMUL)]

Integration activities

The IRP's main topic is the integration of expertise in the areas of image search engines, audio/video signal processing, machine learning, and text detection/recognition.

Preparations

The first activity was to set up a database of videos from the unstructured channel "Travel" on YouTube.com using the "NUE YouTube Downloader". This tool is also useful for other IRPs, e.g. "Social Media Acquisition".

[Figure: Tool used for setting up the database]

This common database, consisting of 100 videos and affiliated metadata (keywords, comments, user information, etc.), was then annotated for shot boundaries.

Key Frame Extraction

Temporal video segmentation divides the video stream into a set of segments, from each of which one representative frame is extracted based on attention features. Key frame extraction methods are a simple yet effective way of summarizing a long video sequence and can be used for applications that only work on images, such as search engines (CBIR) or image clustering algorithms. These key frames can also be used for automatic or manual tagging, because they facilitate users' annotation.

[Figure: Extracted key frames of a video sequence]

Tag Generation

One topic of this IRP is the generation of tags. The following aspects have been considered:

• Quality Tags: Key frames are used to produce quality tags by no-reference video quality assessment.
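The idea behind such quality tags can be illustrated with a minimal sketch: a no-reference sharpness score computed as the variance of a Laplacian response over a key frame, thresholded into a coarse quality tag. The image representation (a list of lists of grayscale values) and the threshold value are illustrative assumptions, not part of the project's actual pipeline.

```python
# Hedged sketch of a no-reference quality tag: frames whose Laplacian
# variance (a common sharpness proxy) exceeds a threshold are tagged
# "good quality". The threshold and data format are assumptions.

def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1]
                   + gray[y][x + 1] - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def quality_tag(gray, threshold=100.0):
    """Map the sharpness score to a coarse quality tag (threshold is a guess)."""
    return "good quality" if laplacian_variance(gray) >= threshold else "low quality"

# A flat (blur-like) frame versus one containing a strong edge:
flat = [[128] * 8 for _ in range(8)]
edged = [[0] * 4 + [255] * 4 for _ in range(8)]
print(quality_tag(flat))   # flat frame: Laplacian variance 0 -> "low quality"
print(quality_tag(edged))  # strong edge -> "good quality"
```

In a real system the score would be computed with an image library on decoded video frames; the hand-rolled convolution here only serves to keep the example self-contained.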
[Figure: "good quality" tags derived by no-reference video quality assessment]

• Tags derived from Text: Recognizing text within video sequences is another possibility for generating tags. A further possibility is to identify persons and locations by analyzing the sentence structure of affiliated descriptions and user comments.

• Tags generated by concept detectors (indoor/outdoor, face, etc.)

Clustering

A fundamental step in this video summarization is to create a similarity matrix and organize the key frames into a tree structure using the ant-tree clustering method.

[Figure: Tree structuring of video frames by ant-tree clustering]

Low-level features and tags of key frames are clustered to find related video content and are visualized using the FastMap algorithm applied to distance matrices. The propagation of widely shared tags within compact clusters is to be studied.

[Figure: Clustered key frames to perform similarity search]

Future Work

• Automatic ROI Image Tagging: A solution for automatic image tagging can be achieved by object duplicate detection in static images or key frames. The goal of the detection is to propagate the tags of objects from a training set.

[Figure: Tag propagation by object duplicate detection]

• Subject Classification: Automatic subject tagging of a video involves the assignment of a subject label to a video object or to a time point within a video object. The subject label reflects the semantic theme treated by the video; it reflects what the video is about rather than what is depicted in the visual channel.

• Semantic Key Frame Extraction: Semantic key frame extraction is the task of selecting one or more key frames to represent the intellectual content of a video or of a given segment of the video stream.
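The temporal segmentation underlying key frame extraction can be sketched in a minimal form: declare a shot boundary where the distance between consecutive frame descriptors exceeds a threshold, then keep one representative frame per shot. The histogram representation, the L1 distance, and the threshold are illustrative assumptions, not the project's actual attention-based method.

```python
# Hedged sketch of key frame extraction by temporal segmentation:
# cut where the L1 distance between consecutive frame histograms
# exceeds a threshold, then take the middle frame of each shot.
# Descriptor, distance, and threshold are illustrative assumptions.

def hist_distance(h1, h2):
    """L1 distance between two normalised histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def key_frame_indices(histograms, threshold=0.5):
    """Return one representative frame index per detected shot."""
    boundaries = [0]
    for i in range(1, len(histograms)):
        if hist_distance(histograms[i - 1], histograms[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(histograms))
    # middle frame of each shot as its key frame
    return [(start + end - 1) // 2
            for start, end in zip(boundaries, boundaries[1:])]

# Two synthetic "shots": three dark frames followed by two bright ones.
dark, bright = [0.9, 0.1], [0.1, 0.9]
frames = [dark, dark, dark, bright, bright]
print(key_frame_indices(frames))  # -> [1, 3], one key frame per shot
```

The resulting key frames could then feed the similarity matrix and clustering steps described above; a production system would of course use richer descriptors than a two-bin histogram.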
• Visual Reranking to Improve Video Retrieval: Low-level visual features will be exploited to improve semantic-theme-based retrieval of videos indexed using speech recognition transcripts of their spoken content.

Contact

Coordination: Pascal Kelm
Web: www.petamedia.eu
Email: [email protected]