Page 1: TEMPORAL VIDEO BOUNDARIES -PART ONE- SNUEE KIM KYUNGMIN.

Page 2:

Why do we need temporal segmentation of videos?

How do we set up boundaries in between video frames?

How do we merge two separate but uniform segments?

Page 3:

ABSTRACT

Much work has been done in automatic video analysis. But while techniques like local video segmentation, object detection and genre classification have been developed, little work has been done on retrieving the overall structural properties of a video content.

Page 4:

ABSTRACT(2)

Retrieving the overall structure of a video content means splitting the video into meaningful tokens by setting boundaries within the video. => Temporal Video Boundary Segmentation

We divide these boundaries into 3 categories : micro-, macro- and mega-boundaries.

Page 5:

ABSTRACT(3)

Our goal is to have a system for automatic video analysis, which should eventually work for applications where complete metadata is unavailable.

Page 6:

INTRODUCTION

What's going on?
-Great increase in the quantity of video content.
-More demand for content-aware apps.
-Still, the majority of video content has insufficient metadata.

=> More demand for information on temporal video boundaries.

Page 7:

BOUNDARIES : DEFINITIONS

Micro-boundaries : the shortest observable temporal segments. Usually bounded within a sequence of contiguously shot video frames.

(frames under the same micro-boundaries.)

Page 8:

Micro-boundaries are associated with the smallest video units for which a given attribute is constant or slowly varying. The attribute can be visual, audio or text.

Depending on the attribute, micro-boundaries can differ.

Page 9:

BOUNDARIES : DEFINITIONS(2)

Macro-boundaries : boundaries between different parts of the narrative or the segments of a video content.

(frames under the same macro-boundaries.)

Page 10:

Macro-boundaries are boundaries between micro-segments that are clearly identifiable organic parts of an event, defining a structural or thematic unit.

Page 12:

Mega-boundaries are boundaries between macro-segments which typically exhibit a structural and feature consistency.

Page 13:

BOUNDARIES : FORMAL DEFINITION

A video content contains three types of modalities : visual, audio and textual, and each modality has three levels : low-, mid- and high-.

These levels describe the “amount of detail” in each modality in terms of granularity and abstraction.

Page 14:

BOUNDARIES : FORMAL DEFINITION(2)

For each modality and level there is an attribute. An attribute is defined as a vector a_{m,i}(t) = (a_1, ..., a_N), the attribute vector, where:

m : denotes the modality (e.g., m = 1, 2 and 3 means visual, audio and text respectively).

i : denotes the index for the attribute (e.g., m = 1 and i = 1 indexes color).

N : denotes the total number of vector components.

t : denotes the time instant (can be expressed as an integer frame index or in milliseconds).

Page 15:

BOUNDARIES : FORMAL DEFINITION(3)

If the time interval is defined as Δt = [t_1, ..., t_K], the average and the deviation of an attribute throughout the video can be expressed as below :

(average) ā_{m,i} = (1/K) Σ_k a_{m,i}(t_k)

(deviation) σ_{m,i} = sqrt( (1/K) Σ_k ( a_{m,i}(t_k) − ā_{m,i} )² )

where the sum runs over the K sample instants t_k inside the interval.
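As a minimal sketch of these two quantities (assuming the samples inside the interval are simply stacked row-wise into an array; the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def attribute_stats(samples):
    """Per-component average and deviation of an attribute over a time interval.

    samples: array of shape (K, N) holding the K attribute vectors
    a_{m,i}(t_k) observed inside the interval, each with N components."""
    samples = np.asarray(samples, dtype=float)
    avg = samples.mean(axis=0)                          # average vector
    dev = np.sqrt(((samples - avg) ** 2).mean(axis=0))  # deviation vector
    return avg, dev

# Example: a 3-component attribute sampled at 4 instants.
avg, dev = attribute_stats([[0.20, 0.50, 0.30],
                            [0.25, 0.45, 0.30],
                            [0.20, 0.50, 0.30],
                            [0.30, 0.40, 0.30]])
print(avg, dev)
```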

Page 16:

BOUNDARIES : FORMAL DEFINITION(4)

By using the vectors defined previously, we now have two different methods to estimate temporal boundaries : the Local Method and the Global Method.

Local Method (has no memory) : given a threshold τ and a distance metric 'Dist', if Dist(a_{m,i}(t), a_{m,i}(t−1)) is larger than τ, then there exists a boundary at instant t.

Global Method (has memory) : the difference is computed over a series of time, so we calculate the distance metric against the universal (running) average instead of the previous attribute. If Dist(a_{m,i}(t), ā_{m,i}) > τ, a boundary exists at instant t.
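A minimal sketch of both methods, assuming each instant has already been reduced to an attribute vector and using the L1 norm as 'Dist'; the function names and threshold handling are illustrative, not taken from the slides:

```python
import numpy as np

def l1_dist(a, b):
    """L1 distance used as 'Dist' in this sketch."""
    return float(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).sum())

def local_boundaries(attrs, tau):
    """Local method (no memory): compare each attribute vector with the previous one."""
    return [t for t in range(1, len(attrs)) if l1_dist(attrs[t], attrs[t - 1]) > tau]

def global_boundaries(attrs, tau):
    """Global method (with memory): compare each attribute vector with the running average."""
    boundaries = []
    running_sum = np.asarray(attrs[0], dtype=float)
    for t in range(1, len(attrs)):
        avg = running_sum / t                      # average of attrs[0..t-1]
        if l1_dist(attrs[t], avg) > tau:
            boundaries.append(t)
        running_sum = running_sum + np.asarray(attrs[t], dtype=float)
    return boundaries
```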

Page 17:

MICRO-BOUNDARIES

In multimedia, the term “shot” or “take” is widely used. A similar concept can be used to define the segment between micro-boundaries, which is often called a “family of frames.”

Each segment has a representative frame called a “keyframe.” The keyframe of a family has audio/video data that well represents the segment, but the method to pick out the keyframe may vary.

Page 18:

MICRO-BOUNDARIES(2)

Each family has a “family histogram”, which eventually forms a “superhistogram.”

A family histogram is a data structure that represents the color information of a family of frames.

A superhistogram is a data structure that contains the information about non-contiguous family histograms within the larger video segment.

Page 19:

MICRO-BOUNDARIES(3)

Generation of family histograms and superhistograms may vary depending on the pre-defined dimensions below.

1) The amount of memory
-No memory means comparing only with the previous frame.

2) Contiguity of compared families
-Determining the time step.

3) Representation for a family
-How we choose the keyframe.

Page 20:

MICRO-BOUNDARIES : FAMILY OF FRAMES

An image histogram is a vector representing the color values and the frequency of their occurrence in the image.

Finding the difference between consecutive histograms and merging similar histograms enables generating families of frames.

For each frame, we compute its histogram and then search the previously computed family histograms to find the closest match.
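A minimal sketch of this step, assuming frames arrive as H x W x 3 RGB arrays; the coarse 8-bin-per-channel quantization and the L1 match are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Normalized, coarsely quantized RGB histogram of one frame."""
    hist, _ = np.histogramdd(frame.reshape(-1, 3).astype(float),
                             bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    hist = hist.ravel()
    return hist / hist.sum()

def closest_family(hist, family_hists):
    """Return (index, distance) of the family histogram closest to `hist`
    under the L1 distance, or (None, inf) if no families exist yet."""
    best, best_d = None, float("inf")
    for j, fam in enumerate(family_hists):
        d = float(np.abs(hist - fam).sum())
        if d < best_d:
            best, best_d = j, d
    return best, best_d
```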

Page 21:

MICRO-BOUNDARIES : FAMILY OF FRAMES(2)

There are several ways to compute the histogram difference : L1, L2, bin-wise histogram intersection and histogram intersection.

Among them, the L1 distance and bin-wise histogram intersection gave the best results.
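The formulas themselves are not reproduced on the slide, so the sketch below is one plausible reading of the four comparisons named in the experiments, written for normalized histograms; the paper's exact definitions may differ:

```python
import numpy as np

def l1(h1, h2):
    return float(np.abs(h1 - h2).sum())

def l2(h1, h2):
    return float(np.sqrt(((h1 - h2) ** 2).sum()))

def histogram_intersection(h1, h2):
    # Similarity in [0, 1] for normalized histograms; 1 - value can serve as a distance.
    return float(np.minimum(h1, h2).sum())

def binwise_intersection(h1, h2):
    # Per-bin overlap ratio averaged over the non-empty bins
    # (an assumed reading of "bin-wise histogram intersection").
    num, den = np.minimum(h1, h2), np.maximum(h1, h2)
    mask = den > 0
    return float((num[mask] / den[mask]).mean())
```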

Page 22:

MICRO-BOUNDARIES : BOUNDARY DETECTION

If the difference between two family histograms is less than a given threshold, the current histogram is merged into the family histogram.

Each family histogram consists of :

1) pointers to each of the constituent histograms and frame numbers.

2) a merged family histogram.

Page 23:

MICRO-BOUNDARIES : BOUNDARY DETECTION(2)

Merging of family histograms is performed by averaging the constituent histograms (basically, the mean of all the histograms in the family).
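A minimal sketch of such a merge, assuming each family keeps its merged histogram together with the frame numbers it covers; weighting by the number of member frames is an assumption consistent with taking the mean of all constituent histograms:

```python
import numpy as np

def merge_families(fam_a, fam_b):
    """Merge two families, each a dict {"hist": np.ndarray, "frames": list[int]}.
    The merged histogram is the frame-count-weighted mean of the two."""
    na, nb = len(fam_a["frames"]), len(fam_b["frames"])
    merged_hist = (na * fam_a["hist"] + nb * fam_b["hist"]) / (na + nb)
    return {"hist": merged_hist, "frames": fam_a["frames"] + fam_b["frames"]}
```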

Page 24:

MICRO-BOUNDARIES : BOUNDARY DETECTION(3)

There are multiple ways to compare and merge families, depending on the choice of contiguity and memory.

1) Contiguous with zero memory

2) Contiguous with limited memory

3) Non-contiguous with unlimited memory

4) Hybrid : first a new frame histogram is compared against the contiguous frames, and then the generated family histograms are merged using the non-contiguous case.

Page 25:

MICRO-BOUNDARIES : EXPERIMENTS

-CNN news sample, 27,000 frames.

-Tested with 9, 30, 90 and 300 bins in HSB, and 512 bins in RGB.

-Multiple histogram comparisons : L1, L2, bin-wise intersection and histogram intersection.

-Tried on 100 threshold values.

Page 26:

MICRO-BOUNDARIES : EXPERIMENTS(2)

Tested on a video clip, the best results were obtained with a threshold of 10, the L1 comparison, the contiguous-with-limited-memory boundary method, and the HSB space quantized to 9 bins.

Page 27:

MICRO-BOUNDARIES : EXPERIMENTS(3)

Page 28:

MACRO-BOUNDARIES

A story is a complete narrative structure conveying a continuous thought or event. We want micro-segments from the same story to be in the same macro-segment.

Usually we need textual cues (transcripts) for setting such boundaries, but this paper suggests methodologies that do the job solely with audio and visual cues.

We focus on the observation that stories are characterized by multiple constant or slowly varying multimedia attributes.

Page 29:

MACRO-BOUNDARIES(2)

Two types of uniform segment detection : unimodal and multimodal.

Unimodal (within a single modality) : when a video segment exhibits the “same” characteristic over a period of time using a single type of modality.

Multimodal : the analogous case, where the characteristic is exhibited across several modalities.

Page 30:

MACRO-BOUNDARIES : SINGLE MODALITY SEGMENTATION

In the case of audio-based segmentation :

1) Partition the continuous audio stream into non-overlapping segments.

2) Classify the segments using low-level audio features like bandwidth (a sketch follows below).

3) Divide the audio signal into portions of different classes (speech, music, noise, etc.).
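A minimal sketch of step 2, using two common low-level features (short-time energy and zero-crossing rate) and a toy threshold rule; the feature set, thresholds and labels here are illustrative assumptions, not the paper's classifier:

```python
import numpy as np

def segment_features(samples):
    """Low-level features of one mono audio segment (float samples)."""
    samples = np.asarray(samples, dtype=float)
    energy = float(np.mean(samples ** 2))                         # short-time energy
    zcr = float(np.mean(np.abs(np.diff(np.sign(samples)))) / 2)   # zero-crossing rate
    return energy, zcr

def classify_segment(samples, energy_floor=1e-4, zcr_speech=0.1):
    """Toy rule: near-silent -> 'noise', high zero-crossing rate -> 'speech',
    otherwise 'music' (placeholder thresholds)."""
    energy, zcr = segment_features(samples)
    if energy < energy_floor:
        return "noise"
    return "speech" if zcr > zcr_speech else "music"
```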

Page 31:

MACRO-BOUNDARIES : SINGLE MODALITY SEGMENTATION(2)

In the case of text-based segmentation :

1) If a transcript doesn't exist, extract text data from the audio stream using speech-to-text conversion.

2) The transcript is segmented with respect to a predefined topic list.

3) A frequency-of-word-occurrence metric is used to compare incoming stories with the profiles of manually pre-categorized stories (a sketch follows below).
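A minimal sketch of step 3, representing each story and each pre-categorized profile as a word-frequency vector and picking the most similar profile; cosine similarity is an assumed choice, since the slide only names a frequency-of-word-occurrence metric:

```python
from collections import Counter
import math

def word_freq(text):
    """Bag-of-words frequency profile of a piece of transcript text."""
    return Counter(text.lower().split())

def cosine(f1, f2):
    """Cosine similarity between two word-frequency profiles."""
    dot = sum(f1[w] * f2[w] for w in set(f1) & set(f2))
    norm = math.sqrt(sum(v * v for v in f1.values())) * math.sqrt(sum(v * v for v in f2.values()))
    return dot / norm if norm else 0.0

def categorize(story_text, profiles):
    """profiles: dict mapping a topic name to representative text for that topic.
    Returns the topic whose profile is most similar to the incoming story."""
    story = word_freq(story_text)
    return max(profiles, key=lambda topic: cosine(story, word_freq(profiles[topic])))
```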

Page 32:

MACRO-BOUNDARIES : MULTIMODAL SEGMENTS

What we want to do : retrieve better segmentation results by combining the results from various unimodal segmentations.

What we need to do : first the pre-merging steps, and then the descent steps.

Page 33:

MACRO-BOUNDARIES : MULTIMODAL SEGMENTS(2)

Pre-merging steps : detect micro-segments that exhibit uniform properties, and determine attribute templates for further segmentation.

1) Uniform segment detection

2) Intra-modal segment clustering

3) Attribute template determination
-Attribute template : a combination of numbers that characterize the attribute.

4) Dominant attribute determination

5) Template application

Page 34:

MACRO-BOUNDARIES : MULTIMODAL SEGMENTS(3)

Descent methods : to make combinations of multimedia segments across multiple modalities, each attribute, with its segments of uniform values, is associated with a line (a timeline on which its uniform segments appear as intervals).

Page 35:

MACRO-BOUNDARIES : MULTIMODAL SEGMENTS(4)

The single descent method describes the process of generating story segments by combining these segments.

1) Single descent with intersecting union

2) Single descent with intersection (sketched below)

3) Single descent with secondary attribute

4) Single descent with conditional union
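A minimal sketch of one of these combinations, the intersection variant: uniform segments from two attributes, each given as a sorted list of (start, end) intervals on the same timeline, are intersected so that only spans where both attributes are uniform survive. The function name and interval representation are illustrative.

```python
def intersect_segments(segs_a, segs_b):
    """Intersect two sorted lists of (start, end) uniform segments."""
    result, i, j = [], 0, 0
    while i < len(segs_a) and j < len(segs_b):
        start = max(segs_a[i][0], segs_b[j][0])
        end = min(segs_a[i][1], segs_b[j][1])
        if start < end:                      # the two segments overlap
            result.append((start, end))
        if segs_a[i][1] < segs_b[j][1]:      # advance the segment that ends first
            i += 1
        else:
            j += 1
    return result

# Example: uniform visual segments vs. uniform audio segments (in seconds).
print(intersect_segments([(0, 30), (35, 80)], [(10, 50), (60, 90)]))
# -> [(10, 30), (35, 50), (60, 80)]
```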

Page 36:

MACRO-BOUNDARIES : EXPERIMENTS

-Single descent process with conditional union.

-Used the text transcript as the dominant attribute.

-Uniform visual/audio segments

-Uniform audio segments

You can find a lag between the story beginning and the production of the transcript.

Page 37:

Questions?