Nonchronological Video Synopsis and Indexing TPAMI 2008 Yael Pritch, Alex Rav-Acha, and Shmuel Peleg, Member, IEEE 1

Dec 18, 2015

Nonchronological Video Synopsis and Indexing

TPAMI 2008

Yael Pritch, Alex Rav-Acha, and Shmuel Peleg, Member, IEEE

Outline

Introduction

Related Work on Video Abstraction

Synopsis by Energy Minimization

Object-based Synopsis

Synopsis of Endless Video

Limitations and Failures

Demo

Introduction

Video browsing and retrieval are time-consuming; most captured video is never watched or examined.

Video synopsis provides a short video representation, while preserving the essential activities of the original video.

The activity in the video is condensed into a shorter period by simultaneously showing multiple activities, even when they originally occurred at different times.

The synopsis video is also an index of the original video by pointing to the original time of each activity.

Introduction

The properties of video synopsis:

The video synopsis should be shorter than the original video.

The video synopsis is also a video, expressing the dynamics of the scene.

Reduce as much spatiotemporal redundancy as possible; to achieve this, the relative timing between activities may change.

Visible seams and fragmented objects should be avoided.

Introduction

Video synopsis can make surveillance cameras and webcams more useful by giving the viewer summaries.

The synopsis server analyzes the live video feed for interesting events and records an object-based description of the video.

The description includes duration, location, and appearance.

For example: Event: an airplane takes off. Object: the airplane. Description: the takeoff time, the airport, and the airplane's appearance.

In a 3D space-time description of the video, each object is represented by a “tube”.

Introduction

Fig. 1. The input video shows a walking person and, after a period of inactivity, displays a flying bird. A compact video synopsis can be produced by playing the bird and the person simultaneously.

Fig. 2. Basic temporal rearrangement of objects. Objects of interest are defined and viewed as tubes in the space-time volume. (a) Two objects recorded at different times are shifted to the same time interval in the shorter video synopsis. (b) A single object moving during a long time is broken into segments having a shorter duration, and those segments are shifted in time and played simultaneously. (c) Intersection of objects does not disturb the synopsis when object tubes are broken into segments. Each panel shows the input video and the resulting video synopsis.

Related Work on Video Abstraction

Fast forwarding

Individual frames or groups of frames are skipped at fixed or adaptive intervals.

Simple, but only complete frames can be removed.

The video condensation ratio is relatively low.

Video summarization

Key frames are extracted and usually presented simultaneously as a storyline.

Loses the dynamics of the original video.

Related Work on Video Abstraction

Video montage

Spatial and temporal shifts are applied to objects to create a video summary.

This paper uses only temporal transformations, keeping spatial location intact.

Fig. 3. Comparison between “video montage” and our approach. (a) A frame from a “video montage.” Two space-time regions were shifted in both time and space and then stitched together. Visual seams between the different regions are unavoidable. (b) A frame from a “video synopsis.” Only temporal shifts were applied, enabling seamless stitching.

Synopsis by Energy Minimization

Any synopsis pixel S(x,y,t) can come from an input pixel I(x,y,M(x,y,t)). The time shift M is obtained by minimizing a cost function with two terms:

Ea(M): activity cost; indicates the loss in activity.

Activity measure: difference from the background.

Ed(M): discontinuity cost; the sum of color differences across seams between spatiotemporal neighbors.

ei: the six unit vectors representing the six spatiotemporal neighbors (four spatial and two temporal).
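The cost function itself is not reproduced in the slide text. As a sketch recalled from the paper's formulation, it could be written as below. The weight α and the background image B are symbols that do not appear in this transcript and are assumed here.

```latex
E(M) = E_a(M) + \alpha\, E_d(M)

E_a(M) = \sum_{(x,y,t) \in I} \chi(x,y,t) \;-\; \sum_{(x,y,t) \in S} \chi\bigl(x,y,M(x,y,t)\bigr),
\qquad \chi(x,y,t) = \lVert I(x,y,t) - B(x,y,t) \rVert

E_d(M) = \sum_{(x,y,t) \in S} \sum_{i} \bigl\lVert S\bigl((x,y,t) + e_i\bigr) - I\bigl((x,y,M(x,y,t)) + e_i\bigr) \bigr\rVert^{2}
```

Under this reading, E_a is the activity left out of the synopsis, and E_d penalizes a synopsis pixel whose neighbors differ from the neighbors it had in the input.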

Synopsis by Energy Minimization

Fig. 4. (a) The shorter video synopsis S is generated from the input video I by including most active pixels together with their spatiotemporal neighborhood. To assure smoothness, when pixel A in S corresponds to pixel B in I, their “cross border” neighbors in space as well as in time should be similar.

Synopsis by Energy Minimization

A seam exists between two neighboring locations (x1,y1) and (x2,y2) in S if M(x1,y1) != M(x2,y2).

ei: the four unit vectors describing the four spatial neighbors.

K: number of frames in the output.

N: number of frames in the input.

Fig. 4. (b) An approximate solution can be obtained by restricting consecutive synopsis pixels to come from consecutive input pixels.

Object-based Synopsis

The low-level approach to video synopsis described earlier is limited to satisfying local properties, such as avoiding visible seams.

Higher level object-based properties can be incorporated when objects can be detected and tracked.

To enable segmentation of moving foreground objects, we start with background construction.

For short video clips, use a temporal median over the entire clip.

For surveillance cameras, use a temporal median over a few minutes before and after each frame.
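The two background-construction variants can be sketched as follows. This is a minimal illustration assuming grayscale frames stacked in a NumPy array of shape (T, H, W); the function names and the `half_window` parameter are hypothetical, not taken from the paper.

```python
import numpy as np

def median_background(frames):
    """Per-pixel temporal median of a (T, H, W) stack of grayscale frames."""
    return np.median(frames, axis=0)

def windowed_background(frames, t, half_window):
    """Temporal median over the frames within half_window of frame t."""
    lo = max(0, t - half_window)
    hi = min(len(frames), t + half_window + 1)
    return np.median(frames[lo:hi], axis=0)

# Toy clip: constant background (100) with a bright object (255) that
# covers a different column in each frame.
frames = np.full((9, 4, 9), 100.0)
for t in range(9):
    frames[t, :, t] = 255.0

background = median_background(frames)
local_background = windowed_background(frames, t=4, half_window=2)
```

Because the object covers each pixel in only one frame, the median recovers the background value at every pixel in both variants.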

Object-based Synopsis

Fig. 6. Background images from a surveillance camera at Stuttgart airport. The bottom images are at night, while the top images are in daylight. Parked cars and parked airplanes become part of the background.

Object-based Synopsis

Use a simplification of [32] to compute the space-time tubes representing dynamic objects.

The method of [32] is high quality and runs in real time.

Given a single video sequence with a moving foreground object and a stationary background, it uses background subtraction together with color and contrast cues to extract the foreground accurately and efficiently.

Each tube b is represented by its characteristic function:

tb: the time interval in which this object exists.
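The characteristic function is missing from the slide text. A form consistent with the activity measure defined earlier (difference from the background) would be the following sketch, where B denotes the background image, a symbol assumed here:

```latex
\chi_b(x,y,t) =
\begin{cases}
  \lVert I(x,y,t) - B(x,y,t) \rVert & \text{if } t \in t_b \\
  0 & \text{otherwise}
\end{cases}
```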

[32] J. Sun, W. Zhang, X. Tang, and H. Shum, “Background Cut,” Proc. Ninth European Conf. Computer Vision, pp. 628-641, 2006.

Object-based Synopsis

Fig. 7. Four extracted tubes shown “flattened” over the corresponding backgrounds from Fig. 6. The left tubes correspond to ground vehicles, while the right tubes correspond to airplanes on the runway at the back.

Object-based Synopsis

Create a synopsis having maximum activity while avoiding collision between objects.

The optimal synopsis video is defined as the one that minimizes the following energy function:
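The energy function did not survive in the slide text. A form consistent with the three costs this deck describes (activity, temporal consistency, collision) would be the sketch below. Here B is the set of tubes, b̂ is tube b after its temporal shift, and α and β are weights; all of these symbol names are assumptions.

```latex
E(M) = \sum_{b \in B} E_a(\hat{b})
     + \sum_{b, b' \in B} \Bigl( \alpha\, E_t(\hat{b}, \hat{b}') + \beta\, E_c(\hat{b}, \hat{b}') \Bigr)
```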

Object-based Synopsis

Activity cost: penalizes objects that are not mapped to a valid time in the synopsis.

Collision cost: for every two shifted tubes, the collision cost is defined as the volume of their space-time overlap weighted by their activity measures.

Reducing the weight of the collision cost results in a denser video in which objects may overlap.

Increasing this weight results in a sparser video in which objects do not overlap and less activity is presented.
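The collision cost described above can be sketched as the sum, over the frames two shifted tubes share, of the product of their activity values. In this illustration each tube is assumed to be a (T, H, W) NumPy array of activity values that is zero outside the object, and `start_a` / `start_b` are the tubes' hypothetical start frames in the synopsis.

```python
import numpy as np

def collision_cost(act_a, start_a, act_b, start_b):
    """Activity-weighted space-time overlap of two temporally shifted tubes."""
    lo = max(start_a, start_b)                      # first shared frame
    hi = min(start_a + len(act_a), start_b + len(act_b))  # one past last shared
    if hi <= lo:
        return 0.0                                  # no common time, no collision
    a = act_a[lo - start_a:hi - start_a]
    b = act_b[lo - start_b:hi - start_b]
    return float(np.sum(a * b))

# Two 3-frame tubes that occupy the same pixel with activity 2 and 3.
tube_a = np.zeros((3, 2, 2))
tube_a[:, 0, 0] = 2.0
tube_b = np.zeros((3, 2, 2))
tube_b[:, 0, 0] = 3.0

overlapping = collision_cost(tube_a, 0, tube_b, 1)  # two shared frames
separated = collision_cost(tube_a, 0, tube_b, 3)    # no shared frames
```

Shifting the second tube past the end of the first removes the collision entirely, which is the trade-off the weight on this cost controls.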

Object-based Synopsis

Temporal consistency cost: preserves the chronological order of events.

The amount of interaction d(b, b’) between each pair of tubes is estimated from their relative spatiotemporal distance.

d(b,b’,t): the Euclidean distance between tubes b and b’ in frame t.

σ_space: the extent of the space interaction between tubes.

If tubes b and b’ do not share a common time in the synopsis video, their interaction is estimated from their temporal distance instead.

σ_time: the extent of time in which events still have temporal interaction.
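A minimal sketch of the interaction estimate, assuming it decays exponentially with distance (the form recalled from the paper; the formulas are not in the slide text). Each tube is simplified here to a dictionary mapping a synopsis frame index to one (x, y) position, and the parameter names `sigma_space` and `sigma_time` are assumptions.

```python
import math

def interaction(track_a, track_b, sigma_space, sigma_time):
    """Estimate d(b, b') for two tubes given as {frame: (x, y)} tracks."""
    common = track_a.keys() & track_b.keys()
    if common:
        # Shared time: use the closest spatial approach over the shared frames.
        d_min = min(math.dist(track_a[t], track_b[t]) for t in common)
        return math.exp(-d_min / sigma_space)
    # No shared time: use the gap between the earlier tube's end
    # and the later tube's start.
    first, second = sorted((track_a, track_b), key=lambda track: min(track))
    gap = min(second) - max(first)
    return math.exp(-gap / sigma_time)

tube_a = {0: (0, 0), 1: (1, 0)}
tube_b = {1: (4, 4), 2: (5, 4)}   # shares frame 1 with tube_a, 5 pixels apart
tube_c = {11: (0, 0)}             # starts 10 frames after tube_a ends

near = interaction(tube_a, tube_b, sigma_space=5.0, sigma_time=10.0)
later = interaction(tube_a, tube_c, sigma_space=5.0, sigma_time=10.0)
```

Tubes that come close together, or that follow each other closely in time, get a high interaction value, so reordering them is penalized more strongly.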

Object-based Synopsis

Energy minimization

Use simple greedy optimization.

The optimization is applied in the space of all possible temporal mappings M.

As the initial state, use the state in which all tubes are shifted to the beginning of the synopsis video.

To accelerate computation, restrict the temporal shifts of tubes to jumps of 10 frames.
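The steps above can be sketched with a toy energy. This is not the paper's implementation: each tube is reduced to a hypothetical (length, lane) pair, the energy counts only the frames of temporal overlap between tubes in the same lane, and the greedy rule (accept any single-tube shift that lowers the energy) is one plausible reading of "simple greedy optimization".

```python
from itertools import combinations

STEP = 10  # temporal shifts are restricted to jumps of 10 frames

def overlap(start_1, len_1, start_2, len_2):
    """Number of frames two intervals have in common."""
    return max(0, min(start_1 + len_1, start_2 + len_2) - max(start_1, start_2))

def energy(tubes, shifts):
    """Toy energy: temporal overlap between tubes that share a lane."""
    total = 0
    for i, j in combinations(range(len(tubes)), 2):
        (len_i, lane_i), (len_j, lane_j) = tubes[i], tubes[j]
        if lane_i == lane_j:
            total += overlap(shifts[i], len_i, shifts[j], len_j)
    return total

def greedy_shifts(tubes, synopsis_len):
    """Start with every tube at frame 0 and greedily accept improving shifts."""
    shifts = [0] * len(tubes)
    best = energy(tubes, shifts)
    improved = True
    while improved:
        improved = False
        for i, (length, _lane) in enumerate(tubes):
            for s in range(0, synopsis_len - length + 1, STEP):
                trial = shifts[:i] + [s] + shifts[i + 1:]
                e = energy(tubes, trial)
                if e < best:
                    shifts, best, improved = trial, e, True
    return shifts, best

# Two tubes in lane 0 collide at the start; the tube in lane 1 does not.
tubes = [(20, 0), (20, 0), (20, 1)]
final_shifts, final_energy = greedy_shifts(tubes, synopsis_len=40)
```

In this example the first tube is moved to frame 20, which removes the only collision while the other two tubes stay at the beginning.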

Object-based Synopsis

Stroboscopic panoramic synopsis

When long tubes exist in the input video, the duration of the synopsis video is bounded from below by the length of the longest tube.

Object-based Synopsis

Surveillance application

Fig. 12. Video synopsis from street surveillance. (a) A typical frame from the original video (22 seconds). (b) A frame from a video synopsis movie (2 seconds) showing condensed activity. (c) A frame from a shorter video synopsis (0.7 second) showing even more condensed activity.

Synopsis of Endless Video

Goal: make webcam resources more useful. A system built on object-based synopsis makes it possible to deal with endless videos.

Query to the system

For example: I would like to watch in one minute a synopsis of the video from this camera captured during the last hour.

Response to the query

The most interesting events (tubes) are collected from the desired period and are assembled into a synopsis video of the desired length.

The synopsis video is an index into the original video as each object includes a pointer to its original time.

Synopsis of Endless Video

Removing stationary frames

Filter out frames with no activity during the online phase.

Record frames according to two criteria:

A global change in the scene, measured by the SSD between the incoming frame and the last kept frame. This captures lighting changes.

The existence of a moving object, measured by the maximal SSD in small windows.

Assuming that moving objects with a very short duration (e.g., less than a second) are not important, video activity needs to be measured only once every 10 frames.
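The two recording criteria can be sketched as below. This is an illustration under stated assumptions: frames are grayscale NumPy arrays, the windowed SSD is also taken against the last kept frame (the slide does not say which frame it is compared with), and the thresholds and window size are hypothetical values.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized images."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d * d))

def max_window_ssd(a, b, win):
    """Largest SSD over non-overlapping win x win windows."""
    h, w = a.shape
    best = 0.0
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            best = max(best, ssd(a[y:y + win, x:x + win], b[y:y + win, x:x + win]))
    return best

def keep_frame(frame, last_kept, global_thresh, local_thresh, win=8):
    """Keep a frame on a global (lighting) change or a local (object) change."""
    return (ssd(frame, last_kept) > global_thresh
            or max_window_ssd(frame, last_kept, win) > local_thresh)

base = np.zeros((16, 16), dtype=np.uint8)
moving = base.copy()
moving[0:4, 0:4] = 200                             # small bright object
brighter = np.full((16, 16), 70, dtype=np.uint8)   # scene-wide lighting change

keep_static = keep_frame(base, base, global_thresh=1e6, local_thresh=1e5)
keep_moving = keep_frame(moving, base, global_thresh=1e6, local_thresh=1e5)
keep_brighter = keep_frame(brighter, base, global_thresh=1e6, local_thresh=1e5)
```

In an online setting this check would run only on every tenth incoming frame, as noted above.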

Synopsis of Endless Video

The object queue

The main challenge is handling endless videos.

The naive scheme is to throw out the oldest activity, but this is not a good choice.

Instead, estimate the importance of each object to possible future queries and throw objects out accordingly, based on:

Importance (activity)

Collision potential (spatial activity distribution)

Age

Other options are possible, such as specifying which kind of activity is of interest.

Fig. 14. The spatial distribution of activity in the airport scene (intensity is log of activity value). The activity distribution of a single tube is on the left and the average over all tubes is on the right. As expected, the highest activity is on the car lanes and on the runway. The potential for the collision of tubes is higher in regions having a higher activity.

Synopsis of Endless Video

Synopsis generation

Generate a background video.

A consistency cost is computed for each object and for each possible time in the synopsis.

An energy minimization determines which tubes appear in the synopsis and at what time.

The selected tubes are combined with the background.

Synopsis of Endless Video

Time-lapse background

Represent the background changes over time.

Represent the background of the activity tubes.

Construct two temporal histograms:

A temporal activity histogram Ha of the video stream.

A uniform temporal histogram Ht of the video stream.

Compute a third histogram by interpolating between the two histograms.
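A minimal sketch of the interpolation, assuming a linear blend with a weight `lam` between the normalized activity histogram and the uniform histogram. The second function, which picks background frames so that equal histogram mass lies between consecutive picks, follows the selection rule as recalled from the paper; both function names and `lam` are assumptions.

```python
def blend_histograms(h_activity, lam):
    """lam * normalized activity histogram + (1 - lam) * uniform histogram."""
    n = len(h_activity)
    total = sum(h_activity)
    return [lam * (a / total) + (1 - lam) / n for a in h_activity]

def pick_background_times(hist, k):
    """Pick k bin indices so that equal histogram mass separates the picks."""
    total = sum(hist)
    targets = [(i + 0.5) * total / k for i in range(k)]
    picks, cumulative, j = [], 0.0, 0
    for index, mass in enumerate(hist):
        cumulative += mass
        while j < k and cumulative >= targets[j]:
            picks.append(index)
            j += 1
    return picks

h_activity = [0, 0, 8, 8]  # all activity falls in the last two time bins

activity_only = pick_background_times(blend_histograms(h_activity, lam=1.0), k=2)
uniform_only = pick_background_times(blend_histograms(h_activity, lam=0.0), k=2)
```

With `lam=1.0` the background frames follow the activity, and with `lam=0.0` they are spread evenly over time; intermediate values trade off the two goals listed on this slide.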

Synopsis of Endless Video

Consistency with background

Prefer to stitch tubes to background images having a similar appearance.

Final energy function:
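The final energy function is missing from the slide text. A form consistent with the object-based costs plus a term for consistency with the background would be the sketch below. E_s (the stitching cost against the chosen background) and the weights α, β, γ are symbol names assumed here, as recalled from the paper.

```latex
E(M) = \sum_{b \in B} \Bigl( E_a(\hat{b}) + \gamma\, E_s(\hat{b}) \Bigr)
     + \sum_{b, b' \in B} \Bigl( \alpha\, E_t(\hat{b}, \hat{b}') + \beta\, E_c(\hat{b}, \hat{b}') \Bigr)
```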

Synopsis of Endless Video

Stitching the synopsis video

Use a modification of Poisson editing [20] to deal with objects coming from different lighting conditions.

Overlapping tubes are blended together by letting each pixel be a weighted average of the corresponding pixels from the stitched activity tubes, with weights proportional to the activity measures.
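The weighted-average blending can be sketched as follows, assuming the pixels and activity measures of the n overlapping tubes are stacked into NumPy arrays of shape (n, H, W). The function name is hypothetical, and the Poisson-editing step is not shown.

```python
import numpy as np

def blend_overlap(pixels, activities):
    """Blend n overlapping tubes with weights proportional to activity.

    pixels and activities both have shape (n, H, W). Pixels where no tube
    is active get weight zero everywhere and come out as 0.
    """
    total = activities.sum(axis=0)
    safe_total = np.where(total > 0, total, 1.0)  # avoid division by zero
    weights = activities / safe_total
    return (weights * pixels).sum(axis=0)

# Two tubes cover the same pixel: value 100 with activity 1, value 200 with activity 3.
pixels = np.array([[[100.0]], [[200.0]]])
activities = np.array([[[1.0]], [[3.0]]])
blended = blend_overlap(pixels, activities)
```

The more active tube dominates the result: 0.25 × 100 + 0.75 × 200 = 175.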

[20] P. Pérez, M. Gangnet, and A. Blake, “Poisson Image Editing,” Proc. ACM SIGGRAPH ’03, pp. 313-318, July 2003.

Limitations and failures

Video synopsis is less applicable in several cases, some of which are listed below:

Video with already dense activity. All locations are active all the time. An example is a camera in a busy train station.

Edited video, like a feature movie. The intentions of the movie creator may be destroyed by changing the chronological order of events.

Demo

http://www.vision.huji.ac.il/video-synopsis/