www.monash.edu.au
A real-time behavior recognition framework for visual surveillance
Mahfuzul Haque
Manzur Murshed
May 24, 2015
Motivation
Are we really protected?
Motivation
A large number of surveillance cameras have been deployed in recent years.
London Heathrow Airport alone has more than 5000 cameras!!
Motivation
Dependence on human monitors has increased.
Reliability of surveillance systems has decreased.
Research Question
How to recognize unusual, unsafe and abnormal human and group behaviors from a surveillance video stream in real time?
• Automatic detection of abnormal behaviors to aid the human monitors
• Reduce the dependence on human monitors
• Improve the reliability of surveillance systems for ensuring human security
Proposed Research Framework
A real-time behavior recognition framework for visual surveillance
Surveillance video stream → 1. Environment Modeling → identified active agents → 2. Feature Extraction and Agent Classification → classified active agents → 3. Agent Tracking with Occlusion Handling → tracked trajectories → 4. Event/Behavior Recognition (consulting a pattern database) → high-level description of unusual actions and interactions → Alarm!
Targeted Behaviors
Mob violence
Crowding
Sudden group
formation/deformation
Shooting
Public panic
Research Problems
1. Environment Modeling
How to extract the active regions from a surveillance video stream?
Challenges!!
• Background initialization is not a practical approach in real-world deployments
• Dynamic nature of the background environment due to illumination variation, local motion, camera displacement and shadow
Background Subtraction: Current frame − Background = Moving foreground
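The subtraction step above can be sketched in a few lines of NumPy; the per-pixel threshold of 30 intensity levels is an illustrative assumption, not a value from the framework:

```python
import numpy as np

def subtract_background(frame, background, threshold=30):
    """Label a pixel as foreground when it differs from the background
    estimate by more than `threshold` intensity levels (assumed value)."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold  # boolean foreground mask

# Toy example: a static background with one bright "object" in the frame.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200  # moving object region
mask = subtract_background(frame, background)
print(int(mask.sum()))  # 4 foreground pixels
```

A fixed threshold like this is exactly what the adaptive per-pixel models on the following slides replace.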
Environment Modeling in Literature (1 of 4)
Pixel-based approaches
• Single Gaussian Model (Wren et al., PAMI '97)
• Gaussian Mixture Model (Stauffer et al., CVPR '99; Lee, PAMI '05)
• Generalized Gaussian Mixture Model (Allili et al., CRV '07)
• Gaussian Mixture Model with SVM (Zhang et al., THS '07)
• Cascaded Classifiers (Chen et al., WMVS '07)
Related terms: environment modeling, background subtraction, background modeling, background maintenance, foreground detection, moving foreground detection, object detection, moving object detection.
Environment Modeling in Literature (2 of 4)
Region and texture-based approaches: incorporate neighborhood information using block or texture measures. (Sheikh et al., PAMI '07; Heikkila et al., PAMI '06; Schindler et al., ACCV '06)
Shape-based approaches: use shape-based features instead of color features. (Noriega et al., BMVC '06; Jacobs et al., WMVC '07)
Environment Modeling in Literature (3 of 4)
Predictive modeling: uses probabilistic prediction of the expected background. (Toyama et al., ICCV '99; Monnet et al., ICCV '03)
Model initialization approaches: recover a clear background from a given sequence containing moving objects. (Gutchess et al., ICCV '01; Wang et al., ACCV '06; Figueroa et al., IVC '06)
Environment Modeling in Literature (4 of 4)
Nonparametric background modeling: density estimation based on a sample of intensity values. (Elgammal et al., ECCV '00)
Stationary foreground detection: uses multiple models operating on multiple time scales. (Cheng et al., WMVC '07)
2. Agent Classification
How to classify the active regions in real-time?
Challenges!!
• Identifying the appropriate features for the targeted behaviors
• Real-time classification using those features
Example classes: Vehicle, Single Person, Person carrying object, People in Group
Classification hierarchy: Active Regions → Human / Non-human; Human → Single Person / People in Groups; Single Person → Carrying Object / Not Carrying any Object
Features
• Position
• Width/Height
• Centroid/Perimeter
• Aspect Ratio
• Compactness
• Others….
Which features to use?
B. Liu and H. Zhou (NNSP' 03)
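As a rough illustration of the listed features, the sketch below computes centroid, aspect ratio and compactness from a binary blob mask. The exact definitions used here (aspect ratio as width/height, compactness as perimeter²/4πA) are common choices assumed for illustration, not necessarily those of the cited work:

```python
import numpy as np

def blob_features(mask):
    """Compute simple shape features from a binary blob mask.
    Definitions are illustrative assumptions, not the thesis's exact ones."""
    ys, xs = np.nonzero(mask)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    area = int(mask.sum())
    # Boundary pixels: blob pixels with at least one 4-neighbour outside the blob.
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = area - int((mask & interior).sum())
    return {
        "centroid": (xs.mean(), ys.mean()),
        "aspect_ratio": width / height,
        "compactness": perimeter ** 2 / (4 * np.pi * area),
    }

blob = np.zeros((10, 10), dtype=bool)
blob[2:8, 3:7] = True  # a 6-tall, 4-wide rectangle
f = blob_features(blob)
print(round(f["aspect_ratio"], 2))  # 0.67
```

Features like these are cheap enough to evaluate per frame, which is what makes real-time classification of active regions plausible.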
Agent Classification in Literature
Generic classification approaches
• Binary image classification techniques
• Algorithms for calculating ellipticity, rectangularity, and triangularity
• Feature evaluation techniques
Classification using tracked trajectories
Domain-specific classifiers
• Coastline surveillance system: classifying different kinds of ships
• Traffic monitoring system: vehicles (including motorcycle, car, bus and truck) and humans (including pedestrian and bicyclist)
• Industrial robot manipulator: classifying objects on a moving conveyor
• Residential security system: identifying humans, pets, and other objects
3. Occlusion Handling during Tracking
Occlusion handling is a major problem in visual surveillance.
During occlusion, only portions of each object are visible, often at very low resolution.
Challenges!!
Better models need to be developed to maintain the correspondence between features and eliminate errors when tracking multiple objects.
Occlusion Handling in Literature (1 of 3)
The most practical method for addressing occlusion is the use of multiple cameras.
Progress is being made using statistical methods to predict object pose, position, and so on from the available image information.
Occlusion Handling in Literature (2 of 3)
Region-based tracking works well in scenes containing only a few objects (such as highways).
Active contour-based tracking reduces computational complexity and can track under partial occlusion, but is sensitive to the initialization of tracking.
Occlusion Handling in Literature (3 of 3)
Model-based tracking: high computational cost, unsuitable for real-time implementations.
Feature-based tracking can handle occlusion between two objects as long as the velocities of their centroids are distinguishable.
[Figure: bounding box of width × height with centroid at (x, y)]
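A minimal sketch of feature-based tracking by centroid association between consecutive frames; the greedy nearest-neighbour matching and the 50-pixel gate are illustrative assumptions, not the framework's actual tracker:

```python
import math

def match_centroids(prev, curr, max_dist=50.0):
    """Greedily associate each previous-frame centroid with the nearest
    unused current-frame centroid within `max_dist` pixels (assumed gate)."""
    matches = {}
    used = set()
    for i, (px, py) in enumerate(prev):
        best, best_d = None, max_dist
        for j, (cx, cy) in enumerate(curr):
            if j in used:
                continue
            d = math.hypot(cx - px, cy - py)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches[i] = best
            used.add(best)
    return matches

prev = [(10, 10), (100, 100)]
curr = [(103, 98), (12, 11)]
print(match_centroids(prev, curr))  # {0: 1, 1: 0}
```

When two objects occlude each other, their centroids approach and this association becomes ambiguous, which is exactly the failure mode the slide describes.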
4. Behavior Recognition
Challenges!!
• Identifying the time-varying features for a particular behavior
• Automatic learning of behaviors
• Recognizing the learned behaviors in different scenarios
How to learn and recognize a particular behavior?
Movement pattern → Behavior Recognition (consulting a pattern database) → e.g., crowd violence, sudden group formation
Behavior Recognition in Literature (1 of 3)
Real-time system for recognizing human behaviors including following another person and altering one’s path to meet another. (Oliver et al. PAMI’ 00)
Real-time system to determine whether people are carrying objects, depositing an object, exchanging bags. (Haritaoglu et al. PAMI’ 00)
Following another person
Altering one’s path to
meet another
Carrying object
Depositing an object
Exchanging objects
Behavior Recognition in Literature (2 of 3)
Identifying abnormal movement patterns. (Grimson et al. CVPR’ 98)
Interaction patterns among a group of people based on simple statistics computed on tracked trajectories. Behaviors: loitering, stalking and following. (Wei et al. ICME’ 04)
Real-time behavior interpretation from traffic video for producing lexical output. (Kumar et al. ITS’ 05)
Abnormal movement pattern
Loitering
Stalking
Following
Target moving towards point
Target crossing a point
Target stopped at a point
Behavior Recognition in Literature (3 of 3)
Tracking groups of people in metro scenes and recognizing abnormal behaviors: appearance/disappearance of groups, group dynamics (split and merge) and failure of the motion detector. (Cupillard et al. WAVS' 01)
Analyzing vehicular trajectories for recognizing driving patterns. (Niu et al. ICSP’ 03)
Surveillance event primitives: entry/exit, crowding, splitting and track loss. (Guha et al. VSPETS’ 05)
Appearance of groups
Disappearance of groups
Merging of groups
Splitting of groups
Turn/Stop
Entry/Exit
Crowding
Track loss
Addressed Research Problem
Environment Modeling in the Proposed Framework
Environment Modeling
Surveillance video stream → Environment Modeling → identified moving objects
Baseline
Pixel-based approaches are more suitable for visual surveillance.
The most popular and widely used pixel-based method was introduced at MIT by Stauffer and Grimson (CVPR' 99).
A Gaussian Mixture Model (GMM) was used for environment modeling.
Improved adaptability was proposed by Lee (PAMI' 05).
Environment Modeling using Gaussian Mixtures
[Figure: three example scenes — sky/cloud/leaf, road/shadow/moving car, floor/shadow/walking people. The intensity history x of each pixel is modeled by a mixture of Gaussian densities P(x), each with its own mean µ and variance σ², e.g., separate components for Sky, Cloud, Leaf and Person.]
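The per-pixel mixture update can be sketched as follows, in the spirit of Stauffer and Grimson (CVPR' 99). The simplified second learning rate ρ = α and all parameter values are illustrative assumptions, not the exact formulation used in the thesis:

```python
import math

ALPHA, M_TH, K = 0.01, 2.5, 3   # learning rate, matching threshold, models per pixel
V0, W0 = 900.0, 0.05            # initial high variance / low weight for a new model

def update_pixel(models, x):
    """One GMM update step for a single pixel value `x`.
    Each model is a dict with weight w, mean mu, variance var."""
    matched = None
    for m in models:
        if abs(x - m["mu"]) < M_TH * math.sqrt(m["var"]):
            matched = m
            break
    for m in models:
        m["w"] *= (1 - ALPHA)   # decay all weights
    if matched is not None:
        matched["w"] += ALPHA
        rho = ALPHA  # simplified second learning rate (assumption)
        matched["mu"] += rho * (x - matched["mu"])
        matched["var"] += rho * ((x - matched["mu"]) ** 2 - matched["var"])
    elif len(models) < K:
        models.append({"w": W0, "mu": float(x), "var": V0})
    else:
        # No match and no room: replace the least probable model.
        weakest = min(models, key=lambda m: m["w"] / math.sqrt(m["var"]))
        weakest.update(w=W0, mu=float(x), var=V0)
    total = sum(m["w"] for m in models)
    for m in models:
        m["w"] /= total          # renormalize weights
    models.sort(key=lambda m: m["w"] / math.sqrt(m["var"]), reverse=True)
    return matched is not None

models = []
for _ in range(100):
    update_pixel(models, 100)    # a steady background intensity
print(models[0]["mu"])  # 100.0
```

A stable background intensity quickly dominates the mixture, while transient foreground values spawn low-weight, high-variance components.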
Moving Object Detection
Per-pixel mixture from Frame 1 to Frame N: road (ω₁, µ₁, σ₁²), shadow (ω₂, µ₂, σ₂²), car (ω₃, µ₃, σ₃²), with weights e.g. 65%, 20%, 15%.
The K models are ordered by ω/σ. The first B models are selected as background:
B = argmin_b ( Σ_{k=1}^{b} ω_k > T ), with T = 70%
T is the minimum portion of the data in the environment accounted for by the background.
A new pixel value Xt matches a model when |Xt − µ| < Mth · σ.
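The background-model selection rule B = argmin_b (Σ_{k=1}^{b} ω_k > T) amounts to taking the smallest prefix of the ordered models whose cumulative weight exceeds T. A minimal sketch, using the slide's example weights:

```python
def background_models(models, T=0.7):
    """Return the first B models (assumed already ordered by w/sigma)
    whose cumulative weight first exceeds T; T = 0.7 mirrors the slide."""
    cum = 0.0
    for b, m in enumerate(models, start=1):
        cum += m["w"]
        if cum > T:
            return models[:b]
    return models

models = [{"w": 0.65, "label": "road"},
          {"w": 0.20, "label": "shadow"},
          {"w": 0.15, "label": "car"}]
print([m["label"] for m in background_models(models)])  # ['road', 'shadow']
```

With T = 70%, road alone (65%) is not enough, so road and shadow are classed as background and the car component remains foreground.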
An Observation
Current frame, background model and moving foreground with T = 70% vs. T = 90%:
the detection result changes with T, so this model is sensitive to the environment!!
Not an ideal approach for the proposed framework!!
Background Representation
How to obtain a visual representation of the background from the environment model?
Why? Background subtraction needs an explicit background image: Current frame − Background = Moving foreground.
Each of the K models (e.g., road, shadow, car, evolving from Frame 1 to Frame N) stores a weight ωᵢ, mean µᵢ, variance σᵢ² and a representative value mᵢ; models are ordered by ω/σ.
Which value should be used to represent the background?
Background representation: m_j, where j = argmax_{1 ≤ i ≤ K} ωᵢ
Representation of the Computed Background
(a) Test Frame; (b) Lee's Formulation; (c) Proposed Approach
Lee (PAMI' 05) gave an intuitive solution: compute the expected value of the observations believed to be background:
E[X|B] = Σ_{k=1}^{K} E[X|G_k] · P(B|G_k) · P(G_k) / Σ_{j=1}^{K} P(B|G_j) · P(G_j)
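Lee's expected-background formula reduces to a weighted average of the model means. In this sketch, P(B|G_k) is assumed to be supplied per model (in Lee's method it comes from the estimated background posterior); the numbers are invented for illustration:

```python
def expected_background(models):
    """E[X|B] = sum_k mu_k * P(B|G_k) * P(G_k) / sum_j P(B|G_j) * P(G_j),
    with E[X|G_k] = mu_k, P(G_k) = w, and P(B|G_k) = p_bg (assumed given)."""
    num = sum(m["mu"] * m["p_bg"] * m["w"] for m in models)
    den = sum(m["p_bg"] * m["w"] for m in models)
    return num / den

models = [{"mu": 100.0, "w": 0.65, "p_bg": 0.9},   # road
          {"mu": 60.0,  "w": 0.20, "p_bg": 0.8},   # shadow
          {"mu": 180.0, "w": 0.15, "p_bg": 0.1}]   # car
print(round(expected_background(models), 1))  # 93.2
```

Because the result blends the means of several models, the computed background can be a value no model ever observed, which motivates the proposed single-model representation.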
Another Observation
K = 3 models per pixel, e.g., road (ω₁ = 65%), shadow (ω₂ = 20%), car (ω₃ = 15%), each with (ω, µ, σ², m) and ordered by ω/σ, evolving from Frame 1 to Frame N.
When a new pixel value matches none of the K models, a new model (ω, µ, σ², m) must replace an existing one. Which model should be dropped?
Selecting the least probable model for the new pixel value could sacrifice the most appropriate model representing the background!
Contradiction in the model dropping strategy!!
Model Dropping Strategy
K = 3 models per pixel, e.g., road (ω₁ = 65%), shadow (ω₂ = 20%), car (ω₃ = 15%), each with (ω, µ, σ², m) and ordered by ω/σ.
Objectives
• To have a realistic background representation
• To retain the most contributing background models as long as possible
Which model should be dropped? The model having the least evidence for representing the background.
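The two dropping strategies can be contrasted in a small sketch. Interpreting "least evidence for representing the background" as the lowest weight alone is this sketch's assumption; the thesis's actual MDS criterion may differ:

```python
import math

def drop_index_original(models):
    """Original strategy: drop the least probable model (lowest w/sigma)."""
    return min(range(len(models)),
               key=lambda i: models[i]["w"] / math.sqrt(models[i]["var"]))

def drop_index_modified(models):
    """Modified strategy (assumed interpretation): drop the model with the
    least evidence of being background, approximated here by the lowest
    weight, so a diffuse but genuine background mode is not sacrificed."""
    return min(range(len(models)), key=lambda i: models[i]["w"])

models = [{"w": 0.65, "var": 25.0},   # road: dominant background
          {"w": 0.15, "var": 100.0},  # car: transient foreground
          {"w": 0.20, "var": 900.0}]  # shadow: real but high-variance background
print(drop_index_original(models), drop_index_modified(models))  # 2 1
```

Here the original rule would evict the shadow model (a genuine background mode penalized for its large variance), while the weight-based rule evicts the transient car model instead, illustrating the contradiction the slide points out.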
Representation of the Computed Background
(a) Test Frame; (b) Lee's Formulation; (c) Proposed (ODS); (d) Proposed (MDS)
ODS - Original Dropping Strategy
MDS - Modified Dropping Strategy
And it works!
Background Response from Pixel Model - 1
Background Response from Pixel Model - 2
Experiments
Datasets
• 14 test sequences in total
• 5 PETS sequences (Performance Evaluation of Tracking and Surveillance)
• 7 Wallflower sequences (Microsoft Research)
• 2 other sequences
Evaluation: moving object detection (Current frame − Background = Moving foreground)
• Compared with the two most widely used GMM-based methods: Stauffer and Grimson (CVPR' 99) and Lee (PAMI' 05)
• Results are evaluated both visually and numerically: False Positive (FP), False Negative (FN), False Classification (FP + FN)
Involved parameters, thresholds and constants
• Learning rate (α)
• Maximum number of distributions per pixel model (K), with K = 3
• Matching threshold (Mth)
• Subtraction threshold (Sth)
• Initial high variance assigned to a new distribution (V0)
• Initial low weight assigned to a new distribution (W0)
Experimental Results (PETS Dataset)
Columns: First Frame, Test Frame, Ground Truth, GMM (Stauffer), GMM (Lee), Proposed (ODS), Proposed (MDS)
Rows: (1) PETS2000; (2) PETS2006-S7-T6-B-1; (3) PETS2006-S7-T6-B-2; (4) PETS2006-S7-T6-B-3; and (5) PETS2006-S7-T6-B-4.
Experimental Results (Wallflower Sequences)
Columns: First Frame, Test Frame, Ground Truth, GMM (Stauffer), GMM (Lee), Proposed (ODS), Proposed (MDS)
Rows: (6) Bootstrap; (7) Camouflage; (8) Foreground Aperture; (9) Light Switch; (10) Moved Object; (11) Time Of Day; and (12) Waving Tree
Experimental Results (Football and Walk)
Columns: First Frame, Test Frame, Ground Truth, GMM (Stauffer), GMM (Lee), Proposed (ODS), Proposed (MDS)
Rows: (13) Football; and (14) Walk
Experimental Results (Numeric Evaluation)
False Negative
Experimental Results (Numeric Evaluation)
False Positive
Experimental Results (Numeric Evaluation)
False Negative + False Positive
Environment Modeling
Contributions
• Independent of any environment-sensitive parameter
• Better detection quality than existing GMM-based methods
• No post-processing step required
• Operational with the same parameter setting in different environments
• Fault tolerant under small camera displacement
Surveillance video stream → Environment Modeling → identified moving objects
Timetable
Tasks spanning the first, second and third years: Literature Review, Environment Modeling, Object Classification, Tracking/Occlusion, Behavior Recognition, Thesis Writing.
Acknowledgments
• http://www.fotosearch.com/DGV464/766029/
• http://www.cyprus-trader.com/images/alert.gif
• http://security.polito.it/~lioy/img/einstein8ci.jpg
• http://www.dtsc.ca.gov/PollutionPrevention/images/question.jpg
• http://www.unmikonline.org/civpol/photos/thematic/violence/streetvio2.jpg
• http://www.airports-worldwide.com/img/uk/heathrow00.jpg
• http://www.highprogrammer.com/alan/gaming/cons/trips/genconindy2003/exhibit-hall-crowd-2.jpg
• http://www.bhopal.org/fcunited/archives/fcu-crowd.jpg
• http://img.dailymail.co.uk/i/pix/2006/08/passaPA_450x300.jpg
• http://www.defenestrator.org/drp/files/surveillance-cameras-400.jpg
• http://www.cityofsound.com/photos/centre_poin/crowd.jpg
• http://www.hindu.com/2007/08/31/images/2007083156401501.jpg
• http://paulaoffutt.com/pics/images/crowd-surfing.jpg
• http://msnbcmedia1.msn.com/j/msnbc/Components/Photos/070225/070225_surveillance_hmed.hmedium.jpg
URLs of the images used in this presentation
Thank you!
Q&A