http://imagelab.ing.unimo.it
Tutorial: multicamera and distributed video surveillance
Third ACM/IEEE International Conference on Distributed Smart Cameras, ICDSC 2009
30/08/2009, Como (Italy)
Prof. Rita Cucchiara
Università di Modena e Reggio Emilia, Italy
Agenda
This tutorial addresses algorithms and techniques of computer vision and pattern recognition for multicamera and distributed video surveillance.
When multiple (heterogeneous) cameras are connected in a forest of sensors, the standard techniques used in single fixed-camera surveillance are no longer sufficient.
Different approaches should be adopted depending on the camera layout (e.g., overlapping or non-overlapping fields of view), the camera motion (e.g., fixed or PTZ cameras), the network capability, and the availability of computational resources in the smart cameras for early processing.
The tutorial aims at presenting a short survey of the research activities in this area, mainly focusing on people surveillance; models and algorithms for object segmentation and tracking in multi-camera environments will be presented in detail, with several demos from ImageLab of Modena.
Techniques for people detection in cluttered environments will be presented. Finally, recent advances in trajectory analysis for people behavior classification in distributed camera systems will be discussed.
Benchmark videos with ground truth and tutorial material will be available for the tutorial.
CV & PR for multicamera and distributed surveillance
[Pipeline diagram]
- Segmentation (into blobs) and tracking (observation-model correspondence), at each frame
- Region-of-interest selection and tracking (model-observation correspondence), at each frame
- Data association and consistent labeling across camera views
- Recognition, identification and classification; event detection; situation assessment; action, interaction and behavior analysis, at the server side or in the distributed nodes
Multi-camera surveillance with overlapping FoVs
Using multiple sensors/cameras has many advantages:
- wider coverage of the scene
- multi-modal sensing
- redundant data (improved accuracy)
- fault tolerance
Multiple cameras require consistent labeling.
Consistent labeling requires homography and techniques for solving tracking in overlapping FoVs under a planar constraint.
[26] A. C. Sankaranarayanan, A. Veeraraghavan, and R. Chellappa, "Object Detection, Tracking and Recognition for Multiple Smart Cameras," Proceedings of the IEEE, Vol. 96, No. 10, October 2008.
[45] S. M. Khan and M. Shah, "Tracking Multiple Occluding People by Localizing on Multiple Scene Planes," IEEE Trans. on PAMI, Vol. 31, No. 3, March 2009.
[46] S. M. Khan and M. Shah, "A Multiview Approach to Tracking People in Crowded Scenes Using a Planar Homography Constraint," ECCV 2006.
Consistent labeling: overview
Appearance-based approaches base the matching essentially on the color of the objects: color histograms, faces, …
Geometry-based approaches exploit geometrical relations and constraints between the different views: homography-based, epipolar geometry, calibration-less, …
Mixed approaches combine geometric information with information provided by the visual appearance: probabilistic information fusion, Bayesian networks, …
[47] J. Orwell, P. Remagnino, G. Jones, "Multi-camera colour tracking," Proc. of Second IEEE Workshop on Visual Surveillance (VS'99), 1999, pp. 14–21.
[48] K. Nummiaro, E. Koller-Meier, T. Svoboda, D. Roth, L. Van Gool, "Color-based object tracking in multi-camera environments," DAGM 2003, pp. 591–599.
[49] Z. Yue, S. Zhou, R. Chellappa, "Robust two-camera tracking using homography," Proc. of IEEE Int'l Conf. on Acoustics, Speech, and Signal Processing, Vol. 3, 2004, pp. 1–4.
[50] A. Mittal, L. Davis, "M2Tracker: A multi-view approach to segmenting and tracking people in a cluttered scene," IJCV 51(3), 2003, pp. 189–203.
[51] S. Khan, M. Shah, "Consistent labeling of tracked objects in multiple cameras with overlapping fields of view," IEEE Trans. on PAMI 25(10), 2003, pp. 1355–1360.
[52] J. Kang, I. Cohen, G. Medioni, "Continuous tracking within and across camera streams," Proc. of IEEE CVPR, Vol. 1, 2003, pp. I-267–I-272.
[53] S. Dockstader, A. Tekalp, "Multiple camera tracking of interacting and occluded human motion," Proc. of the IEEE 89(10), 2001, pp. 1441–1455.
Consistent labeling: overview
[52] J. Kang, I. Cohen, G. Medioni, "Continuous tracking within and across camera streams," Proc. of IEEE CVPR 2003.
[54] C. Stauffer, K. Tieu, "Automated multi-camera planar tracking correspondence modeling," Proc. of IEEE CVPR 2003.
[47] J. Orwell, P. Remagnino, G. Jones, "Multi-camera colour tracking," Proc. of VS'99.
[55] S. Chang, T.-H. Gong, "Tracking multiple people with a multi-camera system," Proc. of IEEE Workshop on Multi-Object Tracking, 2001.
[56] Z. Yue, S. Zhou, R. Chellappa, "Robust two-camera tracking using homography," Proc. of IEEE ICASSP 2004.
[51] S. Khan, M. Shah, "Consistent labeling of tracked objects in multiple cameras with overlapping fields of view," IEEE Trans. on PAMI 25(10), 2003.
[53] S. Dockstader, A. Tekalp, "Multiple camera tracking of interacting and occluded human motion," Proc. of the IEEE 89(10), 2001.
[57] H. Tsutsui, J. Miura, Y. Shirai, "Optical flow-based person tracking by multiple cameras," Proc. 2001 Int. Conf. on Multisensor Fusion.
[58] J. Krumm, S. Harris, B. Meyers, B. Brumitt, M. Hale, S. Shafer, "Multi-camera multi-person tracking for EasyLiving," Proc. of IEEE Int'l Workshop on Visual Surveillance (VS'00).
[59] Q. Zhou, J. Aggarwal, "Object tracking in an outdoor environment using fusion of features and cameras," Image and Vision Computing 24(11), 2006.
[60] J. Black, T. Ellis, "Multi camera image tracking," Image and Vision Computing 24(11), 2006.
[48] K. Nummiaro, E. Koller-Meier, T. Svoboda, D. Roth, L. Van Gool, "Color-based object tracking in multi-camera environments," DAGM 2003.
[61] M. H. Tan, S. Ranganath, "Multi-camera people tracking using Bayesian networks," FPCRM 2003.
[62] A. Mittal, L. Davis, "Unified multi-camera detection and tracking using region matching," Proc. of IEEE Workshop on Multi-Object Tracking, 2001.
[50] A. Mittal, L. Davis, "M2Tracker: A multi-view approach to segmenting and tracking people in a cluttered scene," IJCV 51(3), 2003.
From PAMI 2008
The solution at Imagelab
HECOL (Homography and Epipolar-based COnsistent Labeling)
The ground-plane homography and the epipolar geometry are automatically computed from training videos.
The person's main axis is warped to the other view, and Bayesian inference is used to validate the hypotheses.
[63] S. Calderara, A. Prati, R. Cucchiara, "Bayesian-competitive Consistent Labeling for People Surveillance," IEEE Trans. on PAMI, Feb. 2008.
[64] S. Calderara, A. Prati, R. Cucchiara, "HECOL: Homography and Epipolar-based Consistent Labeling for Outdoor Park Surveillance," Computer Vision and Image Understanding, 2008.
Video surveillance at ImageLab
[System diagram] The ImageLab video surveillance framework:
- Fixed cameras (Sakbot): segmentation, tracking
- PTZ cameras: ROI- and model-based tracking; PTZ control; high-resolution detection; head tracking, face selection, face obscuration, face recognition & people identification
- Moving and mobile cameras: mosaicing, segmentation & tracking
- Sensors: sensor data acquisition, ad-hoc tracking
- Consistent labeling and multicamera tracking: geometry recovery (homography & epipoles), HECOL
- People annotation: trajectory analysis, behavior recognition, action analysis, posture analysis
- Video surveillance ontology (VISOR web); MPEG streaming; annotated video storage
- Outputs: security control centers, mobile surveillance platforms (Moses)
HECOL on-line and off-line processes
PRMA, Barcelona 2006
1) Off-line process for homography and epipole detection, and construction of a Camera Transition Graph;
2) detection and tracking in each single-camera system;
3) consistent labeling at each new track detection.
Homography
One to one
A point in 3D space is projected to a single point in the image plane. Thus, knowing the position of the point in space and the camera parameters, we can determine where the point falls in the image plane.
One to many
Conversely, a point in the image plane corresponds to many 3D points: all the points on the line passing through the optical center and the source point.
[Figure: pinhole projection. A 3D point A(X_A, Y_A, Z_A) projects through the optical center C to the single image point A'(x_A, y_A) on the image plane I; world axes X, Y, Z with origin O.]
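To make the two relations concrete, here is a minimal numpy sketch (the camera matrix and the point are made-up values, not from the tutorial): projecting a 3D point gives one pixel, while every 3D point on the ray through the optical center projects to that same pixel.

```python
import numpy as np

# Hypothetical camera: P = K [I | 0], optical center at the origin.
P = np.array([[800.0,   0.0, 320.0, 0.0],
              [  0.0, 800.0, 240.0, 0.0],
              [  0.0,   0.0,   1.0, 0.0]])

# One-to-one: the 3D point A projects to a single image point A'.
A = np.array([1.0, 2.0, 5.0, 1.0])      # homogeneous world coordinates
a = P @ A
print("A' =", a[:2] / a[2])             # (x_A, y_A) = (480, 560)

# One-to-many: every point on the ray through the optical center and A
# projects to the same pixel, so A' alone cannot recover A.
for k in (0.5, 2.0, 10.0):
    ak = P @ np.array([k, 2 * k, 5 * k, 1.0])
    print(k, ak[:2] / ak[2])            # identical pixel for every k
```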
Horizontal homography
But if we have at least one constraint (e.g., the point lies at Z=0), from a point P in the image plane we can recover its position in real space.
[Figure: with the planar constraint the back-projection of P becomes unique, e.g. P(X_P, Y_P=0, Z_P) on the floor (Y=0), or P(X_P, Y_P, Z_P=c) on a wall (Z=c).]
Horizontal homography
Rectification: under the hypothesis that all the points lie on the ground plane.
Homographic transformation matrix
A homography is a linear, non-singular transformation of the projective plane into itself.
Given a plane (e.g., the ground plane) and two of its bidimensional projections (two image planes), the homographic transformation links the coordinates of the points in the two reference systems.
H is defined up to a scale factor and thus has 8 degrees of freedom.
It can be estimated from 4 point correspondences (4 points give 8 equations).
In homogeneous coordinates, $\mathbf{x}' = H\,\mathbf{x}$:

$$\begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \end{pmatrix} = \begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \qquad x' = \frac{x'_1}{x'_3}, \quad y' = \frac{x'_2}{x'_3}$$
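A minimal numpy sketch of this 4-point estimation (the correspondences are made-up; production code would use a normalized DLT or OpenCV's cv2.getPerspectiveTransform):

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for h1..h8 (h9 fixed to 1): each correspondence gives
    2 linear equations, so 4 non-collinear points give 8 equations."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

# Made-up ground-plane correspondences between two views
src = [(0, 0), (100, 0), (100, 100), (0, 100)]
dst = [(12, 8), (115, 16), (120, 130), (9, 122)]
H = homography_from_4_points(src, dst)

# Warp a point: x' = H x in homogeneous coordinates, then normalize
x = np.array([50.0, 50.0, 1.0])
xp = H @ x
print(xp[:2] / xp[2])
```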
Homography computation
The homography is a planar projective transformation that relates the coordinates of points lying on the planes used to construct the warping matrix.
In projective coordinates a planar homography can be described by a 3x3 matrix with 8 independent parameters.
Selection of 4 distant, non-collinear points:
- manual selection in an initial calibration step
- automatic selection with a human probe
[65] L. Zelnik-Manor and M. Irani, "Multiview Constraints on Homographies," IEEE Trans. on PAMI, Feb. 2002.
Off-line stage
The off-line process automatically computes the ground-plane homography with E2oFoV and epipolar constraints, and defines the Entry Edges of Field of View.
[51] S. Khan and M. Shah, "Consistent labeling of tracked objects in multiple cameras with overlapping fields of view," IEEE Trans. on PAMI, 25(10):1355–1360, October 2003.
[66] S. Calderara, R. Vezzani, A. Prati, R. Cucchiara, "Entry Edge of Field of View for multi-camera tracking in distributed video surveillance," IEEE Int'l Conference on AVSS 2005, Como, Italy, pp. 93–98, 2005.
Automatic Homography computation 1/2
Automatic learning phase to compute the overlapping zones and the ground-plane homography:
- Take many correspondences among ground-plane support points (with a tracking algorithm, for a single person);
- Define the Entry Edges of Field of View (E2oFoV) using least-squares optimization;
- Define the overlapping zones and the extreme points;
- Compute the homography from the point correspondences (a sketch of this step follows).
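A hedged sketch of that last step, assuming OpenCV is available (the correspondences below are synthetic stand-ins for the tracked support points; the E2oFoV estimation itself is described in [66]):

```python
import numpy as np
import cv2

# Synthetic stand-in for support-point correspondences collected while
# tracking one person across the overlap zone, frame by frame.
rng = np.random.default_rng(0)
pts_cam1 = rng.uniform(0, 640, size=(200, 2)).astype(np.float32)
H_true = np.array([[1.1, 0.02, 5.0], [0.01, 0.9, -3.0], [1e-4, 2e-4, 1.0]])
ph = np.hstack([pts_cam1, np.ones((200, 1), np.float32)]) @ H_true.T
pts_cam2 = (ph[:, :2] / ph[:, 2:]).astype(np.float32)

# Least-squares fit over all correspondences; with real (noisy) tracks,
# cv2.RANSAC would reject outliers instead of method 0 (plain LS).
H, mask = cv2.findHomography(pts_cam1, pts_cam2, 0)
print(np.round(H / H[2, 2], 3))
```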
[Figure: E2oFoV vs. EoFoV]
Examples
From ETISEO (http://www-sop.inria.fr/orion/ETISEO/)
[67] G. Kayumbi and A. Cavallaro, "Multiview Trajectory Mapping Using Homography with Lens Distortion Correction," EURASIP Journal on Image and Video Processing, 2008.
Engineering Campus of the University of Modena
Homography and data association
If the homography is correct, the planar constraint is verified and, in the absence of noise, data association or consistent labeling can be done at the point level, e.g., on the support points.
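In that idealized case, consistent labeling reduces to nearest-neighbor matching of warped support points. A toy sketch (helper names and the distance threshold are mine, not from HECOL):

```python
import numpy as np

def warp_points(H, pts):
    """Apply a homography to Nx2 points (homogeneous normalize)."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:]

def associate(support1, support2, H21, max_dist=25.0):
    """Greedy point-level data association: warp camera-2 support
    points into camera 1 and take the nearest camera-1 track."""
    warped = warp_points(H21, np.asarray(support2, float))
    pairs = []
    for j, p in enumerate(warped):
        d = np.linalg.norm(np.asarray(support1, float) - p, axis=1)
        i = int(np.argmin(d))
        if d[i] < max_dist:          # noise-free case: d[i] ~ 0
            pairs.append((i, j))     # track i in C1 <-> track j in C2
    return pairs
```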
Homography and data association (cont.)
In real contexts, noise, errors in the homography, and the lack of planar constraints introduce uncertainty in the position.
From [26] A. C. Sankaranarayanan, A. Veeraraghavan, and R. Chellappa, "Object Detection, Tracking and Recognition for Multiple Smart Cameras," Proceedings of the IEEE, Vol. 96, No. 10, October 2008.
Epipolar geometry recovery
Pure ground-plane homography-based matching is not reliable in case of segmentation errors and groups of people; we need additional 3D information.
Using only the homography we can detect only the presence on the planar world.
To recover the epipolar geometry we must obtain point correspondences that do not lie on the ground plane.
Thus we take the heads! (the upper points of the blobs)
Epipole computation is performed with RANSAC to improve the numerical stability of the solution.
[68] Q. Luong, O. Faugeras, "The fundamental matrix: theory, algorithms and stability analysis," Int. J. Comput. Vis. 17, 1996, pp. 43–75.
EPIPOLAR GEOMETRY
Exploiting the parallax property of perspective planes: given a plane-to-plane homography, two point correspondences are sufficient to compute the epipole location.
Given a point up and its projections in the two cameras C1 and C2, the epipolar line can be computed using the homography matrix H and the point projections:

$$l^{up}_1 = \overline{up_1,\ H_{2,1}\,up_2}, \qquad e_1 \in l^{up}_1$$

where $l^{up}_1$ is the epipolar line in the image plane of C1 passing through the projections of up, $H_{2,1}$ is the homography matrix from the ground plane of C2 to that of C1, and $\overline{a,b}$ denotes the line passing through points a and b.
The intersection of two such lines univocally identifies the epipole.
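A small numpy sketch of this construction (two head correspondences, assumed off the ground plane, are enough; the function names are mine):

```python
import numpy as np

def warp(H, p):
    """Warp a 2D point with homography H."""
    ph = H @ np.append(p, 1.0)
    return ph[:2] / ph[2]

def parallax_line(H21, up1, up2):
    """Homogeneous line through up's projection in C1 and its ground-
    plane warping from C2; by plane parallax it passes through e1."""
    return np.cross(np.append(up1, 1.0), np.append(warp(H21, up2), 1.0))

def epipole(H21, corr_a, corr_b):
    """Intersect the two parallax lines to locate the epipole e1.
    corr_a, corr_b: (up1, up2) head correspondences."""
    e = np.cross(parallax_line(H21, *corr_a), parallax_line(H21, *corr_b))
    return e[:2] / e[2]
```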
EPIPOLAR GEOMETRY ALGORITHM
Point correspondences can be affected by errors if extracted automatically. RANSAC is used to iteratively choose the two lines that give the best epipole location.
ALGORITHM:
1. During the previous E2oFoV training phase, correspondences between the object's head projections up in both cameras are sampled frame by frame.
2. After choosing a camera C, we randomly compute two epipolar lines from the sample set of N frames using the epipolar line equation, and detect the epipole location.
3. We evaluate the consensus of the remaining samples, counting how many samples are close enough to the epipolar lines computed using the estimated epipole.
4. Iterate from 2 until the consensus is above a fixed threshold.
From [64]
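A compact sketch of this RANSAC loop (a simplification of [64]: consensus is measured here as the number of sampled parallax lines passing near the candidate epipole):

```python
import numpy as np

def line_point_dist(l, p):
    """Distance from 2D point p to homogeneous line l = (a, b, c)."""
    return abs(l @ np.append(p, 1.0)) / np.hypot(l[0], l[1])

def ransac_epipole(lines, n_iter=500, tol=3.0, seed=0):
    """lines: one homogeneous parallax line per sampled head
    correspondence (see the previous sketch)."""
    rng = np.random.default_rng(seed)
    best_e, best_score = None, -1
    for _ in range(n_iter):
        i, j = rng.choice(len(lines), size=2, replace=False)
        e = np.cross(lines[i], lines[j])
        if abs(e[2]) < 1e-9:                 # (near-)parallel pair, skip
            continue
        e2 = e[:2] / e[2]
        score = sum(line_point_dist(l, e2) < tol for l in lines)
        if score > best_score:
            best_e, best_score = e2, score
    return best_e, best_score
```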
CTG: modeling the camera overlaps
A Camera Transition Graph (CTG) is built to speed up the search process:
- The graph incorporates the camera topology learnt during the off-line geometry learning stage.
- Each node corresponds to one camera; arcs represent the overlapping constraints among camera FoVs.
- Variables (tracks) at each node must satisfy the unary constraint of having different labels.
- When a new object appears (a variable added at a node), the binary constraint that two instances of the same object on different nodes must have the same label is verified (Constraint Satisfaction Problem).
When a new track appears on one camera, the search for possible matching tracks could be time consuming; the CTG restricts it to overlapping cameras, as sketched below.
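A toy sketch of how the CTG restricts that search (camera IDs, track labels and positions are invented):

```python
# Arcs connect cameras with overlapping FoVs, so a track appearing on
# camera 2 only competes for labels already active on cameras 1 and 3.
ctg = {1: {2}, 2: {1, 3}, 3: {2}}

# label -> warped support point, per camera (unary constraint: labels
# within one camera are all different)
active_tracks = {1: {"A": (120, 340)}, 2: {}, 3: {"B": (80, 90)}}

def candidate_labels(new_cam):
    """Labels against which the binary (same object, same label)
    constraint must be checked when a new track appears on new_cam."""
    cands = {}
    for cam in ctg.get(new_cam, ()):
        for label, pos in active_tracks[cam].items():
            cands[label] = (cam, pos)
    return cands

print(candidate_labels(2))   # {'A': (1, (120, 340)), 'B': (3, (80, 90))}
```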
See also state transition graphs:
[69] A. Nakazawa, H. Kato and S. Inokuchi, "Human Tracking Using Distributed Vision Systems," ICPR 1998.
Example
On-line Stage: Consistency resolver
HECOL defines a Bayesian-competitive approach with warping of the vertical axis and a two-contribution check. Each hypothesis in two overlapped cameras is accounted both as a single and as a group.
The consistency resolver is invoked at each "detection event":
- a camera handoff,
- the entrance of a completely new object,
- the splitting of a group into single people,
- or a segmentation error, for which an appearance-based tracking is needed.
Bayesian-competitive framework
At each detection event the space of hypotheses is generated (all the possible combinations of single persons, pairs, or groups of people tracked in another camera and acceptable within the CTG constraints).
For each hypothesis, the prior and the likelihood are computed in order to obtain intra-camera and inter-camera probabilities.
Given a new track, each hypothesis in two overlapped cameras is accounted both as a single and as a group.
Inter-camera MAP assignment (posterior as likelihood times prior):

$$\hat{h} = \arg\max_{h} P(h \mid o) = \arg\max_{h} P(o \mid h)\, P(h)$$
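A schematic sketch of the competitive MAP selection (the prior and likelihood functions are placeholders for the HECOL terms of [63,64]; hypothesis enumeration is truncated at pairs for brevity):

```python
from itertools import combinations

def map_hypothesis(candidate_tracks, prior, likelihood):
    """Enumerate single/group hypotheses over the candidate tracks of
    the overlapped camera (plus 'new object', the empty tuple) and
    return the one maximizing posterior ~ likelihood * prior."""
    hyps = [()]                                      # completely new object
    hyps += [(t,) for t in candidate_tracks]         # single persons
    hyps += list(combinations(candidate_tracks, 2))  # pairs/groups
    return max(hyps, key=lambda h: likelihood(h) * prior(h))

# Usage with dummy terms favouring the group hypothesis:
print(map_hypothesis(["#110", "#111"],
                     prior=lambda h: 0.5 if len(h) == 2 else 0.25,
                     likelihood=lambda h: 0.9 if len(h) == 2 else 0.4))
```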
Prior computation
The prior is computed on the existing objects of N3 warped into the camera space of N1.
- A hypothesis consisting of a single object gains a higher prior if the warped lower support point (l.s.p.) is far enough from the other objects' support points.
- A hypothesis consisting of two or more objects (i.e., a possible group) gains a higher prior if the objects composing it are close to each other after the warping and, at the same time, the whole group is far from the other objects.
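A toy version of these two rules (the exponential scoring and the scale sigma are my own stand-ins for the terms actually used in HECOL):

```python
import numpy as np

def hypothesis_prior(hyp_pts, other_pts, sigma=50.0):
    """hyp_pts: warped l.s.p.s of the objects in the hypothesis;
    other_pts: warped l.s.p.s of the remaining objects."""
    hyp_pts = np.asarray(hyp_pts, float)
    other_pts = np.asarray(other_pts, float)
    # Separation: distance from the hypothesis to the closest other object
    if len(other_pts):
        sep = min(np.linalg.norm(other_pts - p, axis=1).min() for p in hyp_pts)
    else:
        sep = sigma
    score = 1.0 - np.exp(-sep / sigma)        # higher when far from others
    if len(hyp_pts) > 1:                      # group: members must be close
        spread = max(np.linalg.norm(p - q) for p in hyp_pts for q in hyp_pts)
        score *= np.exp(-spread / sigma)      # higher when compact
    return score
```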
Prior computation
Priors must also account for the different probabilities arising in the case of multiple hypotheses.
Likelihood
Likelihood is computed as the maximum of two contributions (forward and backward):
[Figure: the main axes a_1^j (camera 1) and a_2^j (camera 2) are warped into the opposite view by combining homography and epipolar geometry, and compared as the pair <a_1^j, a_2^j>.]
Fitness measure
MAP label assignment
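A skeleton of this max-of-two-contributions rule (the warping functions and the fitness measure stand in for the homography/epipolar axis warping of [63,64]):

```python
import numpy as np

def axis_fitness(axis_a, axis_b):
    """Toy fitness between two sampled main axes: inverse of the mean
    distance between corresponding sample points."""
    d = np.linalg.norm(np.asarray(axis_a, float) -
                       np.asarray(axis_b, float), axis=1).mean()
    return 1.0 / (1.0 + d)

def axis_likelihood(warp_1to2, warp_2to1, axis1, axis2):
    """Max of the forward (C1 -> C2) and backward (C2 -> C1)
    contributions; each warp would combine homography and epipolar
    geometry on the person's main axis."""
    forward = axis_fitness(warp_1to2(axis1), axis2)
    backward = axis_fitness(warp_2to1(axis2), axis1)
    return max(forward, backward)
```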
Group detection
In this case the group #110, #111 is entering the FoV of C2.
The prior of the group is high since its members are close to each other.
The likelihood for the hypothesis of a single object (either #110 or #111) is lower than the likelihood of the group; thus the new object in C2 is labeled as the group #110, #111.
Example
Experimental results (1)
The system has been tested in a setup at our campus.