IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 11, NOVEMBER 2004    1459

Statistical Modeling of Complex Backgrounds for Foreground Object Detection

Liyuan Li, Member, IEEE, Weimin Huang, Member, IEEE, Irene Yu-Hua Gu, Senior Member, IEEE, and Qi Tian, Senior Member, IEEE
Abstract: This paper addresses the problem of background modeling for foreground object detection in complex environments. A Bayesian framework that incorporates spectral, spatial, and temporal features to characterize the background appearance is proposed. Under this framework, the background is represented by the most significant and frequent features, i.e., the principal features, at each pixel. A Bayes decision rule is derived for background and foreground classification based on the statistics of principal features. Principal feature representation for both static and dynamic background pixels is investigated. A novel learning method is proposed to adapt to both gradual and sudden once-off background changes. The convergence of the learning process is analyzed and a formula to select a proper learning rate is derived. Under the proposed framework, a novel algorithm for detecting foreground objects from complex environments is then established. It consists of change detection, change classification, foreground segmentation, and background maintenance. Experiments were conducted on image sequences containing targets of interest in a variety of environments, e.g., offices, public buildings, subway stations, campuses, parking lots, airports, and sidewalks. Good results of foreground detection were obtained. Quantitative evaluation and comparison with an existing method show that the proposed method provides much improved results.

Index Terms: Background maintenance, background modeling, background subtraction, Bayes decision theory, complex background, feature extraction, motion analysis, object detection, principal features, video surveillance.
I. INTRODUCTION

IN COMPUTER vision applications, such as video surveillance, human motion analysis, human-machine interaction, and object-based video encoding (e.g., MPEG-4), the objects of interest are often the moving foreground objects in an image sequence. One effective way of extracting foreground objects is to suppress the background points in the image frames [1]-[6]. To achieve this, an accurate and adaptive background model is often desirable.

The background usually contains nonliving objects that remain passive in the scene. Background objects can be stationary, such as walls, doors, and room furniture, or nonstationary, such as wavering bushes or moving escalators.
Manuscript received June 19, 2003; revised January 29, 2004. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Luca Lucchese.
L. Li, W. Huang, and Q. Tian are with the Institute for Infocomm Research, Singapore 119613 (e-mail: [email protected]; [email protected]; [email protected]).
I. Y.-H. Gu is with the Department of Signals and Systems, Chalmers University of Technology, SE-412 96 Göteborg, Sweden (e-mail: [email protected]).
Digital Object Identifier 10.1109/TIP.2004.836169
The appearance of background objects often undergoes various changes over time, e.g., changes in brightness caused by changing weather conditions or by switching lights on and off. The background image can be described as consisting of static and dynamic pixels. The static pixels belong to stationary objects, and the dynamic pixels are associated with nonstationary objects. A static background region can become dynamic as time advances, e.g., when a computer screen is turned on. A dynamic background pixel can also turn into a static one, such as a pixel in a bush when the wind stops. To describe a general background scene, a background model must be able to
1) represent the appearance of a static background pixel;
2) represent the appearance of a dynamic background pixel;
3) adapt to gradual background changes;
4) adapt to sudden once-off background changes.
For background modeling without specific domain knowledge, the background is usually represented by image features at each pixel. The features extracted from an image sequence can be classified into three types: spectral, spatial, and temporal features. Spectral features could be associated with gray-scale or color information, spatial features with gradient or local structure, and temporal features with interframe changes at the pixel. Many existing methods utilize spectral features (distributions of intensities or colors at each pixel) to model the background [4], [5], [7]-[9]. To be robust to illumination changes, some spatial features are also exploited [2], [10], [11]. Spectral and spatial features are suitable for describing the appearance of static background pixels. Recently, a few methods have introduced temporal features to describe the dynamic background pixels associated with nonstationary objects [6], [12], [13]. There is, however, a lack of systematic approaches to incorporating all three types of features into a representation of a complex background containing both stationary and nonstationary objects.

The features that characterize stationary and dynamic background objects should be different. If a background model is to describe a general background, it should be able to learn the significant features of the background at each pixel and provide the information needed for foreground and background classification. Motivated by this, a Bayesian framework which incorporates multiple types of features for modeling complex backgrounds is proposed in this paper. The major novelties of the proposed method are as follows.
1) A Bayesian framework is proposed for incorporating spectral, spatial, and temporal features in the background modeling.

1057-7149/04$20.00 © 2004 IEEE
2) A new formula of the Bayes decision rule is derived for background and foreground classification.
3) The background is represented using statistics of principal features associated with stationary and nonstationary background objects.
4) A novel method is proposed for learning and updating background features under both gradual and once-off background changes.
5) The convergence of the learning process is analyzed, and a formula is derived to select a proper learning rate.
6) A new real-time algorithm is developed for foreground object detection in complex environments.
Further, a wide range of tests is conducted in a variety of environments, including offices, campuses, parks, commercial buildings, hotels, subway stations, airports, and sidewalks.
The remainder of the paper is organized as follows. After a brief review of existing work in Section I-A, Section II describes the statistical modeling of complex backgrounds based on principal features. First, a new formula of the Bayes decision rule for background and foreground classification is derived. Based on this formula, an effective data structure to record the statistics of principal features is established, and principal feature representation for different background objects is addressed. In Section III, the method for learning and updating the statistics of principal features is described. Strategies to adapt to both gradual and sudden once-off background changes are proposed, and properties of the learning process are analyzed. In Section IV, an algorithm for foreground object detection based on the statistical background modeling is described. It contains four steps: change detection, change classification, foreground segmentation, and background maintenance. Section V presents the experimental results in various environments; evaluations and comparisons with an existing method are also included. Finally, conclusions are given in Section VI.
A. Related Work

A simple and direct way to describe the background at each pixel is to use spectral information, i.e., the gray-scale or color of the background pixel. Early studies describe background features using an average of gray-scale or color intensities at each pixel. Infinite impulse response (IIR) or Kalman filters [7], [14], [15] are employed to track slow and gradual changes in the background. These methods are applicable to backgrounds consisting of stationary objects. To tolerate background variations caused by imaging noise, illumination changes, and the motion of nonstationary objects, statistical models are used to represent the spectral features at each background pixel. The frequently used models include the Gaussian [8], [16]-[22] and the mixture of Gaussians (MoG) [4], [23]-[25]. In these models, one or a few Gaussians are used to represent the color distributions at each background pixel. A mixture of Gaussian distributions can represent various background appearances, e.g., road surfaces under the sun or in the shadows [23]. The parameters (mean, variance, and weight) of each Gaussian are updated using an IIR filter to adapt to gradual background changes. Moreover, by replacing an old Gaussian with a newly learned color distribution, MoG can adapt to once-off background changes. In [9], a nonparametric model is proposed for background modeling, where a kernel-based function is employed to represent the color distribution of each background pixel. The kernel-based distribution is a generalization of MoG which does not require parameter estimation, but its computational cost is high. A variant model is used in [5], where the distribution of temporal variations in color at each pixel is used to model the spectral feature of the background. MoG performs better in a time-varying environment where the background is not completely stationary. However, the method can misclassify the foreground if the background scenes are complex [19], [26]. For example, if the background contains a nonstationary object with significant motion, the colors of pixels in that region may change widely over time, and foreground objects with similar colors (camouflaged foreground objects) can easily be misclassified as background.
Spatial information has recently been exploited to improve the accuracy of background representation. Local statistics of the spectral features [27], [28], local texture features [2], [3], or global structure information [29] are found helpful for accurate foreground extraction. These methods are most suitable for stationary backgrounds. Paragios and Ramesh [10] use a mixture model (Gaussians or Laplacians) to represent the distributions of background differences for static background points. A Markov random field (MRF) model is developed to incorporate spatio-spectral coherence for robust foreground segmentation. In [11], gradient distributions are introduced into MoG to reduce the misclassification caused by depending purely on color distributions. Spatial information helps to detect camouflaged foreground objects and to suppress shadows. The spatial features are, however, not applicable to nonstationary background objects at the pixel level, since the corresponding spatial features vary over time.
A few further attempts to segment foreground objects from nonstationary backgrounds have been made by using temporal features. One way is to estimate the consistency of optical flow over a short duration of time [13], [30]. The dynamic features of nonstationary background objects are represented by the significant variation of accumulated local optical flows. In [12], Li et al. propose a method that employs the statistics of color co-occurrence between two consecutive frames to model the dynamic features associated with a nonstationary background object. Temporal features are suitable for modeling the appearance of nonstationary objects. In Wallflower [6], Toyama et al. use a linear Wiener filter, a self-regression model, to represent intensity changes at each background pixel. The linear predictor can learn and estimate the intensity variations of a background pixel and works well for periodic changes. However, the linear regression model has difficulty predicting shadows and background changes of varying frequency in natural scenes. A brief summary of the existing methods, organized by the types of features used, is listed in Table I. Further, most existing methods perform the background and foreground classification with one or more heuristic thresholds. For backgrounds of different complexities, the thresholds have to be adjusted empirically. In addition, these methods are often tested only on a few background environments (e.g., laboratories, campuses, etc.).
TABLE I
CLASSIFICATION OF PREVIOUS METHODS AND THE PROPOSED METHOD
II. STATISTICAL MODELING OF THE BACKGROUND

A. Bayes Classification of Background and Foreground

For arbitrary background and foreground objects or regions, the classification of the background and the foreground can be formulated under Bayes decision theory.

Let s be the position of an image pixel, I_t be the input image at time t, and v_t be a d-dimensional feature vector extracted at position s and time t from the image sequence. Then, the posterior probability of the feature vector coming from the background at s can be computed by using the Bayes rule

P(b | v_t, s) = P(v_t | b, s) P(b | s) / P(v_t | s)    (1)

where b indicates the background, P(v_t | b, s) is the probability of the feature vector being observed as background at s, P(b | s) is the prior probability of the pixel belonging to the background, and P(v_t | s) is the prior probability of the feature vector being observed at the position s. Similarly, the posterior probability that the feature vector comes from a foreground object at s is

P(f | v_t, s) = P(v_t | f, s) P(f | s) / P(v_t | s)    (2)

where f denotes the foreground. Using the Bayes decision rule, a pixel is classified as belonging to the background according to its feature vector observed at time t if

P(b | v_t, s) > P(f | v_t, s).    (3)

Otherwise, it is classified as belonging to the foreground. Note that a feature vector observed at an image pixel comes from either background or foreground objects; it follows that

P(v_t | s) = P(v_t | b, s) P(b | s) + P(v_t | f, s) P(f | s).    (4)

Substituting (1) and (4) into (3), the Bayes decision rule (3) becomes

2 P(v_t | b, s) P(b | s) > P(v_t | s).    (5)

Using (5), the pixel with observed feature vector v_t at time t can be classified as a background or a foreground point, provided that the prior and conditional probabilities P(b | s), P(v_t | s), and P(v_t | b, s) are known in advance.
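The decision rule of (5) reduces to a one-line comparison once the three probabilities are available. A minimal sketch (the function and argument names are ours, not the paper's):

```python
def classify_pixel(p_v_given_b, p_b, p_v):
    """Bayes decision rule of Eq. (5): a pixel is classified as background
    when 2 * P(v|b,s) * P(b|s) > P(v|s).  All three probabilities are
    assumed to have been estimated beforehand, e.g., from feature tables."""
    return 2.0 * p_v_given_b * p_b > p_v
```

For instance, a feature seen often and almost always as background (`classify_pixel(0.8, 0.9, 1.0)`) is accepted, while a rare background feature at a mostly foreground pixel is rejected.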
B. Principal Feature Representation of Background

To apply (5) to the classification of background and foreground, the probability functions P(b | s), P(v_t | s), and P(v_t | b, s) should be known in advance, or be properly estimated. For complex backgrounds, the forms of these probability functions are unknown. One way to estimate them is to use the histogram of features. The problem that would be encountered is the high cost of storage and computation. If v is a d-dimensional vector and each of its elements is quantized to L values, the histogram would contain L^d cells. For example, for a three-component color vector with 256 levels per component, the histogram would contain 256^3 cells. The method would be unrealistic in terms of computational and memory requirements.

It is reasonable to assume that if the selected features represent the background effectively, the intraclass spread of background features will be small, which implies that the distribution of background features will be highly concentrated in a small region of the histogram. Further, features from various foreground objects would spread widely in the feature space. Therefore, there would be little overlap between the distributions of background and foreground features. This implies that, with a proper selection and quantization of features, it would be possible to approximately describe the background by using only a small number of feature vectors. A concise data structure to implement such a representation of the background is created as follows.

Let v_1, v_2, ... be the quantized feature vectors sorted in descending order with respect to P(v_i | s) for each pixel s. Then, for a proper selection of features, there would be a small integer N1, a high percentage value M1, and a low percentage value M2 such that the background could be well approximated by the first N1 feature vectors, i.e.,

sum_{i=1}^{N1} P(v_i | b, s) > M1   and   sum_{i=1}^{N1} P(v_i | f, s) < M2.    (6)

The value of N1 and the existence of M1 and M2 depend on the selection and quantization of the feature vectors. The feature vectors v_1, ..., v_{N1} are defined as the principal features of the background at the pixel s.

To learn and update the prior and conditional probabilities for the principal feature vectors, a table of statistics for the possible principal features is established for each feature type v at s. The table is denoted as

T_s^v = { S_s^v(1), S_s^v(2), ..., S_s^v(N2) },   N2 > N1    (7)

where T_s^v is learned based on the observation of the features and records the statistics of the N2 most
Fig. 1. One example of learned principal features for a static background pixel in a busy scene. The left image shows the position of the selected pixel. The two right images are the histograms of the statistics for the most significant colors and gradients, where the height of a bar is the value of p_i, the light gray part is p_{b,i}, and the top dark gray part is p_i - p_{b,i}. The icons below the histograms are the corresponding color and gradient features.
frequent feature vectors at pixel s. Each element S_s^v(i) contains three components

S_s^v(i) = { p_i, p_{b,i}, v_i }    (8)

where p_i estimates P(v_i | s), p_{b,i} estimates P(v_i | b, s), and v_i is the d-dimensional feature vector. The elements in the table T_s^v are sorted in descending order with respect to the value p_i. The first N1 elements from the table T_s^v, together with P(b | s), are used in (5) for background and foreground classification.
C. Feature Selection

The next essential issue for principal feature representation is feature selection. The significant features of different background objects are different. To achieve an effective and accurate representation of background pixels with principal features, employing the proper types of features is important. Three types of features, i.e., spectral, spatial, and temporal features, are used for background modeling.

1) Features for Static Background Pixels: For a pixel belonging to a stationary background object, the stable and most significant features are its color and local structure (gradient). Hence, two tables are used to learn the principal features: T_s^c and T_s^e, with c_t and e_t representing the color and gradient vectors, respectively. Since the gradient is less sensitive to illumination changes, the two types of feature vectors can be integrated under the Bayes framework as follows.

Let v_t = [c_t, e_t] and assume that c_t and e_t are independent; the Bayes decision rule (5) then becomes

2 P(c_t | b, s) P(e_t | b, s) P(b | s) > P(c_t | s) P(e_t | s).    (9)

For the features from static background pixels, the quantization measure should be less sensitive to illumination changes. Here, a normalized distance measure based on the inner product of two vectors is employed for both color and gradient vectors. The distance measure is

D(v_t, v_i) = 1 - (v_t · v_i)^2 / (||v_t||^2 ||v_i||^2)    (10)

where v can be c or e, respectively. If D(v_t, v_i) is less than a small value, v_t and v_i are matched to each other. The robustness of the distance measure (10) to illumination changes and imaging noise is shown in [2]. The color vector is directly obtained from the input images with 256 resolution levels for each component, while the gradient vector is obtained by applying the Sobel operator to the corresponding gray-scale input images, again with 256 resolution levels. With this choice, the representation is found accurate enough to learn the principal features for static background pixels. An example of principal feature representation for a static background pixel is shown in Fig. 1, where the histograms for the most significant color and gradient features in T_s^c and T_s^e are displayed. The histogram of the color features shows that only the first two are the principal colors for the background, and the histogram of the gradients shows that the first six, excluding the fourth, are the principal gradients for the background.
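The normalized inner-product distance of (10) can be sketched directly; proportional vectors (the same color under brighter or dimmer illumination) yield a distance near zero, which is what makes the measure illumination-tolerant. A minimal version (the small `eps` guard is our addition to avoid division by zero):

```python
def feature_distance(u, v, eps=1e-12):
    """Normalized inner-product distance of Eq. (10):
    1 - (u.v)^2 / (|u|^2 |v|^2).  Proportional vectors give ~0."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    return 1.0 - (dot * dot) / (nu * nv + eps)
```

Two vectors are matched when this distance falls below the small matching threshold used by the method.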
2) Features for Dynamic Background Pixels: For dynamic background pixels associated with nonstationary objects, color co-occurrences are used as the dynamic features. This is because the color co-occurrence between consecutive frames has been found suitable for describing the dynamic features associated with nonstationary background objects, such as moving tree branches or a flickering screen [12]. Given an interframe change from the color c_{t-1} to c_t at the time instant t and the pixel s, the feature vector of color co-occurrence is defined as v_t = [r_{t-1}, g_{t-1}, b_{t-1}, r_t, g_t, b_t]. Similarly, a table of statistics T_s^{cc} for color co-occurrences is maintained at each pixel. Let I_t be the input color image; the color co-occurrence vector is generated by quantizing the color components to a low resolution. For example, by quantizing the color resolution to 32 levels for each component, one may obtain a good principal feature representation for dynamic background pixels. An example of the principal feature representation with color co-occurrence for a flickering screen is shown in Fig. 2. Compared with the quantized color co-occurrence feature space of 32^6 cells, this shows that, with a very small number of feature vectors, the principal features are capable of modeling the dynamic background pixels.
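Building the quantized co-occurrence vector is straightforward; the sketch below uses the 32-levels-per-component quantization quoted in the text (the function name is ours):

```python
def cooccurrence_vector(rgb_prev, rgb_cur, levels=32):
    """Quantize two consecutive RGB samples of one pixel to `levels`
    per channel and concatenate them into a 6-dimensional
    color co-occurrence feature vector."""
    q = 256 // levels  # width of each quantization bin (8 for 32 levels)
    return tuple(c // q for c in rgb_prev) + tuple(c // q for c in rgb_cur)
```

With 32 levels per component the full feature space has 32^6 cells, yet only a handful of these vectors ever recur at a given dynamic pixel, which is exactly what the principal-feature table exploits.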
-
LI et al.: STATISTICAL MODELING OF COMPLEX BACKGROUNDS 1463
Fig. 2. One example of learned principal features for dynamic background pixels. The left image shows the position of the selected pixel. The right image is the histogram of the statistics for the most significant color co-occurrences in T_s^{cc}, where the height of a bar is the value of p_i, the light gray part is p_{b,i}, and the top dark gray part is p_i - p_{b,i}. The icons below the histogram are the corresponding color co-occurrence features. On the screen, the color changes among white, dark blue, and light blue periodically.
III. LEARNING AND UPDATING THE STATISTICS FOR PRINCIPAL FEATURES

Since the background might undergo both gradual and once-off changes, two strategies to learn and update the statistics for principal features are proposed. The convergence of the learning process is analyzed and a formula to select a proper learning rate is derived.

A. For Gradual Background Changes

At each time instant, if the pixel is identified as a static point, the color and gradient features are used for foreground and background classification. Otherwise, the color co-occurrence feature is used. Let us assume that the feature vector v_t is used to classify the pixel s at time t based on the principal features learned previously. Then the statistics of the corresponding feature vectors in the table (T_s^c and T_s^e, or T_s^{cc}) are gradually updated at each time instant by

P_{t+1}(b | s) = (1 - α) P_t(b | s) + α L_t
p_{t+1}^i = (1 - α) p_t^i + α M_t^i    (11)
p_{b,t+1}^i = (1 - α) p_{b,t}^i + α L_t M_t^i

where the learning rate α is a small positive number, 0 < α < 1. In (11), L_t = 1 means that s is classified as a background point at time t in the final segmentation; otherwise, L_t = 0. Similarly, M_t^i = 1 means that the ith vector of the table matches the input feature vector v_t; otherwise, M_t^i = 0.

The above updating operation states the following. If the pixel is labeled as a background point at time t, P(b | s) is slightly increased due to L_t = 1. Further, the probabilities for the matched feature vector are also increased due to M_t^i = 1. However, the statistics for the unmatched feature vectors (M_t^i = 0) are slightly decreased. If there is no match between the feature vector v_t and the vectors in the table, the N2-th vector in the table is replaced by a new feature vector

S_s^v(N2) = { α, α L_t, v_t }.    (12)

If the pixel is labeled as a foreground point at time t, P(b | s) and p_{b,t}^i are slightly decreased with L_t = 0. However, the matched p_t^i in the table is slightly increased.

The updated elements in the table are re-sorted in descending order with respect to p_t^i, so that the table keeps the most frequent and significant feature vectors observed at pixel s.
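One step of the gradual update can be sketched as follows, under our reading of (11)-(12): `is_background` is the final segmentation label L_t, `matches` gives the per-entry match indicators M_t^i, and the no-match case simply replaces the least significant entry (all names, and the 0.005 default rate, are illustrative assumptions):

```python
ALPHA = 0.005  # learning rate alpha: a small positive value in (0, 1)

def update_stats(table, p_b_prior, v_t, is_background, matches, alpha=ALPHA):
    """One gradual-update step.  `table` is a list of entries
    {"p": ~P(v_i|s), "p_b": ~P(v_i|b,s), "v": feature vector}.
    Returns the updated prior estimate of P(b|s)."""
    l_t = 1.0 if is_background else 0.0
    p_b_prior = (1 - alpha) * p_b_prior + alpha * l_t
    if any(matches):
        for entry, m in zip(table, matches):
            m_t = 1.0 if m else 0.0
            entry["p"] = (1 - alpha) * entry["p"] + alpha * m_t
            entry["p_b"] = (1 - alpha) * entry["p_b"] + alpha * l_t * m_t
    else:
        # No table entry matched v_t: replace the least significant entry,
        # as in the replacement operation (12).
        table[-1] = {"p": alpha, "p_b": alpha * l_t, "v": list(v_t)}
    table.sort(key=lambda e: e["p"], reverse=True)  # keep descending order
    return p_b_prior
```

Matched entries gain weight, unmatched ones decay, and the re-sort keeps the most frequent features at the front of the table.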
B. For Once-Off Background Changes

According to (4), the statistics of the principal features satisfy

sum_{i=1}^{N2} p_t^i = P(b | s) sum_{i=1}^{N2} P(v_i | b, s) + P(f | s) sum_{i=1}^{N2} P(v_i | f, s).    (13)

These probabilities are learned gradually with the operations described by (11) and (12) at each pixel s. When a once-off background change has happened, the new background appearance soon becomes dominant after the change. With the replacement operation (12), the gradual accumulation operation (11), and the resorting at each time step, the newly learned features will gradually move to the first few positions in the table. After some time, the term on the left-hand side of (13) becomes large (close to 1) and the first term on the right-hand side of (13) becomes very small, since the new background features are still classified as foreground. From (6) and (13), a new background appearance at s can be found if the statistics attributed to the foreground dominate those attributed to the previous background, i.e.,

P(f | s) sum_{i=1}^{N1} P(v_i | f, s) > c · P(b | s) sum_{i=1}^{N1} P(v_i | b, s)    (14)

where b denotes the previous background before the once-off change, f denotes the new background appearance after the change, and the factor c prevents errors caused by a small number of foreground features. Using the notation in (7) and (8), the condition (14) becomes

sum_{i=1}^{N1} (p_t^i - p_{b,t}^i) > c · sum_{i=1}^{N1} p_{b,t}^i.    (15)

Once the above condition is satisfied, the statistics for the foreground should be tuned to become the new background appearance. According to (4), the once-off learning operation is performed as follows:

P_{t+1}(b | s) = 1 - P_t(b | s),   p_{b,t+1}^i = p_t^i - p_{b,t}^i    (16)

for i = 1, ..., N2.
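Under one plausible reading of the once-off operation, the portion of each entry's statistic currently attributed to the foreground is re-attributed to the background, and the prior is flipped accordingly; the sketch below encodes that interpretation (it is an assumption on our part, not the paper's verified formula):

```python
def once_off_relabel(table, p_b_prior):
    """Hypothetical sketch of the once-off update: for every entry the
    foreground share (p - p_b) becomes the new background share, and the
    background prior P(b|s) is flipped to 1 - P(b|s).  Returns the new prior."""
    for entry in table:
        entry["p_b"] = entry["p"] - entry["p_b"]
    return 1.0 - p_b_prior
```

After this step the features accumulated by the new scene appearance are immediately treated as background, which is what lets the model absorb, e.g., a light being switched on.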
-
1464 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 11,
NOVEMBER 2004
C. Convergence of the Learning Process

If the time-evolving principal feature representation has successfully approximated the background, then sum_{i=1}^{N2} p_{b,t}^i ≈ 1 should be satisfied. Hence, it is desirable that sum_{i=1}^{N2} p_{b,t}^i converge to 1 with the evolution of the learning process. We shall show in the following that the learning operation (11) indeed meets this condition.

Suppose that, at time t, sum_{i=1}^{N2} p_{b,t}^i = 1 and the kth vector in the table matches the input feature vector v_t, which has been detected as background in the final segmentation at time t. Then, according to (11), we have

sum_{i=1}^{N2} p_{b,t+1}^i = (1 - α) sum_{i=1}^{N2} p_{b,t}^i + α = 1    (17)

which implies that the sum of the conditional probabilities of the principal features being background remains equal, or close, to 1 during the evolution of the learning process.

Let us now suppose that sum_{i=1}^{N2} p_{b,t}^i = S_t ≠ 1 at time t, due to reasons such as disturbance from foreground objects or the operation of once-off learning, and that the kth of the first N2 vectors in the table matches the input feature vector v_t. Then we have

sum_{i=1}^{N2} p_{b,t+1}^i = (1 - α) S_t + α L_t.    (18)

If the pixel is detected as a background point at time t (L_t = 1), this leads to

sum_{i=1}^{N2} p_{b,t+1}^i - S_t = α (1 - S_t).    (19)

If S_t < 1, then sum_{i=1}^{N2} p_{b,t+1}^i > S_t. In this case, the sum of the conditional probabilities of the principal features being background increases slightly. On the other hand, if S_t > 1, then sum_{i=1}^{N2} p_{b,t+1}^i < S_t, and the sum decreases slightly. From these two cases, it can be concluded that the sum of the conditional probabilities of the principal features being background converges to 1 as long as the background features are observed frequently.
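The convergence argument can be checked numerically: repeatedly observing background features under the IIR update drives the sum of the per-entry background statistics to 1, regardless of the starting values (the cycling pattern and initial values below are arbitrary choices of ours):

```python
# Numerical check: repeated background observations drive the sum of the
# conditional background statistics toward 1 under the update of Eq. (11).
alpha = 0.05
p_b = [0.2, 0.1, 0.0]              # arbitrary initial per-entry statistics
for t in range(2000):
    matched = t % 3                # cycle through three background features
    for i in range(3):
        m = 1.0 if i == matched else 0.0
        # update with L_t = 1 (pixel labeled background every frame)
        p_b[i] = (1 - alpha) * p_b[i] + alpha * m
total = sum(p_b)                   # converges to 1
```

Each frame the sum obeys S' = (1 - alpha) S + alpha, whose fixed point is exactly 1, matching (17)-(19).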
D. Selection of the Learning Rate

In general, for an IIR-filtering-based learning process, there is a tradeoff in the selection of the learning rate α. To make the learning process adapt smoothly to gradual background changes without being perturbed by noise and foreground objects, a small value should be selected for α. On the other hand, if α is too small, the system becomes too slow in responding to once-off background changes. Previous methods select it empirically [4], [5], [8], [14]. Here, a formula is derived to select α according to the time required for the system to respond to once-off background changes.

An ideal once-off background change at time t_0 can be assumed to be a step function. Suppose the features observed before t_0 fall into the first N1 vectors of the table, and the features observed after t_0 fall into the next N1 elements. Then, the statistics at time t_0 can be described as

sum_{i=1}^{N1} p_{b,t_0}^i ≈ 1   and   sum_{i=N1+1}^{2N1} p_{t_0}^i ≈ 0.    (20)

Since the new background appearance at pixel s after time t_0 is classified as foreground before the once-off updating with (16), P_t(b | s), sum_{i=1}^{N1} p_t^i, and sum_{i=1}^{N1} p_{b,t}^i decrease exponentially, whereas sum_{i=N1+1}^{2N1} p_t^i increases exponentially, and the new features will be shifted to the first positions of the updated table by the sorting at each time step. Once the condition (15) is met at time t_0 + T, the new background state is learned. To simplify the expression, let us assume that there is no resorting operation. Then the condition (15) becomes

sum_{i=N1+1}^{2N1} (p_{t_0+T}^i - p_{b,t_0+T}^i) > c · sum_{i=1}^{N1} p_{b,t_0+T}^i.    (21)

From (11) and (20), it follows that at time t_0 + T the following conditions hold:

sum_{i=N1+1}^{2N1} p_{t_0+T}^i = 1 - (1 - α)^T    (22)

sum_{i=N1+1}^{2N1} p_{b,t_0+T}^i ≈ 0    (23)

sum_{i=1}^{N1} p_{b,t_0+T}^i ≈ (1 - α)^T.    (24)

By substituting (22)-(24) into (21) and rearranging terms, one can obtain

(1 - α)^T < 1 / (1 + c)    (25)

where T is the number of frames required to learn the new background appearance. Equation (25) implies that if one wishes the system to learn the new background state in no more than T frames, one should choose α such that (25) is satisfied. For example, if the system is to respond to a once-off background change within 20 s at a frame rate of 20 fps, then T = 400, and α should be chosen so that (1 - α)^400 < 1/(1 + c).
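Under the simple exponential-decay model of the IIR update (our assumption: a newly appearing feature accumulates weight 1 - (1 - α)^T after T frames), a lower bound on α follows directly; here `mass` is the target weight fraction the new appearance must reach, playing the role of c/(1+c), and is an assumed parameter, not a value from the paper:

```python
def min_learning_rate(frames, mass=0.9):
    """Smallest learning rate alpha that lets a newly appearing feature
    accumulate at least `mass` of the weight within `frames` steps:
    1 - (1-alpha)**T >= mass  <=>  alpha >= 1 - (1-mass)**(1/T).
    This is a sketch in the spirit of the rate-selection formula, not
    the exact published expression."""
    return 1.0 - (1.0 - mass) ** (1.0 / frames)
```

For the example in the text (20 s at 20 fps, i.e., T = 400 frames), `min_learning_rate(400)` gives an α of roughly half a percent per frame.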
IV. FOREGROUND OBJECT DETECTION: THE ALGORITHM

With the Bayesian formulation of background and foreground classification, as well as the background representation with principal features, an algorithm for foreground object detection from complex environments is developed. It consists of four parts: change detection, change classification, foreground object segmentation, and background maintenance. The block diagram of the algorithm is shown in Fig. 3. The white blocks from left to right correspond to the first three steps, and the gray-shaded blocks correspond to background maintenance. In the first step, unchanged background pixels in the current frame are filtered
Fig. 3. Block diagram of the proposed method.
out by using simple background and temporal differencing. The detected changes are separated into static and dynamic points according to the interframe changes. In the second step, the detected static and dynamic change points are further classified as background or foreground using the Bayes rule and the statistics of principal features for the background. Static points are classified based on the statistics of principal colors and gradients, whereas dynamic points are classified based on those of principal color co-occurrences. In the third step, foreground objects are segmented by combining the classification results from both static and dynamic points. In the fourth step, the background models are updated. This includes updating the statistics of principal features for the background as well as a reference background image. Brief descriptions of the steps are presented in the following.
A. Change Detection

In this step, simple adaptive image differencing is used to filter out non-change background pixels. The minor variations of colors caused by imaging noise are filtered out to save the computation in further processing.

Let I_t be the input image and B_t be the reference background image maintained at time t, each with three color components. The background difference is obtained as follows. First, image differencing and thresholding for each color component are performed, where the threshold is automatically generated using the least median of squares (LMedS) method [31]. The background difference is then obtained by fusing the results from the three color components. Similarly, the temporal (or interframe) difference between the two consecutive frames I_{t-1} and I_t is obtained. If neither a background difference nor a temporal difference is detected, the pixel is classified as a non-change background point. In general, more than 50% of the pixels are filtered out in this step.
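The adaptive differencing step can be sketched per channel as follows. This is a simplified stand-in: the threshold is derived from the median of squared differences, in the spirit of least-median-of-squares noise estimation, but the scale factor `k` and the `+1` floor are our assumptions, not the values of [31]:

```python
def change_mask(frame, reference, k=9.0):
    """Flag changed pixels in one channel (flat lists of intensities).
    The threshold adapts to the scene: it scales with the median squared
    difference, which mostly reflects imaging noise when changes are sparse."""
    diffs = [(a - b) ** 2 for a, b in zip(frame, reference)]
    med = sorted(diffs)[len(diffs) // 2]
    thr = k * med + 1.0        # +1 guards against an all-zero median
    return [d > thr for d in diffs]
```

Running this per color component and OR-ing the three masks gives the fused background difference; the same routine applied to consecutive frames gives the temporal difference.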
B. Change Classification

If an interframe change is detected at a pixel s, it is classified as a dynamic point; otherwise, it is classified as a static point. A change that occurs at a static point could be caused by illumination changes, once-off background changes, or a temporarily motionless foreground object. A change detected at a dynamic point could be caused by a moving background or foreground object. These points are further classified as background or foreground by using the Bayes decision rule and the statistics of the corresponding principal features.

Let v_t be the input feature vector at s and time t. The probabilities are estimated as

P(v_t | s) = sum_{i in U} p_t^i,   P(v_t | b, s) = sum_{i in U} p_{b,t}^i    (26)

where U is the set of indices of the table entries among the first N1 whose feature vectors match the input vector v_t, i.e.,

U = { i : D(v_t, v_i) < δ and i ≤ N1 }.    (27)

If no principal feature vector in the table matches v_t, both P(v_t | s) and P(v_t | b, s) are set to 0. Then, the change point is classified as background or foreground as follows.

Classification of Static Points: For a static point, the probabilities for both color and gradient features are estimated by (26) with v = c and v = e, respectively, where the vector distance measure in (27) is calculated as in (10). In this work, the statistics of the two types of principal features (in T_s^c and T_s^e) are learned separately. In general cases, the background priors learned with the two tables are consistent with each other, and the Bayes decision rule (9) can be applied for background and foreground classification. In some complex cases,
one type of feature from the background might be unstable. One example is the temporally static states of a wavering water surface: for these states, the gradient features are not constant. Another example is video captured with an auto-gain camera: the gain is often self-tuned due to the motion of objects, and the gradient features are then more stable than the color features for static background pixels. To work stably in various conditions, the following method is adopted. The prior values of the learned color and gradient features are compared. If they are close (within a small margin in our test), the color and gradient features are coincident, and both features are used for classification with the Bayes rule (9); otherwise, only the type of feature with the larger prior value is used for classification with the Bayes rule (5).

Classification of Dynamic Point: For a dynamic point at the current time, the feature vector of color co-occurrence is generated. The probabilities are calculated as in (26), where the distance between two feature vectors in (27) is computed as

(28)

with a suitably chosen matching threshold. Finally, the Bayes rule (5) is applied for background and foreground classification. As observed in our experiments, only a small percentage of the dynamic background points are wrongly classified as foreground changes [12]; further, the remainders are isolated points, which can easily be removed by a smoothing operation.
C. Foreground Object Segmentation

A postprocessing step is applied to segment the remaining change points into foreground regions. This is done by first applying a morphological operation (a pair of open and close) to suppress the residual errors. Then the foreground regions are extracted, holes are filled, and small regions are removed. Further, an AND operation is applied to the resulting segments in consecutive frames to remove the false foreground regions detected by temporal differencing [32].
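A sketch of this postprocessing with standard morphology (SciPy's ndimage here; the structuring elements, the `min_area` value, and the function names are our choices, not the paper's):

```python
import numpy as np
from scipy import ndimage as ndi

def segment_foreground(change_mask, prev_mask, min_area=50):
    """Turn the per-pixel change mask into foreground regions:
    open/close to suppress residual errors, fill holes, drop small
    regions, then AND with the previous frame's result to remove
    false regions detected by temporal differencing."""
    m = ndi.binary_opening(change_mask)
    m = ndi.binary_closing(m)
    m = ndi.binary_fill_holes(m)
    labels, n = ndi.label(m)
    for i in range(1, n + 1):
        if (labels == i).sum() < min_area:  # remove small regions
            m[labels == i] = False
    return m & prev_mask
```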
D. Background Maintenance

With the feedback from the above segmentation, the background models are updated. First, the statistics of the principal features are updated as described in Section IV: for the static points, the tables of color and gradient statistics are updated, and for the dynamic points, the table of color co-occurrence statistics is updated. Meanwhile, a reference background image is also maintained to keep the background difference accurate. Consider a background point in the final segmentation result at the current time. If it is identified as an unchanged background point in the change detection step, the background reference image at the point is smoothly updated by

(29)

where the updating factor is a small positive number. If the point is classified as background in the change classification step, the background reference image at the point is replaced by the new background appearance

(30)

With (30), the reference background image can follow the dynamic background changes, e.g., the changes of color between tree branch and sky, as well as once-off background changes.

Fig. 4. Summary of the complete algorithm.
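The two maintenance rules (29) and (30) can be sketched as below; since the symbols were stripped from the equations, the array names, the grayscale layout, and the value of the small positive factor alpha are our assumptions:

```python
import numpy as np

def update_reference(bg, frame, unchanged_mask, dyn_bg_mask, alpha=0.05):
    """Maintain the reference background image.
    unchanged_mask: points found unchanged in change detection -> (29),
    a smooth running average with a small positive factor alpha.
    dyn_bg_mask: change points classified as background -> (30),
    replaced outright by the new background appearance."""
    bg = bg.copy()
    bg[unchanged_mask] = ((1.0 - alpha) * bg[unchanged_mask]
                          + alpha * frame[unchanged_mask])
    bg[dyn_bg_mask] = frame[dyn_bg_mask]
    return bg
```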
E. Memory Requirement and Computational Time

The complete algorithm is summarized in Fig. 4. The major part of the memory usage is to store the tables of statistics for each pixel. In our implementation, the memory requirement for each pixel is approximately 1.78 KB. For a video with images sized 160 × 120 pixels, the required memory is approximately 33.4 MB, while for images sized 320 × 240 pixels, 133.5 MB of memory is required. For a standard PC, this is still feasible. With a 1.7-GHz Pentium CPU PC, real-time processing of image sequences is achievable at a rate of about 15 frames per second (fps) for images sized 160 × 120 pixels and at a rate of about 3 fps for images sized 320 × 240 pixels.
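The memory figures can be checked directly (the 1.78-KB-per-pixel figure is the paper's; the 1 MB = 1024 KB conversion is ours):

```python
def required_memory_mb(width, height, kb_per_pixel=1.78):
    # Total size of the per-pixel statistics tables, in megabytes.
    return width * height * kb_per_pixel / 1024.0
```

This reproduces the quoted 33.4 MB for 160 × 120 images and 133.5 MB for 320 × 240 images.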
Fig. 5. Experimental results on a meeting room environment (MR) with wavering curtains in the wind. The two examples are the results for frames 1816 and 2268.
Fig. 6. Experimental results on a lobby environment (LB) in an office building with lights switching on/off. Upper row: a frame before switching off some lights (364). Lower row: a frame 15 s after switching off some lights (648).
V. EXPERIMENTAL RESULTS
The proposed method has been tested in a variety of indoor and outdoor environments, including offices, campuses, parking lots, shopping malls, restaurants, airports, subway stations, sidewalks, and other private and public sites. It has also been tested on image sequences captured in various weather conditions, including sunny, cloudy, and rainy weather, as well as night and crowd scenes. In all the tests, the proposed method was automatically initialized (bootstrapped) from a blinking background. The system gradually learned the most significant features for both stationary and nonstationary background objects. Once the once-off updating is performed, the system is able to separate the foreground from the background well.
MoG [4] is a widely used adaptive background subtraction method, and among the existing methods it performs quite well for both stationary and nonstationary backgrounds [6]. The proposed method has therefore been compared with MoG in the experiments. The same learning rate was used for both the proposed method and MoG in each test.1 Further, for a fair comparison, the postprocessing used in the proposed method was applied to the MoG method as well.

1A similar analysis of the learning process and dynamic performance for MoG can be made as in Sections III-C and III-D.
The visual examples and quantitative evaluations of the experiments are described in the following two subsections, respectively.
A. Examples on Various Environments

Selected results on five typical indoor and outdoor environments are displayed in this section: offices, campuses, shopping malls, subway stations, and sidewalks. In the figures of this subsection, pictures are arranged in rows. In each row, the images from left to right are the input frame, the background reference image maintained by the proposed method at that moment, the manually generated ground truth, and the results of the proposed method and MoG.
1) Office Environments: Office environments include offices, laboratories, meeting rooms, corridors, lobbies, and entrances. An office environment is usually composed of stationary background objects. The difficulties for foreground detection in these scenes can be caused by shadows, changes of illumination conditions, and camouflaged foreground objects (i.e., the color of the foreground object is similar to that of the covered background). In some cases, the background may consist of dynamic objects, such as waving curtains, running fans, and flickering screens. Examples from two test sequences are shown in Figs. 5 and 6, respectively.
Fig. 7. Experimental results on a campus environment (CAM) containing wavering tree branches in strong winds. Shown are frames 1019, 1337, and 1393.
The first sequence (MR) was captured by an auto-gain camera in a meeting room, where the background curtain was moving in the wind. The first example, in the upper row, came from a scenario containing significant motion of the curtain, as well as background changes caused by automatic gain adjustment. In the next example, the person wore clothes of bright colors similar to the color of the curtain. In both cases, the proposed method separated the background and foreground satisfactorily.
The second sequence (LB) was captured from a lobby in an office building. On this occasion, background changes were mainly caused by switching lights on/off. Two examples from this sequence are shown in Fig. 6. The first example shows a scene before some lights are switched off; a significant shadow of the person can be observed. The result of the proposed method is rather satisfactory apart from a small included shadow. The second example shows a scene at about 220 frames (about 15 s) after some lights have been switched off. In this example, even though the background reference image had not been recovered completely, the proposed method detected the person successfully.
2) Campus Environments: The second type of environment is campuses or parks. Changes in the background are often caused by the motion of tree branches and their shadows on the ground surface, or by changes in the weather. The three examples displayed in Fig. 7 were from a sequence (CAM) captured on a campus containing moving tree branches. The great motion of the tree branches was caused by strong winds, which can be observed from the waving yellow flag at the left of the images. The moving tree branches also resulted in changes of the tree shadows. The three example frames contain vehicles of different colors. The results show that the proposed method detected the vehicles quite well in such an environment.
3) Shopping Malls: The third type of typical environment includes shopping centers, hotels, museums, airports, and restaurants. In these environments, the lighting is distributed from the ceilings and there are specular, highlighted ground surfaces. In such cases, if multiple persons move in the scene, the shadows on the ground surface vary significantly in the image sequences. In these environments, the shadows can be classified into umbra and penumbra [33]. The umbra corresponds to the background area where the direct light is almost totally blocked by the foreground object, whereas in the penumbra area of the background, the lighting is partially blocked.
Three examples from such environments are shown in Fig. 8. They were from a busy shopping center (SC), an airport (AP), and a buffet restaurant (BR) [6]. Significant shadows of moving persons cast on the ground surfaces from different directions can be observed. As one can see, the proposed method obtained satisfactory results in these three environments, apart from small parts of the shadows being detected. The recognized shadows can also be observed in the maintained background reference images. This can be explained as follows: a) the feature distance measure (10), which is robust to illumination changes, plays a major role in suppressing the penumbra areas; b) the learned color co-occurrences of the changes from the normal background appearance to umbra, and vice versa, can identify many background pixels in the umbra areas. Hence, without special models for the shadows, the proposed method suppresses much of the various shadows in these environments.
4) Subway Stations: Subway stations are other public sites that often require monitoring. In these situations, the motion of background objects (e.g., trains and escalators) makes background modeling difficult. Further, the background model is hard to establish if there are frequent human crowds in the scene. Fig. 9 shows two examples from a sequence of a subway station (SS) recorded on tape by a CCTV surveillance system. The scene contains three moving escalators and frequent human flows on the right side of the images. In addition, there are significant background changes caused by variations of the lighting conditions due to the many glass and stainless steel materials inside the building. Another difficulty with this sequence is the noise due to the old video recording device. The busy flow of human crowds can be observed in the first example in the figure. Our test results show that the proposed method performed quite satisfactorily in such difficult scenarios.

Fig. 8. Experimental results on shopping mall environments containing specular ground surfaces. The three examples came from a busy shopping center (SC), an airport (AP), and a buffet restaurant (BR), respectively.

Fig. 9. Experimental results on a subway station environment (SS). The examples are frames 1993 and 2634.
5) Sidewalks: Pedestrians are often the targets of interest in many video surveillance systems. In such a case, a surveillance system may monitor the scene from day to night over a range of weather conditions. The tests were performed in such an environment around the clock. The image sequences (SW) were obtained from highly compressed MPEG4 videos through a local wireless network. There were large variations of the background in the images. Five examples and test results are shown in Fig. 10. These correspond to sunny, cloudy, and rainy weather conditions, as well as night and crowded scenes. The interval between the first two frames was less than 10 s. Comparing the results with the ground truths, one can find that the proposed method performed very robustly in this complex environment.
From the comparisons with MoG in the examples shown in Figs. 5-10, one can find that the proposed method outperformed the MoG method in these selected difficult situations.
The parameters used for these tests are listed in Tables II and III. The parameters in Table II were applied to all tests. The learning rates in the first row of Table III were applied to all tests except for three shorter sequences, where the larger rates in the second row of the table were applied. This is because, if the image sequences are short, a slightly faster learning rate should be used to speed up the initial learning. Since the decision (5) for the classification of background and foreground does not depend directly on any threshold, the performance of the proposed method is not very sensitive to these parameters.
B. Quantitative Evaluations

To obtain a systematic evaluation, the performance of the proposed method was also evaluated quantitatively on randomly selected samples from ten sequences.
Fig. 10. Experimental results of pedestrian detection in a sidewalk environment (SW) around the clock. From top to bottom are frames from sunny, cloudy, rainy, night, and crowd scenes.
TABLE II
PARAMETERS USED FOR ALL TEST EXAMPLES

TABLE III
LEARNING RATES USED IN THE TEST EXAMPLES
In the previous work [6], the results were evaluated quantitatively by comparison with the ground truths in terms of

1) false negative error: the number of foreground pixels that are missed;
2) false positive error: the number of background pixels that are misdetected as foreground.
However, when averaging these measures over various environments, they are found not to be accurate enough. In this paper, a new similarity measure is introduced to evaluate the results of foreground segmentation. Let A be a detected region and B be the corresponding ground truth; the similarity measure between the regions A and B is then defined as

S(A, B) = (A ∩ B) / (A ∪ B)   (31)

Using this measure, S approaches the maximum value 1.0 if A and B are the same; otherwise, S varies between 1 and 0 according to their similarity, approaching 0 with the least similarity. It integrates the false positive and false negative errors in one measure. One drawback of the measure (31) is that it is nonlinear. To give a visual impression of the quantities of the similarity measure, some matching images and their similarity values are displayed in Fig. 11.
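On boolean masks, the measure (31), i.e., the ratio of the intersection to the union of the detected region and the ground truth (the Jaccard index), can be computed as follows (the function name is ours):

```python
import numpy as np

def similarity(detected, truth):
    # S(A, B) = |A intersect B| / |A union B|: 1.0 for identical
    # regions, 0.0 when they do not overlap at all.
    inter = np.logical_and(detected, truth).sum()
    union = np.logical_or(detected, truth).sum()
    return float(inter) / float(union) if union else 1.0
```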
For systematic evaluation and comparison, the similarity measure (31) has been applied to the experimental results of the proposed method and the MoG method. A total of ten image sequences were used, including those in Figs. 5-10 as well as two others [water surface (WS) and fountain (FT)]. We randomly selected 20 frames from each sequence, leading to a total of 200 sample frames for evaluation. The ground truths of these 200 frames were generated manually by four invited persons. All ten test sequences, the results, and the ground truths of the sample frames are available.2

2http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html
Fig. 11. Some examples of matching images with different similarity measure values. In the images, the bright color indicates the intersection of the detected regions and the ground truths, the dark gray color indicates the false negatives, and the light gray color indicates the false positives.
TABLE IV
QUANTITATIVE EVALUATION AND COMPARISON RESULTS: SIMILARITY VALUES FROM THE TEST SEQUENCES
The average values of the similarity measure for each individual sequence and over all ten sequences are shown in Table IV. The corresponding values obtained with the MoG method are also included. The ten test sequences were chosen from among the difficult sequences: besides the various background changes described in the previous subsection, they contain global background changes as well as persons staying motionless for quite a while. Taking these situations into account, the evaluation values obtained for both methods are quite good. Comparing the results in Table IV with those in Fig. 11, the performance of the proposed method is rather satisfactory. The comparison shows that the proposed method provides improved results over those from the MoG method, especially for image sequences with complex backgrounds.
C. Limitations of the Method

Since the statistics are related to each individual pixel without considering its neighborhood, the method can wrongly absorb a foreground object into the background if the object remains motionless for a long time, e.g., if a moving person or car suddenly stops and remains still in the scene. Further improvement should be made, e.g., by combining information from high-level object recognition and tracking in the background updating [34], [35].
Another potential problem is that the method can wrongly learn the features of foreground objects as background if crowded foreground objects (e.g., crowds) are constantly present in the scene. Adjusting the learning rate based on feedback from the optical flow could provide a possible solution [36]. A method of controlling the learning processes from multilevel feedbacks is being investigated in order to further improve the results.
VI. CONCLUSION
For detecting foreground objects in complex environments, this paper has proposed a novel statistical method for background modeling. In the proposed method, the background appearance is characterized by the principal features and their statistics. Foreground objects are detected through foreground and background classification under a Bayesian framework. Our test results have shown that the principal features are effective in representing the spectral, spatial, and temporal characteristics of the background. A learning method to adapt to time-varying background features has been proposed and analyzed. Experiments have been conducted in a variety of environments, including offices, public buildings, subway stations, campuses, parking lots, airports, and sidewalks. The experimental results have shown the effectiveness of the proposed method. Quantitative evaluation and comparison with an existing method have shown that improved performance for foreground object detection in complex backgrounds has been achieved. Some limitations of the method have been discussed, with suggestions for possible improvement.
ACKNOWLEDGMENT
The authors would like to thank R. Luo, J. Shang, X. Huang, and W. Liu for their work generating the ground truths for evaluation.
REFERENCES

[1] D. Gavrila, "The visual analysis of human movement: A survey," Comput. Vis. Image Understanding, vol. 73, no. 1, pp. 82-98, 1999.
[2] L. Li and M. Leung, "Integrating intensity and texture differences for robust change detection," IEEE Trans. Image Processing, vol. 11, pp. 105-112, Feb. 2002.
[3] E. Durucan and T. Ebrahimi, "Change detection and background extraction by linear algebra," Proc. IEEE, vol. 89, pp. 1368-1381, Oct. 2001.
[4] C. Stauffer and W. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 747-757, Aug. 2000.
[5] I. Haritaoglu, D. Harwood, and L. Davis, "W4: Real-time surveillance of people and their activities," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 809-830, Aug. 2000.
[6] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: Principles and practice of background maintenance," in Proc. IEEE Int. Conf. Computer Vision, Sept. 1999, pp. 255-261.
[7] K. Karmann and A. von Brandt, "Moving object recognition using an adaptive background memory," in Time-Varying Image Processing and Moving Object Recognition, vol. 2, 1990, pp. 289-296.
[8] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: Real-time tracking of the human body," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 780-785, July 1997.
[9] A. Elgammal, D. Harwood, and L. Davis, "Non-parametric model for background subtraction," in Proc. Eur. Conf. Computer Vision, 2000.
[10] N. Paragios and V. Ramesh, "A MRF-based approach for real-time subway monitoring," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, Dec. 2001, pp. I-1034-I-1040.
[11] O. Javed, K. Shafique, and M. Shah, "A hierarchical approach to robust background subtraction using color and gradient information," in Proc. IEEE Workshop Motion and Video Computing, Dec. 2002, pp. 22-27.
[12] L. Li, W. M. Huang, I. Y. H. Gu, and Q. Tian, "Foreground object detection in changing background based on color co-occurrence statistics," in Proc. IEEE Workshop Applications of Computer Vision, Dec. 2002, pp. 269-274.
[13] L. Wixson, "Detecting salient motion by accumulating directionally-consistent flow," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 774-780, Aug. 2000.
[14] N. J. B. McFarlane and C. P. Schofield, "Segmentation and tracking of piglets in images," Mach. Vis. Applicat., vol. 8, pp. 187-193, 1995.
[15] D. Koller, J. Weber, T. Huang, J. Malik, G. Ogasawara, B. Rao, and S. Russell, "Toward robust automatic traffic scene analysis in real-time," in Proc. Int. Conf. Pattern Recognition, 1994, pp. 126-131.
[16] A. Bobick, J. Davis, S. Intille, F. Baird, L. Campbell, Y. Ivanov, C. Pinhanez, and A. Wilson, "KidsRoom: Action recognition in an interactive story environment," Mass. Inst. Technol., Cambridge, Perceptual Computing Tech. Rep. 398, 1996.
[17] J. Rehg, M. Loughlin, and K. Waters, "Vision for a smart kiosk," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 1997, pp. 690-696.
[18] T. Olson and F. Brill, "Moving object detection and event recognition algorithm for smart cameras," in Proc. DARPA Image Understanding Workshop, 1997, pp. 159-175.
[19] T. Boult, "Frame-rate multi-body tracking for surveillance," in Proc. DARPA Image Understanding Workshop, 1998.
[20] T. Darrell, G. Gordon, M. Harville, and J. Woodfill, "Integrated person tracking using stereo, color, and pattern detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 1998, pp. 601-608.
[21] A. Shafer, J. Krumm, B. Brumitt, B. Meyers, M. Czerwinski, and D. Robbins, "The new EasyLiving project at Microsoft," in Proc. DARPA/NIST Smart Spaces Workshop, 1998.
[22] C. Eveland, K. Konolige, and R. C. Bolles, "Background modeling for segmentation of video-rate stereo sequences," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 1998, pp. 266-271.
[23] N. Friedman and S. Russell, "Image segmentation in video sequences: A probabilistic approach," in Proc. 13th Conf. Uncertainty in Artificial Intelligence, 1997.
[24] A. J. Lipton, H. Fujiyoshi, and R. S. Patil, "Moving target classification and tracking from real-time video," in Proc. IEEE Workshop Applications of Computer Vision, Oct. 1998, pp. 8-14.
[25] M. Harville, G. Gordon, and J. Woodfill, "Foreground segmentation using adaptive mixture models in color and depth," in Proc. IEEE Workshop Detection and Recognition of Events in Video, July 2001, pp. 3-11.
[26] X. Gao, T. Boult, F. Coetzee, and V. Ramesh, "Error analysis of background adaption," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2000, pp. 503-510.
[27] K. Skifstad and R. Jain, "Illumination independent change detection from real world image sequences," Comput. Vis., Graph., Image Process., vol. 46, pp. 387-399, 1989.
[28] S. C. Liu, C. W. Fu, and S. Chang, "Statistical change detection with moments under time-varying illumination," IEEE Trans. Image Processing, vol. 7, pp. 1258-1268, Aug. 1998.
[29] N. Oliver, B. Rosario, and A. Pentland, "A Bayesian computer vision system for modeling human interactions," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 831-843, Aug. 2000.
[30] A. Iketani, A. Nagai, Y. Kuno, and Y. Shirai, "Detecting persons on changing background," in Proc. Int. Conf. Pattern Recognition, vol. 1, 1998, pp. 74-76.
[31] P. Rosin, "Thresholding for change detection," in Proc. IEEE Int. Conf. Computer Vision, Jan. 1998, pp. 274-279.
[32] Q. Cai, A. Mitiche, and J. K. Aggarwal, "Tracking human motion in an indoor environment," in Proc. IEEE Int. Conf. Image Processing, Oct. 1995, pp. 215-218.
[33] C. Jiang and M. O. Ward, "Shadow identification," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 1992, pp. 606-612.
[34] L. Li, I. Y. H. Gu, M. K. H. Leung, and Q. Tian, "Knowledge-based fuzzy reasoning for maintenance of moderate-to-fast background changes in video surveillance," in Proc. 4th IASTED Int. Conf. Signal and Image Processing, 2002, pp. 436-440.
[35] M. Harville, "A framework for high-level feedback to adaptive, per-pixel, mixture-of-Gaussian background models," in Proc. Eur. Conf. Computer Vision, 2002, pp. 543-560.
[36] D. Gutchess, M. Trajkovic, E. Cohen-Solal, D. Lyons, and A. K. Jain, "A background model initialization algorithm for video surveillance," in Proc. IEEE Int. Conf. Computer Vision, vol. 1, July 2001, pp. 733-740.
Liyuan Li (M'96) received the B.E. and M.E. degrees from Southeast University, Nanjing, China, in 1985 and 1988, respectively, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2001.
From 1988 to 1999, he was on the faculty at Southeast University, where he was an Assistant Lecturer (1988 to 1990), Lecturer (1990 to 1994), and Associate Professor (1995 to 1999). Since 2001, he has been a Research Scientist at the Institute for Infocomm Research, Singapore. His current research interests include video surveillance, object tracking, and event and behavior understanding.
Weimin Huang (M'97) received the B.Eng. degree in automation and the M.Eng. and Ph.D. degrees in computer engineering from Tsinghua University, Beijing, China, in 1989, 1991, and 1996, respectively.
He is a Research Scientist at the Institute for Infocomm Research, Singapore. He has worked on handwriting signature verification, biometrics authentication, and audio/video event detection. His current research interests include image processing, computer vision, pattern recognition, human computer interaction, and statistical learning.
Irene Yu-Hua Gu (M'94-SM'03) received the Ph.D. degree in electrical engineering from the Eindhoven University of Technology, Eindhoven, The Netherlands, in 1992.
She is an Associate Professor in the Department of Signals and Systems, Chalmers University of Technology, Göteborg, Sweden. She was a Research Fellow at Philips Research Institute IPO, The Netherlands, and Staffordshire University, Staffordshire, U.K., and a Lecturer at The University of Birmingham, Birmingham, U.K., from 1992 to 1996. Since 1996, she has been with the Department of Signals and Systems, Chalmers University of Technology. Her current research interests include image processing, video surveillance and object tracking, video communications, and signal processing applications to electric power systems.
Dr. Gu has served as an Associate Editor for the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS since 2000, and she is currently the Chair-Elect of the IEEE Swedish Signal Processing Chapter.
Qi Tian (M'83-SM'90) received the B.S. and M.S. degrees in electrical and computer engineering from Tsinghua University, Beijing, China, in 1967 and 1981, respectively, and the Ph.D. degree in electrical and computer engineering from the University of South Carolina, Columbia, in 1984.
He is a Principal Scientist in the Media Division, Institute for Infocomm Research, Singapore. His main research interests are image/video/audio analysis, indexing and retrieval, media content identification and security, computer vision, and pattern recognition. He joined the Institute of System Science, National University of Singapore, in 1992. Since then, he has been working on robust character/ID recognition and video indexing. He was the Program Director for the Media Engineering Program, Kent Ridge Digital Labs, then Laboratories for Information Technology, from 2001 to 2002.
Dr. Tian has served on editorial boards of professional journals and as chair and member of technical committees of the IEEE Pacific-Rim Conference on Multimedia (PCM), the IEEE International Conference on Multimedia and Expo (ICME), etc.