Page 1
8/18/2019 Carry Object Detection
http://slidepdf.com/reader/full/carry-object-detection 1/6
Detection of People Carrying Objects : a Motion-based Recognition Approach
Chiraz BenAbdelkader and Larry Davis
Computer Vision Laboratory
University of Maryland
College Park,
MD 20742 USA
chiraz,[email protected]
Abstract
We describe a method to detect instances of a walking person carrying an object, seen from a stationary camera. We take a correspondence-free, motion-based recognition approach that exploits known shape and periodicity cues of the human silhouette. Specifically, we subdivide the binary silhouette into four horizontal segments and analyze the temporal behavior of the bounding box width over each segment. We posit that the periodicity and amplitudes of these time series satisfy certain criteria for a naturally walking person, and that deviations therefrom are an indication that the person might be carrying an object. The method is tested on 41 color outdoor sequences (360x240) of people walking and carrying objects at various poses and camera viewpoints. A correct detection rate of 85% and a false alarm rate of 12% are obtained.
1 Introduction
An important class of human activities comprises interactions of people with objects in the scene, such as depositing an object, picking up an object, and the exchange of an object between two people. Given the time intervals during which objects are carried by any one person, we expect that a temporal logical reasoning system will be able to infer events of object pickup, object deposit and object exchange. In this paper, we address the visual processing task of determining these time intervals (during which an object is being carried by a person). Carried object detection is also of interest to person identification applications, since carried objects often alter the dynamics and/or appearance of a person's gait, and hence might affect the performance of a gait recognition method.
The clinical gait analysis and ergonomics research communities (among others) have studied the effect of load-carrying on human gait as a function of the load size and the way it is carried [10, 13]. According to these studies, people carrying a (heavy) object adjust the way they walk in order to minimize their energy expenditure (in fact, this is a general principle of gait dynamics that applies to any walking conditions) [9, 12]. Consequently, their cadence tends to be higher and their stride length shorter. Also, the duration of the double-support phase of the gait cycle (i.e. the period of time when both feet are on the ground) tends to be larger for a person carrying an object.
Carried objects can be classified into two (non-mutually exclusive) types: (1) those that alter the way the person walks (i.e. the biomechanics of gait) due to their sheer weight and/or size, and (2) those that alter the way the person appears, because they occlude part of the body when carried. Consequently, there are (at least) two approaches to visual detection of a carried object: we can either determine whether the person's gait is within the normal range (assuming we have a model of 'normal gait'), or we can characterize the changes in appearance (in terms of shape or texture) that are indicative of the presence of a carried object.
In clinical gait analysis, gait abnormalities are typically detected by measuring certain gait parameters (temporal, kinematic and kinetic) and comparing them with those of a naturally walking person [15]. It is difficult to compute kinematic parameters with current state-of-the-art computer vision, since this requires accurate tracking of body landmarks. Furthermore, although recent work has shown that it is possible to compute stride length robustly from video [19, 4, 2], the estimation error of these methods is no smaller than the difference between natural and load-carrying stride lengths (which is typically on the order of only 1-2 cm [10]).
The method of this paper takes the second, non-parametric, approach. We formulate constraints in terms of the spatiotemporal patterns of the binary silhouette that we claim are satisfied by a naturally-walking person but not by a person carrying an object. This method is view-invariant; however, it can only detect a carried object that protrudes sufficiently outside the body silhouette. It is
robust to segmentation and tracking errors, since it analyzes shape over many frames, unlike a static shape analysis approach, for example, that would try to detect a 'bump' in the silhouette from a single frame. We test the method on 41 outdoor sequences spontaneously recorded in the parking lot of a university building, achieving a detection rate of 85% and a false alarm rate of 12%. To limit the scope of the problem, we make the following assumptions:

- The camera is stationary.
- The person is walking in an upright pose. This is a reasonable assumption for a person carrying an object.
- The person walks with a constant velocity for a few seconds.
2 Related Work
Analysis and modeling of the human body and/or its motion are the subject of several areas of computer vision, such as action/activity/gesture recognition, pedestrian detection, and gait recognition [3, 6, 11, 4, 17, 1]. The solution approaches to these problems typically fall into one of two categories: structure-based or structure-free. The former assumes the action or gait to be a sequence of static configurations (poses), and recognizes it by mapping features extracted from each frame to a configuration model. The latter characterizes and recovers the motion generated by the action or gait, without reference to the underlying pose of the moving body.
Haritaoglu's Backpack [8] system is the only work we know of that addresses the specific problem of carried object detection for video surveillance applications. Like our method, it uses both shape and motion cues. It first locates significantly protruding regions of the silhouette via static shape symmetry analysis. Each outlier region is then classified as being part of the carried object or of the body, based on the periodicity of its vertical silhouette profile. Implicit in this method is the assumption that aperiodic outlier regions correspond to the carried object and periodic regions to the body. This can often fail, for a variety of reasons. For example, the axis of symmetry (which is computed as the blob's major axis) is very sensitive to detection noise, as well as to the size and shape of the carried object itself. Also, using a heuristically-determined threshold to filter out small non-symmetric regions makes the method less robust.
Like Backpack, we use a silhouette signature shape feature to capture the periodicity of the human body. A major difference lies in that we analyze both the periodicity and amplitude of these shape features over time to detect the carried object, and only use static shape analysis in the final segmentation phase of the object. Another important difference is that we explicitly constrain the location of the object to be in the arms region and/or legs region, since, as noted above, the silhouette signature of the region above the arms is not periodic.
3 Method
A walking person is first detected and tracked for a number of frames in the video sequence, then classified as naturally-walking or object-carrying based on spatiotemporal analysis of the obtained binary silhouettes.
3.1 Foreground Detection and Tracking
Since the camera is assumed static, foreground detection is achieved via a non-parametric background modelling technique that is essentially a generalization of the mixture-of-Gaussians background modelling approach, and is well suited for outdoor scenes in which the background is often not perfectly static (e.g. occasional movement of tree leaves and grass) [7]. A number of standard morphological cleaning operations are applied to the detected blobs to correct for random noise. Frame-to-frame tracking of a moving object is done via simple overlap of its blob bounding boxes in the current and previous frames.
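The overlap-based data association above can be sketched as follows. This is a minimal illustration, not the authors' code; the box format and helper names are our own:

```python
# Sketch (not the authors' code) of frame-to-frame tracking by
# bounding-box overlap. Boxes are (x0, y0, x1, y1) in pixel coordinates.

def boxes_overlap(a, b):
    """True if axis-aligned boxes a and b intersect."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def match_tracks(prev_boxes, curr_boxes):
    """Associate each current blob with the first previous blob it overlaps.

    Returns a dict mapping current-box index -> previous-box index (or None
    if the blob is new / unmatched).
    """
    matches = {}
    for i, cb in enumerate(curr_boxes):
        matches[i] = None
        for j, pb in enumerate(prev_boxes):
            if boxes_overlap(cb, pb):
                matches[i] = j
                break
    return matches
```

Overlap matching suffices here because, at 30 fps, a walking person moves only a few pixels between consecutive frames, so the boxes of the same person almost always intersect.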
3.2 Carried Object Detection
Human gait is highly structured both in space and time, due to the bilateral symmetry of the human body and the cyclic coordinated movement patterns of the various body parts, which repeat at the fundamental frequency of walking. A good model for the motion of the legs is a pair of planar pendula oscillating out of phase [14, 16, 12]. The same can be said of the swinging of the arms [18].
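The pendulum model can be illustrated numerically. In this toy sketch (our own, with assumed values for the stride period, leg length, and swing amplitude), two simulated legs swing pi out of phase, and the horizontal spread between the feet oscillates at half the stride period — which is why the leg-region width series is periodic at the gait frequency:

```python
import numpy as np

# Toy illustration (ours, not from the paper): the two legs as planar
# pendula of length l swinging pi out of phase at the stride period T.
T = 1.0   # stride period in seconds (assumed)
l = 0.9   # leg length in meters (assumed)
A = 0.4   # swing amplitude in radians (assumed)
t = np.arange(0.0, 4.0, 0.01)

theta1 = A * np.sin(2 * np.pi * t / T)           # leg 1 angle
theta2 = A * np.sin(2 * np.pi * t / T + np.pi)   # leg 2, out of phase

x1 = l * np.sin(theta1)   # horizontal foot positions
x2 = l * np.sin(theta2)
width = np.abs(x1 - x2)   # spread between the feet

# The spread equals 2*l*|sin(A*sin(2*pi*t/T))|, which repeats every T/2:
shift = int((T / 2) / 0.01)
assert np.allclose(width[:-shift], width[shift:], atol=1e-6)
```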
We expect that the presence of a sufficiently large car-
ried object will (at least locally) distort this spatio-temporal
structure. In order to capture the differences between nat-
ural gait and load-carrying gait, we analyze the temporal
behavior of the widths of horizontal segments of the sil-
houette. Specifically, we formulate two constraints on the
periodicity and amplitude of these features, and posit that
the violation of these constraints is highly indicative that
the person is carrying an object.
Consider the subdivision of the silhouette into four segments, shown in Figure 1: three equal contiguous horizontal segments over the lower body region, denoted L1, L2, L3 (bottom segment first), and one segment for the upper body region, denoted U. We also define L = L1 ∪ L2 ∪ L3 (i.e. the lower half of the body).
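A minimal sketch of how such segment widths could be measured from a binary silhouette (our reconstruction, not the authors' code: the band boundaries follow the H/6, H/6, H/6, H/4 proportions of Figure 1, and the helper names are our own; the silhouette is assumed non-empty):

```python
import numpy as np

# Sketch (our reconstruction, following Figure 1): the silhouette's
# bounding box is split, bottom to top, into three leg segments L1, L2, L3
# of height H/6 each and an upper-body segment U of height H/4; the head
# region above U is not used.

def segment_widths(silhouette):
    """silhouette: 2D boolean array (True = foreground), assumed non-empty.

    Returns (w_l1, w_l2, w_l3, w_u): the width of the tightest horizontal
    span of foreground pixels within each band (0 if the band is empty).
    """
    rows = np.flatnonzero(silhouette.any(axis=1))
    top, bottom = rows.min(), rows.max()
    H = bottom - top + 1

    def band_width(lo_frac, hi_frac):
        # Band measured upward from the bottom of the bounding box.
        r0 = bottom - int(hi_frac * H) + 1
        r1 = bottom - int(lo_frac * H)
        cols = np.flatnonzero(silhouette[r0:r1 + 1].any(axis=0))
        return 0 if cols.size == 0 else cols.max() - cols.min() + 1

    w_l1 = band_width(0.0, 1 / 6)            # feet / lower legs
    w_l2 = band_width(1 / 6, 2 / 6)          # knees
    w_l3 = band_width(2 / 6, 3 / 6)          # thighs / hands
    w_u = band_width(3 / 6, 3 / 6 + 1 / 4)   # torso and arms
    return w_l1, w_l2, w_l3, w_u
```

Applied frame by frame, this yields the width time series analyzed below.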
Figure 1. Subdivision of body silhouette into 5 segments for shape feature computation. (Bottom to top: regions L1, L2, L3 of height H/6 each, and region U of height H/4, where H is the silhouette height.)

We compute the bounding box width over each of the defined segments of the silhouette, for each blob in the sequence. The time series thus obtained are denoted W_L1(t), W_L2(t), W_L3(t), and W_U(t), corresponding to segments L1, L2, L3, and U, respectively; W_L(t) and W(t) denote the widths over region L and over the entire silhouette. Since natural walking gait is characterized by oscillation of the legs and swinging of the arms at the period of gait, we contend that:
P(W_L) = T_w or T_w/2    (1)
P(W_U) = T_w or T_w/2    (2)

where P(·) denotes the fundamental period of a time series, and T_w the period of walking. The latter is estimated via periodicity analysis of the width of the entire person's bounding box, i.e. of W(t). For this we use the autocorrelation method, which is robust to colored noise and non-linear amplitude modulations, unlike Fourier analysis [4]. We first smooth the signal, piecewise detrend it to account for any depth changes, then compute its autocorrelation A(r), where r is in some interval [-r_max, r_max] and r_max is chosen to be sufficiently larger than T_w. The period of W(t), denoted T_0, is estimated as the average distance between each two consecutive peaks in A(r), as illustrated in Figure 2. However, T_w is estimated as T_0 or 2T_0 depending on the camera viewpoint, as explained in [2].
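The period-estimation step can be sketched as follows (our reconstruction, not the authors' code: the moving-average window, the linear stand-in for piecewise detrending, and the peak test are our own choices):

```python
import numpy as np

# Sketch of the gait-period estimate: smooth and detrend the width series,
# compute its normalized autocorrelation, and average the spacing between
# consecutive peaks.

def estimate_period(w, max_lag):
    w = np.asarray(w, float)
    # Smooth with a short moving average, then remove the linear trend
    # (a stand-in for the piecewise detrending used for depth changes).
    s = np.convolve(w, np.ones(5) / 5.0, mode="same")
    t = np.arange(len(s))
    s = s - np.polyval(np.polyfit(t, s, 1), t)

    # Normalized autocorrelation A(r) for lags 0..max_lag.
    denom = np.dot(s, s)
    acf = np.array([np.dot(s[:len(s) - r], s[r:]) / denom
                    for r in range(max_lag + 1)])

    # Peaks of A(r): local maxima above zero; period = mean peak spacing.
    peaks = [r for r in range(1, max_lag)
             if acf[r] > acf[r - 1] and acf[r] > acf[r + 1] and acf[r] > 0]
    if len(peaks) < 2:
        return float(peaks[0]) if peaks else 0.0
    return float(np.mean(np.diff(peaks)))
```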
The third and fourth constraints we formulate are an artifact of the pendular-like motion of the arms and legs, and state that for a naturally-walking person:

Med(W_L) ≥ Med(W_U)    (3)
Med(W_L1) ≥ Med(W_L2) ≥ Med(W_L3)    (4)

where Med(·) denotes the median of a time series.
These constraints are verified via Wilcoxon's matched-pairs signed-ranks test (at significance level 0.05), which is a non-parametric test for determining whether the medians of two samples are equal [5].

Figure 2. Computation of gait period via autocorrelation of the time series of bounding box width of the binary silhouettes.
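A sketch of how the constraint checks might be combined into a decision, under our own reading of the constraints above; a plain relative-tolerance comparison of periods and medians stands in here for the Wilcoxon signed-ranks test, and the tolerance value is our choice:

```python
import numpy as np

# Sketch of the carried-object decision. The tolerance-based comparisons
# below are a simplification of the statistical tests in the text.

def violates_constraints(w_l, w_u, w_l1, w_l2, w_l3,
                         period, period_of, tol=0.1):
    """Return True if any natural-walking constraint is violated.

    w_*      : 1D arrays of per-frame segment widths
    period   : estimated walking period T_w (frames)
    period_of: callable estimating the fundamental period of a series
    tol      : relative tolerance (our choice) on the period/median tests
    """
    def near_gait_period(p):
        # Natural walking may show the full period or its half (viewpoint).
        return min(abs(p - period), abs(p - period / 2)) <= tol * period

    c1 = near_gait_period(period_of(w_l))
    c2 = near_gait_period(period_of(w_u))
    c3 = np.median(w_l) >= (1 - tol) * np.median(w_u)
    c4 = (np.median(w_l1) >= (1 - tol) * np.median(w_l2)
          and np.median(w_l2) >= (1 - tol) * np.median(w_l3))
    return not (c1 and c2 and c3 and c4)
```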
4 Experiments and Results
We tested the method on 41 outdoor sequences taken from various camera viewpoints, captured at 30 fps and an image size of 360x240. All sequences were recorded without the subjects' a priori knowledge in the parking lot of a university building (their consent to use the sequences was obtained afterwards). Table 1 summarizes the types of sequences used and the detection results for each category. The detection rate is 85% (35 out of 41) and the false alarm rate is 11.76% (2 out of 17).
                       Total   Natural-walking   Load-carrying
Not carrying            17           15                2
Carrying, upper body    11            2                9
Carrying, lower body    13            2               11

Table 1. Carried object detection results on 41 outdoor sequences: rows depict the type of sequence, and columns depict our method's detection result.
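The reported rates follow directly from the counts in Table 1; as a quick check (our own arithmetic), 85% is the fraction of all 41 sequences classified correctly, and the false alarm rate is taken over the 17 non-carrying sequences:

```python
# Quick check of the rates reported above, using the counts in Table 1.

table = {
    "not_carrying":        {"natural": 15, "load": 2},
    "carrying_upper_body": {"natural": 2,  "load": 9},
    "carrying_lower_body": {"natural": 2,  "load": 11},
}

correct = (table["not_carrying"]["natural"]         # true negatives
           + table["carrying_upper_body"]["load"]   # detected, upper body
           + table["carrying_lower_body"]["load"])  # detected, lower body
total = sum(row["natural"] + row["load"] for row in table.values())
false_alarms = table["not_carrying"]["load"]
not_carrying = sum(table["not_carrying"].values())

print(round(100 * correct / total, 1))              # -> 85.4
print(round(100 * false_alarms / not_carrying, 2))  # -> 11.76
```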
3
Page 4
8/18/2019 Carry Object Detection
http://slidepdf.com/reader/full/carry-object-detection 4/6
4.1 Discussion
In the following, we discuss the results for a few of the sequences. For each example, the left figure shows one frame of the walking person, the top-right figure shows W(t) (in blue), W_L(t) (in green) and W_U(t) (in red), the center-right figure shows their respective autocorrelation functions (with the same colors), and the bottom-right shows W_L1(t) (in blue), W_L2(t) (in green), W_L3(t) (in red), and W_U(t) (in cyan).
4.1.1 Natural-walking Gait
Figure 3 illustrates a person walking fronto-parallel and a person walking non-fronto-parallel to the camera. All four constraints are satisfied in both cases. Note, however, that while in the former case both W_U and W_L have period T_w/2, in the latter W_U has period T_w. This is because the swinging motion of the arm furthest from the camera is occluded by the body when the person walks at an angle.
4.1.2 Carried Object in Lower Body Region
Figure 4 shows five examples in which the carried object resides mostly in the lower body region (i.e. held at the side with one hand). Constraint 3 is satisfied in all cases, since the object makes the lower region even larger than the upper region, while Constraint 4 is violated in all cases. Constraint 1 is satisfied in all cases, while Constraint 2 is satisfied only in the bottom two cases, mainly because the arms hardly swing when holding heavy objects.
4.1.3 Carried Object in Upper Body Region
Figure 5 shows five examples in which the carried object resides mostly in the upper body region (held with both arms, on the shoulder, or on the back). Constraint 3 is violated in the first two cases, because the object makes the upper body appear larger than the lower body. Constraint 4 is violated only in the third case. Constraint 1 is satisfied in all cases, while Constraint 2 is violated in all but the second case, again because the arms hardly swing when holding an object.
4.1.4 False Alarms and False Negatives
False alarms, i.e. falsely detecting a carried object, occur when any of the four constraints is violated not because the person is carrying an object but for some other reason, such as image noise, segmentation errors, or fluffy clothes. The two false alarms in our experiments were both caused by background subtraction errors (the color of the person's clothes was very similar to that of the background).
False negatives, i.e. failure of our method to detect that a
person is actually carrying an object, typically occur when
Figure 3. Width series and their corresponding autocorrelation functions for a person walking fronto-parallel (a,b) and non-fronto-parallel (c,d) to the camera.
the object does not protrude outside the body silhouette. Figure 6 shows four examples of false negatives. In the first two cases, the object is carried in one hand and is not detected because it is too small. In the third case, the carried object was quite large but did not protrude enough outside the body silhouette. Finally, in the fourth case, the object carried on the shoulder is not detected because our method does not analyze the body above the arms region. Note that this person is also carrying an object in the other hand, which is likewise not detected because it is too small.
5 Conclusions and Future Work
We have described a novel method for determining whether a person is carrying an object in monocular sequences seen from a stationary camera. This is achieved via temporal, correspondence-free analysis of binary shape features that exploits the periodic, pendulum-like motion of the legs and arms. The method is view-invariant and is robust to segmentation and tracking errors. It achieves a detection rate of 85% and a false alarm rate of 12% when tested on 41 mostly non-fronto-parallel video sequences. One way we are working to extend this method is by deducing, from the current time series analysis, the body region where the object is located, in order to segment it and possibly infer its type.
Figure 4. Width series and their corresponding autocorrelation functions for cases in which the carried object resides in the lower body region.
Figure 5. Width series and their corresponding autocorrelation functions for cases in which the carried object resides in the upper body region.
Acknowledgment
The authors would like to thank Harsh Nanda for collection of video data, Ahmed Elgammal for providing background subtraction code, and Ross Cutler of Microsoft Research for providing code and references for periodicity analysis. The support of the National Institute of Justice (FAS No. 01529393) is gratefully acknowledged.
References
[1] C. BenAbdelkader. Gait as a biometric for person identification in video sequences. Technical Report 4289, University of Maryland, College Park, 2001.
[2] C. BenAbdelkader, R. Cutler, and L. Davis. Eigengait: A performance analysis with different camera viewpoints and variable clothing. In FGR, 2002.
[3] L. W. Campbell and A. Bobick. Recognition of human body motion using phase space constraints. In ICCV, 1995.
[4] R. Cutler and L. Davis. Robust real-time periodic motion detection, analysis and applications. PAMI, 13(2), 2000.
[5] W. W. Daniel. Applied Non-parametric Statistics. PWS-KENT Publishing Company, 1978.
[6] J. W. Davis and A. F. Bobick. The representation and recognition of action using temporal templates. In CVPR, 1997.
[7] A. Elgammal, D. Harwood, and L. Davis. Non-parametric model for background subtraction. In ICCV, 2000.
[8] I. Haritaoglu, R. Cutler, D. Harwood, and L. Davis. Backpack: Detection of people carrying objects using silhouettes. CVIU, 6(3), 2001.
[9] V. Inman, H. J. Ralston, and F. Todd. Human Walking. Williams and Wilkins, 1981.
[10] H. Kinoshita. Effects of different loads and carrying systems on selected biomechanical parameters describing walking gait. Ergonomics, 28(9), 1985.
[11] J. Little and J. Boyd. Recognizing people by their gait: the shape of motion. Videre, 1(2), 1998.
[12] K. Luttgens and K. Wells. Kinesiology: Scientific Basis of Human Motion. Saunders College Publishing, 7th edition, 1982.
[13] P. Martin and R. Nelson. The effect of carried loads on the walking patterns of men and women. Ergonomics, 29(10), 1986.
[14] T. A. McMahon. Muscles, Reflexes, and Locomotion. Princeton University Press, 1984.
[15] J. Perry. Gait Analysis: Normal and Pathological Function. SLACK Inc., 1992.
[16] J. Piscopo and J. A. Baley. Kinesiology: The Science of Movement. John Wiley and Sons, 1st edition, 1981.
[17] Y. Song, X. Feng, and P. Perona. Towards detection of human motion. In CVPR, 2000.
[18] D. Webb, R. H. Tuttle, and M. Baksh. Pendular activity of human upper limbs during slow and normal walking. American Journal of Physical Anthropology, 93, 1994.
[19] S. Yasutomi and H. Mori. A method for discriminating pedestrians based on rhythm. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, 1994.
Figure 6. Width series and their corresponding autocorrelation functions for the false negatives, i.e. cases in which the carried object is not detected.