IEEE Workshop on Intelligent Data Acquisition and Advanced
Computing Systems: Technology and Applications
5-7 September 2005, Sofia, Bulgaria
A Progress Review of Intelligent CCTV Surveillance Systems
Anthony C Davies 1, Sergio A Velastin 2
1) Visiting Professor, School of Computing and Information
Systems, Kingston University, Penrhyn Road, Kingston, Surrey, KT1
2EE, England (and Emeritus Professor, King's College London).
tonydavies@ieee.org
2) Reader, School of Computing and Information Systems, Kingston
University. sergio.velastin@kingston.ac.uk
Abstract - The development and capabilities of closed circuit
television surveillance systems in association with distributed
computing systems are reviewed, and the applications to various
aspects of surveillance are described.
Keywords - Closed-circuit television, surveillance, image
processing, security, vehicle tracking, crowd-monitoring.
I. INTRODUCTION
The installation of closed circuit television (CCTV) cameras in
urban environments is now commonplace and well-known. Public
attitudes to these systems are in two opposing categories:
(a) concerns over invasion of privacy and fears of authoritarian
control of the population
(b) welcoming the increased safety in public spaces and
reductions in antisocial behaviour.
Fears in category (a) are not surprising, since within living
memory, and in some cases until recently, there have been countries
in Europe governed by regimes with a strong commitment to the
oppressive monitoring and control of their citizens, and the
continuous tracking of actual or supposed 'dissidents'.
Support for category (b) arises from public concerns over both
real and imagined risks of urban crime and terrorism. Controlling
anti-social behaviour and protection against terrorist threats are
generally perceived to have a high priority, making intrusive
monitoring relatively acceptable and encouraging the installation
of advanced surveillance systems. Recent uses of the phrase
'homeland security', and plans for technologically-advanced personal
ID cards, have made this a topical issue.
Forms of visual monitoring which were once the exclusive domain of
well-funded secretive government security agencies are becoming
readily available at an affordable cost to the public, and very
sophisticated systems are being developed and installed for general
police uses.
II. HISTORICAL STAGES OF DEVELOPMENT
CCTV based surveillance has developed from simple systems
comprising a camera connected directly to a viewing screen with an
observer in a control room, watching for incidents of crime or
vandalism or searching for targeted individuals, to complex
multi-camera systems with many computers. The computers carry out
image processing, object recognition and scene analysis, prior to
presenting data to observers. Sophisticated recording and playback
techniques can be provided, with searching capabilities, and
suitable for audit and for presenting observed results as evidence in
legal proceedings.
Early steps in this development were to add manual camera control
(pan, tilt and zoom) in order to track events or objects of
particular types. As the number of cameras in each system increased,
they first exceeded the number of monitoring screens (so requiring
sequential switching) and then exceeded the capability of the
observing teams to watch events effectively. The attention span of
human observers is inevitably limited [1]. Adding computational
intelligence to alert the observers to the infrequent image
sequences which contained events of possible importance was thus a
natural development as computing resources became both cheaper and
more powerful.
The need for video recording arose to relieve the observers from
overload, to provide an audit capability and for detailed studies of
images leading up to an incident. Colour, including the use of
infrared imaging, has in many cases replaced monochrome, and digital
encoding with compression, for storage and/or transmission, has
become economical.
The availability of many powerful computers within the overall
surveillance system has enabled the required increased automation,
with the use of 'computer intelligence' to detect and analyse
significant events and alert human observers when appropriate.
Detection thresholds usually need to be biased in favour of false
positives, since these can usually be quickly recognised and
disregarded by a human observer, whereas missing real incidents
could be a serious deficiency.
Decreasing cost and increasing processing capability have made
distributed multi-computer systems feasible and affordable, and able
to include a significant amount of pre-processing of images 'per
camera' (as opposed to sending the raw video data to a centralised
processor). Image compression methods have advanced considerably, so
that the data rates and storage required can be substantially
reduced. Adoption of very efficient but lossy image-compression
methods is not always acceptable because of fears that the images may
then be challenged and not be accepted as evidence in legal
proceedings.
0-7803-9446-1/05/$20.00 ©2005 IEEE
When video recordings from CCTV
systems are
required for legal purposes, it is necessary to ensure the
feasibility of an independent and authenticated audit, without which
such recordings cannot be used as evidence in court proceedings.
Recorded image-sequences associated with criminal events can often be
subjected to advanced off-line image enhancement methods, making it
easier to identify the particular individuals involved in the crime.
Searching large amounts of recorded video data may be required in
order to try to find other scenes containing the same individuals.
CCTV surveillance systems are required to have high availability and
run continuously, and yet be extendible. For example, adding cameras
and computing resources is often desired, but may need to be done
without taking the existing system out of operation.
A British newspaper reported in 2004 that over four million CCTV
cameras were deployed in the UK [2].
III. CROWD AND PERSON MONITORING
Early uses of computers included the estimation of global
properties of urban crowds such as density and flow [3,4].
Applications were to learn about pedestrian behaviour in
densely-populated urban spaces such as city rail-stations during
'rush-hours', shopping malls and airport departure lounges.
Additionally, this data was expected to be useful to assist
architects in the planning of urban environments. The techniques did
not attempt to identify individual pedestrians in a crowd - rather,
the crowd was monitored as a generalised entity, and 'average'
properties sought.
Ideal gas theory provides a basis for predicting the behaviour of
gases although it takes no account of the trajectories of individual
molecules. In an analogous way, some aspects of crowd behaviour can
be estimated without considering the individuals making up the crowd.
Some analogies with fluid behaviour and with the behaviour of
charged particles in an electric field may be observed.
A. Crowd Density
If the area of an image occupied by pedestrians can be
identified, then the ratio of 'crowd area' to 'background area'
provides a rough estimate of crowd density. In low density crowds,
the 'edges' of all pedestrians may be extracted from the image, and
the total number of edge pixels has been found to provide a measure
of the number of people in the crowd (Fig. 1).
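
The two measures just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary mask (1 = pedestrian pixel, 0 = background) is already available, the function names are hypothetical, and the boundary-pixel count is only a crude stand-in for a real edge detector. No compensation for perspective or camera angle is attempted.

```python
def crowd_density(mask):
    """Return (crowd_fraction, crowd_to_background_ratio) for a binary mask.

    mask is a list of rows; 1 marks a pixel classified as pedestrian,
    0 marks background. How the mask is obtained is outside this sketch.
    """
    total = sum(len(row) for row in mask)
    crowd = sum(sum(row) for row in mask)
    background = total - crowd
    fraction = crowd / total if total else 0.0
    ratio = crowd / background if background else float("inf")
    return fraction, ratio


def count_edge_pixels(mask):
    """Count pedestrian pixels that touch background or the image border.

    Uses the 4-neighbourhood; an assumed, simplified notion of 'edge'.
    """
    rows, cols = len(mask), len(mask[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    edges += 1
                    break
    return edges
```

In practice either measure would need to be calibrated against manual head counts for each camera view before it could be read as a number of people.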
Fig. 1. Pedestrian extraction from crowd-scene (from [5]):
(a) 'edges' of pedestrians (b) area of pedestrians.
Compensation is needed for the different apparent size of objects at
various distances. Calibration for camera angle is also essential.
Variation of lighting levels and directions presents many problems to
be overcome. In many monitoring situations in public spaces, the
cameras have to be mounted in places which give very oblique images,
making image interpretation more difficult.
By monitoring density, it is possible to set thresholds above which
safety might be endangered (Fig. 2). For example, entry to
city-centre rail stations may have to be periodically closed during
rush-hours in order to keep crowd density on platforms at a safe
level.
Fig. 2. Example of automatic identification of a congested area,
denoted by array of white dots.
B. Crowd Flow
Crowd flow investigations can provide real-time data of security
value. Optical flow (the change in image brightness from one image to
the next, expressed as a vector field) can provide basic motion data.
This uses a gradient method to follow changes in brightness in the
image [6].
Fig. 3. (a) Extracted motion-vector field - arrows show direction
of motion.
Block matching is an alternative effective technique. Typically,
a rectangular block of pixels around a selected pixel in the first
image is compared with a larger search block around the same pixel
in the second image.
All pixel blocks in the search area are compared with the one in
the first image, and the one with greatest similarity is used as an
indicator of the estimated movement direction and speed (Fig. 3(a)).
An overview of the movement may be obtained by a polar velocity plot
(Fig. 3(b)), which also enables unusual flows to be detected
automatically (for example, a group of pedestrians moving very
quickly in a non-standard direction).
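
As an illustration of the block-matching search described above, the following sketch performs an exhaustive search and converts the resulting displacement into the speed and direction used for a polar plot. Several details are assumptions made for the example rather than taken from the text: similarity is measured by the sum of absolute differences (SAD), blocks are addressed by their top-left corner rather than a centre pixel, frames are plain lists of grey-level rows, and all function names are hypothetical.

```python
import math


def sad(frame_a, frame_b, ra, ca, rb, cb, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ra + i][ca + j] - frame_b[rb + i][cb + j])
        for i in range(size)
        for j in range(size)
    )


def match_block(first, second, row, col, size, search):
    """Return the displacement (d_row, d_col) of the best-matching block.

    The block with top-left corner (row, col) in `first` is compared with
    every block displaced by up to +/- `search` pixels in `second`.
    """
    rows, cols = len(second), len(second[0])
    best, best_cost = None, None
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = row + dr, col + dc
            if r < 0 or c < 0 or r + size > rows or c + size > cols:
                continue  # candidate block falls outside the image
            cost = sad(first, second, row, col, r, c, size)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (dr, dc)
    return best


def to_polar(d_row, d_col):
    """Convert a displacement to (speed in pixels/frame, direction in degrees).

    0 degrees points right; angles increase counter-clockwise on screen
    (image rows increase downwards, hence the sign change on d_row).
    """
    speed = math.hypot(d_row, d_col)
    angle = math.degrees(math.atan2(-d_row, d_col)) % 360.0
    return speed, angle
```

Accumulating the `to_polar` results for many blocks into angle and speed bins would give the kind of polar velocity overview mentioned in the text.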
Fig. 3. (b) Polar velocity plot derived from a crowd flow estimate
(from [5]).
C. Detecting Unusual Behaviour
Detection of individuals who remain in one place for long periods
while surrounding crowds are moving, or of individuals moving in a
different direction from the majority, may be a potential indication
of planned or actual criminal behaviour. Automatic detection of such
events in the image sequences from a CCTV system may be used to alert
human observers, who can then decide if the events are significant.
Tracking of specific individuals is also possible. Distinguishing
approaching and receding pedestrians (for example by automatic
marking with different coloured tags) may be operationally useful.
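
One simple way to flag individuals moving against the majority is sketched below. It is an illustrative assumption, not a method stated in the text: each tracked object is taken to have a known velocity, the dominant flow is the sum of the unit heading vectors, and the thresholds and the function name are hypothetical.

```python
import math


def flag_against_flow(velocities, min_speed=0.5, max_angle_deg=90.0):
    """Return the ids whose heading differs from the dominant flow.

    velocities maps a (hypothetical) track id to (vx, vy) in pixels/frame.
    min_speed (must be > 0) and max_angle_deg are illustrative thresholds.
    """
    moving = {k: v for k, v in velocities.items()
              if math.hypot(v[0], v[1]) >= min_speed}
    if not moving:
        return []
    # Dominant flow: sum of the unit heading vectors of all moving objects.
    sx = sum(vx / math.hypot(vx, vy) for vx, vy in moving.values())
    sy = sum(vy / math.hypot(vx, vy) for vx, vy in moving.values())
    if math.hypot(sx, sy) < 1e-9:
        return []  # headings cancel out: no dominant direction
    flow = math.atan2(sy, sx)
    flagged = []
    for k, (vx, vy) in moving.items():
        diff = abs(math.atan2(vy, vx) - flow)
        diff = min(diff, 2 * math.pi - diff)  # wrap into [0, pi]
        if math.degrees(diff) > max_angle_deg:
            flagged.append(k)
    return flagged
```

Consistent with the text, a flag raised this way would only be a prompt for a human observer, not a decision in itself.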
IV. VEHICLE MONITORING
Speed monitoring is often carried out by simple cameras triggered
by a vehicle exceeding the speed limit. The photograph captures the
vehicle registration number. Instead, this may now be automated
within a CCTV surveillance system. This requires not only identifying
the moving vehicle within the scene, but also locating the position
of the registration plate, and then automatically reading the number.
To achieve this reliably under all conditions of lighting and weather
is a challenging image-processing task, but operationally successful
systems have been developed.
Reliable methods of automatically reading vehicle registration
numbers provide alternative possibilities for excess-speed
detection. For example, detecting a vehicle at position x1 at time
t1, and then detecting the same vehicle at position x2 at time t2
immediately provides a lower bound for the velocity of travel
between these two points of distance(x1, x2)/(t2 - t1), since the
vehicle must have covered at least that distance in the time
available. If that is greater than the speed limit, then it can be
concluded that the vehicle has exceeded the limit. Thus with a
complex network of cameras capable of identifying particular vehicles
(e.g. by registration numbers) it is possible to detect extreme speed
violations without measuring speed, even when the vehicles are moving
within the junctions of a complex highway network. Installing such
systems has implications for civil liberties, since the same data
provides information on the movements of particular vehicles, and
therefore has the potential to be used (misused) to track
individuals.
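
The lower-bound argument can be made concrete with a short sketch. The function names, the use of planar coordinates in metres and the straight-line distance are assumptions for illustration; a deployed system would presumably use the shortest drivable route between the two cameras, which can only increase the bound.

```python
import math


def min_average_speed_kmh(pos1_m, t1_s, pos2_m, t2_s):
    """Lower bound on speed between two sightings of the same vehicle.

    Positions are (x, y) in metres, times in seconds. The straight-line
    distance is the shortest possible path, so the true average speed
    (and hence the peak speed) cannot be lower than the value returned.
    """
    if t2_s <= t1_s:
        raise ValueError("second sighting must be later than the first")
    distance_m = math.hypot(pos2_m[0] - pos1_m[0], pos2_m[1] - pos1_m[1])
    return distance_m / (t2_s - t1_s) * 3.6  # m/s to km/h


def exceeded_limit(pos1_m, t1_s, pos2_m, t2_s, limit_kmh):
    """True if the vehicle must have exceeded the limit at some point."""
    return min_average_speed_kmh(pos1_m, t1_s, pos2_m, t2_s) > limit_kmh
```

For example, two sightings 5 km apart and 120 s apart imply an average of at least 150 km/h, whatever route was taken. A result below the limit proves nothing, since the vehicle may have stopped or taken a longer route.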
There is now considerable government interest in the enhancement
of such capabilities, for such applications as 'congestion charging'
(automatically billing the owners of vehicles which are observed
travelling within city centres during chargeable periods of time)
and 'automated road pricing' (toll-collection with no toll-booths).
V. PUBLIC ACCEPTANCE AND INVOLVEMENT
In most countries, cameras in public areas are not now considered
an invasion of privacy. They are a common sight at highway
junctions, pedestrian crossings and traffic-signal locations (for
traffic monitoring) and in city centres, shopping malls, airports
and rail stations (for person monitoring). The trends are towards
smaller cameras, which may not be readily visible (sometimes
augmented by highly-visible dummy cameras to mislead criminals).
There is now public awareness and acceptance of 'webcams'
observing places of interest and supplying images to the internet on
a continuous basis. It is becoming affordable to install
internet-connected cameras in and around the home, so that remote
viewing is possible while away on vacation, etc. to provide
reassurance that the house has not burnt down, been flooded, or
broken into by thieves. Of course, making this data available over
the internet does open a risk of interception.
VI. TRACKING MOVING OBJECTS: PEOPLE AND VEHICLES [7]
A key technique used to identify moving objects in video data
streams is to subtract the current frame from an estimate of the
background scene, based on the idea that anything 'new' in the
current frame must be mobile object(s). Since the background alters
as a result of lighting changes and camera movement, in practice the
background estimation needs to be adaptive. Otherwise, effects of
sunshine and clouds which would hardly be noticed by a human observer
can produce major errors in identifying the moving parts [8].
The moving parts have to be segmented into the distinct moving
objects, typically referred to as 'blobs'.
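
A minimal sketch of adaptive background subtraction is given below. It uses a running-average background, a deliberately simple scheme chosen for illustration and not necessarily the method used in the cited systems. The function names, the learning rate `alpha` and the `threshold` value are assumptions; frames are lists of grey-level rows.

```python
def foreground_mask(background, frame, threshold=25):
    """Mark pixels that differ from the background estimate by more than
    `threshold` grey levels (1 = moving, 0 = background)."""
    return [
        [1 if abs(p - b) > threshold else 0 for b, p in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background estimate.

    A small alpha makes the estimate follow slow lighting changes while
    largely ignoring objects that pass through quickly.
    """
    return [
        [(1 - alpha) * b + alpha * p for b, p in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]
```

A typical loop would call `foreground_mask` on each incoming frame and then `update_background`, so that gradual changes in sunshine or cloud cover are absorbed into the estimate instead of being reported as motion.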
Fig. 4. Pedestrians marked by bounding rectangles (approaches
marked by E and departures by F).
A standard method of tracking objects in a video sequence
involves identifying each blob of interest, marking it with an
easily visible bounding rectangle, and in many cases, adding an
identifier (Fig. 4). In the case of vehicles, an identifier may be
extracted automatically by locating and automatically reading the
registration plate. In the case of persons, of course this is not
possible.
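
Segmenting a foreground mask into blobs and their bounding rectangles can be sketched with a breadth-first flood fill. The function name, the 4-connectivity and the (top, left, bottom, right) box convention are assumptions made for this example; practical systems would also discard very small blobs as noise.

```python
from collections import deque


def find_blobs(mask):
    """Return one inclusive bounding box (top, left, bottom, right) per
    4-connected group of 1-pixels, in raster-scan order of discovery."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

The index of each box in the returned list could serve as a temporary identifier until a tracker assigns a persistent one.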
The object is then 'followed' from image to image. Difficulties
naturally arise from occlusion: if the object disappears behind an
obstacle, it is possible to use its velocity to estimate the place
and time of its emergence assuming no change in its velocity (more
sophisticated systems might use acceleration data too). Commonly, a
Kalman filter is used to provide better estimates of future
trajectories of objects which move into an occluded zone [9,10].
However, it is easy to imagine many situations where this is
unreliable - for example two people being tracked may disappear
behind an obstacle, and while there may meet, hold a conversation,
then split up and depart in opposite directions. To automatically
determine which one is which from the image sequences following their
re-emergence is obviously not at all easy [11].
Two situations need to be handled: dynamic occlusion, which is the
occlusion of one moving object by another, and static occlusion,
which is the result of moving objects interacting with static objects
(pillars, access gates, etc. or the boundary of the image). In the
former case, appearance templates of the moving object(s) can be used
to complete partially-obscured observations [12]. In the latter case,
observations over a long time may be used to create three-dimensional
(depth) information about the image, and this may then be used to
make interpretations and predictions about the occlusions [13].
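
The simplest form of the occlusion handling described above, constant-velocity extrapolation, can be sketched as follows. This is not a Kalman filter; it is only the basic prediction that such a filter refines. The function names and the simplifying assumption that the occluder's far edge is a vertical line at `occluder_x_max`, approached in the +x direction, are hypothetical.

```python
def predict_position(position, velocity, dt):
    """Extrapolate an (x, y) position assuming constant velocity."""
    x, y = position
    vx, vy = velocity
    return (x + vx * dt, y + vy * dt)


def predict_emergence(position, velocity, occluder_x_max):
    """Estimate when and where an object re-emerges from behind an occluder.

    Returns (dt, (x, y)), or None if the object is not moving towards the
    far edge of the occluder.
    """
    x, _ = position
    vx, _ = velocity
    if vx <= 0:
        return None
    dt = (occluder_x_max - x) / vx
    if dt < 0:
        return None  # already past the far edge
    return dt, predict_position(position, velocity, dt)
```

As the text notes, the prediction fails whenever the object changes velocity while hidden, which is why re-identification after emergence remains difficult.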
Since surveillance is typically achieved by multiple cameras, the
'hand-over' of identified objects from the field of view of one
camera to another is needed. Sometimes the fields overlap; in other
cases, there may be a part of the scene not covered by any camera.
Mobile surveillance cameras (for example, mounted on vehicles)
present some additional problems because the observed background
moves from frame to frame, and a constant background has to be
estimated from this sequence of differing views. Only then is it
possible to identify those objects which are really moving with
respect to this background.
VII. SURVEILLANCE FOR PUBLIC SAFETY
Because of the large number of cameras often deployed in public
areas, the automatic detection of events of importance for safety
and security has become important - the events are required to
trigger an alarm, to alert humans able to make decisions about the
need for action [14,15]. For example, one might wish to identify
unattended baggage in a rail station. Of course, the sudden
appearance of a suitably-shaped stationary object may be detected by
conventional image processing methods. Alternatively, if a person
being tracked 'splits' into a moving person-like object and a smaller
stationary object (e.g. depositing an item and walking away from it),
this could be used to trigger an alarm.
It is relatively easy to automatically detect individuals in
forbidden areas (Fig. 5) or individuals who loiter for excessive
times in one place (which can be a sign of criminal intent). Falling
pedestrians may be identified as person-like objects which take up a
horizontal position with little or no motion. This may be of
particular importance needing rapid response if observed on rail
tracks [16].
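
The loitering and fallen-person checks described above can be sketched as simple rules over tracker output. The data layout (time-stamped positions, bounding-box width and height), the function names and all thresholds are assumptions made for illustration, not values taken from the paper.

```python
import math


def is_loitering(track, radius, max_dwell):
    """True if the object stays within `radius` of some earlier position
    for longer than `max_dwell` seconds.

    track is a time-ordered list of (t_seconds, x, y) samples.
    """
    for i, (t0, x0, y0) in enumerate(track):
        for t, x, y in track[i + 1:]:
            if math.hypot(x - x0, y - y0) > radius:
                break  # moved away from this reference position
            if t - t0 > max_dwell:
                return True
    return False


def looks_fallen(box_width, box_height, speed, min_aspect=1.5, max_speed=0.2):
    """True if a person-like blob is wider than tall and nearly still."""
    return box_width >= min_aspect * box_height and speed <= max_speed
```

Either rule would raise an alert for a human operator to assess; a forbidden-area check would additionally test whether the tracked position lies inside a marked region of the image.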
Fig. 5. Pedestrian automatically identified as too close to
platform edge (marked by rectangle and bar).
VIII. SOME SPECIFIC ACTIVITIES OF THE RESEARCH TEAMS AT KINGSTON
UNIVERSITY
The Digital Imaging Research Centre (DIRC) at Kingston University
has a number of teams working on Video Surveillance projects. Some
of this research was started at King's College London and some at
City University London, and these teams moved to Kingston, taking the
activities with them. Example projects include:
* 'PRISMATICA': Participation in a European Commission project
[17, 18] to make public transport systems more attractive to
passengers, safer for passengers and staff and operationally
cost-effective. An innovative aspect was the integration of
operational, legal, social and technical aspects of CCTV
surveillance systems. Partners included RATP-Paris, ATM-Milan,
STIB-Brussels, PPT-Prague, ML-Lisbon, King's College London,
University College London, INRETS-France, CEA-France, TIS-Portugal,
SODIT-France, FIT-Italy, ILA-Germany, and Thales-France.
PRISMATICA aims included development of concepts for pro-active
surveillance systems to provide decision-support tools for human
operators in complex and large environments. The tools should
automatically guide the attention and actions of the managers of a
transport network, while keeping the technology itself as transparent
as possible.
This project incorporated a two-stage assessment. First, the
architecture was tested in the Paris Métro (Gare de Lyon) and
successfully demonstrated communications mechanisms and protocols.
Next there was a major deployment of the system in Liverpool Street
Underground station, London. This station takes commuters to/from one
of the biggest financial centres in Europe and connects with main
railways and buses.
There are more than seventy cameras in this station covering
approximately 80% of its total area.
* Monitoring Public Spaces: Developing intelligent surveillance
tools for integration within existing urban CCTV infrastructures to
improve incident detection and assist control room operations.
Incidents of interest in public transport systems include
overcrowding, loitering, busking, begging, jumping over access
barriers, and drug-dealing. Fears of terrorism lead to continuous
monitoring for abandoned luggage or suspect packages. For all these
situations, automation and the tracking of objects provide an
opportunity to deploy staff only where they are really needed. This
enables security personnel to be freed from having to acquire and
manually track targets [19, 20].
* Plug and Play Surveillance: Devising designs which enable newly
installed cameras and associated computational intelligence to
easily integrate into an existing camera network.
* Learning Camera Topology: An intelligent surveillance system
must capture and track objects to establish a history of their
behaviour, classify the objects (as people or vehicles, etc. of
particular types), and establish their trajectories in a 3D space.
In many cases, a number of cameras are used with partially
overlapping areas of view, and there may also be areas which are not
covered by any camera. Synthesising all the images into a single
real-time description of the scene is a complex data-fusion task.
Solving this problem also means that camera configuration may be
changed without the need for a full re-calibration.
* Learning Semantic Scene Models: The aim is to label regions in
a scene according to activity (e.g. entry zones, exit zones,
stopping zones, junctions, etc.) from video data streams from the
scene. The activity is often time-dependent - for example, commuter
flows usually run in opposite directions in the morning and evening,
because of travel to and from the workplace respectively. The purpose
is to assist subsequent interpretation of moving-object behaviour in
the scene [11].
* Automated Extraction of Evidence from CCTV footage: The aim is
to reduce the time and effort of reviewing lengthy CCTV video
sequences to locate specific sequences showing incidents of interest
by automating the recovery and information management of video
evidence.
In many investigations of major crimes in urban environments,
CCTV evidence plays an increasingly crucial role in establishing the
identity of vehicles and individuals. Gathering this data is
extremely time consuming, and involves manual annotation of CCTV
archives by police and other experts. A joint project of Kingston
University, University of Surrey and Sira, Ltd. is exploring the
linkage of the meta-data structure of the video interpretation
process with the linguistic structure of police descriptions of
evidence. An aim is to validate the effectiveness of this data-fusion
process by an automatic generator of galleries of vehicle
registration plate images and person images (Fig. 6). Viewing of such
galleries by local police and the general public is known to be an
effective method of acquiring knowledge.
Fig. 6. Examples of vehicle registration number and person
galleries associated with car-park usage.
IX. FUTURE DEVELOPMENTS
The progress in integrated circuit technology, the increased
capabilities of digital cameras and the new communications methods
(wireless LANs, mobile phones) all contribute to the continued
deployment of more complex and advanced CCTV surveillance systems,
which will become increasingly unobtrusive as cameras decrease in
size, and are likely to be integrated with other sensors (audio,
thermal, etc.) and databases.
The software will inevitably increase in capability, so that
tracking and recognition of people and vehicles will become much more
effective. Automatic recognition of behaviour patterns will improve,
so that it will become easier to detect and predict both legitimate
and illegal activity. Automatic analysis of gestures [21] and the
ability of computers to support lip-reading at a distance [22] are
already indicating a direction in which surveillance might develop.
There is no assurance that these systems will always be used
responsibly and only by those with the public interest and safety in
mind. Misuse by official agencies and adoption by criminal elements
in society may happen if there are insufficient safeguards.
Just as the invention of the word processor did not result in the
paperless office, the development of improved CCTV surveillance
systems is not likely to lead to the crime-free city centre.
ACKNOWLEDGEMENTS
Jia Hong Yin's work on the global estimation of crowd behaviour
[5] was the start of research at King's College on this subject.
Colleagues at Kingston University, particularly Graeme Jones and Tim
Ellis, are thanked for their suggestions for this paper.
EPSRC and the European Commission are thanked for the support of
projects at King's College London, and support for several projects
at Kingston University for work in this area, including ADVISOR
(IST-1999-11287) and PRISMATICA (GRD1-2000-10601).
REFERENCES
[1] E. Wallace et al., "Good Practice for the Management and
Operation of Town Centre CCTV," European Conf. on Security and
Detection, 28-30 Apr. 1997, pp. 36-41 (IEE Conf. Pub. 437).
[2] M. Frith, "Big Brother Britain, 2004," The Independent, 12 Jan
2004.
[3] A.C. Davies, J.H. Yin, S.A. Velastin, "Crowd monitoring using
image processing," IEE Electronics and Communication Engineering J.,
7, Feb 1995, pp. 37-47.
[4] S.A. Velastin, J.H. Yin, A.C. Davies, M.A. Vicencio-Silva, R.E.
Allsop, A. Penn, "Automated measurement of crowd density and motion
using image processing," 7th Int. Conf. on Road Traffic Monitoring
and Control, London, UK, 1994, pp. 127-132.
[5] J.H. Yin, "Automation of crowd data-acquisition and monitoring
in confined areas using image processing," Ph.D. Thesis, King's
College London, University of London, September 1996.
[6] B.K.P. Horn, B.G. Schunck, "Determining Optical Flow,"
Artificial Intelligence, 17, 1981, pp. 185-203.
[7] G. Foresti, C. Micheloni, L. Snidaro, P. Remagnino, T. Ellis,
"Active Video-based Surveillance System," IEEE Signal Processing
Magazine, March 2005, pp. 25-37.
[8] C. Stauffer, W.E.L. Grimson, "Learning Patterns of Activity
using Real-Time Tracking," IEEE Transactions on Pattern Analysis and
Machine Intelligence, 22, August 2000, pp. 747-757.
[9] I. Haritaoglu, D. Harwood, L.S. Davis, "Real-time Surveillance
of people and their Activities," IEEE Transactions on Pattern
Analysis and Machine Intelligence, 22 (8), August 2000, pp. 809-830.
[10] H. Tao, H.S. Sawhney, R. Kumar, "Dynamic Layer Representation
and its Applications to Tracking," IEEE Conference on Computer Vision
and Pattern Recognition, 2, June 2000, pp. 134-141.
[11] D. Makris, T.J. Ellis, "Learning Semantic Scene Models from
Observing Activity in Visual Surveillance," IEEE Transactions on
Systems Man and Cybernetics - Part B, 35 (3), June 2005, pp. 397-408.
[12] A. Senior, "Tracking People with Probabilistic Appearance
Models," 3rd IEEE International Workshop on Performance Evaluation of
Tracking and Surveillance, Copenhagen, 1st June 2002, pp. 48-55.
[13] J. Renno, D. Greenhill, J. Orwell, G.A. Jones, "Learning the
Semantic Landscape: Embedding scene knowledge in object tracking,"
Realtime Imaging, Special Issue on Video Object Processing, (accepted
for publication in 2005).
[14] B.A. Boghossian, "Motion-based Image processing Algorithms
applied to Crowd Monitoring Systems," Ph.D. Thesis, King's College,
University of London, Oct. 2000.
[15] L.M. Fuentes, S.A. Velastin, "From tracking to advanced
surveillance," Proc. Int. Conf. on Image Processing (ICIP 2003),
Barcelona, Spain, Sept 2003, III 121-4.
[16] S.A. Velastin, M.A. Vicencio-Silva, B. Lo, L. Khoudour, "A
Distributed Surveillance System For Improving Security In Public
Transport Networks," Measurement and Control, 35 (8), 2002, pp.
209-13.
[17] S.A. Velastin, B. Boghossian, B. Lo, J. Sun, M.A.
Vicencio-Silva, "PRISMATICA: Toward Ambient Intelligence In Public
Transport Environments," IEEE Transactions on Systems Man and
Cybernetics, 35, Jan 2005, pp. 164-182.
[18] http://www.prismatica.com.
[19] M. Valera, S.A. Velastin, "A review of the State-of-the-Art in
distributed surveillance systems," in Intelligent Distributed Systems
(Eds. S.A. Velastin, P. Remagnino), IEE Publications, 2005.
[20] M. Valera, S.A. Velastin, "Intelligent distributed surveillance
systems: a review," IEE Proc. Vis. Image Signal Processing, 152 (2),
Apr 2005, pp. 192-204.
[21] I.B. Ozer, T. Lu, W. Wolf, "Design of a Real-time Gesture
Recognition System," IEEE Signal Processing Magazine, 22 (3), May
2005, pp. 57-64.
[22] Intel Developer Forum, Berlin, 28 Apr. 2003 and "Lip-Reading
Computers Are Born," Internet Magazine, 29 Apr 2003.