A Progress Review of Intelligent CCTV Surveillance Systems

IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 5-7 September 2005, Sofia, Bulgaria

Anthony C Davies 1, Sergio A Velastin 2

1) Visiting Professor, School of Computing and Information Systems, Kingston University, Penrhyn Road, Kingston, Surrey, KT1 2EE, England (and Emeritus Professor, King's College London). tonydavies@ieee.org

2) Reader, School of Computing and Information Systems, Kingston University. sergio.velastin@kingston.ac.uk

Abstract - The development and capabilities of closed circuit television surveillance systems in association with distributed computing systems are reviewed, and the applications to various aspects of surveillance are described.

Keywords - Closed-circuit television, surveillance, image processing, security, vehicle tracking, crowd-monitoring.

I. INTRODUCTION

The installation of closed circuit television (CCTV) cameras in urban environments is now commonplace and well-known. Public attitudes to these systems are in two opposing categories:

(a) concerns over invasion of privacy and fears of authoritarian control of the population

(b) welcoming the increased safety in public spaces and reductions in antisocial behaviour.

Fears in category (a) are not surprising, since within living memory, and in some cases until recently, there have been countries in Europe governed by regimes with a strong commitment to the oppressive monitoring and control of their citizens, and the continuous tracking of actual or supposed 'dissidents'.

Support for category (b) arises from public concerns over both real and imagined risks of urban crime and terrorism. Controlling anti-social behaviour and protecting against terrorist threats are generally perceived to have a high priority, making intrusive monitoring relatively acceptable and encouraging the installation of advanced surveillance systems. Recent uses of the phrase 'homeland security', and plans for technologically-advanced personal ID cards, have made this a topical issue.

Forms of visual monitoring which were once the exclusive domain of well-funded secretive government security agencies are becoming readily available at an affordable cost to the public, and very sophisticated systems are being developed and installed for general police uses.

II. HISTORICAL STAGES OF DEVELOPMENT

CCTV based surveillance has developed from simple systems comprising a camera connected directly to a viewing screen with an observer in a control room, watching for incidents of crime or vandalism or searching for targeted individuals, to complex multi-camera systems with many computers. The computers carry out image processing, object recognition and scene analysis, prior to presenting data to observers. Sophisticated recording and playback techniques can be provided, with searching capabilities, suitable for audit and for presenting observed results as evidence in legal proceedings.

Early steps in this development were to add manual camera control (pan, tilt and zoom) in order to track events or objects of particular types. As the number of cameras in each system increased, they first exceeded the number of monitoring screens (so requiring sequential switching) and then exceeded the capability of the observing teams to watch events effectively. The attention span of human observers is inevitably limited [1]. Adding computational intelligence to alert the observers to the infrequent image sequences which contained events of possible importance was thus a natural development as computing resources became both cheaper and more powerful.

The need for video recording arose to relieve the observers from overload, to provide an audit capability and for detailed studies of images leading up to an incident. Colour, including the use of infrared imaging, has in many cases replaced monochrome, and digital encoding with compression, for storage and/or transmission, has become economical.

The availability of many powerful computers within the overall surveillance system has enabled the required increased automation, with the use of 'computer intelligence' to detect and analyse significant events and alert human observers when appropriate. Detection thresholds usually need to be biased in favour of false positives, since these can usually be quickly recognised and disregarded by a human observer, whereas missing real incidents could be a serious deficiency.

Decreasing cost and increasing processing capability have made distributed multi-computer systems feasible and affordable, and able to include a significant amount of pre-processing of images 'per camera' (as opposed to sending the raw video data to a centralised processor). Image compression methods have advanced considerably, so that the data rates and storage required can be substantially reduced. Adoption of very efficient but lossy image-compression methods is not always acceptable because of fears that the images may then be challenged and not be accepted as evidence in legal proceedings.

0-7803-9446-1/05/$20.00 ©2005 IEEE

When video recordings from CCTV systems are required for legal purposes, it is necessary to ensure the feasibility of an independent and authenticated audit, without which such recordings cannot be used as evidence in court proceedings. Recorded image-sequences associated with criminal events can often be subjected to advanced off-line image enhancement methods, making it easier to identify the particular individuals involved in the crime. Searching large amounts of recorded video data may be required in order to try to find other scenes containing the same individuals.

CCTV surveillance systems are required to have high availability and run continuously, and yet be extendible. For example, adding cameras and computing resources is often desired, but may need to be done without taking the existing system out of operation.

A British newspaper reported in 2004 that over four million CCTV cameras were deployed in the UK [2].

III. CROWD AND PERSON MONITORING

Early uses of computers included the estimation of global properties of urban crowds such as density and flow [3,4]. Applications were to learn about pedestrian behaviour in densely-populated urban spaces such as city rail-stations during 'rush-hours', shopping malls and airport departure lounges. Additionally this data was expected to be useful to assist architects in the planning of urban environments. The techniques did not attempt to identify individual pedestrians in a crowd; rather, the crowd was monitored as a generalised entity, and 'average' properties sought.

Ideal gas theory provides a basis for predicting the behaviour of gases although it takes no account of the trajectories of individual molecules. In an analogous way, some aspects of crowd behaviour can be estimated without considering the individuals making up the crowd. Some analogies with fluid behaviour and with the behaviour of charged particles in an electric field may be observed.

A. Crowd Density

If the area of an image occupied by pedestrians can be identified, then the ratio of 'crowd area' to 'background area' provides a rough estimate of crowd density. In low density crowds, the 'edges' of all pedestrians may be extracted from the image, and the total number of edge pixels has been found to provide an estimate of the number of people in the crowd (Fig. 1).
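The two measures just described can be illustrated with a minimal Python sketch. It assumes a binary mask (1 for pedestrian pixels, 0 for background) has already been obtained by some means. The function names are hypothetical. The sketch also treats mask-boundary pixels as 'edge' pixels, which is only a stand-in for whatever edge extractor is used in practice, and it ignores perspective and lighting compensation.

```python
def crowd_density(mask):
    """Ratio of 'crowd area' to 'background area' for a binary mask.

    mask is a list of rows; 1 marks a pedestrian pixel, 0 marks background.
    """
    crowd = sum(sum(row) for row in mask)
    total = sum(len(row) for row in mask)
    background = total - crowd
    if background == 0:
        return float("inf")  # the whole image is occupied
    return crowd / background


def count_edge_pixels(mask):
    """Count pedestrian pixels that touch background or the image border.

    Assumption: a 4-neighbourhood boundary test stands in for edge extraction.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    edges = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or not mask[ny][nx]:
                    edges += 1
                    break
    return edges


# A 3x3 'pedestrian' region in a 5x5 image (illustrative data only).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(crowd_density(mask))      # 9 crowd pixels / 16 background pixels
print(count_edge_pixels(mask))  # every pixel of the region except its centre
```

Converting an edge-pixel count into a head count would need a calibration fitted to the particular camera view, which is not attempted here.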

Fig. 1. Pedestrian extraction from crowd-scene (from [5]): (a) 'edges' of pedestrians (b) area of pedestrians.

Compensation is needed for the different apparent size of objects at various distances. Calibration for camera angle is also essential. Variation of lighting levels and directions presents many problems to be overcome. In many monitoring situations in public spaces, the cameras have to be mounted in places which give very oblique images, making image interpretation more difficult.

By monitoring density, it is possible to set thresholds above which safety might be endangered (Fig. 2). For example, entry to city-centre rail stations may have to be periodically closed during rush-hours in order to keep crowd density on platforms at a safe level.

Fig. 2. Example of automatic identification of a congested area, denoted by array of white dots.

B. Crowd Flow

Crowd flow investigations can provide real time data of security value. Optical flow (the apparent motion of brightness patterns from one image to the next, expressed as a vector field) can provide basic motion data. This uses a gradient method to follow changes in brightness in the image [6].
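For reference, the gradient method of [6] is usually written as a brightness-constancy constraint combined with a smoothness term. The notation below is the common textbook one and is an assumption here, since this paper gives no equations: $I(x, y, t)$ is image brightness, $I_x, I_y, I_t$ are its partial derivatives, $(u, v)$ is the flow vector at each pixel, and $\alpha$ is a smoothness weight.

```latex
% Brightness constancy: a moving point keeps its brightness
\[
  I_x u + I_y v + I_t = 0
\]
% Horn-Schunck estimate: minimise the constraint error plus a smoothness penalty
\[
  E(u, v) = \iint \Big[ (I_x u + I_y v + I_t)^2
            + \alpha^2 \big( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \big) \Big] \, dx \, dy
\]
```

The single constraint leaves one degree of freedom per pixel undetermined, which is why the smoothness term (or some other additional assumption) is needed.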


Fig. 3. (a) Extracted motion-vector field - arrows show direction of motion.

Block matching is an alternative effective technique. Typically, a rectangular block of pixels around a selected pixel in the first image is compared with a larger search block around the same pixel in the second image.

All pixel blocks in the search area are compared with the one in the first image, and the one with greatest similarity is used as an indicator of the estimated movement direction and speed (Fig. 3(a)). An overview of the movement may be obtained by a polar velocity plot (Fig. 3(b)), which also enables unusual flows to be detected automatically (for example, a group of pedestrians moving very quickly in a non-standard direction).
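A minimal Python sketch of the block-matching search described above follows. The similarity measure is not specified in the text; the sum of absolute differences (SAD) used here is an assumption, as are the function names, block size and search radius. Frames are plain lists of rows of grey levels.

```python
import math


def sad(frame_a, frame_b, ya, xa, yb, xb, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ya + dy][xa + dx] - frame_b[yb + dy][xb + dx])
    return total


def match_block(prev, curr, y, x, size, radius):
    """Displacement (dy, dx) of the block at (y, x) in prev that gives the
    lowest SAD within +/- radius pixels in curr."""
    h, w = len(curr), len(curr[0])
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            if ny < 0 or nx < 0 or ny + size > h or nx + size > w:
                continue  # candidate block would leave the image
            cost = sad(prev, curr, y, x, ny, nx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best


def to_polar(dy, dx):
    """Speed (pixels per frame) and direction (degrees) of a displacement.

    Image rows grow downwards, so the angle is clockwise as seen on screen.
    """
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0


def make_frame(height, width, top, left, size, value=255):
    """Synthetic test frame: a bright square on a dark background."""
    frame = [[0] * width for _ in range(height)]
    for y in range(top, top + size):
        for x in range(left, left + size):
            frame[y][x] = value
    return frame


prev = make_frame(8, 8, 2, 2, 2)
curr = make_frame(8, 8, 3, 4, 2)  # square moved 1 row down, 2 columns right
print(match_block(prev, curr, 2, 2, 2, 2))  # (1, 2)
```

Collecting the `to_polar` values of many blocks into a histogram over direction and speed would give the kind of overview that a polar velocity plot provides.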

Fig. 3. (b) Polar velocity plot derived from a crowd flow estimate (from [5]).

C. Detecting Unusual Behaviour

Detection of individuals who remain in one place for long periods while surrounding crowds are moving, or of individuals moving in a different direction from the majority, may indicate planned or actual criminal behaviour. Automatic detection of such events in the image sequences from a CCTV system may be used to alert human observers, who can then decide if the events are significant. Tracking of specific individuals is also possible. Distinguishing approaching and receding pedestrians (for example by automatic marking with different coloured tags) may be operationally useful.
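As an illustration only, the following Python sketch flags tracked individuals who are almost stationary or who move against the dominant direction of the crowd. The track format, the thresholds and the use of a circular mean for the dominant direction are all assumptions made for this sketch, not a description of any deployed system.

```python
import math


def dominant_direction(angles_deg):
    """Circular mean of motion directions, in degrees in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def angular_difference(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def flag_unusual(tracks, max_deviation_deg=90.0, min_speed=0.2):
    """Return the ids of tracks that are nearly stationary or that deviate
    from the dominant direction by more than max_deviation_deg.

    tracks maps an id to (speed, direction_in_degrees); both thresholds
    are hypothetical and would need tuning per scene.
    """
    moving = {k: v for k, v in tracks.items() if v[0] >= min_speed}
    flagged = [k for k in tracks if k not in moving]
    if moving:
        main = dominant_direction([d for _, d in moving.values()])
        flagged += [
            k for k, (_, d) in moving.items()
            if angular_difference(d, main) > max_deviation_deg
        ]
    return sorted(flagged)


tracks = {
    "a": (1.0, 10.0),
    "b": (1.1, 350.0),
    "c": (0.9, 5.0),
    "d": (1.0, 185.0),  # walking against the flow
    "e": (0.0, 0.0),    # standing still
}
print(flag_unusual(tracks))  # ['d', 'e']
```

In keeping with the text, the output of such a test is a prompt for a human observer, not a decision in itself.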

IV. VEHICLE MONITORING

Speed monitoring is often carried out by simple cameras triggered by a vehicle exceeding the speed limit. The photograph captures the vehicle registration number. Instead, this may now be automated within a CCTV surveillance system. This requires not only identifying the moving vehicle within the scene, but also locating the position of the registration plate, and then automatically reading the number. To achieve this reliably under all conditions of lighting and weather is a challenging image-processing task, but operationally successful systems have been developed.

Reliable methods of automatically reading vehicle registration numbers provide alternative possibilities for excess-speed detection. For example, detecting a vehicle at position $x_1$ at time $t_1$, and then detecting the same vehicle at position $x_2$ at time $t_2$, immediately provides a lower bound for the maximum velocity of travel between these two points, namely $\mathrm{distance}(x_1, x_2)/(t_2 - t_1)$. If that is greater than the speed limit, then it can be concluded that the vehicle has exceeded the limit. Thus with a complex network of cameras capable of identifying particular vehicles (e.g. by registration numbers) it is possible to detect extreme speed violations without measuring speed, even when the vehicles are moving within the junctions of a complex highway network. Installing such systems has implications for civil liberties, since the same data provides information on the movements of particular vehicles, and therefore has the potential to be used (misused) to track individuals.
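The arithmetic behind this two-sighting test is simple enough to sketch in Python. The function names and units are assumptions. For the bound to be valid, the distance must be the shortest drivable route between the two cameras, since a longer route would only mean the vehicle travelled faster.

```python
def average_speed_kmh(distance_km, t1_s, t2_s):
    """Average speed between two sightings, in km/h.

    distance_km: shortest route between the two camera positions (assumed known).
    t1_s, t2_s: timestamps of the two sightings, in seconds.
    """
    if t2_s <= t1_s:
        raise ValueError("second sighting must be later than the first")
    return distance_km / ((t2_s - t1_s) / 3600.0)


def exceeded_limit(distance_km, t1_s, t2_s, limit_kmh):
    """True if the average speed alone proves the limit was exceeded."""
    return average_speed_kmh(distance_km, t1_s, t2_s) > limit_kmh


# 10 km covered in 5 minutes is an average of about 120 km/h.
print(exceeded_limit(10.0, 0.0, 300.0, limit_kmh=100.0))  # True
print(exceeded_limit(10.0, 0.0, 600.0, limit_kmh=100.0))  # False
```

A `False` result does not show that the limit was respected, only that these two sightings cannot prove otherwise.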

There is now considerable government interest in the enhancement of such capabilities, for such applications as 'congestion charging' (automatically billing the owners of vehicles which are observed travelling within city centres during chargeable periods of time) and 'automated road pricing' (toll-collection with no toll-booths).

V. PUBLIC ACCEPTANCE AND INVOLVEMENT

In most countries, cameras in public areas are not now considered an invasion of privacy. They are a common sight at highway junctions, pedestrian crossings and traffic-signal locations (for traffic monitoring) and in city centres, shopping malls, airports and rail stations (for person monitoring). The trends are towards smaller cameras, which may not be readily visible (sometimes augmented by highly-visible dummy cameras to mislead criminals).


There is now public awareness and acceptance of 'webcams' observing places of interest and supplying images to the internet on a continuous basis. It is becoming affordable to install internet-connected cameras in and around the home, so that remote viewing is possible while away on vacation, etc. to provide reassurance that the house has not burnt down, been flooded, or broken into by thieves. Of course making this data available over the internet does open a risk of interception.

VI. TRACKING MOVING OBJECTS: PEOPLE AND VEHICLES [7]

A key technique used to identify moving objects in video data streams is to subtract an estimate of the background scene from the current frame, based on the idea that anything 'new' in the current frame must be mobile object(s). Since the background alters as a result of lighting changes and camera movement, in practice the background estimation needs to be adaptive. Otherwise, effects of sunshine and clouds which would hardly be noticed by a human observer can produce major errors in identifying the moving parts [8].

The moving parts have to be segmented into the distinct moving objects, typically referred to as 'blobs'.
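The idea of an adaptive background can be sketched in Python with an exponential running average, which is one of the simplest adaptive schemes. It is used here purely for illustration and is not the method of [8]. The function names, the blending factor `alpha` and the difference threshold are assumptions.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background estimate (running average).

    A small alpha adapts slowly, so gradual lighting changes are absorbed
    while briefly present objects leave little trace.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose grey level differs from the background estimate
    by more than threshold (1 = candidate moving pixel)."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


background = [[100.0, 100.0, 100.0], [100.0, 100.0, 100.0]]
frame = [[100, 100, 200], [105, 100, 100]]  # one bright 'new' pixel

print(foreground_mask(background, frame))   # [[0, 0, 1], [0, 0, 0]]
background = update_background(background, frame)
```

A common refinement, not shown, is to update the background only at pixels not currently marked as foreground, so that slow-moving objects are not absorbed into it.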

Fig. 4. Pedestrians marked by bounding rectangles (approaches marked by E and departures by F).

A standard method of tracking objects in a video sequence involves identifying each blob of interest, marking it with an easily visible bounding rectangle, and in many cases, adding an identifier (Fig. 4). In the case of vehicles, an identifier may be extracted automatically by locating and automatically reading the registration plate. In the case of persons, of course this is not possible.
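A minimal Python sketch of segmenting a foreground mask into blobs and computing a bounding rectangle for each is given below. It uses a breadth-first flood fill with 4-connectivity, which is an assumption, and the function name is hypothetical. Practical systems usually also filter out very small blobs and merge fragments of the same object.

```python
from collections import deque


def bounding_boxes(mask):
    """Return one (top, left, bottom, right) box per connected blob.

    mask is a list of rows with 1 for foreground pixels; blobs are groups
    of foreground pixels joined through their 4-neighbours.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # New blob: flood-fill it while tracking its extent.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
print(bounding_boxes(mask))  # [(0, 1, 1, 2), (1, 5, 3, 5)]
```

The index of each box in the returned list can serve as a provisional identifier within a single frame; keeping identifiers consistent between frames is the tracking problem itself.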

The object is then 'followed' from image to image. Difficulties naturally arise from occlusion: if the object disappears behind an obstacle, it is possible to use its velocity to estimate the place and time of its emergence assuming no change in its velocity (more sophisticated systems might use acceleration data too). Commonly, a Kalman filter is used to provide better estimates of future trajectories of objects which move into an occluded zone [9,10]. However, it is easy to imagine many situations where this is unreliable - for example two people being tracked may disappear behind an obstacle, and while there may meet, hold a conversation, then split up and depart in opposite directions. To automatically determine which one is which from the image sequences following their re-emergence is obviously not at all easy [11].

Two situations need to be handled: dynamic occlusion, which is the occlusion of one moving object by another, and static occlusion, which is the result of moving objects interacting with static objects (pillars, access gates, etc. or the boundary of the image). In the former case, appearance templates of the moving object(s) can be used to complete partially-obscured observations [12]. In the latter case, observations over a long time may be used to create three-dimensional (depth) information about the image, and this may then be used to make interpretations and predictions about the occlusions [13].
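The simplest prediction mentioned above, constant-velocity extrapolation through a static occlusion, can be sketched in Python as follows. This is deliberately not a Kalman filter, which would additionally weight predictions against noisy measurements. The function names, the one-dimensional occluder model and the coordinate convention are assumptions.

```python
def estimate_velocity(p0, p1, dt):
    """Velocity (vx, vy) from two positions observed dt seconds apart."""
    return ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt)


def predict_emergence(position, velocity, occluder_x_max):
    """Predict when and where an object moving in +x re-emerges.

    position: last observed (x, y) as the object enters the occluder.
    occluder_x_max: x coordinate of the far edge of the occluder.
    Returns (seconds_until_emergence, (x, y)) or None if the object is
    not heading towards the far edge. Assumes the velocity does not change.
    """
    vx, vy = velocity
    if vx <= 0:
        return None
    t = (occluder_x_max - position[0]) / vx
    return t, (occluder_x_max, position[1] + vy * t)


velocity = estimate_velocity((0.0, 10.0), (4.0, 11.0), dt=2.0)
print(velocity)                                        # (2.0, 0.5)
print(predict_emergence((4.0, 11.0), velocity, 10.0))  # (3.0, (10.0, 12.5))
```

If no object appears near the predicted place and time, that is itself useful information, since it suggests the object stopped or changed direction behind the obstacle.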

Since surveillance is typically achieved by multiple cameras, the 'hand-over' of identified objects from the field of view of one camera to another is needed. Sometimes the fields overlap; in other cases, there may be a part of the scene not covered by any camera.

Mobile surveillance cameras (for example, mounted on vehicles) present some additional problems because the observed background moves from frame to frame, and a constant background has to be estimated from this sequence of differing views. Only then is it possible to identify those objects which are really moving with respect to this background.

VII. SURVEILLANCE FOR PUBLIC SAFETY

Because of the large number of cameras often deployed in public areas, the automatic detection of events of importance for safety and security has become important: the events are required to trigger an alarm, to alert humans able to make decisions about the need for action [14,15]. For example, one might wish to identify unattended baggage in a rail station. Of course, the sudden appearance of a suitably-shaped stationary object may be detected by conventional image processing methods. Alternatively, if a person being tracked 'splits' into a moving person-like object and a smaller stationary object (e.g. depositing an item and walking away from it), this could be used to trigger an alarm.

It is relatively easy to automatically detect individuals in forbidden areas (Fig. 5) or individuals who loiter for excessive times in one place (which can be a sign of criminal intent). Falling pedestrians may be identified as person-like objects which take up a horizontal position with little or no motion. This may be of particular importance needing rapid response if observed on rail tracks [16].
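Two of these tests can be sketched in Python under simple assumptions: loitering as the time a tracked object has stayed within a small radius of its current position, and a fall as a bounding rectangle that is much wider than it is tall while almost motionless. The function names, track format and thresholds are hypothetical and would need tuning for a real camera view.

```python
import math


def dwell_time(track, radius):
    """Seconds for which an object has stayed within radius of its latest position.

    track is a time-ordered list of (t_seconds, x, y) observations.
    """
    if not track:
        return 0.0
    t_end, x_end, y_end = track[-1]
    start = t_end
    for t, x, y in reversed(track):
        if math.hypot(x - x_end, y - y_end) > radius:
            break  # the object was somewhere else before this point
        start = t
    return t_end - start


def looks_fallen(box, speed, max_speed=0.1, min_aspect=1.5):
    """True if the bounding box is horizontal and the object barely moves.

    box is (top, left, bottom, right) in pixels, inclusive.
    """
    top, left, bottom, right = box
    height = bottom - top + 1
    width = right - left + 1
    return width / height >= min_aspect and speed <= max_speed


track = [(0, 0, 0), (10, 50, 0), (20, 51, 1), (30, 50, 0), (40, 52, 1)]
print(dwell_time(track, radius=5))                 # 30
print(looks_fallen((10, 10, 19, 49), speed=0.0))   # True
print(looks_fallen((10, 10, 49, 19), speed=0.0))   # False
```

An alarm rule might then compare `dwell_time` with a site-specific limit; with oblique camera angles the aspect-ratio test in particular would need perspective correction.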


Fig. 5. Pedestrian automatically identified as too close to platform edge (marked by rectangle and bar).

VIII. SOME SPECIFIC ACTIVITIES OF THE RESEARCH TEAMS AT KINGSTON UNIVERSITY

The Digital Imaging Research Centre (DIRC) at Kingston University has a number of teams working on Video Surveillance projects. Some of this research was started at King's College London and some at City University London, and these teams moved to Kingston, taking the activities with them. Example projects include:

* 'PRISMATICA': Participation in a European Commission project [17, 18] to make public transport systems more attractive to passengers, safer for passengers and staff and operationally cost-effective. An innovative aspect was the integration of operational, legal, social and technical aspects of CCTV surveillance systems. Partners included RATP-Paris, ATM-Milan, STIB-Brussels, PPT-Prague, ML-Lisbon, King's College London, University College London, INRETS-France, CEA-France, TIS-Portugal, SODIT-France, FIT-Italy, ILA-Germany, and Thales-France.

PRISMATICA aims included development of concepts for pro-active surveillance systems to provide decision-support tools for human operators in complex and large environments. The tools should automatically guide the attention and actions of the managers of a transport network, while keeping the technology itself as transparent as possible.

This project incorporated a two-stage assessment. First, the architecture was tested in the Paris Métro (Gare de Lyon) and successfully demonstrated communications mechanisms and protocols. Next there was a major deployment of the system in Liverpool Street Underground station, London. This station takes commuters to/from one of the biggest financial centres in Europe and connects with main railways and buses. There are more than seventy cameras in this station covering approximately 80% of its total area.

* Monitoring Public Spaces: Developing intelligent surveillance tools for integration within existing urban CCTV infrastructures to improve incident detection and assist control room operations. Incidents of interest in public transport systems include overcrowding, loitering, busking, begging, jumping over access barriers and drug-dealing. Fears of terrorism lead to continuous monitoring for abandoned luggage or suspect packages. For all these situations, automation and the tracking of objects provide an opportunity to deploy staff only where they are really needed. This enables security personnel to be freed from having to acquire and manually track targets [19, 20].

* Plug and Play Surveillance: Devising designs which enable newly installed cameras and associated computational intelligence to easily integrate into an existing camera network.

* Learning Camera Topology: An intelligent surveillance system must capture and track objects to establish a history of their behaviour, classify the objects (as people or vehicles, etc. of particular types), and establish their trajectories in a 3D space. In many cases, a number of cameras are used with partially overlapping areas of view, and there may also be areas which are not covered by any camera. Synthesising all the images into a single real-time description of the scene is a complex data-fusion task. Solving this problem also means that camera configuration may be changed without the need for a full re-calibration.

* Learning Semantic Scene Models: The aim is to label regions in a scene according to activity (e.g. entry zones, exit zones, stopping zones, junctions, etc.) from video data streams from the scene. The activity is often time-dependent: for example, commuter flows usually run in opposite directions in the morning and evening, because of travel to and from the workplace respectively. The purpose is to assist subsequent interpretation of moving-object behaviour in the scene [11].

* Automated Extraction of Evidence from CCTV footage: The aim is to reduce the time and effort of reviewing lengthy CCTV video sequences to locate specific sequences showing incidents of interest by automating the recovery and information management of video evidence.

In many investigations of major crimes in urban environments, CCTV evidence plays an increasingly crucial role in establishing the identity of vehicles and individuals. Gathering this data is extremely time consuming, and involves manual annotation of CCTV archives by police and other experts. A joint project of Kingston University, University of Surrey and Sira, Ltd. is exploring the linkage of the meta-data structure of the video interpretation process with the linguistic structure of police descriptions of evidence. An aim is to validate the effectiveness of this data-fusion process by an automatic generator of galleries of vehicle registration plate images and person images (Fig. 6). Viewing of such galleries by local police and the general public is known to be an effective method of acquiring knowledge.

Fig. 6. Examples of vehicle registration number and person galleries associated with car-park usage.

IX. FUTURE DEVELOPMENTS

The progress in integrated circuit technology, the increased capabilities of digital cameras and the new communications methods (wireless LANs, mobile phones) all contribute to the continued deployment of more complex and advanced CCTV surveillance systems, which will become increasingly unobtrusive as cameras decrease in size, and are likely to be integrated with other sensors (audio, thermal, etc.) and data-bases.

The software will inevitably increase in capability, so that tracking and recognition of people and vehicles will become much more effective. Automatic recognition of behaviour patterns will improve, so that it will become easier to detect and predict both legitimate and illegal activity. Automatic analysis of gestures [21] and the ability of computers to support lip-reading at a distance [22] are already indicating a direction in which surveillance might develop.

There is no assurance that these systems will always be used responsibly and only by those with the public interest and safety in mind. Misuse by official agencies and adoption by criminal elements in society may happen if there are insufficient safeguards.

Just as the invention of the word processor did not result in the paperless office, the development of improved CCTV surveillance systems is not likely to lead to the crime-free city centre.

ACKNOWLEDGEMENTS

Jia Hong Yin's work on the global estimation of crowd behaviour [5] was the start of research at King's College on this subject. Colleagues at Kingston University, particularly Graeme Jones and Tim Ellis, are thanked for their suggestions for this paper.

EPSRC and the European Commission are thanked for the support of projects at King's College London, and support for several projects at Kingston University for work in this area, including ADVISOR (IST-1999-11287) and PRISMATICA (GRD1-2000-10601).

REFERENCES

[1] E. Wallace et al, "Good Practice for the Management and Operation of Town Centre CCTV," European Conf. on Security and Detection, 28-30 Apr. 1997, pp. 36-41 (IEE Conf. Pub. 437).

[2] M. Frith, "Big Brother Britain, 2004," The Independent, 12 Jan 2004.

[3] A.C. Davies, J.H. Yin, S.A. Velastin, "Crowd monitoring using image processing," IEE Electronics and Communication Engineering J., 7, Feb 1995, pp. 37-47.

[4] S.A. Velastin, J.H. Yin, A.C. Davies, M.A. Vicencio-Silva, R.E. Allsop, A. Penn, "Automated measurement of crowd density and motion using image processing," 7th Int. Conf. on Road Traffic Monitoring and Control, London, UK, 1994, pp. 127-132.

[5] J.H. Yin, "Automation of crowd data-acquisition and monitoring in confined areas using image processing," Ph.D. Thesis, King's College London, University of London, September 1996.

[6] B.K.P. Horn, B.G. Schunck, "Determining Optical Flow," Artificial Intelligence, 17, 1981, pp. 185-203.

[7] G. Foresti, C. Micheloni, L. Snidaro, P. Remagnino, T. Ellis, "Active Video-based Surveillance System," IEEE Signal Processing Magazine, March 2005, pp. 25-37.

[8] C. Stauffer, W.E.L. Grimson, "Learning Patterns of Activity using Real-Time Tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, August 2000, pp. 747-757.

[9] I. Haritaoglu, D. Harwood, L.S. Davis, "Real-time Surveillance of people and their Activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (8), August 2000, pp. 809-830.

[10] H. Tao, H.S. Sawhney, R. Kumar, "Dynamic Layer Representation and its Applications to Tracking," IEEE Conference on Computer Vision and Pattern Recognition, 2, June 2000, pp. 134-141.

[11] D. Makris, T.J. Ellis, "Learning Semantic Scene Models from Observing Activity in Visual Surveillance," IEEE Transactions on Systems Man and Cybernetics - Part B, 35 (3), June 2005, pp. 397-408.

[12] A. Senior, "Tracking People with Probabilistic Appearance Models," 3rd IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Copenhagen, 1st June 2002, pp. 48-55.

[13] J. Renno, D. Greenhill, J. Orwell, G.A. Jones, "Learning the Semantic Landscape: Embedding scene knowledge in object tracking," Realtime Imaging, Special Issue on Video Object Processing, (accepted for publication in 2005).

[14] B.A. Boghossian, "Motion-based Image processing Algorithms applied to Crowd Monitoring Systems," Ph.D. Thesis, King's College, University of London, Oct. 2000.

[15] L.M. Fuentes, S.A. Velastin, "From tracking to advanced surveillance," Proc. Int. Conf. on Image Processing (ICIP 2003), Barcelona, Spain, Sept 2003, III 121-4.

[16] S.A. Velastin, M.A. Vicencio-Silva, B. Lo, L. Khoudour, "A Distributed Surveillance System For Improving Security In Public Transport Networks," Measurement and Control, 35 (8), pp. 209-13, 2002.

[17] S.A. Velastin, B. Boghossian, B. Lo, J. Sun, M.A. Vicencio-Silva, "PRISMATICA: Toward Ambient Intelligence In Public Transport Environments," IEEE Transactions on Systems Man and Cybernetics, 35, Jan 2005, pp. 164-182.

[18] http://www.prismatica.com.

[19] M. Valera, S.A. Velastin, "A review of the State-of-the-Art in distributed surveillance systems," in Intelligent Distributed Systems (Eds. S.A. Velastin, P. Remagnino), IEE Publications, 2005.

[20] M. Valera, S.A. Velastin, "Intelligent distributed surveillance systems: a review," IEE Proc. Vis. Image Signal Processing, 152 (2), Apr 2005, pp. 192-204.

[21] I.B. Ozer, T. Lu, W. Wolf, "Design of a Real-time Gesture Recognition System," IEEE Signal Processing Magazine, 22 (3), May 2005, pp. 57-64.

[22] Intel Developer Forum, Berlin, 28 Apr. 2003 and "Lip-Reading Computers Are Born," Internet Magazine, 29 Apr 2003.
