
Computer Networks 51 (2007) 921–960

www.elsevier.com/locate/comnet

A survey on wireless multimedia sensor networks

Ian F. Akyildiz *, Tommaso Melodia, Kaushik R. Chowdhury

Broadband and Wireless Networking Laboratory, School of Electrical and Computer Engineering, Georgia Institute of Technology,

Atlanta, GA 30332, United States

Received 11 March 2006; received in revised form 6 August 2006; accepted 5 October 2006. Available online 2 November 2006.

Abstract

Abstract. The availability of low-cost hardware such as CMOS cameras and microphones has fostered the development of Wireless Multimedia Sensor Networks (WMSNs), i.e., networks of wirelessly interconnected devices that are able to ubiquitously retrieve multimedia content such as video and audio streams, still images, and scalar sensor data from the environment. In this paper, the state of the art in algorithms, protocols, and hardware for wireless multimedia sensor networks is surveyed, and open research issues are discussed in detail. Architectures for WMSNs are explored, along with their advantages and drawbacks. Currently off-the-shelf hardware as well as available research prototypes for WMSNs are listed and classified. Existing solutions and open research issues at the application, transport, network, link, and physical layers of the communication protocol stack are investigated, along with possible cross-layer synergies and optimizations. © 2006 Elsevier B.V. All rights reserved.

Keywords: Wireless sensor networks; Multimedia communications; Distributed smart cameras; Video sensor networks; Energy-aware protocol design; Cross-layer protocol design; Quality of service

1. Introduction

Wireless sensor networks (WSN) [22] have drawn the attention of the research community in the last few years, driven by a wealth of theoretical and practical challenges. This growing interest can be largely attributed to new applications enabled by large-scale networks of small devices capable of harvesting information from the physical environment, performing simple processing on the extracted data and transmitting it to remote locations. Significant results in this area over the last few years have ushered in a surge of civil and military applications. As of today, most deployed wireless sensor networks measure scalar physical phenomena like temperature, pressure, humidity, or location of objects. In general, most of the applications have low bandwidth demands, and are usually delay tolerant.

1389-1286/$ - see front matter © 2006 Elsevier B.V. All rights reserved.

doi:10.1016/j.comnet.2006.10.002

* Corresponding author. Tel.: +1 404 894 5141; fax: +1 404 894 7883.

E-mail addresses: [email protected] (I.F. Akyildiz), [email protected] (T. Melodia), [email protected] (K.R. Chowdhury).

More recently, the availability of inexpensive hardware such as CMOS cameras and microphones that are able to ubiquitously capture multimedia content from the environment has fostered the


development of Wireless Multimedia Sensor Networks (WMSNs) [54,90], i.e., networks of wirelessly interconnected devices that allow retrieving video and audio streams, still images, and scalar sensor data. With rapid improvements and miniaturization in hardware, a single sensor device can be equipped with audio and visual information collection modules. As an example, the Cyclops image capturing and inference module [103] is designed for extremely light-weight imaging and can be interfaced with a host mote such as Crossbow's MICA2 [4] or MICAz [5]. In addition to the ability to retrieve multimedia data, WMSNs will also be able to store, process in real-time, correlate and fuse multimedia data originated from heterogeneous sources.

Wireless multimedia sensor networks will not only enhance existing sensor network applications such as tracking, home automation, and environmental monitoring, but they will also enable several new applications such as:

• Multimedia surveillance sensor networks. Wireless video sensor networks will be composed of interconnected, battery-powered miniature video cameras, each packaged with a low-power wireless transceiver that is capable of processing, sending, and receiving data. Video and audio sensors will be used to enhance and complement existing surveillance systems against crime and terrorist attacks. Large-scale networks of video sensors can extend the ability of law enforcement agencies to monitor areas, public events, private properties and borders.

• Storage of potentially relevant activities. Multimedia sensors could infer and record potentially relevant activities (thefts, car accidents, traffic violations), and make video/audio streams or reports available for future query.

• Traffic avoidance, enforcement and control systems. It will be possible to monitor car traffic in big cities or highways and deploy services that offer traffic routing advice to avoid congestion. In addition, smart parking advice systems based on WMSNs [29] will allow monitoring available parking spaces and provide drivers with automated parking advice, thus improving mobility in urban areas. Moreover, multimedia sensors may monitor the flow of vehicular traffic on highways and retrieve aggregate information such as average speed and number of cars. Sensors could also detect violations and transmit video streams to law enforcement agencies to identify the violator, or buffer images and streams in case of accidents for subsequent accident scene analysis.

• Advanced health care delivery. Telemedicine sensor networks [59] can be integrated with 3G multimedia networks to provide ubiquitous health care services. Patients will carry medical sensors to monitor parameters such as body temperature, blood pressure, pulse oximetry, ECG, and breathing activity. Furthermore, remote medical centers will perform advanced remote monitoring of their patients via video and audio sensors, location sensors, and motion or activity sensors, which can also be embedded in wrist devices [59].

• Automated assistance for the elderly and family monitors. Multimedia sensor networks can be used to monitor and study the behavior of elderly people as a means to identify the causes of illnesses that affect them, such as dementia [106]. Networks of wearable or video and audio sensors can infer emergency situations and immediately connect elderly patients with remote assistance services or with relatives.

• Environmental monitoring. Several projects on habitat monitoring that use acoustic and video feeds are being envisaged, in which information has to be conveyed in a time-critical fashion. For example, arrays of video sensors are already used by oceanographers to determine the evolution of sandbars via image processing techniques [58].

• Person locator services. Multimedia content such as video streams and still images, along with advanced signal processing techniques, can be used to locate missing persons, or identify criminals or terrorists.

• Industrial process control. Multimedia content such as imaging, temperature, or pressure, amongst others, may be used for time-critical industrial process control. Machine vision is the application of computer vision techniques to industry and manufacturing, where information can be extracted and analyzed by WMSNs to support a manufacturing process such as those used in semiconductor chips, automobiles, food or pharmaceutical products. For example, in quality control of manufacturing processes, details or final products are automatically inspected to find defects. In addition, machine vision systems can detect the position and orientation of parts of the product to be picked up by a robotic arm. The integration of machine vision


systems with WMSNs can simplify and add flexibility to systems for visual inspections and automated actions that require high-speed, high-magnification, and continuous operation.

As observed in [37], WMSNs will stretch the horizon of traditional monitoring and surveillance systems by:

• Enlarging the view. The Field of View (FoV) of a single fixed camera, or the Field of Regard (FoR) of a single moving pan-tilt-zoom (PTZ) camera, is limited. Instead, a distributed system of multiple cameras and sensors enables perception of the environment from multiple disparate viewpoints, and helps overcome occlusion effects.

• Enhancing the view. The redundancy introduced by multiple, possibly heterogeneous, overlapped sensors can provide enhanced understanding and monitoring of the environment. Overlapped cameras can provide different views of the same area or target, while the joint operation of cameras and audio or infrared sensors can help disambiguate cluttered situations.

• Enabling multi-resolution views. Heterogeneous media streams with different granularity can be acquired from the same point of view to provide a multi-resolution description of the scene and multiple levels of abstraction. For example, static medium-resolution camera views can be enriched by views from a zoom camera that provides a high-resolution view of a region of interest; such a feature could be used to recognize people based on their facial characteristics.

Many of the above applications require the sensor network paradigm to be rethought in view of the need for mechanisms to deliver multimedia content with a certain level of quality of service (QoS). Since the need to minimize energy consumption has driven most of the research in sensor networks so far, mechanisms to efficiently deliver application-level QoS, and to map these requirements to network-layer metrics such as latency and jitter, have not been primary concerns in mainstream research on classical sensor networks.

Conversely, algorithms, protocols and techniques to deliver multimedia content over large-scale networks have been the focus of intensive research in the last 20 years, especially in ATM wired and wireless networks. Later, many of the results derived for ATM networks were readapted, and architectures such as Diffserv and Intserv for Internet QoS delivery were developed. However, several peculiarities make QoS delivery of multimedia content in sensor networks an even more challenging, and largely unexplored, task:

• Resource constraints. Sensor devices are constrained in terms of battery, memory, processing capability, and achievable data rate [22]. Hence, efficient use of these scarce resources is mandatory.

• Variable channel capacity. While in wired networks the capacity of each link is assumed to be fixed and pre-determined, in multi-hop wireless networks the attainable capacity of each wireless link depends on the interference level perceived at the receiver. This, in turn, depends on the interaction of several functionalities that are distributively handled by all network devices, such as power control, routing, and rate policies. Hence, the capacity and delay attainable at each link are location dependent, vary continuously, and may be bursty in nature, thus making QoS provisioning a challenging task.

• Cross-layer coupling of functionalities. In multi-hop wireless networks, there is a strict interdependence among functions handled at all layers of the communication stack. Functionalities handled at different layers are inherently and strictly coupled due to the shared nature of the wireless communication channel. Hence, the various functionalities aimed at QoS provisioning should not be treated separately when efficient solutions are sought.

• Multimedia in-network processing. Processing of multimedia content has mostly been approached as a problem isolated from the network-design problem, with a few exceptions such as joint source-channel coding [44] and channel-adaptive streaming [51]. Hence, research that addressed the content delivery aspects has typically not considered the characteristics of the source content, and has primarily studied cross-layer interactions among lower layers of the protocol stack. However, the processing and delivery of multimedia content are not independent, and their interaction has a major impact on the levels of QoS that can be delivered. WMSNs will allow performing multimedia in-network processing algorithms on the raw data. Hence, the QoS required at the application level will be delivered by means of a combination of both cross-layer optimization of the


communication process, and in-network processing of raw data streams that describe the phenomenon of interest from multiple views, with different media, and at multiple resolutions. Hence, it is necessary to develop application-independent and self-organizing architectures to flexibly perform in-network processing of multimedia content.
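As a toy illustration of such in-network processing, the sketch below (plain Python; the threshold and frame sizes are arbitrary values chosen for illustration, not taken from the survey) shows how a node could locally decide whether a captured frame warrants transmission, reporting only a scalar activity score for uninteresting scenes:

```python
# In-network filtering sketch: transmit a full frame only when the scene
# changes appreciably; otherwise send a single scalar summary.

MOTION_THRESHOLD = 12.0  # mean absolute pixel difference (arbitrary tuning value)

def filter_frame(prev_frame, frame):
    """Return ("frame", frame) on significant change, else ("scalar", score)."""
    score = sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)
    if score > MOTION_THRESHOLD:
        return ("frame", frame)   # event detected: ship the raw frame upstream
    return ("scalar", score)      # quiet scene: ship one number instead

# A static frame, then a frame in which a bright object appears:
quiet = [0] * 21120               # QCIF-sized monochrome frame, 176 x 120 pixels
moving = [0] * 18720 + [200] * 2400

kind, _ = filter_frame(quiet, quiet)
print(kind)    # scalar
kind, _ = filter_frame(quiet, moving)
print(kind)    # frame
```

The bandwidth saving is the point: a scalar score is a few bytes, while the raw frame is tens of kilobytes, so quiet scenes cost almost nothing to report.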

Efforts from several research areas will need to converge to develop efficient and flexible WMSNs, and this, in turn, will significantly enhance our ability to interact with the physical environment. These include advances in the understanding of energy-constrained wireless communications, and the integration of advanced multimedia processing techniques into the communication process. Another crucial issue is the development of flexible system architectures and software that allow querying the network to specify the required service (thus providing abstraction from implementation details). At the same time, it is necessary to provide the service in the most efficient way, which may be at odds with the need for abstraction.

In this paper, we survey the state of the art in algorithms, protocols, and hardware for the development of wireless multimedia sensor networks, and discuss open research issues in detail. In particular, in Section 2 we point out the characteristics of wireless multimedia sensor networks, i.e., the major factors influencing their design. In Section 3, we suggest possible architectures for WMSNs and describe their characterizing features. In Section 4, we discuss and classify existing hardware and prototypal implementations for WMSNs, while in Section 5 we discuss possible advantages and challenges of multimedia in-network processing. In Sections 6–10 we discuss existing solutions and open research issues at the application, transport, network, link, and physical layers of the communication stack, respectively. In Section 11, we discuss cross-layer synergies and possible optimizations, while in Section 12 we discuss additional complementary research areas such as actuation, synchronization and security. Finally, in Section 13 we conclude the paper.

2. Factors influencing the design of multimedia sensor networks

Wireless Multimedia Sensor Networks (WMSNs) will be enabled by the convergence of communication and computation with signal processing and several branches of control theory and embedded computing. This cross-disciplinary research will enable distributed systems of heterogeneous embedded devices that sense, interact with, and control the physical environment. The factors that mainly influence the design of a WMSN are outlined in this section.

• Application-specific QoS requirements. The wide variety of applications envisaged for WMSNs will have different requirements. In addition to the data delivery modes typical of scalar sensor networks, multimedia data include snapshot and streaming multimedia content. Snapshot-type multimedia data contain event-triggered observations obtained in a short time period. Streaming multimedia content is generated over longer time periods and requires sustained information delivery. Hence, a strong foundation is needed in terms of hardware and supporting high-level algorithms to deliver QoS and consider application-specific requirements. These requirements may pertain to multiple domains and can be expressed, amongst others, in terms of a combination of bounds on energy consumption, delay, reliability, distortion, or network lifetime.

• High bandwidth demand. Multimedia content, especially video streams, requires transmission bandwidth that is orders of magnitude higher than that supported by currently available sensors. For example, the nominal transmission rate of state-of-the-art IEEE 802.15.4 compliant components such as Crossbow's [3] MICAz or TelosB [6] motes is 250 kbit/s. Data rates at least one order of magnitude higher may be required for high-end multimedia sensors, with comparable power consumption. Hence, high-data-rate and low-power-consumption transmission techniques need to be leveraged. In this respect, the ultra wide band (UWB) transmission technique seems particularly promising for WMSNs, and its applicability is discussed in Section 10.

• Multimedia source coding techniques. Uncompressed raw video streams require excessive bandwidth for a multi-hop wireless environment. For example, a single monochrome frame in the NTSC-based Quarter Common Intermediate Format (QCIF, 176 × 120) requires around 21 Kbyte, and at 30 frames per second (fps), a video stream requires over 5 Mbit/s. Hence, it is apparent that efficient processing techniques for


lossy compression are necessary for multimedia sensor networks. Traditional video coding techniques used for wireline and wireless communications are based on the idea of reducing the bit rate generated by the source encoder by exploiting source statistics. To this aim, encoders rely on intra-frame compression techniques to reduce redundancy within one frame, while they leverage inter-frame compression (also known as predictive encoding or motion estimation) to exploit redundancy among subsequent frames, thus reducing the amount of data to be transmitted and stored and achieving good rate-distortion performance. Since predictive encoding requires complex encoders and powerful processing algorithms, and entails high energy consumption, it may not be suited for low-cost multimedia sensors. However, it has recently been shown [50] that the traditional balance of complex encoder and simple decoder can be reversed within the framework of so-called distributed source coding, which exploits the source statistics at the decoder and, by shifting the complexity to this end, allows the use of simple encoders. Clearly, such algorithms are very promising for WMSNs and especially for networks of video sensors, where it may not be feasible to use existing video encoders at the source node due to processing and energy constraints.

• Multimedia in-network processing. WMSNs allow performing multimedia in-network processing algorithms on the raw data extracted from the environment. This requires new architectures for collaborative, distributed, and resource-constrained processing that allow for filtering and extraction of semantically relevant information at the edge of the sensor network. This may increase the system scalability by reducing the transmission of redundant information, merging data originated from multiple views, on different media, and with multiple resolutions. For example, in video security applications, information from uninteresting scenes can be compressed to a simple scalar value or not be transmitted altogether, while in environmental applications, distributed filtering techniques can create a time-elapsed image [120]. Hence, it is necessary to develop application-independent architectures to flexibly perform in-network processing of the multimedia content gathered from the environment. For example, IrisNet [93] uses application-specific filtering of sensor feeds at the source, i.e., each application processes its desired sensor feeds on the CPU of the sensor nodes where data are gathered. This dramatically reduces the bandwidth consumed, since instead of transferring raw data, IrisNet sends only a potentially small amount of processed data. However, the cost of multimedia processing algorithms may be prohibitive for low-end multimedia sensors. Hence, it is necessary to develop scalable and energy-efficient distributed filtering architectures to enable processing of redundant data as close as possible to the periphery of the network.

• Power consumption. Power consumption is a fundamental concern in WMSNs, even more than in traditional wireless sensor networks. In fact, sensors are battery-constrained devices, while multimedia applications produce high volumes of data, which require high transmission rates and extensive processing. While the energy consumption of traditional sensor nodes is known to be dominated by the communication functionalities, this may not necessarily be true in WMSNs. Therefore, protocols, algorithms and architectures to maximize the network lifetime while providing the QoS required by the application are a critical issue.

• Flexible architecture to support heterogeneous applications. WMSN architectures will support several heterogeneous and independent applications with different requirements. It is necessary to develop flexible, hierarchical architectures that can accommodate the requirements of all these applications in the same infrastructure.

• Multimedia coverage. Some multimedia sensors, in particular video sensors, have larger sensing radii and are sensitive to the direction of acquisition (directivity). Furthermore, video sensors can capture images only when there is an unobstructed line of sight between the event and the sensor. Hence, coverage models developed for traditional wireless sensor networks are not sufficient for pre-deployment planning of a multimedia sensor network.

• Integration with Internet (IP) architecture. It is of fundamental importance for the commercial development of sensor networks to provide services that allow querying the network to retrieve useful information from anywhere and at any time. For this reason, future WMSNs will be remotely accessible from the Internet, and will therefore need to be integrated with the IP


architecture. The characteristics of WSNs rule out the possibility of all-IP sensor networks and recommend the use of application-level gateways or overlay IP networks as the best approach for integration between WSNs and the Internet [138].

• Integration with other wireless technologies. Large-scale sensor networks may be created by interconnecting local "islands" of sensors through other wireless technologies. This needs to be achieved without sacrificing the efficiency of the operation within each individual technology.
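The bandwidth figures quoted in the list above are easy to verify with a few lines of arithmetic (monochrome 8-bit pixels are assumed for the QCIF frame):

```python
# Verifying the raw-bandwidth figures for an uncompressed monochrome QCIF
# stream, and comparing against the 250 kbit/s nominal rate of IEEE 802.15.4
# motes (MICAz, TelosB). 8-bit pixel depth is an assumption.

WIDTH, HEIGHT = 176, 120   # QCIF resolution
BITS_PER_PIXEL = 8
FPS = 30
MOTE_RATE_BPS = 250_000    # nominal 802.15.4 transmission rate

frame_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
stream_bps = frame_bytes * 8 * FPS

print(f"one frame: {frame_bytes} bytes (~21 Kbyte)")                       # 21120 bytes
print(f"stream: {stream_bps / 1e6:.2f} Mbit/s")                            # 5.07 Mbit/s
print(f"vs. mote radio: {stream_bps / MOTE_RATE_BPS:.1f}x over capacity")  # 20.3x
```

The last line makes the design pressure concrete: even a single low-resolution, uncompressed stream exceeds a mote radio's nominal capacity by roughly a factor of twenty, which is why source coding and in-network filtering are treated as first-class design factors.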

3. Network architecture

The problem of designing a scalable network architecture is of primary importance. Most proposals for wireless sensor networks are based on a flat, homogeneous architecture in which every sensor has the same physical capabilities and can only interact with neighboring sensors. Traditionally, research on algorithms and protocols for sensor networks has focused on scalability, i.e., how to design solutions whose applicability would not be limited by

the growing size of the network. Flat topologies may not always be suited to handle the amount of traffic generated by multimedia applications, including audio and video. Likewise, the processing power required for data processing and communications, and the power required to operate it, may not be available on each node.

Fig. 1. Reference architecture of a wireless multimedia sensor network.

3.1. Reference architecture

In Fig. 1, we introduce a reference architecture for WMSNs, where three sensor networks with different characteristics are shown, possibly deployed in different physical locations. The first cloud on the left shows a single-tier network of homogeneous video sensors. A subset of the deployed sensors have higher processing capabilities, and are thus referred to as processing hubs. The union of the processing hubs constitutes a distributed processing architecture. The multimedia content gathered is relayed to a wireless gateway through a multi-hop path. The gateway is interconnected to a storage hub, which is in charge of storing multimedia content locally for subsequent retrieval. Clearly, more complex architectures for distributed storage can be implemented when allowed by the environment and the


application needs, which may result in energy savings: since the multimedia content is stored locally, it does not need to be wirelessly relayed to remote locations. The wireless gateway is also connected to a central sink, which implements the software front-end for network querying and tasking. The second cloud represents a single-tiered clustered architecture of heterogeneous sensors (only one cluster is depicted). Video, audio, and scalar sensors relay data to a central clusterhead, which is also in charge of performing intensive multimedia processing on the data (processing hub). The clusterhead relays the gathered content to the wireless gateway and to the storage hub. The last cloud on the right represents a multi-tiered network with heterogeneous sensors. Each tier is in charge of a subset of the functionalities. Resource-constrained, low-power scalar sensors are in charge of performing simpler tasks, such as detecting scalar physical measurements, while resource-rich, high-power devices are responsible for more complex tasks. Data processing and storage can be performed in a distributed fashion at each different tier.

3.2. Single-tier vs. multi-tier sensor deployment

One possible approach for designing a multimedia sensor application is to deploy homogeneous sensors and program each sensor to perform all possible application tasks. Such an approach yields a flat, single-tier network of homogeneous sensor nodes. An alternative, multi-tier approach is to use heterogeneous elements [69]. In this approach, resource-constrained, low-power elements are in charge of performing simpler tasks, such as detecting scalar physical measurements, while resource-rich, high-power devices take on more complex tasks. For instance, a surveillance application can rely on low-fidelity cameras or scalar acoustic sensors to perform motion or intrusion detection, while high-fidelity cameras can be woken up on demand for object recognition and tracking. In [68], a multi-tier architecture is advocated for video sensor networks for surveillance applications. The architecture is based on multiple tiers of cameras with different functionalities, with the lower tier constituted of low-resolution imaging sensors, and the higher tier composed of high-end pan-tilt-zoom cameras. It is argued, and shown by means of experiments, that such an architecture offers considerable advantages with respect to a single-tier architecture in terms

of scalability, lower cost, better coverage, higher functionality, and better reliability.
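The on-demand wake-up pattern described above can be sketched in a few lines (a hypothetical two-tier controller; the class, function names, and threshold are invented for illustration, not an API from the surveyed systems):

```python
# Two-tier surveillance sketch: an always-on, low-power scalar tier wakes the
# high-power camera tier only when a cheap threshold test suggests an event.

class CameraTier:
    """High-fidelity tier: energy-hungry, so kept asleep by default."""
    def __init__(self):
        self.awake = False
        self.captures = 0

    def wake(self):
        self.awake = True

    def sleep(self):
        self.awake = False

    def capture(self):
        if self.awake:
            self.captures += 1

def scalar_tier_step(reading, threshold, camera):
    """Low-fidelity tier: one cheap comparison decides whether to spend camera energy."""
    if reading > threshold:
        camera.wake()
        camera.capture()    # object recognition/tracking would start here
    else:
        camera.sleep()

cam = CameraTier()
# Acoustic energy readings; the two large values represent an intrusion:
for r in [0.1, 0.2, 3.5, 4.0, 0.1]:
    scalar_tier_step(r, threshold=1.0, camera=cam)

print(cam.captures)   # 2: the camera ran only during the two event readings
```

The design point is that the expensive tier's duty cycle tracks the event rate rather than the sampling rate, which is where the energy and cost advantages of the multi-tier architecture come from.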

3.3. Coverage

In traditional WSNs, sensor nodes collect information from the environment within a pre-defined sensing range, i.e., a roughly circular area defined by the type of sensor being used.

Multimedia sensors generally have larger sensing radii and are also sensitive to the direction of data acquisition. In particular, cameras can capture images of objects or parts of regions that are not necessarily close to the camera itself. However, the image can obviously be captured only when there is an unobstructed line of sight between the event and the sensor. Furthermore, each multimedia sensor/camera perceives the environment or the observed object from a different and unique viewpoint, given the different orientations and positions of the cameras relative to the observed event or region. In [118], a preliminary investigation of the coverage problem for video sensor networks is conducted. The concept of sensing range is replaced with the camera's field of view, i.e., the maximum volume visible from the camera. It is also shown that an algorithm designed for traditional sensor networks does not perform well with video sensors in terms of coverage preservation of the monitored area.
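A minimal directional coverage test of the kind such models require might look like the following sketch (a sector-based 2-D field-of-view check; all parameters are illustrative, and the line-of-sight/occlusion test discussed above is deliberately omitted):

```python
import math

def in_field_of_view(cam_pos, cam_dir_deg, fov_deg, max_range, target):
    """Sector-based 2-D coverage test for a directional camera sensor.

    The target is covered when it lies within the sensing range AND within
    half the angular field of view of the camera's optical axis. Occlusion
    (line-of-sight) checking is omitted from this sketch.
    """
    dx, dy = target[0] - cam_pos[0], target[1] - cam_pos[1]
    if math.hypot(dx, dy) > max_range:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # smallest signed angle between the target bearing and the optical axis
    off_axis = (bearing - cam_dir_deg + 180) % 360 - 180
    return abs(off_axis) <= fov_deg / 2

# Camera at the origin facing east (0 degrees), 60-degree FoV, 50 m range:
print(in_field_of_view((0, 0), 0, 60, 50, (30, 10)))   # True: in range, ~18 deg off axis
print(in_field_of_view((0, 0), 0, 60, 50, (10, 30)))   # False: ~72 deg off axis
print(in_field_of_view((0, 0), 0, 60, 50, (100, 0)))   # False: beyond sensing range
```

Contrasting this with the isotropic disk model of traditional WSNs makes clear why disk-based coverage-preservation algorithms mislead for video sensors: a point can be close to the camera yet entirely uncovered if it falls outside the angular sector.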

4. Multimedia sensor hardware

In this section, we review and classify existing imaging, multimedia, and processing wireless devices that will find application in next-generation wireless multimedia sensor networks. In particular, we discuss existing hardware, with a particular emphasis on video capturing devices, review existing implementations of multimedia sensor networks, and discuss current possibilities for energy harvesting for multimedia sensor devices.

4.1. Enabling hardware platforms

High-end pan-tilt-zoom cameras and high-resolution digital cameras are widely available on the market. However, while such sophisticated devices can find application as high-quality tiers of multimedia sensor networks, we concentrate on low-cost, low-energy-consumption imaging and processing devices that will be densely deployed and provide detailed

Page 8: Wireless Multimedia Networks Survey

928 I.F. Akyildiz et al. / Computer Networks 51 (2007) 921–960

visual information from multiple disparate view-points, help overcoming occlusion effects, and thusenable enhanced interaction with the environment.

4.1.1. Low-resolution imaging motes

The recent availability of CMOS imaging sensors [61] that capture and process an optical image within a single integrated chip, thus eliminating the need for many separate chips required by the traditional charge-coupled device (CCD) technology, has enabled the massive deployment of low-cost visual sensors. CMOS image sensors are already used in many industrial and consumer sectors, such as cell phones, personal digital assistants (PDAs), and consumer and industrial digital cameras. CMOS image quality is now matching CCD quality in the low- and mid-range, while CCD is still the technology of choice for high-end image sensors. The CMOS technology allows integrating a lens, an image sensor, and image processing algorithms, including image stabilization and image compression, on the same chip. With respect to CCD, CMOS cameras are smaller, lighter, and consume less power. Hence, they constitute a suitable technology to realize imaging sensors to be interfaced with wireless motes.

However, existing CMOS imagers are still designed to be interfaced with computationally rich host devices, such as cell phones or PDAs. For this reason, the objective of the Cyclops module [103] is to fill the gap between CMOS cameras and computationally constrained devices. Cyclops is an electronic interface between a CMOS camera module and a wireless mote such as MICA2 or MICAz, and contains programmable logic and memory for high-speed data communication. Cyclops consists of an imager (CMOS Agilent ADCM-1700 CIF camera), an 8-bit ATMEL ATmega128L microcontroller (MCU), a complex programmable logic device (CPLD), an external SRAM, and an external Flash. The MCU controls the imager, configures its parameters, and performs local processing on the image to produce an inference. Since image capture requires faster data transfer and address generation than the 4 MHz MCU can provide, a CPLD is used to provide access to the high-speed clock. Cyclops firmware is written in the nesC language [48], based on the TinyOS libraries. The module is connected to a host mote, to which it provides a high-level interface that hides the complexity of the imaging device from the host mote. Moreover, it can perform simple inference on the image data and present it to the host.

Researchers at Carnegie Mellon University are developing the CMUcam 3, an embedded camera endowed with a CIF resolution (352 × 288) RGB color sensor that can load images into memory at 26 frames per second. CMUcam 3 features software JPEG compression and a basic image manipulation library, and can be interfaced with an 802.15.4-compliant TelosB mote [6].

In [41], the design of an integrated mote for wireless image sensor networks is described. The design is driven by the need to endow motes with adequate processing power and memory size for image sensing applications. It is argued that 32-bit processors are better suited for image processing than the 8-bit counterparts used in most existing motes. It is shown that the time needed to perform operations such as 2-D convolution on an 8-bit processor such as the ATMEL ATmega128 clocked at 4 MHz is 16 times higher than with a 32-bit ARM7 device clocked at 48 MHz, while the power consumption of the 32-bit processor is only six times higher. Hence, the 8-bit processor turns out to be both slower and more energy-consuming. Based on these premises, a new image mote is developed based on an ARM7 32-bit CPU clocked at 48 MHz, with external FRAM or Flash memory and an 802.15.4-compliant Chipcon CC2420 radio, that is interfaced with mid-resolution ADCM-1670 CIF CMOS sensors and low-resolution 30 × 30 pixel optical sensors.

The same conclusion is drawn in [81], where the energy consumption of the 8-bit Atmel AVR processor clocked at 8 MHz is compared to that of the PXA255 32-bit Intel processor, embedded on a Stargate platform [10] and clocked at 400 MHz. Three representative algorithms are selected as benchmarks, i.e., the cyclic redundancy check, a finite impulse response filter, and a fast Fourier transform. Surprisingly, it is shown that even for such relatively simple algorithms the energy consumption of the 8-bit processor is between one and two orders of magnitude higher.
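The counter-intuitive conclusion that the faster processor is also the more energy-efficient one follows directly from energy = power × time. Using the relative figures reported in [41] for the 2-D convolution benchmark:

```python
# Energy = power x time, using the relative figures reported in [41]:
# the 8-bit MCU takes 16x longer for a 2-D convolution, while the
# 32-bit ARM7 draws 6x more power while active.
time_ratio_8bit_vs_32bit = 16.0   # 8-bit takes 16x the time
power_ratio_32bit_vs_8bit = 6.0   # 32-bit draws 6x the power

energy_ratio = time_ratio_8bit_vs_32bit / power_ratio_32bit_vs_8bit
print(f"8-bit uses {energy_ratio:.2f}x the energy of the 32-bit CPU")
# -> about 2.67x: the 8-bit device is both slower and more energy-hungry
```

The higher power draw of the 32-bit CPU is more than paid back by the much shorter active time, which is the basis for the design choice in [41] and the benchmark findings in [81].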

4.1.2. Medium-resolution imaging motes based on the Stargate platform

Intel has developed several prototypes that constitute important building platforms for WMSN applications. The Stargate board [10] is a high-performance processing platform designed for sensor, signal processing, control, robotics, and sensor network applications. It is designed by Intel and produced by Crossbow. Stargate is based on Intel's PXA-255 XScale 400 MHz RISC processor, which is the same processor found in many handheld computers, including the Compaq IPAQ and the Dell Axim. Stargate has 32 Mbyte of Flash memory, 64 Mbyte of SDRAM, and an on-board connector for Crossbow's MICA2 or MICAz motes as well as PCMCIA Bluetooth or IEEE 802.11 cards. Hence, it can work as a wireless gateway and as a computational hub for in-network processing algorithms. When connected with a webcam or other capturing device, it can function as a medium-resolution multimedia sensor, although its energy consumption is still high, as documented in [80]. Moreover, although efficient software implementations exist, XScale processors do not have hardware support for floating point operations, which may be needed to efficiently perform multimedia processing algorithms.

Intel has also developed two prototypal generations of wireless sensors, known as Imote and Imote2. Imote is built around an integrated wireless microcontroller consisting of a 32-bit 12 MHz ARM7 processor, a Bluetooth radio, 64 Kbyte RAM, and 32 Kbyte FLASH memory, as well as several I/O options. The software architecture is based on an ARM port of TinyOS. The second generation of Intel motes shares a common core with the next generation Stargate 2 platform, and is built around a new low-power 32-bit PXA271 XScale processor at 320/416/520 MHz, which enables performing DSP operations for storage or compression, and an IEEE 802.15.4 ChipCon CC2420 radio. It has large on-board RAM and Flash memories (32 Mbyte), additional support for alternate radios, and a variety of high-speed I/O to connect digital sensors or cameras. Its size is also very limited, 48 × 33 mm, and it can run the Linux operating system and Java applications.

4.2. Energy harvesting

As mentioned before, techniques for prolonging the lifetime of battery-powered sensors have been the focus of a vast amount of literature in sensor networks. These techniques include hardware optimizations such as dynamic optimization of voltage and clock rate, wake-up procedures to keep electronics inactive most of the time, and energy-aware protocol development for sensor communications. In addition, energy-harvesting techniques, which extract energy from the environment where the sensor itself lies, offer another important means to prolong the lifetime of sensor devices.

Systems able to perpetually power sensors based on simple COTS photovoltaic cells coupled with supercapacitors and rechargeable batteries have already been demonstrated [64]. In [96], the state of the art in more unconventional techniques for energy harvesting (also referred to as energy scavenging) is surveyed. Technologies to generate energy from background radio signals, thermoelectric conversion, vibrational excitation, and the human body are overviewed.

As far as collecting energy from background radio signals is concerned, unfortunately, an electric field of 1 V/m yields only 0.26 µW/cm², as opposed to the 100 µW/cm² produced by a crystalline silicon solar cell exposed to bright sunlight. Electric fields with an intensity of a few volts per meter are only encountered close to strong transmitters. Another practice, which consists of deliberately broadcasting RF energy to power electronic devices, is severely limited by legal limits set by health and safety concerns.
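The 0.26 µW/cm² figure can be checked from the free-space plane-wave power density S = E²/η₀, with η₀ ≈ 377 Ω the impedance of free space (a simplifying far-field assumption):

```python
# Free-space (plane-wave) power density: S = E^2 / eta_0,
# with eta_0 ~ 377 ohm the impedance of free space.
E = 1.0                                 # electric field, V/m
eta_0 = 377.0                           # ohm
S_w_per_m2 = E**2 / eta_0               # W/m^2
S_uw_per_cm2 = S_w_per_m2 * 1e6 / 1e4   # convert W/m^2 -> uW/cm^2
print(f"{S_uw_per_cm2:.3f} uW/cm^2")    # -> 0.265 uW/cm^2, matching ~0.26
```

Even a perfectly efficient rectenna of a few square centimeters would therefore harvest well under the power drawn by a camera sensor, which is why the survey treats ambient RF as a marginal source.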

While thermoelectric conversion may not be suitable for wireless devices, harvesting energy from vibrations in the surrounding environment may provide another useful source of energy. Vibrational magnetic power generators based on moving magnets or coils may yield powers that range from tens of microwatts, when based on microelectromechanical system (MEMS) technologies, to over a milliwatt for larger devices. Other vibrational microgenerators are based on charged capacitors with moving plates and, depending on their excitation and power conditioning, yield power on the order of 10 µW. In [96], it is also reported that a recent analysis [91] suggested that 1 cm³ vibrational microgenerators can be expected to yield up to 800 µW/cm³ from machine-induced stimuli, which is orders of magnitude higher than what is provided by currently available microgenerators. Hence, this is a promising area of research for small battery-powered devices.

While these techniques may provide an additional source of energy and help prolong the lifetime of sensor devices, they yield power that is several orders of magnitude lower than the power consumption of state-of-the-art multimedia devices. Hence, they may currently be suitable only for very-low duty cycle devices.
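A back-of-the-envelope energy balance shows why only very low duty cycles are sustainable. The 800 µW harvested figure is from [91]; the 500 mW active-mode draw is an illustrative assumption for a Stargate-class camera node, not a figure from the survey:

```python
# Back-of-the-envelope duty cycle sustainable from harvested power.
# The ~500 mW active-mode draw is an illustrative assumption for a
# Stargate-class camera node, not a measured figure from the survey.
harvested_mw = 0.8          # 800 uW from a 1 cm^3 vibrational microgenerator [91]
active_draw_mw = 500.0      # assumed active-mode consumption of a camera node
                            # (sleep-mode draw neglected for simplicity)

# Energy balance: duty * active_draw = harvested  =>  duty = harvested / draw
duty_cycle = harvested_mw / active_draw_mw
print(f"sustainable duty cycle: {duty_cycle:.2%}")  # -> 0.16%
```

Under these assumptions the node could be active for only a fraction of a second per minute, consistent with the survey's conclusion that harvesting currently suits only very-low duty cycle multimedia devices.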

4.3. Examples of deployed multimedia sensor networks

There have been several recent experimental studies, mostly limited to video sensor networks.


Panoptes [46] is a system developed for environmental observation and surveillance applications, based on Intel StrongARM PDA platforms with a Logitech webcam as a video capture device. Here, video sensors are high-end devices running the Linux operating system with 64 Mbyte of memory, and are networked through 802.11 networking cards. The system includes spatial compression (but not temporal), distributed filtering, buffering, and adaptive priorities for the video stream.

In [35], a system whose objective is to limit the computation, bandwidth, and human attention burdens imposed by large-scale video surveillance systems is described. In-network processing is used on each camera to filter out uninteresting events locally, avoiding disambiguation and tracking of irrelevant environmental distractors. A resource allocation algorithm is also proposed to steer pan-tilt cameras to follow interesting targets while maintaining awareness of possibly emerging new targets.

In [69], the design and implementation of SensEye, a multi-tier network of heterogeneous wireless nodes and cameras, is described. The surveillance application consists of three tasks: object detection, recognition, and tracking. The objective of the design is to demonstrate that a camera sensor network containing heterogeneous elements provides numerous benefits over traditional homogeneous sensor networks. For this reason, SensEye follows a three-tier architecture, as shown in Fig. 2. The lowest tier consists of low-end devices, i.e., MICA2 motes equipped with 900 MHz radios interfaced with scalar sensors, e.g., vibration sensors. The second tier is made up of motes equipped with low-fidelity Cyclops [103] or CMUcam [107] camera sensors. The third tier consists of Stargate [10] nodes equipped with webcams. Each Stargate is equipped with an embedded 400 MHz XScale processor that runs Linux and a webcam that can capture higher fidelity images than tier 2 cameras. Tier 3 nodes also perform gateway functions, as they are endowed with a low data rate radio to communicate with motes in tiers 1–2 at 900 MHz, and an 802.11 radio to communicate with other tier 3 Stargate nodes. An additional fourth tier may consist of a sparse deployment of high-resolution, high-end pan-tilt-zoom cameras connected to embedded PCs. The camera sensors at this tier can be used to track moving objects, and can be utilized to fill coverage gaps and provide additional redundancy. The underlying design principle is to map each task requested by the application to the lowest tier with sufficient resources to perform the task. Devices from higher tiers are woken up on demand only when necessary. For example, a high-resolution camera can be woken up to retrieve high resolution images of an object that has been previously detected by a lower tier. It is shown that the system can achieve an order of magnitude reduction in energy consumption while providing comparable surveillance accuracy with respect to single-tier surveillance systems.

Fig. 2. The multi-tier architecture of SensEye [69] (tier 1: scalar sensors + mote; tier 2: low-resolution camera + mote; tier 3: webcam + Stargate; higher tiers are woken up on demand, with handoff of the video stream between tiers).
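SensEye's design principle, mapping each task to the lowest tier with sufficient resources and waking higher tiers on demand, can be sketched as follows. The per-tier capability sets below are illustrative stand-ins, not the exact resource model of [69]:

```python
# Hedged sketch of SensEye's tier-mapping principle: serve each task from
# the lowest tier with sufficient resources, waking higher tiers on demand.
# The per-tier capability sets are illustrative, not the model from [69].
TIER_CAPABILITIES = {
    1: {"detect"},                       # scalar sensors (e.g., vibration)
    2: {"detect", "recognize"},          # low-fidelity camera motes
    3: {"detect", "recognize", "track"}, # Stargate + webcam
}

def assign_tier(task, awake_tiers):
    """Return (tier, wakeup_needed) for a task, preferring the lowest tier."""
    for tier in sorted(TIER_CAPABILITIES):
        if task in TIER_CAPABILITIES[tier]:
            return tier, tier not in awake_tiers
    raise ValueError(f"no tier can serve task {task!r}")

# Tier 1 is always on; higher tiers sleep until needed.
print(assign_tier("detect", awake_tiers={1}))  # -> (1, False)
print(assign_tier("track", awake_tiers={1}))   # -> (3, True): wake tier 3
```

Keeping power-hungry tiers asleep until a lower tier detects something is what yields the reported order-of-magnitude energy reduction over a single-tier deployment.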

In [80], experimental results on the energy consumption of a video sensor network testbed are presented. Each sensing node in the testbed consists of a Stargate board equipped with an 802.11 wireless network card and a Logitech QuickCam Pro 4000 webcam. The energy consumption is assessed using a benchmark that runs basic tasks such as processing, flash memory access, image acquisition, and communication over the network. Both steady state and transient energy consumption behaviors, obtained by direct measurements of current with a digital multimeter, are reported. In the steady state, it is shown that communication-related tasks are less energy-consuming than intensive processing and flash access when the radio modules are loaded. Interestingly, and unlike in traditional wireless sensor networks [99], the processing-intensive benchmark results in the highest current requirement, and transmission is shown to be only about 5% more energy-consuming than reception. Experimental results also show that the delay and additional amount of energy consumed due to transitions (e.g., to go to sleep mode) are not negligible and must be accounted for in network and protocol design.


Fig. 3. Stargate board interfaced with a medium resolution camera. Stargate hosts an 802.11 card and a MICAz mote that functions as a gateway to the sensor network.

Fig. 4. Acroname GARCIA, a mobile robot with a mounted pan-tilt camera and endowed with 802.11 as well as ZigBee interfaces.


IrisNet (Internet-scale Resource-Intensive Sensor Network Services) [93] is an example software platform to deploy heterogeneous services on WMSNs. IrisNet allows harnessing a global, wide-area sensor network by performing Internet-like queries on this infrastructure. Video sensors and scalar sensors are spread throughout the environment and collect potentially useful data. IrisNet allows users to perform Internet-like queries on video sensors and other data. The user views the sensor network as a single unit that can be queried through a high-level language. Each query operates over data collected from the global sensor network, and allows simple Google-like queries as well as more complex queries involving arithmetic and database operators.

The architecture of IrisNet is two-tiered: heterogeneous sensors implement a common shared interface and are called sensing agents (SAs), while the data produced by sensors is stored in a distributed database that is implemented on organizing agents (OAs). Different sensing services run simultaneously on the architecture. Hence, the same hardware infrastructure can provide different sensing services. For example, a set of video sensors can provide a parking space finder service as well as a surveillance service. Sensor data is represented in the Extensible Markup Language (XML), which allows easy organization of hierarchical data. A group of OAs is responsible for a sensing service, collects data produced by that service, and organizes the information in a distributed database to answer the class of relevant queries. IrisNet also allows programming sensors with filtering code that processes sensor readings in a service-specific way. A single SA can execute several such software filters (called senselets) that process the raw sensor data based on the requirements of the service that needs to access the data. After senselet processing, the distilled information is sent to a nearby OA.
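The senselet pattern, service-specific filtering on the SA followed by XML handoff to an OA, can be sketched for the parking space finder service mentioned above. The function name, XML schema, and readings below are all illustrative, not IrisNet's actual interface:

```python
# Hedged sketch of IrisNet-style senselet filtering: a sensing agent runs
# a service-specific filter over raw readings and forwards only distilled
# XML to an organizing agent. Names, schema, and data are illustrative.
import xml.etree.ElementTree as ET

def parking_senselet(raw_readings):
    """Distill raw per-spot occupancy readings into XML for the OA."""
    root = ET.Element("parkingLot", id="lot-1")
    for spot, occupied in raw_readings.items():
        ET.SubElement(root, "spot", id=spot,
                      state="occupied" if occupied else "free")
    return ET.tostring(root, encoding="unicode")

raw = {"A1": True, "A2": False, "A3": False}
print(parking_senselet(raw))
# e.g. <parkingLot id="lot-1"><spot id="A1" state="occupied" /> ...
```

The point of the pattern is data reduction: instead of shipping video frames to the database tier, the SA ships a few bytes of hierarchical XML that OAs can index and query.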

We have recently built an experimental testbed at the Broadband and Wireless Networking (BWN) Laboratory at Georgia Tech based on currently off-the-shelf advanced devices to demonstrate the efficiency of algorithms and protocols for multimedia communications through wireless sensor networks.

The testbed is integrated with our scalar sensor network testbed, which is composed of a heterogeneous collection of Imotes from Intel and MICAz motes from Crossbow. Although our testbed already includes 60 scalar sensors, we plan to increase its size to deploy a higher scale testbed that allows testing more complex algorithms and assessing the scalability of the communication protocols under examination.

The WMSN testbed includes three different types of multimedia sensors: low-end imaging sensors, medium-quality webcam-based multimedia sensors, and pan-tilt cameras mounted on mobile robots.

Low-end imaging sensors such as CMOS cameras can be interfaced with Crossbow MICAz motes. Medium-end video sensors are based on Logitech webcams interfaced with Stargate platforms (see Fig. 3).

The high-end video sensors consist of pan-tilt cameras installed on an Acroname GARCIA robotic platform [1], which we refer to as an actor, shown in Fig. 4. Actors constitute a mobile platform that can perform adaptive sampling based on event features detected by low-end motes. The mobile actor can redirect high-resolution cameras to a region of interest when events are detected by lower-tier, low-resolution video sensors that are densely deployed, as seen in Fig. 5.

Fig. 5. GARCIA deployed on the sensor testbed. It acts as a mobile sink, and can move to the area of interest for closer visual inspection. It can also coordinate with other actors and has built-in collision avoidance capability.

The testbed also includes storage and computational hubs, which are needed to store large multimedia content and perform computationally intensive multimedia processing algorithms.

5. Collaborative in-network processing

As discussed previously, collaborative in-network multimedia processing techniques are of great interest in the context of a WMSN. It is necessary to develop architectures and algorithms to flexibly perform these functionalities in-network with minimum energy consumption and limited execution time. The objective is usually to avoid transmitting large amounts of raw streams to the sink by processing the data in the network to reduce the communication volume.

Given a source of data (e.g., a video stream), different applications may require diverse information (e.g., a raw video stream vs. simple scalar or binary information inferred by processing the video stream). This is referred to as application-specific querying and processing. Hence, it is necessary to develop expressive and efficient querying languages, and to develop distributed filtering and in-network processing architectures, to allow real-time retrieval of useful information.

Similarly, it is necessary to develop architectures that efficiently allow performing data fusion or other complex processing operations in-network. Algorithms for both inter-media and intra-media data aggregation and fusion need to be developed, as simple distributed processing schemes developed for existing scalar sensors are not suitable for the computation-intensive processing required by multimedia content. Multimedia sensor networks may require computation-intensive processing algorithms (e.g., to detect the presence of suspicious activity from a video stream). This may require considerable processing to extract meaningful information and/or to perform compression. A fundamental question to be answered is whether this processing can be done on sensor nodes (i.e., a flat architecture of multi-functional sensors that can perform any task), or whether the need for specialized devices, e.g., computation hubs, arises.

In what follows, we discuss a non-exhaustive set of significant examples of processing techniques that would be applicable distributively in a WMSN, and that will likely drive research on architectures and algorithms for distributed processing of raw sensor data.

5.1. Data alignment and image registration

Data alignment consists of merging information from multiple sources. One of the most widespread data alignment concepts, image registration [137], is a family of techniques, widely used in areas such as remote sensing, medical imaging, and computer vision, to geometrically align different images (reference and sensed images) of the same scene taken at different times, from different viewpoints, and/or by different sensors:

• Different viewpoints (multi-view analysis). Images of the same scene are acquired from different viewpoints, to gain a larger 2D view or a 3D representation of the scene of interest. The main applications are in remote sensing, computer vision, and 3D shape recovery.

• Different times (multi-temporal analysis). Images of the same scene are acquired at different times. The aim is to find and evaluate changes in the scene of interest over time. The main applications are in computer vision, security monitoring, and motion tracking.


• Different sensors (multi-modal analysis). Images of the same scene are acquired by different sensors. The objective is to integrate the information obtained from different source streams to gain a more complex and detailed scene representation.

Registration methods usually consist of four steps, i.e., feature detection, feature matching, transform model estimation, and image resampling and transformation. In feature detection, distinctive objects such as closed-boundary regions, edges, contours, line intersections, corners, etc. are detected. In feature matching, the correspondence between the features detected in the sensed image and those detected in the reference image is established. In transform model estimation, the type and parameters of the so-called mapping functions, which align the sensed image with the reference image, are estimated. The parameters of the mapping functions are computed by means of the established feature correspondence. In the last step, image resampling and transformation, the sensed image is transformed by means of the mapping functions.
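The transform model estimation step can be made concrete with a minimal sketch: recovering the six parameters of an affine mapping from three matched feature pairs. Three non-collinear correspondences are the minimum for an exact affine fit; real registration pipelines use many more matches with least squares and robust outlier rejection, so this is an illustration only.

```python
# Hedged sketch of "transform model estimation": recover the six
# parameters of an affine mapping from three matched feature pairs
# (the minimum for an exact fit; practical pipelines use many more
# correspondences with least squares and robust outlier rejection).

def solve3(m, v):
    """Solve a 3x3 linear system m @ s = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
              - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
              + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    sol = []
    for j in range(3):
        mj = [row[:] for row in m]
        for i in range(3):
            mj[i][j] = v[i]
        sol.append(det(mj) / d)
    return sol

def estimate_affine(ref_pts, sensed_pts):
    """Map reference -> sensed: x' = a*x + b*y + c, y' = d*x + e*y + f."""
    m = [[x, y, 1.0] for (x, y) in ref_pts]
    abc = solve3(m, [p[0] for p in sensed_pts])
    def_ = solve3(m, [p[1] for p in sensed_pts])
    return abc + def_

# A pure translation by (2, -1): three matched corners.
ref = [(0, 0), (1, 0), (0, 1)]
sensed = [(2, -1), (3, -1), (2, 0)]
print(estimate_affine(ref, sensed))  # -> [1.0, 0.0, 2.0, 0.0, 1.0, -1.0]
```

The resampling step then applies the recovered mapping to every pixel of the sensed image, which is one reason the full pipeline is considered prohibitive for a single low-end sensor.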

These functionalities can clearly be prohibitive for a single sensor. Hence, research is needed on how to perform these functionalities on parallel architectures of sensors to produce single data sets.

5.2. WMSNs as distributed computer vision systems

Computer vision is a subfield of artificial intelligence whose purpose is to allow a computer to extract features from a scene, an image, or multi-dimensional data in general. The objective is to present this information to a human operator or to control some process (e.g., a mobile robot or an autonomous vehicle). The image data that is fed into a computer vision system is often a digital image, a video sequence, a 3D volume from a tomography device, or other multimedia content. Traditional computer vision algorithms require extensive computation, which in turn entails high power consumption.

WMSNs enable a new approach to computer vision, where visual observations across the network can be performed by means of distributed computations on multiple, possibly low-end, vision nodes. This requires tools to interface with the user, such as new querying languages and abstractions to express complex tasks that are then distributively accomplished through low-level operations on multiple vision nodes. To this aim, it is necessary to coordinate computations across the vision nodes and return the integrated results, which will consist of metadata information, to the final user.

In [102], the proposed Deep Vision network performs operations including object detection or classification, image segmentation, and motion analysis through a network of low-end MICA motes equipped with Cyclops cameras [103]. Information such as the presence of an intruder, the number of visitors in a scene, or the probability of presence of a human in the monitored area is obtained by collecting the results of these operations. Deep Vision provides a querying interface to the user in the form of declarative queries. Each operation is represented as an attribute that can be executed through an appropriate query. In this way, low-level operations and processing are encapsulated in a high-level querying interface that enables simple interaction with the video network. As an example, the vision network can be deployed in areas with public and restricted access spaces. The task of detecting objects in the restricted-access area can be expressed as a query that requests the result of object detection computations, such as

SELECT Object, Location
FROM Network
WHERE Access = Restricted
PERIOD = 30
REPORT = 30.

The above query triggers the execution of the object detection process on the vision nodes that are located in the restricted-access areas at 30 s intervals.

6. Application layer

The functionalities handled at the application layer of a WMSN are characterized by high heterogeneity, and encompass traditional communication problems as well as more general system challenges. The services offered by the application layer include: (i) providing traffic management and admission control functionalities, i.e., preventing applications from establishing data flows when the network resources needed are not available; (ii) performing source coding according to application requirements and hardware constraints, by leveraging advanced multimedia encoding techniques; (iii) providing flexible and efficient system software, i.e., operating systems and middleware, to export services for higher-layer applications to build upon; and (iv) providing primitives for applications to leverage collaborative, advanced in-network multimedia processing techniques. In this section, we provide an overview of these challenges.

6.1. Traffic classes

Admission control has to be based on the QoS requirements of the overlying application. We envision that WMSNs will need to provide support and differentiated service for several different classes of applications. In particular, they will need to provide differentiated service between real-time and delay-tolerant applications, and between loss-tolerant and loss-intolerant applications. Moreover, some applications may require a continuous stream of multimedia data for a prolonged period of time (multimedia streaming), while other applications may require event-triggered observations obtained in a short time period (snapshot multimedia content). The main traffic classes that need to be supported are:

• Real-time, Loss-tolerant, Multimedia Streams. This class includes video and audio streams, or multi-level streams composed of video/audio and other scalar data (e.g., temperature readings), as well as metadata associated with the stream, that need to reach a human or automated operator in real-time, i.e., within strict delay bounds, but that are relatively loss tolerant (e.g., video streams can tolerate a certain level of distortion). Traffic in this class usually has high bandwidth demand.

• Delay-tolerant, Loss-tolerant, Multimedia Streams. This class includes multimedia streams that, being intended for storage or subsequent offline processing, do not need to be delivered within strict delay bounds. However, due to the typically high bandwidth demand of multimedia streams and the limited buffers of multimedia sensors, data in this traffic class needs to be transmitted almost in real-time to avoid excessive losses.

• Real-time, Loss-tolerant, Data. This class may include monitoring data from densely deployed scalar sensors such as light sensors, whose monitored phenomenon is characterized by spatial correlation, or loss-tolerant snapshot multimedia data (e.g., images of a phenomenon taken from several viewpoints at the same time). Hence, sensor data has to be received in a timely manner, but the application is moderately loss-tolerant. The bandwidth demand is usually between low and moderate.

• Real-time, Loss-intolerant, Data. This may include data from time-critical monitoring processes such as distributed control applications. The bandwidth demand varies between low and moderate.

• Delay-tolerant, Loss-intolerant, Data. This may include data from critical monitoring processes, with low or moderate bandwidth demand, that require some form of offline post-processing.

• Delay-tolerant, Loss-tolerant, Data. This may include environmental data from scalar sensor networks, or non-time-critical snapshot multimedia content, with low or moderate bandwidth demand.

QoS requirements have recently been considered as application admission criteria for sensor networks. In [97], an application admission control algorithm is proposed whose objective is to maximize the network lifetime subject to bandwidth and reliability constraints of the application. An application admission control method is proposed in [28], which determines admissions based on the added energy load and application rewards. While these approaches address application-level QoS considerations, they fail to consider multiple QoS requirements (e.g., delay, reliability, and energy consumption) simultaneously, as required in WMSNs. Furthermore, these solutions do not consider the peculiarities of WMSNs, i.e., they do not try to base admission control on a tight balancing between communication optimizations and in-network computation. There is a clear need for new criteria and mechanisms to manage the admission of multimedia flows according to the desired application-layer QoS.
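The multi-constraint admission test argued for above can be sketched as a conjunction of checks over delay, reliability, bandwidth, and residual energy, rather than a single-metric criterion as in [97] or [28]. All field names and thresholds below are illustrative assumptions, not a proposed mechanism from the survey:

```python
# Hedged sketch of multi-constraint admission control for WMSN flows:
# a flow is admitted only if bandwidth, delay, reliability, AND residual
# energy requirements can all be met simultaneously. Field names and
# thresholds are illustrative, not from the surveyed schemes.
from dataclasses import dataclass

@dataclass
class FlowRequest:
    traffic_class: str         # e.g. "real-time, loss-tolerant stream"
    bandwidth_kbps: float
    max_delay_ms: float
    min_delivery_ratio: float  # required fraction of packets delivered

@dataclass
class PathState:
    available_kbps: float
    expected_delay_ms: float
    expected_delivery_ratio: float
    residual_energy_j: float

def admit(flow: FlowRequest, path: PathState, energy_budget_j: float) -> bool:
    """Admit only if every QoS constraint holds at once."""
    return (path.available_kbps >= flow.bandwidth_kbps
            and path.expected_delay_ms <= flow.max_delay_ms
            and path.expected_delivery_ratio >= flow.min_delivery_ratio
            and path.residual_energy_j >= energy_budget_j)

video = FlowRequest("real-time, loss-tolerant stream", 256.0, 200.0, 0.90)
path = PathState(available_kbps=300.0, expected_delay_ms=150.0,
                 expected_delivery_ratio=0.95, residual_energy_j=50.0)
print(admit(video, path, energy_budget_j=10.0))  # -> True
```

A WMSN-specific extension, per the survey's argument, would also weigh whether in-network processing could shrink the flow's bandwidth demand before rejecting it.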

6.2. Multimedia encoding techniques

There exists a vast literature on multimedia encoding techniques. The captured multimedia content should ideally be represented in such a way as to allow reliable transmission over lossy channels (error-resilient coding), using algorithms that minimize processing power and the amount of information to be transmitted. The main design objectives of a coder for multimedia sensor networks are thus:

• High compression efficiency. Uncompressed raw video streams require high data rates and thus consume excessive bandwidth and energy. It is necessary to achieve a high compression ratio to effectively limit bandwidth and energy consumption.

• Low complexity. Multimedia encoders are embedded in sensor devices. Hence, they need to be of low complexity to reduce cost and form factors, and low-power to prolong the lifetime of sensor nodes.

• Error resiliency. The source coder should provide robust and error-resilient coding of source data.

To achieve high compression efficiency, the traditional broadcasting paradigm for wireline and wireless communications, where video is compressed once at the encoder and decoded several times, has been dominated by predictive encoding techniques. These, used in the widespread ISO MPEG schemes and the ITU-T recommendations H.263 [11] and H.264 [2] (also known as AVC or MPEG-4 Part 10), are based on the idea of reducing the bit rate generated by the source encoder by exploiting source statistics. Hence, intra-frame compression techniques are used to reduce redundancy within one frame, while inter-frame compression (also known as predictive encoding or motion estimation) exploits correlation among subsequent frames to reduce the amount of data to be transmitted and stored, thus achieving good rate-distortion performance. Since the computational complexity is dominated by the motion estimation functionality, these techniques require complex encoders and powerful processing algorithms, and entail high energy consumption, while decoders are simpler and loaded with a lower processing burden. For typical implementations of state-of-the-art video compression standards, such as MPEG or H.263 and H.264, the encoder is 5–10 times more complex than the decoder [50]. It is easy to see that to realize low-cost, low-energy-consumption multimedia sensors it is necessary to develop simpler encoders that still retain the advantages of high compression efficiency.

However, it is known from information-theoretic bounds established by Slepian and Wolf for lossless coding [117] and by Wyner and Ziv [130] for lossy coding with decoder side information, that efficient compression can be achieved by leveraging knowledge of the source statistics at the decoder only. This way, the traditional balance of complex encoder and simple decoder can be reversed [50]. Techniques that build upon these results are usually referred to as distributed source coding. Distributed source coding refers to the compression of multiple correlated sensor outputs that do not communicate with each other [131]. Joint decoding is performed by a central entity that receives data independently compressed by different sensors. However, practical solutions have not been developed until recently. Clearly, such techniques are very promising for WMSNs and especially for networks of video sensors. The encoder can be simple and low-power, while the decoder at the sink will be complex and loaded with most of the processing and energy burden. The reader is referred to [131,50] for excellent surveys on the state of the art of distributed source coding in sensor networks and in distributed video coding, respectively. Other encoding and compression schemes that may be considered for source coding of multimedia streams, including JPEG with differential encoding, distributed coding of images taken by cameras having overlapping fields of view, or multi-layer coding with wavelet compression, are discussed in [90]. Here, we focus on recent advances in low-complexity encoders based on Wyner–Ziv coding [130], which are promising solutions for distributed networks of video sensors that are likely to have a major impact on the future design of protocols for WMSNs.

The objective of a Wyner–Ziv video coder is to achieve lossy compression of video streams with performance comparable to that of inter-frame encoding (e.g., MPEG), but with encoder complexity comparable to that of intra-frame coders (e.g., Motion-JPEG).
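The core Slepian–Wolf idea of compressing with the correlation exploited only at the decoder can be illustrated with a toy syndrome-coding example. Real distributed video coders use turbo or LDPC codes; this sketch substitutes a tiny Hamming(7,4) code purely for illustration, so the matrix and function names below are not from any of the cited systems.

```python
# Toy Slepian-Wolf-style syndrome coding with a Hamming(7,4) code.
# The encoder sends only the 3-bit syndrome of a 7-bit block (not the
# block itself); the decoder combines it with correlated side
# information that differs from the source in at most one bit.

H = [[0, 0, 0, 1, 1, 1, 1],   # parity-check matrix of Hamming(7,4);
     [0, 1, 1, 0, 0, 1, 1],   # column j is the binary expansion of j+1
     [1, 0, 1, 0, 1, 0, 1]]

def syndrome(bits):
    """3-bit syndrome s = H * bits (mod 2)."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]

def encode(x):
    # "Compression": 7 source bits -> 3 syndrome bits.
    return syndrome(x)

def decode(s, y):
    # Side information y differs from x in at most one position;
    # the syndrome difference points at that position.
    d = [a ^ b for a, b in zip(s, syndrome(y))]
    pos = d[0] * 4 + d[1] * 2 + d[2]      # 0 means y is already correct
    x_hat = list(y)
    if pos:
        x_hat[pos - 1] ^= 1
    return x_hat

x = [1, 0, 1, 1, 0, 0, 1]     # original block at the sensor
y = [1, 0, 1, 0, 0, 0, 1]     # decoder's side information (bit 4 flipped)
assert decode(encode(x), y) == x
```

The encoder never sees y, yet the decoder recovers x exactly: the computational asymmetry (trivial encoder, all correlation modeling at the decoder) is precisely what makes this family attractive for resource-constrained video sensors.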

6.2.1. Pixel-domain Wyner–Ziv encoder

In [14,15], a practical Wyner–Ziv encoder is proposed as a combination of a pixel-domain intra-frame encoder and inter-frame decoder system for video compression. A block diagram of the system is reported in Fig. 6. A regularly spaced subset of frames is coded using a conventional intra-frame coding technique, such as JPEG, as shown at the bottom of the figure. These are referred to as key frames. All frames between the key frames are referred to as Wyner–Ziv frames and are intra-frame encoded but inter-frame decoded. The intra-frame encoder for Wyner–Ziv frames (shown on top) is composed of a quantizer followed by a Slepian–Wolf coder. Each Wyner–Ziv frame is quantized and blocks of symbols are sent to the Slepian–Wolf coder, which is implemented through rate-compatible punctured turbo codes (RCPT). The parity bits generated by the RCPT coder are stored in a buffer.

Fig. 6. Block diagram of a pixel-domain Wyner–Ziv encoder [14].

A subset of these bits is then transmitted upon request from the decoder. This allows adapting the rate based on the temporally varying statistics between the Wyner–Ziv frame and the side information. The parity bits generated by the RCPT coder are in fact used to "correct" the frame interpolated at the decoder. For each Wyner–Ziv frame, the decoder generates the side information frame by interpolation or extrapolation of previously decoded key frames and Wyner–Ziv frames. The side information is leveraged by assuming a Laplacian distribution of the difference between the individual pixels of the original frame and the side information. The parameter defining the Laplacian distribution is estimated online. The turbo decoder combines the side information and the parity bits to reconstruct the original sequence of symbols. If reliable decoding of the original symbols is impossible, the turbo decoder requests additional parity bits from the encoder buffer.
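The decoder-driven rate control loop just described can be sketched as follows. The names and the success model inside `attempt_decode` are illustrative assumptions (the real system runs a turbo decoder and checks decoding reliability), but the loop structure — release parity in increments until the decoder succeeds — is the mechanism described above.

```python
# Sketch of the decoder-driven rate control loop of a feedback-based
# Wyner-Ziv codec: the encoder buffers punctured parity bits and
# releases them in chunks until the decoder acknowledges success.

def attempt_decode(side_info_mismatch, parity_bits):
    # Stand-in for turbo decoding: assume success once the parity
    # received compensates for the side-information mismatch.
    return parity_bits >= side_info_mismatch

def transmit_frame(total_parity, chunk, side_info_mismatch):
    """Return how many parity bits were actually sent for one frame."""
    sent = 0
    while sent < total_parity:
        sent += chunk                  # encoder releases one more chunk
        if attempt_decode(side_info_mismatch, sent):
            return sent                # decoder acknowledges success
    return sent                        # worst case: all parity sent

# A frame whose side information is close to the original needs few bits:
assert transmit_frame(total_parity=512, chunk=32, side_info_mismatch=100) == 128
# Poor side information (e.g., fast motion) drives the rate up:
assert transmit_frame(total_parity=512, chunk=32, side_info_mismatch=400) == 416
```

The sketch also makes the drawback noted below concrete: every iteration of the loop is a decoder-to-encoder round trip, which is what introduces latency over a multi-hop path.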

Compared to predictive coding such as MPEG or H.26X, pixel-domain Wyner–Ziv encoding is much simpler. The Slepian–Wolf encoder only requires two feedback shift registers and an interleaver. Its performance, in terms of peak signal-to-noise ratio (PSNR), is 2–5 dB better than conventional Motion-JPEG intra-frame coding. The main drawback of this scheme is that it relies on online feedback from the receiver. Hence, it may not be suitable for applications where video is encoded and stored for subsequent use. Moreover, the feedback may introduce excessive latency for video decoding in a multi-hop network.

6.2.2. Transform-domain Wyner–Ziv encoder

In conventional source coding, a source vector is typically decomposed into spectral coefficients by using orthonormal transforms such as the Discrete Cosine Transform (DCT). These coefficients are then individually coded with scalar quantizers and entropy coders. In [13], a transform-domain Wyner–Ziv encoder is proposed. A block-wise DCT of each Wyner–Ziv frame is performed. The transform coefficients are independently quantized, grouped into coefficient bands, and then compressed by a Slepian–Wolf turbo coder. As in the pixel-domain encoder described in the previous section, the decoder generates a side information frame based on previously reconstructed frames. Based on the side information, a bank of turbo decoders reconstructs the quantized coefficient bands independently. The rate-distortion performance is between conventional intra-frame transform coding and conventional motion-compensated transform coding.

A different approach consists of allowing some simple temporal dependence estimation at the encoder to perform rate control without the need for feedback from the receiver. In the PRISM scheme [100], the encoder selects the coding mode based on the frame difference energy between the current frame and a previous frame. If the energy of the difference is very small, the block is not encoded. If the block difference is large, the block is intra-coded. Between these two situations, one of different encoding modes with different rates is selected. The rate estimation does not involve motion compensation and hence is necessarily inaccurate if motion compensation is used at the decoder. Further, the flexibility of the decoder is restricted.
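The PRISM-style mode selection described above can be sketched as a simple threshold rule on the block difference energy. The thresholds and mode names are illustrative assumptions, not values or identifiers from [100].

```python
# Minimal sketch of PRISM-style coding-mode selection based on the
# energy of the difference between the current and a previous block.
# Thresholds are illustrative assumptions.

def block_energy(cur, prev):
    """Sum of squared pixel differences between two blocks."""
    return sum((c - p) ** 2 for c, p in zip(cur, prev))

def select_mode(cur, prev, skip_thr=10.0, intra_thr=1000.0):
    e = block_energy(cur, prev)
    if e < skip_thr:
        return "SKIP"           # block essentially unchanged: send nothing
    if e > intra_thr:
        return "INTRA"          # large change: code the block independently
    # In between, pick one of several syndrome-coding rates, coarsely
    # proportional to the estimated correlation noise.
    return "WZ_RATE_%d" % min(3, int(e // (intra_thr / 4)))

assert select_mode([10, 10], [10, 11]) == "SKIP"
assert select_mode([200, 0], [0, 0]) == "INTRA"
```

Because the rule uses only a frame difference (no motion search), it stays cheap at the encoder — which is exactly why, as noted above, the rate estimate is inaccurate whenever the decoder performs motion compensation.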

6.3. System software and middleware

The development of efficient and flexible system software to make functional abstractions and information gathered by scalar and multimedia sensors available to higher-layer applications is one of the most important challenges faced by researchers to manage the complexity and heterogeneity of sensor systems. As in [66], the term system software is used here to refer to operating systems, virtual machines, and middleware, which export services to higher-layer applications. Different multimedia sensor network applications are extremely diverse in their requirements and in the way they interact with the components of a sensor system. Hence, the main desired characteristics of system software for WMSNs can be identified as follows:

• Provides a high-level interface to specify the behavior of the sensor system. This includes semantically rich querying languages that allow specifying what kind of data is requested from the sensor network, the quality of the required data, and how it should be presented to the user;

• Allows the user to specify application-specific algorithms to perform in-network processing on the multimedia content [47]. For example, the user should be able to specify particular image processing algorithms or multimedia coding formats;

• Long-lived, i.e., needs to smoothly support evolutions of the underlying hardware and software;

• Shared among multiple heterogeneous applications;

• Shared among heterogeneous sensors and platforms. Scalar and multimedia sensor networks should coexist in the same architecture, without compromising on performance;

• Scalable.

There is an inherent trade-off between the degree of flexibility and network performance. Platform-independence is usually achieved through layers of abstraction, which usually introduce redundancy and prevent the developer from accessing low-level details and functionalities. However, WMSNs are characterized by the contrasting objectives of optimizing the use of the scarce network resources and not compromising on performance. The principal design objective of existing operating systems for sensor networks such as TinyOS is high performance. However, their flexibility, inter-operability and reprogrammability are very limited. There is a need for research on systems that allow for this integration.

We believe that it is of paramount importance to develop efficient, high-level abstractions that will enable easy and fast development of sensor network applications. An abstraction similar to the famous Berkeley TCP sockets, which fostered the development of Internet applications, is needed for sensor systems. However, differently from the Berkeley sockets, it is necessary to retain control over the efficiency of the low-level operations performed on battery-limited and resource-constrained sensor nodes.

As a first step in this direction, Chu et al. [34] recently proposed Sdlib, a sensor network data and communications library built upon the nesC language [48] for applications that require best-effort collection of large-size data, such as video monitoring applications. The objective of the effort is to identify common functionalities shared by several sensor network applications and to develop a library of thoroughly-tested, reusable and efficient nesC components that abstract high-level operations common to most applications, while leaving differences among them to adjustable parameters. The library is called Sdlib, Sensor Data Library, as an analogy to the traditional C++ Standard Template Library. Sdlib provides an abstraction for common operations in sensor networks while the developer is still able to access low-level operations, which are implemented as a collection of nesC components, when desired. Moreover, to retain the efficiency of operations that are so critical for sensor network battery lifetime and resource constraints, Sdlib exposes policy decisions, such as resource allocation and rate of operation, to the developer, while hiding the mechanisms of policy enforcement.
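The policy/mechanism split that Sdlib advocates can be illustrated with a hypothetical collection-stream class (this is not Sdlib's actual nesC API; the class and parameter names are invented for illustration): the developer chooses the policy knobs, while the library enforces them internally.

```python
# Hypothetical sketch of the Sdlib design idea: expose policy
# decisions (sampling rate, buffer budget) as parameters chosen by
# the developer, while hiding the enforcement mechanism inside the
# library. Names are illustrative, not Sdlib's real interface.

class CollectionStream:
    def __init__(self, sensor, rate_hz, buffer_budget):
        self.sensor = sensor              # callable returning one reading
        self.rate_hz = rate_hz            # policy: chosen by the developer
        self.buffer_budget = buffer_budget  # policy: memory ceiling
        self._queue = []

    def sample(self):
        # Mechanism: the library enforces the buffer policy itself,
        # dropping the oldest reading under memory pressure.
        if len(self._queue) >= self.buffer_budget:
            self._queue.pop(0)
        self._queue.append(self.sensor())
        return len(self._queue)

stream = CollectionStream(sensor=lambda: 42, rate_hz=2, buffer_budget=3)
for _ in range(5):
    stream.sample()
assert len(stream._queue) == 3            # budget enforced, oldest dropped
```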

6.4. Open research issues

• While theoretical results on Slepian–Wolf and Wyner–Ziv coding have existed for over 30 years, there is still a lack of practical solutions. The net benefits and the practicality of these techniques still need to be demonstrated.

• It is necessary to fully explore the trade-offs between the achieved fidelity in the description of the observed phenomenon and the resulting energy consumption. As an example, the video distortion perceived by the final user depends on source coding (frame rate, quantization) and on channel coding strength. In a surveillance application, for instance, the objective of maximizing the event detection probability is in contrast with the objective of minimizing the power consumption.

• As discussed above, there is a need for high-layer abstractions that will allow fast development of sensor applications. However, due to the resource-constrained nature of sensor systems, it is necessary to control the efficiency of the low-level operations performed on battery-limited and resource-constrained sensor nodes.

• There is a need for simple yet expressive high-level primitives for applications to leverage collaborative, advanced in-network multimedia processing techniques.

7. Transport layer

In applications involving high-rate data, the transport layer assumes special importance by providing end-to-end reliability and congestion control mechanisms. In WMSNs in particular, the following additional considerations are in order to accommodate both the unique characteristics of the WSN paradigm and multimedia transport requirements.

• Effects of congestion. In WMSNs, the effect of congestion may be even more pronounced as compared to traditional networks. When a bottleneck sensor is swamped with packets coming from several high-rate multimedia streams, apart from temporary disruption of the application, it may cause rapid depletion of the node's energy. While applications running on traditional wireless networks may only experience performance degradation, the energy loss (due to collisions and retransmissions) can result in network partition. Thus, congestion control algorithms may need to be tuned for immediate response and yet avoid oscillations of data rate along the affected path.

• Packet re-ordering due to multi-path. Multiple paths may exist between a given source–sink pair, and the order of packet delivery is strongly influenced by the characteristics of the route chosen. As an additional challenge, in real-time video/audio feeds or streaming media, information that cannot be used in the proper sequence becomes redundant, thus stressing the need for transport layer packet reordering.

We next explore the functionalities and support provided by the transport layer to address these and other challenges of WMSNs. The following discussion is classified into (1) TCP/UDP and TCP-friendly schemes for WMSNs, and (2) application-specific and non-standardized protocols. Fig. 7 summarizes the discussion in this section.

Fig. 7. Classification of existing transport layer protocols.

7.1. TCP/UDP and TCP friendly schemes for WMSNs

For real-time applications like streaming media, the User Datagram Protocol (UDP) is preferred over TCP, as timeliness is of greater concern than reliability. However, in WMSNs, it is expected that packets are significantly compressed at the source and redundancy is reduced as far as possible owing to the high transmission overhead in the energy-constrained nodes. Under these conditions, we note the following important characteristics that may necessitate an approach very different from classical wireless networks.

• Effect of dropping packets in UDP. Simply dropping packets during congestion conditions, as undertaken in UDP, may introduce discernible disruptions in the order of a fraction of a second. This effect is even more pronounced if the dropped packet contains important original content not captured by inter-frame interpolation, like the Region of Interest (ROI) feature used in JPEG2000 [12] or the I-frame used in the MPEG family.

• Support for traffic heterogeneity. Multimedia traffic comprising video, audio, and still images exhibits a high level of heterogeneity and may be further classified into periodic or event-driven. The UDP header has no provision to allow any description of these traffic classes that may influence congestion control policies. In contrast, the options field in the TCP header can be modified to carry data-specific information. As an example, the Sensor Transmission Control Protocol (STCP) [60] accommodates a differentiated approach by including relevant fields in the TCP header. Several other major changes to the traditional TCP model are proposed in the round-trip time estimation, congestion notification, and the packet drop policy, and by introducing a reliability-driven intimation of lost packets.

We thus believe that TCP with appropriate modifications is preferable over UDP for WMSNs, if standardized protocols are to be used. With respect to sensor networks, several problems and their likely solutions, like large TCP header size, data- vs. address-centric routing, and energy efficiency, amongst others, are identified and solutions are proposed in [42]. We next indicate the recent work in this direction that evaluates the case for using TCP in WMSNs.

• Effect of jitter induced by TCP. A key factor that limits multimedia transport based on TCP, and TCP-like rate control schemes, is the jitter introduced by the congestion control mechanism. This can, however, be mitigated to a large extent by playout buffers at the sink, which is typically assumed to be rich in resources. As an example, the MPEG-TFRCP (TCP Friendly Rate Control Protocol for MPEG-2 Video Transfer) [92] is an equation-based rate control scheme designed for transporting MPEG video in a TCP-friendly manner.

• Overhead of the reliability mechanism in TCP. As discussed earlier, blind dropping of packets in UDP containing highly compressed video/audio data may adversely affect the quality of transmission. Yet, at the same time, the reliability mechanism provided by TCP introduces an end-to-end message passing overhead, and energy efficiency must also be considered. Distributed TCP Caching (DTC) [43] overcomes these problems by caching TCP segments inside the sensor network and by local retransmission of TCP segments. The nodes closest to the sink are the last-hop forwarders on most of the high-rate data paths and thus run out of energy first. DTC shifts the burden of the energy consumption from nodes close to the sink into the network, apart from reducing network-wide retransmissions.

• Regulating streaming through multiple TCP connections. The availability of multiple paths between source and sink can be exploited by opening multiple TCP connections for multimedia traffic [94]. Here, the desired streaming rate and the allowed throughput reduction in the presence of bursty traffic, like the sending of video data, are communicated to the receiver by the sender. This information is used by the receiver, which then measures the actual throughput and controls the rate within the allowed bounds by using multiple TCP connections and dynamically changing its TCP window size for each connection.
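The local-repair idea behind DTC, discussed above, can be sketched as an intermediate node that keeps recently forwarded segments so that a loss downstream is repaired one hop away instead of end-to-end. The class name, cache size, and message tuples below are illustrative assumptions, not DTC's actual implementation.

```python
# Sketch of in-network caching with local retransmission, in the
# spirit of DTC [43]: an intermediate node caches forwarded TCP
# segments; on a downstream loss report it retransmits locally when
# possible, and only otherwise propagates the request to the source.

from collections import OrderedDict

class CachingRelay:
    def __init__(self, capacity=8):
        self.cache = OrderedDict()          # seq -> payload, oldest first
        self.capacity = capacity

    def forward(self, seq, payload):
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict the oldest segment
        self.cache[seq] = payload
        return (seq, payload)               # segment sent toward the sink

    def on_nack(self, seq):
        # Local repair: retransmit from the cache when possible,
        # otherwise propagate the loss report toward the source.
        if seq in self.cache:
            return ("RETX", self.cache[seq])
        return ("NACK_UPSTREAM", seq)

relay = CachingRelay(capacity=2)
relay.forward(1, "a"); relay.forward(2, "b"); relay.forward(3, "c")
assert relay.on_nack(3) == ("RETX", "c")          # repaired one hop away
assert relay.on_nack(1) == ("NACK_UPSTREAM", 1)   # evicted, go upstream
```

The energy argument in the text maps directly onto this sketch: every cache hit replaces a multi-hop end-to-end retransmission with a single-hop one.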

TCP protocols tailor-made for wireless sensor networks are an active research area, with recent implementations of the light-weight Sensor Internet Protocol (SIP) [78] and the open-source uIP [42], which has a code size of a few kilobytes. However, a major problem with the TCP-based approach in wireless networks is its inability to distinguish between bad channel conditions and network congestion. This has motivated a new family of specialized transport layer protocols whose design practices are entirely opposite to those of TCP [122], or which stress a particular functionality of the transport layer, like reliability or congestion control.


7.2. Application specific and non-standard protocols

Depending on the application, both reliability and congestion control may be equally important functionalities, or one may be preferred over the other. As an example, in the CYCLOPS image capturing and inference module [103] designed for extremely light-weight imaging, congestion control would be the primary functionality, with multiple sensor flows arriving at the sink, each being moderately loss-tolerant. We next list the important characteristics of such TCP-incompatible protocols in the context of WMSNs.

7.2.1. Reliability

Multimedia streams may consist of images, video, and audio data, each of which merits a different metric for reliability. As discussed in Section 7.1, when an image or video is sent with differentially coded packets, the arrival of the packets with the ROI field or the I-frame, respectively, should be guaranteed. The application can, however, withstand moderate loss for the other packets containing differential information. Thus, we believe that reliability needs to be enforced on a per-packet basis to best utilize the existing networking resources. If a prior recorded video is being sent to the sink, all the I-frames could be separated and the transport protocol should ensure that each of these reaches the sink. Reliable Multi-Segment Transport (RMST) [119] or the Pump Slowly Fetch Quickly (PSFQ) protocol [127] can be used for this purpose, as they buffer packets at intermediate nodes, allowing for faster retransmission in case of packet loss. However, there is an overhead of using the limited buffer space at a given sensor node for caching packets destined for other nodes, as well as performing timely storage and flushing operations on the buffer. In a heterogeneous network, where real-time data is used by actors as discussed in Section 4.3, the Real-time and Reliable Transport (RT)2 protocol [52] can be used, which defines different reliability constraints for sensor–actor and actor–actor communication.

7.2.2. Congestion control

The high rate of injection of multimedia packets into the network causes resources to be used up quickly. While typical transmission rates for sensor nodes may be about 40 kbit/s, indicative data rates of constant bit rate voice traffic may be 64 kbit/s. Video traffic, on the other hand, may be bursty and in the order of 500 kbit/s [136], thus making it clear that congestion must be addressed in WMSNs. While these data generation rates are high for a single node, multiple sensors in overlapped regions may inject similar traffic on sensing the same phenomenon. The Event-to-Sink Reliable Transport (ESRT) protocol [17] leverages the fact that spatial and temporal correlation exists among the individual sensor readings [125]. The ESRT protocol regulates the frequency of event reporting in a remote neighborhood to avoid congestion in the network. However, this approach may not be viable for all sensor applications, as nodes transmit data only when they detect an event, which may be a short-duration burst as in the case of a video monitoring application. The feedback from the base station may hence not arrive in time to prevent a sudden congestion due to this burst.
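The sink-driven regulation just described can be sketched as a simple rule that compares observed event reliability against a target and combines it with a congestion flag. The update rules below are a simplified assumption inspired by ESRT's operating regions, not the protocol's actual equations.

```python
# Coarse sketch of ESRT-style reporting-frequency regulation [17]:
# the sink compares observed event reliability with the target and,
# together with a congestion flag, instructs sources to raise or
# lower their reporting frequency. Update rules are simplified
# assumptions.

def update_frequency(freq, observed, target, congested, eps=0.05):
    low = observed < target * (1 - eps)
    high = observed > target * (1 + eps)
    if congested:
        # Relieve congestion; halve aggressively if reliability is low.
        return freq * 0.5 if low else freq * target / observed
    if low:
        return freq * target / observed   # report faster to reach target
    if high:
        return freq * target / observed   # back off to save energy
    return freq                           # optimal operating region

assert update_frequency(10, observed=0.5, target=1.0, congested=False) == 20
assert update_frequency(10, observed=0.5, target=1.0, congested=True) == 5
assert update_frequency(10, observed=2.0, target=1.0, congested=False) == 5
```

The sketch also exposes the limitation noted above: the update happens only after the sink observes a reporting interval, so a short event burst may be over before any new frequency reaches the sources.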

7.2.3. Use of multi-path

We advocate the use of multiple paths for data transfer in WMSNs owing to the following two reasons:

• A large burst of data (say, resulting from an I-frame) can be split into several smaller bursts, thus not overwhelming the limited buffers at the intermediate sensor nodes.

• The channel conditions may not permit a high data rate for the entire duration of the event being monitored. By allowing multiple flows, the effective data rate at each path gets reduced and the application can be supported.

The design of a multiple source–sink transport protocol is challenging, and is addressed by the COngestion Detection and Avoidance (CODA) protocol [128]. It allows a sink to regulate multiple sources associated with a single event in case of persistent network congestion. However, as the congestion inference in CODA is based on queue length at intermediate nodes, any action taken by the source occurs only after a considerable time delay. Other solutions include the Multi-flow Real-time Transport Protocol (MRTP) [79], which does not specifically address energy efficiency considerations in WMSNs, but is suited for real-time streaming of multimedia content by splitting packets over different flows. MRTP does not have any mechanism for packet retransmission and is mainly used for real-time data transmission; hence, reliability can be an issue for scalar data traffic.
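The burst-splitting argument above can be made concrete with a round-robin striping sketch. Round-robin is an illustrative policy only; MRTP and similar protocols choose flows with more sophisticated scheduling.

```python
# Sketch of splitting a large burst (e.g., the packets of one
# I-frame) across multiple paths so that no intermediate buffer is
# overwhelmed. Round-robin striping is an illustrative policy, not a
# protocol specification.

def stripe(packets, n_paths):
    """Assign packets to paths round-robin; returns one list per path."""
    paths = [[] for _ in range(n_paths)]
    for i, pkt in enumerate(packets):
        paths[i % n_paths].append(pkt)
    return paths

burst = list(range(10))                       # ten packets of one I-frame
paths = stripe(burst, 3)
assert [len(p) for p in paths] == [4, 3, 3]   # per-path load shrinks
assert sorted(sum(paths, [])) == burst        # nothing lost or duplicated
```

Note that striping is exactly what creates the packet re-ordering problem raised at the start of this section: packets from different paths arrive interleaved, so the sink must resequence them before decoding.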


7.3. Open research issues

In summary, transport layer mechanisms that can simultaneously address the unique challenges posed by the WMSN paradigm and multimedia communication requirements must be incorporated. While several approaches were discussed, some open issues remain and are outlined below:

• Trade-off between reliability and congestion control. In WMSN applications, the data gathered from the field may contain multimedia information such as target images, acoustic signals, and even video captures of a moving target, all of which enjoy a permissible level of loss tolerance. Presence or absence of an intruder, however, may require a single data field but needs to be communicated without any loss of fidelity. Thus, when a single network contains multimedia as well as scalar data, the transport protocol must decide whether to focus on one or more functionalities so that the application needs are met without an unwarranted energy expense. The design of such a layer may as well be modular, with the functional blocks of reliability and/or congestion control being invoked as per network demands.

• Real-time communication support. Despite the existence of reliable transport solutions for WSNs as discussed above, none of these protocols provide real-time communication support for applications with strict delay bounds. Therefore, new transport solutions which can also meet certain application deadlines must be researched.

• Relation between multimedia coding rate and reliability. Success in the energy-efficient and reliable delivery of multimedia information extracted from the phenomenon directly depends on selecting the appropriate coding rate, number of sensor nodes, and data rate for a given event [125]. However, to this end, the event reliability should be accurately measured in order to efficiently adapt the multimedia coding and transmission rates. For this purpose, new reliability metrics coupled with application-layer coding techniques should be investigated.

8. Network layer

The network layer addresses the challenging task of providing variable QoS guarantees depending on whether the stream carries time-independent data like configuration or initialization parameters, time-critical low-rate data like the presence or absence of the sensed phenomenon, high-bandwidth video/audio data, etc. Each of the traffic classes described in Section 6.1 has its own QoS requirement, which must be accommodated in the network layer.

Research on the network layer becomes important from the standpoint of supporting multimedia applications constrained by lack of global knowledge, reduced energy, and the limited computational ability of the individual nodes.

We next discuss the existing research directions for the network layer functionalities of addressing and routing, stressing their applicability to delay-sensitive and high-bandwidth needs.

8.1. Addressing and localization

In the case of large WMSNs like IrisNet [29], it is required that the individual nodes be monitored via the Internet. Such an integration between a randomly deployed sensor network and the established wired network becomes a difficult research challenge. The key problem of global addressing could be solved by the use of IPv6, in which the sensor can concatenate its cluster ID with its own MAC address to create the full IPv6 address. However, the 16-byte address field of IPv6 introduces excessive overhead in each sensor data packet. There are several other schemes that assign unique network-wide IDs (see [95] and references therein) or leverage location information to create an address-free environment, but they run the risk of incompatibility with the established standards of the Internet. Location information is a key characteristic of any sensor network system. The ability to associate localization information with the raw data sampled from the environment increases the capability of the system and the meaningfulness of the information extracted. Localization techniques for WMSNs are unlikely to differ substantially from those developed for traditional sensor networks, which are reviewed in [111]. Moreover, WMSNs will most likely leverage the accurate ranging capabilities that come with high-bandwidth transmissions (such as UWB techniques, as described in Section 10).

8.2. Routing

Data collected by the sensor nodes needs to be sent to the sink, where useful information can be extracted from it. Comprehensive surveys of the major routing schemes existing in the literature are presented in [19,23]. The concerns of routing in general differ significantly from the specialized service requirements of multimedia streaming applications. As an example, multiple routes may be necessary to satisfy the desired data rate at the destination node. Also, different paths exhibiting varying channel conditions may be preferred depending on the type of traffic and its resilience to packet loss. We next discuss the various approaches to routing in WMSNs, while maintaining that protocols may incorporate features from more than one of the following classes. Broadly, they can be classified into routing based on (i) network conditions that leverage channel and link statistics, (ii) traffic classes that decide paths based on packet priorities, and (iii) specialized protocols for real-time streaming that use spatio-temporal forwarding. Fig. 8 provides a classification of existing routing protocols and summarizes the discussion in this section.

Fig. 8. Classification of existing routing protocols.

8.2.1. QoS routing based on network conditions

Network conditions include the interference seen at intermediate hops, the number of backlogged flows along a path, and the residual energy of the nodes, amongst others. A routing decision based on these metrics can avoid paths that may not support high-bandwidth applications or that introduce retransmissions owing to bad channel conditions.

The use of image sensors is explored in [110], in which visual information is used to gather topology information that is then leveraged to develop efficient geographic routing schemes. A weighted cost function is constructed that takes into account position with respect to the base station, backlogged packets in the queue, and the remaining energy of the nodes to decide the next hop along a route. This approach involves an overhead in which nodes must apprise their neighbors of any changes in the cost function parameters. This work also deals with relative priority levels for event-based (high bandwidth) and periodic (low bandwidth) data.
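A weighted next-hop cost function of this kind can be sketched as below. The weights, the metric normalization, and the sign convention are illustrative assumptions, not the formulation from [110] or [18].

```python
# Sketch of weighted next-hop selection for network-condition-based
# QoS routing: combine geographic advance toward the sink, queue
# backlog, and residual energy into one cost. Weights and
# normalization are illustrative assumptions.

def next_hop(neighbors, w_adv=0.5, w_queue=0.3, w_energy=0.2):
    """neighbors: list of dicts with 'id', 'advance', 'queue', 'energy',
    each metric pre-normalized to [0, 1]. Higher advance and energy
    are better; higher queue backlog is worse. Lowest cost wins."""
    def cost(n):
        return (-w_adv * n["advance"]
                + w_queue * n["queue"]
                - w_energy * n["energy"])
    return min(neighbors, key=cost)["id"]

nbrs = [
    {"id": "a", "advance": 0.9, "queue": 0.8, "energy": 0.2},
    {"id": "b", "advance": 0.7, "queue": 0.1, "energy": 0.9},
]
assert next_hop(nbrs) == "b"   # less backlog and more energy win
```

The overhead noted in the text corresponds to keeping the `queue` and `energy` fields fresh: each neighbor must advertise changes to these values before they can enter the cost computation.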

A similar scenario is considered in [18], where imaging data for sensor networks motivates QoS considerations in routing, apart from the traditional goal of energy conservation. Here, the cost function evaluates the residual energy of a node, transmission energy, error rate, and other communication parameters. The protocol finds a least-cost, energy-efficient path while respecting maximum allowed delays.

8.2.2. QoS routing based on traffic classes

Sensor data may originate from various types of events that have different levels of importance, as described in Section 6.1. Consequently, the content and nature of the sensed data also vary. As an example that highlights the need for network-level QoS, consider the task of bandwidth assignment for multimedia mobile medical calls, which include patients' sensing data, voice, pictures, and video data [59]. Unlike the typical source-to-sink multi-hop communication used by classical sensor networks, the proposed architecture uses a 3G cellular system in which individual nodes forward the sensed data to a cellular phone or a specialized information-collecting entity. Different priorities are assigned to video data originating from sensors on ambulances, audio traffic from elderly people, and images returned by sensors placed on the body. In order to achieve this, parameters like hand-off dropping rate (HDR), latency tolerance, and the desired amount of wireless effective bandwidth are taken into consideration.

Page 23: Wireless Multimedia Networks Survey

I.F. Akyildiz et al. / Computer Networks 51 (2007) 921–960 943

8.2.3. Routing protocols with support for streaming

The SPEED protocol [56] provides three types of real-time communication services, namely real-time unicast, real-time area-multicast, and real-time area-anycast. It uses geographical location for routing, and a key difference from other schemes of this genre is its spatio-temporal character, i.e., it takes into account timely delivery of the packets. It is specifically tailored to be a stateless, localized algorithm with minimal control overhead. End-to-end soft real-time communication is achieved by maintaining a desired delivery speed across the sensor network through a combination of feedback control and non-deterministic geographic forwarding. As it works satisfactorily under scarce resource conditions and can provide service differentiation, SPEED takes a first step in addressing the concerns of real-time routing in WMSNs.
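The core of SPEED's forwarding decision can be sketched as follows: a neighbor is an eligible forwarder if the "relay speed" it offers (geographic progress toward the destination divided by the estimated one-hop delay) meets a network-wide setpoint. The delay estimates and setpoint value below are illustrative assumptions.

```python
import math

# Minimal sketch of SPEED-style [56] spatio-temporal forwarding.

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def relay_speed(current, neighbor, dest, hop_delay):
    """Geographic progress toward dest per unit of one-hop delay (m/s)."""
    progress = dist(current, dest) - dist(neighbor, dest)
    return progress / hop_delay if hop_delay > 0 else 0.0

def eligible_forwarders(current, dest, neighbors, speed_setpoint):
    """neighbors: list of (position, estimated_delay). Returns the
    positions whose relay speed meets the desired delivery speed."""
    return [pos for pos, d in neighbors
            if relay_speed(current, pos, dest, d) >= speed_setpoint]
```

In the full protocol, one eligible forwarder is then picked probabilistically (the non-deterministic part), and feedback control adjusts forwarding when no neighbor sustains the setpoint.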

A significant extension of SPEED, the MMSPEED protocol [45] can efficiently differentiate between flows with different delay and reliability requirements. MMSPEED is based on a cross-layer approach between the network and MAC layers in which a judicious choice is made between reliability and timeliness of packet arrival. It is argued that differentiation in reliability is an effective way of channeling resources from flows with relaxed requirements to flows with tighter requirements. Importantly, a new metric called On-Time Reachability is introduced, which is a measure of the probability that a packet reaches its destination within the required delay bounds. While current research directions make an effort to provide real-time streaming, they still offer best-effort services. Giving firm delay guarantees in a dynamically changing network is a difficult problem, yet it is important for seamless viewing of multimedia frames. MMSPEED takes a step toward this end by adopting a probabilistic approach, but clearly further work is needed in this area.
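As a toy illustration of such a probabilistic metric, the sketch below estimates an on-time reachability value under the purely illustrative assumptions of independent, exponentially distributed per-hop delays and an evenly split end-to-end deadline; MMSPEED's actual estimator differs.

```python
import math

# Toy estimate of an "on-time reachability"-style metric (cf. [45]):
# the probability that a packet meets its end-to-end deadline, assuming
# (for illustration only) independent exponential per-hop delays.

def hop_on_time_prob(mean_delay, hop_deadline):
    # P(exponential delay <= d) = 1 - exp(-d / mean)
    return 1.0 - math.exp(-hop_deadline / mean_delay)

def on_time_reachability(mean_delays, deadline):
    """mean_delays: mean delay of each remaining hop; the deadline is
    split evenly across hops."""
    share = deadline / len(mean_delays)
    p = 1.0
    for m in mean_delays:
        p *= hop_on_time_prob(m, share)
    return p
```

A flow whose estimate falls below its required probability could then be duplicated over additional paths, which is the kind of reliability/timeliness trade-off MMSPEED exploits.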

8.3. Open research issues

• The identification of optimal routing metrics is a continuing area of research. Most routing protocols that consider more than one metric, such as energy and delay, form a cost function that is then minimized. The choice of the weights for these metrics needs to be made judiciously, and is often subject to dynamic network conditions. Thus, further work is needed to shift this decision-making process and network tuning from the user end into the network.

• As the connectivity between different domains improves, end-to-end QoS guarantees are complicated by the inherent differences between the wired and wireless media. When sensed data from the field is sent via the Internet, a single routing metric is unsuitable for the entire path between source and end user. Decoupling of reliability and routing parameters at such network boundaries, and a seamless integration of schemes better suited to the wired or wireless domains, respectively, will need to be explored.

9. MAC layer

Owing to the energy constraints of small, battery-powered sensor nodes, it is desirable that the medium access control (MAC) protocol enable reliable, error-free data transfer with minimum retransmissions while supporting application-specific QoS requirements. Multimedia traffic, namely audio, video, and still images, can be classified into separate service classes and subjected to different policies of buffering, scheduling, and transmission. The need for packet-specific differentiation is justified by the following example. The new standard for the compression of still images, JPEG2000 [12], incorporates a feature called region of interest (ROI) that may be applicable to visual data sensing. It allows the allocation of greater importance to certain parts of the image, which can then be coded and transmitted over a better-quality link or on a priority basis. Especially relevant to systems for military surveillance or fault monitoring, such application-layer features could be leveraged by the MAC by differentially treating ROI packets.

Research efforts to provide MAC-layer QoS can be classified mainly into (i) channel access policies, (ii) scheduling and buffer management, and (iii) error control. We next provide a brief description of each and highlight their support for multimedia traffic. The scope of this paper is limited to the challenges posed by multimedia traffic in sensor networks and the efforts at the MAC layer to address them. A detailed survey of MAC protocols for classical sensor networks using scalar data can be found in [67]. Fig. 9 provides a classification of relevant MAC-layer functionalities and summarizes the discussion in this section.


• Contention-free, single channel: TDMA-like; better control over multimedia design parameters; simple hardware and operation; MIMO technology. E.g., STE [109], EDD [31].

• Contention-based: coordinated sleep/awake cycles; the bursty nature of scheduling may lead to jitter. E.g., S-MAC [133], T-MAC [40].

• Contention-free, multi-channel: better bandwidth utilization; additional hardware assumptions; channel switching delay may be a consideration in end-to-end latency. E.g., STEM [114], RATE-EST [88], CMAC [33].

Fig. 9. Classification of protocols at the data link layer.


9.1. Channel access policies

The main causes of energy loss in sensor networks are packet collisions and subsequent retransmissions, overhearing of packets destined for other nodes, and idle listening, a state in which the transceiver circuits remain active even in the absence of data transfer. Thus, regulating access to the channel assumes primary importance, and several solutions have been proposed in the literature.

9.1.1. Contention-based protocols

Most contention-based protocols, like S-MAC [133] and protocols inspired by it [40], have a single-radio architecture. They alternate between sleep cycles (low-power modes with the transceiver switched off) and listen cycles (for channel contention and data transmission). However, we believe that their applicability to multimedia transmission is limited for the following reasons:

• The primary concern in protocols of this class is saving energy, and this is accomplished at the cost of latency and by allowing throughput degradation. A sophisticated duty-cycle calculation based on the permissible end-to-end delay needs to be implemented, and coordinating overlapping listen periods with neighbors based on this calculation is a difficult research challenge.

• Coordinating the sleep–awake cycles between neighbors is generally accomplished through schedule exchanges. In the case of dynamic duty cycles based on perceived values of instantaneous or time-averaged end-to-end latency, the overhead of passing frequent schedules also needs investigation in light of the ongoing high-data-rate video/audio messaging.

• Video traffic is inherently bursty and can lead to sudden buffer overflow at the receiver. This problem is further aggravated by the transmission policy adopted in T-MAC [40]. By choosing to send a burst of data during the listen cycle, T-MAC shows a performance improvement over S-MAC, but at the cost of monopolizing a bottleneck node. Such an operation could well lead to strong jitter and result in discontinuous real-time playback.

9.1.2. Contention-free single channel protocols

Time Division Multiple Access (TDMA) is a representative protocol of this class, in which the clusterhead (CH) or sink helps with slot assignment, querying particular sensors, and maintaining time schedules. We believe that such protocols can be easily adapted for multimedia transmission, and we highlight the likely design considerations.

• TDMA schemes designed exclusively for sensor networks [70] (and references therein) have a small reservation period (RP) that is generally contention based, followed by a contention-free period that spans the rest of the frame. This RP could occur in each frame or at pre-decided intervals in order to assign slots to active nodes, taking into consideration the QoS requirements of their data streams. The length of the TDMA frames and the frequency of the RP interval are among the design parameters that can be exploited when designing a multimedia system.


• For real-time streaming video, packets are time constrained, and scheduling policies like Shortest Time to Extinction (STE) [109] or Earliest Due Date (EDD) [31] can be adopted. Both are similar in principle, as packets are sent in increasing order of their respective delay tolerance, but they differ in that EDD may still forward a packet that has crossed its allowed delay bound. Based on the allowed packet loss of the multimedia stream, the dependencies between packet dropping rate, arrival rate, and delay tolerance [109] can be used to decide the TDMA frame structure and thus ensure smooth replay of data. This allows greater design choice than [31], where the frame lengths and slot durations are considered constant.
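The distinction between the two policies can be sketched in a few lines: both serve the queue earliest-deadline-first, but STE discards a packet whose deadline has already passed, while EDD may still forward it. The packet representation below is an illustrative assumption.

```python
# Sketch contrasting STE [109] and EDD [31] scheduling of time-
# constrained packets. Both transmit in increasing order of deadline;
# STE additionally drops packets that have already expired.

def pick_next(queue, now, policy="STE"):
    """queue: list of (packet_id, deadline). Returns the packet to send
    (or None) and the remaining queue."""
    q = sorted(queue, key=lambda p: p[1])      # earliest deadline first
    if policy == "STE":
        q = [p for p in q if p[1] > now]       # discard expired packets
    if not q:
        return None, []
    return q[0][0], q[1:]
```

With a queue [("a", 5), ("b", 2), ("c", 9)] at time 3, STE drops the expired "b" and sends "a", whereas EDD still sends the expired "b" first.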

• As sensor nodes are often limited by their maximum data transmission rate, the duration of transmission could be made variable depending upon the multimedia traffic class. Thus, variable TDMA (V-TDMA) schemes should be preferred when heterogeneous traffic is present in the network. Tools for calculating the minimum worst-case delay in such schemes and algorithms for link scheduling are provided in [38]. As real-time streaming media is delay bounded, the link-layer latency introduced into a given flow in order to satisfy the data-rate requirements of another flow needs to be carefully analyzed when V-TDMA schemes are used.
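A minimal V-TDMA sizing sketch, under the illustrative assumption that slot durations are simply proportional to each flow's required bit rate ([38] provides the formal worst-case delay analysis):

```python
# Illustrative variable-TDMA (V-TDMA) slot sizing: each flow receives
# transmission time proportional to its data-rate requirement, so
# heterogeneous traffic classes share a single frame.

def vtdma_slots(rates_bps, frame_s):
    """rates_bps: dict flow -> required bit rate (bit/s).
    Returns dict flow -> slot duration (s) within one frame."""
    total = sum(rates_bps.values())
    return {f: frame_s * r / total for f, r in rates_bps.items()}
```

For example, a 200 kbit/s video flow sharing a 100 ms frame with a 50 kbit/s scalar flow would receive an 80 ms slot, and the scalar flow 20 ms, illustrating how one flow's requirement stretches the latency seen by another.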

9.1.2.1. MIMO technology. The high data rates required by multimedia applications can be supported by spatial multiplexing in MIMO systems, which use a single channel but employ interference cancellation techniques. Recently, virtual MIMO schemes have been proposed for sensor networks [62], where nodes in close proximity form a cluster. Each sensor functions as a single antenna element, sharing information and thus simulating the operation of a multiple-antenna array. A distributed compression scheme for correlated sensor data, which specifically addresses multimedia requirements, is integrated with the MIMO framework in [63]. However, a key consideration in MIMO-based systems is the number of sensor transmissions and the required signal energy per transmission. As the complexity is shifted from hardware to sensor coordination, further research is needed at the MAC layer to ensure that the required MIMO parameters, such as channel state and desired diversity/processing gain, are known to both the sender and receiver at an acceptable energy cost.

9.1.2.2. Open research issues

• While TDMA schedules within a cluster can be easily devised, the problem is more involved when individual CHs are not in direct range of the sink, thus necessitating inter-cluster multi-hop communication. An acceptable, non-overlapping slot assignment for all neighboring clusters needs to be derived in a distributed manner, requiring coordination between them in the set-up phase. This problem has been shown to be NP-complete [57] by reduction to an instance of graph coloring, and the development of efficient heuristics is an open issue.

• The effect of clock drift is pronounced if the slot duration is small, and rigid time synchronization is required for best performance [73] (and references therein). Network scalability is another important area of research, and TDMA schedules must be able to accommodate the high node densities that are characteristic of sensor networks. As the channel capacity in TDMA is fixed, only the slot durations or the number of slots in a frame may be changed, keeping in mind the number of users and their respective traffic types.

• Bounds on unequal slot/frame lengths for differentiated services should be decided by the allowed per-hop delay (and consequently the end-to-end delay). Schedules, once created, should also be able to account for a dynamically changing topology as nodes die off or new ones are added.

9.1.3. Contention-free multi-channel protocols

Clearly, the existing data rates of about 40 kbit/s and 250 kbit/s supported by the MICA2 and MICAz motes are not geared to support multimedia traffic. Apart from improving hardware, and thus increasing cost, an alternative approach is to utilize the available bandwidth efficiently. By using multiple channels in a spatially overlapped manner, the existing bandwidth can be used efficiently to support multimedia applications. We observe that Berkeley's third-generation MICA2 mote has an 868/916 MHz multi-channel transceiver [4]. In Rockwell's WINS nodes, the radio operates on one of 40 channels in the ISM frequency band, selectable by the controller [16]. We next outline the design parameters that could influence MAC design in multi-channel WMSNs.

• Recent research has focused on a two-transceiver paradigm in which the main radio (MR) is supplemented by a wake-up radio (LR) having similar characteristics [88,114,115] or a simple low-energy design [53] that emits a series of short pulses or a busy tone. The LR is used for exchanging control messages and is assigned a dedicated channel. In high-bandwidth applications, like streaming video, using a separate channel for channel arbitration alone does not allow the best utilization of network resources.

• We propose that WMSNs use in-band signalling, where the same channel is used for both data and channel arbitration [33]. While such protocols undoubtedly improve bandwidth efficiency, they introduce the problem of distinct channel assignment and need to account for the delay of switching to a different channel [72], as its cumulative nature at each hop affects real-time media.

• Existing compression techniques like JPEG and MPEG produce size variations in the captured video frames, depending upon the movement within a given frame, the compression rate, and the subsequent frames. Thus, the packet arrival rate (assuming constant packet size) may change over time, and, we believe, enforcing wake-ups based on the packet backlog is a better design approach than static timer-based schemes. A dual approach is followed in [88], but this raises questions of receiver-side synchronization, as knowledge of the sender's buffer state, and hence its scheduling instant, is unavailable at the receiver end.
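The backlog-driven wake-up policy argued for above can be sketched as a simple predicate for a two-radio node; the threshold, deadline fraction, and parameter names are illustrative assumptions rather than any published protocol's rules.

```python
# Sketch of a backlog-driven wake-up rule for a two-radio node: the
# low-power radio triggers the main radio when the sender's queue
# crosses a threshold, tracking the variable frame sizes produced by
# JPEG/MPEG compression, instead of firing on a static timer.

def should_wake(backlog_pkts, threshold, deadline_s, oldest_wait_s):
    """Wake the main radio when the queue is long enough OR the oldest
    queued packet risks missing its delay bound (80% margin assumed)."""
    return backlog_pkts >= threshold or oldest_wait_s >= 0.8 * deadline_s
```

The deadline clause guards against the starvation case where a short backlog of delay-bounded packets would otherwise wait indefinitely for the threshold to be reached.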

9.1.3.1. Open research issues

• Multi-channel protocols utilize bandwidth better and thus may perform favorably for applications demanding high data rates. The results obtained in [72] leave open the question of whether switching delay can be successfully hidden with only one interface per node. If this is possible, it may greatly simplify sensor design while performing as well as a multi-channel, multi-interface solution. Also, the sleep–awake schedule of the radios should be made dynamic in order to accommodate the varying frame rates of video sensors.

• Recently, the cognitive radio paradigm [21,55] has drawn significant interest; here the radio is aware of its surroundings, learns from them, and adapts in order to provide the best possible functionality to the user. By dynamically sharing spectrum with other services, unused frequency bands can be exploited, ensuring optimal bandwidth usage. The release of new spectrum in the 5-GHz U-NII band has spurred interest in realizing a practical system for multimedia applications. However, porting this paradigm to a low-power sensor node and developing control mechanisms for channel hand-off are areas that remain to be explored.

• Multi-channel protocols are not completely collision free, as is seen in the case of control packet collisions [33,88,114]. All available channels cannot be assumed to be perfectly non-overlapping, as is seen in the case of 802.11b-based WLANs [89]. This may necessitate dynamic channel assignment, taking into account the effect of adjacent-channel interference, in order to maintain network QoS.

9.2. Scheduling

MAC-layer scheduling in the context of WMSNs differs from the traditional networking model in that, apart from choosing a queueing discipline that accounts for latency bounds, rate/power control and consideration of high channel error conditions need to be incorporated. We believe that an optimal solution is a function of all of these factors, appropriately weighted and seamlessly integrated with a suitable channel access policy as described in Section 9.1.

In order to generate optimal schedules that minimize both power consumption and the probability of missing deadlines for real-time messages, PARM [24] integrates the EDD metric described in Section 9.1.2 into an energy consumption function. While significant performance improvements are demonstrated, this work needs to be extended to the large-scale networks typically envisaged for WMSNs.

Queueing at the MAC layer has been extensively researched, and several schemes with varying levels of complexity exist. Of interest to multimedia applications is the development of schemes that provide a delay bound and thus assure smooth streaming of multimedia content. E2WFQ [101], a variant of the established weighted fair queueing (WFQ) discipline, allows adjustments to be made in the energy-latency-fidelity trade-off space. Depending upon the current residual energy in the network, it is possible to adapt the scheme for greater energy savings, albeit at the cost of a small, bounded increase in worst-case packet latency.

Given the bursty nature of voice and video data, queueing disciplines are needed that can accommodate sudden peaks as well as operate under localized channel errors. Extending WFQ, the Wireless Packet Scheduling (WPS) scheme presented in [76] addresses the concerns of delay- and rate-sensitive packet flows, making it suitable for multimedia traffic. WPS, however, assumes that the channel error is fully predictable at any time, and its practical implementation shows marked deviations from the idealized case in terms of worst-case complexity. This work is suited to single-hop sensor-sink communication; multi-hop forwarding issues are not explored.
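The textbook WFQ discipline that both E2WFQ and WPS build on can be sketched as follows: each packet receives a virtual finish time F = max(V, F_prev) + length/weight and packets are served in increasing F. The simplified system virtual time below (advanced to the finish time of the departing packet) is an illustrative assumption.

```python
# Minimal weighted fair queueing (WFQ) sketch: serve packets in order
# of virtual finish time, so a flow's share of the link is proportional
# to its weight even under bursty arrivals.

class WfqQueue:
    def __init__(self, weights):
        self.weights = weights                    # flow -> weight
        self.finish = {f: 0.0 for f in weights}   # last finish time per flow
        self.vtime = 0.0                          # simplified virtual clock
        self.backlog = []                         # (finish_time, flow, pkt)

    def enqueue(self, flow, pkt, length):
        f = max(self.vtime, self.finish[flow]) + length / self.weights[flow]
        self.finish[flow] = f
        self.backlog.append((f, flow, pkt))

    def dequeue(self):
        self.backlog.sort()                       # smallest finish time first
        f, flow, pkt = self.backlog.pop(0)
        self.vtime = f
        return flow, pkt

q = WfqQueue({"video": 3, "scalar": 1})
q.enqueue("video", "p1", 300)   # F = 100
q.enqueue("scalar", "p2", 300)  # F = 300
q.enqueue("video", "p3", 300)   # F = 200
```

With a 3:1 weight ratio, the video flow's two packets depart before the equally sized scalar packet, which is the differentiation a multimedia service class would receive.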

Network calculus [27,36] is a theory of deterministic queueing systems that allows the assignment of service guarantees through traffic regulation and deterministic scheduling. Through the tools provided by network calculus, bounds on various performance measures, such as delay and queue length, can be derived at each element of the network, and thus the QoS of a flow can be specified. Arrival, departure, and service curves reflect the constraints that flows are subjected to within a network. The calculus relies on min-plus algebra, in which addition and multiplication are replaced by minimum and addition, respectively, to operate on these curves. Current network calculus results have mostly been derived for wired networks and assume static topologies and fixed link capacities, which are clearly unreasonable assumptions in sensor networks. We believe that extending network calculus results to WMSNs is a challenging but promising research thrust, likely to produce important advancements in our ability to provide provable QoS guarantees in multi-hop networks.
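As a worked instance of these tools, the classical single-node result states that a flow constrained by a token-bucket arrival curve a(t) = b + rt, served with a rate-latency service curve beta(t) = R(t − T)+ with r ≤ R, experiences a delay of at most D = T + b/R. The numbers below are illustrative.

```python
# Classical network-calculus delay bound for a token-bucket-regulated
# flow (burst b bits, rate r bit/s) crossing a rate-latency server
# (rate R bit/s, latency T s): D = T + b/R, valid when r <= R.

def delay_bound(b_bits, r_bps, R_bps, T_s):
    assert r_bps <= R_bps, "stability requires service rate >= arrival rate"
    return T_s + b_bits / R_bps

# e.g., a 4000-bit burst at 20 kbit/s through a 50 kbit/s server with
# 10 ms latency is delayed at most 90 ms.
```

It is precisely this kind of closed-form, per-element bound that breaks down once link capacity and topology become time-varying, as in WMSNs.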

9.3. Link-layer error control

Streaming real-time multimedia data over a sensor network is particularly challenging due to the QoS requirements of a video/audio stream and the unreliability of the wireless medium. For example, good-quality video perception requires a frame loss rate lower than 10^-2. This constitutes a hard task, since the wireless channel is highly unreliable, mostly due to multi-path fading and shadowing at the physical layer, and to collisions or co-channel interference at the MAC layer. Two main classes of mechanisms are traditionally employed to combat the unreliability of the wireless channel at the physical and data link layers, namely forward error correction (FEC) and automatic repeat request (ARQ), along with hybrid schemes. ARQ mechanisms use bandwidth efficiently at the cost of additional latency. Hence, while carefully designed selective-repeat schemes may be of some interest, naive use of ARQ techniques is clearly infeasible for applications requiring real-time delivery of multimedia content.

An important characteristic of multimedia content is unequal importance, i.e., not all packets have the same importance for the correct perceptual reconstruction of the multimedia content. Moreover, multimedia data are usually error-tolerant, so even if some errors are introduced, the original information may still be reconstructed with tolerable distortion. Therefore, an idea that has been used effectively consists of applying different degrees of FEC to different parts of the video stream, depending on their relative importance (unequal protection). For example, this idea can be applied to layered coded streams to provide graceful degradation of the observed image quality in the presence of losses, thus avoiding so-called "cliff" effects [54].
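A hedged sketch of how such unequal protection might allocate redundancy: parity symbols are split across layers in proportion to their importance, so the base layer survives worse channels than the enhancement layers. The importance-to-redundancy mapping below is an illustrative assumption, not a published allocation rule.

```python
# Illustrative unequal-error-protection budget: split a fixed number of
# FEC parity symbols across stream layers proportionally to their
# relative importance (e.g., base layer vs. enhancement layers, or a
# JPEG2000 ROI vs. background), with at least one symbol per layer.

def parity_budget(importance, total_parity):
    """importance: dict layer -> relative weight (higher = more vital).
    Returns dict layer -> parity symbols assigned."""
    total = sum(importance.values())
    return {layer: max(1, round(total_parity * w / total))
            for layer, w in importance.items()}
```

With weights 3:2:1 over a base and two enhancement layers and 12 parity symbols, the base layer receives half the redundancy, so losses degrade quality gracefully instead of falling off a "cliff".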

In general, delivering error-resilient multimedia content and minimizing energy consumption are contradictory objectives. For this reason, and due to the time-varying characteristics of the wireless channel, several joint source and channel coding schemes have been developed, e.g., [44], which try to reduce the energy consumption of the whole process. Some recent papers [77,135] even try to jointly reduce the energy consumption of the whole process of multimedia content delivery, i.e., to jointly optimize source coding, channel coding, and transmission power control. In these schemes, the image coding and transmission strategies are adaptively adjusted to match current channel conditions by exploiting the peculiar characteristics of multimedia data, such as the unequal importance of different frames or layers. However, most of these efforts originated in the multimedia or coding communities, and thus do not jointly consider other important networking aspects of content delivery over a multi-hop wireless network of memory-, processing-, and battery-constrained devices.

In [126], a cross-layer analysis of error control schemes for WSNs is presented. The effects of multi-hop routing and of the broadcast nature of wireless communications are investigated to model the energy consumption, latency, and packet error rate performance of error control schemes. As a result, error control schemes are studied through a cross-layer analysis that considers the effects of routing, medium access, and the physical layer. This analysis enables a comprehensive comparison of FEC and ARQ schemes in WSNs. FEC schemes are shown to improve error resiliency compared to ARQ. In a multi-hop network, this improvement can be exploited by reducing the transmit power (transmit power control) or by constructing longer hops (hop length extension) through channel-aware routing protocols. The analysis reveals that, for certain FEC codes, hop length extension decreases both the energy consumption and the end-to-end latency for a target packet error rate compared to ARQ. Thus, FEC codes are an important candidate for delay-sensitive traffic in WSNs. On the other hand, transmit power control results in significant savings in energy consumption at the cost of increased latency.

A different approach to link-layer reliability is proposed through the use of erasure codes [65]. In sensor networks where message passing is minimized, such a scheme significantly reduces the need for retransmissions. It allows the recovery of the m original messages upon receiving any m out of n code words. In lossy conditions, the number of code words generated can be optimized against the energy expense of their transmission to ensure greater reliability. In multimedia applications, where the loss of a few frames may be tolerated, shorter code words may be used. When detailed, higher-resolution images are needed, regenerating the lost information through these codes may be preferred over interpolating the missing data at the receiver end. It should be noted that this approach works best for static reliability requirements. Dynamically changing the code lengths also increases the packet size, and its overall effect on other factors like congestion cannot be trivially estimated.
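The "any m of n suffice" property can be demonstrated in its simplest form with a single XOR parity word (an (m, m+1) code): any one erased word is recoverable without retransmission. Practical schemes in [65] use longer, tunable codes, so this is only a minimal sketch.

```python
from functools import reduce

# Minimal (m, m+1) erasure code: one XOR parity word lets the receiver
# rebuild any single lost word, illustrating recovery of m messages
# from any m of n = m + 1 code words without retransmission.

def encode(words):
    """Append one XOR parity word to the m data words."""
    return words + [reduce(lambda a, b: a ^ b, words)]

def decode(received):
    """received: n = m + 1 slots with exactly one None (the erasure).
    Returns the m original data words."""
    missing = received.index(None)
    if missing == len(received) - 1:       # only the parity was lost
        return received[:-1]
    rebuilt = reduce(lambda a, b: a ^ b,
                     [w for w in received if w is not None])
    data = received[:-1]
    data[missing] = rebuilt                # XOR of survivors = lost word
    return data
```

Extending the parity budget (more parity words per block) buys tolerance to more erasures at the transmission-energy cost discussed above.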

9.3.1. Open research issues

• There is a need to develop models and algorithms to integrate source and channel coding schemes into existing cross-layer optimization frameworks. Existing schemes mostly consider point-to-point wireless links and neglect interference from neighboring devices and multi-hop routes.

• Since multimedia data are usually error-tolerant, new packet dropping schemes for multimedia delivery have to be developed that selectively drop packets that will not impact the perceived quality at the end user.

• Energy-constrained sensor networks naturally call for selective-repeat ARQ techniques. However, these can introduce excessive latency. There is a need to study the trade-offs between the degree of reliability required (i.e., the acceptable packet error rate) and the sustainable delay at the application layer.

• Coordinating link- and transport-layer error recovery schemes is a challenge that remains to be addressed. In order to ensure that buffer overflow conditions do not occur, mechanisms that detect increased MAC-level contention and regulate the data generation rate could be implemented.

• In-network storage schemes [26], in which data is stored within the sensor nodes themselves and accessed on demand, may further reduce the available buffer capacity of the sensor nodes. Thus, the allocation of optimal buffer sizes truly merits a cross-layer approach, spanning from the application-layer architecture to the channel conditions seen at the PHY layer.

9.4. Multimedia packet size

In wireless networks, the successful reception of a packet depends upon environmental factors that decide the bit error rate (BER) of the link.

Packet length clearly has a bearing on reliable link-level communication and may be adjusted according to the application's delay-sensitivity requirements. The Dynamic Packet Size Mechanism (DPSM) scheme [124] for wireless networks follows an additive-increase, multiplicative-decrease (AIMD) mechanism to decide the packet length, analogous to the congestion control performed by TCP at the transport layer. As an example, if a packet fails the checksum, the sender is informed, and the subsequent packets are sent with a multiplicative decrease in length. However, the problems associated with burst-error channels and the adaptation of video quality have not been addressed in this work. Grouping smaller packets together in order to reduce contention has been explored in Packet Frame Grouping (PFG) [123] and PAcket Concatenation (PAC) [134]. Originally devised for 802.11-like protocols, these schemes share the header overhead among the frames. In PFG, the individual frames may be addressed to different receivers and per-frame ACKs are required, while PAC requires buffering, as all frames need to have a common destination. Depending upon the information content of the frame and the channel conditions, variable-length forward error-correcting codes (FEC) can be used to reduce the effects of transmission errors at the decoder. The trade-off between the increase in packet length due to the additional parity bits and the energy constraints is evaluated in [108], where FEC is shown to perform better than retransmissions.
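The AIMD sizing rule described for DPSM can be sketched in a few lines; the step size and the minimum/maximum bounds below are illustrative assumptions, not the values used in [124].

```python
# Sketch of DPSM-style [124] AIMD packet sizing: grow the payload
# additively after each delivered packet, halve it after a checksum
# failure, within radio-imposed bounds.

def adapt_length(length, delivered, step=32, min_len=64, max_len=1024):
    """length: current payload length (bytes); delivered: True if the
    last packet passed its checksum."""
    if delivered:
        return min(length + step, max_len)   # additive increase
    return max(length // 2, min_len)         # multiplicative decrease
```

As with TCP's congestion window, the multiplicative decrease reacts quickly to a deteriorating link, while the additive increase probes cautiously for the largest payload the channel will sustain.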

10. Physical layer

In this section, we discuss the applicability of the UWB transmission technique, which we advocate over other technologies such as ZigBee, as the most suitable choice for multimedia sensors.

10.1. Ultra wide band communications

The ultra wide band (UWB)¹ technology has the potential to enable low-power-consumption, high-data-rate communications within tens of meters, characteristics that make it an ideal choice for WMSNs.

UWB signals have been used for several decades in the radar community. More recently, the US Federal Communications Commission (FCC) Notice of Inquiry in 1998 and the First Report and Order in 2002 [9] inspired a renewed flourish of research and development efforts in both academia and industry, due to the characteristics of UWB that make it a viable candidate for wireless communications in dense multi-path environments.

Although UWB signals, as per the specifications of the FCC, use the spectrum from 3.1 GHz to 10.6 GHz, with appropriate interference limitation UWB devices can operate using spectrum occupied by existing radio services without causing interference, thereby permitting scarce spectrum resources to be used more efficiently. Instead of dividing the spectrum into distinct bands that are then allocated to specific services, UWB devices are allowed to operate overlaid with, and thus interfere with, existing services, at a power level low enough that existing services do not experience performance degradation. The First Report and Order by the FCC includes standards designed to ensure that existing and planned radio services, particularly safety services, are adequately protected.

¹ The FCC defines UWB as a signal with either a fractional bandwidth of 20% of the center frequency or a bandwidth of 500 MHz (when the center frequency is above 6 GHz). The FCC calculates the fractional bandwidth as 2(fH − fL)/(fH + fL), where fH represents the upper frequency of the −10 dB emission limit and fL represents the lower frequency of the −10 dB emission limit [105].
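The FCC's fractional-bandwidth test from the footnote is easy to apply directly; the narrowband example below (an 80 MHz-wide 2.4 GHz emission) is an illustrative assumption used only for contrast.

```python
# The FCC UWB criterion from the footnote: a signal qualifies if its
# fractional bandwidth 2*(fH - fL)/(fH + fL) is at least 20%, or if
# its absolute bandwidth is at least 500 MHz.

def fractional_bandwidth(f_low_hz, f_high_hz):
    return 2.0 * (f_high_hz - f_low_hz) / (f_high_hz + f_low_hz)

def is_uwb(f_low_hz, f_high_hz):
    return (fractional_bandwidth(f_low_hz, f_high_hz) >= 0.20
            or (f_high_hz - f_low_hz) >= 500e6)
```

The full 3.1-10.6 GHz band trivially qualifies (fractional bandwidth above 100%), while a 600 MHz-wide signal centered near 5.3 GHz qualifies only through the 500 MHz absolute-bandwidth clause.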

There exist two main variants of UWB. The first, known as Time-Hopping Impulse Radio UWB (TH-IR-UWB) [105], mainly developed by Win and Scholtz [129], is based on sending very short pulses (on the order of hundreds of picoseconds) to convey information. Time is divided into frames, each of which is composed of several chips of very short duration. Each sender transmits one pulse per frame, in a single chip, and multi-user access is provided by pseudo-random time-hopping sequences (THS) that determine in which chip each user should transmit. A different approach, known as Multi-Carrier UWB (MC-UWB), uses multiple simultaneous carriers and is usually based on Orthogonal Frequency Division Multiplexing (OFDM) [25].
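The TH-IR-UWB chip selection just described can be sketched as follows: each frame of duration Tf is split into Nc chips of duration Tc, and user k's pseudo-random hopping sequence picks the chip used in each frame. The seeded generator below is an illustrative stand-in for a real THS.

```python
import random

# Toy TH-IR-UWB pulse placement: one pulse per frame, positioned in the
# chip chosen by a per-user pseudo-random time-hopping sequence (THS).

def th_pulse_times(user_seed, n_frames, n_chips, chip_s):
    """Returns the transmit instant (seconds) of the pulse in each
    frame; frame duration is n_chips * chip_s."""
    rng = random.Random(user_seed)   # stand-in per-user hopping sequence
    frame_s = n_chips * chip_s
    return [j * frame_s + rng.randrange(n_chips) * chip_s
            for j in range(n_frames)]
```

Because two users with different seeds rarely select the same chip in the same frame, their pulse trains seldom collide, which is how the THS provides multi-user access without carrier separation.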

MC-UWB is particularly well suited for avoiding interference because its carrier frequencies can be precisely chosen to avoid narrowband interference to or from narrowband systems. However, implementing an MC-UWB front-end power amplifier can be challenging due to the continuous variations in power over a very wide bandwidth. Moreover, when OFDM is used, high-speed FFT processing is necessary, which requires significant processing power and leads to complex transceivers.

TH-IR-UWB signals require fast switching times for the transmitter and receiver and highly precise synchronization. Transient properties become important in the design of the radio and antenna. The high instantaneous power during the brief interval of the pulse helps to overcome interference to UWB systems, but increases the possibility of interference from UWB to narrowband systems. The RF front-end of a TH-IR-UWB system may resemble a digital circuit, thus circumventing many of the problems associated with mixed-signal integrated circuits. Simple TH-IR-UWB systems can be very inexpensive to construct.

Although no sound analytical or experimental comparison between the two technologies is available to our knowledge, we believe that TH-IR-UWB is particularly appealing for WMSNs for the following reasons:


950 I.F. Akyildiz et al. / Computer Networks 51 (2007) 921–960

• It enables high data rate, very low-power wireless communications, on simple-design, low-cost radios (carrierless, baseband communications) [129].

• Its fine delay resolution properties are appropriate for wireless communications in dense multi-path environments, by exploiting more resolvable paths [113].

• It provides a large processing gain in the presence of interference.

• It provides flexibility, as data rate can be traded for power spectral density and multi-path performance.

• Finding suitable codes for THS is trivial (as opposed to CDMA codes), and no assignment protocol is necessary.

• It naturally allows for integrated MAC/PHY solutions [87]. Moreover, interference mitigation techniques [87] allow realizing MAC protocols that do not require mutual temporal exclusion between different transmitters. In other words, simultaneous communications of neighboring devices are feasible without the complex receivers required by CDMA.

• The large instantaneous bandwidth enables fine time resolution for accurate position estimation [49] and for network time distribution (synchronization).

• UWB signals have extremely low-power spectral density, with low probability of intercept/detection (LPI/D), which is particularly appealing for military covert operations.

10.1.1. Ranging capabilities of UWB

Particularly appealing for WMSNs are UWB's high data rate with low-power consumption, and its positioning capabilities. Positioning capabilities are needed in sensor networks to associate physical meaning to the information gathered by sensors. Moreover, knowledge of the position of each network device allows for scalable routing solutions [84]. While angle-of-arrival techniques and signal strength based techniques do not give particular advantages with respect to other transmission techniques, time-based approaches in UWB allow ranging accuracy in the order of centimeters [49]. This can be intuitively explained by the expression in (1), which gives a lower bound on the best achievable accuracy of a distance estimate d [49]:

√Var(d) ≥ c / (2√2 π √SNR β),    (1)

where c is the speed of light, SNR represents the signal-to-noise ratio, and β is the effective signal bandwidth. As can be seen, the accuracy of the time-based localization technique can be improved by increasing either the effective bandwidth or the SNR. For this reason, the large bandwidth of UWB systems allows extremely accurate location estimations, e.g., within one inch at SNR = 0 dB and with a pulse of 1.5 GHz bandwidth. Excellent comprehensive surveys of the UWB transmission technique, and of localization techniques for UWB systems, are provided in [132,49], respectively.
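The bound in (1) is straightforward to evaluate numerically; the sketch below reproduces the example from the text (SNR = 0 dB, 1.5 GHz pulse bandwidth), assuming c ≈ 3 × 10^8 m/s.

```python
import math

def ranging_accuracy_lower_bound(snr_db, bandwidth_hz):
    """Lower bound (in meters) on the standard deviation of a time-based
    distance estimate: c / (2 * sqrt(2) * pi * sqrt(SNR) * beta), Eq. (1)."""
    c = 3.0e8                    # speed of light (m/s)
    snr = 10 ** (snr_db / 10.0)  # convert dB to linear scale
    return c / (2 * math.sqrt(2) * math.pi * math.sqrt(snr) * bandwidth_hz)

# The example from the text: SNR = 0 dB, pulse bandwidth = 1.5 GHz
bound_m = ranging_accuracy_lower_bound(snr_db=0, bandwidth_hz=1.5e9)
print(f"{bound_m * 100:.2f} cm ({bound_m / 0.0254:.2f} in)")  # ~2.25 cm, under one inch
```

Doubling either the bandwidth or the amplitude SNR halves the bound, which is why the multi-GHz bandwidths of UWB yield centimeter-level ranging where narrowband systems cannot.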

10.1.2. Standards based on UWB

The IEEE 802.15.3a task group discussed for three years an alternate physical layer for its high data rate Wireless Personal Area Network (WPAN) standard. However, in early 2005 the group was disbanded after failing to reach a consensus on a single UWB-based standard between two competing proposals from two leading industry groups, the UWB Forum and the WiMedia Alliance. The UWB Forum proposal was based on Direct Sequence UWB (DS-UWB) technology, while the WiMedia Alliance was proposing Multi-Band Orthogonal Frequency Division Multiplexing (MB-OFDM). The IEEE 802.15.4a task group is developing an alternate physical layer for low data rate, very low-power consumption sensors, based on impulse radio UWB.

10.1.3. Open research issues

• While the UWB transmission technology is advancing rapidly, many challenges need to be solved to enable multi-hop networks of UWB devices. In particular, although some recent efforts have been undertaken in this direction [39,87], how to efficiently share the medium in UWB multi-hop networks is still an open issue.

• As a step ahead, research is needed aimed at designing a cross-layer communication architecture based on UWB, with the objective of reliably and flexibly delivering QoS to heterogeneous applications in WMSNs, by carefully leveraging and controlling interactions among layers according to the application requirements.

• It is necessary to determine how to provide provable latency and throughput bounds to multimedia flows in a UWB environment.

• Analytical models need to be developed to quantitatively compare different variants of UWB, in order to determine trade-offs in their applicability to high data rate and low-power consumption devices such as multimedia sensors.

• A promising research direction may also be to integrate UWB with advanced cognitive radio [21] techniques to increase spectrum utilization. For example, UWB pulses could be adaptively shaped to occupy portions of the spectrum that are subject to lower interference.

10.2. Other physical layer technologies

Devices based on the specifications issued by the ZigBee Alliance can find applicability in low data rate applications that require simple forms of QoS guarantees. ZigBee [7] is a specification for a suite of high-level communication protocols using small, low-power digital radios based on the IEEE 802.15.4 standard for wireless personal area networks (WPANs). IEEE 802.15.4 can operate at bandwidths of 250 kbit/s at 2.4 GHz, 40 kbit/s at 915 MHz (Americas), and 20 kbit/s at 868 MHz (Europe). While the data rate is much lower than Bluetooth, energy consumption is also much lower, and recent low-cost commercial versions have demonstrated the viability of this technology for low duty cycle (<0.01) sensor applications. The CSMA-CA MAC allows a large number of devices to be connected simultaneously, while provision is also made for guaranteed time slot communication. The IEEE 802.15.4 standard allows three traffic types, namely, periodic data, intermittent data, and low frequency data, through its contention-based and contention-free channel access methods. In particular, for delay-sensitive applications, slots can be reserved every super-frame that allow contention-free and high-priority access. The standard specifies a reduced functionality device (RFD) and a full functionality device (FFD), of which only the latter can talk with other RFDs or FFDs and assume the role of network coordinator/traffic forwarder. Distinguished on the basis of memory resource availability and communication capability, this standard introduces heterogeneity, thus cutting deployment costs further.

10.2.1. Open research issues

Though dedicated communication slots are possible in ZigBee, the low data rate limits its applicability for multimedia applications. The standard describes a self-organizing network, but heterogeneous nodes necessitate some form of topology control in order to derive optimum ratios of FFD and RFD devices. Such a ratio will depend on the region being monitored and the desired coverage accuracy, among other factors. There is no built-in support in ZigBee that splits up large data into smaller packets, so additional code has to be inserted at the application layer. Video and image capturing sensors will hence need a specialized PHY/MAC-aware application layer.
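To see why the low data rate and the lack of built-in fragmentation matter for multimedia, the sketch below estimates the airtime of a single still image over 802.15.4 at 2.4 GHz. The 127-byte maximum PHY frame and the 250 kbit/s rate come from the standard; the per-frame payload and overhead split is a rough assumption.

```python
def ieee802154_transfer_estimate(payload_bytes, app_payload_per_frame=100,
                                 frame_overhead_bytes=27, rate_bps=250_000):
    """Rough time to push one image over 802.15.4 at 2.4 GHz, assuming
    the application layer fragments the data itself (~100 bytes of payload
    plus ~27 bytes of assumed header/preamble overhead per 127-byte frame)."""
    n_frames = -(-payload_bytes // app_payload_per_frame)  # ceiling division
    on_air_bytes = n_frames * (app_payload_per_frame + frame_overhead_bytes)
    return n_frames, on_air_bytes * 8 / rate_bps           # (frames, seconds)

# A modest 30 kB JPEG still image:
frames, seconds = ieee802154_transfer_estimate(30_000)
print(frames, round(seconds, 2))  # 300 frames and over a second of raw airtime
```

Even ignoring CSMA-CA backoff, acknowledgments, and retransmissions, a single small image occupies the channel for more than a second, which rules out video-rate traffic and motivates the PHY/MAC-aware application layer argued for above.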

11. Cross-layer design

As previously discussed, in multi-hop wireless networks there is a strict interdependence among functions handled at all layers of the communication stack. The physical, MAC, and routing layers together impact the contention for network resources. The physical layer has a direct impact on multiple access of nodes in wireless channels by affecting the interference at the receivers. The MAC layer determines the bandwidth allocated to each transmitter, which naturally affects the performance of the physical layer in terms of successfully detecting the desired signals. On the other hand, as a result of transmission schedules, high packet delays and/or low bandwidth can occur, forcing the routing layer to change its route decisions. Different routing decisions alter the set of links to be scheduled, and thereby influence the performance of the MAC layer. Furthermore, congestion control and power control are also inherently coupled [32], as the capacity available on each link depends on the transmission power. Moreover, specifically for multimedia transmissions, the application layer does not require full insulation from lower layers, but instead needs to perform source coding based on information from the lower layers to maximize the multimedia performance. Existing solutions often do not provide adequate support for multimedia applications, since the resource management, adaptation, and protection strategies available in the lower layers of the stack are optimized without explicitly considering the specific characteristics of multimedia applications. Similarly, multimedia compression and streaming algorithms do not consider the mechanisms provided by the lower layers for error protection and resource allocation [112].

The additional challenges brought about by real-time streaming of multimedia content in WMSNs call for new research on cross-layer optimization and cross-layer design methodologies, to leverage


[Fig. 10 (block diagram): a Cross-Layer Control Unit (XLCU) receives flow requirements and QoS contracts and drives admission control, a QoS scheduler, rate control, channel coding, source coding, and geographical routing (next hop selection, capacity assignment, coding rate, data rate) over a UWB transceiver, supported by localization and synchronization subsystems and a Network Coordination Unit (NCU) connected to network peers.]

Fig. 10. Cross-layer communication architecture.


potential improvements of exchanging information between different layers of the communication stack. However, the increased complexity of functionalities needed to deliver QoS to multimedia applications needs to be managed as well. In particular, it is important to keep some form of logical separation of these functionalities to preserve upgradability and ease of design and testing. To this aim, standardized interfaces that allow leveraging these interactions need to be specified.

Although a considerable number of recent papers have focused on cross-layer design and improvement of protocols for WSNs, a systematic methodology to accurately model and leverage cross-layer interactions is still largely missing. Most of the existing studies decompose the resource allocation problem at different layers, and consider allocation of the resources at each layer separately. In most cases, resource allocation problems are treated either heuristically, or without considering cross-layer interdependencies, or by considering pairwise interactions between isolated pairs of layers.

In [112], the cross-layer transmission of multimedia content over wireless networks is formalized as an optimization problem. Several different approaches for cross-layer design of multimedia communications are discussed, including the bottom-up approach, where the lower layers try to insulate the higher layers from losses and channel capacity variations, and the top-down approach, where the higher layer protocols optimize their parameters at the next lower layer. However, only single-hop networks are considered.

In [116], several techniques that provide significant performance gains through cross-layer optimizations are surveyed. In particular, the improvements of adaptive link layer techniques such as adaptive modulation and packet size optimization, joint allocation of capacity and flows (i.e., MAC and routing), and joint scheduling and rate allocation, are discussed. While still maintaining a strict layered architecture, it is shown how these cross-layer optimizations help improve the spectral efficiency at the physical layer, and the peak signal-to-noise ratio (PSNR) of the video stream perceived by the user. Clearly, energy-constrained multimedia sensors may need to leverage cross-layer interactions one step further. At the same time, optimization metrics in the energy domain need to be considered as well.

We are currently developing a new cross-layer communication architecture [83] whose objective is to reliably and flexibly deliver QoS to heterogeneous applications in WMSNs, by carefully leveraging and controlling interactions among layers according to the application requirements. Its design is based on the following principles:

• Network layer QoS support enforced by a cross-layer controller. The proposed system provides QoS support at the network layer, i.e., it provides packet-level service differentiation in terms of throughput, end-to-end packet error rate, and delay. This is achieved by controlling operations and interactions of functionalities at the physical, MAC, and network layers, based on a unified logic that resides on a cross-layer controller that manages resource allocation, adaptation, and protection strategies based on the state of each functional block, as shown in Fig. 10. The objective of the controller is to optimize some objective function, e.g., minimize energy consumption, while guaranteeing QoS requirements to application flows. While all decisions are jointly taken at the controller, implementation of different functionalities is kept separate for ease of design and upgradeability.

• UWB physical/MAC layer. The communication architecture is based on an integrated TH-IR-UWB MAC and physical layer. Similarly to CDMA, TH-IR-UWB allows several transmissions in parallel; conversely, typical MAC protocols for sensor networks, such as contention-based protocols based on CSMA/CA, require mutual temporal exclusion between neighboring transmitters. This allows devising MAC protocols with minimal coordination. While CDMA usually entails complex transceivers and cumbersome code assignment protocols, this is achievable with simple transceivers in TH-IR-UWB.

• Receiver-centric scheduling for QoS traffic. One of the major problems in multi-hop wireless environments is that channel and interference vary with the physical location of devices. For this reason, we believe that QoS provisioning should be based on receiver-centric scheduling of packets. This way, the receiver can easily estimate the state of the medium at its side. Thus, it can optimally handle loss recovery and rate adaptation, thereby avoiding feedback overheads and latency, and be responsive to the dynamics of the wireless link using the information obtained locally.

• Dynamic channel coding. Adaptation to interference at the receiver is achieved through dynamic channel coding, which can be seen as an alternative form of power control, as it modulates the energy per bit according to the interference perceived at the receiver.

• Geographical forwarding. We leverage UWB's positioning capabilities to allow scalable geographical routing. The routing paths are selected by the cross-layer controller by applying an admission control procedure that verifies that each node on the path is able to provide the required service level. The required packet error rate and maximum allowed delay are calculated at each step based on the relative advance of each hop towards the destination.

• Hop-by-hop QoS contracts. End-to-end QoS requirements are guaranteed by means of local decisions. Each single device that participates in the communication process is responsible for locally guaranteeing given performance objectives. The global, end-to-end requirement is enforced by the joint local behaviors of the participating devices.

• Multi-rate transmission. TH-IR-UWB allows varying the data rate at the physical layer by modifying the pulse repetition period. While this functionality has not been fully explored so far, it is possible to devise adaptive systems that modify the achievable data rate at the physical layer based on the perceived interference and on the required power consumption.
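The hop-by-hop contract idea above can be sketched as a simple budget split: each hop receives a share of the remaining end-to-end delay and packet error rate proportional to its geographic advance toward the destination. The policy below is our illustrative reading of this principle, not the actual admission control rule of the architecture in [83].

```python
def per_hop_budget(remaining_delay_s, remaining_per, dist_to_sink_m, advance_m):
    """Split the remaining end-to-end budget in proportion to the geographic
    advance of this hop toward the sink (an illustrative policy only)."""
    fraction = advance_m / dist_to_sink_m
    hop_delay = remaining_delay_s * fraction
    # Per-hop success probabilities compose multiplicatively along the path:
    # 1 - PER_e2e = prod(1 - PER_hop), so the PER budget splits geometrically.
    hop_per = 1 - (1 - remaining_per) ** fraction
    return hop_delay, hop_per

# A hop covering 40 m of the remaining 200 m toward the sink receives 20%
# of the remaining 100 ms delay budget and a matching share of the 1% PER budget.
d, p = per_hop_budget(remaining_delay_s=0.100, remaining_per=0.01,
                      dist_to_sink_m=200, advance_m=40)
print(round(d, 3), round(p, 5))
```

A node admits the flow only if it can locally meet its (hop_delay, hop_per) contract; if every hop does so, the end-to-end requirement follows by construction.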

12. Other research issues

While most of the challenges in realizing a practical implementation of WMSNs can be classified into layer-specific considerations, there are also additional areas that need to be addressed. This section discusses the impact of recent advances in sensor-actuation, synchronization issues, spatial localization techniques, security, and management tools in the context of multimedia transmission.

12.1. Convergence of sensing and actuation

The challenges brought about by WMSNs are not limited to resource allocation problems. In fact, WMSNs enable new application scenarios in synergy with other research areas. For example, Distributed Robotics [30] has been a hot research topic since the mid-1990s. In distributed robotics, a task is not completed by a single robot but by a team of collaborating robots. Information about the surrounding environment is usually gathered by onboard sensors, and team members exchange sensor information to move or perform actions (e.g., collaborate to manipulate heavy objects). As opposed to a single robot, a team of robots can perceive the environment from multiple disparate viewpoints. In the recently proposed Wireless Sensor and Actor Networks (WSANs) [20] paradigm, the ability of the actors to perceive the environment can be pushed one step further: a dense spatio-temporal sampling of the environment, provided by a pre-deployed sensor network, can be exploited by the whole team of actors, thus increasing the ability of the team to accurately interact with the physical environment. Furthermore, multimedia content gathered by sensors can be used to provide the team of actors with accurate vision from multiple perspectives, while as of today collaborating actors mostly rely on expensive onboard cameras.

Coordination and communication algorithms for static [86] and mobile [85] WSANs have been the focus of our research in recent years. In [86], we introduced a framework for communication and coordination problems in WSANs. The notions of sensor–actor coordination and actor–actor coordination were introduced. The process of establishing data paths between sensors and actors is referred to as sensor–actor coordination. Once an event has been detected, actors coordinate to reconstruct it, to estimate its characteristics, and to make a collaborative decision on how to perform the action. This process is referred to as actor–actor coordination. In [85], we introduced a location management scheme to handle the mobility of actors with minimal energy consumption for resource-constrained sensors. The proposed scheme is designed to reduce the energy consumption with respect to existing localization services for ad hoc and sensor networks. This is achieved through a combination of location updating and location prediction, where actors broadcast location updates limiting their scope based on Voronoi diagrams, while sensors predict the movement of actors based on Kalman filtering of previously received updates.
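The location prediction step can be illustrated with a minimal constant-velocity extrapolation between broadcast updates; a real implementation would use a full Kalman filter tracking both the state estimate and its covariance, as in [85]. All names and values here are hypothetical.

```python
def predict_position(updates, t_query):
    """Minimal stand-in for the prediction idea in the text: estimate an
    actor's velocity from its last two (time, position) updates and
    extrapolate, so sensors need no new broadcast between updates.
    (A Kalman filter would additionally weight updates by their noise.)"""
    (t0, x0), (t1, x1) = updates[-2], updates[-1]
    velocity = (x1 - x0) / (t1 - t0)
    return x1 + velocity * (t_query - t1)

# Updates at t=0 s (x=0 m) and t=2 s (x=10 m): the actor moves at 5 m/s,
# so at t=3 s a sensor predicts x = 15 m without hearing a new update.
print(predict_position([(0, 0.0), (2, 10.0)], t_query=3))
```

Prediction lets actors broadcast updates less often: a sensor only needs a fresh update when the actor deviates from its predicted track by more than a tolerance.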

Clearly, further research is needed to fully leverage the opportunities offered by the integration of actors and multimedia sensors in a wireless network.

12.2. Network synchronization

Time synchronization is difficult to achieve on a network-wide basis due to slow clock drift over time, the effect of temperature and humidity on clock frequencies, and the need to coordinate and correct thousands of deployed nodes with low messaging overhead, among other factors [121]. The need for accurate timing in WMSNs is stressed mainly for the following two scenarios:

• In-network processing schemes are often used in WMSNs in order to reduce traffic load in the network. However, flows can only be aggregated at an intermediate node if the difference in the packet generation times is within allowed bounds. We believe that limited synchronization could serve the requirements of WMSNs, in which it needs to be enforced only along the route chosen to the sink and for a specified length of time. Algorithms specially developed for sensor networks [82] can be easily adapted and large-scale coordination avoided.

• Real-time streaming needs clock synchronization at the sender and receiver to prevent buffer underflows and overflows. Phase Locked Loops (PLLs) help in maintaining clock frequencies at the two ends, but an efficient digital implementation of the Voltage Controlled Oscillator (VCO) is an important area of research.
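The first scenario above reduces to a one-line aggregation guard: an intermediate node merges flows only when their packet generation times fall within an allowed bound. The 50 ms bound below is an assumed value, not one prescribed in the text.

```python
def can_aggregate(gen_times_s, max_skew_s=0.05):
    """In-network aggregation guard: merge flows at an intermediate node
    only if their packet generation times differ by less than an allowed
    bound (the 50 ms default is an illustrative assumption)."""
    return max(gen_times_s) - min(gen_times_s) <= max_skew_s

print(can_aggregate([10.00, 10.02, 10.04]))  # True: within the 50 ms window
print(can_aggregate([10.00, 10.20]))         # False: 200 ms apart
```

Note that the guard only needs the nodes along one route to agree on time to within max_skew_s, which is the "limited synchronization" argued for above.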

12.3. Inter-media synchronization

Multimedia data returned by sensors can be heterogeneous, and possibly correlated. As an example, video and audio samples could be collected over a common geographical region, or still pictures may complement text data containing field measurements. In such cases, the flow of two separate multimedia streams needs to be synchronized, as the final data available at the end user will comprise both of these, played back under a joint timing constraint. The task of coordinating such sequences is called multimedia synchronization. The problem of multimedia synchronization is cross-layer in nature and influences primarily the physical layer and the application layer. At the physical layer, data from different media are multiplexed over shared wireless connections, or are stored in common physical storage. The application layer is concerned with the inter-media synchronization necessary for presentation or playout, i.e., with the interactions between the multimedia application and the various media.

The performance metrics for such inter-media synchronization differ from the cases in which only voice or video streaming is desired. Synchronization can be applied to the playout of concurrent or sequential streams of data, and also to the external events generated by a human user. In [75], it is argued that the average instantaneous delay variation (skew) best measures inter-media synchronization for continuous media. The maximum and minimum delay, used in typical real-time scheduling, can be effectively applied for discrete events associated with timed playout of text, graphics, and still images. Minimizing the delay variation at each hop is a challenge yet unaddressed in the context of sensor networks, though the effects of inter-media skew can be mitigated to an extent by dropping and duplicating frames of the different media streams so that they play back in unison at the receiver.
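The drop/duplicate mitigation mentioned above can be sketched as follows: for each playout instant of a master stream, pick the closest-in-time frame of the other stream, which implicitly duplicates frames when the slave stream runs slow and drops them when it runs fast. The frame intervals used are illustrative.

```python
def align_to_master(master_times, slave_times):
    """For each master playout instant, choose the slave frame whose
    timestamp is closest, duplicating or dropping slave frames as needed
    so both media play back in unison at the receiver."""
    chosen = []
    for t in master_times:
        nearest = min(range(len(slave_times)), key=lambda i: abs(slave_times[i] - t))
        chosen.append(nearest)
    return chosen

# Video at 25 fps (40 ms intervals) as master; a second stream arriving at
# 60 ms intervals as slave. Frame 1 of the slave is played twice.
video_t = [0, 40, 80, 120, 160]
other_t = [0, 60, 120, 180]
print(align_to_master(video_t, other_t))
```

This bounds the instantaneous skew at the playout point by half a slave frame interval, at the cost of occasional repeated or skipped frames rather than per-hop delay-variation control.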

12.4. Localization

Determining the location of the sensor nodes with respect to a common reference, or in the context of the object being monitored by them, is an important aspect in multimedia applications. Cameras and microphones have a limited field of operation, and hence reachability and coverage are two important considerations that go into efficient sharing of monitoring tasks. Localization techniques help in allocating resources to events, deciding sensing precision, and ensuring complete monitoring of the area under study.

In [104], this common space is provided by automatically determining the relative 3D positions of audio sensors and actors through a closed form approximate solution based on time of flight and time difference of flight. This approach, however, requires that the audio signals emitted by each sensor do not interfere during the localization process, which implies a network-wide ordering or code assignment.

SensEye [69] uses a two-tier localization in which cheap, low resolution cameras are used for object detection and higher resolution web-cameras are used for tracking. It uses two cameras with overlapping coverage to localize an object and compute its Cartesian (x, y, z) coordinates, which, in turn, are used to intelligently wake up other nodes or to determine the trajectory of the object. This work assumes that the orientations of the cameras are known relative to a global reference frame, achieved through a GPS-like or distributed technique.

12.5. Network security

Security in WMSNs has recently caught the attention of the research community with increasing applications of sensors in military use. While the use of stronger codes, watermarking techniques, and encryption algorithms, among others, has resulted in secure wireless communication, there are altogether different considerations in WMSNs.

As outlined in [71], a video sensor surveillance system may require in-network processing techniques to reduce the amount of information flowing in the network. At the aggregation point of the incoming streams, the packets would have to be completely decoded, and thus the computational complexity of the security algorithms must be low enough to allow real-time processing. There is hence a trade-off between providing enhanced security to the data flow by adopting a higher order code at the source video sensor and permissible multimedia delay requirements. Apart from devising effective light-weight coding techniques, we believe that efforts in this area must be directed to leverage physical layer strategies, as processing power on the battery-powered nodes is likely to be limited.

The delta–sigma (ΔΣ) modulator for high-speed speech processing is modified in [71] for simultaneously digitizing and authenticating sensor readings. By exchanging simple keys, filter parameters can be decided that are used to encode the generated stream, thus proving to be a computationally inexpensive scheme. However, this technique has several practical difficulties, including modulator matching between the sender and receiver and precision tracking of the signal for accurate demodulation. Other areas that need to be explored are watermarking for heterogeneous streams of voice and video applications. Scalar or voice data may be rendered invisible by embedding it in frames of video images, thus making eavesdropping difficult.

12.6. Network management

Multimedia network management, when applied to sensor networks, can be considered as a functionality that encompasses resources, theories, tools, and techniques to manipulate data provided by different and possibly mixed media, with the goal of extracting relevant information [74]. Taking into account the concerns of WMSNs, we believe that the design of such systems should be influenced by the following factors:

• Reduced hardware/application requirements. Tools for WMSNs may comprise hand-held devices that are used by the network manager to conduct on-site surveys. Light-weight application environments like the Java Platform, Micro Edition (Java ME) [8] or AJAX [98] could be used for local record manipulation, as they considerably reduce traffic between the source and the distant server (or sink). Java ME provides support for networked and offline applications that can be downloaded dynamically. For distant web-based monitoring, AJAX builds upon XML and can be used to perform simple, localized search and modification queries with short messages exchanged between the source and sink. These tools may allow dynamic reassignment of goals based on perceived QoS or events of interest.

• Independence of platform/programming environment. Established proprietary tools like LabVIEW cannot be easily integrated with other languages, and there is a clear need for platform-independent, general-purpose monitoring tools. As a solution, BeanWatcher [74], specially devised for WMSNs, can work with several languages, such as Java, Java ME, and C++. Besides, it provides some visual components (e.g., thermometer, speedometer, gauge, and valued maps) to cover different types of sensory data, and accommodates the fact that different streams from different types of applications need to be treated differently.

13. Conclusions

We discussed the state of the art of research on Wireless Multimedia Sensor Networks (WMSNs), and outlined the main research challenges. Algorithms, protocols, and hardware for the development of WMSNs were surveyed, and open research issues were discussed in detail. We classified currently off-the-shelf hardware as well as available research prototypes for WMSNs. Furthermore, we discussed existing solutions and open research issues at the application, transport, network, link, and physical layers of the communication stack, along with possible cross-layer synergies and optimizations. We pointed out how recent work undertaken in Wyner–Ziv coding at the application layer, specialized spatio-temporal transport layer solutions, delay-bounded routing, multi-channel MAC protocols, and UWB technology, among others, constitutes the most promising research directions in developing practical WMSNs. We believe that this research area will attract the attention of many researchers and that it will push one step further our ability to observe the physical environment and interact with it.

Acknowledgement

The authors would like to thank Vehbi C. Gungor, Dario Pompili, and Mehmet C. Vuran for their valuable comments.

References

[1] Acroname GARCIA Robotic Platform. <http://www.acroname.com/garcia/garcia.html>.

[2] Advanced Video Coding for Generic Audiovisual Services, ITU-T Recommendation H.264.

[3] Crossbow. <http://www.xbow.com>.

[4] Crossbow MICA2 Mote Specifications. <http://www.xbow.com>.

[5] Crossbow MICAz Mote Specifications. <http://www.xbow.com>.

[6] Crossbow TelosB Mote Specifications. <http://www.xbow.com>.

[7] IEEE 802.15 WPAN Task Group 4 (TG4). <http://grouper.ieee.org/groups/802/15/pub/TG4.html>.

[8] Java Platform, Micro Edition. <http://java.sun.com/javame/index.jsp>.

[9] Revision of Part 15 of the Commission's Rules Regarding Ultra-Wideband Transmission Systems. First Note and Order, Federal Communications Commission, ET-Docket 98-153, adopted February 14, 2002, released April 22, 2002.

[10] The Stargate Platform. <http://www.xbow.com/Products/Xscale.htm>.

[11] Video Coding for Low Bit Rate Communication, ITU-T Recommendation H.263.

[12] JPEG2000 Requirements and Profiles, ISO/IEC JTC1/SC29/WG1 N1271, March 1999.

[13] A. Aaron, S. Rane, E. Setton, B. Girod, Transform-domain Wyner–Ziv codec for video, in: Proc. of Society of Photo-Optical Instrumentation Engineers – Visual Communications and Image Processing, San Jose, CA, USA, January 2004.

[14] A. Aaron, S. Rane, R. Zhang, B. Girod, Wyner–Ziv coding for video: applications to compression and error resilience, in: Proc. of IEEE Data Compression Conf. (DCC), Snowbird, UT, March 2003, pp. 93–102.

[15] A. Aaron, E. Setton, B. Girod, Towards practical Wyner–Ziv coding of video, in: Proc. of IEEE Intl. Conf. on Image Processing (ICIP), Barcelona, Spain, September 2003.

[16] R. Agre, L.P. Clare, G.J. Pottie, N.P. Romanov, Development platform for self-organizing wireless sensor networks, in: Proc. of Society of Photo-Optical Instrumentation Engineers – Aerosense, Orlando, FL, USA, March 1999.

[17] O. Akan, I.F. Akyildiz, Event-to-sink reliable transport in wireless sensor networks, IEEE/ACM Trans. Network. 13 (5) (2005) 1003–1017.

[18] K. Akkaya, M. Younis, An energy-aware QoS routing protocol for wireless sensor networks, in: Proc. of Intl. Conf. on Distributed Computing Systems Workshops (ICDCSW), Washington, DC, 2003.

[19] K. Akkaya, M. Younis, A survey of routing protocols in wireless sensor networks, Ad Hoc Networks (Elsevier) 3 (3) (2005) 325–349.

[20] I.F. Akyildiz, I.H. Kasimoglu, Wireless sensor and actor networks: research challenges, Ad Hoc Networks (Elsevier) 2 (4) (2004) 351–367.

[21] I.F. Akyildiz, W.-Y. Lee, M.C. Vuran, S. Mohanty, Next generation/dynamic spectrum access/cognitive radio wireless networks: a survey, Comput. Networks (Elsevier) 50 (13) (2006) 2127–2159.

[22] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, Wireless sensor networks: a survey, Comput. Networks (Elsevier) 38 (4) (2002) 393–422.

[23] J.N. Al-Karaki, A.E. Kamal, Routing techniques in wireless sensor networks: a survey, IEEE Wireless Commun. 11 (6) (2004) 6–28.

[24] M.I. Alghamdi, T. Xie, X. Qin, PARM: a power-aware message scheduling algorithm for real-time wireless networks, in: Proc. of ACM Workshop on Wireless Multimedia Networking and Performance Modeling (WMuNeP), Montreal, Que., Canada, 2005, pp. 86–92.

[25] A. Batra et al., Multi-band OFDM physical layer proposal for IEEE 802.15 Task Group 3a, IEEE P802.15 Working Group for Wireless Personal Area Networks (WPANs), March 2004.

[26] R. Biswas, K.R. Chowdhury, D.P. Agrawal, Optimal data-centric attribute allocation and retrieval (DCAAR) scheme for wireless sensor networks, in: Proc. of IEEE Intl. Conf. on Mobile Ad-hoc and Sensor Systems (MASS), Washington, DC, USA, November 2005.

[27] J.-Y.L. Boudec, P. Thiran, Network Calculus, LNCS, vol. 2050, Springer-Verlag, New York, 2001.

[28] A. Boulis, M. Srivastava, Node-level energy management for sensor networks in the presence of multiple applications, in: Proc. of IEEE Intl. Conf. on Pervasive Computing and Communications (PerCom), Dallas–Fort Worth, TX, USA, 2003, pp. 41–49.

[29] J. Campbell, P.B. Gibbons, S. Nath, P. Pillai, S. Seshan, R. Sukthankar, IrisNet: an Internet-scale architecture for multimedia sensors, in: Proc. of the ACM Multimedia Conference, 2005.

I.F. Akyildiz et al. / Computer Networks 51 (2007) 921–960 957

[30] Y.U. Cao, A.S. Fukunaga, A.B. Kahng, Cooperative mobile robotics: antecedents and directions, Autonom. Robots 4 (1997) 1–23.

[31] J. Capone, I. Stavrakakis, Delivering QoS requirements to traffic with diverse delay tolerances in a TDMA environment, IEEE/ACM Trans. Network. 7 (1) (1999) 75–87.

[32] M. Chiang, Balancing transport and physical layers in wireless multihop networks: jointly optimal congestion control and power control, IEEE J. Select. Areas Commun. 23 (1) (2005) 104–116.

[33] K.R. Chowdhury, N. Nandiraju, D. Cavalcanti, D.P. Agrawal, CMAC: a multi-channel energy efficient MAC for wireless sensor networks, in: Proc. of IEEE Wireless Communications and Networking Conference (WCNC), Las Vegas, NV, April 2006.

[34] D. Chu, K. Lin, A. Linares, G. Nguyen, J.M. Hellerstein, sdlib: a sensor network data and communications library for rapid and robust application development, in: Proc. of Information Processing in Sensor Networks (IPSN), Nashville, TN, USA, April 2006.

[35] M. Chu, J.E. Reich, F. Zhao, Distributed attention for large video sensor networks, in: Proc. of the Institute of Defence and Strategic Studies (IDSS), London, UK, February 2004.

[36] R. Cruz, A calculus for network delay. I. Network elements in isolation, IEEE Trans. Informat. Theory 37 (1) (1991) 114–131.

[37] R. Cucchiara, Multimedia surveillance systems, in: Proc. of ACM Intl. Workshop on Video Surveillance and Sensor Networks, Singapore, November 2005.

[38] S. Cui, R. Madan, A. Goldsmith, S. Lall, Energy-delay tradeoffs for data collection in TDMA-based sensor networks, in: Proc. of IEEE Intl. Conf. on Communications (ICC), Seoul, Korea, May 2005, pp. 3278–3284.

[39] F. Cuomo, C. Martello, A. Baiocchi, F. Capriotti, Radio resource sharing for ad-hoc networking with UWB, IEEE J. Select. Areas Commun. 20 (9) (2002) 1722–1732.

[40] T.V. Dam, K. Langendoen, An adaptive energy-efficient MAC protocol for wireless sensor networks, in: Proc. of the ACM Conf. on Embedded Networked Sensor Systems (SenSys), Los Angeles, CA, USA, November 2003.

[41] I. Downes, L.B. Rad, H. Aghajan, Development of a mote for wireless image sensor networks, in: Proc. of COGnitive systems with Interactive Sensors (COGIS), Paris, France, March 2006.

[42] A. Dunkels, T. Voigt, J. Alonso, Making TCP/IP viable for wireless sensor networks, in: Proc. of European Workshop on Wireless Sensor Networks (EWSN), Berlin, Germany, January 2004.

[43] A. Dunkels, T. Voigt, J. Alonso, H. Ritter, Distributed TCP caching for wireless sensor networks, in: Proc. of the Mediterranean Ad Hoc Networking Workshop (MedHocNet), June 2004.

[44] Y. Eisenberg, C.E. Luna, T.N. Pappas, R. Berry, A.K. Katsaggelos, Joint source coding and transmission power management for energy efficient wireless video communications, IEEE Trans. Circ. Syst. Video Technol. 12 (6) (2002) 411–424.

[45] E. Felemban, C.-G. Lee, E. Ekici, MMSPEED: multipath multi-SPEED protocol for QoS guarantee of reliability and timeliness in wireless sensor networks, IEEE Trans. Mobile Comput. 5 (6) (2006) 738–754.

[46] W. Feng, B. Code, E. Kaiser, M. Shea, W. Feng, L. Bavoil, Panoptes: scalable low-power video sensor networking technologies, in: Proc. of ACM Multimedia, Berkeley, CA, USA, November 2003.

[47] W. Feng, N.B.W. Feng, Dissecting the video sensing landscape, in: Proc. of the ACM Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), Stevenson, WA, USA, June 2005.

[48] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, D. Culler, The nesC language: a holistic approach to network embedded systems, in: Proc. of the ACM SIGPLAN 2003 Conf. on Programming Language Design and Implementation (PLDI), San Diego, CA, USA, June 2003.

[49] S. Gezici, Z. Tian, G.B. Giannakis, H. Kobayashi, A.F. Molisch, H.V. Poor, Z. Sahinoglu, Localization via ultra-wideband radios, IEEE Signal Process. Mag. 22 (4) (2005) 70–84.

[50] B. Girod, A. Aaron, S. Rane, D. Rebollo-Monedero, Distributed video coding, Proc. IEEE 93 (1) (2005) 71–83.

[51] B. Girod, M. Kalman, Y. Liang, R. Zhang, Advances in channel-adaptive video streaming, Wireless Commun. Mobile Comput. 2 (6) (2002) 549–552.

[52] V. Gungor, O. Akan, I. Akyildiz, A real-time and reliable transport protocol for wireless sensor and actor networks, submitted for publication.

[53] C. Guo, L.C. Zhong, J.M. Rabaey, Low power distributed MAC for ad hoc sensor radio networks, in: Proc. of the IEEE Global Communications Conference (GLOBECOM), San Antonio, TX, November 2001.

[54] E. Gurses, O.B. Akan, Multimedia communication in wireless sensor networks, Ann. Telecommun. 60 (7–8) (2005) 799–827.

[55] S. Haykin, Cognitive radio: brain-empowered wireless communications, IEEE J. Select. Areas Commun. 23 (2) (2005) 201–220.

[56] T. He, J.A. Stankovic, C. Lu, T.F. Abdelzaher, A spatio-temporal communication protocol for wireless sensor networks, IEEE Trans. Parallel Distr. Syst. 16 (10) (2005) 995–1006.

[57] T. Herman, S. Tixeuil, A distributed TDMA slot assignment algorithm for wireless sensor networks, in: Proc. of the Workshop on Algorithmic Aspects of Wireless Sensor Networks (ALGOSENSORS), Finland, 2004.

[58] R. Holman, J. Stanley, T. Ozkan-Haller, Applying video sensor networks to nearshore environment monitoring, IEEE Perv. Comput. 2 (4) (2003) 14–21.

[59] F. Hu, S. Kumar, Multimedia query with QoS considerations for wireless sensor networks in telemedicine, in: Proc. of Society of Photo-Optical Instrumentation Engineers – Intl. Conf. on Internet Multimedia Management Systems, Orlando, FL, September 2003.

[60] Y.G. Iyer, S. Gandham, S. Venkatesan, STCP: a generic transport layer protocol for wireless sensor networks, in: Proc. of IEEE Intl. Conf. on Computer Communications and Networks (ICCCN), San Diego, CA, USA, October 2005, pp. 449–454.

[61] D. James, L. Klibanov, G. Tompkins, S.J. Dixon-Warren, Inside CMOS Image Sensor Technology, Chipworks White Paper. <http://www.chipworks.com/resources/whitepapers/Inside-CMOS.pdf>.

[62] S.K. Jayaweera, An energy-efficient virtual MIMO communications architecture based on V-BLAST processing for distributed wireless sensor networks, in: Proc. of IEEE Intl. Conf. on Sensor and Ad-hoc Communications and Networks (SECON), Santa Clara, CA, USA, October 2004.

[63] S.K. Jayaweera, M.L. Chebolu, Virtual MIMO and distributed signal processing for sensor networks – an integrated approach, in: Proc. of IEEE Intl. Conf. on Communications (ICC), Seoul, Korea, May 2005.

[64] X. Jiang, J. Polastre, D. Culler, Perpetual environmentally powered sensor networks, in: Proc. of IEEE Workshop on Sensor Platform, Tools and Design Methods for Networked Embedded Systems (SPOTS), Los Angeles, CA, USA, April 2005.

[65] S. Kim, R. Fonseca, D. Culler, Reliable transfer in wireless sensor networks, in: Proc. of IEEE Intl. Conf. on Sensor and Ad-hoc Communications and Networks (SECON), Santa Clara, CA, October 2004.

[66] J. Koshy, R. Pandey, VM*: synthesizing scalable runtime environments for sensor networks, in: Proc. of the ACM Conf. on Embedded Networked Sensor Systems (SenSys), San Diego, CA, November 2005.

[67] K. Kredo, P. Mohapatra, Medium access control in wireless sensor networks, Comput. Networks (Elsevier), in press, doi:10.1016/j.comnet.2006.06.012.

[68] P. Kulkarni, D. Ganesan, P. Shenoy, The case for multi-tier camera sensor networks, in: Proc. of the ACM Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), Stevenson, WA, USA, June 2005.

[69] P. Kulkarni, D. Ganesan, P. Shenoy, Q. Lu, SensEye: a multi-tier camera sensor network, in: Proc. of ACM Multimedia, Singapore, November 2005.

[70] S.S. Kulkarni, M. Arumugam, TDMA service for sensor networks, in: Proc. of the Intl. Conf. on Distributed Computing Systems Workshops (ICDCSW), Washington, DC, USA, 2004, pp. 604–609.

[71] D. Kundur, T. Zourntos, N.J. Mathai, Lightweight security principles for distributed multimedia based sensor networks, in: Proc. of Asilomar Conf. on Signals, Systems and Computers, November 2004, pp. 368–372.

[72] P. Kyasanur, N.H. Vaidya, Capacity of multi-channel wireless networks: impact of number of channels and interfaces, in: Proc. of ACM Intl. Conf. on Mobile Computing and Networking (MobiCom), Cologne, Germany, August 2005.

[73] Q. Li, D. Rus, Global clock synchronization in sensor networks, IEEE Trans. Comput. 55 (2) (2006) 214–226.

[74] A. Lins, E.F. Nakamura, A.A.F. Loureiro, C.J.N. Coelho Jr., BeanWatcher: a tool to generate multimedia monitoring applications for wireless sensor networks, in: Proc. of IFIP/IEEE Intl. Conf. on Management of Multimedia Networks and Services, 2003, pp. 128–141.

[75] T.D.C. Little, A framework for synchronous delivery of time-dependent multimedia data, J. Multimedia Syst. 1 (2) (1993) 87–94.

[76] S. Lu, V. Bharghavan, R. Srikant, Fair scheduling in wireless packet networks, IEEE/ACM Trans. Network. 7 (4) (1999) 473–489.

[77] X. Lu, E. Erkip, Y. Wang, D. Goodman, Power efficient multimedia communication over wireless channels, IEEE J. Select. Areas Commun. 21 (10) (2003) 1738–1751.

[78] X. Luo, K. Zheng, Y. Pan, Z. Wu, A TCP/IP implementation for wireless sensor networks, in: Proc. of IEEE Conf. on Systems, Man and Cybernetics, October 2004, pp. 6081–6086.

[79] S. Mao, D. Bushmitch, S. Narayanan, S.S. Panwar, MRTP: a multiflow real-time transport protocol for ad hoc networks, IEEE Trans. Multimedia 8 (2) (2006) 356–369.

[80] C.B. Margi, V. Petkov, K. Obraczka, R. Manduchi, Characterizing energy consumption in a visual sensor network testbed, in: Proc. of IEEE/Create-Net Intl. Conf. on Testbeds and Research Infrastructures for the Development of Networks and Communities (TridentCom), Barcelona, Spain, March 2006.

[81] D. McIntire, Energy Benefits of 32-bit Microprocessor Wireless Sensing Systems, Sensoria Corporation White Paper.

[82] L. Meier, P. Blum, L. Thiele, Interval synchronization of drift-constraint clocks in ad-hoc sensor networks, in: Proc. of ACM Intl. Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), Tokyo, Japan, May 2004.

[83] T. Melodia, I.F. Akyildiz, Cross-layer QoS support for UWB wireless multimedia sensor networks, in preparation.

[84] T. Melodia, D. Pompili, I.F. Akyildiz, On the interdependence of distributed topology control and geographical routing in ad hoc and sensor networks, IEEE J. Select. Areas Commun. 23 (3) (2005) 520–532.

[85] T. Melodia, D. Pompili, I.F. Akyildiz, A communication architecture for mobile wireless sensor and actor networks, in: Proc. of IEEE Intl. Conf. on Sensor and Ad-hoc Communications and Networks (SECON), Reston, VA, USA, September 2006.

[86] T. Melodia, D. Pompili, V.C. Gungor, I.F. Akyildiz, A distributed coordination framework for wireless sensor and actor networks, in: Proc. of ACM Intl. Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), Urbana-Champaign, IL, May 2005.

[87] R. Merz, J. Widmer, J.-Y.L. Boudec, B. Radunovic, A joint PHY/MAC architecture for low-radiated power TH-UWB wireless ad-hoc networks, Wireless Commun. Mobile Comput. J. 5 (5) (2005) 567–580.

[88] M.J. Miller, N.H. Vaidya, A MAC protocol to reduce sensor network energy consumption using a wakeup radio, IEEE Trans. Mobile Comput. 4 (3) (2005) 228–242.

[89] A. Mishra, S. Banerjee, W. Arbaugh, Weighted coloring based channel assignment for WLANs, Mobile Comput. Commun. Rev. 9 (3) (2005) 19–31.

[90] S. Misra, M. Reisslein, G. Xue, A survey of multimedia streaming in wireless sensor networks, submitted for publication.

[91] P.D. Mitcheson, T.C. Green, E.M. Yeatman, A.S. Holmes, Architectures for vibration-driven micropower generators, J. Microelectromech. Syst. 13 (3) (2004) 429–440.

[92] M. Miyabayashi, N. Wakamiya, M. Murata, H. Miyahara, MPEG-TFRCP: video transfer with TCP-friendly rate control protocol, in: Proc. of IEEE Intl. Conf. on Communications (ICC), Helsinki, June 2001.

[93] S. Nath, Y. Ke, P.B. Gibbons, B. Karp, S. Seshan, A distributed filtering architecture for multimedia sensors, Intel Research Technical Report IRP-TR-04-16, August 2004.

[94] T. Nguyen, S. Cheung, Multimedia streaming with multiple TCP connections, in: Proc. of the IEEE Intl. Performance Computing and Communications Conference (IPCCC), Arizona, April 2005.

[95] E. Ould-Ahmed-Vall, D. Blough, B.S. Heck, G.F. Riley, Distributed global identification for sensor networks, in: Proc. of IEEE Intl. Conf. on Mobile Ad-hoc and Sensor Systems (MASS), Washington, DC, USA, November 2005.

[96] J. Paradiso, T. Starner, Energy scavenging for mobile and wireless electronics, IEEE Perv. Comput. 4 (1) (2005) 18–27.

[97] M. Perillo, W. Heinzelman, Sensor management policies to provide application QoS, Ad Hoc Networks (Elsevier) 1 (2–3) (2003) 235–246.

[98] B.W. Perry, Ajax Hacks, O'Reilly Publications, 2006.

[99] G.J. Pottie, W.J. Kaiser, Wireless integrated network sensors, Commun. ACM 43 (May) (2000) 51–58.

[100] R. Puri, K. Ramchandran, PRISM: a new robust video coding architecture based on distributed compression principles, in: Proc. of the Allerton Conf. on Communication, Control, and Computing, Allerton, IL, USA, October 2002.

[101] V. Raghunathan, S. Ganeriwal, M. Srivastava, C. Schurgers, Energy efficient wireless packet scheduling and fair queuing, ACM Trans. Embedded Comput. Syst. 3 (1) (2004) 3–23.

[102] M. Rahimi, S. Ahmadian, D. Zats, J. Garcia, M. Srivastava, D. Estrin, Deep vision: experiments in exploiting vision in wireless sensor networks, submitted for publication.

[103] M. Rahimi, R. Baer, O. Iroezi, J. Garcia, J. Warrior, D. Estrin, M. Srivastava, Cyclops: in situ image sensing and interpretation in wireless sensor networks, in: Proc. of the ACM Conf. on Embedded Networked Sensor Systems (SenSys), San Diego, CA, November 2005.

[104] V.C. Raykar, I. Kozintsev, R. Lienhart, Position calibration of microphones and loudspeakers in distributed computing platforms, IEEE Trans. Speech Audio Process. 13 (1) (2005) 70–83.

[105] J. Reed, Introduction to Ultra Wideband Communication Systems, Prentice Hall, Englewood Cliffs, NJ, 2005.

[106] A.A. Reeves, Remote monitoring of patients suffering from early symptoms of dementia, in: Intl. Workshop on Wearable and Implantable Body Sensor Networks, London, UK, April 2005.

[107] A. Rowe, C. Rosenberg, I. Nourbakhsh, A low cost embedded color vision system, in: Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), Lausanne, Switzerland, October 2002.

[108] Y. Sankarasubramaniam, I.F. Akyildiz, S.W. McLaughlin, Energy efficiency based packet size optimization in wireless sensor networks, in: Proc. of IEEE Sensor Network Protocols and Applications (SNPA), Anchorage, Alaska, USA, April 2003, pp. 1–8.

[109] C. Santivanez, I. Stavrakakis, Study of various TDMA schemes for wireless networks in the presence of deadlines and overhead, IEEE J. Select. Areas Commun. 17 (7) (1999) 1284–1304.

[110] L. Savidge, H. Lee, H. Aghajan, A. Goldsmith, QoS-based geographic routing for event-driven image sensor networks, in: Proc. of IEEE/CreateNet Intl. Workshop on Broadband Advanced Sensor Networks (BaseNets), Boston, MA, October 2005.

[111] A. Savvides, L. Girod, M. Srivastava, D. Estrin, Localization in sensor networks, in: C.S. Raghavendra, K.M. Sivalingam, T. Znati (Eds.), Wireless Sensor Networks, Springer, New York, 2005.

[112] M.V.D. Schaar, S. Shankar, Cross-layer wireless multimedia transmission: challenges, principles and new paradigms, IEEE Wireless Commun. 12 (4) (2005) 50–58.

[113] R.A. Scholtz, M.Z. Win, Impulse radio: how it works, IEEE Commun. Lett. 2 (2) (1998) 36–38.

[114] C. Schurgers, V. Tsiatsis, S. Ganeriwal, M. Srivastava, Optimizing sensor networks in the energy-latency-density design space, IEEE Trans. Mobile Comput. 1 (1) (2002) 70–80.

[115] C. Schurgers, V. Tsiatsis, S. Ganeriwal, M. Srivastava, Topology management for sensor networks: exploiting latency and density, in: Proc. of ACM Intl. Conf. on Mobile Computing and Networking (MobiCom), Atlanta, GA, USA, September 2002.

[116] E. Setton, T. Yoo, X. Zhu, A. Goldsmith, B. Girod, Cross-layer design of ad hoc networks for real-time video streaming, IEEE Wireless Commun. 12 (4) (2005) 59–65.

[117] D. Slepian, J. Wolf, Noiseless coding of correlated information sources, IEEE Trans. Informat. Theory 19 (4) (1973) 471–480.

[118] S. Soro, W.B. Heinzelman, On the coverage problem in video-based wireless sensor networks, in: Proc. of the IEEE Intl. Conf. on Broadband Communications, Networks, and Systems (BroadNets), Boston, MA, USA, October 2005.

[119] F. Stann, J. Heidemann, RMST: reliable data transport in sensor networks, in: Proc. of IEEE Sensor Network Protocols and Applications (SNPA), Anchorage, Alaska, USA, April 2003, pp. 102–112.

[120] H. Stockdon, R. Holman, Estimation of wave phase speed and nearshore bathymetry from video imagery, J. Geophys. Res. 105 (C9) (2000) 22015–22033.

[121] B. Sundararaman, U. Buy, A. Kshemkalyani, Clock synchronization for wireless sensor networks: a survey, Ad Hoc Networks (Elsevier) 3 (3) (2005) 281–323.

[122] K. Sundaresan, V. Anantharaman, H.-Y. Hsieh, R. Sivakumar, ATP: a reliable transport protocol for ad hoc networks, IEEE Trans. Mobile Comput. 4 (6) (2005) 588–603.

[123] J. Tourrilhes, Packet frame grouping: improving IP multimedia performance over CSMA/CA, in: Proc. of IEEE Conf. on Universal Personal Communications, Florence, Italy, October 1998.

[124] T. Vu, D. Reschke, W. Horn, Dynamic packet size mechanism (DPSM) for multimedia in wireless networks, in: Proc. of Multimediale Informations- und Kommunikationssysteme (MIK), Erfurt, Germany, September 2002.

[125] M.C. Vuran, O.B. Akan, I.F. Akyildiz, Spatio-temporal correlation: theory and applications for wireless sensor networks, Comput. Networks (Elsevier) 45 (3) (2004) 245–259.

[126] M.C. Vuran, I.F. Akyildiz, Cross-layer analysis of error control in wireless sensor networks, in: Proc. of IEEE Intl. Conf. on Sensor and Ad-hoc Communications and Networks (SECON), Reston, VA, USA, September 2006.

[127] C.Y. Wan, A.T. Campbell, L. Krishnamurthy, PSFQ: a reliable transport protocol for wireless sensor networks, in: Proc. of ACM Workshop on Wireless Sensor Networks and Applications (WSNA), Atlanta, GA, September 2002.

[128] C.Y. Wan, S.B. Eisenman, A.T. Campbell, CODA: congestion detection and avoidance in sensor networks, in: Proc. of the ACM Conf. on Embedded Networked Sensor Systems (SenSys), Los Angeles, CA, November 2003.

[129] M.Z. Win, R.A. Scholtz, Ultra-wide bandwidth time-hopping spread-spectrum impulse radio for wireless multiple-access communication, IEEE Trans. Commun. 48 (4) (2000) 679–689.

[130] A. Wyner, J. Ziv, The rate-distortion function for source coding with side information at the decoder, IEEE Trans. Informat. Theory 22 (January) (1976) 1–10.

[131] Z. Xiong, A.D. Liveris, S. Cheng, Distributed source coding for sensor networks, IEEE Signal Process. Mag. 21 (September) (2004) 80–94.

[132] L. Yang, G.B. Giannakis, Ultra-wideband communications: an idea whose time has come, IEEE Signal Process. Mag. 3 (6) (2004) 26–54.

[133] W. Ye, J. Heidemann, D. Estrin, Medium access control with coordinated, adaptive sleeping for wireless sensor networks, IEEE/ACM Trans. Network. 12 (3) (2004) 493–506.

[134] K. Yeung, 802.11a Modeling and MAC Enhancements for High Speed Rate Adaptive Networks, Technical Report, UCLA, 2002.

[135] W. Yu, Z. Sahinoglu, A. Vetro, Energy-efficient JPEG 2000 image transmission over wireless sensor networks, in: Proc. of IEEE Global Communications Conference (GLOBECOM), Dallas, TX, USA, January 2004, pp. 2738–2743.

[136] X. Zhu, B. Girod, Distributed rate allocation for multi-stream video transmission over ad hoc networks, in: Proc. of IEEE Intl. Conf. on Image Processing (ICIP), Genoa, Italy, September 2005, pp. 157–160.

[137] B. Zitova, J. Flusser, Image registration methods: a survey, Image Vis. Comput. 21 (11) (2003) 977–1000.

[138] M. Zuniga, B. Krishnamachari, Integrating future large-scale sensor networks with the Internet, USC Computer Science Technical Report CS 03-792, 2003.

Ian F. Akyildiz received the B.S., M.S., and Ph.D. degrees in Computer Engineering from the University of Erlangen-Nuernberg, Germany, in 1978, 1981 and 1984, respectively.

Currently, he is the Ken Byers Distinguished Chair Professor with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, and Director of the Broadband and Wireless Networking Laboratory. He is the Editor-in-Chief of the Computer Networks Journal (Elsevier) as well as the founding Editor-in-Chief of the Ad Hoc Networks Journal (Elsevier). His current research interests are in next generation wireless networks, sensor networks, and wireless mesh networks.

He received the "Don Federico Santa Maria Medal" for his services to the Universidad Federico Santa Maria in 1986. From 1989 to 1998, he served as a National Lecturer for ACM and received the ACM Outstanding Distinguished Lecturer Award in 1994. He received the 1997 IEEE Leonard G. Abraham Prize Award (IEEE Communications Society) for his paper entitled "Multimedia Group Synchronization Protocols for Integrated Services Architectures" published in the IEEE Journal on Selected Areas in Communications (JSAC) in January 1996. He received the 2002 IEEE Harry M. Goode Memorial Award (IEEE Computer Society) with the citation "for significant and pioneering contributions to advanced architectures and protocols for wireless and satellite networking". He received the 2003 IEEE Best Tutorial Award (IEEE Communications Society) for his paper entitled "A Survey on Sensor Networks," published in IEEE Communications Magazine in August 2002. He also received the 2003 ACM Sigmobile Outstanding Contribution Award with the citation "for pioneering contributions in the area of mobility and resource management for wireless communication networks". He received the 2004 Georgia Tech Faculty Research Author Award for his "outstanding record of publications of papers between 1999–2003". He also received the 2005 Distinguished Faculty Achievement Award from the School of ECE, Georgia Tech. He has been a Fellow of the Association for Computing Machinery (ACM) and an IEEE Fellow since 1996.

Tommaso Melodia received his Laurea and Doctorate degrees in Telecommunications Engineering from the University of Rome "La Sapienza", Rome, Italy, in 2001 and 2005, respectively. He is currently pursuing his Ph.D. in Electrical and Computer Engineering while working as a research assistant at the Broadband and Wireless Networking Laboratory (BWN-Lab), Georgia Institute of Technology, Atlanta, under the guidance of Dr. Ian F. Akyildiz. His main research interests are in wireless sensor and actor networks, wireless multimedia sensor networks, underwater acoustic sensor networks, and wireless networking in general. He is the recipient of the BWN-Lab Researcher of the Year 2004 award.

Kaushik R. Chowdhury received his B.E. degree in Electronics Engineering with distinction from VJTI, Mumbai University, India, in 2003. He received his M.S. degree in computer science from the University of Cincinnati, OH, in 2006, graduating with the best thesis award. He is currently a Research Assistant in the Broadband and Wireless Networking Laboratory and is pursuing his Ph.D. degree at the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA. His current research interests include multichannel medium access protocols, dynamic spectrum management, and resource allocation in wireless multimedia sensor networks. He is a student member of the IEEE.