
CHAPTER 9

MULTIMEDIA STREAMING IN MOBILE WIRELESS NETWORKS

SANJEEV VERMA
Nokia Research Center
Tampere, Finland

MUHAMMAD MUKARRAM BIN TARIQ
DoCoMo Communication Laboratories USA, Inc.
San Jose, California

TAKESHI YOSHIMURA
Multimedia Laboratories, NTT DoCoMo, Inc.
Yokosuka, Kanagawa, Japan

TAO WU
Nokia Research Center
Burlington, Massachusetts

9.1 INTRODUCTION

Multimedia services, such as streaming applications, are growing in popularity with

advances in compression technology, high-bandwidth storage devices, and high-

speed access networks. Streaming services are generally used in applications like

multimedia information and message retrieval, video on demand, and pay TV.

Also, there has been growing popularity of portable devices, such as notebook com-

puters, PDAs, and mobile phones in recent years. Now it is possible to provide very

high-speed access to portable devices with emerging technologies like WLAN and

3G networks. For instance, emerging 3G wireless technologies provide data rates of

144 kbps for vehicular, 384 kbps for pedestrian, and 2 Mbps for indoor environ-

ments [1,2]. Hence, it is now possible to enrich the end user’s experience by com-

bining multimedia services [3,4] with mobile-specific services such as geographic

positioning, user profiling, and mobile payment. One example of such a service

is “mobile cinema ticketing,” which uses geographic positioning and user-defined

Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.


preferences to offer a mobile user a selection of movies from nearby movie theatres.

A user views corresponding movie trailers through a streaming service before select-

ing a movie and purchasing a ticket.

Streaming services are services in which continuous video and audio data are

delivered to an end user. A multimedia streaming service consists of one or more

media streams. A multimedia streaming application may have both audio and

video components (e.g., news reviews, movie trailers) or it may have audio stream-

ing with a visual presentation comprising still images and/or graphics animations,

such as a quarterly Webcast of earnings by corporations. These applications are gen-

erally stored at a Web-based server and streamed to clients on request. Streaming

audio/video clips are typically so large that their transmission time (several minutes or more) exceeds the acceptable playback latency. Hence, downloading the entire audio/video content before playback is not an option. Streaming audio/video clips are instead played out while parts of the clips are still being received and decoded. This is the biggest advantage of a streaming service, since a

user is able to see video soon after downloading begins.

Figure 9.1 illustrates a general architecture for providing streaming services [5].

The multimedia content for streaming services is created from one or more media

sources (video camera, microphone, etc.). It can also be created synthetically

without using any natural media source. Examples of synthetically generated multi-

media content are computer-generated graphics and digitally generated music.

Typically, the storage space required for raw multimedia content can be huge.

The multimedia content is digitally edited and compressed in order to provide attrac-

tive multimedia retrieval services over low-speed modem connections. The edited

Figure 9.1 A general architecture designed to provide streaming services.


and compressed multimedia clips are then stored in storage devices at the server. On

receiving a request from the client, the streaming server retrieves the compressed

multimedia clip from storage devices and the application layer QoS module

adapts the multimedia stream based on the QoS feedback at the application layer.

After adaptation at the application layer, transport protocols packetize the com-

pressed multimedia clips and send them over the Internet. The packets may suffer

losses and accumulate delay jitter while traversing the Internet. To further

improve the QoS, continuous media distribution services (e.g., caching) may be

deployed in the Internet. The successfully delivered media packets are decom-

pressed and decoded at the client end. Compensation or playout buffers are deployed

at the terminal end to mitigate the impact of delay jitter in the Internet and to achieve

seamless QoS. Clients also use media synchronization mechanisms to achieve syn-

chronization across different media streams, for example, between audio and video

streams.

There are several challenges in providing streaming services in wireless environ-

ments due to some issues that are specific to these environments (see Fig. 9.2). For

example, wireless terminals typically have power constraints due to battery power.

Also, they have limited buffering and processing power available due to size and

power constraints. In addition, wireless environments are very harsh. The character-

istics of a wireless channel have a very unpredictable time-varying behavior due to

several factors such as interference, multipath fading, and atmospheric conditions.

This results in more delay jitter, more delay, and higher error rates, compared to

those in wired networks. Moreover, the movement of a mobile user

from one cell to another cell introduces additional uncertainty. The movement trig-

gers a handoff mechanism to minimize interruption to an ongoing session. The wire-

less channel characteristics may be entirely different in a new cell after handoff. The

access point (typically a basestation) of the mobile host to the wired network also

changes after the handoff. This results in the establishment of an entirely new route

in the wired network. The new route in the fixed network may have very different

path characteristics. This problem becomes even more severe as wireless networks

Figure 9.2 Constraints in wireless environments: mobile terminals have limited resources (power constraints, limited storage, limited processing), and wireless environments are harsh (high error rate, large and variable delay, expensive spectrum).


are being implemented using smaller cell sizes (microcells) to allow higher system

capacity. Microcell implementation results in rapid handoff rates, causing even

wider variation in path characteristics. These issues have implications for

providing streaming services in mobile environments. Streaming architecture in

wireless/mobile environments should ensure minimum processing at the mobile

terminal end. For instance, the typical approach of streaming applications, adapting QoS at the application layer, may not be suitable in wireless

environments. Adaptation at the application layer involves a lot of end-to-end sig-

naling, which may eat away precious resources at the terminal end. Also, it is

very difficult for mobile terminals with very limited processing and buffering capa-

bility to adapt at the application layer. The wireless network should have built-in

networkwide mechanisms to minimize the resource and processing requirements

at the mobile terminals. The overall design goal of wireless access architecture

should be to “make networks friendly to applications” rather than “make appli-

cations friendly to networks.”

In the remainder of this chapter, we will describe the different components and

protocols that are constituents of the streaming architecture. First, we go over

various QoS issues to support streaming services in general. We then give an over-

view of various codecs and media types that constitute an important component of

multimedia streaming architecture. Next, we describe a general architecture to

implement streaming services in mobile environments. We first review the different

architectural components that support these services in wireless/mobile environ-

ments. In subsequent sections, we give an overview of key protocols and languages

used for streaming multimedia delivery and provide an overview of their working

and example usage. We then describe packet-switched streaming service architec-

ture developed by 3GPP (referred to as 3GPP-PSS), since it is the most mature stan-

dardization activity in this field. Most likely, the 3GPP2 architectural solution

will also be along similar lines. Next, we discuss research issues and related work in

providing multimedia services in mobile and wireless environments. Finally, we

summarize and look into the future trends in supporting multimedia services in

broadband wireless access networks.

9.2 QOS ISSUES FOR STREAMING APPLICATIONS

Streaming applications are real-time noninteractive applications. Also, they involve

one-way delivery of streaming data from the server to the client. Because of their

real-time nature, these applications typically have bandwidth, delay jitter, and

loss requirements. We first discuss the QoS parameters that are particularly important for streaming applications and then the QoS control mechanisms at the

application and lower layers.

Delay jitter [6] is particularly important for these applications. The delay jitter

bound for a session is calculated as the difference between the largest and smallest

delays incurred by packets belonging to the session. A client (receiver) should

choose playback instants so that, when it is ready to output the information

contained in the packet, the packet has already arrived. If the delay jitter over a


network connection is bounded, the receiver can eliminate delay variations in the

network by buffering packets for up to the delay jitter bound in a playout or compensation buffer (see Fig. 9.3).

Subsequent packets are then scheduled for playout according to the rate at which they were generated at the sender. Packets that arrive earlier than their scheduled playout time wait in the playout buffer. Thus, the larger the delay

jitter bound, the larger the playout buffer required at the receiver to maintain con-

stant quality. For a given delay jitter bound, the required playout buffer size is the

product of the delay jitter bound and the playback rate. For example, a delay jitter bound of 2 seconds at a playback rate of 384 kbps calls for roughly 96 KB of playout buffer.

Figure 9.3 Client-side buffering: playout delay compensates for network-induced delay jitter.

Figure 9.4 Delay jitter removal at the client end.

Figure 9.4 illustrates the removal of delay jitter at a client. Although the output rate of the server could vary with time, for simplicity we assume that the server generates packets at a constant rate with an equal spacing of I seconds. The receiver delays the first packet by the delay jitter bound J and then plays out packets with the same spacing with which they were generated. Suppose that the first packet arrives at the receiver d_1 seconds after its transmission and is further delayed by an amount equal to the delay jitter bound J in the playout buffer. The kth packet is generated after B = kI seconds, and this packet will incur a delay between d_k seconds (the fixed delay, mainly propagation delay) and (d_k + J) seconds. Since the client plays back the packets with the same spacing as when they were generated, the kth packet will be scheduled for playout at (d_1 + J + B) seconds. Since d_1 ≥ d_k, the latest arrival time of the kth packet, (d_k + J + B), is guaranteed to be before the scheduled time. Thus, by delaying packets in the playout buffer for the delay jitter bound, the receiver can eliminate jitter in the arrival stream and guarantee that a packet has already arrived by the time the client is ready to play it.
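To make the playout rule concrete, the following sketch (a minimal illustration in Python; all names are ours, not part of any standard) computes the playout instants implied by the discussion above, with all times measured from the transmission instant of the first packet:

    def playout_instants(num_packets, d1, jitter_bound_j, spacing_i):
        # Packet k is transmitted at k*I; it is played at d1 + J + k*I, i.e.,
        # the first packet is held for the jitter bound J and later packets
        # keep their original spacing I.
        return [d1 + jitter_bound_j + k * spacing_i for k in range(num_packets)]

    def arrives_in_time(delay_k, d1, jitter_bound_j):
        # Packet k arrives at k*I + delay_k, so it is on time exactly when its
        # delay does not exceed d1 + J; this always holds when the jitter is
        # bounded by J and d1 >= dk.
        return delay_k <= d1 + jitter_bound_j

As long as every packet's delay lies between d_k and d_k + J and d_1 ≥ d_k, arrives_in_time never returns False, which is precisely the guarantee derived above.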

Note that the playout buffer is useful only to absorb short-term delay

variations. The more data are initially buffered, the wider are the variations that

can be absorbed, but higher startup playback latency is experienced at the client

end. The maximum allowable buffering is determined by the acceptable delay

latency.

Another important QoS parameter for a streaming application is the error rate.

Although streaming applications can tolerate some loss, the error rate beyond a

threshold can degrade the quality of the delivered streaming data significantly. To

maintain reasonably good quality of played-back stream, a proper error control

mechanism is needed to recover packets before their scheduled playback time.

The well-known techniques to minimize error for streaming traffic are FEC, inter-

leaving, and redundant retransmissions. In addition, the lost packets can be recov-

ered through limited retransmissions. This necessitates buffering at the client end

to allow for retransmissions.

Now we look into the specific QoS control mechanisms at the application and

lower layers to achieve the QoS needs of multimedia streaming applications.

9.2.1 Application Layer QoS Control

The goal of the application layer QoS control is to adapt at the application layer in

order to provide acceptable quality streaming service to the end user in the presence

of packet loss and congestion in the network. We note here that the Internet in its

current form is a best-effort network and does not provide networkwide QoS

support. Thus the available bandwidth is not known in advance and varies with

time. The packets may suffer variable delay and come out of order at the client

end. Clients need to adapt at the application layer in order to receive good-quality

streaming service. The application layer QoS control techniques include end-

to-end congestion and error control. These techniques are employed by the end

systems and do not assume any support from the network.


9.2.1.1 Congestion Control and Quality Adaptation

The Internet in its very rudimentary form provides a transport network that delivers

packets from one point to another. It provides a shared environment, and its stability

depends on the end systems implementing appropriate congestion control algor-

ithms. The end-to-end congestion control algorithms help to reduce packet loss

and delay in the network. Unfortunately, streaming applications cannot readily implement such end-to-end congestion control algorithms, since stored multimedia has an intrinsic transmission rate. Streaming applications are rate-based: they transmit data at a near-constant rate, or loosely adjust their transmission rate on long timescales, because the rapid rate fluctuations required of a well-behaved flow are incompatible with their nature. For streaming applications, congestion

control takes the form of rate control that attempts to minimize the possibility of con-

gestion by matching the rate of streaming media to the available network bandwidth.

A vast majority of the Internet applications implement TCP-based congestion

control that uses the additive increase, multiplicative decrease (AIMD) algorithm.

Under this algorithm, the transmission rate is increased linearly until a packet loss signals congestion, at which point a multiplicative decrease is performed. TCP, as it

is, is not appropriate for delay-sensitive applications such as streaming. To ensure

fairness and efficient utilization of network resources, rate control algorithms for

streaming applications should be “TCP-friendly” [7–9]. This means that a stream-

ing application sharing the same path with a TCP flow should obtain the same

average throughput during a session. A number of model-based TCP-friendly rate

control mechanisms [10] have been proposed for streaming applications. These

mechanisms are based on the mathematical models that relate the throughput of a

typical TCP connection to the network parameters [7]:

\lambda = \frac{1.22 \times \mathrm{MTU}}{\mathrm{RTT} \times \sqrt{p}} \qquad (9.1)

where

\lambda = throughput of the TCP connection
MTU = maximum transmission unit, i.e., the maximum packet size used by the connection
RTT = roundtrip time for the connection
p = packet loss rate experienced by the connection

Under the model-based approach, the streaming server uses Equation (9.1) to deter-

mine the sending rate of the streamed media to behave in a TCP-friendly manner.

The source basically regulates the rate of the streamed media according to the feed-

back information of the network. This can be used for both unicast and multicast

scenarios. However, a source-rate-based control scheme is not suitable in hetero-

geneous network environments, where receivers have heterogeneous network

capacity and processing power.
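As an illustration of the model-based approach, the following sketch (Python; a minimal illustration under our own naming, not any standard API) computes the TCP-friendly sending rate of Equation (9.1) from measured network feedback:

    import math

    def tcp_friendly_rate(mtu_bytes, rtt_seconds, loss_rate):
        # Equation (9.1): lambda = 1.22 * MTU / (RTT * sqrt(p)); the result
        # is in bytes per second when MTU is in bytes and RTT in seconds.
        return 1.22 * mtu_bytes / (rtt_seconds * math.sqrt(loss_rate))

    # Example: a 1500-byte MTU, a 200-ms RTT, and 1% packet loss give about
    # 91,500 bytes/s (roughly 730 kbps) as the TCP-friendly sending rate.
    rate = tcp_friendly_rate(1500, 0.2, 0.01)

The server would reevaluate this rate as new RTT and loss estimates arrive in receiver feedback and adjust the rate of the streamed media accordingly.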

Receiver-based rate control [11,12] has been found to be a better rate control mechanism in heterogeneous network environments. Under this mechanism, receivers


regulate the receiving rate of streaming media by adding or dropping channels

without any rate regulation from the source end. This is targeted toward scenarios

where the source multicasts layered video with several layers. The basic scheme

works as follows:

1. When no congestion is detected, a receiver joins or adds a layer or channel that

results in an increase of its receiving rate. If the addition of a channel does not cause any congestion, then the join experiment is deemed successful. Otherwise, the

receiver drops the added layer or channel.

2. If congestion is detected, the receiver drops the low-priority layer or channel

(enhancement channel).

Alternatively, an architecture may use both source and receiver-based control mech-

anisms [13] in which receivers regulate the receiving rate of streaming media by

adding or dropping channels, while the sender also adjusts the transmission rate

of each channel according to the feedback from the receivers.
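The following sketch (Python; the class and method names are our own illustration) captures the receiver-driven join/drop logic described above:

    class LayeredReceiver:
        def __init__(self, max_layers):
            self.max_layers = max_layers
            self.subscribed = 1  # always keep the base layer

        def on_no_congestion(self):
            # Probe for spare capacity with a "join experiment": subscribe
            # to one more enhancement layer or channel.
            if self.subscribed < self.max_layers:
                self.subscribed += 1

        def on_congestion(self):
            # Shed the lowest-priority (topmost enhancement) layer; this
            # also rolls back a join experiment that caused congestion.
            if self.subscribed > 1:
                self.subscribed -= 1

In the hybrid source/receiver scheme, the sender would additionally adjust the transmission rate of each layer based on aggregated receiver feedback.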

One of the main challenges in delivering streaming media to a client is to adjust

to variations in network bandwidth while delivering acceptable-quality streaming

media to the receiver. As discussed before, short-term variations in bandwidth can

be handled by providing a playout or compensation buffer at the receiver. When the available bandwidth exceeds the playback rate at the receiver, the spare data are stored in the playout buffer; when the available bandwidth falls below the rate required to maintain constant quality, the deficit is supplied from the data accumulated in the playout buffer (see Fig. 9.5). However, the bandwidth variations for a

long-lived session can be very large and random. This may cause the client’s

buffer to either underflow or overflow. The buffer underflow is particularly undesir-

able since it causes interruption of service at the client’s end.

Figure 9.5 Short-term quality adaptation at the client: in the filling phase, bandwidth in excess of the playback rate is stored in the playout buffer; in the draining phase, the deficit is supplied from the buffer.

Rate control

mechanisms discussed in preceding paragraphs are one way to tackle quality adap-

tation due to long-term variations in network bandwidth. Alternative mechanisms

are adaptive encoding and switching between multiple encoded versions. Under

the adaptive encoding mechanism, the server adjusts the resolution of the encoding by requantizing based on network feedback. However, this task is very CPU-intensive and does not scale to a large number of clients. Also, once the stream-

ing data are compressed and stored, encoders cannot change the output rate over a

wide range. In another alternative scheme, a server maintains several versions of

media streams, each with different qualities. As available bandwidth in the

network changes, the server dynamically switches between low- and high-quality

media streams as appropriate.
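A minimal sketch of this stream-switching alternative (Python; the function and its parameters are our own illustration):

    def pick_version(version_rates_kbps, available_kbps):
        # Choose the highest-rate stored encoding that fits within the
        # currently available bandwidth; fall back to the lowest-rate one.
        fitting = [r for r in sorted(version_rates_kbps) if r <= available_kbps]
        return fitting[-1] if fitting else min(version_rates_kbps)

    # Example: with versions at 64, 128, and 384 kbps and 200 kbps available,
    # the server streams the 128-kbps version.
    chosen = pick_version([64, 128, 384], 200)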

Hence, quality adaptation under short-term bandwidth variations is achieved through the playout/compensation buffer at the client end, while quality adaptation under long-term, wide bandwidth variations is achieved through appropriate rate control mechanisms at both the client and server ends.

9.2.1.2 Error Control

As previously mentioned, streaming media can tolerate errors as long as the error rate

remains within an acceptable limit. The error rate is particularly important in wire-

less environments that have very high error rates. Moreover, errors tend to happen in

bursts in these environments. Well-known techniques to minimize the error for

streaming traffic are FEC (forward error correction), error-resilient encoding,

error-concealment, and retransmissions. The FEC technique adds redundant infor-

mation to the original packet in order to recover the packet in the presence of

error. Error-resilient encoding is a preventive technique that enhances the robustness

of streaming media in the presence of packet loss. The well-known error-resilient

encoding schemes are resynchronization marking, data partitioning, and data recov-

ery. These are particularly effective in wireless environments. Another promising

error-resilient encoding scheme is multiple description coding (MDC) [14], where

raw video data are encoded into a number of streams (or descriptions): each descrip-

tion provides an acceptable quality. If a client gets only one description, it should

also be able to reconstruct video with reasonably good quality. However, the recei-

ver can construct better-quality video if it gets more than one description. Error con-

cealment techniques, on the other hand, adopt a reactive approach and aim to

conceal lost packets and make the presentation less displeasing to human eyes.
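To make the FEC idea concrete, the sketch below (Python) implements the simplest possible scheme: a single XOR parity packet per group of equal-length media packets, which can repair exactly one loss per group. Production systems use more capable codes (e.g., Reed–Solomon), so treat this only as an illustration:

    from functools import reduce

    def xor_bytes(a, b):
        # XOR two equal-length byte strings.
        return bytes(x ^ y for x, y in zip(a, b))

    def parity_packet(packets):
        # Redundant packet transmitted alongside the group.
        return reduce(xor_bytes, packets)

    def recover_one_loss(received, parity):
        # 'received' holds the group with at most one entry set to None.
        missing = [i for i, p in enumerate(received) if p is None]
        if len(missing) > 1:
            raise ValueError("XOR parity can repair only one loss per group")
        if missing:
            present = [p for p in received if p is not None]
            received[missing[0]] = reduce(xor_bytes, present + [parity])
        return received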

Packet retransmission techniques [15] are considered very effective in wireless

environments because of the bursty nature of wireless channels. In general, packet

retransmission is not deemed very suitable for real-time applications such as

video because of retransmission delay. However, the retransmission may be

allowed, especially for high-priority packets, if there is sufficient delay until the

scheduled playback time of the packet considered. Clients may request the retrans-

missions of only those high-priority packets that have sufficient retransmission delay

budget. We explain this concept as follows. For simplicity, we assume that the


server is generating packets at a constant frame rate (say, every T seconds). We

introduce the following notations:

P_n = playback time of the nth packet
T_n = arrival time of the nth packet
T = interframe time
RTT = estimated roundtrip time
T_d = loss detection delay
T_r = retransmission delay
T_c = current time

Thus the scheduled playback time of the kth frame is (P_0 + kT), where P_0 is the playback time of the 0th frame. Now, if the current time is T_c, the delay budget before the scheduled playback time of the kth frame is given by

\text{delay budget} = (P_0 + kT) - T_c \qquad (9.2)

This delay budget should be sufficient to allow retransmission of the frame from the

server taking into account loss detection delay, estimated roundtrip delay, and

retransmission time. The client should send the retransmission request to the

server only if the following condition is satisfied:

T_d + \mathrm{RTT} + T_r \le \text{delay budget} \qquad (9.3)

The objective here is to avoid unnecessary retransmissions that will not arrive in

time for display.
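The test of Equations (9.2) and (9.3) translates directly into code; the sketch below (Python, with our own naming) decides whether requesting a retransmission of frame k is worthwhile:

    def should_request_retransmission(p0, k, t_frame, t_now, t_detect, rtt, t_retx):
        # Equation (9.2): time remaining until frame k's playback instant.
        delay_budget = (p0 + k * t_frame) - t_now
        # Equation (9.3): request only if detection, roundtrip, and
        # retransmission delays all fit within the remaining budget.
        return t_detect + rtt + t_retx <= delay_budget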

9.2.2 Network Layer QoS Control

The previous discussions on QoS control at the application layer for streaming ser-

vices assume no support from network whatsoever. The QoS support at the network

layer and below complements the QoS mechanisms at application layer and reduces

the signaling and processing load at higher layers.

Providing QoS in the Internet is inherently a difficult problem due to its connection-

less nature. However, a number of proposals have been made in the IETF to provide some

sort of QoS support in the Internet. Currently, there are two approaches, notably Inte-

grated Services (IntServ[16]) and Differentiated Services (DiffServ[17]), standardized

by IETF to provide QoS support in the Internet. The IntServ model provides per flow

QoS guarantees. A flow is defined as a stream of packets between two end nodes with

the same tuple of source address, destination address, source port number, and destina-

tion port number. The IntServ model consists of four functional blocks: end-to-end sig-

naling protocol, call admission control at the edge, packet classifier at the edge, and

packet scheduler at every network element in the path. RSVP [18] is the proposed sig-

naling protocol that carries the reservation requests to all the routers in the path. Underlying


IP routing protocols determine the path, and RSVP signaling is used to reserve

resources along the selected path. Keeping in mind the dynamic nature of IP routing

protocols, the soft-state approach is utilized to reserve resources. Though IntServ pro-

vides excellent QoS model, it suffers from scalability problem. Network elements need

to maintain a per flow state to provide per flow QoS guarantees. This can introduce

scalability problems, particularly in backbone networks that support tens of thousands

of flows. The DiffServ QoS model is another approach; it provides a scalable solution and does not require any signaling support. Unlike the IntServ model, it does not

provide per flow QoS guarantees. Under this model, routers simply implement a suite

of prioritylike scheduling and buffering mechanisms and apply them to IP packets

based on the DS-field in the packet headers. The service that an individual flow gets

is determined by the traffic characteristics of the other flows (cross-traffic) sharing

the same service class. The lack of networkwide control implies that, on overload in

a given service class, all flows in that class suffer a degradation of service. DiffServ

tries to give soft QoS guarantees to flows by using a combination of provisioning,

service-level agreements, and per hop behavior implementations. For this purpose,

networkwide mechanisms are deployed in the network. Bandwidth broker (BB) is

one approach to resource provisioning within a DiffServ domain. The BB is the resource

manager within the DiffServ domain that keeps track of available resources and

topology information for a domain. BB uses COPS (common open policy service)

protocol [19] to interact with routers inside the domain.

9.3 STREAMING MEDIA CODECS

Standardized video coding and decoding methods, such as H.263 by ITU-T and MPEG-

4 by ISO, are expected to be supported by a wide range of mobile terminals and net-

works. For audio-only content, MPEG-4 AAC is an appealing candidate for its superior

coding efficiency, while MP3 is also likely to be supported because of its popularity on

the Internet. Some mobile terminals may also support proprietary codecs and file

formats, such as those developed by Apple Computer, Microsoft, and Real Networks.

9.3.1 Video Compression

Video compression in mobile networks is usually lossy compression that exploits

temporal and spatial redundancy within the video streams. Specifically, motion esti-

mation and compensation are widely used between consecutive video frames to

reduce temporal redundancy. Within a frame, block-based transforms such as

DCT (discrete-cosine transform) are performed to reduce spatial redundancy. In

MPEG, for example, one can encode a video frame into one of the following

types of encoded pictures [20]:

. I-picture (I ¼ intraframe). I-pictures are encoded using intraframe information

only, independently of other frames. In other words, I-pictures exploit spatial

redundancy only.


. P-picture (P ¼ interframe prediction). P-pictures are encoded using the most

recent-I-picture or P-picture as a reference.

. B-picture (B ¼ bidirectional prediction). B-pictures are encoded using P-pic-

tures and/or I-pictures both in the past and in the future as references.

A video stream composed of I-pictures allows for flexible random access and

high editability, but its compression ratio is relatively poor. P-pictures and B-pic-

tures substantially improve compression efficiency at the cost of increased manipu-

lation difficulty (random access, editability, etc.) and in the case of B-pictures,

coding delay. Hence, an MPEG video stream often consists of a sequence of

pictures of all three types (e.g., I B B P B B P B B I B) to strike a good balance

among different aspects of performance and usability. In addition, MPEG-4 also

allows encoding of arbitrarily shaped objects in order to provide content-based

interactivity [21].

The mobile environment that we consider in this chapter brings some specific

requirements for video compression. For example, wireless channel errors can

lead to loss of synchronization because video encoders often use variable-length

coding (VLC), and forward error correction (FEC) codes are not very effective in

correcting burst errors. Toward this end, error resilience and concealment tech-

niques that minimize the effect of channel errors are important in providing graceful

service degradation [22]. Furthermore, many mobile terminals have limited CPU,

memory, and battery power resources; thus controlling decoder complexity is

important for these terminals.

9.3.2 Audio Compression

Besides the speech codec used for voice services, general audio compression is

needed for high-quality audio services such as music delivery. General audio

coders typically generate higher bit rates than do speech coders since they

cannot rely on a specific audio production model as speech coders do with the

human vocal tract model. Additionally, while a speech coder’s emphasis is intelligibility, an audio codec may need to provide higher signal fidelity in streaming media

services.

At high bit rates, an audio codec strives to preserve the original signal

waveform [23]. Higher compression can be achieved by taking advantage of the

human auditory model, so that signal components to which the human ear is not sensitive can be compressed more aggressively. More details on these techniques can be found in,

for example, the article by Poll [23].

9.3.3 Codecs Used in 3GPP

As an example, Table 9.1 lists required or recommended decoders in 3GPP [24].

Figure 9.6 illustrates general client functional components for streaming media

service in 3GPP [24].


TABLE 9.1 Codec Standards Used in 3GPP

Service            Decoder Requirements or Recommendations
Speech             AMR
Audio              MPEG-4 AAC low complexity
Synthetic audio    Scalable polyphony MIDI
Video              H.263 profile 0 level 10 mandatory; MPEG-4 visual simple profile optional
Still images       JPEG
Bitmap graphics    GIF, PNG
Vector graphics    SVG Tiny profile

Figure 9.6 Functional components of a 3GPP packet-switched streaming service (PSS) client.


9.4 END-TO-END ARCHITECTURE DESIGNED TO PROVIDE STREAMING SERVICES IN WIRELESS ENVIRONMENTS

Streaming multimedia is characterized by an application rendering audio, video, or

other media in a continuous way while part of the media is still being transmitted to

the application over a data network. Streaming multimedia is a little different from

conversational multimedia, which involves (usually bidirectional) conversation

between multiple parties. Although the type of the media (media encoding) used

for both streaming and conversational multimedia communication may be the

same, conversational multimedia usually has more stringent requirements on end-

to-end delay between the parties. Also, streaming multimedia is usually a client

server application and the media usually flow in only one direction (from server

to the client), whereas conversational multimedia, such as interactive videoconfer-

encing, is usually peer-to-peer, and the media (often) flow among all peers. If you

feel confused by this description, don’t worry; later in the chapter we will describe

streaming multimedia in more detail.

In previous sections we saw how streaming media applications process (through

decoding, error correction, buffering and scheduling) media data to compensate for

delay jitter and packet loss incurred over the network and ensure a smooth rendering.

Here we will discuss important logical components needed to enable streaming

service in mobile or wireless networks and the interrelationships between these

logical components required to form a complete streaming multimedia delivery

system. Our main focus is packet-based streaming systems. We will start with a dis-

cussion on logical layout and components for such a system. In subsequent sections,

we will shift focus to different protocols and languages used for streaming multime-

dia delivery and provide an overview of their working and example usage.

9.4.1 Logical Streaming Multimedia Architecture

Streaming multimedia architecture (Fig. 9.7) consists of the following basic components:

1. A streaming server that sends media as a continuous stream over a data

network. The server is often referred to as the origin server, to distinguish

it from intermediary (proxy or caching) servers.

2. A data network that transports media from the server to the client application.

3. A client application capable of receiving, processing, and rendering a continuous

stream of media in a smooth manner.

4. Protocols that are understood amongst the components and allow them to

talk with each other. The protocols provide various functionalities, including

allowing the client to establish a streaming multimedia session with the server,

facilitating delivery of media from the server to the network and from the

network to the client, understanding the content of media stream for correct

processing at the client application (encoding and packaging), and allowing

interaction with the servers to manipulate the media streams.


Besides the basic components and functionalities listed above, a multimedia deliv-

ery system often contains additional components, functionalities and protocols to

improve various aspects of multimedia delivery. These may include the following:

1. Proxy Servers. Proxy servers provide functionality similar to that of a server

from the client’s perspective. Proxy servers are often transparent to the appli-

cation; however, certain streaming media protocols explicitly provide for the

existence of proxies [25]. Proxy servers may be present to process client

requests locally or relay the requests to some other server (after performing

some optional local processing). If the target is to serve multimedia session

requests locally, then a cache of streaming media content usually accompanies

the proxy server. On receiving a request, the proxy server determines whether

the desired content is available in the cache; if so, the content can be delivered

locally; otherwise the proxy server relays the request to some other servers.

2. Caching Servers. Caching servers are local repositories of content. As in the

case of static Web object (e.g., images and Webpages), it is advantageous to

store local copies of content and serve user requests locally. This not only

eliminates the delays incurred due to topological distance of the origin

server from the client application but also results in traffic localization and

better utilization of network resources. There are several well-known

methods for populating caches with content, but they can be broadly classified

in two categories:

Passive Caching. Here only the content delivered by origin or upstream

servers in response to regional client application requests is stored at the

cache server. Local storage in this method is often a promiscuous

process and the cache server belonging to this category is often

termed simply a “cache.”

Proactive Caching. Here the content is proactively stored on the cache

server by some external mechanism. Often entire or large portions of

the content on a server may be replicated onto a caching server. In

Figure 9.7 Basic streaming media architecture: a streaming media client sends a streaming media request across the network to a streaming media server, which returns the streaming media.


this case the term surrogate server is sometimes used for the caching

server.

3. Additional Protocols. Additional protocols may include

Protocols for capability exchange between the client application and server,

so as to allow the server to transmit appropriate data.

Protocols for QoS feedback from client application to the server, enabling

the server to adapt the transmission (if possible).

Protocols and languages for (time and space) synchronization of multiple

multimedia streams.

Protocols and mechanisms for request routing to best available surrogate or

caching for a given client request. We will not discuss request routing

any further in this chapter. An overview of a multitude of request

routing methods can be found in the report by Barbir et al. [26].

4. Other Miscellaneous Components. A real-life deployment of a streaming mul-

timedia delivery system will rely on more than just the abovementioned com-

ponents (see example components in Fig. 9.8). Functionalities such as

authentication, authorization and accounting (AAA) often require additional

architectural support. Similarly, ensuring digital rights management (DRM)

may require additional functionality from client application and also from

the server and the content creation process. In certain scenarios dedicated

components may be present to provide QoS adaptation and feedback.

Standards for streaming media consist of a wide array of protocols, description

languages, and media coding techniques. These standards have been developed

and standardized at various standardization organizations, such as, Internet Engin-

eering Task Force IETF, ISO, Third-Generation Partnership Project, and World

Wide Web Consortium (W3C).

Figure 9.8 Some components of a typical streaming media architecture: a streaming media server and proxy caches, proxy-based request redirection to appropriate servers in a CDN setup, streaming media negotiation, and QoS and other feedback from the streaming media client.


9.5 PROTOCOLS FOR STREAMING MEDIA

A streaming multimedia delivery system involves a number of protocols (see

Fig. 9.9) to deal with the different aspects of streaming media. The protocols

provide a common dialect through which different components in the architecture

can talk with each other. These protocols can be classified in two broad categories:

(1) session control and (2) media transport protocols. In most contemporary multi-

media streaming setups, separate logical channels are used for session control and

media transport. In some cases, however, most notably HTTP and RTSP tunneling,

the same logical channel is used for both session control and media transport. Con-

sequently, certain protocols provide functionalities that span more than one aspect of

multimedia streaming, and we cannot draw a hard boundary. We will discuss these

as well, but let’s first see what functionalities are expected out of the two main cat-

egories of the protocols.

9.5.1 Protocols and Languages for Streaming Media Session Control

Streaming multimedia often have a notion of (prolonged) association between

multiple components, for example, between the client application and the server;

this association is termed a session.

Figure 9.9 Protocols used in a typical streaming session: the client fetches a session description (SDP) from a Web server with an HTTP GET, controls the media server through RTSP signaling (SETUP, PLAY, PAUSE, CLOSE), and receives RTP audio and video streams accompanied by RTCP feedback.


Session control and establishment usually includes identifying the parties (the

client and server applications) involved in the session and the agreement or the

announcement of different session parameters. For IP-based environments the

parties are often identified by their transport layer address (IP address and port

number). Multimedia streaming sessions often have a rich set of parameters, the

most important of which are the types of encoding of media that will later flow

from the sender (server application) to the recipient (client application). These par-

ameters allow the application on the recipient to process and render the media cor-

rectly. Different session control protocols provide varying degrees of functionality,

but all of them provide minimal functionality for basic session control: session

setup, teardown and establishment of other session parameters.

Examples of session control protocols include the real-time streaming control

protocol (RTSP) [25], the session announcement protocol (SAP) [27], the session

description protocol (SDP) [28], the session initiation protocol (SIP) [29], and

ITU-T’s H.323 [30].

RTSP is the dominant session control protocol for client–server streaming multi-

media application and is defined in RFC 2326 [25]. In this section you will find a

brief tutorial on RTSP and its use; however, it is by no means a complete description

of RTSP. In the following section we will describe RTSP in detail and briefly over-

view the other protocols in this realm.

9.5.1.1 Real-Time Streaming Protocol

RTSP is an application-level client–server protocol that provides the functionality

needed to establish and control a streaming session. The session may comprise

one or more streams, which are described using a presentation description (using

expressions such as SMIL or SDP). Once a session is established, RTSP provides

methods for controlling the streams, such as VCR-like forward, rewind,

pause, and record methods. RTSP primarily provides functionalities to retrieve

data from the server and invite a server to a conference, and it is a transaction-

oriented, request–response protocol like HTTP. However, there are a number of

differences:

. RTSP servers are required to maintain state between most transactions, unlike

in HTTP, in which the servers are mostly stateless.

. RTSP defines new methods and a protocol identifier.

. In RTSP, the server side may issue some requests as well, unlike in the case of

HTTP, where the client always makes the request and the server sends back

a response.

. In RTSP, the data are carried mostly out of band, on a separate data channel

such as RTP. In HTTP, the data are carried in payload of HTTP (response)

messages.

. RTSP uses absolute resource identifiers (request URI); this is to eliminate the

problems caused due to usage of relative URLs in earlier versions of HTTP.


RTSP Messages Figure 9.10 shows the syntax of RTSP messages. There are

only two basic types of RTSP messages: request and response. All RTSP messages

are text-based and use ISO-10646 UTF-8 encoding. The first line in the message

identifies the message type: whether it is a request or response message and specifi-

cally what kind of request or response message. For requests this first line is termed

the request line and for responses, the status line. Message headers follow the

request line or the status line. These provide additional information that is critical

for the correct interpretation of the message. Finally, messages may optionally

contain a message body. Please refer to Section 15 in RFC 2326 [25] for the com-

plete syntax of RTSP.

RTSP Request Messages The request line in each request message has a method

token that indicates the task to be performed on the resource specified in

“Request-URI.” Eleven methods are defined in RFC 2326 [25], each designed for

a different task. Following is a brief description of each of the 11 RTSP methods;

however, please refer to Section 10 in RFC 2326 [25] for an in-depth description

of the methods.

RTSP-Message = Request | Response

Request      = RequestLine *( generalHeader | requestHeader | entityHeader ) CRLF [ messageBody ]
RequestLine  = Method SP Request-URI SP RTSP_Ver CRLF
Method       = "DESCRIBE" | "ANNOUNCE" | "GET_PARAMETER" | "OPTIONS" | "PAUSE" | "PLAY" | "RECORD" | "REDIRECT" | "SETUP" | "SET_PARAMETER" | "TEARDOWN" | ext-method
ext-method   = token
Request-URI  = "*" | absolute_URI
RTSP_Ver     = "RTSP" "/" 1*DIGIT "." 1*DIGIT

Response     = StatusLine *( generalHeader | responseHeader | entityHeader ) CRLF [ messageBody ]
StatusLine   = RTSP_Ver SP StatusCode SP ReasonPhrase CRLF
StatusCode   = a predefined 3-digit code or a 3-digit extension code
ReasonPhrase = *<TEXT, excluding CR, LF>

Figure 9.10 Syntax for RTSP messages. Request and Response are the only two types of RTSP messages; the Method or StatusCode identifies the type of message, the Request-URI identifies the resource in question, and the leading headers provide additional information for interpreting the message.


. DESCRIBE is a recommended method that is only sent from the client side.

The server typically sends a description of the resource identified in

Request-URI. This description is contained in the message body. It is not

necessary that the session description always be obtained using this method.

Other out-of-band mechanisms may be used for a variety of reasons including

the cases where the server does not support the DESCRIBE method. Sessions

may be described using SDP or other protocols.

. ANNOUNCE is an optional method that may be sent from the client or the

server. When sent from the client to the server, it updates the presentation or

media object identified by the Request-URI. When sent from the server to

the client, the session description is updated in real time.

. SETUP is a mandatory method that is only sent from the client side. The client

specifies the transport mechanism to be used for a media stream (identified by

Request-URI). The SETUP method may also be used to change the transport

parameters of a stream that is already playing.

. PLAY is a mandatory method that is always sent from the client to the server.

This tells the server to start sending the stream that was setup using a pre-

viously (successfully) completed SETUP transaction. PLAY is a versatile

method, allowing the client very precise control, such as identifying the range of the media stream to be played (both starting point and ending point may be

specified). Similarly, several PLAY requests may be issued for different seg-

ments of the stream set up using the previous SETUP message. Each request

may specify both the range of stream segment and the time at which the

server should start streaming the data. These requests would queue at the

server and the server would generate the stream corresponding to each

request at appropriate times. Obviously the server is not obliged to fulfill all

the client requests. A PLAY request is also used to resume a paused stream.

. PAUSE is a recommended method that is always sent from the client to the

server. This method causes the server to temporarily halt the delivery of a

stream (or set of streams, depending on Request-URI). If a PAUSE request

is issued, all the queued PLAY requests related to the Request-URI are dis-

carded by the server. A new PLAY request must be sent to resume the

stream(s).

. The OPTIONS method is used by the sender to query the information about the

communication options available on the resource identified by Request-URI;

for example, it may be used by a client to query the types of methods supported

by a server for a given media stream. Although a client or a server may

send this message, implementation of this method is mandatory only for

servers.

. The TEARDOWN method request stops the stream delivery of the resource

identified in the Request-URI. All the queued requests are discarded, and all

the resources associated with the stream are freed. As you may have

rightly guessed, the TEARDOWN message is always sent from the client to the

server and this is a mandatory method.


. The REDIRECT method request informs the client that it must contact another

server location. If the client wants to continue to send and/or receive the

media, it must issue a TEARDOWN request for the current session and

issue a new SETUP request to the server location identified in the REDIRECT

request. REDIRECT message is always sent from the server to the client, but

strangely, its support is optional.

. The RECORD method initiates recording a range of media data according to

description of the resource identified in Request-URI. This description may

be made available by a previously sent ANNOUNCE method request or

some out-of-band means. RECORD request is sent from the client to the

server and its implementation is optional for both the client and the server.

. The GET PARAMETER method request retrieves the values of the parameters

of a presentation. The desired parameters are specified in the body of the

request message. If no parameters are specified in the message body, the message can serve as a method to check the liveness of client and server appli-

cations (a sort of RTSP application “ping”). GET PARAMETER is an optional

method that may be used in either direction, that is, from the client to the server

and from the server to the client.

. The SET PARAMETER method request is used to set the value of a parameter

for a presentation or stream identified in Request-URI. Only one parameter can

be specified in the request, so that in the event of failure there is no ambiguity

about which parameter was not set. Like GET PARAMETER this method

can also be used in both directions, and its implementation is optional for

both client and server side applications.

RTSP Response Messages The status line in each response message includes a

status code, specifying the recipient’s response to the request. A three-digit number

represents each status code. Response messages are classified in two broad cat-

egories: provisional responses and final responses. All status codes of

the form 1xx (i.e., between 100 and 199) are considered provisional responses

and they indicate that the recipient is processing the request, but the final action

has not been taken, so the transaction is still considered pending. All other status

codes indicate final responses. There are four subcategories. Status codes of the

form 2xx indicate successful completion of transaction. Codes of the form 3xx indi-

cate redirection (i.e., the responder “thinks” that the request must be sent else-

where), 4xx indicate client error (i.e., something is wrong with the request made

by the client), and 5xx indicate server error (i.e., the request itself was fine, syntactically and semantically, but the server cannot process it for some reason).

Although the method token and status codes are helpful in identifying the

request and the response, in most cases the recipient of a message cannot determine

the exact nature of the task to be performed on a request or the complete meaning of

a response without looking at some of the other headers included in the message;

sometimes the message body must also be interpreted before the message can be

fully understood by the recipient. For instance, earlier in this section, we referred

to the range of a stream while discussing the PLAY method. In RTSP, the stream


range is specified using the “Range” request header; we discuss some of the RTSP

message headers in the next section.

Session Setup Using RTSP Figure 9.11 shows a typical interaction between

RTSP client and RTSP server for establishing a RTSP session and its subsequent

teardown. Once the client learns about a certain RTSP resource, rtsp://

resource-name.server in this case, it sends a DESCRIBE request to the

server to learn more about the resource. The server sends back a description of

the session corresponding to the identified resource. If the client is interested, it

sends a SETUP request, asking the server to make the necessary arrangements for estab-

lishment of the session. If successful, the client can initiate a PLAY request at a

later time to get the media stream flowing. If the session requires a special QoS

arrangement, such as resource reservation, the client does that before issuing the

play request. If the PLAY request is successful, the media starts to flow. The

client can manipulate this media stream using various RTSP requests, such as

PAUSE or PLAY with different headers. Once the session is completed or the

client is no longer interested, the client sends a TEARDOWN request to the

server to terminate the session.
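To make this flow concrete, an abbreviated sketch of the message exchange of Figure 9.11 might look as follows (the CSeq values, session identifier, and port numbers are illustrative):

C->S: DESCRIBE rtsp://resource-name.server RTSP/1.0
      CSeq: 1
      Accept: application/sdp

S->C: RTSP/1.0 200 OK
      CSeq: 1
      Content-Type: application/sdp
      (SDP session description in the body)

C->S: SETUP rtsp://resource-name.server/stream1 RTSP/1.0
      CSeq: 2
      Transport: RTP/AVP;unicast;client_port=49170-49171

S->C: RTSP/1.0 200 OK
      CSeq: 2
      Session: 12345678

C->S: PLAY rtsp://resource-name.server RTSP/1.0
      CSeq: 3
      Session: 12345678
      Range: npt=0-

S->C: RTSP/1.0 200 OK
      CSeq: 3

C->S: TEARDOWN rtsp://resource-name.server RTSP/1.0
      CSeq: 4
      Session: 12345678

S->C: RTSP/1.0 200 OK
      CSeq: 4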

Figure 9.11 Session setup and teardown using RTSP.


9.5.1.2 Session Description Protocol
The session description protocol (SDP) is widely used for presentation and session description. The protocol is specified in the standards-track IETF RFC 2327 [28]. SDP provides a well-defined format that conveys sufficient information about a multimedia session to allow the recipients of the session description to participate in the session. This information is commonly conveyed by the session announcement protocol (SAP), which announces a multimedia session by periodically transmitting an announcement packet to a well-known multicast address and port number. Alternatively, session descriptions can be conveyed through electronic mail and the World Wide Web. SDP conveys the following information:

. Session name and purpose

. Media comprising the session

Media type (video, audio, etc.)

Transport Protocol (RTP/UDP/IP)

Media format (MPEG4 video, H.261 video, etc.)

Addresses, port numbers for media

. Time(s) the session is active

The session description using SDP consists of a series of text-based lines (using the ISO 10646 character set in UTF-8 encoding). Each line is of the form <type>=<value>, where <type> is strictly one character (drawn from the US-ASCII subset of UTF-8) and <value> is generally either a number of fields delimited by single space characters or a free-format string.

A typical session description using SDP has three parts:

. Session Description. This part provides information about the session and its owner. The mandatory types included in this part are the version (v), owner (o), and session name (s) fields. Other optional

fields include session information (i), URI of description (u), email address (e),

phone number (p), connection information (c), bandwidth (b), time-zone

adjustments (z), encryption keys (k), and attribute lines with type field (a).

. Timing Information. This part has one mandatory field (t) indicating time at

which the session becomes active. The part may optionally include several

repeat times (r).

. Media Description. This part describes the type and other parameters for the

media stream(s). This part includes a mandatory line for each media stream

containing its name and transport address; this line is denoted by “m.”

Additional optional lines for each media stream include media title (i), connec-

tion information (c), bandwidth information (b), encryption key (k), and zero

or more media attribute lines each starting with “a.” If all the media streams

share a common connection address, it can be specified once in the session description part. The values corresponding to most typed fields are not free-form text and have a certain defined format.

Figure 9.12 illustrates the parts of a session description with SDP, using an example taken directly from RFC 2327 [28].
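For the reader's convenience, the flavor of such a description is reproduced in the following sketch, whose field values come from the illustrative session in RFC 2327:

v=0
o=mhandley 2890844526 2890842807 IN IP4 126.16.64.4
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.cs.ucl.ac.uk/staff/M.Handley/sdp.03.ps
e=mjh@isi.edu (Mark Handley)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31

Here the o line identifies the owner and session, the c line gives the common connection address, the t line gives the active time, and each m line names a media stream with its port, transport, and format.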

9.5.1.3 Other Session Control Protocols
A number of other session control protocols are available; the most notable are (1) the wireless session protocol (WSP), used with WAP, and (2) the SIP and H.323 families of protocols, which are typically used for real-time conversational media communication.

Although these protocols could in principle be used for streaming multimedia with minor modifications, in practice, despite their rich functionality, they are seldom used for streaming. In some cases streaming media protocols may be used in conjunction with conversational media protocols. For example, RTSP may be used for interacting with a voice or video mail system, while the remaining infrastructure is based on SIP. There is also some preliminary discussion of using SIP for streaming media, which could eliminate the need for multiple protocols of similar functionality at the terminal.

Figure 9.12 Parts of session description using SDP.

9.5.1.4 Description Languages
A number of description languages are used in today's multimedia systems to describe session integration and scene description, device capabilities, context, and metadata associated with media. The main purpose of well-formed description languages is to facilitate the consumption of media information by computers, as in search engines and the semantic Web. However, this is not the only reason why description languages are used. The Synchronized Multimedia Integration Language (SMIL) [31], for instance, is used to describe the space-time relationships among a set of multimedia objects. Other examples of multimedia descriptions include ISO's Multimedia Content Description Interface (MPEG-7) and the Composite Capabilities/Preferences Profile (CC/PP) [32]. In the following sections we will learn about SMIL and CC/PP, as they have important roles to play in multimedia content delivery and presentation.

Synchronized Multimedia Integration Language (SMIL) For commercial ser-

vices, media presentation is perhaps just as important as the media itself. Content

providers want to present the media in a manner that is both flexible for commercial services, such as integrating location-specific advertisements with the media presentation, and at the same time functional and appealing to the consumer. SMIL, an XML-based language developed by the World Wide Web Consortium, is the "glue" that combines various media elements, such as video, audio, images, and formatted text, to create an interactive multimedia presentation. SMIL does not control the session, but it can be used to specify how the media are rendered at the client application (user agent). SMIL allows description of the temporal behavior of a multimedia presentation, associates hyperlinks with media objects, and describes both the temporal and the spatial layout of a multimedia presentation on the user device. SMIL is an HTML-like language; like HTML, it consists of elements, attributes, and attribute values.

SMIL PRESENTATION EXAMPLE Following is a simple SMIL presentation. It demonstrates the timing, synchronization, prefetch, and layout capabilities of SMIL. The SMIL user agent completely (100%) prefetches the media objects. It then displays a video clip along with a series of static images shown one after another. The images appearing in "region2" change every 10 seconds, and those in "region3" change every second, giving the impression of a counter. The layout and presentation behavior is shown pictorially in Figure 9.13, using a video clip of a moving airplane.

[Figure 9.13: the root-layout contains Region 1 (the video), with Region 2 and Region 3 beneath it; a timeline in seconds illustrates the image sequences cycling in Regions 2 and 3.]

Figure 9.13 A SMIL presentation example.


<smil xmlns="http://www.w3.org/2001/SMIL20/Language">

<head>
  <meta name="SMIL Layout Example" content="SMIL Example"/>
  <layout>
    <root-layout background-color="gray" height="270" width="210"/>
    <region id="region1" top="5" left="5" height="200" width="200"/>
    <region id="region2" top="222" left="55" height="48" width="48"/>
    <region id="region3" top="222" left="105" height="48" width="48"/>
  </layout>
</head>

<body>

  <seq>
    <prefetch src="0.jpg" mediaSize="100%"/>
    <prefetch src="1.jpg" mediaSize="100%"/>
    <prefetch src="2.jpg" mediaSize="100%"/>
    <prefetch src="3.jpg" mediaSize="100%"/>
    <prefetch src="4.jpg" mediaSize="100%"/>
    <prefetch src="5.jpg" mediaSize="100%"/>
    <prefetch src="6.jpg" mediaSize="100%"/>
    <prefetch src="7.jpg" mediaSize="100%"/>
    <prefetch src="8.jpg" mediaSize="100%"/>
    <prefetch src="9.jpg" mediaSize="100%"/>
    <prefetch src="video1.mpg" mediaSize="100%"/>
  </seq>

  <par endsync="video1">
    <video id="video1" src="video1.mpg" region="region1"/>
    <seq repeatDur="indefinite">
      <img src="0.jpg" region="region2" dur="10s" fill="freeze"/>
      <img src="1.jpg" region="region2" dur="10s" fill="freeze"/>
      <img src="2.jpg" region="region2" dur="10s" fill="freeze"/>
      <img src="3.jpg" region="region2" dur="10s" fill="freeze"/>
      <img src="4.jpg" region="region2" dur="10s" fill="freeze"/>
      <img src="5.jpg" region="region2" dur="10s" fill="freeze"/>
    </seq>
    <seq repeatDur="indefinite">
      <img src="0.jpg" region="region3" dur="1s" fill="freeze"/>
      <img src="1.jpg" region="region3" dur="1s" fill="freeze"/>
      <img src="2.jpg" region="region3" dur="1s" fill="freeze"/>
      <img src="3.jpg" region="region3" dur="1s" fill="freeze"/>
      <img src="4.jpg" region="region3" dur="1s" fill="freeze"/>
      <img src="5.jpg" region="region3" dur="1s" fill="freeze"/>
      <img src="6.jpg" region="region3" dur="1s" fill="freeze"/>
      <img src="7.jpg" region="region3" dur="1s" fill="freeze"/>
      <img src="8.jpg" region="region3" dur="1s" fill="freeze"/>
      <img src="9.jpg" region="region3" dur="1s" fill="freeze"/>
    </seq>
  </par>

</body>
</smil>

Composite Capabilities/Preferences Profile (CC/PP) RTSP does not provide a very good capability exchange mechanism. In most cases the server decides on the type of media and its other properties without first consulting the client about its capabilities. The client may have several capabilities or limitations which, if communicated to the server, would allow the server to customize the presentation and media accordingly. The client device may have limited bandwidth, a constrained display, software constraints (such as support for some SMIL features but not others), or user preferences that affect the presentation of media at the user agent. CC/PP can be used to express all of these scenarios and more.

CC/PP OVERVIEW A CC/PP description is a statement of the capabilities and profiles of a device or a user agent. CC/PP is based on the resource description framework (RDF)1 and can be expressed using an XML document or some other structured representation format. A CC/PP description is structured such that each profile has a number of components, and each component has one or more related attribute-value pairs, which are sometimes also referred to as properties. Figure 9.14 shows the CC/PP structure for a hypothetical profile. Two components, HardwarePlatform and Streaming, and some of their respective attributes are shown. The HardwarePlatform component groups together the BitsPerPixel, ColorCapable, and PixelAspectRatio properties, which are presumably properties related to the hardware of the device.
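As a sketch, the HardwarePlatform component of Figure 9.14 could be serialized in RDF/XML roughly as follows; the ex: schema URI and property names are hypothetical, since a real profile would use a published vocabulary such as UAProf:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.com/ccpp-schema#">
  <rdf:Description rdf:about="MyDeviceProfile">
    <ex:component>
      <rdf:Description rdf:about="HardwarePlatform">
        <ex:BitsPerPixel>16</ex:BitsPerPixel>
        <ex:ColorCapable>Yes</ex:ColorCapable>
        <ex:PixelAspectRatio>1x2</ex:PixelAspectRatio>
      </rdf:Description>
    </ex:component>
  </rdf:Description>
</rdf:RDF>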

As with all the languages and description formats, we must have a set of mutually

understood vocabulary and rules for their interpretation. CC/PP is no exception.

With CC/PP any operational environment may define its own vocabulary and

schema that specify the allowable attributes and values, along with their syntax

and semantics. This vocabulary and schema may be understood only by the relevant

applications. For instance, W3C [32] defines a core vocabulary for print and display, and the WAP forum's user-agent profile (UAProf) specification [33] defines a vocabulary that can be used to express different capabilities and preferences related to the hardware, software, and networking available at the device. A discussion of CC/PP attribute vocabularies can be found in Ref. 34.

CC/PP allows specification of default attributes and values in the schema corresponding to each component. If a user agent's capabilities and preferences related to a particular component match the default, it can simply say so without giving details of all the attributes and their values. If the values of some attributes differ from the defaults, the device can create a profile containing only the differing attribute-value pairs while referring to the defaults for the other attributes. This mechanism shortens the profile descriptions and saves precious wireless bandwidth. Other methods of reducing the size of a profile description include binary encodings such as WAP binary XML.

[Figure 9.14: a profile tree with components HardwarePlatform and Streaming (plus further components); HardwarePlatform carries the attributes BitsPerPixel = 16, ColorCapable = yes, and PixelAspectRatio = 1x2, while Streaming carries AudioChannels = Mono, MaxPolyphony = 8, and PssVersion = 3GPP-R5.]

Figure 9.14 An example CC/PP profile.

1If you are not familiar with RDF, an excellent primer can be found in [68].


9.5.1.5 UAProf Specification
UAProf [33] is worth mentioning here because the capability exchange framework and vocabulary defined in this specification are used, with modifications in some cases, in many mobile content delivery systems, including 3GPP-PSS. UAProf specifies (1) an end-to-end capability exchange architecture; (2) a vocabulary and schema comprising six components, namely, HardwarePlatform, SoftwarePlatform, BrowserUA, NetworkCharacteristics, WapCharacteristics, and PushCharacteristics; (3) encoding methods for the profiles; and (4) methods for the transport of profiles. UAProf also outlines usage scenarios for user-agent profiles and the behavior of the different entities involved in the capability exchange process. A brief description of the six components defined in Ref. 33 is given in Table 9.2.

CC/PP Exchange HTTP is typically used as the transport protocol for CC/PP descriptions from the client to the server. However, potentially tens of components and hundreds of properties may be required to fully express the capabilities and preferences profile of a user device. A profile description can therefore be very large, and transporting such a description between the user device and the server can entail significant overhead.

TABLE 9.2 UAProf Component Description

Component               Description
HardwarePlatform        Comprises a set of attributes that describe the hardware characteristics of a user-agent device, such as type, model, and input/output capabilities.
SoftwarePlatform        Consists of a set of attributes related to the software environment on the device, such as the operating system, the available audio/video encoding/decoding components, and user language preferences.
BrowserUA               Encompasses the properties related to the HTML browser at the user agent.
NetworkCharacteristics  Describes the characteristics of the network that the user device is connected to.
WapCharacteristics      Includes attributes concerning Wireless Application Protocol (WAP) capabilities.
PushCharacteristics     Covers attributes specific to the push capabilities of the device. The push model differs slightly from the traditional request/response model used for most content: content can be "pushed" to the client without an explicit request from the client (see Ref. 69 for details).


We have already seen that CC/PP allows referring to default attribute values, which may reduce the size of the description; but what about the properties that deviate from the default? The CC/PP exchange protocol [35] has been designed with precisely these constraints in mind. This protocol allows the user agent to specify only the attributes that differ from the default or from the last capability exchange, which reduces the size of the descriptions significantly. Because of the dependency between the different descriptions sent by a client, the network must maintain state information about previous CC/PP exchanges. For this purpose a new logical entity called a CC/PP repository is introduced; this repository stores the default and predefined profiles. The CC/PP exchange protocol [35] extends HTTP by defining three new HTTP headers: two request headers, named profile and profile-diff, and one response header, named profile-warning. The profile header contains a list of references to (predefined) profiles or to profile descriptions carried in the profile-diff header of the same message. The profile-diff header contains the actual profile description. The profile-warning header is used to convey warning information to the requestor, such as when the server fails to fully resolve a profile description. Ref. 33 defines similar headers for use with wireless profiled HTTP, called x-wap-profile, x-wap-profile-diff, and x-wap-profile-warning, respectively, with meanings similar to those of the corresponding headers defined for the CC/PP exchange protocol.
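A request using these headers might look like the following sketch (the host, repository URI, and profile digest are hypothetical):

GET /news/index.smil HTTP/1.1
Host: server.example.com
profile: "http://repository.example.com/profiles/deviceX.rdf", "1-uKhJE/AEeeMzFSejsYshHg=="
profile-diff: 1;<?xml version="1.0"?><rdf:RDF>(differing attributes only)</rdf:RDF>

The first reference in the profile header points to a predefined profile in the repository, while the second refers to the inline description carried in the profile-diff header.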

A simple example of the content delivery process based on CC/PP is shown in

Figure 9.15. The client includes the CC/PP description in the request for the

content. The server resolves the profile and selects or creates appropriate content

and sends it back to the client. In reality this same model may include intermediaries

such as proxies and gateways, which may manipulate the user request and its capa-

bility profile before forwarding the request to the server.

[Figure 9.15: (1) the client sends an HTTP or RTSP request for content with references to its profile; (2) the content server retrieves the referenced pieces of the profile from the profile repository; (3) appropriate content is selected or created; (4) the response delivers content appropriate for the user's capability and preference profile.]

Figure 9.15 Capability exchange with CC/PP.


Needless to say, CC/PP is a generic mechanism for expressing capabilities and profiles and can be used in a variety of situations besides the classical client–server scenario depicted in Figure 9.15. It should also be noted that although currently HTTP is mostly used to carry CC/PP descriptions, RTSP may become more widely used for this purpose in the future.

9.5.2 The Streaming Media Transport Protocols

For the application to render the media while they are still being transmitted over the

data network, some care must be taken in media transport. The media transport

mechanisms must provide means through which the media are transported in a

sequential manner, and with all the relevant information about how and when

they must be rendered (e.g., the media format types and the timestamps). Currently the hypertext transport protocol (HTTP) [36], TCP [37], UDP [38], and the real-time transport protocol (RTP) [39] (coupled with the real-time transport control protocol, RTCP) are used for multimedia streaming over the Internet. Among these protocols, only RTP can be regarded as a true real-time transport protocol, but the presence of firewalls that do not understand the streaming protocols and block UDP-based traffic can sometimes make the use of HTTP and TCP unavoidable.

In many scenarios a multimedia session consists of many different streams, each with its own requirements with respect to media transport, necessitating the use of more than one media transport protocol. One such scenario is the 3GPP-PSS architecture, which we describe later in this chapter.

9.5.2.1 The Real-Time Transport Protocol
This protocol has emerged as the dominant streaming media transport protocol. The basic protocol is defined in IETF RFC 1889 [39]. The RFC defines two protocols that are meant to work in tandem: RTP for media transport and the accompanying real-time transport control protocol (RTCP) for transport feedback from the receivers to the senders. While RFC 1889 provides the base specification, several additional specifications have been developed for packetization and use with individual media types such as H.263 [40] and GSM-AMR [41]. In the following text we briefly review the functionality provided by RTP and RTCP and their use in streaming media environments.

Figure 9.16 shows the RTP packet format. RTP provides payload type identification, fragmentation (the M bit), sequencing, and timing information in each individual packet. The payload type field allows the application to determine the correct codec to use with the media. The fragmentation information allows applications to reassemble protocol data units correctly. The timing and sequence information allows applications to recognize out-of-sequence packets and compensate for delay-jitter variations incurred on the network. Combined, these allow an application to render the multimedia stream correctly and smoothly. RTP also provides synchronization source (SSRC) and contributing source (CSRC) identifiers to identify the packets belonging to the same stream independently of the transport layer address. This is especially helpful in multiparty


streaming scenarios but is rarely used in contemporary streaming multimedia

delivery. RTP is also capable of transporting encrypted media; however, key generation and distribution are outside the scope of RTP.

RTCP specifies periodic transmission of control packets to all the participants in a

session. It serves four main functions:

1. Feedback on the quality of data reception, through RTCP sender and receiver reports.

2. Carrying a persistent transport-level identifier for an RTP source, called the canonical name (CNAME). This is very helpful in multimedia scenarios where an RTP source may contribute more than one stream, such as when transmitting the audio and video streams of a conversation; the common CNAME for the individual SSRCs allows the receiver to recognize these streams as associated and in need of synchronization (e.g., lip synchronization).

3. Rate control for RTCP messages. The number of RTCP messages generated can quickly get out of control in a conference with a large number of participants; this function allows the participants to control the rate of RTCP reports.

4. Session control information for loosely controlled sessions, where participants may join and leave without strict membership control. Streaming multimedia sessions, however, are often tightly controlled, with complete session control information established via separate session control protocols such as RTSP; RTCP then allows only loose control within the parameters established by the session control protocol.

Figure 9.17 shows the format of the RTCP sender report. Receiver reports are similar, except that the header does not contain the NTP timestamp and there is no sender information block. The payload type for receiver reports is 201.

Figure 9.16 RTP packet format.


In addition to sender and receiver reports, RTCP also provides for source description (SDES) packets (see Fig. 9.18). These packets include information such as name, email address, phone number, and geographic location of the synchronization and contributing sources.

Although RTP is transport-independent as long as the transport protocol provides multiplexing and correct delivery, UDP is primarily used as the transport for RTP because of the stringent delay requirements of most real-time traffic and the wide acceptance of IP. Although RFC 1889 states that RTP relies on the checksum and multiplexing capabilities of UDP, it is worth noting that most media codecs are either not sensitive to bit errors or may be encoded with error correction codes; therefore, it is not wise to discard an entire packet when the checksum fails. In such cases it may be wise to disable the UDP checksum or to use protocols such as "UDP-lite" [42,43].

Figure 9.17 RTCP sender report packet format.

[Figure 9.18 shows the SDES packet layout: version (V=2), padding (P), source count (SC), payload type (SDES = 202), and length, followed by chunks that each contain an SSRC or CSRC identifier and its SDES items.]

Figure 9.18 RTCP source description format.

RTP and RTCP are usually used in tandem and multiplexed onto the same network layer address; for instance, if UDP/IP is used, they will typically share the IP address. By convention, the RTP stream uses an even-numbered port and the corresponding RTCP channel uses the immediately following odd-numbered port.
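For example (the port number is illustrative), an SDP media line that assigns an even port to RTP implicitly assigns the next odd port to RTCP:

m=audio 49170 RTP/AVP 0
(RTP flows on UDP port 49170; the associated RTCP packets use port 49171)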

As stated earlier, individual profiles for specific media types have been defined.

These profiles specify the payload type, any modifications to the semantics of differ-

ent fields in the header and payload, and any new header types if necessary.

Examples of such media-specific profiles include Ref. 44 for H.263 and Ref. 41

for AMR. These profiles sometimes provide functionality for rate adaptation and

other in-band signaling; for example, Sjoberg et al. [41] allow the receiver to

specify one of several AMR codec rates or modes of operation. Applications

using these media types must conform to the corresponding profiles to ensure

compatibility.

9.5.2.2 Other Media Transport Protocols
HTTP and RTSP tunneling, or plain UDP or TCP, are sometimes used for media transport. HTTP and RTSP tunneling is useful in cases where a firewall blocks RTP/UDP traffic. With tunneling, the streaming media are sent embedded or interleaved in the body of the HTTP or RTSP messages; this approach, however, can be highly inefficient in terms of bandwidth. As streaming multimedia gains wider deployment and acceptance, however, more firewalls understand the streaming media protocols and can open the desired ports, so we will likely see less use of tunneling in the future.

9.6 3GPP PACKET-SWITCHED STREAMING SERVICE

As discussed in previous sections, a basic streaming service consists of streaming

control protocols, transport protocols, media codecs, and scene description proto-

cols. 3GPP has formulated a set of 3G PSS standards to provide mobile packet-

switched streaming service (PSS). The 3GPP standard specifies the protocols, codecs, and architecture used to provide mobile streaming service. The 3GPP codecs and media types were discussed earlier in this chapter. Figure 9.19 depicts the 3GPP protocols and applications used in a PSS client. The protocols and their applications are

. RTSP and SDP for session setup and description

. SMIL for session layout description


. HTTP for capability exchange and transporting static media such as session

layout description (SMIL files), text, graphics, and so on

. RTP for transporting real-time media such as audio, video, and speech

Providing end-to-end streaming service implies harmonized interworking between the protocols and mechanisms specified by the IETF and 3GPP. Both 3GPP and the IETF have their own sets of protocols and mechanisms to provide QoS and connectivity in the 3G access network and the external IP-PDN (Internet), respectively. The external IP-PDN can deploy either the IntServ or the DiffServ QoS model to provide QoS.

Figure 9.19 3GPP streaming protocols and their applications.

Figure 9.20 3GPP packet-switched streaming service.

3GPP release 4 already supports streaming services in its QoS model. 3GPP release 5 upgrades the packet-switched core network by adding an "Internet multimedia subsystem (IMS)" that consists of the network elements used in session initiation protocol (SIP)-based session control. Release 5 also upgrades the GSNs (GPRS support nodes) to support delay-sensitive real-time services. In addition, the radio access network (UTRAN) has been upgraded to support real-time handover of PS (packet-switched) traffic. The main purpose of release 5 is to enable an operator to offer new services like multimedia, gaming, and location-based services. The Internet multimedia domain is mainly concerned with new services (their access, creation, and payment), but in a way that gives the operator full control over the content and revenue.

9.6.1 3GPP Packet-Switched Domain Architecture

Figure 9.20 depicts the network architecture of an end-to-end 3GPP packet-switched

streaming service. We need at least a streaming client and a content server to

implement the streaming service. Content servers may be either hosted in the

UMTS architecture itself or accessed externally through an IP-PDN. A proxy server may be needed in the UMTS architecture to provide sufficient QoS if the content servers are accessed externally through an IP-PDN. The end-to-end streaming architecture has the following network elements that are specific to streaming:

. Content Servers. They can be either hosted in the UMTS architecture (added to

the IMS) or can be accessed externally. Content servers consist of streaming

servers that store streaming content and Web servers that hold SMIL pages,

images, and other static content.

. Proxy Server. This may be included in the IMS (especially when the streaming server is external) to provide enhanced-QoS streaming service. The proxy server's [45,46] main role is to smooth (i.e., remove the delay jitter of) streaming traffic incoming from the external IP-PDN. During transmission of the streaming content to the client, the proxy dynamically adapts the delivered QoS in accordance with the available bandwidth, using feedback from the client application, the radio network, and the IP network. The proxy server can also implement a quality adaptation scheme by switching on the fly to a lower-quality stream when the available bandwidth is insufficient. Moreover, it can perform the additional function of transcoding. Transcoding may be needed for several reasons, for example, when a user moves from a high-bandwidth wireless LAN to a GPRS or 3G network, or when the mobile node is unable to handle high-bandwidth streaming traffic.


. User and Profile Servers. These servers store user preferences and device capa-

bilities. This information can be used to control presentation of streamed media

to a mobile user.

. Content Cache. A content cache can optionally be used to improve the overall service quality.

. Portals. Portals are servers that allow convenient access to streamed media

content. For example, a portal might offer content browse and search facilities.

In the simplest case, it can be a Webpage with a list of links to streaming content.

Apart from the abovementioned network elements that are specific to streaming

service, other network elements in the 3GPP UMTS architecture play a significant

role in the QoS management of streaming service. The UMTS radio access network (UTRAN) ensures seamless handover between base stations with minimal disruption to ongoing real-time services. The radio resource control (RRC) protocol [1] (3GPP TS 25.331) is used for controlling resources on the UTRAN. The radio access network application part (RANAP) protocol [1] (TS 25.431) is used between the UTRAN and core network entities. The serving GPRS support node (SGSN) acts as the gateway for all packet-based communications of the user equipments (UEs) within its serving area. The SGSN is responsible for packet routing and transfer, mobility management (attach/detach and location management), logical link management, authentication, and charging functions. The gateway GPRS support node (GGSN) acts as a gateway between the UMTS core network and the external IP-PDN. There is an active PDP context for every active packet-switched bearer or session; the PDP context is stored in the UE, the SGSN, and the GGSN. With an active PDP context, the UE is visible to the external IP-PDN and is able to send and receive data packets. The PDP context describes the characteristics of the session: it contains a PDP type (e.g., IPv4), the IP address assigned to the UE, the requested QoS, and the address of the GGSN that serves as the access point to the IP-PDN. Table 9.3 shows the different QoS classes supported in the UMTS architecture [1].

TABLE 9.3 UMTS QoS Classes

Class            Requirements                            Example
Conversational   Very delay-sensitive                    Traditional voice; VoIP
Streaming        Better channel coding; retransmission   One-way real-time audio/video
Interactive      Delay-insensitive                       Telnet; interactive email; WWW
Background       Delay-insensitive                       FTP; background email

PDP context activation (see Fig. 9.21) in the UMTS architecture works as follows. The UE first sends an "Activate PDP context request" message to the SGSN through the session management (SM) protocol. The SGSN contacts the home location register


(HLR) and performs authentication and authorization functions. The SGSN then performs local admission control and initiates the radio access bearer (RAB) assignment procedure in the UTRAN/GERAN through the RANAP procedure. Local call admission is based on the availability of radio resources, and the UMTS QoS attributes are mapped onto the radio bearer (RB) parameters used in the physical and link layers. After the establishment of the RB, the SGSN sends a "Create PDP context request" message to the GGSN. The GGSN performs local admission control and creates a new entry in its PDP context table, which enables the GGSN to route data between the SGSN and the external IP-PDN. Afterward, the GGSN returns a confirmation message, "Create PDP context response," to the SGSN that contains the PDP address. The SGSN updates its local PDP context table and sends an "Activate PDP context accept" message to the UE.

9.6.2 The 3GPP PSS Framework

The 3GPP PSS specifications consist of three 3GPP technical specifications: 3GPP

TS 22.233, 3GPP TS 26.233, and 3GPP TS 26.234. PSS provides a framework for

IP-based streaming applications in 3G networks. This framework is very much in

line with what we have discussed so far in this chapter: it uses CC/PP for capability exchange (see Fig. 9.22), SMIL for presentation description, RTSP for session control, and SDP for session description. However, there are minor differences here and there. Let's go over these one by one.

[Figure 9.21 shows the PDP context activation message flow among the UE, UTRAN/GERAN, SGSN, and GGSN: Activate PDP Context Request, security functions, radio bearer setup via RAB Assignment Request/Response, Create PDP Context Request/Response, and finally Activate PDP Context Accept.]

Figure 9.21 PDP context activation procedure.

9.6.2.1 Streaming Media Session Setup Procedures for PSS
Figure 9.23 shows an example of a simple session establishment. The first step is to know what content to get and where to start. The client can obtain the URI of the content from an SMIL presentation document, a simple Webpage, or an email, or simply by word of mouth. Once the URI is known, the client application requests a primary PDP context, which is opened to allocate the IP address for the UE as well as the access point. The primary PDP context is used to access content servers in either the IMS domain or the external IP-PDN. Since the primary PDP context is used for RTSP signaling, it is created with the UMTS interactive QoS profile. A socket is opened for RTSP signaling and is tied to the primary PDP context. The client can now query the content server to learn more about the content using an RTSP DESCRIBE request.2 The client may include its CC/PP description in the request. The client does not need to include the profile description if it is sure that the URI it is using in the RTSP request already points to a resource that is compatible with its profile; such would be the case if the URI were obtained from an SMIL document that was itself obtained after presenting a valid CC/PP description. If the profile is included, it is carried using the x-wap-profile and x-wap-profile-diff headers of the CC/PP exchange protocol that we discussed earlier.
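As a sketch, such a DESCRIBE request might carry the profile reference as follows (the content URI and repository URI are hypothetical):

DESCRIBE rtsp://streaming.example.com/clip RTSP/1.0
CSeq: 1
Accept: application/sdp
x-wap-profile: "http://profiles.example.com/deviceX.rdf"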

[Figure 9.22: the streaming client sends an HTTP/RTSP request that includes the URL and the profile description and profile-diff headers; the content server fetches the referenced device capability profile from the device profile server via an HTTP request/response and matches content against the profile.]

Figure 9.22 Capability negotiation mechanism applied in PSS.

2RTSP DESCRIBE is mandatory in 3GPP-PSS architecture; however, IETF does not mandate its use.


If the profile description is included, the server can find or create content that is

most suitable for the client’s request URI and its profile. Otherwise it just

selects the default content corresponding to the URI. The server sends back the

response with the description of the session that will be used to deliver the selected

content.

On receiving the description, the client can determine whether it likes the description, which is likely to be the case because it has presumably been tailored to the client's capabilities and preferences. The client can now send a SETUP request to the server, asking it to make the necessary arrangements for the streaming session. The server acknowledges the SETUP request by sending a "200 OK" response message back to the client. The client now needs to establish a PDP context that is suitable for the anticipated multimedia streaming session. It does so by opening two sockets for RTP and RTCP traffic and tying them to two secondary PDP contexts.

The secondary PDP contexts are assigned appropriate UMTS QoS profiles. The sec-

ondary PDP contexts reuse the same IP address and access point as the primary PDP

context.

Now that everything is ready, the client can send a PLAY request asking the

content server to start the streaming session. The streaming media are typically

transported over RTP/UDP/IP, as described in the SDP.
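As a sketch, the session description driving these flows might advertise the media along the following lines (the ports and dynamic payload types are illustrative; the AMR/8000 rtpmap convention follows RFC 3267):

m=audio 49152 RTP/AVP 97
a=rtpmap:97 AMR/8000
m=video 49154 RTP/AVP 96
a=rtpmap:96 H263-2000/90000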

Figure 9.23 shows the presentation and content server as a single entity, but these may in fact be logically and physically separate entities.

[Figure 9.23: after primary PDP context activation for RTSP signaling (see Fig. 9.21), the UE exchanges DESCRIBE/200 OK (with the capability and preference profile and a session description suitable for the device) and SETUP/200 OK with the presentation and content server, activates two secondary PDP contexts (one for RTP traffic and one for RTCP), and then issues PLAY/200 OK, after which the RTP and RTCP flows start.]

Figure 9.23 Streaming multimedia session establishment in PSS.


9.7 MULTIMEDIA SERVICES IN MOBILE AND WIRELESS ENVIRONMENTS

The main factors that differentiate wireless mobile environments are

. Limited Bandwidth and Error-Prone Channel. A wireless channel has very unpredictable, time-varying characteristics due to several factors, such as interference, multipath fading, and atmospheric conditions. The last hop of communication is wireless, which not only offers relatively low bandwidth but also suffers from a higher bit error rate (BER). Furthermore, the retransmissions needed to recover from these errors induce variable delay across the wireless channel.

. The Movement. Mobile users move! Movement triggers a handoff mechanism to minimize interruption to an ongoing session. The wireless channel characteristics may vary significantly from one segment of the network to another, and since handoffs almost always incur packet loss, they further aggravate the already lossy nature of the wireless medium. Finally, the relative path length from the server to the client may vary as the client moves across networks. This is especially true if the server is close to the edge, as in content distribution networks.

In the following text we cover some recent proposals to alleviate the mobility- and error-related problems faced by mobile content delivery systems. We also look into the research issues in providing streaming service in heterogeneous network environments.

9.7.1 Differentiating Transmission Error Losses from Congestion Losses

In the wired and wide area Internet, most of the packet loss occurs as a result of con-

gestion. In wireless environments, however, the major source of packet loss is trans-

mission errors over the wireless channel. The natural approach for avoiding packet

losses due to congestion is rate control, slowing down the sender, but this approach is not suitable for avoiding or recovering from errors on wireless channels. The techniques used for error recovery or packet loss avoidance over wireless channels build better error resiliency into the packets, so that even if some packets are dropped, the data can still be recovered at the receiver. Alternatively, some senders use aggressive retransmissions, but that is bound to introduce congestion problems.

A typical mobile multimedia delivery environment comprises both wired and

wireless links. In such an environment an end-to-end feedback mechanism, such as RTCP feedback messages, can convey information only about the net end-to-end packet loss; there is no way for the sender to ascertain whether a packet was lost on the wired network or the wireless network. Since counteracting the two types of packet loss requires different techniques, the sender cannot cope with the situation effectively without being able to distinguish between them.

To address this problem, a novel RTP monitoring technique has been introduced [47,48]. This technique relies on the placement of an RTP monitoring agent at the edge of the wired/wireless network. This agent monitors the RTP streams and sends RTCP feedback to the sender of the stream, such as a streaming server. This feedback is in addition to the RTCP feedback generated by the recipient itself (see Fig. 9.24). The RTCP feedback from the client reflects the aggregate loss over both the wireless and the wired segments of the end-to-end path, whereas the RTCP feedback from the RTP monitoring agent reflects the loss over the wired segment only. This helps the recipient of the feedback (typically the streaming server) determine whether losses occurred in the wireless or the wired segment of the path; for example, if the client reports 8% aggregate loss while the monitoring agent reports 3%, roughly 5% of the loss can be attributed to the wireless segment. It is worth mentioning that since RTCP feedback messages are not generated at the same rate as RTP packets, the feedback captures the aggregate packet loss over the RTCP period, which is typically a few seconds. Thus the server can only estimate the percentage of packet loss over the wired and wireless segments and must adapt the stream accordingly. Details on the RTP monitoring technique and its applications can be found in two papers by Yoshimura and colleagues [47,48].

9.7.2 Counteracting Handover Packet Loss

As we pointed out earlier in this section, handovers cause additional packet loss in mobile networks.

[Figure 9.24: an RTP monitor or streaming agent is placed at the edge of the wired and wireless networks; packets lost in the wired network are mostly due to congestion, while packets lost in the UTRAN are mostly due to transmission errors on the wireless channel. The media stream flows over RTP, with RTCP feedback sent both by the client and by the RTP monitor.]

Figure 9.24 Streaming agent to differentiate wired and wireless packet loss.


Although network layer mobility protocols such as mobile IP [49] and fast mobile IP [50] attempt to provide seamless handovers during host movement, some packet loss is inevitable because of signaling propagation delay.

A novel end-to-end technique for soft IP handover has been proposed [51].

Figure 9.25 shows an overview of this scheme.

This scheme assumes that the receiver host is at least temporarily attached to mul-

tiple interfaces during the handoff process. The receiver host signals this situation to

the sender, along with the information about the interfaces, such as their IP addresses

and their relative priority based on signal strength, estimated bandwidth, or packet

loss rate on individual interfaces.

The IP stack on the sender host then generates redundant error correction symbols, denoted F1, F2, F3 in Figure 9.25 (the original data symbols are D1, D2, D3), and dispatches them to the multiple interfaces of the receiver. Reed-Solomon codes are used to generate the redundant symbols [51]. In general, if a message is extended from k symbols to n symbols through the addition of (n - k) redundant symbols, then up to (n - k) lost symbols can be recovered at the receiver node. For example, in Figure 9.25, n = 2k; that is, there are just as many redundant symbols as there are symbols in the original message, so the receiver should be able to recover the application data even when any (n - k) of the transmitted symbols are lost.

[Figure 9.25: for a multihomed, bicast-capable host in a handover situation, the transport layer (a) fragments the application data into symbols D1, D2, D3, (b) generates error correction symbols F1, F2, F3, and (c) attaches transport headers; the network layer then multiplexes the packets between the destination interfaces according to the priorities assigned by the application and dispatches them, so the application data can be recovered even if some packets are lost on either interface.]

Figure 9.25 Bicasting forward error correction codes.


9.7.3 Mobility-Aware Server Selection and Request Routing in Mobile CDN Environments

We mentioned earlier that the movement of a mobile host might result in the

establishment of an entirely new path with very different path characteristics.

If the servers are present very close to the edge of the network, as in a high-

density content distribution network, this change of relative distance and path

characteristics may result in significant QoS degradation, especially for streaming

multimedia, where the sessions are typically long. This situation can, however, be

alleviated by changing the content server as the host moves, as proposed in Refs. 52 and 53.

The technique revolves around keeping track of host movement and assigning a new server as the host moves from the optimal content delivery region of one server to that of another (see Fig. 9.26). A number of methods may be used to keep track of host movement and then perform server handoff. Tariq and Takeshita [53] define server coverage areas as sets of IP subnets and use mobile IP binding update messages to track user movement; server handoff is treated as a process of establishing a session with the new server and terminating the session with the old one, achieved using extended RTSP methods [53]. Yoshimura et al. [52] use SOAP messages to update the presentation file used by the host, so that the next segment is fetched from the most appropriate server.

The techniques described in Ref. 54 go a step further: they analyze how rapidly or slowly the host is moving and try to assign a server on that basis. This predictive algorithm can significantly reduce the number of server handoffs that may be necessary.

[Figure 9.26: two servers with adjacent coverage areas; movement from the coverage area of one server to that of another may trigger a server handoff.]

Figure 9.26 Mobility-based server selection techniques.


9.7.4 Architectural Considerations to Provide Streaming Services in Integrated Cellular/WLAN Environments

The wireless LAN is fast emerging as a complementary technology to 3G networks. This technology provides very high-speed access (11 Mbps for 802.11b and 54 Mbps for 802.11a) but covers a very small area and allows limited mobility. On the other hand, 3G technology provides access at relatively low speed (~100 kbps for GPRS) to medium speed (~2 Mbps for UMTS) but covers a wide area and allows high mobility. A number of interworking mechanisms [55–57] have been developed to integrate these two technologies into a single wireless data network that allows very high-speed access in hotspot areas such as airports and shopping malls. Integration of the WLAN and the cellular network falls into two categories, depending on who owns and manages the WLAN. Operators can own and manage WLANs to augment their cellular data networks; an operator can thus gain a competitive advantage by providing enhanced data services at strategic locations such as airports and hotels. In the alternative scenario, an independent wireless Internet service provider (WISP) or enterprise owns the WLAN. In either case, an end user can obtain very high quality streaming service in hotspot locations.

Two methods are used to integrate cellular and WLAN networks, tight coupling and loose coupling, as illustrated in Figure 9.27. The architectural issues in providing seamless streaming service under each method are described in the following paragraphs.

[Figure 9.27: under tight coupling, the WLAN access points connect through an interworking unit (IWU) to the SGSN of the UMTS core network, alongside the GERAN/UTRAN (Gb/Iu interfaces); under loose coupling, the WLAN connects through an IWU directly to the IP network over a Gi interface. The GGSN links the core network to the IP network, which hosts the Internet multimedia subsystem (IMS) with proxy and content servers serving the streaming client.]

Figure 9.27 Generalized integrated UMTS/WLAN architecture.


9.7.4.1 Tight Coupling
Under this integration scheme, the WLAN is connected to the GPRS core network in the same manner as any other radio access network (RAN), such as the GPRS RAN (GERAN) and the UMTS terrestrial radio access network (UTRAN). The WLAN is treated as a new radio access technology within the cellular system and may emulate either a radio network controller (RNC) or an SGSN. From the core network point of view, the WLAN is like any other GPRS routing area (RA) in the system. An interworking unit (IWU) is needed to interface the WLAN to the

GPRS core network. The main advantage of this solution is that the mechanisms

for mobility, QoS, and security in the core network can be reused. Within this

architecture the handover takes place when a mobile user either enters or leaves

a hotspot area. The IP address allocated to the mobile user under this scheme

does not change during the handover process since the mobile user still remains

under the same GGSN. The hotspot areas and cellular coverage areas normally

overlap, and the handover is based on end user’s desire. For example, a mobile

user, receiving multimedia streaming service, would like to hand over to the

WLAN when moving into the hotspot area to improve performance. Since the bot-

tleneck bandwidth in wireless environments lies in the air interface, the transcod-

ing functionality (in the proxy server) may not be needed in the delivery path when

the mobile user hands over from the cellular RAN to the WLAN. This scheme may

also require implementation of additional QoS adaptation mechanisms to support

seamless handover between the WLAN and the cellular RAN for real-time

applications such as streaming.

9.7.4.2 Loose Coupling
Under this integration scheme, the WLAN interfaces directly with the IP-PDN (e.g., the Internet) and has no direct interface with the GPRS core network. In this scenario, WLANs and cellular networks are two separate access networks. The loose coupling scheme may deploy IETF-based protocols to handle authentication, accounting, and mobility. The WLAN appears as a visiting network to the UMTS core network.

A mobile user is typically allocated a new IP address while handing over from the

UMTS network to the WLAN or vice versa. Seamless handover under this scheme

may require advanced mechanisms like context transfer [50] (session context, QoS

context, security context, etc.) and resource reservation. Providing seamless stream-

ing service under this integration scheme is an open research problem.

Streaming in mobile and wireless environments is a subject of active research. Some of the open research issues in providing multimedia streaming services in mobile and wireless environments are

. Seamless service during interdomain and intertechnology handoffs

. Dynamic QoS adaptations and channel allocations

. Optimizations across lower and higher layers

. Efficient micromobility protocols [58] to make smoother intradomain

handovers


. Secure streaming and digital rights management schemes

. Efficient implementations of multicast streaming services

Some of the more recent studies on these topics are listed at the end of this chapter

[e.g., 3, 4, 59–67].

9.8 CONCLUSIONS

This chapter has addressed the architectural and design issues in providing streaming services in wireless environments. Supporting streaming services in wireless environments is a big challenge because of error-prone wireless channels and mobility-induced factors. Also, the limited buffering and processing power available in portable mobile devices impact the design of the wireless access network architecture. A lot of research work has been done to address these issues [51,54,59–62]. The wireless access network architecture should implement appropriate mechanisms to mitigate the impact of wireless- and mobility-induced factors in order to minimize the resource and processing requirements at the mobile terminal. We have discussed some of these research issues and related work. The chapter gives a general overview of an end-to-end architecture, including the network elements and protocols, for providing streaming services in wireless/mobile environments. We have also described the packet-switched streaming service architecture developed by 3GPP (abbreviated 3GPP-PSS).

There has been widespread effort to develop adaptive modulation, equalization, and coding schemes that use real-time estimation of channel characteristics to achieve certain performance objectives, such as error rate and delay, at the physical layer. A number of smart-antenna-based technologies have been developed that use space diversity techniques to mitigate the impact of multipath fading and achieve higher capacity. Also, there has been a lot of work on micromobility protocols [58] (such as FMIP) at the network layer to reduce mobility-induced disruption. There

is a need to look into joint optimization issues across various layers to provide

good-quality seamless streaming service in wireless/mobile environments. The wireless bandwidth can be utilized effectively if the lower layers have a detailed understanding of the application requirements. A well-defined interface between the IP layer and the lower layers would be very useful in next-generation wireless networks. Indeed, the EU IST project BRAIN has already defined an IP-to-Wireless (IP2W) interface for this purpose. There are still a number of design issues in providing streaming services in heterogeneous wireless networks that include various wireless access technologies (3G, WLAN, Bluetooth, etc.). Secure streaming is yet another area of active research; the ability to protect the intellectual property rights of content owners will be a key factor in the mobile digital content market.

Multimedia streaming services are becoming very popular on the Internet, and when these services become mobile, animation, music, and news services will be available to users regardless of location and time. Next-generation mobile


networks will combine the standardized streaming service with a range of unique mobile services, offering innovative and exciting multimedia applications to the rapidly growing mobile market.

REFERENCES

1. The Third Generation Partnership Project, http://www.3gpp.org.

2. The Third Generation Partnership Project 2, http://www.3gpp2.org/.

3. I. Elsen et al., Streaming technology in 3G mobile communication systems, IEEE Comput., 34(9): 46–52 (Sept. 2001).

4. H. Montes et al., Deployment of IP multimedia streaming services in third-generation mobile networks, IEEE Wireless Commun., 84–92 (Oct. 2002).
5. D. Wu et al., Streaming video over the Internet: Approaches and directions, IEEE Trans. Circuits Syst. Video Technol., 11(3) (March 2001).

6. S. Keshav, An Engineering Approach to Computer Networking: ATM Networks, the Internet, and the Telephone Network, Addison-Wesley Professional, 1997.
7. S. Floyd and K. Fall, Promoting the use of end-to-end congestion control in the Internet, IEEE/ACM Trans. Network., 7(4): 458–472 (Aug. 1999).

8. S. Floyd et al., Equation-based congestion control for unicast applications, Proc. ACM SIGCOMM, Stockholm, Sweden, Aug. 2000, pp. 43–56.

9. The TCP-Friendly Web Page, http://www.psc.edu/networking/tcp_friendly.html.

10. R. Rejaie, M. Handley, and D. Estrin, RAP: An end-to-end rate-based congestion control

mechanism for real-time streams in the Internet, Proc. IEEE INFOCOM ’99, March

1999, Vol. 3, pp. 1337–1345.

11. S. McCanne, V. Jacobson, and M. Vetterli, Receiver-driven layered multicast, Proc. ACM SIGCOMM, Palo Alto, CA, Aug. 1996, pp. 117–130.
12. R. Rejaie, M. Handley, and D. Estrin, Quality adaptation for congestion controlled video playback over the Internet, Proc. ACM SIGCOMM '99, Cambridge, MA, Sept. 1999.

13. Q. Guo et al., Sender-adaptive and receiver-driven video multicasting, Proc. IEEE Int.

Symp. Circuits and Systems (ISCAS 2001), Sydney, Australia, May 2001.

14. Y. Wang, M. T. Orchard, and A. R. Reibman, Multiple description image coding for

noisy channels by pairing transform coefficients, Proc. IEEE Workshop on Multimedia

Signal Processing, June 1997, pp. 419–424.

15. X. Li et al., Layered video multicast with retransmission (LVMR): Evaluation of error-recovery schemes, Proc. INFOCOM '98, March 29–April 2, 1998, Vol. 3, pp. 1062–1072.
16. S. Shenker, C. Partridge, and R. Guerin, Specification of the Guaranteed Quality of Service, RFC 2212, Sept. 1997.
17. S. Blake et al., An Architecture for Differentiated Services, RFC 2475, Dec. 1998.
18. R. Braden et al., Resource Reservation Protocol (RSVP)—Version 1 Functional Specification, RFC 2205, Sept. 1997.
19. D. Durham et al., The COPS (Common Open Policy Service) Protocol, RFC 2748, Jan. 2000.
20. T. Sikora, MPEG digital video-coding standards, IEEE Signal Process. Mag. (Sept. 1997).


21. T. Sikora, The MPEG-4 video standard verification model, IEEE Trans. Circuits Syst.

Video Technol., 7(1) (Feb. 1997).

22. R. Talluri, Error-resilient video coding in the ISO MPEG-4 standard, IEEE Commun. Mag. (June 1998).
23. P. Noll, MPEG digital audio coding, IEEE Signal Process. Mag. (Sept. 1997).

24. 3GPP, Transparent End-to-End Packet-Switched Streaming Service (PSS): Protocols and Codecs (Release 5), Third Generation Partnership Project TS 26.234, V5.4.0.

25. H. Schulzrinne, A. Rao, and R. Lanphier, Real Time Streaming Protocol (RTSP), IETF Standards Track RFC 2326, April 1998.

26. A. Barbir, B. Cain, R. Nair, and O. Spatscheck, Known CN Request-Routing Mechan-

isms, IETF Work in Progress, April 2003. (Note: CDI working group at IETF has

concluded.)

27. M. Handley, C. Perkins, and E. Whelan, Session Announcement Protocol, IETF Exper-

imental RFC 2974, Oct. 2000.

28. M. Handley and V. Jacobson, SDP: Session Description Protocol, IETF Standards Track RFC 2327, April 1998.
29. J. Rosenberg et al., SIP: Session Initiation Protocol, IETF Standards Track RFC 3261, June 2002.

30. ITU-T Recommendation H.323 (version 5), Packet-Based Multimedia Communications Systems, 2003.

31. http://www.w3.org/TR/2001/REC-smil20-20010807/.

32. Composite Capabilities/Preference Profiles (CC/PP), Structure and Vocabularies,

http://www.w3c.org/TR/CCPP-struct-vocab/.

33. WAP User Agent Profile Specification, Oct. 2001.

34. CC/PP Attribute Vocabularies, http://www.w3.org/TR/2000/WD-CCPP-vocab-20000721/.
35. Capability Exchange Using HTTP Extension Framework, http://www.w3.org/TR/NOTE-CCPPexchange.

36. R. Fielding et al., Hypertext Transfer Protocol—HTTP/1.1, IETF Standards Track RFC 2616, June 1999.

37. Transmission Control Protocol, IETF RFC 793, Sept. 1981.

38. J. Postel, User Datagram Protocol, RFC 768, Aug. 1980.

39. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: A Transport Protocol for

Real-Time Applications, IETF Standards Track RFC 1889, Jan. 1996.

40. C. Bormann et al., RTP Payload Format for the 1998 Version of ITU-T Recommendation H.263 Video (H.263+), IETF Standards Track RFC 2429, Oct. 1998.
41. J. Sjoberg et al., Real-Time Transport Protocol (RTP) Payload Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs, RFC 3267, June 2002.

42. L.-A. Larzon, M. Degermark, and S. Pink, The UDP-Lite Protocol, IETF Internet Draft,

Work in Progress, Dec. 2002.

43. L.-A. Larzon, M. Degermark, and S. Pink, UDP Lite for Real Time Multimedia Appli-

cations, HPL-IRI-1999-001, April 1999.

44. J. Sjoberg et al., Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs, IETF Standards Track RFC 3267, June 2002.


45. S. Sen, J. Rexford, and D. Towsley, Proxy prefix caching for multimedia streams, Proc.

INFOCOM’99, March 1999, Vol. 3, pp. 1310–1319.

46. J. Rexford, S. Sen, and A. Basso, A smoothing proxy service for variable-bit-rate streaming video, Proc. GLOBECOM '99, Dec. 1999, Vol. 3, pp. 1823–1829.

47. T. Yoshimura, T. Ohya, T. Kawahara, and M. Etoh, Rate and robustness control with RTP

monitoring agent for mobile multimedia streaming, Proc. IEEE Int. Conf. Communi-

cations (ICC 2002), April 2002.

48. G. Cheung and T. Yoshimura, Streaming agent: A network proxy for media streaming in

3G wireless networks, Proc. IEEE Packet Video Workshop, April 2002.

49. C. E. Perkins, Mobile IP, IEEE Commun. Mag., 66–82 (May 2002).

50. R. Koodli and C. E. Perkins, Fast handovers and context transfers in mobile networks,

paper presented at ACM SIGCOMM, 2002.

51. H. Matsuoka, T. Yoshimura, and T. Ohya, A robust method for soft IP handover, IEEE Internet Comput., 18–24 (March/April 2003).

52. T. Yoshimura, Y. Yonemoto, T. Ohya, M. Etoh, and S. Wee, Mobile streaming media

CDN enabled by dynamic SMIL, Proc. WWW2002, May 7–11, 2002, Honolulu.

53. M. Tariq and A. Takeshita, Management of cacheable streaming multimedia content in networks with mobile hosts, Proc. IEEE GLOBECOM 2002, Nov. 17–22, 2002, Taipei, Taiwan.

54. M. Tariq, R. Jain, and T. Kawahara, Mobility aware server selection for mobile streaming

multimedia content distribution networks, Proc. 8th Int. Workshop on Web Content

Caching and Distribution, Hawthorne, NY, Sept. 29–Oct. 1, 2003.

55. A. K. Salkintzis, C. Fors, and R. Pazhyannur, WLAN-GPRS integration for next-generation mobile data networks, IEEE Wireless Commun., 112–124 (Oct. 2002).

56. V. K. Varma et al., Mobility management in integrated UMTS/WLAN networks, Proc.

IEEE ICC’03, May 2003, Vol. 2, pp. 1048–1053.

57. 3GPP, Feasibility Study on 3GPP System to WLAN Interworking, Technical Report 3GPP

TR22.934 v6.1.0, Dec. 2002.

58. A. T. Campbell and J. Gomez-Castellanos, IP micro-mobility protocols, ACM SIGMOBILE Mobile Comput. Commun. Rev., 4(4): 45–54 (Oct. 2001).

59. S. Verma and R. Barnes, A QoS architecture to support streaming applications in the mobile Internet, Proc. 5th Int. Symp. Wireless Personal Multimedia Communications (WPMC), Honolulu, Oct. 27–30, 2002.

60. S. Verma and R. Barnes, DiffServ-based QoS architecture to support streaming appli-

cations in 3G networks, Proc. 13th IEEE Symp. Personal, Indoor and Mobile Radio Com-

munications (PIMRC), Lisbon, Sept. 15–18, 2002.

61. S. Verma and R. Barnes, A QoS architecture to support streaming applications in the

mobile Internet, Proc. 12th IEEE Workshop on Local and Metropolitan Area Networks,

Stockholm, Aug. 11–14, 2002.

62. F. H. P. Fitzek and M. Reisslein, A prefetching protocol for continuous media streaming in

wireless environments, IEEE J. Select. Areas Commun., 19(10): 2015–2028 (Oct. 2001).

63. K. K. Leung et al., Link adaptation and power control for streaming services in EGPRS

wireless networks, IEEE J. Select. Areas Commun., 19(10): 2029–2039 (Oct. 2001).

64. S. Dogan et al., Error-resilient video transcoding for robust internetwork communications using GPRS, IEEE Trans. Circuits Syst. Video Technol., 12: 453–464 (June 2002).


65. A. Boukerche, S. Hong, and T. Jacob, An efficient synchronization scheme of multimedia streams in wireless and mobile systems, IEEE Trans. Parallel Distrib. Syst., 13: 911–923 (Sept. 2002).

66. A. Majumdar et al., Multicast and unicast real-time video streaming over wireless LANs, IEEE Trans. Circuits Syst. Video Technol., 12: 524–534 (June 2002).

67. B. Zheng and M. Atiquzzaman, A novel scheme for streaming multimedia to personal

wireless handheld devices, IEEE Trans. Consum. Electron., 49: 32–40 (Feb. 2003).

68. RDF Primer, http://www.w3.org/TR/rdf-primer/.

69. WAP Push Architectural Overview, July 2003.

70. R. Rejaie, M. Handley, and D. Estrin, Architectural considerations for playback of quality

adaptive video over the Internet, Proc. IEEE ICON 2000, Sept. 2000, pp. 204–209.
