Mobile Formats Explained


HELSINKI UNIVERSITY OF TECHNOLOGY 30.11.2004

Telecommunications Software and Multimedia Laboratory

T-111.550 Multimedia Seminar

Fall 2004: Mobile Multimedia Application Platforms

Mobile Audio

from MP3 to AAC and further

Henri Autti Johnny Biström

51194K 21548C



Mobile Audio from MP3 to AAC and further

Henri Autti and Johnny Biström

HUT, Telecommunications Software and Multimedia Laboratory


Abstract

The purpose of this paper is to evaluate the advanced audio codecs and

reflect over their suitability for mobile needs of today and tomorrow. The

historical development of different codecs for different purposes is

analyzed. The features of the most common codecs are discussed in parallel

with performance and other criteria. The capabilities of mobile devices and

the telecommunication possibilities now and in the future are also

considered in the analysis. Finally, the performance of the codecs is

compared. Some existing mobile audio applications are

presented and ideas for future audio applications are discussed. Two

of the questions to be answered here are: Can one codec that is superior for

the mobile world be found, or do we have to prepare for a wide diversity in

the future? Will the codecs continue to develop as rapidly as they have

done so far?

1 INTRODUCTION

A look at audio applications for the mobile world today reveals that the diversity in

implementations is wide. The solutions chosen for the representation of audio streams in

certain situations differ greatly and the number of codecs used for the purpose of

encoding and decoding audio data streams is large. It is not obvious which codec should

be used for what purpose. The selection of codec depends on several factors, such as the

content type of the audio material, the available communication speed and the quality

requirements of the listening situation. Other factors that might influence the selection

of codec are the standardization situation, the licensing policy and the competitors'

choices in the market. In recent years the mp3 format has been a great success, but

it does not fit well into mobile devices. Lately more efficient codecs such as AAC and AMR

have been presented and they have been refined for mobile audio purposes.

The purpose of this paper is to evaluate the most important audio codecs by

describing the technical principles of encoding and decoding, the standardization situation

and the suitability of the codec in relation to technology available and the market needs.

The analysis also takes into consideration the development situation of mobile and

telecommunication hardware and software. Using technical literature and documented



listener testing combined with mobile manufacturer specifications and published white

papers we try to find out if there is one superior codec for mobile audio applications and

which codec it would be. To do that, an analysis of existing and future audio

applications in the market has to be done to clarify the needs and expectations placed on mobile

audio. Finally, the result is weighed against the development trends of mobile

technology and the longevity of the chosen solution is judged.

2 BACKGROUND

In this chapter we take a look at the background of mobile audio formats. First we

present some basic facts about the development of audio codecs and the reasons for

developing them. Then we discuss some facts about the development of mobile devices

including phones, PDAs and laptops. Later in this part we take a quick look at the

applications available today and the demands these applications pose.

The short history of audio codecs dates back to the mid-1980s, when the Fraunhofer

Institut in Erlangen, Germany (Fraunhofer, 2004), began working on high

quality, low bit-rate audio coding with the help of Dieter Seitzer, a professor at the

University of Erlangen. Their project was financed by the European Union as a part of

the market-oriented Eureka research program where it was commonly known as

EU-147. In Germany in 1989, Fraunhofer was granted a patent for mp3, which we are going

to discuss more thoroughly in the next chapter. A few years later it was submitted to the

International Organization for Standardization (ISO), and mp3 was introduced as a part of the

official MPEG-1 standard in 1992. In January 1995 Fraunhofer applied for a

patent on mp3 in the United States as well, and it was granted in November 1996. The

revolutionary thing was that, using mp3 compression, PC users were for the first

time in history able to compress an ordinary music CD to one tenth of its original size

with only a small sacrifice in sound quality. Thus about 12 hours of music could be stored

on a recordable CD, which in turn could be played by an mp3 CD player or an

ordinary PC.
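The one-tenth and 12-hour figures can be checked with some simple arithmetic. The sketch below is only illustrative: the CD parameters (44.1 kHz, 16 bits, two channels) are the standard audio CD values, while the 128 kbps mp3 rate and the 650 MB disc capacity are assumptions we chose for the example, not figures taken from the cited sources.

```python
# Rough check of the "one tenth" and "12 hours" figures.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2
ASSUMED_MP3_KBPS = 128          # assumption: a typical mp3 bit rate
ASSUMED_DISC_CAPACITY_MB = 650  # assumption: CD-R capacity, 1 MB = 10**6 bytes


def cd_bit_rate_kbps() -> float:
    """Bit rate of uncompressed CD audio in kbps."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS / 1000


def compression_ratio(codec_kbps: float) -> float:
    """How many times smaller the coded stream is than CD audio."""
    return cd_bit_rate_kbps() / codec_kbps


def hours_on_disc(codec_kbps: float,
                  capacity_mb: float = ASSUMED_DISC_CAPACITY_MB) -> float:
    """Playing time that fits on a data disc at the given bit rate."""
    capacity_bits = capacity_mb * 1_000_000 * 8
    return capacity_bits / (codec_kbps * 1000) / 3600


if __name__ == "__main__":
    print(f"CD audio bit rate: {cd_bit_rate_kbps():.1f} kbps")
    print(f"Ratio at 128 kbps: {compression_ratio(ASSUMED_MP3_KBPS):.1f} : 1")
    print(f"Hours on one disc: {hours_on_disc(ASSUMED_MP3_KBPS):.1f}")
```

Under these assumptions the ratio comes out at roughly 11:1 and the playing time at a little over 11 hours, which is in line with the rounded figures quoted above.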

In the rapidly evolving world of mobile content development things have changed a

lot since those days. Nowadays mobile devices, ranging from small laptops through palmtops to phones,

are more widely available, and high-speed wireless networks are getting better

day by day. At the same time speech and audio compression have advanced rapidly in

recent years, spurred on by cost-effective digital technology and diverse commercial

applications. Wideband speech and high-fidelity audio compression have also made great

progress in recent years, accelerated by the commercial success of consumer and

professional digital audio products. Telephone speech, wideband speech and wideband

audio signals differ not only in bandwidth and dynamic range, but also in listener

expectations of the offered quality. Using wideband not only improves the

intelligibility and naturalness of speech, but also adds a feeling of transparent

communication and eases speaker recognition.

The commercial applications in the mobile content area of today are also developing

at a growing rate. According to Ericsson (Bruhn, 2004), mobile device services include

streaming, messaging, downloading and broadcasting. Streaming scenarios include

news listening, monitoring of sports events, audio books, music listening, commercial



advertisements, access to information systems and interactive gaming. Broadcasting

scenarios are very close to streaming scenarios and include web casting or Internet radio

broadcasting. They have become especially popular, allowing listeners to "stream" audio

on their computers. Unlike downloaded audio files, streamed audio files are not stored

on the user's hard drive, but are broadcast like traditional radio through the user's

audio player. Messaging scenarios are also similar to streaming, but with size

limitations, and include business-to-person and person-to-person scenarios. Download

scenarios include music, books and comics downloading over the network. It is important

for all of the scenarios named above to be able to handle mixed content, covering

music, speech, speech-between-music and speech-over-music.

The demands these applications pose today on audio codecs for mobile services

include the ability to cope with generic content, sufficient and consistent quality at

lowest rates, best quality at lowest rates, and high quality operation with relaxed bit rate

requirements. The new audio codecs also have to be optimized for low-resource devices

(low memory and computational resources) and have to support a variety of

operating systems, e.g. Symbian, WinCE, Palm OS5 and OS6. Codec development and

standardization is at the moment centred on 3GPP, which is the body

standardizing GSM, evolved GSM, UMTS and 3G. In the next chapter we are going to

introduce some of the most important audio standards and codecs, which play an

important role in the 3GPP.

3 AUDIO STANDARDS AND CODECS

In this chapter we describe what we consider the most important audio standards and

codecs at the moment. In the first part of this chapter, we are going to discuss the 3GPP

audio standard format families AAC and AMR, introducing the underlying technology.

First we present mp3 (predecessor to AAC), AAC, HE-AAC and EAAC+ and then the

challengers: AMR, AMR-WB and AMR-WB+. In the second part we are going to

discuss the open source codec Vorbis Ogg ACM and two of the most important

non-standard audio formats that use streaming technology, Windows Media Audio and

RealMedia. For terminology, architecture and technology see Wales (2004) and ARM

Developers Guide (2004).

3.1 MPEG-1 (mp3)

Mp3 stands for MPEG-1 Audio Layer III. It is not a separate format, but a part of the

MPEG-1 video encoding format, as described earlier. Mp3 is a lossy data compression

method (meaning that compressing a file and then decompressing it yields a file that

may well differ from the original, but is "close enough") that stores good quality audio

in small files by using psychoacoustics to discard the parts of the audio

that most humans cannot hear. Mp3 bit rates vary from 8 kbps to 320 kbps. When the

mp3 phenomenon began in 1996, most audio files were encoded at a 128 kbps

bit rate, which is still the most popular bit rate in the world, although most

people agree that at slightly higher bit rates, such as 192 kbps or 256 kbps, the audio

quality is comparable to CD quality.



The problem with mp3s takes place at lower bit rates (64 kbps and below), because

the sound starts lacking the high frequency components. The reason is that mp3 at these

bit rates runs out of bits to compress the music in full audio bandwidth and with

significant detail. Mp3PRO was created to solve the problem of limited bandwidth mp3

files. To improve the sound quality of mp3 at lower bit rates, an enhancement

technology that gives the high frequency components back to the sound has been

developed. The technology is called "Spectral Band Replication" (SBR). SBR is a very

efficient method to generate the high frequency components of an audio signal. The

resulting audio format is composed of two components, the mp3 part for the low

frequencies and the SBR or "PRO" part for the high frequencies. The first part analyses

the low frequency band information and encodes it into a normal mp3 stream. This

enables the encoder to concentrate on less information and allows it to do a better job of

encoding. This also maintains complete compatibility with existing mp3 players. The

second part analyses the high frequency band information and encodes it into a part of

the mp3 stream that is normally ignored by existing mp3 decoders. Detailed information

can be found at mp3PRO Zone (2004).

3.2 MPEG-2 AAC

AAC (Advanced Audio Coding), also known as MPEG-2 AAC, is a lossy data

compression scheme intended for audio streams. AAC was designed to replace mp3. It

is part of the MPEG-2 standard introduced in 1994 and developed by the MPEG group

that includes Dolby, Fraunhofer (FhG), AT&T, Sony, and Nokia - companies that have

also been involved in the development of audio codecs such as mp3 and AC3 (also

known as Dolby Digital). Unlike older MPEG audio encoding methods, MPEG-2 AAC

is not backward compatible with older MPEG audio formats. By contrast, mp3 is

backward compatible with mp2.

The function of AAC is based on a wideband audio coding algorithm that exploits

two primary coding strategies to dramatically reduce the amount of data needed to

convey high-quality digital audio. First, the signal components that are "perceptually

irrelevant" and can be discarded without a perceived loss of audio quality are removed.

Next, redundancies in the coded audio signal are eliminated. Efficient audio

compression is achieved by a variety of perceptual audio coding and data compression

tools.

When compared side by side with its predecessor, mp3, AAC is proving itself

worthy of replacing mp3 as the new Internet audio standard. It has improved

compression, which provides higher-quality results with smaller file sizes. It supports

multi-channel audio, providing up to 48 full-frequency channels; higher-resolution

audio, with sampling rates from 8 up to 96 kHz; and improved decoding efficiency,

requiring less processing power for decoding. These result in higher quality output at

lower data rates, allowing even modem users to hear a difference. It also gives the

listener better and more stable quality than mp3 at equivalent or slightly lower

bit rates. Depending on the AAC profile and the mp3 encoder, 96 kbps AAC can give

perceived quality that is nearly the same as, or better than, 128 kbps mp3.



3.3 MPEG-4 HE-AAC

MPEG-4 High Efficiency AAC is the combination of MPEG-2 AAC and the SBR

Bandwidth Extension amendment that is based on SBR (Spectral Band Replication)

technology. HE-AAC is not a replacement for AAC, but rather a superset that extends

the reach of high-quality MPEG-4 audio to much lower bit rates (as low as 32 kbps).

HE-AAC is able to achieve superior audio quality without losing treble or

collapsing the stereo image.

HE-AAC decoders will decode plain AAC and the enhanced AAC plus SBR. The

result is a backward compatible extension of the standard that nearly doubles the

efficiency of MPEG-4 audio. As discussed before, SBR is a unique bandwidth extension

technique that does not replace the core codec, but operates in conjunction with it to

create a more efficient superset that can cut the required bit rate in half. Present in both

the encoding and the decoding process, SBR leverages the correlation between the low

and the high frequencies in an audio signal to describe the high-end of the signal using

only a very small amount of data. This SBR data describing the high-frequencies is

coupled with the low-frequency compressed data from the AAC codec. Once combined,

the complete HE-AAC bit stream contains enough data to recreate the original signal

(see figure 1).

For example, to create 48 kbps stereo HE-AAC, the encoder generates two signals:

an MPEG AAC signal at about 42 kbps and an SBR signal at about 6 kbps. The SBR

signal is then placed into the MPEG AAC auxiliary fields as defined in MPEG-4 and

sent out as a complete 48 kbps MPEG-4 HE-AAC bit stream.
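The bit budget in this example can be written down as a small calculation. The following is a minimal sketch; the function name `split_bit_budget` and the returned field names are our own illustrative choices and do not come from any standard or encoder API.

```python
def split_bit_budget(total_kbps: float, sbr_kbps: float) -> dict:
    """Split a constant total bit rate into AAC core and SBR side information.

    The SBR data is carried inside the AAC stream, so whatever the SBR
    part consumes is no longer available to the AAC core coder.
    """
    if not 0 <= sbr_kbps <= total_kbps:
        raise ValueError("SBR rate must lie between 0 and the total rate")
    return {
        "aac_core_kbps": total_kbps - sbr_kbps,
        "sbr_kbps": sbr_kbps,
        "sbr_share": sbr_kbps / total_kbps,
    }


if __name__ == "__main__":
    # The 48 kbps stereo example from the text: about 6 kbps of SBR data.
    print(split_bit_budget(48, 6))
```

With the figures from the text, the SBR part takes one eighth of the stream and the remaining 42 kbps is left for the AAC core, which only has to code the lower half of the spectrum.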

Figure 1. The encoding and decoding process of HE-AAC.

Because the SBR data is placed within the AAC auxiliary fields, the enhanced signal

will be accepted by both an existing AAC and a new HE-AAC decoder. If sent to an

AAC decoder, only the low-frequency audio signal will be recognized and decoded. If

sent to an HE-AAC decoder, the SBR and the AAC will be decoded to recreate the full

frequency signal. This technique makes the new profile forward compatible with AAC.

Because the HE-AAC decoder contains a full-fledged AAC decoder, it is also able to

decode both the plain AAC and HE-AAC MPEG-4 Audio profiles. This

combination makes HE-AAC backward compatible with AAC.

As a result, HE-AAC delivers CD-quality stereo at 48 kbps and 5.1 channel surround

sound at 128 kbps. This level of efficiency is ideal for Internet content delivery and



fundamentally enables new applications in the markets of mobile and digital

broadcasting. According to Frerichs (2003), however, HE-AAC is not suitable for two-way

communication because of its very high delay.

3.4 EAAC+

Enhanced AAC+ was introduced in the 3GPP Release 6 standard in 2004. Its

optimal operating range is from 18 kbps upwards. According to the 3rd Generation

Partnership Project, the Enhanced AAC+ general audio codec consists of MPEG-4 AAC,

MPEG-4 SBR and MPEG-4 Parametric Stereo. The AAC is a general audio codec, SBR

is a bandwidth extension technique offering substantial coding gain in combination with

AAC, and Parametric Stereo enables stereo coding at very low bit rates. According to

IBC 2003 Conference Papers, the basic principle behind the parametric stereo is similar

to the SBR principle - a guided reconstruction of a stereo signal based on a transmitted

mono signal. In addition to a coded mono mixdown of the stereo input signal,

parameters describing the stereo image are transmitted. The stereo parameters require a

small fraction of the total bit rate, ensuring a high quality of the mono signal at the given

bit rate. Two parameters are used to describe the stereo information, a panorama

parameter and an ambience parameter. The panorama parameter contains information

about the left to right level differences within different frequency bands. Similarly, the

ambience parameter depicts the stereo ambience for a set of frequency bands. The

encoding of both parameters uses the same principle of entropy coding of time- or

frequency-direction differences as is used for the SBR envelopes. In addition, the

quantization steps are frequency dependent.
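The idea of sending a mono mixdown plus a panorama parameter can be illustrated with a deliberately simplified sketch. It is not the standardized Parametric Stereo tool: it works on the whole signal in the time domain instead of per frequency band in the QMF domain, it ignores the ambience parameter, and it does no quantization or entropy coding. The function names `ps_encode` and `ps_decode` are hypothetical.

```python
import math


def ps_encode(left, right):
    """Return a mono downmix and one broadband panorama value in dB."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    energy_left = sum(x * x for x in left) + 1e-12   # avoid log of zero
    energy_right = sum(x * x for x in right) + 1e-12
    pan_db = 10 * math.log10(energy_left / energy_right)
    return mono, pan_db


def ps_decode(mono, pan_db):
    """Rebuild a stereo pair whose left/right level difference is pan_db."""
    ratio = 10 ** (pan_db / 20)          # amplitude ratio left : right
    gain_left = 2 * ratio / (1 + ratio)
    gain_right = 2 / (1 + ratio)
    return [gain_left * m for m in mono], [gain_right * m for m in mono]


if __name__ == "__main__":
    source = [math.sin(0.2 * i) for i in range(64)]
    left = [0.8 * s for s in source]     # source panned towards the left
    right = [0.2 * s for s in source]
    mono, pan_db = ps_encode(left, right)
    print(f"panorama: {pan_db:.2f} dB")
```

For a single source that is simply panned between the channels, the decoder gains restore the original left and right signals from the mono downmix. Real stereo material also contains decorrelated components, which is what the ambience parameter is meant to describe.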

In addition, the Enhanced AAC+ decoder includes three tools that

the older codecs lack. Error concealment tools for AAC, SBR, and PS make the

decoder robust against transmission errors like frame loss. These tools mitigate audible

effects of such errors. The stereo-to-mono down mix tool enables a decoder only

capable of mono output to down mix a stereo bit stream. For the AAC part this is done

in the time domain after the stereo decoding, but for SBR this is done on the SBR

parameters, which saves complexity since only a mono decoding of SBR is needed.

The Spline resampler tool makes it possible to resample the output to a sampling

frequency different from the one supplied in the bit stream. This gives, for example,

handsets with a D/A converter only capable of a 16 kHz sampling frequency the

possibility to play bit streams encoded with a 22.05 kHz sampling frequency.
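The resampling step can be illustrated with a short sketch. Note that it uses plain linear interpolation as a stand-in for the spline interpolation of the actual tool, and that it applies no low-pass filter, so content above the new Nyquist frequency would alias. The function name `resample_linear` is our own.

```python
def resample_linear(samples, src_rate_hz, dst_rate_hz):
    """Resample a list of samples by linear interpolation.

    Simplified stand-in for a spline resampler: each output sample is
    interpolated between the two nearest input samples.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate_hz / src_rate_hz)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate_hz / dst_rate_hz   # position in input samples
        i0 = int(pos)
        i1 = min(i0 + 1, last)
        frac = pos - i0
        out.append((1 - frac) * samples[i0] + frac * samples[i1])
    return out


if __name__ == "__main__":
    one_second = [float(i) for i in range(22050)]   # a simple ramp
    converted = resample_linear(one_second, 22050, 16000)
    print(len(converted), "output samples for one second of input")
```

One second of audio at 22.05 kHz becomes 16000 samples, which is what a handset with a 16 kHz D/A converter can play directly.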

Figure 2 shows a block diagram of the EAAC+ encoder. The encoder basically

consists of the AAC waveform encoder, the SBR high frequency reconstruction

encoding tool and the PS encoding tool. The encoder operates in a dual rate mode:

the SBR encoder operates at the encoding sampling rate fsenc as delivered from

the IIR resampler, and the AAC encoder at half of this sampling rate, fsenc/2.

Consequently a 2:1 down sampler is present at the input to the AAC encoder. The PS

tool is used for low bit rate stereo coding, i.e. up to and including a bit rate of 32 kbps.

The AAC encoder implementation complies with the AAC Low Complexity Object

Type and is a highly optimized low-resource implementation, requiring little

computational complexity and memory resources. This is basically achieved by mapping

the psychoacoustic based threshold estimation directly to scale factor amplification

values to shape the encoding quantization noise according to the input signal



characteristics, rather than employing time-consuming iterative analysis-by-synthesis

methods.

The SBR encoder consists of a QMF (Quadrature Mirror Filter) analysis filter bank,

which is used to derive the spectral envelope of the original input signal. Furthermore

the SBR related modules control the selection of an input signal adaptive grid

partitioning of the QMF samples on the time axis (i.e. control the framing), analyze

the relation of noise floor to tonal components in the high band, collect guidance

information for the transposition process in the decoder and detect missing harmonic

components which could not be reconstructed by pure transposition. This gathered

information about the characteristics of the input signal, together with the spectral

envelope data forms the SBR stream. The amount of bits for the SBR stream is

subtracted from the bits available to the AAC encoder in order to achieve a constant bit

rate encoding of the multiplexed EAAC+ stream.

The Parametric Stereo encoding tool in the EAAC+ encoder estimates parameters

characterizing the perceived stereo image of the input signal. These stereo parameters

are embedded in the SBR stream. At the same time, a signal adaptive mono down mix

of the input signal is generated in the QMF domain and fed into the SBR encoder

operating in mono. This down mix is also processed by a down sampled QMF synthesis

filter bank to obtain the time domain input signal for the AAC core encoder with the

sampling rate fsenc/2. In this case, the 2:1 IIR down sampler is not active.

Figure 2. 3rd Generation Partnership Project; EAAC+ Encoder overview

In the decoder (figure 3) the bit stream is de-multiplexed into the AAC and the SBR

stream. Error concealment, e.g. in case of frame loss, is achieved by designated

algorithms in the decoder for AAC, SBR and PS: the AAC core decoder employs

signal-adaptive spectrally shaped noise generation for error concealment, while in the SBR and PS

decoders error concealment is based on extrapolation of guidance, envelope, and stereo

information.

For the SBR processing, a Low-Power tool of SBR is used for full stereo decoding in

order to keep the peak computational complexity as low as possible over all channel



modes. Usage of the SBR Low-Power tool provides a computational complexity of an

HE-AAC stereo decoder in the same range as plain AAC stereo decoders. The low band

AAC time domain signal, sampled at fsenc/2, is first fed to a 32-channel QMF analysis

filter bank. The QMF low band samples are then used to generate a high band signal,

with the transmitted transposition guidance information used to best match the

original input signal characteristics.

The transposed high band signal is then adjusted according to the transmitted spectral

envelope signal to best match the original's spectral envelope. Missing components that

could not be reconstructed by the transposition process are also introduced. Finally, the

low band and the reconstructed high band are combined to obtain the complete output

signal in the QMF domain.

In case of a stream using parametric stereo, the mono output signal from the underlying

HE-AAC decoder is converted into a stereo signal. This processing is carried out in the

QMF domain and is controlled by the parametric stereo parameters embedded in the

SBR stream.

Figure 3. 3rd Generation Partnership Project; EAAC+ Decoder overview

3.5 AMR

The AMR (Adaptive Multi-Rate) standard was introduced in 1998. Its main purpose

is baseline mobile speech coding. It operates at variable mono bit rates in the range of 4.75 to

12.2 kbps in its narrowband (bandwidth 3.5 kHz) configuration. It was adopted by the

3GPP as the mandatory codec for 3G wireless systems based on the evolved GSM core

network (WCDMA, EDGE, GPRS).

The philosophy behind AMR is to lower the codec rate as the interference increases,

thus enabling more error correction to be applied. The AMR codec is also used to

harmonize the codec standards amongst different cellular systems. AMR is based on a

technology called ACELP (Algebraic Code Excited Linear Prediction). ACELP is a



speech compression system, used to provide a good standard of speech quality when the

network is operating at low data rates (narrow bandwidth). The analogue voice signal is

converted to a digital data signal, so that it can be compressed for transmission over the

network, and the process is then reversed at the other end when the digital data is

converted back to an analogue voice signal. The quality of the reproduced speech will

appear to be much better at the receiving phone than without the ACELP system.
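The rate adaptation principle of AMR, a lower codec rate leaving more room for error protection, can be sketched as follows. The eight narrowband modes and the 22.8 kbps gross rate of a GSM full-rate traffic channel are standard values, but they are not taken from the sources cited here. The carrier-to-interference thresholds are purely hypothetical numbers chosen for illustration; real networks tune them.

```python
# AMR narrowband codec modes in kbps.
AMR_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)

# Gross bit rate of a GSM full-rate traffic channel in kbps.
GROSS_CHANNEL_KBPS = 22.8

# Hypothetical thresholds: minimum carrier-to-interference ratio (dB)
# at which each mode is considered usable. Illustrative values only.
HYPOTHETICAL_MIN_CI_DB = (0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0)


def select_mode(ci_db: float) -> float:
    """Pick the highest codec rate whose threshold the channel meets."""
    chosen = AMR_MODES_KBPS[0]           # most robust mode as fallback
    for rate, min_ci in zip(AMR_MODES_KBPS, HYPOTHETICAL_MIN_CI_DB):
        if ci_db >= min_ci:
            chosen = rate
    return chosen


def protection_kbps(mode_kbps: float) -> float:
    """Channel bits left over for error protection at a given mode."""
    return GROSS_CHANNEL_KBPS - mode_kbps


if __name__ == "__main__":
    for ci in (-3.0, 7.0, 20.0):
        mode = select_mode(ci)
        print(f"C/I {ci:5.1f} dB: {mode} kbps speech, "
              f"{protection_kbps(mode):.2f} kbps protection")
```

On a clean channel the 12.2 kbps mode is used and about 10.6 kbps of the channel is left for protection; on a poor channel the 4.75 kbps mode leaves roughly 18 kbps for error correction.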

3.6 AMR-WB

AMR-WB (wideband extension) is a speech coding standard developed after

AMR using the same ACELP technology. The AMR Wideband codec was

standardized by ETSI/3GPP in December 2000, and selected and approved by the ITU-T

in July 2001 and January 2002, respectively. The ITU-T standard is referred to as

G.722.2.

The codec provides excellent speech quality due to its wider speech bandwidth of 50

- 7000 Hz, significantly improving the intelligibility and naturalness of speech and

adding a feeling of face-to-face communication. The AMR-WB speech codec consists

of nine speech codec modes with mono bit rates of 23.85, 23.05, 19.85, 18.25, 15.85,

14.25, 12.65, 8.85 and 6.6 kbps. The lowest bit rate providing excellent speech quality

in a clean environment is 12.65 kbps. Higher bit rates are useful in background noise

conditions and for music. The lower bit rates of 6.60 and 8.85 kbps also provide

reasonable quality, especially compared to narrowband codecs. A background noise

mode is designed to be used in discontinuous transmission (DTX) operation in GSM

and as a low bit-rate source dependent mode for coding background noise in other

systems.
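To give a feeling for what these bit rates mean per transmitted frame, the sketch below converts each mode into bits per speech frame. It assumes the 20 ms frame length used by AMR-WB, a value that comes from the codec specification rather than from the text above.

```python
# AMR-WB codec modes in kbps, as listed in the text.
AMR_WB_MODES_KBPS = (6.6, 8.85, 12.65, 14.25, 15.85,
                     18.25, 19.85, 23.05, 23.85)

FRAME_LENGTH_MS = 20  # assumption: AMR-WB speech frame length


def bits_per_frame(mode_kbps: float) -> int:
    """Number of speech bits in one frame at the given codec mode."""
    # kbps multiplied by milliseconds gives bits directly.
    return round(mode_kbps * FRAME_LENGTH_MS)


if __name__ == "__main__":
    for mode in AMR_WB_MODES_KBPS:
        print(f"{mode:6.2f} kbps: {bits_per_frame(mode)} bits per frame")
```

At 12.65 kbps, for example, each 20 ms frame carries 253 bits, which illustrates how little data is needed for wideband speech of good quality.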

AMR-WB can also carry narrowband signals. It eliminates the need for transcoding

and eases the implementation of wideband applications and services across a wide range

of wireless and wire line communication systems and platforms. AMR-WB is already

standardized for future usage in networks such as UMTS. There it provides so much

higher speech quality that it seems probable that also older networks will have to

gradually be transformed to support wideband.

3.7 AMR-WB+

Adopted as an audio codec standard in September 2004 by ETSI/3GPP, AMR-WB+

is an audio extension of AMR-WB, which utilizes a hybrid of two technologies: ACELP

and TCX (Transform Coded Excitation) to deliver very high sound quality for both

speech and audio content types, including music, voice-between-music, and voice-over-music.

AMR-WB+ adds support for stereo signals and higher sampling rates. Also, high-

efficiency parametric stereo (HE-PS), as discussed under EAAC+, provides high-fidelity

stereo image reproduction at the lowest bit rates. Another main improvement is the use

of transform coding in addition to ACELP. This greatly improves generic audio

coding. Automatic switching between transform coding and ACELP provides both very

good speech and other audio quality with moderate bit rates. Sound quality is not



compromised even in networks where the bandwidth is limited.

The AMR-WB+ codec has a wide bit-rate range, from 6 to 48 kbps. Mono rates are

scalable from 6 to 36 kbps, and stereo rates are scalable from 8 to 48 kbps, reproducing

bandwidth up to 24 kHz (approaching CD quality). Moreover, it provides backward

compatibility with AMR wideband. AMR-WB+ brings speech and music to mobile

phones (VoiceAge, 2004).

3.8 Vorbis Ogg ACM

Due to numerous patenting and licensing issues with various parts of the MPEG

specifications, there has been a significant movement to create and popularize audio

formats and algorithms that avoid these problems. The most popular of

these is probably Ogg Vorbis, which is a completely open and free codec project from

Xiph.org Foundation (2004).

Vorbis was started as a result of a plan to charge licensing fees for the mp3 format,

which was announced in September 1998. The first version 1.0 of the codec was

released on July 19, 2002. The latest version is 1.1.0 released on September 22, 2004.

The Ogg Vorbis format has proved popular among open source communities; they argue

that its higher fidelity and completely free nature make it a natural replacement for the

entrenched mp3 format. In the commercial sector, Vorbis has already had success with

many newer video game titles employing Vorbis as opposed to mp3.

Given stereo input at 44.1 kHz, the standard CD audio sampling frequency, the current

encoder will produce output at 45 - 500 kbps, depending on the specified quality setting.

Though Vorbis 1.0.1 is tuned for bit rates of 16 - 128 kbps/channel, it is still possible to

encode arbitrary bit rates chosen by the user. Such figures are only approximate,

however, as Vorbis is inherently variable-bit rate.

Vorbis uses the modified discrete cosine transform (MDCT) for converting sound

data from the time domain to the frequency domain. The resulting frequency-domain

data is broken into noise floor and residue components, and then quantized and entropy

coded using a codebook-based vector quantization algorithm. The decompression

algorithm reverses these stages.
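The transform stage can be illustrated with a direct, unoptimized implementation of the MDCT and its inverse. This is only a sketch of the transform itself: it uses no window function (Vorbis applies its own window), and it leaves out the floor and residue split, quantization and entropy coding. The helper `analyse_synthesise` is a hypothetical name for the overlap-add loop.

```python
import math


def mdct(block):
    """MDCT of a block of 2N samples, giving N coefficients."""
    n = len(block) // 2
    return [
        sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs):
    """Inverse MDCT of N coefficients, giving 2N aliased samples."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)) / n
        for t in range(2 * n)
    ]


def analyse_synthesise(signal, n):
    """Transform 50 % overlapping blocks of 2N samples and overlap-add them."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        rebuilt = imdct(mdct(signal[start:start + 2 * n]))
        for t, value in enumerate(rebuilt):
            out[start + t] += value
    return out


if __name__ == "__main__":
    signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(32)]
    rebuilt = analyse_synthesise(signal, 4)
    error = max(abs(a - b) for a, b in zip(signal[4:28], rebuilt[4:28]))
    print(f"largest reconstruction error in the interior: {error:.2e}")
```

Each block of 2N samples yields only N coefficients, yet the aliasing introduced by one block is cancelled by its neighbours when the overlapping outputs are added. Every sample covered by two blocks is therefore recovered, which is why the MDCT suits block-based audio coding.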

3.9 Windows Media

Windows Media Audio (WMA) is a proprietary compressed audio file format used

by Microsoft. It has a large user base through Windows. It was initially a competitor to

the mp3 format, but with the introduction of Apple's iTunes Music Store, it has

positioned itself as a competitor to the AAC format used by Apple. It is part of the

Microsoft Corporation (2004) Windows Media framework. An initial reason for the

development of WMA might have been that mp3 technology is patented and has to be

licensed from Thomson, which controls licensing of the mp3 patents in many countries

including the United States of America, for inclusion in the Microsoft Windows

operating system. WMA includes multi-channel coding.



With the publishing of Windows Media Audio 9, the codec was updated to WMA Pro,

and a new lossless codec has been introduced to accompany the existing lossy codec.

WMA Pro is considered to come close to AAC in quality. Support for variable bit rates has also

been introduced. WMA Pro has not been reverse engineered yet.

Microsoft's Windows Media Audio (WMA) file format, which they claim is a higher

quality audio format at smaller file sizes, is starting to gain more acceptance as it comes

bundled as the standard audio format in Windows 98/2000/XP. Microsoft might be able

to challenge the dominance of MP3s or at the very least offer a second, popular audio

format choice.

3.10 Real Media

RealAudio is a proprietary audio codec developed by RealNetworks. It is especially

designed to conform to low bandwidths, and it can be used as a streaming audio format.

As a matter of fact, it was one of the first to offer streamed audio software in the world.

For high bit rates, Real Media uses AAC. Many radio stations use RealAudio to stream

their programming over the internet in real time. The first version of RealAudio was

released in 1995. The current version of the codec, RealAudio 10 was published in

2004. It includes multichannel-coding (RealNetworks Incorporated, 2004).

4 DEVELOPMENT TRENDS AND COMPARISON OF CODECS FOR THE APPLICATIONS OF TOMORROW

The hardware of mobile platforms is developing rapidly, and thus

new software and applications can be expected in the mobile devices of tomorrow. The

capacity of the central processing unit grows and more memory is already available at a

lower price. This chapter tries to extrapolate what will happen to the devices in the near

future. The wireless communication channels are also developing,

which leads to faster transmission to the mobile devices. Is there any need for compression as

efficient as HE-AAC offers, or will it become history as the limitations

of today disappear?

4.1 The Features of new Mobile Phones

The main target of this hardware study is the mobile phone, as the number of mobile

phones is much larger than the number of PDAs. The mobile phone is also a good

low-end platform representative for mobile devices, as one of the main requirements for a

phone is always its size and weight. According to Symbian Ltd. (Symbian, 2004), the

leading manufacturer of 3G operating systems for mobile phones, the latest Symbian

operating system, OS 8, is already used on Series 60 Platform 2.0 based 3G phones such as

the Nokia 6630, which gives them wide support for audio codecs such as NB-AMR,

WB-AMR, MP3, AAC and RealAudio. As the phone has 10 MB of internal dynamic memory and 64

MB on an MMC, it offers fairly good possibilities for audio and video applications in the

mid-price range. More expensive phones such as the Nokia 7710, in Series 90, with Symbian OS

7, support the same audio codecs even in stereo. The Nokia 7710 has 90 MB of RAM



and can handle an MMC of 512 MB, which makes it an excellent choice for audio and

multimedia applications. The same applies to the Nokia Communicator 9500. Thus the

hardware limits for audio have been eliminated in the mid- and high-end mobile phones.

In the low-end mobile phones there are still some relevant hardware restrictions

considering the use of audio, mainly because of the low price requirement, but they will

disappear in the near future.

4.2 The Telecommunication Features of the Mobile Networks

The basic GPRS (General Packet Radio Service) network still used by many mobile
phones supports communication speeds of 30-50 kbps. EDGE (Enhanced Data rates for GSM
Evolution), or EGPRS, increases the end-user speed to 120-150 kbps and even slightly
higher. EGPRS is available in most mid-range and even some low-end phones, so it can
be considered the standard today. However, EGPRS coverage is currently limited to
urban and suburban areas. UMTS (Universal Mobile Telecommunications System) offers
data speeds from 384 kbps (TDD Mode) to 2 Mbps (TDD Mode) (Compagnie Financière
Alcatel, 2004), which removes some of the limitations.

So far UMTS is available only in high-end mobile phones and only in urban areas.
Finland will not be covered by UMTS networks in the near future, which means that
EGPRS will remain the fastest alternative for a large group of phone users here. The
speed of EGPRS is nevertheless sufficient for streaming music applications if the
latest codecs are used.
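As a rough illustration of this point, the network speeds quoted in this section can be compared against a codec bit rate. The following is only a minimal Python sketch: the table restates the ranges given above, while the function name `can_stream` and the 25 % headroom for protocol overhead are assumptions made for the example, not measured values.

```python
# End-user data rates in kbps as quoted in this section: (low, high).
NETWORK_KBPS = {
    "GPRS": (30, 50),
    "EGPRS": (120, 150),
    "UMTS": (384, 2000),
}

# Assumed safety margin for protocol overhead and rate variation.
HEADROOM = 0.25


def can_stream(network: str, codec_kbps: float, headroom: float = HEADROOM) -> bool:
    """Return True if the low end of the network range covers the stream."""
    low_kbps, _high_kbps = NETWORK_KBPS[network]
    return low_kbps >= codec_kbps * (1.0 + headroom)


if __name__ == "__main__":
    # Check a 48 kbps stereo stream against every network in the table.
    for name in NETWORK_KBPS:
        print(name, can_stream(name, 48))
```

Under these assumptions a 48 kbps stream fits into EGPRS and UMTS but not into basic GPRS, whereas a 128 kbps stream would need UMTS.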

4.3 Comparison of Mobile Audio Codecs

There are many methods to compare the quality of audio streams. One method is to use
an audience to judge the quality. A test used by the European Broadcasting Union
(EBU), called MUSHRA, is often used as a reference. MUSHRA stands for MUlti Stimulus
test with Hidden Reference and Anchors and is an advanced testing method developed and
proposed by the EBU Project Group B/AIM. The method has been submitted to ITU for
standardization.

MUSHRA (Stoll and Kozamernik, 2000) is a subjective test in which listeners in
different EBU member countries compare different types of audio to a reference signal
and grade them on a scale from 0 to 100, where 81-100 is considered excellent, 61-80
good, 41-60 fair, 21-40 poor and 0-20 bad. Different types of music, such as
classical, folk, jazz and pop, are tested. Broadcast programmes, both from a studio
and from a live environment, with female and male voices, are also tested.
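The grading scale described above can be written down as a small lookup. This is a minimal Python sketch for illustration only: the name `mushra_label` is hypothetical, and assigning a fractional average such as 80.5 to the lower band is an assumption, since the text only defines integer intervals.

```python
# MUSHRA quality intervals as described in the text: (lowest score, label).
MUSHRA_BANDS = [
    (81, "excellent"),
    (61, "good"),
    (41, "fair"),
    (21, "poor"),
    (0, "bad"),
]


def mushra_label(score: float) -> str:
    """Map a MUSHRA score (0-100) to its verbal quality category."""
    if not 0 <= score <= 100:
        raise ValueError("MUSHRA scores lie between 0 and 100")
    # Bands are ordered from best to worst, so the first match wins.
    for lower_bound, label in MUSHRA_BANDS:
        if score >= lower_bound:
            return label
    return "bad"  # not reached for valid scores, kept as a safe default
```

With this mapping a score of 80 falls at the top of the good band, while 83 counts as excellent.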

According to these listening tests, performed by the EBU, only a small difference can
be heard between stereo CD quality and HE-AAC compression at 48 kbps. The test results
are described by Kozamernik (2003). This is also illustrated in figure 4, which shows
that aacPlus, also called HE-AAC, gets the highest MUSHRA score, 80, compared to
mp3PRO, which scores 76. At 48 kbps the better-known RealMedia Real 8, mp3 and MS
Windows Media 8 codecs get much lower ratings. The EBU has not yet reported MUSHRA
testing of the AMR-WB+ codec.

Figure 4. European Broadcasting Union MUSHRA testing at 48 kbps stereo (Coding

Technologies, 2004).

The 3rd Generation Partnership Project (3GPP) is a collaboration agreement that was
established in December 1998. 3GPP has conducted a standardization process for Packet
Switched Streaming (PSS) and Multimedia Messaging Services (MMS). Two bit-rate ranges
have been defined:

1. a low-rate range up to 24 kbps, where the candidates are AMR-WB+, HE-AAC (aacPlus) and Enhanced AAC+

2. a high-rate range, with rates higher than 24 kbps, where the candidates are HE-AAC (aacPlus) and Enhanced AAC+
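The two ranges can be expressed as a simple selection rule. The Python sketch below only restates the list above; the names `rate_range` and `candidate_codecs` are hypothetical, and treating exactly 24 kbps as part of the low-rate range follows the wording "up to 24 kbps".

```python
# Boundary between the two 3GPP ranges in kbps, as given in the text.
LOW_RATE_LIMIT_KBPS = 24

# Candidate codecs per range, copied from the list above.
CANDIDATES = {
    "low": ["AMR-WB+", "HE-AAC (aacPlus)", "Enhanced AAC+"],
    "high": ["HE-AAC (aacPlus)", "Enhanced AAC+"],
}


def rate_range(bitrate_kbps: float) -> str:
    """Classify a bit rate as belonging to the low- or high-rate range."""
    if bitrate_kbps <= 0:
        raise ValueError("bit rate must be positive")
    return "low" if bitrate_kbps <= LOW_RATE_LIMIT_KBPS else "high"


def candidate_codecs(bitrate_kbps: float) -> list:
    """Return the candidate codecs listed for the given bit rate."""
    return CANDIDATES[rate_range(bitrate_kbps)]
```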

The comparison tests that 3GPP conducted for the selection of an audio coding standard
show the following quality scalability for AMR-WB+ in a MUSHRA test. Figure 5 shows
that the MUSHRA score for AMR-WB+ at 48 kbps is 83, which exceeds the EBU figure of 80
for aacPlus.



Figure 5. Quality scalability of AMR-WB+ based on a MUSHRA test (Bruhn, 2004).

The comparison tests between EAAC+ and AMR-WB+ that 3GPP conducted for the selection
of low-rate-range audio coding showed that AMR-WB+ is a slightly better codec for
stereo at rates lower than 24 kbps, as can be seen in figure 6. Both codecs
nevertheless represent cutting-edge coding technology, giving the highest quality
currently possible for mobile devices.

Figure 6. Comparison of AMR-WB+ and EAAC+ by 3GPP (Mäkinen, J. et al., 2004).



4.4 Support for latest Codecs in Mobile Phones

Codecs such as AAC and AMR-WB are already supported in mid- and high-end mobile
phones, so they can be used if the target consumer is in the mid- or high-end segment,
as office mobile phone users generally are. The latest codecs using SBR, however, are
not yet supported by mobile phones. This means that HE-AAC (aacPlus), EAAC+ and
AMR-WB+ cannot yet be used in mobile applications. It should not take long before
these codecs are supported as well, as they have been approved by the 3GPP and the
hardware manufacturers have already implemented them in their products. Nokia also
signed an aacPlus license agreement in July 2004 (www.3G.co.uk, 2004), which indicates
that aacPlus will be available on Nokia mobile phones soon.

Open codecs, such as Ogg Vorbis, do not seem to be as successful on the commercial
mobile market. They are generally not supported as a standard feature, but interested
users can install the codec and a player. There is an open source player called
OggPlay by Leif H. Wilden (2004) for the Symbian OS. This player currently supports
ogg, mp3 and aac files on Series 60 phones running Symbian OS 7 or later.

Windows Media has not yet achieved the same position in the mobile phone market as it
has in the PC market. No phones other than Microsoft's own brands include a Windows
Media Player. Real media players, however, are available for most mobile phones, and
RealMedia has thus established a special position in the mobile market.

4.5 Existing and upcoming applications for high-quality audio in mobile devices

Applications for downloading music content to the mobile phone already exist. There
are at least two commercial players (MP3go and UltraMP3) that support the playing of
mp3-based music. These players also support the creation and use of playlists. The
main disadvantage at the moment is the amount of memory needed (3-5 MB per song) for
high-quality stereo mp3 music. If HE-AAC or AMR-WB could be used, the size of a song
would be below 1 MB. This would allow low-end mobile phones to store more music than
today. The high-end mobile phones already have enough memory available on MMCs. The
download time for a song in the EGPRS network would decrease from five minutes to one.
Normally, however, songs are not downloaded over the network but transferred directly
from a PC through a cable, Bluetooth or IR.
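The size and download figures above follow from simple arithmetic on bit rate and duration. The Python sketch below shows the calculation under stated assumptions: a four-minute song, 128 kbps for stereo mp3, 32 kbps for HE-AAC and a 120 kbps EGPRS link are illustrative values chosen for the example, and protocol overhead is ignored.

```python
def song_size_mb(duration_s: float, bitrate_kbps: float) -> float:
    """Size of an encoded song in megabytes (1 MB = 10**6 bytes)."""
    return duration_s * bitrate_kbps * 1000 / 8 / 1_000_000


def download_time_s(size_mb: float, link_kbps: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return size_mb * 8 * 1_000_000 / (link_kbps * 1000)


if __name__ == "__main__":
    SONG_S = 240       # assumed song length: 4 minutes
    MP3_KBPS = 128     # assumed stereo mp3 bit rate
    HE_AAC_KBPS = 32   # assumed stereo HE-AAC bit rate
    EGPRS_KBPS = 120   # lower end of the EGPRS range quoted in section 4.2

    for name, rate in (("mp3", MP3_KBPS), ("HE-AAC", HE_AAC_KBPS)):
        size = song_size_mb(SONG_S, rate)
        seconds = download_time_s(size, EGPRS_KBPS)
        print(f"{name}: {size:.2f} MB, {seconds:.0f} s")
```

With these assumed values the mp3 file is just under 4 MB and takes a little over four minutes to download, while the HE-AAC file stays below 1 MB and takes about one minute, which is in line with the figures quoted above.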

The UK mobile phone company MMO2 (mm02, 2004) is launching a service for downloading
music to mobile phones in November 2004. It uses a special music player called O2
Digital Music Player. The music files will be encoded in the MPEG-4 aacPlus format and
should be about a megabyte in size, MMO2 says. One song would take roughly 90 seconds
to download across a GPRS connection. The copy-protection technology will be provided
by the Swiss company Secure Digital Container (SDC).



Streaming applications for mobile phones already exist. Both music and video can be
enjoyed on the mobile phone. The Finnish Broadcasting Company YLE, for example, sends
the news as 20-50 kbps streams over the GPRS network. This speed was selected to make
the news available anywhere in Finland. The most common format today is RealMedia, but
other formats will certainly become available in the near future. One of the problems
today is the quality of the content, which is limited by the low bandwidth of the GPRS
network. In the near future the quality of the content should be much better thanks to
both increased bandwidth and more efficient codecs, which will considerably improve
the listening and viewing experience.

The American market research centre In-Stat (In-Stat, 2004) expects the American
streaming video market to start growing in the next two years, but not to reach 15 %
of total wireless revenues until 2009, which is not very encouraging. Another study
shows that 11 % of mobile phone users today are very or extremely interested in buying
music over the mobile phone network.

5 CONCLUSIONS

The development of audio codecs for mobile phones has been very rapid in the past few
years. Enhancements to codecs have been released yearly, and there always seem to be
new technologies that can be applied to the compression process. Technologies that
have changed the world of encoding include MP3, AAC and SBR. At the moment the best
audio codecs seem to be AMR-WB+ and HE-AAC (aacPlus), depending on the kind of audio
material being encoded. This is most likely not the last step in codec development.
New codecs will probably continue to be introduced yearly.

Figure 7. Applications for Mobile Audio (Mäkinen, J. et al., 2004).



The need for more efficient codecs will probably decrease gradually as
telecommunication speeds continue to grow even beyond 3G networks. According to the
telecommunication company Alcatel's White Paper on Mobile Network Evolution, the
expected communication speed for mobile phones will approach 1 Gbps in 2010-2015 (see
figure 8).

Figure 8. Evolution of mobile networks from 2G to B3G (Hurel, J-L et al., 2004).

On the other hand, new applications utilizing these possibilities will certainly be
introduced on the market. These products also drive the technology, as they demand
more computing power, more memory, better graphics and better audio, which in turn
demand more efficient telecommunication. As long as there is a need for such
applications and a willingness to pay the price of using them, the development process
is secured. Severe limiting factors that could stop the development of mobile audio
applications seem hard to find.



REFERENCES

ARM Developers Guide 2004-2005. Convergence Promotions. Developers Guide (Online)
2004. [Referenced 25.11.2004]. Available:
http://arm.convergencepromotions.com/catalog/m_home.htm

Bruhn, S. 2004. Bridging the gap between speech and audio coding - AMR-WB+ - The
codec for mobile audio. Ericsson Research, Multimedia Technologies. Available:
http://www.s3.kth.se/radio/COURSES/S3_SEMINAR_2E1380_2004/presentations/EricssonAudio-040506.pdf

Coding Technologies. aacPlus. Products and Technologies. Promotion Page (Online)
2004. [Referenced 25.11.2004]. Available:
http://www.codingtechnologies.com/products/aacPlus.htm

Compagnie Financière Alcatel. Mobile Networks. Solutions. Technology Overview Page
(Online) 2004. [Referenced 25.11.2004]. Available:
http://www.alcatel.com/mobilenetworks/mobileinternet/

F. Henn, R. Böhm, S. Meltzer, Th. Ziegler. 2003. Spectral Band Replication (SBR)
Technology and its Application in Broadcasting. Available:
http://www.broadcastpapers.com/radio/ibc2003CodingSBR04.htm

Fraunhofer Institute for Integrated Circuits IIS. Audio & Multimedia. MPEG Audio
Layer-3. Technology Report (Online) 2004. [Referenced 25.11.2004]. Available:
http://www.iis.fraunhofer.de/amm/techinf/layer3/

Frerichs, D. 2003. New MPEG-4 High-efficiency AAC Audio: Enabling new applications.
Coding Technologies. Available:
http://www.telos-systems.com/techtalk/hosted/m4-in-30100%20(M4IF_HE_AAC_paper).pdf

Hurel, J-L., Lerouge, C., Evci, C. & Gui, L. 2004. Mobile Network Evolution: From 3G
Onwards. Technical White Paper. Compagnie Financière Alcatel. Available:
http://www.alcatel.com/doctypes/articlepaperlibrary/pdf/ATR2003Q4/T0312-Mobile-Evolution-EN.pdf

In-Stat, American Market Research Centre. Mobile Consumer Data & Multimedia Services.
Information Service (Online) 2004. [Referenced 25.11.2004]. Available:
http://www.instat.com/catalog/Wcatalogue.asp?id=230



Kozamernik, F. 2003. EBU subjective listening tests on low-bitrate audio codecs. EBU
Listening Tests. Tech 3296. June 2003. Available:
http://www.ebu.ch/CMSimages/en/tec_doc_t3296_tcm6-10497.pdf?display=EN

mp3PRO Zone 2004. Coding Technologies. Developers Guide (Online) 2004. [Referenced
25.11.2004]. Available: http://www.mp3prozone.com/

Microsoft Corporation. Windows Media Home. Technology Page (Online) 2004. [Referenced
25.11.2004]. Available:
http://www.microsoft.com/windows/windowsmedia/default.aspx

mm02. O2 Digital Music Player. Cellular Phone Operator. Promotion Page (Online) 2004.
[Referenced 25.11.2004]. Available:
http://www.o2.co.uk/o2-digital-music-player.html

Mäkinen, J. et al. 2004. AMR-WB+: A new audio coding standard for 3rd generation
mobile audio services. Nokia Research Center, Finland. Submitted to ICASSP 2005.
Figures available:
http://www.tml.hut.fi/Opinnot/T-111.550/Mobileaudioformats2004-10-26.pdf

RealNetworks Incorporated. Real Player Page. Technology Page (Online) 2004.
[Referenced 25.11.2004]. Available: http://www.real.com/player/?src=realaudio

Stoll, G. and Kozamernik, F. 2000. EBU Listening Tests on Internet Audio Codecs. EBU
Technical Review June 2000. Available: http://www.ebu.ch/trev_283-kozamernik.pdf

Symbian Ltd. Symbian OS Phones. Technology Promotion Page (Online) 2004. [Referenced
25.11.2004]. Available: http://www.symbian.com/phones/index.html

The 3rd Generation Partnership Project; Technical Specification Group Services and
System Aspects; General Audio Codec audio processing functions; Enhanced aacPlus
General Audio Codec; General Description (Release 6), (2004). Available:
http://www.3gpp.org/ftp/tsg_sa/TSG_SA/TSGS_24/Docs/PDF/SP-040428.pdf

Vilermo, M. 2004. Audio Codecs. AES/RTI-Audiopäivät 2004 Conference, Helsinki,
25.-26.5.2004. Audio Engineering Society Finnish Section. Available:
http://www.aes.fi/audiopaivat2004/vilermo.pdf



VoiceAge Corporation Licencing Service. AMR-WB+ FAQs. Technologies. Frequently Asked
Questions (2004). [Referenced 25.11.2004]. Available:
http://www.voiceage.com/amrsite/tech_wbplus_faqs.php

Wales, J. 2004. Wikipedia - The Free Encyclopedia. Wikipedia Foundation. Electronic
Encyclopedia (Online) 2004. [Referenced 25.11.2004]. Available:
http://en.wikipedia.org/

Wilden, L. H. Ogg Vorbis Player for Symbian OS phones. Technology Page (Online) 2004.
[Referenced 25.11.2004]. Available: http://symbianoggplay.sourceforge.net/

www.3G.co.uk. aacPlus 2.5 / 3G License with Nokia. News service for 3G. News Service
Site (Online) July 2004. [Referenced 25.11.2004]. Available:
http://www.3g.co.uk/PR/July2004/8026.htm

Xiph.Org Foundation. The Ogg Vorbis CODEC project. Ogg. Developers Page (Online) 2004.
[Referenced 25.11.2004]. Available: http://www.xiph.org/ogg/vorbis/
