8/6/2019 Mobile Formats Explained
1/21
HELSINKI UNIVERSITY OF TECHNOLOGY 30.11.2004
Telecommunications Software and Multimedia Laboratory
T-111.550 Multimedia Seminar
Fall 2004: Mobile Multimedia Application Platforms
Mobile Audio
from MP3 to AAC and further
Henri Autti Johnny Biström
51194K 21548C
Mobile Audio from MP3 to AAC and further
Henri Autti and Johnny Biström
HUT, Telecommunications Software and Multimedia Laboratory
Abstract
The purpose of this paper is to evaluate advanced audio codecs and
reflect on their suitability for the mobile needs of today and tomorrow. The
historical development of different codecs for different purposes is
analyzed. The features of the most common codecs are discussed in parallel
with performance and other criteria. The capabilities of mobile devices and
the telecommunication possibilities now and in the future are also
considered in the analysis. Finally, some comparisons of the codecs'
performance are made. Some existing mobile audio applications are
presented and ideas for future audio applications are discussed. Some
of the questions to be answered here are: Can one codec that is superior for
the mobile world be found, or do we have to prepare for a wide diversity in
the future? Will the codecs continue to develop as rapidly as they have
done so far?
1 INTRODUCTION
A look at audio applications for the mobile world today reveals that the diversity in
implementations is wide. The solutions chosen for representing audio streams in
particular situations differ greatly, and the number of codecs used for
encoding and decoding audio data streams is large. It is not obvious which codec should
be used for what purpose. The selection of a codec depends on several factors, such as the
content type of the audio material, the available communication speed and the quality
requirements of the listening situation. Other factors that might influence the selection
are the standardization situation, the licensing policy and the competitors'
choices in the market. In recent years the mp3 format has been a great success, but
it does not fit well into mobile devices. Lately, more efficient codecs such as AAC and AMR
have been presented, and they have been refined for mobile audio purposes.
The purpose of this paper is to evaluate the most important audio codecs by
describing the technical principles of encoding and decoding, the standardization situation
and the suitability of each codec in relation to the available technology and the market needs.
The analysis also takes into consideration the state of development of mobile and
telecommunication hardware and software. Using technical literature and documented
listener testing combined with mobile manufacturer specifications and published white
papers, we try to find out whether there is one superior codec for mobile audio applications and
which codec it would be. To do that, existing and future audio
applications in the market have to be analyzed to clarify the needs and expectations placed on mobile
audio. Finally, the result is weighed against the development trends of mobile
technology, and the longevity of the chosen solution is judged.
2 BACKGROUND
In this chapter we take a look at the background of mobile audio formats. First we
present some basic facts about the development of audio codecs and the reasons for
developing them. Then we discuss the development of mobile devices,
including phones, PDAs and laptops. Later in this part we take a quick look at the
applications available today and the demands these applications pose.
The short history of audio codecs dates back to the mid-1980s, when the Fraunhofer
Institut in Erlangen, Germany (Fraunhofer, 2004), first began working on high-quality,
low bit-rate audio coding with the help of Dieter Seitzer, a professor at the
University of Erlangen. Their project was financed by the European Union as a part of
the market-oriented Eureka research program, where it was commonly known as EU-147.
In Germany in 1989, Fraunhofer was granted a patent for mp3, which we are going
to discuss more thoroughly in the next chapter. A few years later it was submitted to the
International Organization for Standardization (ISO), and mp3 was introduced as a part of the
official MPEG-1 standard in 1992. In January 1995 Fraunhofer applied for a
patent on mp3 in the United States as well, and it was granted in November 1996. The
revolutionary result was that, using mp3 compression, PC users were for the first
time able to compress an ordinary music CD to one tenth of its original size
with only a small sacrifice in sound quality. Thus 12 hours of music could be stored
on a recordable CD, which in turn could be played by an mp3 CD player or an
ordinary PC.
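The storage figures above follow from simple arithmetic, which can be sketched as below. The sketch assumes CD audio at 44.1 kHz, 16 bits and two channels, an mp3 stream at 128 kbps and a 700 MB recordable disc; the disc capacity and the function names are assumptions made for illustration and do not come from the cited sources.

```python
# Assumed CD audio parameters (44.1 kHz, 16-bit, stereo).
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2


def cd_bit_rate_kbps():
    """Bit rate of uncompressed CD audio in kbps."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS / 1000


def compression_ratio(codec_kbps):
    """How many times smaller a coded stream is than CD audio."""
    return cd_bit_rate_kbps() / codec_kbps


def playing_time_hours(disc_megabytes, codec_kbps):
    """Hours of coded audio that fit on a data disc of the given size."""
    disc_bits = disc_megabytes * 1024 * 1024 * 8
    return disc_bits / (codec_kbps * 1000) / 3600


if __name__ == "__main__":
    print(cd_bit_rate_kbps())            # uncompressed CD bit rate
    print(compression_ratio(128))        # roughly one tenth of the size
    print(playing_time_hours(700, 128))  # hours of music on an assumed 700 MB disc
```

Under these assumptions the ratio comes out at about 11:1 and the playing time at a little over 12 hours, which matches the order of magnitude quoted in the text.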
In the rapidly evolving world of mobile content development, things have changed a
lot since those days. Nowadays mobile devices, ranging from small laptops through palmtops to phones,
are more widely available, and high-speed wireless networks are getting better
day by day. At the same time, speech and audio compression have advanced rapidly in
recent years, spurred on by cost-effective digital technology and diverse commercial
applications. Wideband speech and high-fidelity audio compression have also made great
progress, accelerated by the commercial success of consumer and
professional digital audio products. Telephone speech, wideband speech and wideband
audio signals differ not only in bandwidth and dynamic range, but also in listener
expectations of the offered quality. Using wideband not only improves the
intelligibility and naturalness of speech, but also adds a feeling of transparent
communication and eases speaker recognition.
The commercial applications in the mobile content area are also developing
at a growing rate. According to Ericsson (Bruhn, 2004), mobile device services comprise
streaming, messaging, downloading and broadcasting. Streaming scenarios include
news listening, monitoring of sports events, audio books, music listening, commercial
advertisements, access to information systems and interactive gaming. Broadcasting
scenarios are very close to streaming scenarios and include webcasting and Internet radio
broadcasting. They have become especially popular, allowing listeners to "stream" audio
on their computers. Unlike downloaded audio files, streamed audio files are not stored
on the user's hard drive, but are broadcast like traditional radio through the user's
audio player. Messaging scenarios are also similar to streaming, but with size
limitations, and include business-to-person and person-to-person scenarios. Download
scenarios include music, books and comics downloaded over the network. It is important
for all of the scenarios named above to be able to handle mixed content covering
music, speech, speech-between-music and speech-over-music.
The demands these applications pose today on audio codecs for mobile services
include the ability to cope with generic content, sufficient and consistent quality at
lowest rates, best quality at lowest rates, and high-quality operation with relaxed bit rate
requirements. The new audio codecs also have to be optimized for low-resource devices
(low memory and computational resources) and have to support a variety of
operating systems, e.g. Symbian, WinCE, Palm OS5 and OS6. Codec development and
standardization is at the moment centered on 3GPP, which is the body
standardizing GSM, evolved GSM, UMTS and 3G. In the next chapter we
introduce some of the most important audio standards and codecs that play an
important role in 3GPP.
3 AUDIO STANDARDS AND CODECS
In this chapter we describe what we consider the most important audio standards and
codecs at the moment. In the first part of this chapter, we discuss the 3GPP
audio standard format families AAC and AMR, introducing the underlying technology.
First we present mp3 (predecessor to AAC), AAC, HE-AAC and EAAC+ and then the
challengers: AMR, AMR-WB and AMR-WB+. In the second part we
discuss the open source codec Vorbis Ogg ACM and some of the most important non-standard
audio formats that use streaming technology, Windows Media Audio and
RealMedia. For terminology, architecture and technology see Wales (2004) and ARM
Developers Guide (2004).
3.1 MPEG-1 (mp3)
Mp3 stands for MPEG-1 Audio Layer III. It is not a separate format, but a part of the
MPEG-1 video encoding format, as described earlier. Mp3 is a lossy data compression
method (meaning that compressing a file and then decompressing it yields a file that
may well differ from the original, but is "close enough") that stores good-quality audio
in small files by using psychoacoustics to discard the parts of the audio
that most humans cannot hear. Mp3 bit rates vary from 8 kbps to 320 kbps. When the
mp3 phenomenon began in 1996, most audio files were encoded at a 128 kbps
bit rate, which is still the most popular bit rate in the world, although most
people agree that at slightly higher bit rates, such as 192 kbps or 256 kbps, the audio
quality is comparable to CD quality.
The problem with mp3 appears at lower bit rates (64 kbps and below), where
the sound starts to lack high-frequency components. The reason is that at these
bit rates mp3 runs out of bits to code the music at full audio bandwidth and with
significant detail. Mp3PRO was created to solve the problem of bandwidth-limited mp3
files. To improve the sound quality of mp3 at lower bit rates, an enhancement
technology that restores the high-frequency components of the sound has been
developed. The technology is called "Spectral Band Replication" (SBR). SBR is a very
efficient method for generating the high-frequency components of an audio signal. The
resulting audio format is composed of two components: the mp3 part for the low
frequencies and the SBR or "PRO" part for the high frequencies. The first part analyses
the low-frequency band information and encodes it into a normal mp3 stream. This
lets the encoder concentrate on less information and do a better job of
encoding it. It also maintains complete compatibility with existing mp3 players. The
second part analyses the high-frequency band information and encodes it into a part of
the mp3 stream that is normally ignored by existing mp3 decoders. Detailed information
can be found at mp3PRO Zone (2004).
3.2 MPEG-2 AAC
AAC (Advanced Audio Coding), also known as MPEG-2 AAC, is a lossy data
compression scheme intended for audio streams. AAC was designed to replace mp3. It
is part of the MPEG-2 standard introduced in 1994 and was developed by the MPEG group,
which includes Dolby, Fraunhofer (FhG), AT&T, Sony and Nokia, companies that have
also been involved in the development of audio codecs such as mp3 and AC3 (also
known as Dolby Digital). Unlike older MPEG audio encoding methods, MPEG-2 AAC
is not backwards compatible with older MPEG audio formats; mp3, by contrast, is
backwards compatible with mp2.
The function of AAC is based on a wideband audio coding algorithm that exploits
two primary coding strategies to dramatically reduce the amount of data needed to
convey high-quality digital audio. First, the signal components that are "perceptually
irrelevant" and can be discarded without a perceived loss of audio quality are removed.
Next, redundancies in the coded audio signal are eliminated. Efficient audio
compression is achieved by a variety of perceptual audio coding and data compression
tools.
When compared side by side with its predecessor, AAC is proving itself
worthy of replacing mp3 as the new Internet audio standard. It has improved
compression, which provides higher-quality results with smaller file sizes. It supports
multi-channel audio with up to 48 full-frequency channels, higher-resolution
audio with sampling rates from 8 up to 96 kHz, and improved decoding efficiency,
requiring less processing power for decoding. Together these result in higher-quality output at
lower data rates, allowing even modem users to hear a difference. AAC also gives the
listener better and more stable quality than mp3 at equivalent or slightly lower
bit rates. Depending on the AAC profile and the mp3 encoder, 96 kbps AAC can give
nearly the same or better perceptual quality as 128 kbps mp3.
3.3 MPEG-4 HE-AAC
MPEG-4 High Efficiency AAC is the combination of MPEG-2 AAC and the SBR
bandwidth extension amendment, which is based on SBR (Spectral Band Replication)
technology. HE-AAC is not a replacement for AAC, but rather a superset that extends
the reach of high-quality MPEG-4 audio to much lower bit rates (as low as 32 kbps).
HE-AAC achieves superior audio quality without losing treble or
collapsing the stereo image.
HE-AAC decoders decode both plain AAC and the enhanced AAC plus SBR. The
result is a backward compatible extension of the standard that nearly doubles the
efficiency of MPEG-4 audio. As discussed before, SBR is a bandwidth extension
technique that does not replace the core codec, but operates in conjunction with it to
create a more efficient superset that can cut the required bit rate in half. Present in both
the encoding and the decoding process, SBR exploits the correlation between the low
and the high frequencies in an audio signal to describe the high end of the signal using
only a very small amount of data. This SBR data describing the high frequencies is
coupled with the low-frequency compressed data from the AAC codec. Once combined,
the complete HE-AAC bit stream contains enough data to recreate the original signal
(see figure 1).
For example, to create 48 kbps stereo HE-AAC, the encoder generates two signals:
an MPEG AAC signal at about 42 kbps and an SBR signal at about 6 kbps. The SBR
signal is then placed into the MPEG AAC auxiliary fields as defined in MPEG-4 and
sent out as a complete 48 kbps MPEG-4 HE-AAC bit stream.
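The bit budget in this example can be written down as a small sketch. The function name and the fixed 6 kbps default for the SBR side information are assumptions taken from the example above; a real encoder would vary the SBR share from frame to frame.

```python
def split_he_aac_budget(total_kbps, sbr_kbps=6.0):
    """Split a constant HE-AAC bit rate into AAC core and SBR shares.

    The SBR share is taken out of the total first; whatever remains is
    available to the AAC core encoder.
    """
    if not 0 < sbr_kbps < total_kbps:
        raise ValueError("SBR share must be positive and below the total bit rate")
    return {"aac_core_kbps": total_kbps - sbr_kbps, "sbr_kbps": sbr_kbps}


if __name__ == "__main__":
    # 48 kbps stereo HE-AAC: about 42 kbps for AAC, about 6 kbps for SBR.
    print(split_he_aac_budget(48))
```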
Figure 1. The encoding and decoding process of HE-AAC.
Because the SBR data is placed within the AAC auxiliary fields, the enhanced signal
is accepted by both an existing AAC decoder and a new HE-AAC decoder. If sent to an
AAC decoder, only the low-frequency audio signal is recognized and decoded. If
sent to an HE-AAC decoder, both the SBR and the AAC data are decoded to recreate the full-frequency
signal. This technique makes the new profile forward compatible with AAC.
Because the HE-AAC decoder contains a full-fledged AAC decoder, it is also able to
decode both the plain AAC and HE-AAC MPEG-4 Audio profiles. This
combination makes HE-AAC backward compatible with AAC.
As a result, HE-AAC delivers CD-quality stereo at 48 kbps and 5.1-channel surround
sound at 128 kbps. This level of efficiency is ideal for Internet content delivery and
fundamentally enables new applications in the mobile and digital
broadcasting markets. However, according to Frerichs (2003), HE-AAC is not suitable for
two-way communication because of its very high delay.
3.4 EAAC+
Enhanced AAC+ was introduced in the 3GPP Release 6 standard in 2004. Its
optimal operating range is from 18 kbps upwards. According to the 3rd Generation
Partnership Project, the enhanced AAC+ general audio codec consists of MPEG-4 AAC,
MPEG-4 SBR and MPEG-4 Parametric Stereo. AAC is a general audio codec, SBR
is a bandwidth extension technique offering substantial coding gain in combination with
AAC, and Parametric Stereo enables stereo coding at very low bit rates. According to the
IBC 2003 Conference Papers, the basic principle behind parametric stereo is similar
to the SBR principle: a guided reconstruction of a stereo signal based on a transmitted
mono signal. In addition to a coded mono mixdown of the stereo input signal,
parameters describing the stereo image are transmitted. The stereo parameters require only a
small fraction of the total bit rate, ensuring a high quality of the mono signal at the given
bit rate. Two parameters are used to describe the stereo information: a panorama
parameter and an ambience parameter. The panorama parameter contains information
about the left-to-right level differences within different frequency bands. Similarly, the
ambience parameter describes the stereo ambience for a set of frequency bands. The
encoding of both parameters uses the same principle of entropy coding of time- or
frequency-direction differences as is used for the SBR envelopes. In addition, the
quantization steps are frequency dependent.
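The guided-reconstruction idea can be illustrated with a heavily simplified sketch. It computes a mono downmix and a single full-band left-to-right level difference per frame, then rebuilds a stereo pair from those two items. The real tool works per frequency band in the QMF domain and also transmits an ambience parameter; both are left out here, and all function names are hypothetical.

```python
import math


def encode_parametric_stereo(left, right):
    """Return a mono downmix and one full-band level difference in dB."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)
    eps = 1e-12  # avoids division by zero and log of zero on silent frames
    level_diff_db = 10 * math.log10((energy_left + eps) / (energy_right + eps))
    return mono, level_diff_db


def decode_parametric_stereo(mono, level_diff_db):
    """Rebuild left/right from the mono signal and the level difference.

    The two gains are chosen so that their squared ratio equals the
    transmitted energy ratio and their squares sum to 2.
    """
    ratio = 10 ** (level_diff_db / 10)
    gain_left = math.sqrt(2 * ratio / (1 + ratio))
    gain_right = math.sqrt(2 / (1 + ratio))
    return [gain_left * m for m in mono], [gain_right * m for m in mono]


if __name__ == "__main__":
    left = [1.0, -1.0, 0.5]
    right = [0.5, -0.5, 0.25]  # same content, panned towards the left
    mono, pan_db = encode_parametric_stereo(left, right)
    print(pan_db)
    print(decode_parametric_stereo(mono, pan_db))
```

The decoded channels are scaled copies of the mono signal, so only the level relationship between left and right survives; this is why the actual codec adds the ambience parameter.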
Compared with the older codecs, three additional tools are included in the
Enhanced AAC+ decoder. Error concealment tools for AAC, SBR and PS make the
decoder robust against transmission errors such as frame loss. These tools mitigate the audible
effects of such errors. The stereo-to-mono downmix tool enables a decoder only
capable of mono output to downmix a stereo bit stream. For the AAC part this is done
in the time domain after the stereo decoding, but for SBR it is done on the SBR
parameters, which saves complexity since only a mono decoding of SBR is needed.
The spline resampler tool makes it possible to resample the output to a sampling
frequency different from the one supplied in the bit stream. This allows, for example,
handsets with a D/A converter only capable of a 16 kHz sampling frequency to
play bit streams encoded with a 22.05 kHz sampling frequency.
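The resampling step can be sketched as follows. Note that this sketch swaps in plain linear interpolation for the spline interpolation used by the actual tool, and it applies no anti-aliasing filter, so it only illustrates how output sample positions map onto the input time axis. The function name is hypothetical.

```python
def resample_linear(samples, rate_in_hz, rate_out_hz):
    """Resample by linear interpolation between neighbouring input samples."""
    if not samples:
        return []
    n_out = int(len(samples) * rate_out_hz / rate_in_hz)
    step = rate_in_hz / rate_out_hz  # input samples advanced per output sample
    out = []
    for m in range(n_out):
        position = m * step
        index = int(position)
        fraction = position - index
        # Hold the last sample when the neighbour would fall past the end.
        following = samples[index + 1] if index + 1 < len(samples) else samples[index]
        out.append(samples[index] * (1 - fraction) + following * fraction)
    return out


if __name__ == "__main__":
    # 0.1 s of a 22.05 kHz ramp played on a 16 kHz D/A converter.
    ramp = [float(n) for n in range(2205)]
    converted = resample_linear(ramp, 22_050, 16_000)
    print(len(converted))
```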
Figure 2 shows a block diagram of the EAAC+ encoder. The encoder basically
consists of the AAC waveform encoder, the SBR high-frequency reconstruction
encoding tool and the PS encoding tool. The encoder operates in a dual-rate mode:
the SBR encoder operates at the encoding sampling rate fsenc as delivered from
the IIR resampler, and the AAC encoder at half of this sampling rate, fsenc/2.
Consequently a 2:1 downsampler is present at the input to the AAC encoder. The PS
tool is used for low bit rate stereo coding, i.e. up to and including a bit rate of 32 kbps.
The AAC encoder implementation complies with the AAC Low Complexity Object
Type and is a highly optimized low-resource implementation, requiring only little
computational complexity and memory. This is basically achieved by mapping
the psychoacoustics-based threshold estimation directly to scale factor amplification
values to shape the encoding quantization noise according to the input signal
characteristics, rather than employing time-consuming iterative analysis-by-synthesis
methods.
The SBR encoder consists of a QMF (Quadrature Mirror Filter) analysis filter bank,
which is used to derive the spectral envelope of the original input signal. Furthermore,
the SBR-related modules control the selection of an input-signal-adaptive grid
partitioning of the QMF samples on the time axis (i.e. control the framing), analyze
the relation of the noise floor to the tonal components in the high band, collect guidance
information for the transposition process in the decoder, and detect missing harmonic
components that could not be reconstructed by pure transposition. This gathered
information about the characteristics of the input signal, together with the spectral
envelope data, forms the SBR stream. The number of bits used for the SBR stream is
subtracted from the bits available to the AAC encoder in order to achieve constant bit
rate encoding of the multiplexed EAAC+ stream.
The Parametric Stereo encoding tool in the EAAC+ encoder estimates parameters
characterizing the perceived stereo image of the input signal. These stereo parameters
are embedded in the SBR stream. At the same time, a signal-adaptive mono downmix
of the input signal is generated in the QMF domain and fed into the SBR encoder
operating in mono. This downmix is also processed by a downsampled QMF synthesis
filter bank to obtain the time-domain input signal for the AAC core encoder at the
sampling rate fsenc/2. In this case, the 2:1 IIR downsampler is not active.
Figure 2. 3rd Generation Partnership Project; EAAC+ Encoder overview
In the decoder (figure 3) the bit stream is de-multiplexed into the AAC and the SBR
stream. Error concealment, e.g. in case of frame loss, is achieved by designated
algorithms in the decoder for AAC, SBR and PS: the AAC core decoder employs signal-adaptive,
spectrally shaped noise generation for error concealment, while in the SBR and PS
decoders error concealment is based on extrapolation of guidance, envelope and stereo
information.
For the SBR processing, a Low-Power tool of SBR is used for full stereo decoding in
order to keep the peak computational complexity as low as possible over all channel
modes. Using the SBR Low-Power tool brings the computational complexity of an
HE-AAC stereo decoder into the same range as that of plain AAC stereo decoders. The low-band
AAC time-domain signal, sampled at fsenc/2, is first fed to a 32-channel QMF analysis
filter bank. The QMF low-band samples are then used to generate a high-band signal,
with the transmitted transposition guidance information used to best match the
original input signal characteristics.
The transposed high-band signal is then adjusted according to the transmitted spectral
envelope to best match the original's spectral envelope. Missing components that
could not be reconstructed by the transposition process are also introduced. Finally, the
low band and the reconstructed high band are combined to obtain the complete output
signal in the QMF domain.
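The transposition and envelope adjustment steps described above can be illustrated with a toy model. The sketch treats one frame as a plain list of real-valued spectral bins, transmits only the energy of each high-band group as the "envelope", and rebuilds the high band by copying low-band bins and rescaling them. It assumes the high band length is a multiple of the group size. The QMF filter bank, the guidance information, the noise floor and the missing-harmonic handling are all omitted, and the function names are hypothetical.

```python
import math


def band_energies(bins, band_size):
    """Energy of each consecutive group of band_size bins."""
    return [
        sum(b * b for b in bins[start:start + band_size])
        for start in range(0, len(bins), band_size)
    ]


def sbr_encode(spectrum, crossover, band_size):
    """Keep the low band as is; describe the high band by its envelope only."""
    low_band = list(spectrum[:crossover])
    envelope = band_energies(spectrum[crossover:], band_size)
    return low_band, envelope


def sbr_decode(low_band, envelope, band_size):
    """Transpose low-band bins upwards and scale them to the envelope."""
    high_band = []
    for band_index, target_energy in enumerate(envelope):
        # Transposition: reuse low-band bins as raw material for this group.
        start = (band_index * band_size) % len(low_band)
        patch = [low_band[(start + i) % len(low_band)] for i in range(band_size)]
        patch_energy = sum(p * p for p in patch)
        # Envelope adjustment: match the transmitted energy of the group.
        gain = math.sqrt(target_energy / patch_energy) if patch_energy > 0 else 0.0
        high_band.extend(gain * p for p in patch)
    return low_band + high_band


if __name__ == "__main__":
    frame = [8.0, 6.0, 4.0, 3.0, 2.0, 1.0, 1.0, 0.5]
    low, env = sbr_encode(frame, crossover=4, band_size=2)
    print(env)
    print(sbr_decode(low, env, band_size=2))
```

The rebuilt high band has the right energy per group but not the original fine structure, which is the trade-off that lets the envelope be transmitted with very few bits.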
In case of a stream using parametric stereo, the mono output signal from the underlying
HE-AAC+ decoder is converted into a stereo signal. This processing is carried out in the
QMF domain and is controlled by the parametric stereo parameters embedded in the
SBR stream.
Figure 3. 3rd Generation Partnership Project; EAAC+ Decoder overview
3.5 AMR
The AMR (Adaptive Multi-Rate) standard was introduced in 1998. Its main function
is mobile baseline speech. It operates at variable mono bit rates in the range of 4.75 to
12.2 kbps in its narrowband (bandwidth 3.5 kHz) configuration. It was adopted by
3GPP as the mandatory codec for 3G wireless systems based on the evolved GSM core
network (WCDMA, EDGE, GPRS).
The philosophy behind AMR is to lower the codec rate as the interference increases,
thus enabling more error correction to be applied. The AMR codec is also used to
harmonize the codec standards among different cellular systems. AMR is based on a
technology called ACELP (Algebraic Code Excited Linear Prediction). ACELP is a
speech compression system, used to provide a good standard of speech quality when the
network is operating at low data rates (narrow bandwidth). The analogue voice signal is
converted to a digital data signal, so that it can be compressed for transmission over the
network, and the process is then reversed at the other end when the digital data is
converted back to an analogue voice signal. The quality of the reproduced speech will
appear to be much better at the receiving phone than without the ACELP system.
3.6 AMR-WB
AMR-WB (wideband extension) is a speech coding standard developed after
AMR using the same ACELP technology. The AMR Wideband codec was
standardized by ETSI/3GPP in December 2000, and selected and approved by the ITU-T
in July 2001 and January 2002, respectively. The ITU-T standard is referred to as
G.722.2.
The codec provides excellent speech quality due to its wider speech bandwidth of
50 - 7000 Hz, significantly improving the intelligibility and naturalness of speech and
adding a feeling of face-to-face communication. The AMR-WB speech codec consists
of nine speech codec modes with mono bit rates of 23.85, 23.05, 19.85, 18.25, 15.85,
14.25, 12.65, 8.85 and 6.6 kbps. The lowest bit rate providing excellent speech quality
in a clean environment is 12.65 kbps. Higher bit rates are useful in background noise
conditions and for music. The lower bit rates of 6.60 and 8.85 kbps also provide
reasonable quality, especially compared to narrowband codecs. A background noise
mode is designed to be used in discontinuous transmission (DTX) operation in GSM
and as a low bit-rate source-dependent mode for coding background noise in other
systems.
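To relate the nine mode rates to what is actually sent over the air, the sketch below converts each rate into bits per speech frame and bytes per minute. It relies on the 20 ms frame length used by AMR-WB; the function names are made up for this illustration.

```python
AMR_WB_MODES_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]
FRAME_LENGTH_MS = 20  # AMR-WB codes speech in 20 ms frames


def bits_per_frame(mode_kbps):
    """Number of coded speech bits in one 20 ms frame (kbps * ms = bits)."""
    return round(mode_kbps * FRAME_LENGTH_MS)


def bytes_per_minute(mode_kbps):
    """Payload size of one minute of continuous speech."""
    return mode_kbps * 1000 * 60 / 8


if __name__ == "__main__":
    for mode in AMR_WB_MODES_KBPS:
        print(mode, bits_per_frame(mode), round(bytes_per_minute(mode)))
```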
AMR-WB can also carry narrowband signals. It eliminates the need for transcoding
and eases the implementation of wideband applications and services across a wide range
of wireless and wireline communication systems and platforms. AMR-WB is already
standardized for future use in networks such as UMTS. There it provides so much
higher speech quality that it seems probable that older networks will also have to
be gradually upgraded to support wideband.
3.7 AMR-WB+
Adopted as an audio codec standard in September 2004 by ETSI/3GPP, AMR-WB+
is an audio extension of AMR-WB that uses a hybrid of two technologies, ACELP
and TCX (Transform Coded Excitation), to deliver very high sound quality for both
speech and audio content types, including music, voice-between-music and voice-over-music.
AMR-WB+ adds support for stereo signals and higher sampling rates. In addition, high-efficiency
parametric stereo (HE-PS), as discussed under EAAC+, provides high-fidelity
stereo image reproduction at the lowest bit rates. Another main improvement is the use
of transform coding in addition to ACELP, which greatly improves generic audio
coding. Automatic switching between transform coding and ACELP provides both very
good speech quality and good quality for other audio at moderate bit rates. Sound quality is not
compromised even in networks where the bandwidth is limited.
The AMR-WB+ codec has a wide bit-rate range, from 6 to 48 kbps. Mono rates are
scalable from 6 to 36 kbps, and stereo rates are scalable from 8 to 48 kbps, reproducing
bandwidth up to 24 kHz (approaching CD quality). Moreover, it provides backward
compatibility with AMR wideband. AMR-WB+ brings speech and music to mobile
phones (VoiceAge, 2004).
3.8 Vorbis Ogg ACM
Due to numerous patenting and licensing issues with various parts of the MPEG
specifications, there has been a significant movement to create and popularize audio
formats and algorithms that are free of such restrictions. The most popular of
these is probably Ogg Vorbis, which is a completely open and free codec project from the
Xiph.org Foundation (2004).
Vorbis was started as a result of a plan to charge licensing fees for the mp3 format,
which was announced in September 1998. Version 1.0 of the codec was
released on July 19, 2002. The latest version is 1.1.0, released on September 22, 2004.
The Ogg Vorbis format has proved popular among open source communities; they argue
that its higher fidelity and completely free nature make it a natural replacement for the
entrenched mp3 format. In the commercial sector, Vorbis has already had success, with
many newer video game titles employing Vorbis instead of mp3.
Given stereo input at 44.1 kHz, the standard CD audio sampling frequency, the current
encoder produces output at 45 - 500 kbps, depending on the specified quality setting.
Though Vorbis 1.0.1 is tuned for bit rates of 16 - 128 kbps per channel, it is still possible to
encode at arbitrary bit rates chosen by the user. Such figures are only approximate,
however, as Vorbis is inherently variable bit rate.
Vorbis uses the modified discrete cosine transform (MDCT) for converting sound
data from the time domain to the frequency domain. The resulting frequency-domain
data is broken into noise floor and residue components, and then quantized and entropy
coded using a codebook-based vector quantization algorithm. The decompression
algorithm reverses these stages.
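The time-to-frequency step can be illustrated with a direct, unoptimized MDCT and its inverse. The sketch below uses a sine window and 50 % overlap-add, under which the time-domain aliasing of neighbouring blocks cancels and the input is recovered exactly when nothing is quantized. The sine window is an assumption chosen for brevity (Vorbis defines its own window shape), the noise floor, residue and quantization stages are left out, and the function names are hypothetical. The block length is assumed to be a multiple of four and the signal length a multiple of half the block length.

```python
import math


def mdct(block):
    """Forward MDCT: 2N time samples in, N frequency coefficients out."""
    n = len(block) // 2
    return [
        sum(
            block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n)
        )
        for k in range(n)
    ]


def imdct(coefficients):
    """Inverse MDCT: N coefficients in, 2N aliased time samples out."""
    n = len(coefficients)
    return [
        (2.0 / n)
        * sum(
            coefficients[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)
        )
        for t in range(2 * n)
    ]


def sine_window(length):
    """Window whose squared halves sum to one (Princen-Bradley condition)."""
    return [math.sin(math.pi * (t + 0.5) / length) for t in range(length)]


def analyse_and_resynthesise(signal, n):
    """Run windowed MDCT/IMDCT with overlap-add; n is half the block length."""
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + t] * window[t] for t in range(2 * n)]
        rebuilt = imdct(mdct(block))  # a codec would quantize the coefficients here
        for t in range(2 * n):
            output[start + t] += rebuilt[t] * window[t]
    return output[n:n + len(signal)]


if __name__ == "__main__":
    tone = [math.sin(0.3 * t) for t in range(32)]
    restored = analyse_and_resynthesise(tone, 8)
    print(max(abs(a - b) for a, b in zip(tone, restored)))
```

Each block of 2N samples yields only N coefficients, so the overlapped transform does not increase the amount of data before quantization.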
3.9 Windows Media
Windows Media Audio (WMA) is a proprietary compressed audio file format used
by Microsoft. It has a large user base through Windows. It was initially a competitor to
the mp3 format, but with the introduction of Apple's iTunes Music Store, it has
positioned itself as a competitor to the AAC format used by Apple. It is part of the
Microsoft Corporation (2004) Windows Media framework. An initial reason for the
development of WMA might have been that mp3 technology is patented and has to be
licensed from Thomson, which controls licensing of the mp3 patents in many countries
including the United States of America, for inclusion in the Microsoft Windows
operating system. WMA includes multi-channel coding.
With the release of Windows Media Audio 9, the codec was updated: WMA Pro
and a new lossless codec have been introduced to accompany the existing lossy codec.
The codec is considered to come close to AAC in quality. Support for variable bit rates has also
been introduced. WMA Pro has not been reverse engineered yet.
Microsoft's Windows Media Audio (WMA) file format, which Microsoft claims offers higher
audio quality at smaller file sizes, is starting to gain more acceptance as it comes
bundled as the standard audio format in Windows 98/2000/XP. Microsoft might be able
to challenge the dominance of mp3 or at the very least offer a second, popular audio
format choice.
3.10 Real Media
RealAudio is a proprietary audio codec developed by RealNetworks. It is specifically
designed for low bandwidths, and it can be used as a streaming audio format.
In fact, it was one of the first streamed audio products in the world.
For high bit rates, Real Media uses AAC. Many radio stations use RealAudio to stream
their programming over the Internet in real time. The first version of RealAudio was
released in 1995. The current version of the codec, RealAudio 10, was published in
2004. It includes multi-channel coding (RealNetworks Incorporated, 2004).
4 DEVELOPMENT TRENDS AND COMPARISON OF CODECS FOR THE APPLICATIONS OF TOMORROW
The hardware of mobile platforms is developing rapidly, and thus
new software and applications can be expected in the mobile devices of tomorrow. The
capacity of the central processing unit grows and more memory is already available at a
lower price. This chapter extrapolates what will happen to the devices in the near
future. The wireless communication channels are also developing, which
leads to faster transmission to mobile devices. Is there any need for compression as
effective as that offered by HE-AAC, or will it become history as the limitations
of today disappear?
4.1 The Features of new Mobile Phones
The main target of this hardware study is the mobile phone, as the number of mobile
phones is much larger than the number of PDAs. The mobile phone is also a good low-end
platform representative for mobile devices, as one of the main requirements for a
phone is always its size and weight. According to Symbian Ltd. (Symbian, 2004), the
leading manufacturer of 3G operating systems for mobile phones, the latest Symbian
operating system, OS 8, is already used on Series 60 Platform 2.0 based 3G phones such as
the Nokia 6630, which gives them wide support for audio codecs such as NB-AMR, WB-AMR,
MP3, AAC and RealAudio. As the phone has 10 MB of internal dynamic memory and 64
MB on an MMC, it offers fairly good possibilities for audio and video applications in the
mid-price range. More expensive phones such as the Nokia 7710, in Series 90 with Symbian OS
7, support the same audio codecs even in stereo. The Nokia 7710 has 90 MB of RAM
and can handle an MMC of 512 MB, which makes it an excellent choice for audio and
multimedia applications. The same applies to the Nokia Communicator 9500. Thus the
hardware limits for audio have been eliminated in mid-range and high-end mobile phones.
In low-end mobile phones there are still some relevant hardware restrictions
on the use of audio, mainly because of the low price requirement, but they will
disappear in the near future.
4.2 The Telecommunication Features of the Mobile Networks
The basic GPRS (General Packet Radio Service) network still used by many mobile
phones supports communication speeds of 30-50 kbps. The EDGE (Enhanced Data rates
for GSM Evolution) or EGPRS technology increases the speed for the end user to
120-150 kbps and even a bit higher. EGPRS is available in most mid-range and even
some low-end phones, so it can be considered the standard today. EGPRS is, however,
available only in urban and suburban areas today. UMTS (Universal Mobile
Telecommunications System) offers data speeds from 384 kbps (TDD Mode) to 2 Mbps
(TDD Mode) (Compagnie Financière Alcatel, 2004), which removes some of the
limitations.
So far UMTS is available only in high-end mobile phones and only in urban areas.
Finland will not be covered by UMTS networks in the near future, which means that
EGPRS will remain the fastest alternative for a large group of phone users here. The
speed of EGPRS is, however, sufficient for streaming music applications if the
latest codecs are used.
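The claim that EGPRS suffices for streaming can be checked with a rough comparison of codec bit rate against network throughput. The following Python sketch is only an illustration: the throughput ranges are the ones quoted above, while the function name `can_stream`, the 0.8 headroom factor and the 128 kbps MP3 rate are assumptions introduced here, not values from the cited sources.

```python
# Nominal end-user throughput ranges quoted in the text, in kbps.
NETWORK_KBPS = {
    "GPRS": (30, 50),
    "EGPRS": (120, 150),
}


def can_stream(codec_kbps, network, headroom=0.8):
    """Return True if a stream of codec_kbps fits the slowest quoted rate.

    headroom is a hypothetical safety factor (not from the text) that
    reserves part of the channel for protocol overhead and rate variation.
    """
    slowest_kbps, _fastest_kbps = NETWORK_KBPS[network]
    return codec_kbps <= slowest_kbps * headroom


if __name__ == "__main__":
    # 48 kbps is the rate of the EBU listening tests discussed in section 4.3;
    # 128 kbps is an assumed rate for high-quality stereo MP3.
    for network in NETWORK_KBPS:
        for codec_kbps in (48, 128):
            verdict = "ok" if can_stream(codec_kbps, network) else "too fast"
            print(f"{network:5s} {codec_kbps:3d} kbps: {verdict}")
```

Under these assumptions a 48 kbps stream fits within EGPRS but not within basic GPRS, and a 128 kbps stream fits within neither.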
4.3 Comparison of Mobile Audio Codecs
There are many methods for comparing the quality of audio streams. One method is to
use an audience to judge the quality. A test used by the European Broadcasting Union
(EBU) called MUSHRA is often used as a reference. MUSHRA stands for MUlti
Stimulus test with Hidden Reference and Anchors and is an advanced testing method
developed and proposed by the EBU Project Group B/AIM. The method has been
submitted to ITU for standardization.
MUSHRA (Stoll and Kozamernik, 2000) is a subjective test where listeners in
different EBU countries compare different types of audio to a reference signal and
grade them on a scale from 0 to 100, where the interval 81-100 is considered
excellent, 61-80 good, 41-60 fair, 21-40 poor and 0-20 bad. Different types of
music, such as classical, folk, jazz and pop music, are tested. Broadcast programmes,
both in a studio and a live environment, with female and male voices, are also
tested.
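The grading scale above can be expressed as a small lookup. The sketch below is only an illustration of the five bands. The function name `mushra_label` is hypothetical, and the handling of fractional scores (a score is placed in the first band whose upper bound it does not exceed) is an assumption, since the text only lists integer intervals.

```python
# Upper bound of each band and its label, in ascending order.
MUSHRA_BANDS = [
    (20, "bad"),
    (40, "poor"),
    (60, "fair"),
    (80, "good"),
    (100, "excellent"),
]


def mushra_label(score):
    """Map a MUSHRA score (0-100) to its verbal quality band."""
    if not 0 <= score <= 100:
        raise ValueError("MUSHRA scores lie between 0 and 100")
    for upper_bound, label in MUSHRA_BANDS:
        if score <= upper_bound:
            return label


if __name__ == "__main__":
    # Scores reported in this chapter for aacPlus, mp3PRO and AMR-WB+ at 48 kbps.
    for score in (80, 76, 83):
        print(score, mushra_label(score))
```

With this mapping, the scores of 80 and 76 reported in this chapter fall into the "good" band, while a score of 83 falls into "excellent".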
According to these listening tests, performed by the EBU, only a small difference can
be heard between stereo CD quality and HE-AAC compression at 48 kbps. The test results
are described by Kozamernik (2003). This is also illustrated in figure 4, which shows
that aacPlus, also called HE-AAC, gets the highest MUSHRA index of 80, compared to
mp3PRO, which gets the index 76. At 48 kbps the better-known RealMedia
Real 8, MP3 and MS Windows Media 8 codecs get much lower ratings. The EBU has not
yet reported MUSHRA testing of the AMR-WB+ codec.
Figure 4. European Broadcasting Union MUSHRA testing at 48 kbps stereo (Coding
Technologies, 2004).
The 3rd Generation Partnership Project (3GPP) is a collaboration agreement that was
established in December 1998. 3GPP has conducted a standardization process for Packet
Switched Streaming (PSS) and Multimedia Messaging Services (MMS). Two bit-rate ranges
have been defined:
1. low-rate range, up to 24 kbps, where the candidates are AMR-WB+, HE-AAC (aacPlus) and Enhanced AAC+
2. high-rate range, with rates higher than 24 kbps, where the candidates are HE-AAC (aacPlus) and Enhanced AAC+
The comparison tests that 3GPP conducted for the selection of an audio coding standard
show the following quality scalability for AMR-WB+ in a MUSHRA test. Figure 5
shows that the MUSHRA score for AMR-WB+ at 48 kbps is 83, which exceeds the
EBU figure for aacPlus.
Figure 5. Quality scalability of AMR-WB+ based on a MUSHRA test (Bruhn, 2004).
The comparison tests between EAAC+ and AMR-WB+ that 3GPP conducted for the
selection of low-rate-range audio coding showed that AMR-WB+ is a slightly better
codec for stereo at rates lower than 24 kbps, as can be seen in figure 6. Both codecs,
however, represent leading-edge coding technology, giving the highest quality
possible for mobile devices today.
Figure 6. Comparison of AMR-WB+ and EAAC+ by 3GPP (Mäkinen, J. et al., 2004).
4.4 Support for latest Codecs in Mobile Phones
Codecs such as AAC and AMR-WB are already supported in mid- and high-end
mobile phones, so they can be used if the target consumer is in the mid- or high-end
class, as office mobile phone users generally are. The latest codecs using SBR,
however, are not yet supported by mobile phones. This means that HE-AAC (aacPlus),
EAAC+ and AMR-WB+ cannot yet be used in mobile applications. It will, however, not
take long before these codecs are also supported, as they are approved by 3GPP
and the hardware manufacturers have already implemented them in their products.
Nokia also signed an aacPlus license agreement in July 2004
(www.3G.co.uk, 2004), which indicates that aacPlus will be available on Nokia mobile
phones soon.
Open codecs, such as Ogg Vorbis, do not seem to be as successful on the commercial
mobile market. They are generally not supported as a standard feature, but users
who are interested in them can install the codec and a player. There is an open
source player called OggPlay by Leif H. Wilden (2004) for the Symbian OS. This player
currently supports Ogg, MP3 and AAC files on Series 60 phones with Symbian OS 7
or later.
Windows Media has not yet achieved the same position in the mobile phone
market as it has in the PC market. No phones other than Microsoft's own brands include
a Windows Media Player. RealMedia players are, however, available for most mobile
phones, and RealMedia has thus established a special position in the mobile market.
4.5 Existing and upcoming applications for high-quality audio in mobile devices
Applications for downloading music content to the mobile phone already exist.
There are at least two commercial players (MP3go and UltraMP3) that support
playback of MP3-based music. These players also support creating and using
playlists. The main disadvantage at the moment is the memory required (3-5 MB/song)
for high-quality stereo MP3 music. If HE-AAC or AMR-WB+ could be used, the size of a
song would be below 1 MB. This would allow low-end mobile phones to store more
music than today. High-end mobile phones already have enough memory available
on MMCs. The download time for a song in the EGPRS network would decrease
from five minutes to one. Normally, however, songs are not downloaded over the
network but transferred directly from a PC through cable, Bluetooth or IR.
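The size and download-time figures above follow from simple arithmetic on bit rate, duration and link speed. The Python sketch below reproduces that arithmetic under stated assumptions: a constant bit rate, 1 MB taken as 10^6 bytes, and a 135 kbps EGPRS link (the midpoint of the 120-150 kbps range quoted in section 4.2). The function names, the 210-second song and the 128 kbps and 32 kbps example rates are hypothetical and are not taken from the sources.

```python
def song_size_mb(bitrate_kbps, duration_s):
    """Size in MB (10**6 bytes) of a constant-bit-rate stream."""
    total_bits = bitrate_kbps * 1000 * duration_s
    return total_bits / 8 / 1e6


def download_time_s(size_mb, link_kbps):
    """Seconds needed to transfer size_mb over a link of link_kbps."""
    total_bits = size_mb * 1e6 * 8
    return total_bits / (link_kbps * 1000)


if __name__ == "__main__":
    duration_s = 210        # assumed song length of 3.5 minutes
    egprs_kbps = 135        # assumed midpoint of the quoted 120-150 kbps

    # Assumed example rates: 128 kbps MP3 versus 32 kbps HE-AAC.
    for name, kbps in (("MP3", 128), ("HE-AAC", 32)):
        size = song_size_mb(kbps, duration_s)
        seconds = download_time_s(size, egprs_kbps)
        print(f"{name}: {size:.2f} MB, {seconds:.0f} s over EGPRS")

    # The sizes named in the text: a 5 MB song versus a 1 MB song.
    for size in (5, 1):
        print(f"{size} MB: {download_time_s(size, egprs_kbps) / 60:.1f} min")
```

With these assumptions a 5 MB song takes just under five minutes and a 1 MB song about one minute, which agrees with the estimate given in the text.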
The UK mobile phone company MMO2 (mm02, 2004) is launching a service for
downloading music to mobile phones in November. It uses a special music player called
the O2 Digital Music Player. The music files will be encoded in the MPEG-4 aacPlus
format and should be about a megabyte in size, MMO2 says. One song would take
roughly 90 seconds to download across a GPRS connection. The copy-protection
technology will be provided by the Swiss company Secure Digital Container (SDC).
Streaming applications for mobile phones already exist. Both music and video can be
enjoyed on the mobile phone. The Finnish Broadcasting Company YLE, for
example, sends the news as 20-50 kbps streams for the GPRS network. This speed was
selected to make the news available anywhere in Finland. The most common format
today is RealMedia, but other formats will certainly be available in the near future.
One of the problems today is the quality of the content, due to the low bandwidth of
the GPRS network. In the near future the quality of the content will be much better
due to both increased bandwidth and more efficient codecs, which will greatly improve
enjoyment.
The American market research centre In-Stat (In-Stat, 2004) expects the American
streaming video market to start to grow in the next two years, but it will not reach
15 % of total wireless revenues until 2009, which is not very encouraging. Another
study shows that 11 % of mobile phone users today are very or extremely interested in
buying music over the mobile phone network.
5 CONCLUSIONS
The development of audio codecs for mobile phones has been very rapid in the past
few years. Enhancements of codecs have been released yearly and there always seem
to be new technologies that can be applied to the compression procedure. Technologies
that have changed the world of encoding include MP3, AAC and SBR. At the
moment the best codecs for audio seem to be AMR-WB+ and HE-AAC (aacPlus),
depending on what kind of audio material is encoded. This is most likely not the last
step in codec development. New codecs will probably continue to be introduced yearly.
Figure 7. Applications for Mobile Audio (Mäkinen, J. et al., 2004).
The need for more efficient codecs will probably decrease gradually as
telecommunication speeds continue to grow even beyond 3G networks. According
to the telecommunication company Alcatel's White Paper on Mobile Network Evolution,
the expected communication speed for mobile phones will approach 1 Gbps in
2010-2015 (see figure 8).
Figure 8. Evolution of mobile networks from 2G to B3G (Hurel, J-L et al., 2004).
On the other hand, new applications utilizing these possibilities will certainly be
introduced on the market. These products also drive the technology, as they
demand more computing power, more memory, better graphics and better audio, which
in turn demand more efficient telecommunication possibilities. As long as there is a
need for those applications and a willingness to pay the price of using them, the
development process is secured. Severe limiting factors that could stop the
development of mobile audio applications seem hard to find.
REFERENCES
ARM Developers Guide 2004-2005. Convergence Promotions. Developers Guide
(Online) 2004. [Referenced 25.11.2004]. Available:
http://arm.convergencepromotions.com/catalog/m_home.htm
Bruhn, S. 2004. Bridging the gap between speech and audio coding - AMR-WB+ - The
codec for mobile audio. Ericsson Research, Multimedia Technologies. Available:
http://www.s3.kth.se/radio/COURSES/S3_SEMINAR_2E1380_2004/presentations/EricssonAudio-040506.pdf
Coding Technologies. aacPlus. Products and Technologies. Promotion Page (Online)
2004. [Referenced 25.11.2004]. Available:
http://www.codingtechnologies.com/products/aacPlus.htm
Compagnie Financière Alcatel. Mobile Networks. Solutions. Technology Overview
Page (Online) 2004. [Referenced 25.11.2004]. Available:
http://www.alcatel.com/mobilenetworks/mobileinternet/
Henn, F., Böhm, R., Meltzer, S. & Ziegler, Th. 2003. Spectral Band Replication
(SBR) Technology and its Application in Broadcasting. Available:
http://www.broadcastpapers.com/radio/ibc2003CodingSBR04.htm
Fraunhofer Institute for Integrated Circuits IIS. Audio & Multimedia. MPEG Audio
Layer-3. Technology Report (Online) 2004. [Referenced 25.11.2004]. Available:
http://www.iis.fraunhofer.de/amm/techinf/layer3/
Frerichs, D. 2003. New MPEG-4 High-efficiency AAC Audio: Enabling new
applications. Coding Technologies. Available:
http://www.telos-systems.com/techtalk/hosted/m4-in-30100%20(M4IF_HE_AAC_paper).pdf
Hurel, J-L., Lerouge, C., Evci, C. & Gui, L. 2004. Mobile Network Evolution: From 3G
Onwards. Technical White Paper. Compagnie Financière Alcatel. Available:
http://www.alcatel.com/doctypes/articlepaperlibrary/pdf/ATR2003Q4/T0312-Mobile-Evolution-EN.pdf
In-Stat, American Market Research Centre. Mobile Consumer Data & Multimedia
Services. Information Service (Online) 2004. [Referenced 25.11.2004]. Available:
http://www.instat.com/catalog/Wcatalogue.asp?id=230
Kozamernik, F. 2003. EBU subjective listening tests on low-bitrate audio codecs. EBU
Listening Tests. Tech 3296. June 2003. Available:
http://www.ebu.ch/CMSimages/en/tec_doc_t3296_tcm6-10497.pdf?display=EN
mp3PRO Zone 2004. Coding Technologies. Developers Guide (Online) 2004.
[Referenced 25.11.2004]. Available: http://www.mp3prozone.com/
Microsoft Corporation. Windows Media Home. Technology Page (Online) 2004.
[Referenced 25.11.2004]. Available:
http://www.microsoft.com/windows/windowsmedia/default.aspx
mm02. O2 Digital Music Player. Cellular Phone Operator. Promotion Page (Online)
2004. [Referenced 25.11.2004]. Available:
http://www.o2.co.uk/o2-digital-music-player.html
Mäkinen, J. et al. 2004. AMR-WB+: A new audio coding standard for 3rd generation
mobile audio services. Nokia Research Center. Finland, submitted to ICASSP 2005.
Figures available:
http://www.tml.hut.fi/Opinnot/T-111.550/Mobileaudioformats2004-10-26.pdf
RealNetworks Incorporated. Real Player Page. Technology Page (Online) 2004.
[Referenced 25.11.2004]. Available: http://www.real.com/player/?src=realaudio
Stoll, G. and Kozamernik, F. 2000. EBU Listening Tests on Internet Audio Codecs.
EBU Technical Review June 2000. Available: http://www.ebu.ch/trev_283-kozamernik.pdf
Symbian Ltd. Symbian OS Phones. Technology Promotion Page (Online) 2004.
[Referenced 25.11.2004]. Available: http://www.symbian.com/phones/index.html
The 3rd Generation Partnership Project; Technical Specification Group Services and
System Aspects; General Audio Codec audio processing functions; Enhanced
aacPlus General Audio Codec; General Description (Release 6), (2004). Available:
http://www.3gpp.org/ftp/tsg_sa/TSG_SA/TSGS_24/Docs/PDF/SP-040428.pdf
Vilermo, M. 2004. Audio Codecs. AES/RTI-Audiopäivät 2004 Conference, Helsinki,
25.-26.5.2004. Audio Engineering Society Finnish Section. Available:
http://www.aes.fi/audiopaivat2004/vilermo.pdf
VoiceAge Corporation Licencing Service. AMR-WB+ FAQs. Technologies. Frequently
Asked Questions (2004). [Referenced 25.11.2004]. Available:
http://www.voiceage.com/amrsite/tech_wbplus_faqs.php
Wales, J. 2004. Wikipedia - The Free Encyclopedia. Wikipedia Foundation. Electronic
Encyclopedia (Online) 2004. [Referenced 25.11.2004]. Available:
http://en.wikipedia.org/
Wilden, L. H. Ogg Vorbis Player for Symbian OS phones. Technology Page (Online)
2004. [Referenced 25.11.2004]. Available: http://symbianoggplay.sourceforge.net/
www.3G.co.uk. aacPlus 2.5 / 3G License with Nokia. News service for 3G. News
Service Site (Online) July 2004. [Referenced 25.11.2004]. Available:
http://www.3g.co.uk/PR/July2004/8026.htm
Xiph.Org Foundation. The Ogg Vorbis CODEC project. Ogg. Developers Page
(Online) 2004. [Referenced 25.11.2004]. Available: http://www.xiph.org/ogg/vorbis/