Mobile Formats Explained

Apr 07, 2018

Sandeep Hm
  • 8/6/2019 Mobile Formats Explained

    HELSINKI UNIVERSITY OF TECHNOLOGY 30.11.2004

    Telecommunications Software and Multimedia Laboratory

    T-111.550 Multimedia Seminar

    Fall 2004: Mobile Multimedia Application Platforms

    Mobile Audio

    from MP3 to AAC and further

    Henri Autti Johnny Biström

    51194K 21548C

    Mobile Audio from MP3 to AAC and further

    Henri Autti and Johnny Biström

    HUT, Telecommunications Software and Multimedia Laboratory

    Abstract

    The purpose of this paper is to evaluate the advanced audio codecs and to reflect on their suitability for the mobile needs of today and tomorrow. The historical development of different codecs for different purposes is analyzed. The features of the most common codecs are discussed in parallel with performance and other criteria. The capabilities of mobile devices and the telecommunication possibilities now and in the future are also considered in the analysis. Finally, some comparisons of the codecs' performance are made. Some existing mobile audio applications are presented, and ideas for future audio applications are discussed. Some of the questions to be answered here are: Can one codec that is superior for the mobile world be found, or do we have to prepare for a wide diversity in the future? Will the codecs continue to develop as rapidly as they have done so far?

    1 INTRODUCTION

    A look at audio applications for the mobile world today reveals that the diversity in implementations is wide. The solutions chosen for representing audio streams differ greatly from one situation to another, and the number of codecs used for encoding and decoding audio data streams is large. It is not obvious which codec should be used for what purpose. The selection of a codec depends on several factors, such as the content type of the audio material, the available communication speed and the quality requirements of the listening situation. Other factors that might influence the selection are the standardization situation, the licensing policy and the competitors' choices in the market. During the last years the mp3 format has been a great success, but it does not fit well into mobile devices. Lately, more efficient codecs such as AAC and AMR have been presented, and they have been refined for mobile audio purposes.

    The purpose of this paper is to evaluate the most important audio codecs by examining the technical principles of encoding and decoding, the standardization situation and the suitability of each codec in relation to the available technology and the market needs. The analysis also takes into consideration the development of mobile and telecommunication hardware and software. Using technical literature and documented listener testing combined with mobile manufacturer specifications and published white papers, we try to find out whether there is one superior codec for mobile audio applications and which codec it would be. To do that, existing and future audio applications in the market have to be analyzed to clarify the needs and expectations placed on mobile audio. Finally, the result is weighed against the development trends of mobile technology, and the longevity of the chosen solution is judged.

    2 BACKGROUND

    In this chapter we take a look at the background of mobile audio formats. First we present some basic facts about the development of audio codecs and the reasons for developing them. Then we discuss the development of mobile devices, including phones, PDAs and laptops. Later in this chapter we take a quick look at the applications available today and the demands these applications pose.

    The short history of audio codecs dates back to the mid-1980s, when the Fraunhofer Institut in Erlangen, Germany (Fraunhofer, 2004), began working on high-quality, low bit-rate audio coding with the help of Dieter Seitzer, a professor at the University of Erlangen. The project was financed by the European Union as a part of the market-oriented Eureka research program, where it was commonly known as EU-147. In Germany in 1989, Fraunhofer was granted a patent for mp3, which we discuss more thoroughly in the next chapter. A few years later it was submitted to the International Organization for Standardization (ISO), and mp3 was introduced as a part of the official MPEG-1 standard in 1992. In January 1995 Fraunhofer applied for a patent on mp3 in the United States as well, and it was granted in November 1996. The revolutionary result was that, using mp3 compression, PC users were for the first time able to compress an ordinary music CD to about one tenth of its original size with only a small sacrifice in sound quality. Thus about 12 hours of music could be stored on a recordable CD, which could then be played on an mp3 CD player or an ordinary PC.
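
The "one tenth" and "12 hours" figures can be checked with a little arithmetic. The sketch below is only illustrative and rests on assumptions that the paper does not state: 16-bit stereo CD audio at 44.1 kHz, an mp3 stream at 128 kbps and a recordable CD holding 700 MB.

```python
# Assumptions not taken from the paper: 16-bit stereo CD audio at 44.1 kHz,
# a 128 kbps mp3 stream and a 700 MB (decimal) recordable CD.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2


def cd_bit_rate_bps() -> int:
    """Raw bit rate of uncompressed CD audio."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS


def compression_ratio(coded_bit_rate_bps: int) -> float:
    """How many times smaller the coded stream is than raw CD audio."""
    return cd_bit_rate_bps() / coded_bit_rate_bps


def playing_time_hours(capacity_bytes: int, bit_rate_bps: int) -> float:
    """Hours of audio that fit into the capacity at a constant bit rate."""
    return capacity_bytes * 8 / bit_rate_bps / 3600


print(f"CD bit rate: {cd_bit_rate_bps() / 1000:.1f} kbps")
print(f"Ratio at 128 kbps: {compression_ratio(128_000):.1f}:1")
print(f"Hours on a 700 MB disc: {playing_time_hours(700 * 10**6, 128_000):.1f}")
```

Under these assumptions the ratio comes out slightly above ten to one and the playing time slightly above twelve hours, which is consistent with the round figures quoted in the text.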

    In the rapidly evolving world of mobile content development, things have changed a lot since those days. Mobile devices, ranging from small laptops through palmtops to phones, are more widely available, and high-speed wireless networks are getting better day by day. At the same time, speech and audio compression have advanced rapidly in recent years, spurred on by cost-effective digital technology and diverse commercial applications. Wideband speech and high-fidelity audio compression have also made great progress, accelerated by the commercial success of consumer and professional digital audio products. Telephone speech, wideband speech and wideband audio signals differ not only in bandwidth and dynamic range, but also in listener expectations of the offered quality. Using wideband not only improves the intelligibility and naturalness of speech, but also adds a feeling of transparent communication and eases speaker recognition.

    The commercial applications in the mobile content area are also developing at a growing rate. According to Ericsson (Bruhn, 2004), mobile device services comprise streaming, messaging, downloading and broadcasting. Streaming scenarios include news listening, monitoring of sports events, audio books, music listening, commercial advertisements, access to information systems and interactive gaming. Broadcasting scenarios are very close to streaming scenarios and include webcasting or Internet radio broadcasting. They have become especially popular, allowing listeners to "stream" audio on their computers. Unlike downloaded audio files, streamed audio files are not stored on the user's hard drive, but are played through the user's audio player as they arrive, like traditional radio. Messaging scenarios are also similar to streaming, but with size limitations, and include business-to-person and person-to-person scenarios. Download scenarios include downloading music, books and comics over the network. It is important for all of the scenarios named above to be able to handle mixed content covering music, speech, speech-between-music and speech-over-music.

    The demands these applications pose today on audio codecs for mobile services include the ability to cope with generic content, sufficient and consistent quality at the lowest rates, the best quality at the lowest rates, and high-quality operation when the bit-rate requirement is relaxed. The new audio codecs also have to be optimized for low-resource devices (little memory and limited computational resources) and have to support a variety of operating systems, e.g. Symbian, WinCE, Palm OS5 and OS6. Development and standardization of the codecs is at the moment focused in 3GPP, which is the body standardizing GSM, evolved GSM, UMTS and 3G. In the next chapter we introduce some of the most important audio standards and codecs that play an important role in 3GPP.

    3 AUDIO STANDARDS AND CODECS

    In this chapter we describe what we consider the most important audio standards and codecs at the moment. In the first part of the chapter we discuss the 3GPP audio standard format families AAC and AMR and introduce the underlying technology. First we present mp3 (the predecessor to AAC), AAC, HE-AAC and EAAC+, and then the challengers: AMR, AMR-WB and AMR-WB+. In the second part we discuss the open source codec Vorbis Ogg ACM and some of the most important non-standard audio formats that use streaming technology, Windows Media Audio and RealMedia. For terminology, architecture and technology see Wales (2004) and ARM Developers Guide (2004).

    3.1 MPEG-1 (mp3)

    Mp3 stands for MPEG-1 Audio Layer III. It is not a separate format, but a part of the MPEG-1 video encoding format, as described earlier. Mp3 is a lossy data compression method (meaning that compressing a file and then decompressing it yields a file that may well differ from the original, but is "close enough"). It stores good-quality audio in small files by using psychoacoustics to discard the parts of the audio that most humans cannot hear. Mp3 bit rates vary from 8 kbps to 320 kbps. When the mp3 phenomenon began in 1996, most audio files were encoded at 128 kbps, which is still the most popular bit rate in the world, although most people agree that at slightly higher bit rates, such as 192 kbps or 256 kbps, the audio quality is comparable to CD quality.

    The problem with mp3 appears at lower bit rates (64 kbps and below), where the sound starts to lack high-frequency components. The reason is that at these bit rates mp3 runs out of bits to code the music at full audio bandwidth and with significant detail. Mp3PRO was created to solve this problem of limited bandwidth. It improves the sound quality of mp3 at lower bit rates with an enhancement technology called "Spectral Band Replication" (SBR), which restores the high-frequency components of the sound. SBR is a very efficient method for generating the high-frequency components of an audio signal. The resulting audio format is composed of two components: the mp3 part for the low frequencies and the SBR or "PRO" part for the high frequencies. The first part analyses the low-frequency band information and encodes it into a normal mp3 stream. This lets the encoder concentrate on less information and do a better job of encoding it. It also maintains complete compatibility with existing mp3 players. The second part analyses the high-frequency band information and encodes it into a part of the mp3 stream that is normally ignored by existing mp3 decoders. Detailed information can be found at mp3PRO Zone (2004).

    3.2 MPEG-2 AAC

    AAC (Advanced Audio Coding), also known as MPEG-2 AAC, is a lossy data compression scheme intended for audio streams. AAC was designed to replace mp3. It is part of the MPEG-2 standard introduced in 1994 and was developed by the MPEG group, which includes Dolby, Fraunhofer (FhG), AT&T, Sony and Nokia, companies that have also been involved in the development of audio codecs such as mp3 and AC3 (also known as Dolby Digital). Unlike older MPEG audio encoding methods, MPEG-2 AAC is not backwards compatible with older MPEG audio formats. Mp3, by contrast, is backwards compatible with mp2.

    AAC is based on a wideband audio coding algorithm that exploits two primary coding strategies to dramatically reduce the amount of data needed to convey high-quality digital audio. First, the signal components that are "perceptually irrelevant" and can be discarded without a perceived loss of audio quality are removed. Next, redundancies in the coded audio signal are eliminated. Efficient audio compression is achieved by a variety of perceptual audio coding and data compression tools.

    When compared side by side with its predecessor, AAC is proving itself worthy of replacing mp3 as the new Internet audio standard. It has improved compression, which provides higher-quality results with smaller file sizes. It supports multichannel audio with up to 48 full-frequency channels, higher-resolution audio with sampling rates from 8 up to 96 kHz, and improved decoding efficiency, requiring less processing power for decoding. These result in higher-quality output at lower data rates, allowing even modem users to hear a difference. AAC also gives the listener better and more stable quality than mp3 at equivalent or slightly lower bit rates. Depending on the AAC profile and the mp3 encoder, 96 kbps AAC can give nearly the same or better perceptual quality as 128 kbps mp3.

    3.3 MPEG-4 HE-AAC

    MPEG-4 High Efficiency AAC is the combination of MPEG-2 AAC and the bandwidth extension amendment that is based on SBR (Spectral Band Replication) technology. HE-AAC is not a replacement for AAC, but rather a superset that extends the reach of high-quality MPEG-4 audio to much lower bit rates (as low as 32 kbps). HE-AAC is able to achieve superior audio quality without losing treble or collapsing the stereo image.

    HE-AAC decoders will decode both plain AAC and the enhanced AAC plus SBR. The result is a backward-compatible extension of the standard that nearly doubles the efficiency of MPEG-4 audio. As discussed before, SBR is a bandwidth extension technique that does not replace the core codec, but operates in conjunction with it to create a more efficient superset that can cut the required bit rate in half. Present in both the encoding and the decoding process, SBR exploits the correlation between the low and the high frequencies in an audio signal to describe the high end of the signal using only a very small amount of data. This SBR data describing the high frequencies is coupled with the low-frequency compressed data from the AAC codec. Once combined, the complete HE-AAC bit stream contains enough data to recreate the full-bandwidth signal. (See figure 1.)

    For example, to create 48 kbps stereo HE-AAC, the encoder generates two signals: an MPEG AAC signal at about 42 kbps and an SBR signal at about 6 kbps. The SBR signal is then placed into the MPEG AAC auxiliary fields as defined in MPEG-4 and sent out as a complete 48 kbps MPEG-4 HE-AAC bit stream.

    Figure 1. The encoding and decoding process of HE-AAC.

    Because the SBR data is placed within the AAC auxiliary fields, the enhanced signal will be accepted by both an existing AAC decoder and a new HE-AAC decoder. If sent to an AAC decoder, only the low-frequency audio signal will be recognized and decoded. If sent to an HE-AAC decoder, both the SBR and the AAC data will be decoded to recreate the full-frequency signal. This technique makes the new profile forward compatible with AAC. Because the HE-AAC decoder contains a full-fledged AAC decoder, it is able to decode both the plain AAC and the HE-AAC MPEG-4 Audio profiles. This combination makes HE-AAC backward compatible with AAC.

    As a result, HE-AAC delivers CD-quality stereo at 48 kbps and 5.1-channel surround sound at 128 kbps. This level of efficiency is ideal for Internet content delivery and fundamentally enables new applications in the mobile and digital broadcasting markets. However, according to Frerichs (2003), HE-AAC is not suitable for two-way communication because of its very high delay.

    3.4 EAAC+

    Enhanced AAC+ was introduced in the 3GPP Release 6 standard in 2004. Its optimal operating range is from 18 kbps upwards. According to the 3rd Generation Partnership Project, the Enhanced AAC+ general audio codec consists of MPEG-4 AAC, MPEG-4 SBR and MPEG-4 Parametric Stereo. AAC is a general audio codec, SBR is a bandwidth extension technique offering substantial coding gain in combination with AAC, and Parametric Stereo enables stereo coding at very low bit rates. According to the IBC 2003 Conference Papers, the basic principle behind parametric stereo is similar to the SBR principle: a guided reconstruction of a stereo signal based on a transmitted mono signal. In addition to a coded mono mixdown of the stereo input signal, parameters describing the stereo image are transmitted. The stereo parameters require only a small fraction of the total bit rate, which ensures a high quality of the mono signal at the given bit rate. Two parameters are used to describe the stereo information: a panorama parameter and an ambience parameter. The panorama parameter contains information about the left-to-right level differences within different frequency bands. Similarly, the ambience parameter describes the stereo ambience for a set of frequency bands. The encoding of both parameters uses the same principle of entropy coding of time- or frequency-direction differences as is used for the SBR envelopes. In addition, the quantization steps are frequency dependent.
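
The idea of a mono mixdown guided by a panorama parameter can be illustrated with a heavily simplified sketch. It is not the standardized algorithm: the function names are hypothetical, the whole frame is treated as a single frequency band in the time domain instead of several bands in the QMF domain, and the ambience parameter, quantization and entropy coding are left out.

```python
import math


def encode_frame(left, right):
    """Return a mono downmix and one left-to-right level difference in dB.

    Simplification (assumption): one broadband value per frame, whereas the
    codec described in the text works per frequency band.
    """
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)
    eps = 1e-12  # avoids log(0) and division by zero for silent channels
    pan_db = 10 * math.log10((energy_left + eps) / (energy_right + eps))
    return mono, pan_db


def decode_frame(mono, pan_db):
    """Rebuild a left/right pair from the mono signal and the level difference."""
    ratio = 10 ** (pan_db / 20)       # amplitude ratio left/right
    gain_right = 2 / (1 + ratio)      # chosen so that (left + right) / 2 == mono
    gain_left = ratio * gain_right
    return [gain_left * m for m in mono], [gain_right * m for m in mono]


# A tone panned mostly to the left survives the mono round trip.
tone = [math.sin(2 * math.pi * i / 32) for i in range(64)]
mono, pan_db = encode_frame([0.8 * s for s in tone], [0.2 * s for s in tone])
rebuilt_left, rebuilt_right = decode_frame(mono, pan_db)
print(f"panorama: {pan_db:.2f} dB")
```

For a source that is simply panned between the channels, one level difference is enough to restore the stereo position. Decorrelated content such as reverberation cannot be described this way, which is the role the text assigns to the ambience parameter.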

    Compared with the older codecs, the Enhanced AAC+ decoder also includes three additional tools. Error concealment tools for AAC, SBR and PS make the decoder robust against transmission errors such as frame loss by mitigating the audible effects of such errors. The stereo-to-mono downmix tool enables a decoder capable only of mono output to downmix a stereo bit stream. For the AAC part this is done in the time domain after the stereo decoding, but for SBR it is done on the SBR parameters, which saves complexity since only a mono decoding of SBR is needed. The spline resampler tool makes it possible to resample the output to a sampling frequency different from the one supplied in the bit stream. This allows, for example, handsets with a D/A converter capable only of a 16 kHz sampling frequency to play bit streams encoded at a 22.05 kHz sampling frequency.
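
To make the resampling step concrete, the sketch below converts a signal from 22.05 kHz to 16 kHz. It deliberately swaps in plain linear interpolation for the spline interpolation named in the text, and the function name is hypothetical, so it should be read as an illustration of the principle rather than as the 3GPP tool.

```python
def resample_linear(samples, src_rate_hz, dst_rate_hz):
    """Resample by linear interpolation between neighbouring input samples.

    Assumption: linear interpolation stands in for the spline resampler.
    No low-pass filter is applied, so downsampling may alias.
    """
    if not samples:
        return []
    n_out = len(samples) * dst_rate_hz // src_rate_hz
    step = src_rate_hz / dst_rate_hz  # input samples per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        k = min(int(pos), last)
        frac = pos - k
        nxt = samples[min(k + 1, last)]
        out.append((1 - frac) * samples[k] + frac * nxt)
    return out


# One second of a ramp at 22.05 kHz becomes 16000 samples at 16 kHz.
one_second = [float(i) for i in range(22050)]
print(len(resample_linear(one_second, 22050, 16000)))
```

A production resampler would also band-limit the signal to below half the new sampling frequency before reducing the rate.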

    Figure 2 shows a block diagram of the EAAC+ encoder. The encoder basically consists of the AAC waveform encoder, the SBR high-frequency reconstruction encoding tool and the PS encoding tool. The encoder operates in a dual-rate mode, in which the SBR encoder operates at the encoding sampling rate fs_enc as delivered from the IIR resampler and the AAC encoder at half of this sampling rate, fs_enc/2. Consequently a 2:1 downsampler is present at the input to the AAC encoder. The PS tool is used for low bit-rate stereo coding, i.e. up to and including a bit rate of 32 kbps. The AAC encoder implementation complies with the AAC Low Complexity Object Type and is a highly optimized low-resource implementation that requires little computation and memory. This is basically achieved by mapping the psychoacoustically based threshold estimation directly to scale factor amplification values to shape the encoding quantization noise according to the input signal characteristics, rather than employing time-consuming iterative analysis-by-synthesis methods.

    The SBR encoder contains a QMF (Quadrature Mirror Filter) analysis filter bank, which is used to derive the spectral envelope of the original input signal. Furthermore, the SBR-related modules control the selection of an input-signal-adaptive grid partitioning of the QMF samples on the time axis (i.e. they control the framing), analyze the relation of the noise floor to the tonal components in the high band, collect guidance information for the transposition process in the decoder, and detect missing harmonic components that could not be reconstructed by pure transposition. This gathered information about the characteristics of the input signal, together with the spectral envelope data, forms the SBR stream. The number of bits used for the SBR stream is subtracted from the bits available to the AAC encoder in order to achieve constant bit-rate encoding of the multiplexed EAAC+ stream.
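
The bit bookkeeping described above can be sketched as follows. The frame length of 2048 samples at fs_enc is an assumption that is not stated in the text, the function names are hypothetical, and any bit reservoir that a real encoder may use is ignored.

```python
def frame_bit_budget(total_bit_rate_bps, sample_rate_hz, frame_samples=2048):
    """Average number of bits one frame may use at a constant bit rate.

    frame_samples=2048 is an assumed frame length at the rate fs_enc.
    """
    return total_bit_rate_bps * frame_samples / sample_rate_hz


def aac_bits_for_frame(total_bit_rate_bps, sample_rate_hz, sbr_bits,
                       frame_samples=2048):
    """Bits left for the AAC core once the SBR stream has taken its share."""
    budget = frame_bit_budget(total_bit_rate_bps, sample_rate_hz, frame_samples)
    if sbr_bits > budget:
        raise ValueError("SBR data alone exceeds the frame budget")
    return budget - sbr_bits


# 48 kbps at fs_enc = 48 kHz: the frame budget minus 256 bits of SBR data.
print(aac_bits_for_frame(48_000, 48_000, sbr_bits=256))
```

With these numbers the SBR share is one eighth of the frame, the same proportion as the 6 kbps of SBR data in the 48 kbps example given for HE-AAC.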

    The Parametric Stereo encoding tool in the EAAC+ encoder estimates parameters characterizing the perceived stereo image of the input signal. These stereo parameters are embedded in the SBR stream. At the same time, a signal-adaptive mono downmix of the input signal is generated in the QMF domain and fed into the SBR encoder, which operates in mono. This downmix is also processed by a downsampled QMF synthesis filter bank to obtain the time-domain input signal for the AAC core encoder at the sampling rate fs_enc/2. In this case, the 2:1 IIR downsampler is not active.

    Figure 2. 3rd Generation Partnership Project; EAAC+ Encoder overview

    In the decoder (figure 3) the bit stream is demultiplexed into the AAC and the SBR streams. Error concealment, e.g. in case of frame loss, is achieved by designated algorithms in the decoder for AAC, SBR and PS: the AAC core decoder employs signal-adaptive, spectrally shaped noise generation for error concealment, while in the SBR and PS decoders error concealment is based on extrapolation of guidance, envelope and stereo information.

    For the SBR processing, a low-power tool of SBR is used for full stereo decoding in order to keep the peak computational complexity as low as possible over all channel modes. With the SBR low-power tool, the computational complexity of an HE-AAC stereo decoder is in the same range as that of plain AAC stereo decoders. The low-band AAC time-domain signal, sampled at fs_enc/2, is first fed to a 32-channel QMF analysis filter bank. The QMF low-band samples are then used to generate a high-band signal, and the transmitted transposition guidance information is used to best match the characteristics of the original input signal.

    The transposed high-band signal is then adjusted according to the transmitted spectral envelope signal to best match the original's spectral envelope. Missing components that could not be reconstructed by the transposition process are also introduced. Finally, the low band and the reconstructed high band are combined to obtain the complete output signal in the QMF domain.

    In case of a stream using parametric stereo, the mono output signal from the underlying HE-AAC decoder is converted into a stereo signal. This processing is carried out in the QMF domain and is controlled by the parametric stereo parameters embedded in the SBR stream.

    Figure 3. 3rd Generation Partnership Project; EAAC+ Decoder overview

    3.5 AMR

    The AMR (Adaptive Multi-Rate) standard was introduced in 1998. Its main function is baseline mobile speech coding. It operates at variable mono bit rates in the range of 4.75 to 12.2 kbps in its narrowband (bandwidth 3.5 kHz) configuration. It was adopted by the 3GPP as the mandatory codec for 3G wireless systems based on the evolved GSM core network (WCDMA, EDGE, GPRS).

    The philosophy behind AMR is to lower the codec rate as the interference increases, thus enabling more error correction to be applied. The AMR codec is also used to harmonize the codec standards amongst different cellular systems. AMR is based on a technology called ACELP (Algebraic Code Excited Linear Prediction). ACELP is a speech compression system used to provide a good standard of speech quality when the network is operating at low data rates (narrow bandwidth). The analogue voice signal is converted to a digital data signal so that it can be compressed for transmission over the network, and the process is reversed at the other end, where the digital data is converted back to an analogue voice signal. The reproduced speech at the receiving phone sounds much better with the ACELP system than without it.
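
The trade-off between codec rate and error protection can be sketched as a simple mode selection. The text gives only the end points of the rate range; the intermediate rates listed below are the eight AMR narrowband modes. The thresholds are hypothetical values chosen purely for illustration, and the 22.8 kbps gross rate assumes a GSM full-rate traffic channel. Real link adaptation is more elaborate than this.

```python
# The eight AMR narrowband codec modes in kbps.
AMR_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)

# Assumption: gross rate of a GSM full-rate traffic channel in kbps.
GROSS_CHANNEL_KBPS = 22.8

# Hypothetical carrier-to-interference thresholds in dB, one per mode.
HYPOTHETICAL_THRESHOLDS_DB = (float("-inf"), 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0)


def select_mode(carrier_to_interference_db):
    """Pick the highest codec rate whose threshold the channel quality meets."""
    chosen = AMR_MODES_KBPS[0]
    for rate, threshold in zip(AMR_MODES_KBPS, HYPOTHETICAL_THRESHOLDS_DB):
        if carrier_to_interference_db >= threshold:
            chosen = rate
    return chosen


def protection_kbps(codec_rate_kbps):
    """Channel capacity left over for error protection at a given codec rate."""
    return GROSS_CHANNEL_KBPS - codec_rate_kbps


for quality_db in (20.0, 9.0, 0.0):
    rate = select_mode(quality_db)
    print(quality_db, rate, round(protection_kbps(rate), 2))
```

As the channel quality drops, the selected speech rate falls and the share of the fixed channel capacity available for error protection grows, which is the behaviour described in the paragraph above.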

    3.6 AMR-WB

    AMR-WB (the wideband extension) is a speech coding standard developed after AMR and based on the same ACELP technology. The AMR Wideband codec was standardized by ETSI/3GPP in December 2000, and selected and approved by the ITU-T in July 2001 and January 2002, respectively. The ITU-T standard is referred to as G.722.2.

    The codec provides excellent speech quality due to its wider speech bandwidth of 50 - 7000 Hz, which significantly improves the intelligibility and naturalness of speech and adds a feeling of face-to-face communication. The AMR-WB speech codec consists of nine speech codec modes with mono bit rates of 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85 and 6.6 kbps. The lowest bit rate providing excellent speech quality in a clean environment is 12.65 kbps. Higher bit rates are useful in background noise conditions and for music. The lower bit rates of 6.6 and 8.85 kbps also provide reasonable quality, especially when compared with narrowband codecs. A background noise mode is designed to be used in discontinuous transmission (DTX) operation in GSM and as a low bit-rate, source-dependent mode for coding background noise in other systems.
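
The unusual-looking bit rates become easier to read when they are converted into bits per speech frame. The sketch below assumes a frame length of 20 ms, which is not stated in the text; the function name is hypothetical.

```python
# The nine AMR-WB modes from the text, in kbps.
AMR_WB_MODES_KBPS = (6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85)

# Assumption not stated in the text: speech is coded in 20 ms frames.
FRAME_MS = 20


def bits_per_frame(rate_kbps):
    """Number of coded speech bits in one frame (kbps times ms gives bits)."""
    return round(rate_kbps * FRAME_MS)


for rate in AMR_WB_MODES_KBPS:
    print(f"{rate:5.2f} kbps -> {bits_per_frame(rate)} bits per frame")
```

Under this assumption every mode corresponds to a whole number of bits per frame, which is why the rates are not round numbers.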

    AMR-WB can also carry narrowband signals. It eliminates the need for transcoding and eases the implementation of wideband applications and services across a wide range of wireless and wireline communication systems and platforms. AMR-WB is already standardized for future use in networks such as UMTS. There it provides so much higher speech quality that it seems probable that older networks will also gradually have to be upgraded to support wideband.

    3.7 AMR-WB+

    Adopted as an audio codec standard in September 2004 by ETSI/3GPP, AMR-WB+ is an audio extension of AMR-WB. It uses a hybrid of two technologies, ACELP and TCX (Transform Coded Excitation), to deliver very high sound quality for both speech and audio content types, including music, voice-between-music and voice-over-music.

    AMR-WB+ adds support for stereo signals and higher sampling rates. In addition, high-efficiency parametric stereo (HE-PS), as discussed under EAAC+, provides high-fidelity stereo image reproduction at the lowest bit rates. Another main improvement is the use of transform coding in addition to ACELP, which greatly improves generic audio coding. Automatic switching between transform coding and ACELP provides very good quality for both speech and other audio at moderate bit rates. Sound quality is not compromised even in networks where the bandwidth is limited.

    The AMR-WB+ codec has a wide bit-rate range, from 6 to 48 kbps. Mono rates are scalable from 6 to 36 kbps and stereo rates from 8 to 48 kbps, reproducing audio bandwidth up to 24 kHz (approaching CD quality). Moreover, it provides backward compatibility with AMR wideband. AMR-WB+ brings speech and music to mobile phones (VoiceAge, 2004).

    3.8 Vorbis Ogg ACM

    Due to numerous patenting and licensing issues with various parts of the MPEG specifications, there has been a significant movement to create and popularize audio formats and algorithms that are free of such restrictions. The most popular of these is probably Ogg Vorbis, a completely open and free codec project from the Xiph.org Foundation (2004).

    Vorbis was started in response to a plan, announced in September 1998, to charge licensing fees for the mp3 format. Version 1.0 of the codec was released on July 19, 2002. The latest version is 1.1.0, released on September 22, 2004. The Ogg Vorbis format has proved popular among open source communities; they argue that its higher fidelity and completely free nature make it a natural replacement for the entrenched mp3 format. In the commercial sector, Vorbis has already had success, with many newer video game titles employing Vorbis instead of mp3.

    Given stereo input at 44.1 kHz, the standard CD audio sampling frequency, the current encoder produces output at 45 - 500 kbps, depending on the specified quality setting. Though Vorbis 1.0.1 is tuned for bit rates of 16 - 128 kbps per channel, it is still possible to encode at arbitrary bit rates chosen by the user. Such figures are only approximate, however, as Vorbis is inherently a variable bit-rate codec.

    Vorbis uses the modified discrete cosine transform (MDCT) for converting sound data from the time domain to the frequency domain. The resulting frequency-domain data is broken into noise floor and residue components, and then quantized and entropy coded using a codebook-based vector quantization algorithm. The decompression algorithm reverses these stages.
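
The transform stage can be illustrated with a direct, unoptimized MDCT and its inverse. This is a sketch of the textbook transform with a sine window and 50 % overlap, not the Vorbis implementation: Vorbis uses its own window shape and switches block sizes, and the noise floor, residue and codebook stages are not shown. The function names are hypothetical.

```python
import math


def sine_window(n_half):
    """Sine window of length 2 * n_half (satisfies the Princen-Bradley condition)."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def mdct(block):
    """Map 2N time samples to N frequency coefficients."""
    n_half = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Map N coefficients back to 2N time samples that still contain aliasing."""
    n_half = len(coeffs)
    return [
        (2.0 / n_half)
        * sum(c * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
              for k, c in enumerate(coeffs))
        for n in range(2 * n_half)
    ]


def analysis_synthesis(signal, n_half):
    """Window, transform, inverse transform, window again and overlap-add.

    len(signal) must be a multiple of n_half, and n_half should be even.
    """
    window = sine_window(n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        block = [padded[start + n] * window[n] for n in range(2 * n_half)]
        coeffs = mdct(block)  # a codec would quantize and code these values
        rebuilt = imdct(coeffs)
        for n in range(2 * n_half):
            out[start + n] += rebuilt[n] * window[n]
    return out[n_half:-n_half]


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
restored = analysis_synthesis(signal, 8)
print(max(abs(a - b) for a, b in zip(signal, restored)))
```

Each block of 2N samples yields only N coefficients, so overlapping blocks do not increase the amount of data, and the aliasing introduced in each block cancels when neighbouring blocks are added. Without quantization the round trip reproduces the input up to rounding error; the loss in a real codec comes from the quantization stage.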

    3.9 Windows Media

    Windows Media Audio (WMA) is a proprietary compressed audio file format used by Microsoft. It has a large user base through Windows. It was initially a competitor to the mp3 format, but with the introduction of Apple's iTunes Music Store it has positioned itself as a competitor to the AAC format used by Apple. It is part of the Windows Media framework (Microsoft Corporation, 2004). An initial reason for the development of WMA might have been that mp3 technology is patented and would have had to be licensed for inclusion in the Microsoft Windows operating system from Thomson, which controls licensing of the mp3 patents in many countries, including the United States of America. WMA includes multichannel coding.

    With the publishing of Windows Media Audio 9, the codec was updated to WMA Pro, and a new lossless codec has been introduced to accompany the existing lossy codec. It is considered to reach close to AAC in quality. Support for variable bit rates has also been introduced. WMA Pro has not been reverse engineered yet.

    Microsoft claims that the Windows Media Audio (WMA) file format offers higher audio quality at smaller file sizes. The format is starting to gain more acceptance as it comes bundled as the standard audio format in Windows 98/2000/XP. Microsoft might be able to challenge the dominance of mp3 or at the very least offer a second popular audio format choice.

    3.10 Real Media

    RealAudio is a proprietary audio codec developed by RealNetworks. It is especially designed for low bandwidths, and it can be used as a streaming audio format. As a matter of fact, it was one of the first streamed audio software products in the world. For high bit rates, Real Media uses AAC. Many radio stations use RealAudio to stream their programming over the Internet in real time. The first version of RealAudio was released in 1995. The current version of the codec, RealAudio 10, was published in 2004. It includes multichannel coding (RealNetworks Incorporated, 2004).

    4 DEVELOPMENT TRENDS AND COMPARISON OF CODECS FOR THE APPLICATIONS OF TOMORROW

    The hardware of mobile platforms is developing rapidly, and thus

    new software and applications can be expected in the mobile devices of tomorrow. The

    capacity of the central processing unit grows, and more memory is already available at a

    lower price. This chapter attempts to extrapolate what will happen to the devices in the near

    future. The wireless communication channels are also developing,

    which leads to faster transmission to the mobile devices. Is there any need for compression as

    effective as that offered by HE-AAC, or will it become history as the limitations

    of today disappear?

    4.1 The Features of new Mobile Phones

    The main target of this hardware study is the mobile phone, as the number of mobile

    phones is much larger than the number of PDAs. The mobile phone is also a good low-end

    platform representative for mobile devices, as one of the main requirements for a

    phone is always its size and weight. According to Symbian Ltd. (Symbian, 2004), the

    leading manufacturer of 3G operating systems for mobile phones, the latest Symbian

    operating system, OS 8, is already used on Series 60 Platform 2.0 based 3G phones such as

    the Nokia 6630, which gives them wide support for audio codecs such as NB-AMR, WB-AMR,

    MP3, AAC and RealAudio. As the phone has 10 MB of internal dynamic memory and 64

    MB on an MMC, it offers fairly good possibilities for audio and video applications in the

    mid-price range. More expensive phones such as the Nokia 7710, in Series 90, with Symbian OS

    7, support the same audio codecs even in stereo. The Nokia 7710 has 90 MB of RAM


    and can handle an MMC of 512 MB, which makes it an excellent choice for audio and

    multimedia applications. The same applies to the Nokia Communicator 9500. Thus the

    hardware limits for audio have been eliminated in the mid- and high-end mobile phones.

    In the low-end mobile phones there are still some relevant hardware restrictions

    on the use of audio, mainly because of the low price requirement, but they are

    expected to disappear in the near future.

    4.2 The Telecommunication Features of the Mobile Networks

    The basic GPRS (General Packet Radio Service) network still used in many mobile

    phones supports communication speeds of 30-50 kbps. The EDGE (Enhanced Data rates

    for GSM Evolution) or EGPRS technology increases the speed for the end-user to rates

    of 120-150 kbps and even a bit higher. EGPRS is available in most mid-end and even

    some low-end phones, so it can be considered the standard today. EGPRS is, however,

    available only in urban and suburban areas today. The UMTS (Universal Mobile

    Telecommunications System) offers data speeds from 384 kbps (TDD Mode) to 2 Mbps

    (TDD Mode) (Compagnie Financière Alcatel, 2004), which removes some of the

    limitations.

    So far UMTS is available only in high-end mobile phones and only in urban areas.

    Finland will not be covered by UMTS networks in the near future, which means that

    EGPRS will still be the fastest alternative for a large group of phone users here. The

    speed of EGPRS is, however, sufficient for music streaming applications if the

    latest codecs are used.
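
    As a rough illustration of this claim, the following Python sketch checks whether a
    stream of a given bit rate fits within the lowest end-user rate quoted above for each
    network. The function name can_stream and the 75 % headroom factor are assumptions
    made for this example only; they do not come from the cited sources.

```python
# Nominal end-user rates in kbps (low, high), taken from the ranges quoted in this section.
NETWORK_KBPS = {
    "GPRS": (30, 50),
    "EGPRS": (120, 150),
    "UMTS": (384, 2000),
}


def can_stream(stream_kbps, network, headroom=0.75):
    """Return True if the stream fits within the network's lowest quoted rate.

    headroom is a hypothetical safety factor: only this fraction of the
    nominal rate is assumed to be usable for the audio payload.
    """
    low_kbps, _high_kbps = NETWORK_KBPS[network]
    return stream_kbps <= low_kbps * headroom


if __name__ == "__main__":
    # 48 kbps is the bit rate used in the listening tests discussed in section 4.3.
    for name in NETWORK_KBPS:
        print(name, can_stream(48, name))
```

    Under these assumptions a 48 kbps stream fits into EGPRS and UMTS but not into basic
    GPRS, while a 128 kbps stream would require UMTS.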

    4.3 Comparison of Mobile Audio Codecs

    There are many methods to compare the quality of audio streams. One method is to

    use an audience to judge the quality. A test used by the European Broadcasting Union

    (EBU) called MUSHRA is often used as a reference. MUSHRA stands for MUlti

    Stimulus test with Hidden Reference and Anchors and is an advanced testing method

    developed and proposed by the EBU Project Group B/AIM. The method has been

    submitted to ITU for standardization.

    MUSHRA (Stoll and Kozamernik, 2000) is a subjective test where listeners in

    different EBU countries compare different types of audio to a reference signal and

    grade them on a scale from 0 to 100, where the interval 81-100 is considered

    excellent, 61-80 is considered good, 41-60 is considered fair, 21-40 is considered

    poor and 0-20 is considered bad. Different types of music such as classical, folk, jazz

    and pop music are tested. Broadcasting programs, both in a studio and a live

    environment, with female and male voices, are also tested.
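
    The grading scale can be written down as a small lookup. The Python sketch below
    maps a score to its verbal grade; the function name mushra_grade is hypothetical, and
    the handling of non-integer scores (for example 80.5 counting as excellent) is an
    assumption, since the text only gives integer intervals.

```python
def mushra_grade(score):
    """Map a MUSHRA score (0-100) to the verbal grade of the scale described above."""
    if not 0 <= score <= 100:
        raise ValueError("MUSHRA scores range from 0 to 100")
    if score > 80:
        return "excellent"   # 81-100
    if score > 60:
        return "good"        # 61-80
    if score > 40:
        return "fair"        # 41-60
    if score > 20:
        return "poor"        # 21-40
    return "bad"             # 0-20


if __name__ == "__main__":
    for s in (15, 35, 55, 76, 83):
        print(s, mushra_grade(s))
```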

    According to these listening tests, performed by the EBU, only a small difference can be

    heard between stereo CD quality and HE-AAC compression at 48 kbps. The test results

    are described by Kozamernik (2003). This is also illustrated in figure 4, which shows

    that aacPlus, also called HE-AAC, gets the highest MUSHRA index of 80, compared to


    mp3PRO, which gets the index 76. At 48 kbps the better-known RealMedia

    Real 8, mp3 and MS Windows Media 8 codecs get much lower ratings. EBU has not

    reported MUSHRA testing of the AMR-WB+ codec yet.

    Figure 4. European Broadcasting Union MUSHRA testing at 48 kbps stereo (Coding

    Technologies, 2004).

    The 3rd Generation Partnership Project (3GPP) is a collaboration agreement that was

    established in December 1998. 3GPP has conducted a standardization process for Packet

    Switched Streaming (PSS) and Multimedia Messaging Services (MMS). Two bit-rate ranges

    have been defined:

    1. low-rate range up to 24 kbps, where the candidates are AMR-WB+, HE-AAC (aacPlus) and Enhanced AAC+

    2. high-rate range, with rates higher than 24 kbps, where the candidates are HE-AAC (aacPlus) and Enhanced AAC+

    The comparison tests that 3GPP conducted for the selection of an audio coding standard

    show the following quality scalability for AMR-WB+ in a MUSHRA test. Figure 5

    shows that the MUSHRA score for AMR-WB+ at 48 kbps is 83, which exceeds the

    EBU figure for aacPlus.


    Figure 5. Quality scalability of AMR-WB+ based on a MUSHRA test (Bruhn, 2004).

    The comparison tests between EAAC+ and AMR-WB+ that 3GPP conducted for the

    selection of low-rate range audio coding showed that AMR-WB+ is a slightly better

    codec for stereo at rates lower than 24 kbps, which can be seen in figure 6. Both codecs,

    however, represent cutting-edge coding technology, giving the highest quality possible for mobile

    devices today.

    Figure 6. Comparison of AMR-WB+ and EAAC+ by 3GPP (Mäkinen, J. et al., 2004).


    4.4 Support for latest Codecs in Mobile Phones

    Codecs such as AAC and AMR-WB are already supported in mid- and high-end

    mobile phones, so they can be used if the target consumer is in the mid- or high-end

    category, as office mobile phone users generally are. The latest codecs using SBR,

    however, are not yet supported by mobile phones. This means that HE-AAC (aacPlus),

    EAAC+ and AMR-WB+ cannot yet be used in mobile applications. It will, however, not

    take too long before these codecs are also supported, as they are approved by the 3GPP

    and the hardware manufacturers have already implemented them in their products.

    Nokia also signed an aacPlus license agreement in July 2004

    (www.3G.co.uk, 2004), which indicates that aacPlus will be available on Nokia mobile

    phones soon.

    Open codecs such as Ogg Vorbis do not seem to be as successful on the commercial

    mobile market. They are generally not supported as a standard feature, but users

    that are interested in them can install the codec and a player. There is an open source

    player called OggPlay by Leif H. Wilden (2004) for the Symbian OS. This player

    currently supports ogg-, mp3- and aac-files on Series 60 phones having Symbian OS 7

    or later.

    Windows Media has not yet achieved the same position in the mobile phone

    market as it has in the PC market. No phones other than Microsoft's own brands include

    a Windows Media Player. Real Media players are, however, available for most mobile

    phones, and Real Media has thus established a special position on the mobile market.

    4.5 Existing and upcoming applications for high-quality audio in mobile devices

    Applications for downloading music content to the mobile phone already exist.

    There are at least two commercial players (MP3go and UltraMP3) that support the

    playing of mp3-based music. These players also support the creation and use of play

    lists. The main disadvantage at the moment is the memory requirement (3-5 MB/song) for

    high quality stereo mp3-music. If HE-AAC or AMR-WB could be used, the size of a

    song would be below 1 MB. This would allow low-end mobile phones to store more

    music than today. The high-end mobile phones already have enough memory available

    on MMCs. The downloading time for a song in the EGPRS-network would decrease

    from five minutes to one. Normally songs are not downloaded over the network but

    directly from a PC through cable, Bluetooth or IR.
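
    The size and download figures above follow from simple arithmetic. The Python
    sketch below reproduces them under stated assumptions: a four-minute song, 128 kbps
    for high quality stereo mp3, 32 kbps for HE-AAC, and an EGPRS link at the lower end
    of the range quoted in section 4.2. None of these particular values is given in the
    text, the function names are hypothetical, and protocol overhead is ignored.

```python
def song_size_mb(bitrate_kbps, duration_s):
    """Size of an encoded song in megabytes (1 MB = 10**6 bytes)."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1e6


def download_time_s(size_mb, link_kbps):
    """Transfer time in seconds over a link of link_kbps, ignoring overhead."""
    return size_mb * 8e6 / (link_kbps * 1000)


DURATION_S = 240   # assumed song length: 4 minutes
LINK_KBPS = 120    # assumed EGPRS rate, lower end of the quoted range

if __name__ == "__main__":
    for codec, kbps in [("mp3", 128), ("HE-AAC", 32)]:   # assumed bit rates
        size = song_size_mb(kbps, DURATION_S)
        seconds = download_time_s(size, LINK_KBPS)
        print(f"{codec}: {size:.2f} MB, {seconds:.0f} s")
```

    With these assumptions the mp3 file is a little under 4 MB and takes somewhat more
    than four minutes to transfer, while the HE-AAC file stays just below 1 MB and takes
    about one minute, which is consistent with the estimates given above.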

    The UK mobile phone company MMO2 (mm02, 2004) is launching a service for

    downloading music to mobile phones in November. It uses a special music player called

    O2 Digital Music Player. The music files will be encoded in the MPEG-4 aacPlus

    format and should be about a megabyte in size, MMO2 says. One song would take

    roughly 90 seconds to download across a GPRS connection. The copy-protection

    technology will be provided by the Swiss company Secure Digital Container (SDC).


    Streaming applications for mobile phones already exist. Both music and video can be

    enjoyed from the mobile phone. The Finnish Broadcasting Company YLE, as an

    example, sends the news as 20-50 kbps streams for the GPRS-network. This speed was

    selected to make the news available anywhere in Finland. The most common format

    today is RealMedia but other formats will certainly be available in the near future. One

    of the problems today is the quality of the content due to the low bandwidth in the

    GPRS network. In the near future the quality of the content will be much better due to

    both increased bandwidth and more efficient codecs, which will greatly improve

    enjoyment.

    The American market research centre In-Stat (In-Stat, 2004) expects the American

    streaming video market to start to grow in the next two years, but it will not reach

    15 % of total wireless revenues until 2009, which is not very encouraging. Another study

    shows that 11 % of mobile phone users today are very or extremely interested in

    buying music over the mobile phone network.

    5 CONCLUSIONS

    The development of audio codecs for mobile phones has been very rapid in the past

    few years. Enhancements of codecs have been released yearly, and there always seem

    to be new technologies that can be applied to the compression procedure. Technologies

    that have changed the world of encoding include MP3, AAC and SBR. At this

    moment the best codecs for audio seem to be AMR-WB+ and HE-AAC (aacPlus),

    depending on what kind of audio material is encoded. This is most likely not the last

    step in codec development. New codecs will probably continue to be introduced yearly.

    Figure 7. Applications for Mobile Audio (Mäkinen, J. et al., 2004).


    The need for more efficient codecs will probably gradually decrease as

    telecommunication speeds continue to grow even beyond 3G networks. According

    to the telecommunication company Alcatel's White Paper on Mobile Network Evolution,

    the expected communication speed for mobile phones will approach 1 Gbps in

    2010-2015 (see figure 8).

    Figure 8. Evolution of mobile networks from 2G to B3G (Hurel, J-L et al., 2004).

    On the other hand, new applications utilizing these possibilities will certainly be

    introduced on the market. These products also act as a driver for the technology, as they

    demand more computing power, more memory, better graphics and better audio, which

    in turn demand more efficient telecommunication possibilities. As long as there is a

    need for those applications and a willingness to pay the price of using them, the

    development process is secured. Severe limiting factors that could stop the

    development of mobile audio applications seem to be hard to find.


    REFERENCES

    ARM Developers Guide 2004-2005. Convergence Promotions. Developers Guide

    (Online) 2004. [Referenced 25.11.2004]. Available:

    http://arm.convergencepromotions.com/catalog/m_home.htm

    Bruhn, S. 2004. Bridging the gap between speech and audio coding - AMR-WB+ - The

    codec for mobile audio. Ericsson Research, Multimedia Technologies. Available:

    http://www.s3.kth.se/radio/COURSES/S3_SEMINAR_2E1380_2004/presentations/Er

    icssonAudio-040506.pdf

    Coding Technologies. aacPlus. Products and Technologies. Promotion Page (Online)

    2004. [Referenced 25.11.2004]. Available:

    http://www.codingtechnologies.com/products/aacPlus.htm

    Compagnie Financière Alcatel. Mobile Networks. Solutions. Technology Overview

    Page (online) 2004. [Referenced 25.11.2004]. Available:

    http://www.alcatel.com/mobilenetworks/mobileinternet/

    F. Henn, R. Böhm, S. Meltzer, Th. Ziegler, 2003, SPECTRAL BAND REPLICATION

    (SBR) TECHNOLOGY AND ITS APPLICATION IN BROADCASTING

    http://www.broadcastpapers.com/radio/ibc2003CodingSBR04.htm

    Fraunhofer Institute for Integrated Circuits IIS. Audio & Multimedia. MPEG Audio

    Layer-3. Technology Report (online) 2004. [Referenced 25.11.2004]. Available:

    http://www.iis.fraunhofer.de/amm/techinf/layer3/

    Frerichs, D. 2003. New MPEG-4 High-efficiency AAC Audio: Enabling new

    applications. Coding Technologies. Available: http://www.telos-

    systems.com/techtalk/hosted/m4-in-30100%20(M4IF_HE_AAC_paper).pdf

    Hurel, J-L. Lerouge, C. Evci, C. & Gui L. 2004. Mobile Network Evolution: From 3G

    Onwards. Technical White Paper. Compagnie Financière Alcatel. Available:

    http://www.alcatel.com/doctypes/articlepaperlibrary/pdf/ATR2003Q4/T0312-

    Mobile-Evolution-EN.pdf

    In-Stat, American Market Research Centre. Mobile Consumer Data & Multimedia

    Services. Information Service (Online) 2004. [Referenced 25.11.2004]. Available:

    http://www.instat.com/catalog/Wcatalogue.asp?id=230


    Kozamernik, F. 2003. EBU subjective listening tests on low-bitrate audio codecs. EBU

    Listening Tests. Tech 3296. June 2003. Available:

    http://www.ebu.ch/CMSimages/en/tec_doc_t3296_tcm6-10497.pdf?display=EN

    mp3PRO Zone 2004. Coding Technologies. Developers Guide (Online) 2004.

    [Referenced 25.11.2004]. Available: http://www.mp3prozone.com/

    Microsoft Corporation. Windows Media Home. Technology Page (Online) 2004.

    [Referenced 25.11.2004]. Available:

    http://www.microsoft.com/windows/windowsmedia/default.aspx

    mm02. O2 Digital Music Player. Cellular Phone Operator. Promotion Page (Online)

    2004 . [Referenced 25.11.2004]. Available: http://www.o2.co.uk/o2-digital-music-

    player.html

    Mäkinen, J. et al. 2004. AMR-WB+: A new audio coding standard for 3rd generation

    mobile audio services. Nokia Research Center. Finland, submitted to ICASSP 2005.

    Figures available: http://www.tml.hut.fi/Opinnot/T-

    111.550/Mobileaudioformats2004-10-26.pdf

    RealNetworks Incorporated. Real Player Page. Technology Page (Online) 2004.

    [Referenced 25.11.2004]. Available: http://www.real.com/player/?src=realaudio

    Stoll, G. and Kozamernik, F. 2000. EBU Listening Tests on Internet Audio Codecs.

    EBU Technical Review June 2000. Available: http://www.ebu.ch/trev_283-kozamernik.pdf

    Symbian Ltd. Symbian OS Phones. Technology Promotion Page (Online) 2004.

    [Referenced 25.11.2004]. Available: http://www.symbian.com/phones/index.html

    The 3rd Generation Partnership Project; Technical Specification Group Services and

    System Aspects; General Audio Codec audio processing functions; Enhanced

    aacPlus General Audio Codec; General Description (Release 6), (2004)

    http://www.3gpp.org/ftp/tsg_sa/TSG_SA/TSGS_24/Docs/PDF/SP-040428.pdf

    Vilermo, M. 2004. Audio Codecs. AES/RTI Audiopäivät 2004 Conference, Helsinki,

    25.-26.5.2004. Audio Engineering Society Finnish Section. Available:

    http://www.aes.fi/audiopaivat2004/vilermo.pdf


    VoiceAge Corporation Licencing Service. AMR-WB+ FAQs . Technologies. Frequently

    Asked Questions (2004). [Referenced 25.11.2004]. Available:

    http://www.voiceage.com/amrsite/tech_wbplus_faqs.php

    Wales, J. 2004. Wikipedia - The Free Encyclopedia. Wikipedia Foundation. Electronic

    Encyclopedia (Online) 2004. [Referenced 25.11.2004]. Available: http://en.wikipedia.org/

    Wilden, L. H. Ogg Vorbis Player for Symbian OS phones. Technology Page (Online)

    2004. [Referenced 25.11.2004]. Available: http://symbianoggplay.sourceforge.net/

    www.3G.co.uk. aacPlus 2.5 / 3G License with Nokia. News service for 3G. News

    Service Site (online) July 2004. [Referenced 25.11.2004]. Available:

    http://www.3g.co.uk/PR/July2004/8026.htm

    Xiph.Org Foundation. The Ogg Vorbis CODEC project. Ogg. Developers Page

    (Online) 2004. [Referenced 25.11.2004]. Available: http://www.xiph.org/ogg/vorbis/