
ISO/IEC MPEG-4 High-Definition Scalable Advanced Audio Coding*

RALF GEIGER,1 RONGSHAN YU,2** JÜRGEN HERRE,1 SUSANTO RAHARDJA,2
SANG-WOOK KIM,3 XIAO LIN,2 and MARKUS SCHMIDT1

1Fraunhofer IIS, Erlangen, Germany
2Institute for Infocomm Research, Singapore
3Samsung Electronics, Suwon, Korea

Recently the MPEG Audio standardization group has successfully concluded the standardization process on technology for lossless coding of audio signals. A summary of the scalable lossless coding (SLS) technology, one of the results of this standardization work, is given. MPEG-4 scalable lossless coding provides a fine-grain scalable lossless extension of the well-known MPEG-4 AAC perceptual audio coder, up to fully lossless reconstruction at word lengths and sampling rates typically used for high-resolution audio. The underlying innovative technology is described in detail, and its performance is characterized for lossless and near-lossless representation, both in conjunction with an AAC coder and as a stand-alone compression engine. A number of application scenarios for the new technology are discussed.

0 INTRODUCTION

Perceptual coding of high-quality audio signals has experienced a tremendous evolution over the past two decades, both in terms of research progress and in worldwide deployment in products. Examples of successful applications include portable music players, Internet audio, audio for digital media (such as VCD and DVD), and digital broadcasting. Several phases of international standardization have been conducted successfully [1]–[3]. Many recent research and standardization efforts focus on achieving good sound quality at even lower bit rates [4]–[9] to accommodate storage and transmission channels of limited capacity (such as terrestrial and satellite broadcasting and third-generation mobile telecommunication). Nevertheless, for other scenarios with higher transmission bandwidth available, there is a general trend toward providing the consumer with a media experience of extremely high fidelity [10], frequently associated with the terms “high definition” and “high resolution” and with dedicated media types such as DVD-Audio [11], SACD [12], HD-DVD [11], and Blu-ray Disc [13]. In the realm of audio, this is achieved by employing lossless formats with high resolution (word length) and/or high sampling rate.

There exist several proprietary lossless formats, the most prominent of which are MLP Lossless [14], [15], DTS-HD [16], and Apple Lossless [17]. Furthermore, several freeware coding systems have gained some prominence through the Internet, among them FLAC [18], Monkey’s Audio [19], and OptimFROG [20].

On the other end of the bit-rate scale, approaches for scalable perceptual audio coding have been developed in recent years. Within the International Telecommunication Union (ITU), scalability in wide-band audio coding has only recently been adopted [21]. Within MPEG-4 Audio [3], scalability was developed and adopted somewhat earlier [22]–[24]. This includes speech coding, where scalability is provided both for harmonic vector excitation coding (HVXC) [25], [26] and for code-excited linear prediction (CELP) [27], [28].

In this context the ISO/MPEG Audio standardization group decided to start a new work item to explore technology for lossless and near-lossless coding of audio signals by issuing a call for proposals for relevant technology in 2002 [29]. Three specifications emerged from this call as amendments to the MPEG-4 Audio standard. First, the standard on lossless coding of 1-bit oversampled signals [30] specifies the lossless compression of highly oversampled 1-bit sigma–delta-modulated audio signals as they are stored on the SACD medium under the name Direct Stream Digital (DSD) [31]. Second, the audio lossless coding (ALS) specification [32] describes technology for the lossless coding of PCM-coded signals at sampling rates up to 192 kHz as well as of floating-point audio.

*Manuscript received 2006 March 23; revised 2006 October 24 and November 27.

**Currently with Dolby Laboratories, San Francisco, CA.

PAPERS

J. Audio Eng. Soc., Vol. 55, No. 1/2, 2007 January/February 27

This paper provides an overview of the third descendant, the scalable lossless coding (SLS) specification [33], which extends traditional methods for perceptual audio coding toward lossless coding of high-resolution audio signals in a scalable way. Specifically, it allows scaling up from a perceptually coded representation (MPEG-4 advanced audio coding, AAC) to a fully lossless representation with high definition, including a wide range of intermediate near-lossless representations. The paper starts with an explanation of the general concept, discusses the underlying novel technology components, and characterizes the codec in terms of compression performance and complexity. Finally, a number of application scenarios are briefly outlined.

1 CONCEPT AND TECHNOLOGY OVERVIEW

This section provides an overview of the principle and the structure of the scalable lossless coding technology and its combination with the AAC perceptual audio coder. This combination will be referred to as high-definition advanced audio coding (HD-AAC) in the following.

1.1 General System Structure

Fig. 1 shows a very high-level view of the structure of the HD-AAC encoder. The input signal is first coded by an AAC (“core layer”) encoder. The SLS algorithm uses this output to enhance the system’s performance toward lossless coding, resulting in an enhancement layer. The two layers of information are subsequently multiplexed into one high-definition bit stream. Decoding of an HD-AAC bit stream is illustrated in Fig. 2. From the HD-AAC bit stream, decoders are able either to decode the perceptually coded AAC part only, or to use the additional SLS information to produce losslessly/near-losslessly coded audio for high-definition applications.

1.2 AAC Background

Since the MPEG-4 SLS coder has been designed to operate as an enhancement to MPEG-4 AAC [3], [34], its structure is closely related to that of the underlying AAC core coder [35]. This section sketches the architectural features of MPEG-4 AAC as they are relevant to the SLS enhancement technology.

The underlying AAC codec provides efficient perceptual audio coding with high quality and achieves broadcast quality at a bit rate of about 64 kbps per channel [36]. Fig. 3 gives a very concise view of the AAC encoder’s structure. The audio signal is processed in a blockwise spectral representation using the modified discrete cosine transform (MDCT) [37]. The resulting 1024 spectral values per frame are quantized and coded with the accuracy demanded by the perceptual model. This is done to minimize the perceptibility of the introduced quantization distortion by exploiting masking effects. Several neighboring spectral values are grouped into so-called scale factor bands, which share the same scale factor for quantization. Prior to the quantization/coding (Q/C) tool, a number of processing tools operate on the spectral coefficients in order to improve coding performance in certain situations. The most important tools are the following.

• The temporal noise shaping (TNS) tool [38] carries out predictive filtering across frequency in order to achieve a temporal shaping of the quantization noise according to the signal envelope, and in this way optimizes temporal masking of the quantization distortion.

• The M/S stereo coding tool [39] provides sum/difference coding of channel pairs, exploits interchannel redundancy for near-monophonic signals, and avoids binaural unmasking.
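To make the shared scale factor concrete, the following Python sketch quantizes one scale factor band with the power-law quantizer found in AAC implementations. It assumes the inverse-quantization rule sign(q)·|q|^(4/3)·2^((sf − 100)/4) and an encoder rounding offset of 0.4054, which is a common implementation choice rather than a normative value; the function names are hypothetical and no bit-exact behavior is claimed.

```python
import math
from typing import List

SF_OFFSET = 100   # scale factor offset in the AAC inverse-quantization rule
MAGIC = 0.4054    # rounding offset commonly used by AAC encoders (assumption)

def quantize_band(x: List[float], sf: int) -> List[int]:
    """Quantize every value of one scale factor band with the shared scale factor sf."""
    gain = 2.0 ** (-0.25 * (sf - SF_OFFSET))
    return [int(math.copysign(math.floor((abs(v) * gain) ** 0.75 + MAGIC), v)) for v in x]

def dequantize_band(q: List[int], sf: int) -> List[float]:
    """Reconstruct: sign(q) * |q|^(4/3) * 2^((sf - 100) / 4)."""
    gain = 2.0 ** (0.25 * (sf - SF_OFFSET))
    return [math.copysign(abs(v) ** (4.0 / 3.0) * gain, v) for v in q]

band = [100.0, -37.5, 0.0, 3.2]
print(quantize_band(band, 100))   # [32, -15, 0, 2]
```

Raising sf by 4 doubles the effective step size for all values of the band at once, which is how the perceptual model controls the noise level per scale factor band.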

1.3 HD-AAC/SLS Enhancement

The SLS scalable lossless enhancement works on top of this AAC architecture. The structures of an HD-AAC encoder and decoder are shown in Figs. 4 and 5.

In the encoder the audio signal is converted into a spectral representation using the integer modified discrete cosine transform (IntMDCT) [40], [41]. This transform is an invertible integer approximation of the MDCT and is well suited for lossless coding in the frequency domain. Other AAC coding tools, such as mid/side coding or temporal noise shaping, are also considered and performed on the IntMDCT spectral coefficients in an invertible integer fashion, thus maintaining the similarity between the spectral values used in the AAC coder and in the lossless enhancement.

Fig. 1. HD-AAC encoder structure.

Fig. 2. HD-AAC decoder structure.

Fig. 3. Structure of AAC encoder (simplified).


The link between the perceptual core and the scalable lossless enhancement layer of the coder [42] is provided by an error-mapping process. The error-mapping process removes the information that has already been coded in the perceptual (AAC) path from the IntMDCT spectral coefficients, such that only the resulting (IntMDCT) residuals are coded in the enhancement encoder; in this way the coding efficiency benefits from the underlying AAC layer. The error-mapping process also preserves the probability distribution skew of the original IntMDCT coefficients and thus permits an efficient encoding of the residual by means of two bit-plane coding processes, namely, bit-plane Golomb coding (BPGC) [43] and context-based arithmetic coding (CBAC) [44], and a so-called low-energy-mode encoder, all of which will be described later in further detail.

By using the bit-plane coding process, the lossless enhancement is performed in a fine-grain scalable way. A lossless reconstruction is obtained if all the bit planes of the residual are coded, transmitted, and decoded completely. If only some of the bit planes are decoded, a lossy reconstruction of the signal is obtained, with a quality between that of the AAC layer and that of the lossless reconstruction. In order to achieve optimal perceptual quality at intermediate bit rates, the bit-plane coding starts from the most significant bit (MSB) for all scale factor bands and progresses toward the least significant bit (LSB) for all bands (see Fig. 6). In this way the bit-plane coding process preserves the overall spectral shape of the quantization noise as it results from the noise-shaping process of the AAC perceptual coder, and thus takes advantage of the AAC perceptual model.
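A minimal Python sketch of this scan order, under one reading of Fig. 6: each band b starts at its own MSB plane M_b, and pass p visits plane M_b − 1 − p in every band. Stopping after P passes therefore leaves an error below 2^(M_b − P) in every band, so all bands lose precision together and the relative noise shape is kept. Here M_b is derived from the residual itself (a real decoder would obtain it from side information or the AAC layer), the signs are assumed known as with implicit signaling, and all function names are hypothetical.

```python
from typing import List, Tuple

def band_msb(band: List[int]) -> int:
    """Number of bit planes M_b needed for the largest magnitude in the band."""
    return max((abs(v) for v in band), default=0).bit_length()

def scan(bands: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Emit (pass, band, index, bit); pass p codes plane M_b - 1 - p of every band."""
    msbs = [band_msb(b) for b in bands]
    symbols = []
    for p in range(max(msbs, default=0)):
        for b, band in enumerate(bands):
            j = msbs[b] - 1 - p
            if j < 0:                      # this band has no planes left
                continue
            for k, v in enumerate(band):
                symbols.append((p, b, k, (abs(v) >> j) & 1))
    return symbols

def reconstruct(msbs: List[int], signs: List[List[int]],
                symbols: List[Tuple[int, int, int, int]], passes: int) -> List[List[int]]:
    """Rebuild values from the first `passes` passes; undecoded planes count as 0."""
    mags = [[0] * len(s) for s in signs]
    for p, b, k, bit in symbols:
        if p < passes:
            mags[b][k] |= bit << (msbs[b] - 1 - p)
    return [[sk * m for sk, m in zip(sb, mb)] for sb, mb in zip(signs, mags)]

bands = [[37, -5, 12], [3, -1, 0, 2], [200, -150]]
msbs = [band_msb(b) for b in bands]
signs = [[1 if v >= 0 else -1 for v in b] for b in bands]
symbols = scan(bands)
print(reconstruct(msbs, signs, symbols, passes=2))   # [[32, 0, 0], [3, -1, 0, 2], [192, -128]]
```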

The following sections describe the principles underlying the design of SLS in greater detail, focusing on its IntMDCT filter bank, error-mapping strategy, and entropy coding parts.

2 FILTER BANK

2.1 IntMDCT

The IntMDCT, as introduced in [40], is an invertible integer approximation of the MDCT, which is obtained by utilizing the “lifting scheme” [45] or “ladder network” [46]. It enables efficient lossless coding of audio signals [40] by means of entropy coding of integer spectral coefficients. Furthermore, as the IntMDCT closely approximates the behavior of the MDCT, it allows the strategies of perceptual and lossless audio coding in the frequency domain to be combined in a common framework [42].

Fig. 7 illustrates the close relationship between IntMDCT and MDCT for a small audio segment by displaying the respective magnitude spectra of both filter banks. The difference between MDCT and IntMDCT values is visible as a small noise floor that is typically much lower than the error introduced by perceptual coding. Thus the IntMDCT allows the quantization error of an MDCT-based perceptual codec to be coded efficiently in the frequency domain.

Fig. 4. Structure of HD-AAC encoder.

Fig. 5. Structure of HD-AAC decoder.


The following section provides some detail on the derivation of the IntMDCT from the MDCT.

2.2 Decomposition of MDCT

The MDCT, defined by

$$X(m) = \sqrt{\frac{2}{N}} \sum_{k=0}^{2N-1} w(k)\,x(k)\cos\frac{\pi(2k+1+N)(2m+1)}{4N}, \qquad m = 0, \ldots, N-1 \qquad (1)$$

with the time-domain input x(k) and the windowing function w(k), can be decomposed into two blocks, namely, windowing and time-domain aliasing (TDA), and a discrete cosine transform of type IV (DCT-IV). This is illustrated in Fig. 8 for both the forward and the inverse MDCT.
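The decomposition can be checked numerically. The NumPy sketch below evaluates Eq. (1) term by term and compares it with an orthonormal DCT-IV applied to the windowed and folded (TDA) input. The folding indices are derived here for the 2N-sample indexing of Eq. (1) and are an assumption; the index conventions of the windowing/TDA stage in Eq. (2) and in the standard may differ. The sine window is only an example, since the identity holds for any window.

```python
import numpy as np

def sine_window(two_n: int) -> np.ndarray:
    k = np.arange(two_n)
    return np.sin(np.pi * (k + 0.5) / two_n)

def mdct_direct(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Eq. (1) evaluated directly, O(N^2)."""
    n = len(x) // 2
    k = np.arange(2 * n)
    m = np.arange(n)[:, None]
    basis = np.cos(np.pi * (2 * k + 1 + n) * (2 * m + 1) / (4 * n))
    return np.sqrt(2.0 / n) * (basis @ (w * x))

def dct4(u: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-IV: sqrt(2/N) * sum_n u(n) cos(pi (2n+1)(2m+1) / (4N))."""
    n = len(u)
    i = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(2 * i + 1, 2 * i + 1) / (4 * n))
    return c @ u

def window_tda(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Window 2N samples and fold them into N (time-domain aliasing); N must be even."""
    z = w * x
    n = len(z) // 2
    h = n // 2
    u = np.empty(n)
    i = np.arange(h)
    u[:h] = -z[3 * h - 1 - i] - z[3 * h + i]
    j = np.arange(h, n)
    u[h:] = z[j - h] - z[3 * h - 1 - j]
    return u

x = np.random.default_rng(0).standard_normal(64)   # 2N = 64 input samples
w = sine_window(64)
print(np.allclose(mdct_direct(x, w), dct4(window_tda(x, w))))   # True
```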

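In the forward IntMDCT, the windowing/TDA block is calculated by 3N/2 so-called lifting steps:

$$\begin{pmatrix} x(k) \\ x(N-1-k) \end{pmatrix} \leftarrow \begin{pmatrix} 1 & -\frac{w(N-1-k)-1}{w(k)} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -w(k) & 1 \end{pmatrix} \begin{pmatrix} 1 & -\frac{w(N-1-k)-1}{w(k)} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x(k) \\ x(N-1-k) \end{pmatrix}, \qquad k = 0, \ldots, \frac{N}{2}-1. \qquad (2)$$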

After each lifting step a rounding operation is applied to stay in the integer domain. Every lifting step can be inverted exactly by subtracting the same rounded value that was added.
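A minimal Python sketch of Eq. (2) on one sample pair illustrates why the rounding does not break invertibility: the inverse recomputes exactly the same rounded values and subtracts them in reverse order. It assumes a window with w(k)² + w(N−1−k)² = 1, so that the three lifting steps form a rotation, and uses Python's built-in round(); the rounding rule actually specified for SLS may differ, and the function names are hypothetical.

```python
import math
from typing import Tuple

def tda_lift(a: int, b: int, wk: float, wm: float) -> Tuple[int, int]:
    """Eq. (2) on one pair (a, b) = (x(k), x(N-1-k)), with wk = w(k), wm = w(N-1-k)."""
    p = (1.0 - wm) / wk          # upper coefficient, -(w(N-1-k) - 1) / w(k)
    a = a + round(p * b)         # rightmost factor [[1, p], [0, 1]]
    b = b + round(-wk * a)       # middle factor    [[1, 0], [-w(k), 1]]
    a = a + round(p * b)         # leftmost factor  [[1, p], [0, 1]]
    # result approximates (wm*a0 + wk*b0, -wk*a0 + wm*b0) for the input pair (a0, b0)
    return a, b

def tda_lift_inverse(a: int, b: int, wk: float, wm: float) -> Tuple[int, int]:
    """Undo the three steps in reverse order, subtracting the same rounded values."""
    p = (1.0 - wm) / wk
    a = a - round(p * b)
    b = b - round(-wk * a)
    a = a - round(p * b)
    return a, b

N, k = 16, 3
ang = math.pi * (k + 0.5) / (2 * N)            # example window pair (sine window)
wk, wm = math.sin(ang), math.cos(ang)
y = tda_lift(1000, -250, wk, wm)
print(tda_lift_inverse(*y, wk, wm) == (1000, -250))   # True
```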

2.3 Integer DCT-IV

For the IntMDCT, the DCT-IV is calculated in an invertible integer fashion, called the integer DCT-IV. The multidimensional lifting (MDL) scheme [41], [47] is applied in order to reduce the number of rounding operations required in the invertible integer approximation as much as possible, and in this way to minimize the approximation error noise floor (see Fig. 7). The following block matrix decomposition for an invertible matrix T and the identity matrix I shows the basic principle of the MDL scheme:

$$\begin{pmatrix} T & 0 \\ 0 & T^{-1} \end{pmatrix} = \begin{pmatrix} -I & 0 \\ T^{-1} & I \end{pmatrix} \begin{pmatrix} I & -T \\ 0 & I \end{pmatrix} \begin{pmatrix} 0 & I \\ I & T^{-1} \end{pmatrix}. \qquad (3)$$

The three blocks in this decomposition are the so-called MDL steps. Similar to conventional lifting steps, they can be turned into invertible integer mappings by rounding the floating-point values after they have been processed by T or $T^{-1}$, and they can be inverted by subtracting the values that have been added.
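For the DCT-IV, T is orthonormal and symmetric, so $T^{-1} = T$ and Eq. (3) yields two integer DCT-IVs at once, which matches the stereo use described below. The NumPy sketch implements the three MDL steps, applied right to left, with rounding to the nearest integer (np.rint), together with their exact inverse. The step order and rounding rule are illustrative assumptions, not the normative SLS procedure, and the function names are hypothetical.

```python
import numpy as np

def dct4_matrix(n: int) -> np.ndarray:
    """Orthonormal, symmetric DCT-IV matrix, so T @ T = I and T^-1 = T."""
    i = np.arange(n)
    return np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(2 * i + 1, 2 * i + 1) / (4 * n))

def rnd(v: np.ndarray) -> np.ndarray:
    return np.rint(v).astype(np.int64)     # round to nearest integer

def int_dct4_pair(x1: np.ndarray, x2: np.ndarray, t: np.ndarray):
    """Eq. (3) applied right to left to integer vectors; returns approx (T x1, T x2)."""
    y1, y2 = x2, x1 + rnd(t @ x2)          # [[0, I], [I, T^-1]]
    y1 = y1 - rnd(t @ y2)                  # [[I, -T], [0, I]]
    u1, u2 = -y1, y2 + rnd(t @ y1)         # [[-I, 0], [T^-1, I]]
    return u1, u2

def int_dct4_pair_inverse(u1: np.ndarray, u2: np.ndarray, t: np.ndarray):
    """Exact inverse: the same rounded terms are recomputed and subtracted."""
    y1 = -u1
    y2 = u2 - rnd(t @ y1)
    y1 = y1 + rnd(t @ y2)
    x2 = y1
    x1 = y2 - rnd(t @ x2)
    return x1, x2

rng = np.random.default_rng(1)
t = dct4_matrix(16)
x1, x2 = rng.integers(-1000, 1001, 16), rng.integers(-1000, 1001, 16)
u1, u2 = int_dct4_pair(x1, x2, t)
r1, r2 = int_dct4_pair_inverse(u1, u2, t)
print(np.array_equal(r1, x1) and np.array_equal(r2, x2))           # True (lossless)
print(np.max(np.abs(u1 - t @ x1)), np.max(np.abs(u2 - t @ x2)))    # small approximation error
```

Only three rounding stages of N values each are needed for the two length-N transforms, which is the source of the rounding count discussed next.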

Fig. 6. Bit-plane scan process in SLS.

Fig. 7. IntMDCT and MDCT magnitude spectra.

Fig. 8. MDCT and inverse MDCT by windowing/TDA and DCT-IV.


When coding stereo signals, this decomposition is used to obtain an integrated calculation of the M/S matrix and the integer DCT-IV for the left and right channels. The number of required rounding operations is 3N per channel pair, or 3N/2 per channel, which is the same number as for the windowing/TDA stage. As a whole, this so-called stereo IntMDCT requires only three rounding operations per sample, including M/S processing. For mono signals the same structure can be used, but it has to be extended by some additional lifting steps to obtain the integer DCT-IV of one block; see [47]. This mono IntMDCT requires four rounding operations per sample.
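A quick bookkeeping check of these counts, using only the numbers stated above (per channel, per block of N spectral values):

$$\underbrace{\tfrac{3N}{2}}_{\text{windowing/TDA}} + \underbrace{\tfrac{3N}{2}}_{\text{stereo integer DCT-IV}} = 3N \quad\Rightarrow\quad \frac{3N}{N} = 3 \ \text{rounding operations per sample}$$

For the mono IntMDCT, four rounding operations per sample correspond to 4N − 3N/2 = 5N/2 rounding operations per block in the mono integer DCT-IV, that is, N more than in the stereo case.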

2.4 Noise Shaping

The lossless coding efficiency of the IntMDCT is further improved by utilizing a noise-shaping technique introduced in [48]. In the lifting steps where time-domain signals are processed, the rounding operations include an error feedback mechanism to provide a spectral shaping of the approximation noise. This approximation noise affects the lossless coding efficiency mainly in the high-frequency region, where audio signals usually carry only a small amount of energy, especially at sampling rates of 96 kHz and above. Hence a low-pass characteristic of the approximation noise improves the lossless coding efficiency. A first-order noise-shaping filter is employed in the three stages of lifting steps in the windowing/TDA processing and in the first rounding stage of the integer DCT-IV processing. Fig. 9 compares the resulting approximation error between the IntMDCT values and the MDCT values rounded to integer, with the IntMDCT operating both with and without noise shaping.

3 ERROR MAPPING AND RESIDUAL CALCULATION

The objective of the error-mapping/residual calculation stage is to produce an integer enhancement signal that enables lossless reconstruction of the audio signal while consuming as few bits as possible. Rather than encoding all IntMDCT coefficients c[k] directly in the lossless enhancement layer, it is more efficient to make use of the information that has already been coded by the AAC layer. This is achieved by the error-mapping process, which produces a deterministic residual signal between the IntMDCT spectral values and their counterparts from the AAC layer.

In order to produce a residual signal e[k] with the smallest possible entropy, given the core AAC quantized value i[k], it is sufficient in most cases to use the minimum mean square error (MMSE) residual, obtained by subtracting from the IntMDCT coefficient c[k] its MMSE reconstruction given i[k], $\hat{c}[k] = E\{c[k] \mid i[k]\}$, that is,

$$e[k] = c[k] - \hat{c}[k]. \qquad (4)$$

Here E{·} denotes the expectation operation. However, in SLS a somewhat different approach is adopted, where the residual signal is given by

$$e[k] = \begin{cases} c[k] - \mathrm{thr}(i[k]), & i[k] \neq 0 \\ c[k], & i[k] = 0 \end{cases} \qquad (5)$$

where thr(i[k]) is the next quantized value closer to zero with respect to i[k]; it is calculated via table lookup and linear interpolation to ensure the deterministic behavior necessary for lossless coding. This error-mapping process is illustrated in Fig. 10, where two different cases are shown. In the first case the IntMDCT coefficients c[k] belong to a scale factor band that has been quantized and coded at the AAC encoder (significant band). For these coefficients the residual coefficients are obtained by subtracting the quantization thresholds from the corresponding c[k], resulting in a residual spectrum with reduced amplitude. In the other case c[k] belongs to a band that is not coded or has been quantized to zero in the AAC encoder (insignificant band). In this case the residual spectrum is simply the IntMDCT spectrum itself.

Fig. 9. Mean-squared approximation error of stereo IntMDCT (including M/S) with and without noise shaping.

Fig. 10. Illustration of error-mapping process.

This error-mapping process offers several advantages that permit better coding efficiency in the enhancement layer. First, as illustrated in Fig. 11, if the amplitude of c[k] is distributed exponentially, the amplitude of e[k] will likewise be distributed approximately exponentially. Thus it can be coded very efficiently by the BPGC/CBAC coding process described in the next section. Second, for a significant band the coefficients c[k] satisfy the following property:

$$|\mathrm{thr}(i[k])| \le |c[k]| < |\mathrm{thr}(i[k])| + \Delta(i[k]) \qquad (6)$$

where

$$\Delta(i[k]) = \mathrm{thr}(|i[k]| + 1) - \mathrm{thr}(|i[k]|) \qquad (7)$$

is the quantization step size of i[k]. Clearly, the magnitude of e[k] is bounded by Δ(i[k]), and its sign is identical to that of i[k] for i[k] ≠ 0. As a result, no additional data transmission is needed for conveying the sign information. This is referred to as “implicit signaling” in the SLS specification.
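The error mapping of Eqs. (5)–(7) and the implicit sign property can be illustrated with a toy quantizer in Python. The threshold function thr() below is hypothetical: it assumes an AAC-like power law with a fixed step size of 8 and a rounding offset of 0.4054, and rounds up to an integer, whereas SLS obtains thr() by table lookup and linear interpolation. The toy core quantizer is chosen to be consistent with thr(), so that Eq. (6) holds by construction.

```python
import math

STEP = 8.0             # hypothetical step size of one scale factor band
ROUND_OFFSET = 0.4054  # assumed AAC-like rounding offset

def thr(q: int) -> int:
    """Hypothetical integer threshold: lower edge of the interval of |q|, with the sign of q."""
    if q == 0:
        return 0
    mag = math.ceil(STEP * (abs(q) - ROUND_OFFSET) ** (4.0 / 3.0))
    return mag if q > 0 else -mag

def delta(q: int) -> int:
    """Eq. (7): width of the quantization interval of index q."""
    return thr(abs(q) + 1) - thr(abs(q))

def core_quantize(c: int) -> int:
    """Toy core quantizer consistent with thr(): largest |i| with thr(|i|) <= |c|."""
    q = 0
    while thr(q + 1) <= abs(c):
        q += 1
    return q if c >= 0 else -q

def error_map(c: int, i: int) -> int:
    """Eq. (5): residual passed to the enhancement layer."""
    return c - thr(i) if i != 0 else c

def inverse_error_map(e: int, i: int) -> int:
    """Decoder side: add the threshold back to recover c[k] exactly."""
    return e + thr(i) if i != 0 else e

for c in (-75, -3, 0, 42):
    i = core_quantize(c)
    print(c, i, error_map(c, i))   # residual sign follows i whenever i != 0
```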

This implicit signaling mechanism assumes that the input to the SLS layer and the AAC core quantization values are “well synchronized.” This may not be true in all cases, since AAC encoders are free to optimize the encoding process and thus may produce an i[k] that does not satisfy Eq. (6). Therefore SLS also includes an “explicit signaling” mechanism, which employs the MMSE error mapping defined in Eq. (4), where $\hat{c}[k]$ is approximated by the AAC inverse-quantized value. In this case all the side information necessary for decoding e[k] is signaled explicitly from the encoder to the decoder.

4 ENTROPY CODING

To achieve the best efficiency in compressing the enhancement layer information, several methods for entropy coding of the IntMDCT residual are employed.

4.1 Combined BPGC/CBAC Coding

The bit-plane Golomb code (BPGC) coding process used in SLS is basically a bit-plane coding scheme in which the bit-plane symbols are coded arithmetically with a structural frequency assignment rule. Considering an input data vector e = {e[0], . . . , e[N − 1]}, where N is the dimension of e, each element e[k] of e is first represented in binary format as

$$e[k] = (2s[k] - 1) \sum_{j=0}^{M-1} b[k,j]\, 2^{j}, \qquad k = 0, \ldots, N-1 \qquad (8)$$

which consists of a sign symbol

$$s[k] = \begin{cases} 1, & e[k] \ge 0 \\ 0, & e[k] < 0 \end{cases}, \qquad k = 0, \ldots, N-1 \qquad (9)$$

and bit-plane symbols b[k, j] ∈ {0, 1}, j = 0, . . . , M − 1, where the number of bit planes M (the MSB has index M − 1) satisfies $2^{M-1} \le \max_k |e[k]| < 2^{M}$, k = 0, . . . , N − 1. The bit-plane symbols are then scanned from MSB to LSB over all the elements in e and coded by using an arithmetic code with a structural frequency assignment $Q_j^L$ given by

$$Q_j^L = \begin{cases} \dfrac{1}{2}, & j < L \\[4pt] \dfrac{1}{1 + 2^{2^{\,j-L}}}, & j \ge L \end{cases} \qquad (10)$$

where the “lazy-plane” parameter L can be selected using the adaptation rule

$$L = \min\{L' \in \mathbb{Z} \mid 2^{L'+1} N \ge A\} \qquad (11)$$

where A is the sum of the absolute values of the elements of e.

Although this BPGC coding process delivers excellent compression performance for data that stem from an independent and identically distributed (iid) source with Laplacian distribution [43], it lacks the capability to exploit statistical dependencies between data samples that may exist in certain sources. These correlations can be captured very effectively by incorporating context modeling into the BPGC coding process, so that the frequency assignment for arithmetic coding of bit-plane symbols depends not only on the distance of the current bit plane from the lazy-plane parameter, as in the frequency assignment rule [Eq. (10)], but also on other elements that may affect the probability distribution of these bit-plane symbols.
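The BPGC quantities of Eqs. (8)–(11) can be sketched in a few lines of Python: the sign and bit-plane symbols, the lazy-plane parameter L, and the probability $Q_j^L$ assigned to a 1 in plane j. The arithmetic coder itself is omitted, and the lower clamp L_MIN is a hypothetical choice, since Eq. (11) has no lower bound when A is very small; the function names are illustrative only.

```python
from typing import List, Tuple

L_MIN = -8  # hypothetical floor: Eq. (11) is unbounded below when A is tiny

def lazy_plane(e: List[int]) -> int:
    """Eq. (11): smallest integer L with 2**(L + 1) * N >= A, where A = sum |e[k]|."""
    n, a = len(e), sum(abs(v) for v in e)
    L = 0
    while 2 ** (L + 1) * n < a:                # grow L until the condition holds
        L += 1
    while L > L_MIN and 2.0 ** L * n >= a:     # shrink while L - 1 still satisfies it
        L -= 1
    return L

def bit_planes(e: List[int]) -> Tuple[List[int], List[List[int]], int]:
    """Eqs. (8)-(9): sign symbols s[k], bit-plane symbols b[k][j], and M."""
    M = max((abs(v) for v in e), default=0).bit_length()   # 2**(M-1) <= max|e| < 2**M
    s = [1 if v >= 0 else 0 for v in e]
    b = [[(abs(v) >> j) & 1 for j in range(M)] for v in e]
    return s, b, M

def from_bit_planes(s: List[int], b: List[List[int]]) -> List[int]:
    """Eq. (8): rebuild e[k] = (2 s[k] - 1) * sum_j b[k][j] * 2**j."""
    return [(2 * sk - 1) * sum(bit << j for j, bit in enumerate(bk)) for sk, bk in zip(s, b)]

def q_bpgc(j: int, L: int) -> float:
    """Eq. (10): probability assigned to b[k, j] = 1."""
    if j < L:
        return 0.5
    t = 2.0 ** -(2.0 ** min(j - L, 64))   # 2**(-2**(j-L)); underflows gracefully to 0.0
    return t / (1.0 + t)                  # equals 1 / (1 + 2**(2**(j-L)))

e = [5, -3, 0, 12, -1, 2, 0, -7]
s, b, M = bit_planes(e)
L = lazy_plane(e)                         # A = 30, N = 8, so L = 1
print(M, L, [round(q_bpgc(j, L), 4) for j in range(M)])
```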

In the context of lossless coding of IntMDCT spectral data for audio, elements that possibly affect the distribution of the bit-plane symbols include the frequency locations of the IntMDCT spectral data, the magnitude of the adjacent spectral lines, and the status of the AAC core quantizer. In order to capture these correlations, several contexts are used in the context-based arithmetic code (CBAC) of SLS. The guiding principle in selecting these contexts is to find those that are most strongly correlated with the distribution of the bit-plane symbols.

In CBAC, three types of contexts are used, namely, the frequency band (FB) context, the distance to lazy (D2L) context, and the significant state (SS) context. The detailed context assignments are summarized in the following.

Fig. 11. Distribution of residual signal from error-mapping process.

Context 1: Frequency Band (FB). It was found in [44] that the probability distribution of the bit-plane symbols of the IntMDCT varies across frequency bands. Therefore in CBAC the IntMDCT spectral data are classified into three different FB contexts, namely, low band (0–4 kHz), midband (4–11 kHz), and high band (above 11 kHz).

Context 2: Distance to Lazy (D2L). The D2L context is defined as the distance of the current bit plane j to the BPGC lazy-plane parameter L, as defined in the following equation:

$$\mathrm{D2L} = \begin{cases} 3 - j + L, & j - L \ge -2 \\ 6, & \text{otherwise.} \end{cases} \qquad (12)$$

This is motivated by the BPGC frequency assignment rule, Eq. (10), which is based on the fact that the skew of the probability distribution of bit-plane symbols from a source with a (near) Laplacian distribution tends to decrease as the current bit plane approaches the lazy plane. To reduce the total number of D2L contexts, all the bit planes with j − L < −2 are grouped into one context, in which all the bit-plane symbols are coded with probability 0.5.

Context 3: Significant State (SS). The SS context attempts to capture, in one place, the factors that correlate with the distribution of the magnitude of the IntMDCT residual. These include the magnitude of the adjacent IntMDCT spectral lines and the quantization interval of the AAC core quantizer, if the coefficient has been quantized previously in the core encoder. Further detail on the SS context can be found in [49].
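As an illustration of the first two contexts, the Python sketch below maps an MDCT bin to an FB context and a bit plane to its D2L context per Eq. (12). The bin-to-frequency mapping (bin centers at (k + 0.5)·fs/(2N) for N bins) is an assumption; the standard may define the band edges differently, for example aligned to scale factor bands. The SS context is not modeled, and the function names are hypothetical.

```python
def fb_context(k: int, n_bins: int, fs: float) -> int:
    """Context 1 (assumed mapping): 0 = low (< 4 kHz), 1 = mid (4-11 kHz), 2 = high."""
    f = (k + 0.5) * fs / (2 * n_bins)     # center frequency of MDCT bin k
    if f < 4000.0:
        return 0
    return 1 if f < 11000.0 else 2

def d2l_context(j: int, L: int) -> int:
    """Context 2, Eq. (12): 3 - j + L if j - L >= -2, else the shared context 6."""
    return 3 - j + L if j - L >= -2 else 6

print([fb_context(k, 1024, 48000.0) for k in (0, 213, 600)])   # [0, 1, 2]
print([d2l_context(j, 5) for j in range(8)])
```

Planes more than two below the lazy plane all share context 6, matching the grouping described above, so the number of D2L contexts stays small regardless of L.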

4.2 Low-Energy-Mode Coding

The BPGC/CBAC coding process described earlier works well for sources with a (near) Laplacian distribution, which is usually the case for most audio signals [50]. However, it was also found that for some music items there are time/frequency (T/F) regions with very low energy levels, where the IntMDCT spectral data are in fact dominated by the rounding errors of the IntMDCT (see Fig. 7), with a distribution that is substantially different from Laplacian. In order to encode those low-energy regions efficiently, the BPGC/CBAC coding process is replaced by low-energy-mode coding, as described next.

The low-energy-mode coding is invoked for scale factor bands for which the BPGC parameter L is smaller than or equal to 0. The amplitude of the residual spectral data e[k] is then first converted into a unary binary string b = {b[0], b[1], . . . , b[pos], . . .}, as illustrated in Table 1, with M being the maximum bit plane. It can be seen that the probability distribution of these symbols is a function of the position pos and of the distribution of e[k]:

$$\Pr\{b[\mathrm{pos}] = 1\} = \Pr\{|e[k]| > \mathrm{pos} \mid |e[k]| \ge \mathrm{pos}\} \qquad (13)$$

where 0 ≤ pos < 2^M. Each b[pos] is then coded arithmetically with a trained frequency table that depends on its position pos and on the BPGC parameter L.
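The unary binarization of Table 1 and an empirical estimate of Eq. (13) can be sketched in Python as follows. The estimator simply counts, among the amplitudes that reach position pos, how many continue past it; SLS instead codes b[pos] with trained frequency tables, so the function names and the estimator are illustrative only.

```python
from typing import List

def binarize(amp: int, M: int) -> List[int]:
    """Table 1: `amp` ones followed by a terminating 0, except for the maximum 2**M - 1."""
    if not 0 <= amp <= 2 ** M - 1:
        raise ValueError("amplitude out of range for M")
    ones = [1] * amp
    return ones if amp == 2 ** M - 1 else ones + [0]

def debinarize(bits: List[int]) -> int:
    """Inverse: count leading ones until the first 0 (or the end of the string)."""
    amp = 0
    for bit in bits:
        if bit == 0:
            break
        amp += 1
    return amp

def eq13_estimate(amps: List[int], pos: int) -> float:
    """Empirical Eq. (13): Pr{b[pos] = 1} = Pr{|e| > pos given |e| >= pos}."""
    reached = [a for a in amps if a >= pos]
    return sum(a > pos for a in reached) / len(reached) if reached else 0.0

print(binarize(2, 2))   # [1, 1, 0]
```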

4.3 Smart Decoding

Due to their fine-grain scalability, HD-AAC bit streams can be truncated at any bit rate lower than that needed for a fully lossless reconstruction, to produce near-lossless representations of the audio signal. In the context of arithmetic decoding, the smart decoding method provides a way to optimally decode such a truncated bit stream. It decodes additional symbols in the absence of incoming bits as long as the decoding buffer still contains meaningful information for arithmetic decoding in the CBAC/BPGC mode and/or low-energy mode. Decoding continues as long as each symbol can be determined without ambiguity [51].

5 OTHER CODING TOOLS

As counterparts to the tools of the underlying AAC perceptual audio coder, SLS provides a number of integerized versions of AAC coding tools.

5.1 Integer M/S

In the AAC codec the M/S tool allows choosing individually between mid/side and left/right coding modes on a scale factor band basis. As shown in Section 2.3, the stereo IntMDCT can provide either a global left/right or a global mid/side spectral representation. In order to make the integer spectral values match the spectral values from the AAC core on a scale factor band basis, an invertible integer version of the M/S mapping is used. It is based on a lifting decomposition of the normalized M/S matrix, that is, a rotation by π/4:

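$$\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1-\sqrt{2} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \frac{1}{\sqrt{2}} & 1 \end{pmatrix} \begin{pmatrix} 1 & 1-\sqrt{2} \\ 0 & 1 \end{pmatrix}. \qquad (14)$$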
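A minimal Python sketch of the integer M/S mapping in Eq. (14): three lifting steps with rounding, applied with the rightmost factor first, and their exact inverse. Python's round() is used as the rounding rule, which is an assumption, and the function names are hypothetical. The output approximates ((l − r)/√2, (l + r)/√2), the rotation by π/4.

```python
import math
from typing import Tuple

A = 1.0 - math.sqrt(2.0)   # upper lifting coefficient in Eq. (14)
B = 1.0 / math.sqrt(2.0)   # lower lifting coefficient in Eq. (14)

def int_ms(l: int, r: int) -> Tuple[int, int]:
    """Integer rotation by pi/4 via three rounded lifting steps (rightmost factor first)."""
    l = l + round(A * r)
    r = r + round(B * l)
    l = l + round(A * r)
    return l, r

def int_ms_inverse(u: int, v: int) -> Tuple[int, int]:
    """Exact inverse: the same rounded terms are subtracted in reverse order."""
    u = u - round(A * v)
    v = v - round(B * u)
    u = u - round(A * v)
    return u, v

print(int_ms(1000, 400))   # (424, 990), close to (424.26, 989.95)
```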

5.2 Integer TNS

When the temporal noise-shaping (TNS) tool is used in the AAC core, the resulting MDCT spectral values deviate from the IntMDCT spectral values. To compensate for this, the same TNS filter as in the AAC core is applied to the integer spectral values in the lossless enhancement. To ensure lossless operation, the TNS filter is converted into a deterministic, invertible integer filter.
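The general principle of turning a prediction filter into an invertible integer filter can be sketched as follows: the filter output is rounded before it is added, and the decoder recomputes exactly the same rounded value from already reconstructed values. This Python sketch is a generic illustration under assumed conventions (filter sign, rounding rule, coefficient values, hypothetical function names), not the normative integer TNS procedure of SLS; in TNS the filter runs across spectral coefficients rather than time samples.

```python
import math


def rnd(v: float) -> int:
    """Deterministic rounding shared by encoder and decoder (assumption)."""
    return math.floor(v + 0.5)


def int_fir_forward(x: list[int], coeffs: list[float]) -> list[int]:
    """Encoder side: e[n] = x[n] + round(sum_i a_i * x[n-1-i])."""
    out = []
    for n, xn in enumerate(x):
        acc = sum(a * x[n - 1 - i] for i, a in enumerate(coeffs) if n - 1 - i >= 0)
        out.append(xn + rnd(acc))
    return out


def int_fir_inverse(e: list[int], coeffs: list[float]) -> list[int]:
    """Decoder side: the recursive (all-pole) counterpart. It rebuilds the
    same rounded prediction from already reconstructed values and subtracts it."""
    x: list[int] = []
    for n, en in enumerate(e):
        acc = sum(a * x[n - 1 - i] for i, a in enumerate(coeffs) if n - 1 - i >= 0)
        x.append(en - rnd(acc))
    return x
```

Because both sides evaluate the same floating-point sum in the same order before rounding, the integer values removed by the decoder are exactly those added by the encoder, which is what makes the filter lossless.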

Table 1. Binarization of IntMDCT error spectrum in low-energy mode.

| Amplitude of e[k] | Binary string {b[pos]} |
|---|---|
| 0 | 0 |
| 1 | 1 0 |
| 2 | 1 1 0 |
| ... | ... |
| 2^M − 2 | 1 1 ... 1 0 |
| 2^M − 1 | 1 1 ... 1 1 |
| pos | 0 1 2 3 ... |

PAPERS ISO/IEC MPEG-4 SCALABLE AAC

J. Audio Eng. Soc., Vol. 55, No. 1/2, 2007 January/February 33


6 BIT STREAM MULTIPLEXING

From a mechanical point of view, the SLS coded data, including the core layer AAC bit stream and the enhancement bit stream, can be carried in multiple elementary streams (ESs) in an MPEG-4 system [52]. As shown in Fig. 12, the AAC bit stream is carried in the so-called base layer ES, and the enhancement bit stream is carried in one or more enhancement layer ESs. Each ES is thus composed of a sequence of access units (AUs), where one AU contains one audio frame from the AAC bit stream or the enhancement bit stream.

From an application point of view, such a bit stream structure provides great flexibility in constructing either a large-step scalable system or a fine-grain scalable system with SLS. For example, in a scalable audio streaming application, the server stores multiple SLS ESs at predefined bit rates, each assigned a different stream priority. During streaming, the ESs are transmitted in the order of their stream priority, and ESs with lower stream priority are dropped by either the streaming server or the network gateway whenever the transmission bandwidth is insufficient to stream a full-rate lossless bit stream. Alternatively, a lightweight bit stream truncation algorithm can be implemented in the streaming server or in the multimedia gateway to truncate the enhancement stream AUs directly according to the available bandwidth, achieving fine-granular bit rate scalability.
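A lightweight truncation of the kind described above might look like the following Python sketch. It assumes, for illustration only, a constant per-frame byte budget, an enhancement AU that remains decodable when cut at any byte boundary, and hypothetical names (SlsFrame, frame_budget_bytes, truncate_stream); a real gateway would also have to update the AU size information carried at the systems layer.

```python
from dataclasses import dataclass


@dataclass
class SlsFrame:
    base_au: bytes  # AAC core-layer access unit (always kept intact)
    enh_au: bytes   # SLS enhancement access unit (fine-grain truncatable)


def frame_budget_bytes(bitrate_bps: int, frame_length: int, sample_rate: int) -> int:
    """Bytes available per frame at the given channel rate (rounded down)."""
    return bitrate_bps * frame_length // (8 * sample_rate)


def truncate_stream(frames: list[SlsFrame], budget: int) -> list[SlsFrame]:
    """Keep every base-layer AU and cut each enhancement AU to the room
    that is left in the per-frame budget."""
    out = []
    for f in frames:
        room = budget - len(f.base_au)
        if room < 0:
            raise ValueError("budget cannot carry the AAC core layer")
        out.append(SlsFrame(f.base_au, f.enh_au[:room]))
    return out
```

For example, at 512 kbit/s with 1024-sample frames at 48 kHz, frame_budget_bytes gives 1365 bytes per frame; every AAC core AU passes unchanged and only the enhancement AUs are shortened, so the decoder still obtains a valid AAC stream plus a partially refined lossless layer.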

7 MODES OF OPERATION

So far the HD-AAC coder has been described as a combination of a regular AAC core layer coder (such as low-complexity AAC) and an SLS enhancement layer. This section introduces further features offered by the SLS enhancement layer that concern its combination with the AAC coder and its ability to be used in combination with other types of AAC-based codecs, such as scalable AAC or AAC/BSAC.

7.1 Oversampling Mode

In the context of high-definition audio applications it is frequently desirable to achieve lossless signal reconstruction at sampling rates of 96 kHz or even 192 kHz. While the MPEG-4 AAC coder supports these sampling rates, it typically achieves the best coding efficiency for high-quality perceptual coding at sampling rates between 32 and 48 kHz. Thus, in order to allow both efficient core layer coding at common rates and lossless reconstruction at higher rates, MPEG-4 SLS includes an additional feature called “oversampling mode.” This refers to the possibility of letting the lossless enhancement operate at a sampling rate higher than that of the AAC core codec. The ratio between the SLS sampling rate and the AAC sampling rate is called the “oversampling factor” and can be 1, 2, or 4. For example, the lossless enhancement can operate at a rate of 192 kHz while the AAC core operates at 48 kHz; see Table 2.

GEIGER ET AL. PAPERS

Table 2. Example combinations of sampling rates for AAC core and lossless enhancement.

| | AAC @ 48 kHz | AAC @ 96 kHz | AAC @ 192 kHz |
|---|---|---|---|
| SLS @ 48 kHz | ✓ | | |
| SLS @ 96 kHz | ✓ | ✓ | |
| SLS @ 192 kHz | ✓ | ✓ | ✓ |

Fig. 12. Structure of MPEG-4 SLS bit stream.

The mapping between the two coding layers is achieved by using a time-aligned framing and a correspondingly longer IntMDCT in the lossless enhancement. For example, an IntMDCT of the size of 4096 spectral values is used in case of oversampling by a factor of 4, and the 1024 MDCT values from the AAC core are mapped to the lower 1024 IntMDCT values. In addition to allowing for optimum AAC performance, the lossless performance is improved when using longer IntMDCT transforms. (A transform length of 2048 or 4096 provides a better lossless performance for stationary signals than a transform length of 1024.)
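The reason the 1024 AAC values map onto the lower 1024 IntMDCT values can be checked with a little arithmetic: with time-aligned frames, a longer IntMDCT at a proportionally higher sampling rate keeps the same frequency spacing per spectral line. The Python sketch below assumes long-window AAC frames of 1024 values and uses a hypothetical helper name; short windows are not covered.

```python
from fractions import Fraction

AAC_FRAME = 1024  # MDCT spectral values per long-window AAC frame (assumption)


def sls_mapping(aac_rate: int, factor: int) -> dict:
    """Frame timing and spectral-line spacing of the AAC core and of the
    oversampled IntMDCT, using exact rational arithmetic."""
    if factor not in (1, 2, 4):
        raise ValueError("oversampling factor must be 1, 2, or 4")
    sls_rate = aac_rate * factor
    n_int = AAC_FRAME * factor                    # IntMDCT size
    frame_aac = Fraction(AAC_FRAME, aac_rate)     # frame duration in seconds
    frame_sls = Fraction(n_int, sls_rate)
    bin_aac = Fraction(aac_rate, 2 * AAC_FRAME)   # Hz per spectral line
    bin_sls = Fraction(sls_rate, 2 * n_int)
    return {
        "sls_rate": sls_rate,
        "intmdct_size": n_int,
        "core_bins": range(AAC_FRAME),            # lines shared with the AAC core
        "time_aligned": frame_aac == frame_sls,
        "same_bin_width": bin_aac == bin_sls,
        "bin_width_hz": bin_sls,
    }
```

For a 48-kHz core and an oversampling factor of 4, both layers use frames of 1024/48000 = 4096/192000 s and a line spacing of 23.4375 Hz, so IntMDCT lines 0 to 1023 cover the same 0 to 24 kHz range as the AAC lines, while lines 1024 to 4095 (24 to 96 kHz) have no core counterpart.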

7.2 Combination with MPEG-4 Scalable AAC

In order to account for varying or unknown transmission capacity, the MPEG-4 AAC codec also provides several scalable coding modes [3], [34]. MPEG-4 scalable AAC allows perceptual audio coding with one or more AAC mono or stereo layers and can be combined with an SLS enhancement layer. This results in an overall coder that offers fine-grain scalability in the range between lossless reconstruction and the AAC representation, and scalability in several steps within the perceptually coded AAC representation.

7.3 Combination with MPEG-4 AAC/BSAC

Similar to the combination with scalable AAC, the SLS enhancement can also be operated on top of MPEG-4 AAC/BSAC as a core layer coder. This provides a fine-grain scalable representation both between lossless and perceptually coded audio and within the perceptually coded range. The latter has a granularity of 1 kbps per channel.

7.4 Stand-Alone Operation

Finally, the SLS lossless enhancement can also operate as a straightforward stand-alone codec, without any underlying core codec. This operation mode still offers both full lossless coding capability and fine-grain scalability.

8 PERFORMANCE

This section quantifies the performance of the HD-AAC codec in various operation modes. While the compression ratio can be seen as the sole measure of merit for lossless operation scenarios, an evaluation of performance for near-lossless operation requires audio quality measurements at various data rates and operating points.

8.1 Lossless Compression Performance

This section reports the performance of HD-AAC in terms of its ability to losslessly compress various audio material. As a figure of merit, the compression ratio is defined as

$$\text{compression ratio} = \frac{\text{original file size}}{\text{compressed file size}}. \qquad (15)$$
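Because the PCM bit rate of the input is fixed by sampling rate, word length, and channel count, Eq. (15) directly links compression ratio and average bit rate. A small Python sketch (the helper names are hypothetical):

```python
def pcm_rate_kbps(sample_rate_hz: int, bits: int, channels: int = 2) -> float:
    """Bit rate of the uncompressed PCM input in kbit/s."""
    return sample_rate_hz * bits * channels / 1000


def compression_ratio(original_size: float, compressed_size: float) -> float:
    """Eq. (15): original file size divided by compressed file size."""
    return original_size / compressed_size


def bitrate_for_ratio(sample_rate_hz: int, bits: int, ratio: float, channels: int = 2) -> float:
    """Average compressed bit rate in kbit/s implied by a given ratio."""
    return pcm_rate_kbps(sample_rate_hz, bits, channels) / ratio
```

For 48-kHz/16-bit stereo the PCM rate is 1536 kbps, so a ratio of 2.09 corresponds to about 735 kbps, the value reported in the first row of Table 3. Because Table 3 averages over items, the other rows agree only approximately (for example, 2304/1.55 ≈ 1486 kbps versus 1490 kbps for 48 kHz/24 bit).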

Tables 3 and 4 show the lossless compression performance for two major sets of test material, that is, the MPEG-4 lossless audio coding test set (donated by Matsushita Corporation and containing in part recordings performed by the New York Symphonic Ensemble) and a set of commercial CD items. For both sets the compression results are given for an HD-AAC configuration with an AAC core layer running at 128 kbit/s stereo plus SLS enhancement, and for an SLS stand-alone configuration.

As could be expected from theory, it can be observed in Table 3 that an increase in word length reduces the average compression ratio (due to the fact that the least significant bits of the PCM codewords are more random and thus less compressible). On the other hand, increasing the sampling rate improves compression because of the increased correlation between adjacent samples (assuming sound material with typical high-frequency characteristics).

For the MPEG-4 lossless audio test set, an average compression ratio of 2:1 can easily be achieved at a sampling rate of 48 kHz and 16-bit word length. This is competitive with the best other known lossless compression systems [53]. It can also be observed that, for lossless representation, the AAC-based mode requires an additional bit rate of only 30–40 kbps compared to the stand-alone mode. This reduces the bit rate consumption by 90–100 kbps compared to simulcast solutions that transmit both a 128-kbps AAC bit stream and a stand-alone SLS lossless bit stream simultaneously.
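The savings quoted above follow from simple bookkeeping: a simulcast needs the 128-kbps AAC stream plus a stand-alone lossless stream, whereas HD-AAC needs only the stand-alone rate plus the overhead of embedding the core, so the saving equals 128 kbps minus that overhead. The Python sketch below applies this to the average bit rates of Table 3 (the data structure and function name are illustrative).

```python
# Average bit rates in kbps from Table 3: (SLS + AAC @ 128 kbps, SLS stand-alone)
TABLE3_KBPS = {
    "48 kHz/16 bit": (735, 698),
    "48 kHz/24 bit": (1490, 1454),
    "96 kHz/24 bit": (2201, 2160),
    "192 kHz/24 bit": (3543, 3509),
    "Overall": (1992, 1955),
}
AAC_CORE_KBPS = 128


def layered_vs_simulcast(hd_aac: int, stand_alone: int, core: int = AAC_CORE_KBPS):
    """Return (overhead of the embedded core, saving versus simulcast)."""
    overhead = hd_aac - stand_alone      # extra rate caused by the AAC core layer
    simulcast = core + stand_alone       # separate AAC stream + lossless stream
    return overhead, simulcast - hd_aac


for name, (hd, sa) in TABLE3_KBPS.items():
    overhead, saving = layered_vs_simulcast(hd, sa)
    print(f"{name:>15}: overhead {overhead} kbps, saving {saving} kbps")
```

For 48 kHz/16 bit this gives an overhead of 37 kbps and a saving of 91 kbps; across the rows of Table 3 the overhead ranges from 34 to 41 kbps.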

8.2 Near-Lossless Compression Performance

The bit-plane coding of residual spectral values (that is, of the AAC quantization error) allows the initial AAC quantization to be refined successively as more bits from the SLS enhancement layer are decoded. With each additional decoded bit plane the quantization error is reduced by 6 dB. Consequently, an increasing safety margin with respect to audibility is added as the bit rate increases.
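The 6-dB figure follows because each additional bit plane halves the quantization step size, which reduces the error power by a factor of four: 10 log10 4 = 20 log10 2 ≈ 6.02 dB. The Python sketch below checks this numerically on a dense grid of residual values, assuming midpoint reconstruction within each quantization cell (an illustrative choice, not necessarily the SLS reconstruction rule).

```python
import math


def refinement_gain_db() -> float:
    """Theoretical SNR gain per extra bit plane: halving the step size
    quarters the error power, i.e. 20*log10(2) dB."""
    return 20 * math.log10(2)


def measured_gain_db(planes: int, n: int = 1 << 16) -> float:
    """Compare the error power after `planes` and `planes + 1` bit planes
    for n evenly spaced residual values in [0, 1)."""
    def mse(p: int) -> float:
        step = 2.0 ** -p
        total = 0.0
        for i in range(n):
            r = (i + 0.5) / n
            q = (math.floor(r / step) + 0.5) * step  # midpoint of the cell
            total += (r - q) ** 2
        return total / n

    return 10 * math.log10(mse(planes) / mse(planes + 1))
```

Both functions return approximately 6.02 dB, for example measured_gain_db(4) when going from four to five decoded bit planes.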

Table 3. Lossless compression results for MPEG-4 lossless audio test set.

| Format | SLS + AAC @ 128 kbps/stereo (AAC @ 48 kHz sampling rate): compression ratio | SLS + AAC: average bit rate (kbps) | SLS stand-alone: compression ratio | SLS stand-alone: average bit rate (kbps) |
|---|---|---|---|---|
| 48 kHz/16 bit | 2.09 | 735 | 2.20 | 698 |
| 48 kHz/24 bit | 1.55 | 1490 | 1.58 | 1454 |
| 96 kHz/24 bit | 2.09 | 2201 | 2.13 | 2160 |
| 192 kHz/24 bit | 2.60 | 3543 | 2.63 | 3509 |
| Overall | 2.08 | 1992 | 2.12 | 1955 |


8.2.1 Evaluation of Near-Lossless Audio Quality

While it may seem sufficient for most purposes to provide perceptually transparent reproduction of audio signals by using conventional perceptual audio coders (such as AAC at a sufficient bit rate), there are applications that demand still higher audio quality. This is especially the case for professional audio production facilities, such as archiving and broadcasting, in which audio signals may undergo many cycles of encoding/decoding (tandem coding) before being delivered to the consumer. This leads to an accumulation of the introduced coding distortion and may lead to unacceptable final audio quality [54], unless substantial headroom toward audibility is provided by each coding step, for example, by using coding algorithms with very high quality or bit rate.

ITU-R BS.1548-1 [55] defines the requirements for audio coding systems for digital broadcasting, assuming a codec chain consisting of so-called contribution, distribution, and emission codecs. According to this recommendation, and based on ITU-R BS.1116-1 [56], audio codecs for contribution and distribution should fulfill the following requirements:

The quality of sound reproduced after a reference contribution/distribution cascade [. . .] should be subjectively indistinguishable from the source for most types of audio programme material. Using the triple stimuli double blind with hidden reference test, described in Recommendation ITU-R BS.1116 [. . .], this requires mean scores generally higher than 4.5 in the impairment 5-grade scale, for listeners at the reference listening position. The worst rated item should not be graded lower than 4.

In accordance with these recommendations, tests were run on signals encoded and decoded with HD-AAC. Instead of running numerous listening tests for subjective quality assessment at individual operating points, the evaluation employed the PEAQ measurement (BS.1387-1) [57], which provides methods for objective measurement of perceived audio quality in scenarios that are normally assessed by ITU-R BS.1116 testing. The most essential results can be seen in Figs. 13–15; see also [58].

The graphs show the estimated subjective sound quality expressed as objective difference grade (ODG) values, which were computed by a PEAQ measurement. The evaluation procedure consists of multiple cycles of tandem coding/decoding, with up to 16 cycles. The standard set of critical MPEG-4 audio items for perceptual audio coding evaluations was used. ODG values of 0, −1, −2, −3, and −4 correspond to a subjective audio quality of “indistinguishable from original,” “perceptible but not annoying,” “slightly annoying,” “annoying,” and “very annoying,” respectively.
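The ODG scale and the BS.1548-1 criterion quoted above can be combined into a rough screening check. The Python sketch below is a simplified reading under an explicit assumption: each item's 5-grade score is approximated as 5 + ODG (that is, the hidden reference is taken to be graded 5). PEAQ results are objective estimates, so such a check does not replace a formal BS.1116 listening test, and the function names are hypothetical.

```python
ODG_LABELS = {
    0: "indistinguishable from original",
    -1: "perceptible but not annoying",
    -2: "slightly annoying",
    -3: "annoying",
    -4: "very annoying",
}


def odg_label(odg: float) -> str:
    """Label of the nearest integer ODG anchor, clamped to [-4, 0]."""
    anchor = max(-4, min(0, round(odg)))
    return ODG_LABELS[anchor]


def meets_bs1548_sketch(odgs: list[float]) -> bool:
    """Simplified BS.1548-1 check, ASSUMING a 5-grade score of 5 + ODG per
    item: mean score above 4.5 and worst item not below 4."""
    grades = [5.0 + o for o in odgs]
    return sum(grades) / len(grades) > 4.5 and min(grades) >= 4.0
```

Under this assumption, a set of items with ODG values of −0.1, −0.3, and −0.2 would pass, whereas a single item at −1.2 would fail the worst-item condition even if the mean were high.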

Fig. 13 shows the achieved ODG values as a function of tandem cycles for a traditional AAC coder running at a bit rate of 128 kbps/stereo. As expected, it can be observed that the audio quality degrades significantly with an increasing number of tandem cycles, depending on the test item. For this reason tandem coding is not a recommended practice for such coders.

Fig. 14 displays the corresponding tandem coding results for the HD-AAC combination running at 512 kbps/stereo (AAC at 128 kbps + SLS enhancement at 384 kbps). It can be noted that the audio quality remains consistently at a very high level, even after a total of 16 tandem cycles. This illustrates the high robustness of the HD-AAC representation against tandem coding. According to these measurements, the aforementioned BS.1548-1 audio quality requirement is fulfilled with a considerable safety margin. Furthermore, when placed in tandem with AAC (such as for final audio distribution over a narrow-band channel), the resulting audio quality is not degraded significantly by the preceding HD-AAC tandem cascade. Further details can be found in [59].

Table 4. Lossless compression results for commercial CD test set.

| CD item (16 bit/44.1 kHz) | Compression ratio: SLS + AAC @ 128 kbps/stereo | Compression ratio: SLS stand-alone |
|---|---|---|
| AC/DC—Highway to Hell (Sony 80206) | 1.31 | 1.36 |
| Avril Lavigne—Let Go (Arista 14740) | 1.36 | 1.41 |
| Backstreet Boys—Greatest Hits Chapter One (Jive 41779) | 1.39 | 1.45 |
| Brian Setzer—The Dirty Boogie (Interscope 90183) | 1.43 | 1.49 |
| Cowboy Junkies—Trinity Session (RCA-8568) | 1.93 | 2.04 |
| Grieg—Peer Gynt, von Karajan (DG 439010) | 2.63 | 2.83 |
| Jennifer Warnes—Famous Blue Raincoat (BMG 258418) | 2.07 | 2.20 |
| Marlboro Music Festival—DISC A (Bridge 9108) | 2.23 | 2.35 |
| Marlboro Music Festival—DISC B (Bridge 9108) | 2.20 | 2.33 |
| Nirvana—Nirvana (Interscope 493523) | 1.50 | 1.56 |
| Philip Jones—40 Famous Marches, CD1 (Decca 416241) | 1.99 | 2.11 |
| Philip Jones—40 Famous Marches, CD2 (Decca 416241) | 2.00 | 2.11 |
| Pink Floyd—Dark Side of the Moon (Capitol 46001) | 1.76 | 1.85 |
| Rebecca Pidgeon—The Raven (Chesky 115) | 1.88 | 1.97 |
| Ricky Martin (Sony 69891) | 1.33 | 1.38 |
| Schubert Piano Trio in E-flat (Sony 48088) | 2.74 | 2.90 |
| Spaniels—The Very Best Of (Collectables 7243) | 2.41 | 2.62 |
| Steeleye Span—Below the Salt (Shanachie 79039) | 1.85 | 1.95 |
| Suzanne Vega—Solitude Standing (A&M 5136) | 1.74 | 1.83 |
| Westminster Concert Bell Choir—Christmas Bells (Gothic Records 49055) | 2.55 | 2.71 |
| Overall | 1.85 | 1.94 |

Fig. 13. Test results: AAC tandem coding.

Fig. 14. Test results: HD-AAC tandem coding.

8.2.2 Stand-Alone SLS Operation

The SLS codec can also operate as a stand-alone lossless codec when the AAC core codec is not used, sometimes also referred to as the “noncore mode.” Despite its simple structure (only the IntMDCT and BPGC/CBAC modules are used), this mode allows efficient lossless coding [59]. Furthermore, fine-grain scalability by truncated bit-plane coding is also possible in this mode. Given that the stand-alone SLS codec does not include any perceptual model to estimate masking thresholds, it is interesting to investigate the audio quality resulting from a truncation of the SLS bit stream.

Due to the behavior of bit-plane coding in this mode, a constant signal-to-noise ratio is achieved in each scale-factor band. With each additional bit plane the signal-to-noise ratio improves by 6 dB. While this behavior does not allow SLS to compete with efficient perceptual codecs at low bit rates (for example, AAC at 128 kbps/stereo), this simple approach works quite well at higher bit rates in the near-lossless range. Fig. 15 shows tandem coding results for the stand-alone SLS codec operating at 512 kbps/stereo. It reaches about the same near-lossless audio quality as the AAC-based HD-AAC mode discussed in the previous section.

At a constant bit rate of 768 kbps, most test items still require the truncation of some coder frames. Nevertheless, the corresponding PEAQ measurements indicate that no degradation of subjective audio quality occurs in this tandem coding scenario for both the AAC-based mode and the stand-alone mode; see [59]. This provides an interesting operating point for HD-AAC modes, corresponding to a guaranteed 2:1 compression (768 kbps is half the 1536-kbps PCM rate of 48-kHz/16-bit stereo). While other stand-alone lossless codecs can also provide an average compression of 2:1 for suitable test material, their compression for individual items can be much lower, depending on the audio material to be encoded. In contrast, HD-AAC is able to guarantee a certain compression ratio while providing lossless or near-lossless signal representation, depending on the input signal.

9 DECODER COMPLEXITY

The computational complexity for SLS decoding can be evaluated by counting the total number of standard instructions (multiplications, additions, bit shifts, comparisons, memory transfers) required for performing the decoding process on a generic 32-bit fixed-point CPU.

The main components contributing to the computationalcomplexity of SLS are:

1) IntMDCT filter bank
2) Bit-plane arithmetic decoder
3) AAC Huffman decoding
4) AAC + SLS inverse error mapping
5) Integer M/S stereo coding
6) Unpacking of tables.

Items 3) to 5) are only required in the AAC-based mode, item 6) only if the necessary tables are not precomputed.

9.1 Number of Instructions

Table 5 lists the number of instructions required for decoding in the AAC-based mode, with the AAC core operating at 64 kbps per channel. Table 6 shows the corresponding numbers for SLS in stand-alone mode (without the AAC core layer). For both tables, values are provided both for implementations with preunpacked tables and for implementations that unpack the tables in place.
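The per-sample counts of Tables 5 and 6 can be turned into a rough operations-per-second figure. The Python sketch below assumes that the counts are per sample and per channel and that the combined figure is the plain sum of the individual operation counts when every operation is counted as one cycle; both are reading assumptions, and real cycle counts depend on the target CPU.

```python
# Table 5, tables preunpacked, frame length 1024 or 128 (operations per sample)
TABLE5_ROW = {"Muls": 18.00, "Adds/Subs": 88.99, "Ors": 28.50,
              "Shifts": 68.50, "Negs": 30.85, "Movs": 15.99}


def combined_ops(row: dict[str, float]) -> float:
    """Combined count when every operation is weighted as one cycle."""
    return sum(row.values())


def ops_per_second(ops_per_sample: float, sample_rate_hz: int, channels: int) -> float:
    """Throughput estimate, ASSUMING counts are per sample and per channel."""
    return ops_per_sample * sample_rate_hz * channels


total = combined_ops(TABLE5_ROW)                  # about 250.83, as in Table 5
mops = ops_per_second(total, 48_000, 2) / 1e6
print(f"{total:.2f} ops/sample -> {mops:.1f} M ops/s for 48-kHz stereo")
```

Under these assumptions, the 250.83 operations per sample of this Table 5 row correspond to roughly 24 million INT32 operations per second for 48-kHz stereo decoding.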

Fig. 15. Test results: stand-alone SLS tandem coding.


9.2 ROM Requirements

For an implementation of SLS in stand-alone mode, a ROM size of 4 kbytes is required. For the AAC-based mode, the ROM requirement is 45 kbytes. As can be seen from Tables 5 and 6, a tradeoff between ROM requirement and number of instructions can be made by precomputing the necessary table values. More details on SLS computational complexity can be found in [59]. The computational complexity of AAC decoding is analyzed in [60].

10 APPLICATIONS

As the primary functionality of HD-AAC audio coding is lossless audio coding, it can be used in applications that require bit-exact reconstruction, such as studio operation, music disc delivery, or audio archiving. Due to its inherent scalability, HD-AAC audio coding technology in fact fits into virtually every application that requires audio compression. Several potential application scenarios are listed here.

Studio Operations: HD-AAC audio coding technology is useful for the storage of audio at various points in studio operations, such as recording, editing, mixing, and premastering, as studio procedures are designed to preserve the highest levels of quality. The scalability of the SLS layer also provides a convenient solution in situations where the bandwidth is not sufficient to support fully lossless quality.

Archival Application: Archives of sound recordings are very common in studios, record labels, libraries, and so on. These archives are tremendously large, so compression is essential. In addition, the scalability of the SLS technology makes it possible to extract lower bit-rate versions of the archive’s lossless audio items at any time, enabling applications such as remote data browsing.

Broadcast Contribution/Distribution Chain: In a broadcast environment HD-AAC audio coding technology could be used in all stages, comprising archiving, contribution/distribution, and emission. In the broadcast chain one main feature of the technology can be used: in every stage where lower bit rates are required, the bit stream is merely truncated, and therefore no reencoding is required.

Consumer Disc-Based Delivery: HD-AAC technology can also be used in consumer disc-based delivery of music content. It enables the music disc to deliver both lossless and lossy audio on the same medium.

Internet Delivery of Audio: In such an application scenario the available transmission bandwidth can vary dramatically across different access network technologies and over time. As a result, the same audio content at a variety of bit rates and qualities may need to be kept ready at the server side. HD-AAC technology provides a “one-file” solution for such a requirement.

Audio Streaming: HD-AAC technology delivers the vital bit-rate scalability for streaming applications on channels with variable quality of service (QoS) conditions. Examples of this kind of streaming application include Internet audio streaming and multicast streaming applications that feed several channels of differing capacity.

Digital Home: The idea of the digital home is to create an open and transparent home network platform that enables consumers to easily create, use, manage, and share digital content such as audio, video, or images. In a typical setup for audio, the user can download the HD-AAC coded bit streams in lossless quality from the service provider and archive them on the home music server. These bit streams are then streamed or downloaded to different audio terminals at differing quality for playback.

Table 5. Maximum number of INT32 operations per sample for SLS decoding with AAC core.

| Frame length | Muls | Adds/Subs | Ors | Shifts | Negs | Movs | Combined (all = 1 cycle) |
|---|---|---|---|---|---|---|---|
| Tables preunpacked | | | | | | | |
| 4096 or 512 | 19.50 | 93.46 | 34.50 | 72.60 | 34.26 | 16.54 | 270.86 |
| 2048 or 256 | 20.25 | 91.22 | 31.50 | 73.54 | 32.25 | 16.29 | 265.05 |
| 1024 or 128 | 18.00 | 88.99 | 28.50 | 68.50 | 30.85 | 15.99 | 250.83 |
| Tables unpacked in place | | | | | | | |
| 4096 or 512 | 19.50 | 123.21 | 34.50 | 88.01 | 34.26 | 16.54 | 316.10 |
| 2048 or 256 | 20.25 | 117.47 | 31.50 | 85.54 | 32.25 | 16.29 | 303.30 |
| 1024 or 128 | 18.00 | 105.24 | 28.50 | 75.00 | 30.85 | 15.99 | 273.58 |

Table 6. Maximum number of INT32 operations per sample for SLS stand-alone decoding.

| Frame length | Muls | Adds/Subs | Ors | Shifts | Negs | Movs | Combined (all = 1 cycle) |
|---|---|---|---|---|---|---|---|
| Tables preunpacked | | | | | | | |
| 4096 or 512 | 18.00 | 86.63 | 34.50 | 67.10 | 29.93 | 9.54 | 245.70 |
| 2048 or 256 | 18.75 | 84.39 | 31.50 | 68.04 | 27.92 | 9.29 | 239.89 |
| 1024 or 128 | 16.50 | 82.16 | 28.50 | 63.00 | 26.52 | 8.99 | 225.67 |
| Tables unpacked in place | | | | | | | |
| 4096 or 512 | 18.00 | 116.38 | 34.50 | 82.60 | 29.93 | 9.54 | 290.05 |
| 2048 or 256 | 18.75 | 110.64 | 31.50 | 80.04 | 27.92 | 9.29 | 278.14 |
| 1024 or 128 | 16.50 | 98.41 | 28.50 | 69.50 | 26.52 | 8.99 | 248.42 |

11 CONCLUSIONS

The new ISO/MPEG specification for scalable lossless coding extends the well-known perceptual coding scheme AAC toward lossless and near-lossless operation, and in this way enables its use in the context of high-definition applications. The HD-AAC scheme offers competitive lossless compression ratios at all relevant operating points (word length and sampling rate). For distribution on bandwidth-limited channels, a perceptually coded compatible AAC bit stream can simply be extracted from the composite HD-AAC stream. Alternatively, the SLS part can also be used as a simple and versatile stand-alone compression engine. In both cases the fidelity of the signal representation can be scaled with fine granularity within a wide range of near-lossless representations. This enables lossless or near-lossless transmission of high-definition audio with a guaranteed maximum rate. We anticipate that this flexibility will make HD-AAC the technology of choice for many applications that call for both very high audio quality and delivery over a wide range of transmission channels.

12 ACKNOWLEDGMENT

The authors would like to thank all their colleagues at the MPEG audio subgroup who supported the lossless standardization activity, especially Takehiro Moriya (NTT) for inspiring these standardization activities, Tilman Liebchen (Technical University of Berlin) for chairing the ad hoc group on lossless audio coding, and Yuriy Reznik (Real Networks/Qualcomm) for his thorough complexity evaluations.

13 REFERENCES

[1] ISO/IEC 11172-3, “Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s—Part 3: Audio,” International Standards Organization, Geneva, Switzerland (1992).

[2] ISO/IEC 13818-3, “Information Technology—Generic Coding of Moving Pictures and Associated Audio—Part 3: Audio,” International Standards Organization, Geneva, Switzerland (1994).

[3] ISO/IEC 14496-3:2001, “Coding of Audio-Visual Objects—Part 3: Audio,” International Standards Organization, Geneva, Switzerland (2001).

[4] ISO/IEC 14496-3:2001/Amd.1:2003, “Coding of Audio-Visual Objects—Part 3: Audio, Amendment 1: Bandwidth Extension,” International Standards Organization, Geneva, Switzerland (2003).

[5] M. Dietz, L. Liljeryd, K. Kjorling, and O. Kunz, “Spectral Band Replication—A Novel Approach in Audio Coding,” presented at the 112th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 50, pp. 509, 510 (2002 June), convention paper 5553.

[6] ISO/IEC 14496-3:2001/Amd.2:2004, “Coding of Audio-Visual Objects—Part 3: Audio, Amendment 2: Parametric Coding for High Quality Audio,” International Standards Organization, Geneva, Switzerland (2004).

[7] C. den Brinker, E. Schuijers, and W. Oomen, “Parametric Coding for High-Quality Audio,” presented at the 112th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 50, p. 510 (2002 June), convention paper 5554.

[8] ISO/IEC FCD 23003-1, “MPEG-D (MPEG Audio Technologies)—Part 1: MPEG Surround,” International Standards Organization, Geneva, Switzerland (2006).

[9] J. Breebaart, J. Herre, C. Faller, J. Roeden, F. Myburg, S. Disch, H. Purnhagen, G. Hotho, M. Neusinger, K. Kjoerling, and W. Oomen, “MPEG Spatial Audio Coding/MPEG Surround: Overview and Current Status,” presented at the 119th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 53, p. 1228 (2005 Dec.), convention paper 6599.

[10] “Special Issue: High-Resolution Audio,” J. Audio Eng. Soc., vol. 52, pp. 116–260 (2004 Mar.).

[11] “DVD Forum,” http://www.dvdforum.org/forum.shtml (2004).

[12] Royal Philips Electronics, “Super Audio CD Systems,” http://www.licensing.philips.com/information/sacd/ (2006).

[13] Blu-Ray Disc Association, “Blu-Ray Disc,” http://www.blu-raydisc.com/ (2006).

[14] Dolby Laboratories Inc., “MLP Lossless,” http://www.dolby.com/consumer/technology/mlp_lossless.html (2006).

[15] M. A. Gerzon, P. G. Craven, J. R. Stuart, M. J. Law, and R. J. Wilson, “The MLP Lossless Compression System,” in Proc. 17th AES Conf. (Florence, Italy, 1999 Sept.), pp. 61–75.

[16] DTS Inc., “DTS HD,” http://www.dtsonline.com/consumer/dtshd.php (2006).

[17] Apple Computer Inc., “Apple Quicktime,” http://www.apple.com/downloads/macosx/apple/quicktime651.html (2006).

[18] J. Coalson, “FLAC—Free Lossless Audio Codec,” http://flac.sourceforge.net (2006).

[19] M. T. Ashland, “Monkey’s Audio—A Fast and Powerful Lossless Audio Compressor,” http://www.monkeysaudio.com (2004).

[20] F. Ghido, “OptimFROG,” http://www.losslessaudio.org (2006).

[21] ITU-T G.729.1, “G.729 Based Embedded Variable Bit-Rate Coder: An 8–32 kbit/s Scalable Wideband Coder Bitstream Interoperable with G.729,” International Telecommunication Union, Geneva, Switzerland (2006).

[22] B. Grill, “A Bit-Rate Scalable Perceptual Coder for MPEG-4 Audio,” presented at the 103rd Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 45, p. 1005 (1997 Nov.), preprint 4620.

[23] J. Herre, E. Allamanche, K. Brandenburg, M. Dietz, B. Teichmann, B. Grill, A. Jin, T. Moriya, N. Iwakami, T. Norimatsu, M. Tsushima, and T. Ishikawa, “The Integrated Filterbank-Based Scalable MPEG-4 Audio Coder,” presented at the 105th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 46, p. 1039 (1998 Nov.), preprint 4810.

[24] S. H. Park, Y. B. Kim, S. W. Kim, and Y. S. Seo,“Multi-Layer Bit-Sliced Bit-Rate Scalable Audio Coding,”presented at the 103rd Convention of the Audio Engineer-ing Society, J. Audio Eng. Soc. (Abstracts), vol. 45, p.1005 (1997 Nov.), preprint 4520.

[25] M. Nishiguchi, “MPEG-4 Speech Coding,” inProc. 17th AES Conf. (Florence, Italy, 1999 Sept.), pp.139–146.

[26] M. Nishiguchi, A. Inoue, Y. Maeda, and J. Matsu-moto, “Parametric Speech Coding—HVXC at 2.0–4.0kbps,” presented at the IEEE Workshop on Speech Cod-ing, Porvoo, Finland (1999 June).

[27] M. S. Schroeder and B. S. Atal, “Code-ExcitedLinear Prediction (CELP): High-Quality Speech at VeryLow Bit Rates,” in Proc. IEEE ICASSP (Tampa, FL, 1985Mar.), pp. 937–940.

[28] T. Nomura, M. Iwadare, M. Serizawa, and K.Ozawa, “A Bitrate and Bandwidth Scalable CELP Coder,”in Proc. IEEE ICASSP (Seattle, WA, 1998 May), pp.341–344.

[29] ISO/IEC JTC1/SC29/WG11, “Final Call for Pro-posals on MPEG-4 Lossless Audio Coding,” MPEG2002/N5208, Shanghai, China (2002 Oct.).

[30] ISO/IEC 14496-3:2001/Amd.6:2005, “Coding ofAudio-Visual Objects—Part 3: Audio, Amendment 6:Lossless Coding of Oversampled Audio,” InternationalStandards Organization, Geneva, Switzerland (2005)

[31] E. Knapen, D. Reefman, E. Janssen, and F. Bruek-ers, “Lossless Compression of 1-Bit Audio,” J. Audio Eng.Soc., vol. 52, pp. 190–199 (2004 Mar.).

[32] ISO/IEC 14496-3:2005/Amd.2:2006, “Coding ofAudio-Visual Objects—Part 3: Audio, Amendment 2: Au-dio Lossless Coding (ALS), New Audio Profiles andBSAC Extensions,” International Standards Organization,Geneva, Switzerland (2006).

[33] ISO/IEC 14496-3:2005/Amd.3:2006, “Coding ofAudio-Visual Objects—Part 3: Audio, Amendment 3:Scalable Lossless Coding (SLS),” International StandardsOrganization, Geneva, Switzerland (2006).

[34] J. Herre and H. Purnhagen, “General Audio Cod-ing,” in The MPEG-4 Book, IMSC Multimedia Ser., F.Pereira and T. Ebrahimi, Eds. (Prentice-Hall, EnglewoodCliffs, NJ, 2002).

[35] M. Bosi, K. Brandenburg, S. Quackenbush, L.Fielder, K. Akagiri, H. Fuchs, M. Dietz, J. Herre, G. Dav-idson, and Y. Oikawa, “ISO/IEC MPEG-2 Advanced Au-dio Coding,” J. Audio Eng. Soc., vol. 45, pp. 789–814(1997 Oct.).

[36] ISO/IEC JTC1/SC29/WG11, “Report on theMPEG-2 AAC Stereo Verification Tests,” MPEG1998/N2006, San Jose, CA (1998 Feb.).

[37] J. Princen, A. Johnson, and A. Bradley, “Subband/Transform Coding Using Filter Bank Designs Based on

Time Domain Aliasing Cancellation,” in Proc IEEEICASSP (Dallas, TX, 1987), pp. 2161–2164.

[38] J. Herre and J. D. Johnston, “Enhancing the Per-formance of Perceptual Audio Coders by Using TemporalNoise Shaping (TNS),” presented at the 101st Conventionof the Audio Engineering Society, J. Audio Eng. Soc. (Ab-stracts), vol. 44, p. 1175 (1996 Dec.), preprint 4384.

[39] J. D. Johnston, J. Herre, M. Davis, and U. Gbur,“MPEG-2 NBC Audio—Stereo and Multichannel CodingMethods,” presented at the 101st Convention of the AudioEngineering Society, J. Audio Eng. Soc. (Abstracts), vol.44, p. 1175 (1996 Dec.), preprint 4383.

[40] R. Geiger, T. Sporer, J. Koller, and K. Branden-burg, “Audio Coding Based on Integer Transforms,” pre-sented at the 111th Convention of the Audio EngineeringSociety, J. Audio Eng. Soc. (Abstracts), vol. 49, p. 1230(2001 Dec.), convention paper 5471.

[41] R. Geiger, Y. Yokotani, and G. Schuller, “Improved Integer Transforms for Lossless Audio Coding,” in Proc. 37th Asilomar Conf. on Signals, Systems and Computers (Pacific Grove, CA, 2003 Nov.).

[42] R. Geiger, J. Herre, J. Koller, and K. Brandenburg, “IntMDCT—A Link between Perceptual and Lossless Audio Coding,” in Proc. IEEE ICASSP (Orlando, FL, 2002 May).

[43] R. Yu, C. C. Ko, S. Rahardja, and X. Lin, “Bit-Plane Golomb Code for Sources with Laplacian Distributions,” in Proc. IEEE ICASSP (Hong Kong, China, 2003 Apr.), pp. 277–280.

[44] R. Yu, X. Lin, S. Rahardja, C. C. Ko, and H. Huang, “Improving Coding Efficiency for MPEG-4 Audio Scalable Lossless Coding,” in Proc. IEEE ICASSP (Philadelphia, PA, 2005 May).

[45] I. Daubechies and W. Sweldens, “Factoring Wavelet Transforms into Lifting Steps,” Tech. Rep., Bell Laboratories, Lucent Technologies (1996).

[46] F. Bruekers and A. Enden, “New Networks for Perfect Inversion and Perfect Reconstruction,” IEEE J. Selected Areas Comm., vol. 10, pp. 130–137 (1992 Jan.).

[47] R. Geiger, Y. Yokotani, G. Schuller, and J. Herre, “Improved Integer Transforms Using Multi-Dimensional Lifting,” in Proc. IEEE ICASSP (Montreal, Canada, 2004 May).

[48] Y. Yokotani, R. Geiger, G. Schuller, S. Oraintara, and K. R. Rao, “Improved Lossless Audio Coding Using the Noise-Shaped IntMDCT,” presented at the IEEE 11th DSP Workshop, Taos Ski Valley, NM (2004 Aug.).

[49] R. Yu, X. Lin, S. Rahardja, and H. Haibin, “Proposed Core Experiment for Improving Coding Efficiency in MPEG-4 Audio Scalable Coding (SLS),” ISO/IEC JTC1/SC29/WG11, M10683, Munich, Germany (2004 Mar.).

[50] R. Yu, X. Lin, S. Rahardja, and C. C. Ko, “A Statistics Study of the MDCT Coefficient Distribution for Audio,” in Proc. ICME (Taipei, Taiwan, 2004 June).

[51] K. H. Choo, E. Oh, J. H. Kim, and C. Y. Son, “Enhanced Performance in the Functionality of Fine Grain Scalability,” presented at the 119th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 53, pp. 1227, 1228 (2005 Dec.), convention paper 6597.

PAPERS ISO/IEC MPEG-4 SCALABLE AAC

J. Audio Eng. Soc., Vol. 55, No. 1/2, 2007 January/February 41

[52] ISO/IEC 14496-1:2004, “Coding of Audio-Visual Objects—Part 1: Systems,” International Standards Organization, Geneva, Switzerland (2004).

[53] M. Hans and R. W. Schafer, “Lossless Compression of Digital Audio,” IEEE Signal Process. Mag., vol. 18, pp. 21–32 (2001 July).

[54] AES Technical Committee on Coding of Audio Signals, “Perceptual Audio Coders: What to Listen For,” CD-ROM with tutorial information and audio examples, Audio Engineering Society, New York (2001).

[55] ITU-R BS.1548-1, “User Requirements for Audio Coding Systems for Digital Broadcasting,” International Telecommunication Union, Geneva, Switzerland (2001–2002).

[56] ITU-R BS.1116-1, “Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems,” International Telecommunication Union, Geneva, Switzerland (1994).

[57] ITU-R BS.1387-1, “Method for Objective Measurements of Perceived Audio Quality,” International Telecommunication Union, Geneva, Switzerland (1998).

[58] R. Geiger, M. Schmidt, J. Herre, and R. Yu, “MPEG-4 SLS—Lossless and Near-Lossless Audio Coding Based on MPEG-4 AAC,” presented at the International Symposium on Communications, Control and Signal Processing, Marrakech, Morocco (2006 Mar.).

[59] ISO/IEC JTC1/SC29/WG11, “Verification Report on MPEG-4 SLS,” MPEG2005/N7687, Nice, France (2005 Oct.).

[60] ISO/IEC JTC1/SC29/WG11, “Revised Report on Complexity of MPEG-2 AAC Tools,” MPEG1999/N2957, Melbourne, Australia (1999 Oct.), http://www.chiariglione.org/mpeg/working_documents/mpeg-02/audio/AAC_tool_complexity(rev).zip

THE AUTHORS

R. Geiger R. Yu J. Herre S. Rahardja

S.-W. Kim X. Lin M. Schmidt

Ralf Geiger received a diploma degree in mathematics from the University of Regensburg, Regensburg, Germany, in 1997.

In 1998 he joined the Audio/Multimedia Department at the Fraunhofer Institute for Integrated Circuits (IIS), Erlangen, Germany. From 2000 to 2004 he was with the Fraunhofer Institute for Digital Media Technology (IDMT), Ilmenau, Germany. In 2005 he returned to Fraunhofer IIS.

Mr. Geiger is working on the development and standardization of perceptual and lossless audio coding schemes. He is a coeditor of the ISO/IEC standard “MPEG-4 Scalable Lossless Coding (SLS).”


Rongshan Yu received a B.Eng. degree from Shanghai Jiaotong University, Shanghai, P. R. China, in 1995, and M.Eng. and Ph.D. degrees from the National University of Singapore in 2000 and 2005, respectively.

He was with the Centre for Signal Processing, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, from 1999 to 2001, and with the Institute for Infocomm Research (I2R), A*STAR, Singapore, from 2001 to 2005. He is currently with Dolby Laboratories, San Francisco, CA, USA. His research interests include audio coding, data compression, and digital signal processing.

Dr. Yu received the Tan Kah Kee Young Inventors’ Gold Award (Open Category) in 2003. He has been working on the development and standardization of MPEG lossless audio coding schemes and coedited the ISO/IEC standard “MPEG-4 Scalable Lossless Coding (SLS).”


Jürgen Herre joined the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, Germany, in 1989. Since then he has been involved in the development of perceptual coding algorithms for high-quality audio, including the well-known ISO/MPEG-Audio Layer III coder (aka MP3). In 1995 he joined Bell Laboratories for a postdoctoral term, working on the development of MPEG-2 advanced audio coding (AAC). Since the end of 1996 he has been back at Fraunhofer, working on the development of advanced multimedia technology, including MPEG-4, MPEG-7, and secure delivery of audiovisual content, currently as the chief scientist for the audio/multimedia activities at Fraunhofer IIS, Erlangen.

Dr. Herre is a fellow of the Audio Engineering Society, cochair of the AES Technical Committee on Coding of Audio Signals, and vice chair of the AES Technical Council. He is also an IEEE senior member, served as an associate editor of the IEEE Transactions on Speech and Audio Processing, and is an active member of the MPEG audio subgroup. Outside his professional life he is a dedicated amateur musician.


Susanto Rahardja received a Ph.D. degree in electrical and electronic engineering from the Nanyang Technological University, Singapore.

He joined the Centre for Signal Processing, Nanyang Technological University, in 1996 and has been a faculty member at its School of Electrical and Electronic Engineering since 2001. In 2002 he joined the Institute for Infocomm Research (I2R) and is currently the director of its Media Division. He oversees research in signal processing (audio coding, video/image processing), media analysis (text/speech, image, video), media security (biometrics, computer vision, and surveillance), and sensor networks.

Dr. Rahardja has published more than 150 papers in international journals and conferences in the areas of digital communications, signal processing, and logic synthesis. He was the recipient of the IEE Hartree Premium Award for the best journal paper published in the IEE Proceedings in 2001 and the Tan Kah Kee Young Inventors’ Gold Award (Open Category) in 2003. He has served in various IEEE and SPIE-related professional activities in the area of multimedia and is actively involved in technical committees of the IEEE Circuits and Systems and Signal Processing societies. He is an associate editor of the Journal of Visual Communication and Image Representation. Since 2002 he has been an active participant in MPEG and has been involved in the development of MPEG-4 Audio, where he is one of the contributors to MPEG-4 SLS and ALS.


Sang-Wook Kim received a B.A. degree in electrical engineering from Yonsei University, Korea, in 1989. He completed his master’s degree in 1991 at the Korea Advanced Institute of Science and Technology, where he researched digital signal processing.


Xiao Lin received a Ph.D. degree from the Electronics and Computer Science Department of the University of Southampton, Southampton, UK, in 1993.

He worked at the Centre for Signal Processing, Nanyang Technological University, Singapore, as a research fellow and senior research fellow for about five years. Subsequently he joined DeSOC Technology as a technical director and then, in 2002, the Institute for Infocomm Research. There he was a member of technical staff, lead scientist, and principal scientist, and managed the Media Processing Department until 2006. He is now with Fortemedia Inc. as a senior director. He has actively participated in the development of international standards such as MPEG-4, JPEG2000, and JVT.

Dr. Lin is a senior member of the IEEE.


Markus Schmidt received a Dipl.-Ing. degree in media technology from the Technical University of Ilmenau, Germany, in 2004. During his studies he spent a year at the University of Strathclyde in Glasgow, Scotland.

After completing an internship at the Fraunhofer Institute for Digital Media Technology (IDMT) in Ilmenau, he joined the Fraunhofer Institute for Integrated Circuits (IIS), Erlangen, Germany, in 2005. His research interests there include low-delay and lossless audio coding schemes and their implementation in real-time environments.
