
MPEG-4 AUDIO OVERVIEW

Feb 10, 2016

Page 1: MPEG-4 AUDIO  OVERVIEW

Roberta Eklund
Consultant

MPEG-4 AUDIO OVERVIEW

Page 2: MPEG-4 AUDIO  OVERVIEW

MPEG-4 Audio Overview

Natural Audio
    – T/F
    – CELP
    – PARA

Structured Audio
    – SAOL
    – SASL
    – SASBF
    – MIDI, DLS version 2
    – TTS

Cross Tool (Algorithm) Functionality
    – Pitch/tempo change
    – Bitrate scalability
    – Computation complexity scalability
    – Error robustness
    – Audio related effects
    – Acoustic virtualization

Page 3: MPEG-4 AUDIO  OVERVIEW

Different Tools for Bitrates/Application

[Figure: coder families plotted against bit-rate (kbps, axis marked 2, 4, 6, 8, 10, 12, 14, 16, 24, 32, 48, 64) and typical audio bandwidth (4 kHz, 8 kHz, 20 kHz). Coders shown: parametric coder, CELP coder, ITU-T coder, T/F coder, scalable coder. Application labels: Satellite; UMTS, Cellular; DAM, Internet; DCME; ISDN.]

Page 4: MPEG-4 AUDIO  OVERVIEW

MPEG-4 Audio Tools: PROFILES

Object Profile - defines the syntax of the bitstream for one single Object that can represent a meaningful entity in the Audio or Visual scene. Elementary bitstream.

Composition Profile - defines which different Object Profiles can be combined in the Audio or Visual scene. Combinations of Elementary bitstreams.

Page 5: MPEG-4 AUDIO  OVERVIEW

OBJECT PROFILES

Profile             | Hierarchy                | Tools supported                                                                                         | Object Profile ID
reserved            |                          |                                                                                                         | 0
AAC Main            | contains AAC LC          | 13818-7 main profile, PNS                                                                               | 1
AAC LC              |                          | 13818-7 LC profile, PNS                                                                                 | 2
AAC SSR             |                          | 13818-7 SSR profile, PNS                                                                                | 3
T/F                 |                          | 13818-7 LC, PNS, LTP                                                                                    | 4
T/F Main scalable   | contains T/F LC scalable | 13818-7 main, PNS, LTP, BSAC, tools for large step scalability (TLSS), core codecs: CELP, TwinVQ, HILN | 5
T/F LC scalable     |                          | 13818-7 LC, PNS, LTP, BSAC, tools for large step scalability (TLSS), core codecs: CELP, TwinVQ, HILN   | 6
TwinVQ core         |                          | TwinVQ                                                                                                  | 7
CELP                |                          | CELP                                                                                                    | 8
HVXC                |                          | HVXC                                                                                                    | 9
HILN                |                          | HILN                                                                                                    | 10
TTSI                |                          | Text-To-Speech Interface                                                                                | 11
Main Synthetic      | Superset Wavetable       | all structured audio tools                                                                              | 12
Wavetable Synthesis |                          | SASBF, MIDI                                                                                             | 13
reserved            |                          |                                                                                                         | 14
reserved            |                          |                                                                                                         | 15
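
The Object Profile IDs on this slide can be captured as a small lookup. The following is only a sketch built from the table as transcribed: the enum and function names are invented for illustration, and nothing here is taken from the standard's bitstream syntax.

```python
from enum import IntEnum


class AudioObjectProfile(IntEnum):
    """Object Profile IDs as listed on the slide (hypothetical naming)."""
    AAC_MAIN = 1
    AAC_LC = 2
    AAC_SSR = 3
    TF = 4
    TF_MAIN_SCALABLE = 5
    TF_LC_SCALABLE = 6
    TWINVQ_CORE = 7
    CELP = 8
    HVXC = 9
    HILN = 10
    TTSI = 11
    MAIN_SYNTHETIC = 12
    WAVETABLE_SYNTHESIS = 13


# IDs that the slide marks as reserved.
RESERVED_IDS = frozenset({0, 14, 15})


def describe_profile(profile_id: int) -> str:
    """Return the profile name for an ID, or 'reserved' for reserved IDs."""
    if not 0 <= profile_id <= 15:
        raise ValueError("the slide only lists IDs 0 to 15")
    if profile_id in RESERVED_IDS:
        return "reserved"
    return AudioObjectProfile(profile_id).name


print(describe_profile(8))
print(describe_profile(14))
```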

Page 6: MPEG-4 AUDIO  OVERVIEW

Combination Profiles

Combination Profile | Hierarchy                                        | Audio Object Profiles supported
Main                | Contains Scalable, Speech and Low Rate Synthetic | AAC Main, LC, SSR; T/F, T/F Main Scalable, T/F LC Scalable; TwinVQ core; CELP; HVXC; HILN; Main Synthetic; TTSI
Scalable            | Contains Speech                                  | T/F LC Scalable; AAC-LC and/or T/F; CELP; HVXC; TwinVQ core; HILN; Wavetable Synthesis; TTSI
Speech              |                                                  | CELP; HVXC; TTSI
Low Rate Synthesis  |                                                  | Wavetable Synthesis; TTSI
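
One way to read the hierarchy column is that a combination profile accepts its own object profiles plus those of every profile it contains. The sketch below encodes the table that way. The dictionary layout, the function name `supports`, and the reading of "Main contains Scalable, Speech and Low Rate Synthesis" are assumptions drawn from the flattened slide, not from the standard.

```python
# Object profiles listed directly against each combination profile on the slide.
SUPPORTED = {
    "Main": {
        "AAC Main", "AAC LC", "AAC SSR", "T/F", "T/F Main Scalable",
        "T/F LC Scalable", "TwinVQ core", "CELP", "HVXC", "HILN",
        "Main Synthetic", "TTSI",
    },
    "Scalable": {
        "T/F LC Scalable", "AAC LC", "T/F", "CELP", "HVXC",
        "TwinVQ core", "HILN", "Wavetable Synthesis", "TTSI",
    },
    "Speech": {"CELP", "HVXC", "TTSI"},
    "Low Rate Synthesis": {"Wavetable Synthesis", "TTSI"},
}

# Hierarchy column: which combination profiles each one contains (assumed reading).
CONTAINS = {
    "Main": ("Scalable", "Speech", "Low Rate Synthesis"),
    "Scalable": ("Speech",),
    "Speech": (),
    "Low Rate Synthesis": (),
}


def supports(combination: str, object_profile: str) -> bool:
    """True if the combination profile, or any profile it contains, lists the object profile."""
    if object_profile in SUPPORTED[combination]:
        return True
    return any(supports(child, object_profile) for child in CONTAINS[combination])


print(supports("Main", "Wavetable Synthesis"))
print(supports("Speech", "HILN"))
```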

Page 7: MPEG-4 AUDIO  OVERVIEW

MPEG-4 Encoder Structure

[Figure: encoder block diagram. Labels: audio signal, pre-processing, separation, signal analysis and control, PARA core, CELP core, T/F core, multiplex, bit stream.]

Page 8: MPEG-4 AUDIO  OVERVIEW

MPEG-4 T/F Encoder Configuration

[Figure: T/F encoder block diagram, from input time signal to coded audio stream. Group labels: Psychoacoustic Model, Spectral Processing, Quantization and Coding. Blocks: Perceptual Model, Bark Scale to Scalefactor Band Mapping, Window Length Decision, AAC Gain Control Tool, Filterbank, Spectral Normalization, TNS, Prediction, Intensity/Coupling, M/S, AAC Quantization and Coding, Twin VQ, BSAC Quantization and Coding, Bitstream Formatter. Legend: Data, Control.]
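
Of the spectral processing blocks in this diagram, M/S is the easiest to illustrate. The sketch below assumes the usual sum/difference definition (mid is the average of left and right, side is half their difference) and applies it to plain lists of values. In a real coder it would operate on spectral coefficients and be switched per band, which is not modeled here; the function names are illustrative.

```python
def ms_encode(left, right):
    """Convert left/right values to mid/side (sum/difference) form."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [1.0, 0.5]
right = [1.0, -0.5]
mid, side = ms_encode(left, right)
# Nearly identical channels give a small side signal, which is cheap to code.
print(mid, side)
print(ms_decode(mid, side))
```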

Page 9: MPEG-4 AUDIO  OVERVIEW

MPEG-4 T/F Decoder Configuration

[Figure: T/F decoder block diagram, from coded audio stream to output time signal. Group labels: Decoding and Inverse Quantization, Spectral Processing. Blocks: Bitstream Formatter, AAC Quantization and Coding, Twin VQ, BSAC Quantization and Coding, Spectral Normalization, M/S, Intensity/Coupling, Prediction, TNS, Filterbank, AAC Gain Control Tool. Legend: Data, Control.]

Page 10: MPEG-4 AUDIO  OVERVIEW

[Figure: Block Diagram of CELP Encoder. Labels: audio signal, LPC analysis and quant., excitation signal generator, LPC synthesis filter, spectral weighting filter, error minimization, coding, bitstream.]
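
The loop formed by the excitation signal generator, LPC synthesis filter and error minimization blocks is commonly called analysis-by-synthesis. A minimal sketch follows, under stated simplifications: a tiny fixed codebook, one set of LPC coefficients with the convention A(z) = 1 + a1*z^-1 + ..., plain squared error, and no spectral weighting filter. All names and values are illustrative and not taken from the MPEG-4 CELP tool.

```python
def lpc_synthesize(excitation, lpc_coeffs):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_k a_k * y[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for k, a_k in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                y -= a_k * out[n - k]
        out.append(y)
    return out


def search_codebook(target, codebook, lpc_coeffs):
    """Try every codebook vector; return (index, gain, error) of the best match."""
    best = None
    for index, vector in enumerate(codebook):
        synth = lpc_synthesize(vector, lpc_coeffs)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # a silent candidate cannot match anything
        # Least-squares optimal gain for this candidate.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
lpc = [-0.5]
target = [0.0, 2.0, 1.0, 0.5]  # equals 2 x the synthesized second codebook vector
print(search_codebook(target, codebook, lpc))
```

Only the winning index and gain (plus the quantized LPC parameters) would need to go into the bitstream, which is where the bit-rate saving comes from.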

Page 11: MPEG-4 AUDIO  OVERVIEW

[Figure: Block Diagram of CELP Decoder. Signal path: bitstream, decoding, excitation signal generator, LPC synthesis filter, audio signal.]

Excitation signal generator:
    – codebook
    – regular pulse excitation (RPE)
    – multi-pulse excitation (MPE)
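
The decoder path can be sketched in a few lines: each frame selects an excitation vector and a gain, and the scaled excitation drives the LPC synthesis filter. This is a simplified illustration, assuming a plain codebook excitation and one fixed set of LPC coefficients (a real decoder updates them from the bitstream). The frame format `(index, gain)` and all names are hypothetical. The filter memory is carried across frame boundaries so that the output stays continuous.

```python
def celp_decode(frames, codebook, lpc_coeffs):
    """Decode (index, gain) frames into samples through an all-pole synthesis filter.

    Filter convention: y[n] = excitation[n] - sum_k a_k * y[n-k].
    """
    history = [0.0] * len(lpc_coeffs)  # most recent output sample first
    output = []
    for index, gain in frames:
        for value in codebook[index]:
            sample = gain * value - sum(a * h for a, h in zip(lpc_coeffs, history))
            history = [sample] + history[:-1]
            output.append(sample)
    return output


codebook = [[1.0, 0.0], [0.0, 0.0]]
frames = [(0, 2.0), (1, 1.0)]  # a scaled pulse, then a silent frame
# The filter keeps ringing through the silent second frame.
print(celp_decode(frames, codebook, [-0.5]))
```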

Page 12: MPEG-4 AUDIO  OVERVIEW

[Figure: Block Diagram of PARA Encoder. Labels: audio signal, perception model, model based separation, parameter estimation, parameter coding; harmonic components, individual components and noise components, each with its own quantization and coding; multiplex, bitstream.]

Page 13: MPEG-4 AUDIO  OVERVIEW

[Figure: Block Diagram of PARA Decoder. Labels: bitstream, parameter decoding, harmonic components, individual components, noise components, synthesis, audio signal.]

Page 14: MPEG-4 AUDIO  OVERVIEW

PARA is Two Codecs in One

Two operating modes
    harmonic and noise components (HVXC)
        – for speech coding at 2...4 kbps
    harmonic and individual sinusoidal components + noise (HILN)
        – for coding of music signals with low complexity content (e.g. single instruments) at 4...16 kbps

combination of both modes
    – support by syntax, defined transition
    – automatic mode selector
    – cross fade from one signal to another one
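
The slide does not say how the cross fade between the two decoder outputs is shaped. As a generic illustration only, the sketch below uses a linear ramp over a transition region of equal-length sample lists; the function name and the ramp shape are assumptions, not the standardized transition.

```python
def cross_fade(fade_out, fade_in):
    """Linearly blend from `fade_out` to `fade_in` over their common length."""
    if len(fade_out) != len(fade_in):
        raise ValueError("both signals must cover the same transition region")
    n = len(fade_out)
    blended = []
    for i, (a, b) in enumerate(zip(fade_out, fade_in)):
        w = (i + 0.5) / n  # weight of the incoming signal, rising from near 0 to near 1
        blended.append((1.0 - w) * a + w * b)
    return blended


hvxc_tail = [1.0, 1.0]  # stand-in for the end of one mode's output
hiln_head = [0.0, 0.0]  # stand-in for the start of the other mode's output
print(cross_fade(hvxc_tail, hiln_head))
```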

Page 15: MPEG-4 AUDIO  OVERVIEW

Text-to-Speech

Phonemic (language-independent) syntax
Prosody, timing cues
Language, dialect, gender, age parameters
Automatic synchronization with FBA
Exact TTS synthesis non-normative; only interface is specified

Page 16: MPEG-4 AUDIO  OVERVIEW

Structured Audio

Structured Audio - Sound coding using structured descriptions
Structured Audio decoder - music and sound-effect synthesis
MMA, Microsoft, EMU now collaborating on MIDI DLS-version 2 in MPEG4

Page 17: MPEG-4 AUDIO  OVERVIEW

SAOL

Downloadable BNF synthesis grammar
Header contains description of several synthesizers and effects processors, control algorithms and routing instructions for audio flow of control
SAOL has 100 primitive processing instructions, signal generators and operators which fill wavetables with data.

Page 18: MPEG-4 AUDIO  OVERVIEW

SASL and MIDI

New format for describing control parameters
    - Basically a scheduler of audio events
    - Designed to interface well with SAOL
    - New Control Language Similar to MIDI

MIDI (Musical Instrument Digital Interface)
    – Simpler format for describing control
    – Included as alternate control method
    – Leverages existing authoring tools
    – Gives “backwards compatibility” to SA
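
The "scheduler of audio events" idea can be illustrated with a time-ordered queue. This sketch does not use SASL syntax: the class name, the event fields (time, instrument, parameters) and the polling method are hypothetical and only show how a score might hand due events to an orchestra.

```python
import heapq


class ScoreScheduler:
    """Holds timed score events and releases them in time order."""

    def __init__(self):
        self._queue = []
        self._counter = 0  # tie-breaker so equal-time events keep insertion order

    def add_event(self, time, instrument, params):
        heapq.heappush(self._queue, (time, self._counter, instrument, params))
        self._counter += 1

    def events_due(self, now):
        """Pop and return every event whose time is at or before `now`."""
        due = []
        while self._queue and self._queue[0][0] <= now:
            time, _, instrument, params = heapq.heappop(self._queue)
            due.append((time, instrument, params))
        return due


score = ScoreScheduler()
score.add_event(1.0, "bass", (36,))
score.add_event(0.0, "piano", (60,))
score.add_event(2.0, "piano", (64,))
print(score.events_due(1.0))  # the piano event at 0.0, then the bass event at 1.0
```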

Page 19: MPEG-4 AUDIO  OVERVIEW

DLS Level 2

Aims at consistent synthetic audio playback across wide range of platforms
Defines a simple wavetable synthesizer
Bitstream includes sound samples
Score expressed in MIDI
Growing support from both software and hardware developers
    – DLS Part of DirectMusic in Microsoft’s DirectX 6.0

Page 20: MPEG-4 AUDIO  OVERVIEW

DLS-2 synthesizer model

Simple yet powerful structure, much like many existing synthesizers on the market (e.g. in PC soundcards)
    – Uses loopable samples as sound sources (wavetable)
    – variable routing of control sources
        - 2 envelopes for amplitude control
        - 2 low frequency oscillators
        - 1-pole dynamic low-pass filter
    – Standardized response to MIDI controllers
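
The "loopable samples" item means a short recording can sustain a note of any length by repeating a marked region. Below is a minimal sketch of that playback logic, assuming the loop covers `sample[loop_start:loop_end]` and ignoring pitch shifting, envelopes, LFOs and filtering; the names are illustrative and not DLS field names.

```python
def play_looped(sample, loop_start, loop_end, num_frames):
    """Play `sample` from the start, then repeat sample[loop_start:loop_end] as needed."""
    if not 0 <= loop_start < loop_end <= len(sample):
        raise ValueError("loop region must lie inside the sample")
    out = []
    pos = 0
    for _ in range(num_frames):
        out.append(sample[pos])
        pos += 1
        if pos >= loop_end:
            pos = loop_start  # jump back to the start of the loop region
    return out


wave = [10, 20, 30, 40]  # the attack is wave[0]; the loop region is wave[1:3]
print(play_looped(wave, 1, 3, 7))
```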


Page 21: MPEG-4 AUDIO  OVERVIEW

Audio BIFS

[Figure: AudioBIFS scene graph. AudioSource nodes: Piano (SA), Bass (SA), Finger snaps (Parametric). Processing nodes: AudioFX, AudioDelay, AudioMix. Other labels: audio channels, HRTF, BIFS stuff.]

Synchronization with Visual!
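
The scene graph in this figure routes decoded sources through processing nodes before presentation. The sketch below imitates two of those nodes on plain sample lists. The function names echo the node names, but the signatures and the gain-matrix layout (`matrix[output][input]`) are assumptions for illustration rather than the BIFS node definitions.

```python
def audio_delay(channel, delay_samples):
    """Delay a channel by prepending silence, keeping the original length."""
    if delay_samples <= 0:
        return list(channel)
    return ([0.0] * delay_samples + list(channel))[:len(channel)]


def audio_mix(inputs, matrix):
    """Mix input channels into output channels; matrix[o][i] is the gain from input i to output o."""
    if not inputs:
        raise ValueError("at least one input channel is required")
    length = len(inputs[0])
    if any(len(ch) != length for ch in inputs):
        raise ValueError("all input channels must have the same length")
    if any(len(row) != len(inputs) for row in matrix):
        raise ValueError("each matrix row needs one gain per input channel")
    return [
        [sum(gain * ch[t] for gain, ch in zip(row, inputs)) for t in range(length)]
        for row in matrix
    ]


piano = [1.0, 0.0, 0.0]  # stand-ins for decoded AudioSource outputs
bass = [0.0, 1.0, 0.0]
delayed_piano = audio_delay(piano, 1)
print(audio_mix([delayed_piano, bass], [[0.5, 0.5]]))  # two sources mixed to one channel
```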

Page 22: MPEG-4 AUDIO  OVERVIEW

Demo Audio BIFS

Page 23: MPEG-4 AUDIO  OVERVIEW

Conclusion

MPEG-4 Audio attempts to offer solutions to all spectra of sound.
Some of the tools are more stable, while others are still in Research and Development.
MPEG-2 AAC is the best multi-channel lossy audio compression standard to date.

Page 24: MPEG-4 AUDIO  OVERVIEW

Acknowledgements

I would like to thank the authors from the references for providing the material presented here today.

Page 25: MPEG-4 AUDIO  OVERVIEW

Definitions

T/F     Time/Frequency (MDCT transform)
AAC     Advanced Audio Coding
PARA    Parametric
CELP    Code Excited Linear Prediction
SA      Structured Audio
PNS     Perceptual Noise Substitution
HVXC    Harmonic Vector eXcitation Coding
HILN    Harmonic and Individual Line + Noise
SAOL    Structured Audio Orchestra Language
SASL    Structured Audio Score Language
MIDI    Musical Instrument Digital Interface
TTS     Text to Speech

Page 26: MPEG-4 AUDIO  OVERVIEW

More Definitions

CD          Committee Draft
IS13818-7   Advanced Audio Coding
LC          Low Complexity
BSAC        Bit Sliced Arithmetic Coding
SSR         Scalable Sample Rate
PNS         Perceptual Noise Substitution
VBR         Variable Bit Rate
TLSS        Tools for Large Step Scalability
SNHC        Synthetic/Natural Hybrid Coding
DLS         Downloadable Samples

Page 27: MPEG-4 AUDIO  OVERVIEW

Natural Audio Complexity

1-channel audio    | AAC  | AAC-LC | T/F main | TwinVQ | HILN            | HVXC | NB-CELP | WB-CELP
RAM (Words)        | 4256 | 2232   | 4346     | 4240   | 3000            | 1500 | 650     | 830
ROM (Words)        | 3545 | 3545   | 3618     | 43000  | 4000            | 7700 | 2300    | 1000
min. Word Length   | >=20 | >=20   | >=20     | >=20   | 16              | 16   | 16      | 24 (16)
Samp. Rate (kHz)   | 48   | 48     | 48       | 48     | 8               | 8    | 8       | 16
MOPS/MIPS          | 5    | 3      | 6        | 3      | 4 typ., 10 max. | 2    | 2       | 4

Page 28: MPEG-4 AUDIO  OVERVIEW

AAC Decoder Complexity Evaluation

MPEG AAC Decoder         | Complexity
2-channel Main Profile   | 40% of 133 MHz Pentium
2-channel Low Complexity | 25% of 133 MHz Pentium
5-channel Main Profile   | 90 sq. mm die, 0.5 micron CMOS
5-channel Low Complexity | 60 sq. mm die, 0.5 micron CMOS

Page 29: MPEG-4 AUDIO  OVERVIEW

AAC Test Results

Test at BBC and NHK according to ITU-R BS.1116
    – triple-stimulus/hidden-reference/double-blind
    – ITU-R 5-point impairment scale
    – 95% Confidence Intervals
MPEG AAC provides “indistinguishable” quality at 320 kb/s per five channels
MPEG AAC at 320 kb/s outperforms MPEG BC Layer II at 640 kb/s per five channels
Recent Stereo Tests at NHK Showed MPEG AAC provides “indistinguishable” quality at 128 kb/s per two channels
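
The quoted rates are totals across all channels, so a per-channel figure makes the comparison easier to read. The short calculation below uses only the numbers on this slide; the helper name is illustrative.

```python
def kbps_per_channel(total_kbps, channels):
    """Average bit rate available to each channel."""
    return total_kbps / channels


aac_five_channel = kbps_per_channel(320, 5)      # AAC, five channels
layer2_five_channel = kbps_per_channel(640, 5)   # Layer II, five channels
aac_stereo = kbps_per_channel(128, 2)            # AAC, two channels

print(aac_five_channel, layer2_five_channel, aac_stereo)
# In both AAC tests the rate per channel is the same, and it is half of what Layer II was given.
print(layer2_five_channel / aac_five_channel)
```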

Page 30: MPEG-4 AUDIO  OVERVIEW

References

M. Bosi, E. Scheirer, B. Edler, Peter G. Schreiner, MPEG-4 Seminar, Fribourg, Switzerland, 1997

S. Quackenbush, “Coding of Natural Audio in MPEG-4”, Proc. IEEE ICASSP, Seattle, 1998

B. Grill, B. Edler, I. Kaneko, Y. Lee, M. Nishiguchi, E. Scheirer, and M. Väänänen (Eds.), ISO 14496-3 (MPEG-4 Audio) Committee Draft, MPEG document N1903

E. Scheirer, “The MPEG-4 Structured Audio Standard”, Proc. IEEE ICASSP, Seattle, 1998

Juergen Herre, “Updated Description for Perceptual Noise Substitution Tool”, MPEG Document M2692

E. Scheirer, R. Väänänen, J. Huopaniemi, “AudioBIFS: The MPEG-4 Standard for Effects Processing”, AES, SF, 1998

Overview: http://www.cselt.it/mpeg/standards/mpeg-4/mpeg-4.htm