Dr. Jürgen Herre 11/07 Page 1 Fraunhofer Institut Integrierte Schaltungen Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio Jürgen Herre Fraunhofer Institut für Integrierte Schaltungen (IIS) Erlangen, Germany
Efficient Representation of Sound Images: Recent Developments in Parametric … · 2007-11-12 · Dr. Jürgen Herre 11/07 Page 1 Fraunhofer Institut Integrierte Schaltungen Efficient
Dr. Jürgen Herre 11/07 Page 1
Fraunhofer Institut Integrierte Schaltungen
Efficient Representation of Sound Images:
Recent Developments in Parametric Coding
of Spatial Audio
Jürgen Herre
Fraunhofer Institut für Integrierte Schaltungen (IIS)
Erlangen, Germany
Introduction: Sound Images … ?
• Humans live in a world of sound
• … continuously listen to the sound surrounding us
• … perceive the acoustic waves reaching the ears and interpret them
• … (re)construct an acoustic scene
• In analogy to visual perception, we form a “sound image” . . .
This talk: How to efficiently represent and reproduce sound images!
Overview
• Part I: What constitutes a “sound image”?
• Part II: Efficient coding of spatial sound images
– Perceptual audio coding
– Coding of multi-channel / surround sound
• Part III
⇒ Auditory system needs to “parse” sound into direct & ambient sound!
Spatial Sound Perception (6)
• Important role of Interaural Coherence (IC): Determines spatial extent (= width) of sound event [Schuijers03] [Faller03]
• Perceived width of auditory event increases as coherence decreases (1 → 4); eventually separates into 2 distinct events
• Max. of normalized cross-correlation:
$$\mathrm{IC} = \max_d \frac{\sum_{n=-\infty}^{\infty} e_1(n) \cdot e_2(n+d)}{\sqrt{\sum_{n=-\infty}^{\infty} e_1^2(n) \cdot \sum_{n=-\infty}^{\infty} e_2^2(n+d)}}$$ [Faller 2004]
where e_1 and e_2 are the two ear input signals
⇒ Key to source width and source/ambience perception!
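The IC definition on this slide (maximum of the normalized cross-correlation over the lag d) can be sketched for finite-length signals as follows. This is a minimal illustration, not a reference implementation: the function name `interaural_coherence` and the search range `max_lag` are assumptions, and the normalization is assumed to be the square root of the product of the two signal energies.

```python
import numpy as np

def interaural_coherence(e1, e2, max_lag):
    """Maximum of the normalized cross-correlation over lags -max_lag..max_lag."""
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    norm = np.sqrt(np.sum(e1 ** 2) * np.sum(e2 ** 2))
    if norm == 0.0:
        return 0.0  # silent input: IC is undefined, 0 is returned by convention here
    # full[d + len(e1) - 1] equals sum_n e1(n) * e2(n + d). Samples outside the
    # arrays count as zero, which mirrors the infinite sums of the definition.
    full = np.correlate(e2, e1, mode="full")
    zero_lag = len(e1) - 1
    lo = max(zero_lag - max_lag, 0)
    hi = min(zero_lag + max_lag, len(full) - 1)
    return float(np.max(full[lo:hi + 1]) / norm)
```

Identical or merely delayed signals give a value close to 1 (a compact sound event), while independent noise signals give a value close to 0 (a wide or ambient impression).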
Some A↔V Analogies in Scene Attributes
Visual Domain        Auditory Domain (→ auditory cues)
Foreground object    Sound sources (→ high IC), directional
- Object position    - Object position (→ ILD/ITD/IPD for lateral position)
- Object size        - Object size (→ IC)
Background           Ambience (→ low IC), non-directional
Part II:
Efficient Coding Of Spatial Sound Images
Basics of Audio Coding
• Goal: Represent audio data as compactly as possible while maintaining sound quality (ideally: “transparent” coding)
• Predominant approach: Concept of perceptual audio coding
– Optimize subjective quality rather than objective distortion metrics (e.g. MSE/SNR)
– Use knowledge about signal receiver: Psychoacoustics gives limits of perception
– Keep coding distortion below limits!
– No universal source model available (unlike in speech coding)
Psychoacoustics
[Figure: level L (0 to 80 dB) versus frequency f (0.02 to 20 kHz), showing the threshold in quiet, a masker, the resulting masking threshold, a masked sound and an inaudible signal]
Demonstration: The "13 dB Miracle"
• Original signal
• Original + white noise, SNR = 13.6 dB
• Original + noise at threshold, SNR = 13.6 dB
• Difference signal: White noise
• Difference signal: Noise at threshold
Historic demonstration by James D. Johnston and Karlheinz Brandenburg at AT&T Bell Laboratories in 1990 using the best psychoacoustic model available at that time ...
⇒ SNR does not adequately describe subjective sound quality!
⇒ Putting psychoacoustics to work makes a huge difference!
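The point that two very different noise signals can share the same SNR can be reproduced numerically. The sketch below is only an illustration under stated assumptions: the 1 kHz test tone, the 16 kHz sample rate, and the 800 to 1200 Hz band are arbitrary choices, and the band-limited noise is a crude stand-in for noise shaped to a masking threshold, not the psychoacoustic model used in the historic demonstration. All function names are hypothetical.

```python
import numpy as np

def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB, with the noise taken as degraded - reference."""
    reference = np.asarray(reference, dtype=float)
    error = np.asarray(degraded, dtype=float) - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

def scale_noise_to_snr(signal, noise, target_snr_db):
    """Scale noise so that signal + noise has exactly the requested SNR."""
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    wanted_noise_power = np.sum(signal ** 2) / 10.0 ** (target_snr_db / 10.0)
    return noise * np.sqrt(wanted_noise_power / np.sum(noise ** 2))

def band_limited_noise(noise, sample_rate, low_hz, high_hz):
    """Keep only the noise components between low_hz and high_hz."""
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(len(noise), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(noise))

sample_rate = 16000                          # assumed value for the illustration
t = np.arange(sample_rate) / sample_rate
original = np.sin(2 * np.pi * 1000 * t)      # assumed test signal: 1 kHz tone
white = np.random.default_rng(0).standard_normal(len(t))
shaped = band_limited_noise(white, sample_rate, 800, 1200)

with_white = original + scale_noise_to_snr(original, white, 13.6)
with_shaped = original + scale_noise_to_snr(original, shaped, 13.6)
print(snr_db(original, with_white), snr_db(original, with_shaped))
```

Both degraded signals measure 13.6 dB by construction, yet the noise energy sits in very different places of the spectrum, which is exactly the property SNR cannot express.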
Basic Paradigm of (Monophonic) Perceptual Audio Coding
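[Block diagram]
Encoder: audio in → Analysis Filterbank → Quantization & Coding → Encoding of Bitstream → bitstream out (the Perceptual Model analyzes the input and controls Quantization & Coding)
Decoder: bitstream in → Decoding of Bitstream → Inverse Quantization → Synthesis Filterbank → audio out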
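The encoder and decoder chains of this paradigm can be mimicked with a toy example. This is a sketch under strong simplifications: an FFT of one frame stands in for the analysis and synthesis filterbanks, `toy_step_sizes` is a hypothetical placeholder for a perceptual model, and the bitstream coding stages are left out entirely. Real coders such as AAC use different transforms and far more elaborate models.

```python
import numpy as np

def toy_step_sizes(n_bins, base_step=0.05):
    # Hypothetical stand-in for a perceptual model: it simply allows coarser
    # quantization toward high frequencies. Not derived from psychoacoustics.
    return base_step * (1.0 + np.arange(n_bins) / n_bins)

def encode_frame(frame, step_sizes):
    """Analysis transform followed by uniform quantization per frequency bin."""
    spectrum = np.fft.rfft(frame)                 # analysis "filterbank"
    q_re = np.round(spectrum.real / step_sizes).astype(int)
    q_im = np.round(spectrum.imag / step_sizes).astype(int)
    return q_re, q_im                             # integers a bitstream coder would pack

def decode_frame(q_re, q_im, step_sizes, frame_len):
    """Inverse quantization followed by the synthesis transform."""
    spectrum = (q_re + 1j * q_im) * step_sizes    # inverse quantization
    return np.fft.irfft(spectrum, n=frame_len)    # synthesis "filterbank"
```

The step size chosen for a bin bounds the quantization error in that bin to half a step, which is how a perceptual model can keep the coding distortion below a given threshold per frequency region.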
A Real Audio Coder (MPEG-2 AAC, 1997)
[Block diagram of the encoder. Blocks shown: Input time signal, Gain Control, Filter Bank, TNS, Intensity/Coupling, Prediction, M/S, Scale Factors, Quant., Noiseless Coding, Rate/Distortion Control, Perceptual Model, Bitstream Multiplexer, Bitstream Output]
The Better Spatial Sound Image: Surround Sound / Multi-Channel Audio
• Significantly increased spatial realism over stereo, envelopment
• Origins in movie sound (5.1); now also for music, broadcasting
• Increasingly adopted in consumers’ homes
Traditional Delivery Formats For Surround Sound
• Matrixed Surround (Prologic, Neo6, …): Downmix of 5.1 sound into stereo signal, upmix at the receiver side
– Efficient in terms of transmission bandwidth (same bitrate as stereo)
– Backward compatible to stereo delivery
– Limited computation necessary
– Significant loss in subjective audio quality
• Discrete Surround (AAC, AC-3, …): Separate transmission of each channel
– Significantly higher bitrate than stereo
– Moderate amount of computation
– High subjective audio quality possible
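The downmix step of the matrixed approach can be sketched as a simple weighted sum. This is an assumption-laden illustration: the gains of 1/√2 for center and surround channels are a common choice (used, for example, in the ITU-R BS.775 downmix equations), the LFE channel is left out, and the phase shifts that actual matrix encoders such as Pro Logic apply to the surround channels are not modeled. The function name is hypothetical.

```python
import numpy as np

def downmix_5_1_to_stereo(left, right, center, left_surround, right_surround,
                          center_gain=np.sqrt(0.5), surround_gain=np.sqrt(0.5)):
    """Passive stereo downmix of the five main channels (LFE omitted)."""
    left, right, center, left_surround, right_surround = (
        np.asarray(ch, dtype=float)
        for ch in (left, right, center, left_surround, right_surround)
    )
    # The center is shared equally between both outputs; each surround
    # channel is folded into the output on its own side.
    out_left = left + center_gain * center + surround_gain * left_surround
    out_right = right + center_gain * center + surround_gain * right_surround
    return out_left, out_right
```

Five input channels collapse into two output signals, so the receiver-side upmix has to guess the original channel content. That ambiguity is one way to see where the quality loss of matrixed surround comes from.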
A Major Step Ahead: “Spatial Audio Coding”
• Rather recent development
• Compression efficiency: Transmits multi-channel audio at bitrates used for 2-channel stereo (or even mono)
• Backward compatibility: SAC multi-channel audio is coded in a backward compatible way
⇒ existing infrastructures can be seamlessly upgraded to multi-channel / surround!
• High subjective audio quality
Heavily based on exploiting perception rather than waveform coding!
• Decorrelation by QMF-domain all-pass filters
• Several tools for handling fine temporal envelope structure (both without and with additional side information)
Generalization
Spatial Parameters
Other Aspects
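The idea behind decorrelation by all-pass filtering can be illustrated with a single Schroeder-type all-pass section. This sketch works on a plain time-domain signal rather than in the QMF domain, and the parameters `delay` and `gain` are arbitrary assumptions; it does not reproduce the decorrelators specified for MPEG Surround.

```python
import numpy as np

def allpass_decorrelate(x, delay=7, gain=0.5):
    """One all-pass section: H(z) = (-gain + z^-delay) / (1 - gain * z^-delay)."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        delayed_x = x[n - delay] if n >= delay else 0.0
        delayed_y = y[n - delay] if n >= delay else 0.0
        # Difference equation of the transfer function given in the docstring.
        y[n] = -gain * x[n] + delayed_x + gain * delayed_y
    return y
```

The filter leaves the magnitude spectrum unchanged and only alters the phase, so the output keeps the spectral envelope of the input while being less correlated with it. For noise-like input a single section only brings the zero-lag correlation magnitude down to roughly `gain`; practical designs typically combine several sections to get closer to zero.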
MPEG Surround: Additional Functionalities
• Artistic Downmix: Externally created downmixes can be used
• Matrix Surround Compatibility: Stereo downmix can be made compatible with common matrix surround decoders
• Enhanced Matrix Mode: MPEG Surround decoder can decode matrix surround signal (i.e. work without side info)
• Binaural Rendering for headphones: Downmix can be generated as virtual surround, or MPEG Surround can be decoded directly into virtual surround very efficiently
⇒ Rich set of attractive features for practical application
MPEG Surround: Recent Verification Test
“Music-Store” test scenario: Stereo downmix coded using AAC at 160 kbit/s
[Figure: test results for three systems: MPEG Surround, MPEG Surround without side info, and Matrixed Surround (Dolby Prologic II)]
Part III:
Next Generation Interactive Coding / Rendering of Sound Images
From Spatial Audio Coding to Spatial Audio Object Coding
Classic MPEG-4 Interactive Scene Composition (1996ff)
[Figure: MPEG-4 audiovisual scene. Audiovisual objects (3D objects, 2D background, voice, sprite) are placed in a scene coordinate system (x, y, z); a video compositor (projection plane, hypothetical viewer) and an audio compositor produce the audiovisual presentation for display and speaker. Control / data is hierarchically multiplexed downstream and upstream; user events come from user input. Inset “Hierarchy of objects”: scene, person, voice, sprite, 2D background, furniture, globe, desk, audiovisual presentation]
Scene is composed of multiple A/V objects and can be rendered interactively
MPEG-4 Object Based (De)Coding
Discrete approach comes at a rather high price:
• Bitrate and decoding complexity grow with number of objects
• Structural complexity
From Spatial Audio Coding (SAC) to SAOC
Regular Spatial Audio Coding: Channel-oriented scheme (MPEG Surround)
[Diagram: Chan. #1, #2, #3, #4, … → SAC Encoder → Downmix signal(s) + Side Info → SAC Decoder → Chan. #1, #2, #3, #4, …]
From SAC to SAOC (2)
Alternative: Object-oriented Spatial Audio Coding
[Diagram: Obj. #1, #2, #3, #4, … → SAOC Encoder → Downmix signal(s) + Side Info → SAOC Decoder (obj. #1, #2, #3, #4, …) → Renderer, steered by User Interaction / Ctrl. → Chan. #1, Chan. #2, …]
• Processes object signals instead of channel signals
• “Mixing”/rendering parameters vary according to user interaction
• Combined obj. decoding & rendering ⇒ computationally efficient!
• Previous work by Faller & Baumgarte [2001ff] and Faller [2006]
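The object-oriented idea (one downmix plus compact side information, rendered under user control) can be sketched as follows. This is a simplified illustration and not the MPEG SAOC algorithm: the side information is assumed to be each object's share of the power per frame and frequency band, frames are non-overlapping blocks of an assumed length, and the names `saoc_like_encode`, `saoc_like_render`, `FRAME` and `BANDS` are hypothetical.

```python
import numpy as np

FRAME = 256   # assumed frame length in samples
BANDS = 8     # assumed number of parameter bands

def _band_edges(n_bins):
    return np.linspace(0, n_bins, BANDS + 1).astype(int)

def saoc_like_encode(objects):
    """objects: (n_objects, n_samples), n_samples a multiple of FRAME.
    Returns a mono downmix and each object's power share per frame and band."""
    objects = np.asarray(objects, dtype=float)
    frames = objects.reshape(objects.shape[0], -1, FRAME)
    spectra = np.fft.rfft(frames, axis=-1)
    edges = _band_edges(spectra.shape[-1])
    power = np.stack(
        [np.sum(np.abs(spectra[..., a:b]) ** 2, axis=-1)
         for a, b in zip(edges[:-1], edges[1:])],
        axis=-1,
    )                                                  # (objects, frames, bands)
    share = power / np.maximum(power.sum(axis=0, keepdims=True), 1e-12)
    return objects.sum(axis=0), share

def saoc_like_render(downmix, share, render_matrix):
    """render_matrix: (n_outputs, n_objects) gains set by the user.
    Object separation and mixing collapse into one gain per output, frame, band."""
    render_matrix = np.asarray(render_matrix, dtype=float)
    spectrum = np.fft.rfft(np.asarray(downmix, dtype=float).reshape(-1, FRAME), axis=-1)
    edges = _band_edges(spectrum.shape[-1])
    band_gain = np.einsum("co,ofb->cfb", render_matrix, share)
    bin_gain = np.repeat(band_gain, np.diff(edges), axis=-1)
    out = np.fft.irfft(spectrum[None] * bin_gain, n=FRAME, axis=-1)
    return out.reshape(render_matrix.shape[0], -1)
```

Changing a row of `render_matrix` changes the mix without ever reconstructing the individual objects, which mirrors the combined decoding and rendering named on the slide. Block-wise processing without overlap would cause audible frame-boundary artifacts in practice; it is used here only to keep the sketch short.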
Real-time interactive rendering of audio objects from a mono audio downmix + SAOC side information
New MPEG Standardization Activities
• Work on “Spatial Audio Object Coding” (SAOC) started
• Transcoding approach: “SAOC” + rendering info → MPEG Surround
• Reference model and working draft recently established (10/2007)
[Diagram: SAOC Decoder]
Conclusions
• “Sound Images” carry some analogy to images in the visual world
• Spatial Audio Coding schemes code surround sound based on perception (rather than on waveform match)
– “Object positions” are represented by perceptual spatial parameters
– “Audio Object Texture” is coded using regular mono/stereo coder
• Such schemes can bring surround sound into existing infrastructures
– High compression factor (surround sound at 64 kbps and below!)
– Stereo / mono backward compatibility
• Extension towards efficient interactive, object-based scene coding / rendering (Spatial Audio Object Coding) is currently on its way …