Page 1:

Perceptual Audio Rendering

Nicolas Tsingos
Dolby Laboratories

nicolas.tsingos@dolby.com

Page 2:

Motivation

Many applications require processing hundreds of audio streams in real time

games/simulators, multi-track mixing, etc.

(Images © Eden Games, © Steinberg)

Page 3:

Massive audio processing

Often exceeds available resources

Limited CPU or hardware processing

Bus traffic

Typically involves

individual processing

mix-down of all signals to outputs

3D audio rendering
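To make the cost concrete, here is a minimal Python/NumPy sketch of this conventional pipeline: every source is gained, panned to each output channel, and summed into the mix, so cost grows linearly with the number of sources. Function and parameter names are illustrative, not taken from the slides.

import numpy as np

def naive_mixdown(source_frames, gains, pan_weights):
    # source_frames: (n_sources, frame_len), gains: (n_sources,),
    # pan_weights: (n_sources, n_channels)
    n_channels = pan_weights.shape[1]
    mix = np.zeros((n_channels, source_frames.shape[1]))
    for s, frame in enumerate(source_frames):
        processed = gains[s] * frame                 # per-source processing
        for c in range(n_channels):
            mix[c] += pan_weights[s, c] * processed  # 3D panning + mix-down
    return mix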

Page 4:

Perceptual audio rendering

Perceptually-based processing

Many sources and efficient DSP effects

Level of detail rendering

Independent of reproduction system

Extended sound sources

Sound reflections

Page 5:

Leveraging limitations of human hearing

A large part of complex sound mixtures is likely to be perceptually irrelevant

e.g., auditory masking

Limitations of spatial hearing

e.g., localization accuracy, ventriloquism

Page 6:

Perceptual audio rendering components

[Diagram: sources → masking → clustering → progressive processing → listener]

Page 7:

Masking

Page 8:

Real-time masking evaluation

Remove inaudible sources

Fetch and process only perceptually relevant input

Different from invisible or occluded sound sources

Estimate inter-source masking

Build upon perceptual audio coding work

Computing audibility threshold requires knowledge of signal characteristics

Page 9:

Signal characteristics

Pre-computed for short time-frames (20 ms)

power spectrum

tonality index in [0,1] (1 = tone, 0 = noise)

[Figure: descriptors are computed per time-frame over the pre-recorded signal]
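A minimal sketch of this pre-computation step, assuming NumPy array input and using spectral flatness as a stand-in for the tonality measure (the original work follows perceptual-coding practice):

import numpy as np

def precompute_descriptors(signal, sr, frame_ms=20):
    # Per-frame power spectrum and tonality index in [0, 1]
    # (1 = tonal, 0 = noise-like); spectral flatness approximates tonality here.
    frame_len = int(sr * frame_ms / 1000)
    descriptors = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
        flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
        descriptors.append((power, float(np.clip(1.0 - flatness, 0.0, 1.0))))
    return descriptors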

Page 10:

Sort sources by decreasing loudness

Loudness relates to the sensation of sound intensity

Efficient run-time loudness evaluation

Retrieve pre-computed power spectrum for each source

Modulate by propagation effects

Convert to loudness using look-up tables [Moore92]

Greedy culling algorithm
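A sketch of the run-time loudness evaluation described above; `loudness_lut` stands in for the tabulated loudness curves of [Moore92], and only distance attenuation is shown for the propagation term.

import numpy as np

def runtime_loudness(power_spectrum, distance, loudness_lut):
    # 1. start from the pre-computed frame power spectrum,
    # 2. modulate it by propagation effects (only 1/d^2 attenuation here),
    # 3. map band powers to loudness via a pre-tabulated curve
    #    (loudness_lut = (power_dB_grid, loudness_values), increasing in dB).
    attenuated = power_spectrum / max(float(distance), 1.0) ** 2
    power_db = 10.0 * np.log10(attenuated + 1e-12)
    return float(np.sum(np.interp(power_db, loudness_lut[0], loudness_lut[1])))

Sources are then sorted by decreasing loudness before the greedy culling pass illustrated on the next slide.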

Page 11:

Masking evaluation

[Diagram: candidate sources 1-4 are added to the current mix in decreasing loudness order; after each addition the masking threshold of the current mix (power in dB) is updated; the process stops when the next candidate falls below the current masking threshold.]
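The diagram can be turned into a short greedy loop. The sketch below assumes each source carries its pre-computed loudness and power spectrum, and `spread` is a hypothetical helper that maps the mix power spectrum to a masking threshold, as in perceptual audio coders.

import numpy as np

def cull_masked_sources(sources, spread):
    # Each source is assumed to expose .loudness and .power (power spectrum).
    audible, mix_power = [], None
    for src in sorted(sources, key=lambda s: s.loudness, reverse=True):
        if mix_power is not None and np.all(src.power < spread(mix_power)):
            break  # fully masked: stop, every quieter source is masked too
        audible.append(src)
        mix_power = src.power.copy() if mix_power is None else mix_power + src.power
    return audible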

Page 12:

Clustering

Page 13:

Dynamic spatial clustering

Amortize (costly) 3D-audio processing over groups of sources

Leverage limited resolution of spatial hearing

Group neighboring sources together

Compute an “impostor” for the group

Perceptually equivalent but cheaper to render

A single point source with a complex response (the mixture of all source signals in the cluster)

Page 14:

Dynamic spatial clustering

Limited spatial perception of human hearing [Blauert, Middlebrooks]

Static sound source clustering [Herder99]

non-uniform subdivision of direction space

use Cartesian centroid as representative
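For illustration only, a rough approximation of such static direction-space clustering (a uniform azimuth grid is used here instead of the non-uniform subdivision of [Herder99]):

import numpy as np

def static_direction_clusters(positions, listener, n_azimuth_bins=12):
    # Bin sources by azimuth around the listener and represent each bin
    # by the Cartesian centroid of its members.
    rel = positions - listener
    azimuth = np.arctan2(rel[:, 1], rel[:, 0])
    bins = ((azimuth + np.pi) / (2 * np.pi) * n_azimuth_bins).astype(int) % n_azimuth_bins
    return {int(b): positions[bins == b].mean(axis=0) for b in np.unique(bins)}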

Page 15:

Group neighboring sources together

Uniform direction constraint

Log(1/distance) constraint

Weight by loudness

Hochbaum-Shmoys heuristic [Hochbaum85]

Fast hierarchical implementation

Dynamic spatial clustering
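A sketch of one way to realize this, assuming a greedy k-center (Hochbaum-Shmoys) selection in a loudness-weighted perceptual space that combines direction and log(1/distance); the fast hierarchical implementation mentioned above is omitted.

import numpy as np

def perceptual_coords(positions, listener):
    # Direction around the listener plus a log(1/distance) term, so angular
    # and logarithmic-distance errors are traded off in one metric.
    rel = positions - listener
    dist = np.linalg.norm(rel, axis=1) + 1e-6
    return np.hstack([rel / dist[:, None], -np.log(dist)[:, None]])

def hochbaum_shmoys_clusters(positions, loudness, listener, k):
    # Start from the loudest source, repeatedly promote the source farthest
    # (loudness-weighted) from the current representatives, then assign
    # every source to its nearest representative.
    coords = perceptual_coords(positions, listener)
    reps = [int(np.argmax(loudness))]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(coords - coords[r], axis=1) for r in reps], axis=0)
        reps.append(int(np.argmax(d * loudness)))
    assign = np.argmin([np.linalg.norm(coords - coords[r], axis=1) for r in reps], axis=0)
    return reps, assign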

Page 16:

Mix signals of all sources in the cluster

create a single source with a complex response

Rendering clusters
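A minimal sketch of cluster rendering: member signals are pre-mixed with their per-source gains, and the expensive spatialization is applied once, at the cluster representative. `spatialize` is an assumed callback (e.g., HRTF filtering or panning).

import numpy as np

def render_cluster(member_frames, member_gains, spatialize, rep_position):
    # Weighted sum of member frames, then a single 3D-audio pass for the cluster.
    premix = np.tensordot(member_gains, member_frames, axes=1)
    return spatialize(premix, rep_position)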

Page 17:

Dynamic spatial clustering

Page 18:

Culling and masking are transparent

rated 4.4/5 avg. (5 = indistinguishable from reference)

Clustering preserves localization cues

74% success avg. (90% within 1 meter of true location)

no significant correlation with number of clusters

Pilot validation study

Page 19:

Progressive processing

Page 20:

Progressive signal processing

A scalable pipeline for filtering and mixing many audio streams

fetch & process only perceptually relevant input

continuously adapt quality vs. speed

remain perceptually transparent

use a “standard” representation of the inputs

Page 21:

Progressive signal processing

Uses Fourier-domain coefficients for processing

Degrades both signal quality and spatial cues

Combines processing and audio coding

Uses additional signal descriptors for decision making

Page 22:

Progressive processing pipeline

[Diagram: N input frames → masking & importance sampling → process + reconstruct → 1 output frame]
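A simplified sketch of this pipeline, assuming each input frame is available as Fourier (rfft) coefficients: a global coefficient budget is split across frames in proportion to their importance, only the most energetic bins of each frame are fetched and accumulated, and one output frame is reconstructed. Per-source equalization and spatial processing are omitted for brevity.

import numpy as np

def progressive_mix(ffts, importance, budget):
    # ffts: (n_sources, n_bins) complex rfft coefficients,
    # importance: (n_sources,) non-negative weights, budget: total bins to process.
    n_bins = ffts.shape[1]
    share = np.maximum((importance / importance.sum() * budget).astype(int), 0)
    out = np.zeros(n_bins, dtype=complex)
    for spec, n_coeffs in zip(ffts, share):
        if n_coeffs == 0:
            continue  # source too unimportant for this frame
        keep = np.argsort(np.abs(spec))[-n_coeffs:]   # most energetic bins only
        out[keep] += spec[keep]
    return np.fft.irfft(out)                          # one output frame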

Page 23:

Progressive signal processing

Page 24:

Progressive processing and sound synthesis

Sound synthesis from physics-driven animation

Modal models

Resonant modes can be synthesized in Fourier domain

number of Fourier coefficients can be allocated on the fly

Balance processing costs for recorded and synthesized sounds at the same time
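A sketch of Fourier-domain modal synthesis, assuming each mode is an exponentially decaying sinusoid: its frame spectrum has a closed form, so only the requested bins around the mode frequency need to be evaluated, which is what allows the coefficient budget to be allocated on the fly. Function and parameter names are illustrative.

import numpy as np

def mode_spectrum_bins(amp, decay, omega0, phase, frame_len, bins):
    # Closed-form DFT of x[n] = amp * exp(-decay*n) * cos(omega0*n + phase),
    # n = 0..frame_len-1, evaluated only at the requested bin indices.
    k = np.asarray(bins)
    z = np.exp(-2j * np.pi * k / frame_len)
    out = np.zeros(len(k), dtype=complex)
    for sign, ph in ((1.0, phase), (-1.0, -phase)):
        r = np.exp(-decay + 1j * sign * omega0) * z
        out += 0.5 * amp * np.exp(1j * ph) * (1 - r ** frame_len) / (1 - r)
    return out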

Page 25:

Conclusions

Perceptually motivated techniques for rendering and authoring virtual auditory environments

a human listener only processes a small amount of the available information in complex situations

Extend to

more complex auditory processing model

cross-modal perception

Efficient and Practical Audio-Visual Rendering for Games using Crossmodal Perception. David Grelaud, Nicolas Bonneel, Michael Wimmer, Manuel Asselot, George Drettakis. Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, 2009.

other problems: dynamic range management, e.g., the HDR audio approach of the EA/DICE studio for Battlefield

Page 26:

Additional references

www-sop.inria.fr/reves

This work was supported by RNTL project OPERA

http://www.inria.fr/reves/OPERA

EU IST Project CREATE http://www.cs.ucl.ac.uk/create

EU FET OPEN Project CROSSMOD http://www-sop.inria.fr/reves/CrossmodPublic/