
MULTIZONE REPRODUCTION OF SPEECH SOUNDFIELDS: A PERCEPTUALLY WEIGHTED APPROACH

Jacob Donley and Christian Ritz

School of Electrical, Computer and Telecommunications Engineering
ICT Research Institute & Global Challenges
University of Wollongong


Room

How can we perceptually enhance independent listening zones in a room?

Quiet Zone: No reproduced sound

Bright Zone: Listening to speech or music

Loudspeakers

Known as Multizone Reproduction of Soundfields


Aim: derive loudspeaker signals to reproduce desired sound field in each zone

• Reproduced sound field modelled in the (discrete) space ($\mathbf{x}$), time ($n$), frequency ($k$) domain as:

$$S_w(\mathbf{x}, n, k) = \sum_{l=1}^{L} d_l(n, k, w) \left( \frac{j}{4} H_0^{(1)}\left( k \lVert \mathbf{x}_l - \mathbf{x} \rVert \right) \right)$$

• $H_m^{(1)}$ is the $m$th order Hankel function of the first kind
• $d_l(n, k, w)$ are the loudspeaker signals to be derived
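The model above can be evaluated numerically for one time frame and one wavenumber. The following is a minimal sketch, assuming NumPy and SciPy are available. The function name `reproduced_soundfield`, the 1 kHz test frequency, the speed of sound of 343 m/s, the two loudspeaker positions and the unit loudspeaker signals are illustrative assumptions, not values from the slides.

```python
import numpy as np
from scipy.special import hankel1


def reproduced_soundfield(points, speakers, d, k):
    """Evaluate S(x) = sum_l d_l * (j/4) * H0^(1)(k * ||x_l - x||).

    points   : (P, 2) array of evaluation positions x (metres)
    speakers : (L, 2) array of loudspeaker positions x_l (metres)
    d        : (L,) complex loudspeaker signals for one frame and wavenumber
    k        : wavenumber in rad/m
    """
    points = np.asarray(points, dtype=float)
    speakers = np.asarray(speakers, dtype=float)
    # Pairwise distances between every point and every loudspeaker: shape (P, L)
    dist = np.linalg.norm(points[:, None, :] - speakers[None, :, :], axis=-1)
    # 2-D free-field transfer function from each loudspeaker to each point
    transfer = 0.25j * hankel1(0, k * dist)
    # Superpose the loudspeaker contributions at each point
    return transfer @ np.asarray(d, dtype=complex)


# Illustrative values only (assumptions, not taken from the slides)
c = 343.0                       # speed of sound in m/s
k = 2 * np.pi * 1000.0 / c      # wavenumber at 1 kHz
speakers = np.array([[1.5, 0.0], [0.0, 1.5]])
points = np.array([[0.0, 0.0], [0.3, 0.0]])
field = reproduced_soundfield(points, speakers, d=[1.0, 1.0], k=k)
```

The loudspeaker signals are passed in as given here; deriving them is the job of the weighted orthogonal basis expansion, which this sketch does not attempt.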

[1] Donley, J. & Ritz, C., “An efficient approach to dynamically weighted multizone wideband reproduction of speech soundfields”, Proc. IEEE ChinaSIP 2015, pp. 60-64, 12-15 July 2015.
[2] Jin, W., Kleijn, W. B. & Virette, D., “Multizone soundfield reproduction using orthogonal basis expansion”, Proc. IEEE ICASSP 2013, pp. 311-315.

Solution is based on a weighted orthogonal basis expansion approach [1,2]

http://bit.ly/WeightedMultizone

Weighting method controls leakage into quiet zone at cost of quality in bright zone

• Multizone Occlusion problem:
• Quiet zone in-line with desired bright zone
• Difficult to control leakage
• Trade-off:
• Quality in Bright Zone vs Quietness in Quiet Zone
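The trade-off can be illustrated with a deliberately small toy problem. This is a sketch under stated assumptions, not the weighted orthogonal basis expansion of [1,2]: a single real loudspeaker signal `d`, hypothetical gains `a` and `b` to the bright and quiet zones, and a weight `w` that penalises leakage.

```python
def weighted_solution(a, b, w):
    """Minimise (a*d - 1)**2 + w * (b*d)**2 over a single real signal d.

    a : gain from the loudspeaker to the bright zone (target pressure is 1)
    b : gain from the loudspeaker to the quiet zone (target pressure is 0)
    w : weight on quiet zone leakage
    """
    d = a / (a * a + w * b * b)          # closed-form least-squares solution
    bright_error = (a * d - 1.0) ** 2    # deviation from the desired bright zone field
    leakage = (b * d) ** 2               # energy leaked into the quiet zone
    return d, bright_error, leakage


# Hypothetical gains chosen only to show the trend
for w in (0.1, 1.0, 10.0):
    d, err, leak = weighted_solution(a=1.0, b=0.5, w=w)
    print(f"w={w:5.1f}  bright error={err:.4f}  leakage={leak:.4f}")
```

As `w` grows, the leakage falls and the bright zone error rises, which is the behaviour the slide describes.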

[Figure panels: Small weight; Large weight]

[Equation annotations: discrete space, time and frequency; weighted actual soundfield function]

How quiet does the quiet zone need to be?


• Only need to suppress leakage in the quiet zone down to the threshold in quiet
• Possible only if the acoustic contrast between zones is large enough

Case 1: The Hearing Threshold
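A minimal sketch of the Case 1 check follows. The slides do not say which threshold model is used, so Terhardt's widely used closed-form approximation of the threshold in quiet is assumed here, and the function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold in quiet (dB SPL) using Terhardt's formula."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def leakage_is_audible(leaked_spl_db, freq_hz):
    """Return True when leaked sound exceeds the threshold in quiet."""
    return leaked_spl_db > threshold_in_quiet_db(freq_hz)
```

Leakage that stays below this curve needs no further suppression, so a smaller weight can be used at that frequency.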

[Figure legend: Speech]

• Key idea: a masker in the quiet zone perceptually hides surrounding frequency components leaked from the bright zone

• Benefit: less control via weighting is needed, which improves bright zone quality

Case 2: Spreading functions corresponding to local masking signal

[Figure legend: 2 kHz Masker; Speech]

• Max. SPL - small weight, high bright zone quality

• Min. SPL - large weight, low bright zone quality

• Leaked SPL - masker allowed to remain in quiet zone
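The masking idea in Case 2 can be sketched as follows. The slides only state that the spreading function is of the kind used in audio coding standards, so Schroeder's spreading function and the Zwicker and Terhardt Bark approximation are assumed here as stand-ins. The function names and the 10 dB `offset_db` default are hypothetical.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker and Terhardt's approximation of the Bark scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def spreading_db(dz):
    """Schroeder's spreading function (dB) at dz Bark from the masker."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


def masked_threshold_db(masker_spl_db, masker_hz, freq_hz, offset_db=10.0):
    """Level below which a component at freq_hz is hidden by the masker.

    offset_db is a hypothetical masker-to-threshold offset; audio coders
    choose it depending on whether the masker is tonal or noise-like.
    """
    dz = hz_to_bark(freq_hz) - hz_to_bark(masker_hz)
    return masker_spl_db + spreading_db(dz) - offset_db


# Example: threshold at 2.5 kHz caused by a 60 dB SPL masker at 2 kHz
threshold = masked_threshold_db(60.0, 2000.0, 2500.0)
```

Leaked speech components below `threshold` would be hidden by the masker, so they need no further suppression.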


Considering masking reduces spatial error in the bright zone and SPL in the quiet zone

Benefit: Perceptually optimised trade-off between quality and leakage

• Weights chosen by comparing reproduced speech with spreading functions
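One way to picture this comparison is a per-bin lookup. This is a sketch under stated assumptions: the candidate weights, the leaked SPL values and the function name `choose_weight` are hypothetical, and the slides do not give the actual selection rule.

```python
def choose_weight(leaked_spl_by_weight, masking_threshold_db):
    """Pick the smallest weight whose leaked SPL is at or below the threshold.

    leaked_spl_by_weight : dict mapping candidate weight -> leaked SPL (dB)
                           in the quiet zone for one time-frequency bin
    masking_threshold_db : spreading function / hearing threshold level (dB)
    """
    for weight in sorted(leaked_spl_by_weight):
        if leaked_spl_by_weight[weight] <= masking_threshold_db:
            return weight             # leakage already inaudible: stop here
    return max(leaked_spl_by_weight)  # nothing is inaudible: strongest control


# Hypothetical leakage levels for one bin
candidates = {0.05: 42.0, 1.0: 30.0, 100.0: 12.0}
selected = choose_weight(candidates, masking_threshold_db=35.0)
```

Choosing the smallest sufficient weight keeps bright zone quality as high as the masking curve allows.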

[Figure labels: Speech; Spreading function and hearing threshold; Spatial error $\epsilon_b(n, k)$; reduction]


Experimental evaluation to validate proposed perceptual approach

Multizone Setup:
• Full circle of 65 loudspeakers
• Loudspeaker array diameter: 3 m
• Zone diameters: 60 cm (enough space for a human head)
• Zone centres are 1.2 m apart
• Reproduction capable of wideband speech
• Direction of speech causes Multizone Occlusion Problem ().
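The geometry above can be written down directly. This is a minimal sketch: equal loudspeaker spacing and zones placed symmetrically about the array centre are assumptions, since the slides give only the counts and dimensions.

```python
import math

NUM_SPEAKERS = 65
ARRAY_RADIUS = 3.0 / 2      # 3 m array diameter
ZONE_RADIUS = 0.60 / 2      # 60 cm zone diameter
ZONE_SEPARATION = 1.2       # distance between zone centres (m)

# Loudspeakers equally spaced on the full circle (equal spacing is an assumption)
speakers = [
    (ARRAY_RADIUS * math.cos(2 * math.pi * i / NUM_SPEAKERS),
     ARRAY_RADIUS * math.sin(2 * math.pi * i / NUM_SPEAKERS))
    for i in range(NUM_SPEAKERS)
]

# Zone centres placed symmetrically about the array centre (an assumption)
bright_centre = (-ZONE_SEPARATION / 2, 0.0)
quiet_centre = (ZONE_SEPARATION / 2, 0.0)

# Gap between the two zone edges, and clearance from a zone edge to the array
zone_gap = ZONE_SEPARATION - 2 * ZONE_RADIUS
clearance = ARRAY_RADIUS - (ZONE_SEPARATION / 2 + ZONE_RADIUS)
```

With these dimensions the zones are 0.6 m apart edge to edge, and under the symmetric placement each zone edge sits 0.6 m inside the array.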

= Hearing threshold & Spreading function (as used in audio coding standards)


• 10 dB improvement in MSE
• Still high quality speech in the bright zone

Reduced bright zone error from psychoacoustic masking

Mean Squared Error (MSE):

No masking, large weight

With masking, variable weight
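For reference, an MSE in dB between a desired and a reproduced signal can be computed as below. This is a sketch only: the slides do not state the normalisation used, so an unnormalised mean of squared sample differences is assumed, and the function name `mse_db` is hypothetical.

```python
import math


def mse_db(desired, reproduced):
    """Mean squared error between two equal-length signals, in dB."""
    if len(desired) != len(reproduced):
        raise ValueError("signals must have the same length")
    mse = sum((x - y) ** 2 for x, y in zip(desired, reproduced)) / len(desired)
    if mse == 0.0:
        return float("-inf")  # identical signals: no error
    return 10.0 * math.log10(mse)


# Example: a constant 0.001 offset gives an MSE of 1e-6, i.e. about -60 dB
example = mse_db([0.0, 1.0, 0.0, -1.0], [0.001, 1.001, 0.001, -0.999])
```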


Reduced bright zone spatial error from psychoacoustic masking

Magnitude difference (A, B):

Phase difference (C, D):

Maximum spatial error reduction: 28dB

Consequence of smaller weighting: less loudspeaker power (max. reduction = 65%)
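To compare the power saving with the dB figures quoted elsewhere, the percentage can be converted to decibels. This short sketch assumes the 65% refers to a reduction in linear power.

```python
import math

power_reduction = 0.65                      # fraction reported on the slide
remaining = 1.0 - power_reduction           # 35% of the original power remains
change_db = 10.0 * math.log10(remaining)    # roughly -4.6 dB
```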


Conclusion: Exploiting perceptual weighting within multizone soundfield reproduction results in significant advantages

• Improved error in bright zones with no perceptual cost in adjacent zones
• MSE of speech: -69.8 dB to -80.3 dB (max)
• Spatial error: -7.4 dB to -31.5 dB (max)
• Reduced loudspeaker power (up to 65%)
• Improved reproduction when occlusion problem is present

Questions?