On the Preprocessing and Postprocessing of HRTF Individualization Based on Sparse Representation of Anthropometric Features Jianjun HE, Woon-Seng Gan, and Ee-Leng Tan 24th April 2015 [email protected] Digital Signal Processing Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Page 1: On the Preprocessing and Postprocessing of HRTF Individualization Based on Sparse Representation of Anthropometric Features Jianjun HE, Woon-Seng Gan,

On the Preprocessing and Postprocessing of HRTF Individualization Based on Sparse Representation of Anthropometric Features

Jianjun HE, Woon-Seng Gan, and Ee-Leng Tan

24th April 2015

[email protected]

Digital Signal Processing Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

Page 2

WHY

1. Head-related transfer functions (HRTFs) are highly individualized;

2. HRTFs are closely related to Anthropometry (torso, head, pinna);

3. Anthropometry can be used for HRTF individualization.

[Diagram: the anthropometry of a new person, together with an anthropometry database and an HRTF database, feeds an Individualization block that outputs the HRTF of the new person.]

In this paper, we aim to answer:

1. Do the preprocessing and postprocessing methods affect the performance of HRTF individualization?

2. If so, what is the best preprocessing and postprocessing combination?

3. And, how good is it?

Page 3

CIPIC Anthropometric data (35 subjects)

V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, “The CIPIC HRTF database,” in Proc. IEEE WASPAA, New Paltz, NY, Oct. 2001.

Page 4

Methodology

[Diagram: Individualization takes the new person's anthropometry A1, together with the databases A and H, and estimates the unknown HRTF H1.]

Anthropometry database A : S subjects * 1 set of Anthropometry (F features)

Anthropometry of a new person A1: 1 subject * 1 set of Anthropometry

HRTF database H : S subjects * 1 set of HRTF (D directions * K points)

HRTF of a new person H1 : 1 subjects * 1 set of HRTF

P. Bilinski, J. Ahrens, M. R. P. Thomas, I. Tashev, and J. C. Plata, “HRTF magnitude synthesis via sparse representation of anthropometric features,” in Proc. IEEE ICASSP, Florence, Italy, pp. 4501-4505, May 2014.

Page 5

Methodology

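Preprocessing of Anthropometry (index i)

1. Direct

2. Min-max normalization

3. Standard score

4. Standard deviation normalization

$$
A_f^{(i)} =
\begin{cases}
A_f^{0}, & i = 1 \\[4pt]
\dfrac{A_f^{0} - \min\left(A_f^{0}\right)}{\max\left(A_f^{0}\right) - \min\left(A_f^{0}\right)}, & i = 2 \\[8pt]
\dfrac{A_f^{0} - \operatorname{mean}\left(A_f^{0}\right)}{\operatorname{std}\left(A_f^{0}\right)}, & i = 3 \\[8pt]
\dfrac{A_f^{0}}{\operatorname{std}\left(A_f^{0}\right)}, & i = 4
\end{cases}
$$

where $f = 1, 2, \ldots, F$ indexes the anthropometric features, $A_f^{0}$ is the raw value of feature $f$, and min, max, mean and std are taken over the subjects for that feature.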
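The four anthropometry scalings listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `preprocess_anthropometry`, the `(S, F)` array layout, the use of the population standard deviation, and the choice to reuse the database statistics for the new person are all assumptions of this sketch.

```python
import numpy as np

def preprocess_anthropometry(A0, a1_0, method):
    """Scale the database A0 (S subjects x F features) and the new person's
    raw feature vector a1_0 (length F) with method i = 1..4.

    Assumption: per-feature statistics come from the database only and are
    reused for the new person.
    """
    A0 = np.asarray(A0, dtype=float)
    a1_0 = np.asarray(a1_0, dtype=float)
    if method == 1:                      # direct: no scaling
        return A0.copy(), a1_0.copy()
    if method == 2:                      # min-max normalization
        lo, hi = A0.min(axis=0), A0.max(axis=0)
        return (A0 - lo) / (hi - lo), (a1_0 - lo) / (hi - lo)
    std = A0.std(axis=0)                 # population std (ddof=0), an assumption
    if method == 3:                      # standard score
        mean = A0.mean(axis=0)
        return (A0 - mean) / std, (a1_0 - mean) / std
    if method == 4:                      # standard deviation normalization
        return A0 / std, a1_0 / std
    raise ValueError("method must be 1, 2, 3 or 4")

# Toy data: 3 subjects, 2 features (values are made up for illustration).
A0 = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
a1 = np.array([2.0, 25.0])
A_mm, a1_mm = preprocess_anthropometry(A0, a1, 2)
print(a1_mm)
```

Note that only the standard score (method 3) removes the mean, so it is the only scaling that produces negative feature values from positive measurements.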

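Preprocessing of HRTF (index m)

1. Magnitude

2. Log magnitude

3. Power

$$
H^{(m)}(d,k) =
\begin{cases}
\left|H(d,k)\right|, & m = 1 \\[4pt]
20\log_{10}\left|H(d,k)\right|, & m = 2 \\[4pt]
\left|H(d,k)\right|^{2}, & m = 3
\end{cases}
\qquad d = 1, 2, \ldots, D; \quad k = 1, 2, \ldots, K.
$$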
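The three HRTF representations (magnitude, log magnitude, power) map directly onto elementwise array operations. The sketch below assumes the HRTFs are held in a NumPy array of complex values or magnitudes; the function name `preprocess_hrtf` is hypothetical.

```python
import numpy as np

def preprocess_hrtf(H, m):
    """Return representation m of the HRTF array H.

    m = 1: magnitude, m = 2: log magnitude in dB, m = 3: power.
    H may be complex or already a magnitude; any array shape works.
    """
    mag = np.abs(np.asarray(H))
    if m == 1:
        return mag
    if m == 2:
        return 20.0 * np.log10(mag)
    if m == 3:
        return mag ** 2
    raise ValueError("m must be 1, 2 or 3")

H = np.array([1.0, 10.0, 0.1])
print(preprocess_hrtf(H, 2))   # log magnitudes in dB
```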

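Sparse representation (index j)

1. Direct

2. Nonnegative

$$
\mathbf{w}_A^{(i,1)} = \arg\min_{\mathbf{w}_A^{(i)}} \left( \left\| A_1^{(i)} - A^{(i)} \mathbf{w}_A^{(i)} \right\|_2^2 + \lambda \left\| \mathbf{w}_A^{(i)} \right\|_1 \right),
$$

$$
\mathbf{w}_A^{(i,2)} = \arg\min_{\mathbf{w}_A^{(i)}} \left( \left\| A_1^{(i)} - A^{(i)} \mathbf{w}_A^{(i)} \right\|_2^2 + \lambda \left\| \mathbf{w}_A^{(i)} \right\|_1 \right), \quad \text{s.t. } \mathbf{w}_A^{(i)} \ge 0,
$$

so that $A_1^{(i)} \approx A^{(i)} \mathbf{w}_A^{(i,j)}$. Here $\lambda$ is the regularization parameter, and the product of the database with the weight vector denotes the weighted combination of the $S$ database subjects.

Postprocessing of Anthropometry (index l)

1. Direct

2. Normalized

$$
\mathbf{w}_H^{(i,j,l)} =
\begin{cases}
\mathbf{w}_A^{(i,j)}, & l = 1 \\[6pt]
\dfrac{\mathbf{w}_A^{(i,j)}}{\sum_{s=1}^{S} w_{A,s}^{(i,j)}}, & l = 2
\end{cases}
$$

Synthesis

$$
\hat{H}_1^{(i,j,l,m)}(d,k) =
\begin{cases}
\mathbf{w}_H^{(i,j,l)} H^{(m)}(d,k), & m = 1 \\[6pt]
10^{\,\mathbf{w}_H^{(i,j,l)} H^{(m)}(d,k) / 20}, & m = 2 \\[6pt]
\sqrt{\mathbf{w}_H^{(i,j,l)} H^{(m)}(d,k)}, & m = 3
\end{cases}
$$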
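The chain from sparse weights to a synthesized HRTF can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the l1-regularized least-squares problem is solved here with a plain proximal-gradient (ISTA) loop, which is a swapped-in solver chosen for brevity, and the function names, array layouts and iteration count are hypothetical.

```python
import numpy as np

def sparse_weights(A, a1, lam, nonnegative=False, n_iter=5000):
    """Minimise ||a1 - A.T @ w||_2^2 + lam * ||w||_1 with ISTA.

    A  : (S, F) preprocessed anthropometry database
    a1 : (F,)   preprocessed anthropometry of the new person
    Returns one weight per database subject, shape (S,).
    """
    X = A.T                                            # columns = database subjects
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)     # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * (X.T @ (X @ w - a1))              # gradient of the quadratic term
        v = w - step * grad
        if nonnegative:
            w = np.maximum(v - step * lam, 0.0)        # prox of l1 penalty plus w >= 0
        else:
            w = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)  # soft threshold
    return w

def postprocess_weights(w_A, l):
    """l = 1: keep the weights; l = 2: divide them by their sum."""
    return w_A if l == 1 else w_A / np.sum(w_A)

def synthesize(w_H, H_m, m):
    """Weight the preprocessed HRTFs H_m (S, D, K) and undo preprocessing m."""
    combo = np.tensordot(w_H, H_m, axes=1)             # weighted sum over subjects, shape (D, K)
    if m == 1:
        return combo                                   # magnitude
    if m == 2:
        return 10.0 ** (combo / 20.0)                  # dB back to magnitude
    return np.sqrt(combo)                              # power back to magnitude

# Toy example: 3 subjects with orthogonal features; the new person matches subject 0.
w = sparse_weights(np.eye(3), np.array([1.0, 0.0, 0.0]), lam=0.1, n_iter=50)
print(postprocess_weights(w, 2))
```

With direct (signed) weights the sum in the normalization can be close to zero, and the square root in the power case can receive a negative argument, so the nonnegative constraint also keeps these two postprocessing steps well defined.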

In total, we have 4 × 3 × 2 × 2 = 48 variants of methods!
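The count follows from taking every combination of the four stages. A short enumeration, using illustrative label strings rather than any identifiers from the authors' code, makes the grid explicit:

```python
from itertools import product

# Option labels taken from the slide; the variable names are illustrative.
PRE_A = ["direct", "min-max", "standard score", "standard deviation"]   # index i
PRE_H = ["magnitude", "log magnitude", "power"]                          # index m
SPARSE = ["direct", "nonnegative"]                                       # index j
POST_A = ["direct", "normalized"]                                        # index l

variants = list(product(PRE_A, PRE_H, SPARSE, POST_A))
print(len(variants))  # 48
```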

[Block diagram: the anthropometry database A and the anthropometry of a new person A1 pass through Preprocessing A to give A(i) and A1(i); Sparse representation yields the weights wA(i,j); Postprocessing A yields wH(i,j,l). The HRTF database H passes through Preprocessing H to give H(m). Synthesis combines wH(i,j,l) with H(m) to produce Ĥ1(i,j,l,m), the HRTF of the new person.]

Page 6

Evaluation

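Spectral distortion (SD):

$$
\mathrm{SD}^{(i,j,l,m,n)} = \frac{1}{S_{test}} \sum_{s=1}^{S_{test}} \sqrt{ \frac{1}{DK} \sum_{d=1}^{D} \sum_{k=1}^{K} \left[ 20\log_{10} \frac{\left|H_s(d,k)\right|}{\left|\hat{H}_s^{(i,j,l,m,n)}(d,k)\right|} \right]^{2} } \ \text{dB}
$$

(The superscript n appears only in this formula and is not defined elsewhere in the transcript; it presumably indexes the HRTF postprocessing, PostH.)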
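As a sketch of how this metric can be computed, the function below averages the per-subject root-mean-square log-magnitude error over all directions and frequency points. The `(S_test, D, K)` array layout and the function name are assumptions, and taking the square root per subject before averaging is this sketch's reading of the slide's formula.

```python
import numpy as np

def spectral_distortion(H_true, H_est):
    """Mean over test subjects of the RMS log-magnitude error, in dB.

    H_true, H_est : arrays of shape (S_test, D, K) holding measured and
    synthesized HRTFs (complex values or magnitudes).
    """
    ratio_db = 20.0 * np.log10(np.abs(H_true) / np.abs(H_est))
    per_subject = np.sqrt(np.mean(ratio_db ** 2, axis=(1, 2)))   # one SD value per subject
    return float(np.mean(per_subject))

# An estimate that is 10 times too large everywhere is off by 20 dB.
H_true = np.ones((2, 3, 4))
print(spectral_distortion(H_true, 10.0 * H_true))  # 20.0
```

Because the log ratio is squared, overestimates and underestimates of the same size in dB are penalized equally.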

• CIPIC HRTF database;

• Cross-validation technique to select the regularization parameter;

• Stest = 35 test cases, all 1250 directions, and full frequency range.

Performance varies among different preprocessing and postprocessing methods!

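SD (dB) for each combination; the last four columns give the anthropometry preprocessing (PreA).

Sparse representation | PostA      | PreH    | Direct | Min-max | Standard score | Standard deviation
Direct                | Direct     | Mag     | 6.37   | 6.57    | 81.00          | 6.23
Direct                | Direct     | Log mag | 6.40   | 6.50    | 21.61          | 6.17
Direct                | Direct     | Power   | 6.56   | 6.60    | 78.94          | 6.46
Direct                | Normalized | Mag     | 6.36   | 6.35    | 15.97          | 6.25
Direct                | Normalized | Log mag | 6.37   | 6.26    | 8.89           | 6.17
Direct                | Normalized | Power   | 6.60   | 6.77    | 25.21          | 6.52
Nonnegative           | Direct     | Mag     | 6.32   | 6.32    | 6.47           | 6.23
Nonnegative           | Direct     | Log mag | 6.38   | 6.47    | 6.79           | 6.17
Nonnegative           | Direct     | Power   | 6.52   | 6.37    | 6.55           | 6.46
Nonnegative           | Normalized | Mag     | 6.31   | 6.26    | 6.10           | 6.25
Nonnegative           | Normalized | Log mag | 6.35   | 6.20    | 5.86           | 6.17
Nonnegative           | Normalized | Power   | 6.53   | 6.54    | 6.54           | 6.52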

[Figure: four panels (a) to (d) plotting SD (dB), on an axis from 5.6 to 6.8, against the preprocessing method for A (1 to 4), with separate curves for Mag, Log mag and Power.]

Page 7

Results

[Figure: four panels (a) to (d) plotting SD (dB), on an axis from 5.6 to 6.8, against the preprocessing method for A (1 to 4), with separate curves for Mag, Log mag and Power.]

Direct sparse representation

1. PreA: standard deviation best, standard score worst;

2. PreH: log mag best, power worst;

3. PostA, PostH: minimal effect for good PreA, PreH.

Nonnegative sparse representation

1. Better than corresponding direct sparse representation (especially for standard score);

2. Trend in PreA/PreH not obvious;

3. Normalized PostA can improve the performance (especially for standard score).

(a) Direct sparse; Direct PostA

(b) Direct sparse; Normalized PostA

(c) Nonnegative sparse; Direct PostA

(d) Nonnegative sparse; Normalized PostA

Page 8

Comparison

Method        | Specification                                                                                  | SD (dB)
Single best   | Select one single set of HRTF with the corresponding closest anthropometry                     | 8.11
Tashev et al. | Min-max PreA; magnitude PreH; direct sparse representation; no reported postprocessing         | 6.57
Our best      | Standard score PreA; log magnitude PreH; nonnegative sparse representation; normalized PostA   | 5.86
Lower bound   | Linear regression based HRTF individualization                                                 | 5.12

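Lower bound: the weights $\mathbf{w}_{\text{opt}}$ minimize $\left\| \mathbf{w}_{\text{opt}} H - H_1 \right\|_2^2$.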
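A lower bound of this kind can be sketched as an ordinary least-squares fit of the new person's measured HRTF by the database HRTFs. The slide does not say which HRTF representation or array layout is used, so the flattened `(S, N)` layout, the function name and the toy values below are assumptions of this sketch.

```python
import numpy as np

def lower_bound_weights(H_db, h1):
    """Least-squares weights w minimising ||w @ H_db - h1||_2^2.

    H_db : (S, N) database HRTFs, one flattened set (N = D * K values) per row
    h1   : (N,)   measured HRTF of the held-out person, flattened the same way
    """
    w, *_ = np.linalg.lstsq(H_db.T, h1, rcond=None)
    return w

# Toy check: the target is exactly 2 * subject 0 + 3 * subject 1.
H_db = np.array([[1.0, 0.0, 1.0, 0.0],
                 [0.0, 1.0, 0.0, 1.0]])
h1 = np.array([2.0, 3.0, 2.0, 3.0])
print(lower_bound_weights(H_db, h1))
```

The fit uses the target HRTF itself, which an anthropometry-based method never sees, so it indicates how well any weighted combination of database HRTFs could match the target in the least-squares sense.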

Page 9

Conclusions

1. Introduced preprocessing and postprocessing in HRTF individualization based on sparse representation of anthropometric features.

2. Investigated 48 variants of preprocessing and postprocessing methods, and found that

a) Preprocessing and postprocessing methods do affect the performance of HRTF individualization, though the effects differ across combinations;

b) Adding nonnegative constraints in sparse representation improves the performance;

c) The best combination for HRTF individualization is <standard score + log magnitude + nonnegative + normalized>.

3. Established the lower bound for this type of HRTF individualization (SD of 5.12 dB) and verified that the “our best” combination (5.86 dB) outperforms the existing approach (6.57 dB) and comes within 0.74 dB of the lower bound.

4. Future work: subjective evaluation of HRTF individualization.

Page 10

References

[1] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia, Cambridge, MA: Academic Press, 1994.

[2] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, The MIT Press, revised edition, 1996.

[3] H. Møller, “Fundamentals of Binaural Technology,” Applied Acoustics, vol. 36, pp. 171-218, 1992.

[4] W. G. Gardner and K. D. Martin, “HRTF Measurements of a KEMAR,” J. Acoust. Soc. Amer., vol. 97, pp. 3907-3908, 1995. See also http://www.sound.media.mit.edu/KEMAR.html.

[5] H. Møller, M. F. Sørensen, D. Hammershøi, and C. B. Jensen, “Head-Related Transfer Functions of Human Subjects,” J. Audio Eng. Soc., vol. 43, pp. 300-321, 1995.

[6] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, “The CIPIC HRTF database,” in Proc. IEEE WASPAA, New Paltz, NY, USA, Oct. 2001.

[7] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L. Wightman, “Localization Using Non-individualized Head-Related Transfer Functions,” J. Acoust. Soc. Amer., vol. 94, pp. 111-123, 1993.

[8] S. Xu, Z. Li, and G. Salvendy, “Individualization of Head-related transfer function for three-dimensional virtual auditory display: a review,” in R. Shumaker (Ed.): Virtual Reality, HCII 2007, LNCS 4563, pp. 397–407, 2007.

[9] K. Sunder, J. He, E. L. Tan, and W. S. Gan, “Natural sound rendering for headphones,” IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 100-113, Mar. 2015.

[26] P. Bilinski, J. Ahrens, M. R. P. Thomas, I. Tashev, and J. C. Plata, “HRTF magnitude synthesis via sparse representation of anthropometric features,” in Proc. IEEE ICASSP, Florence, Italy, pp. 4501-4505, May 2014.

[28] S. J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “An interior-point method for large-scale l1-regularized least squares,” IEEE J. Selected Topics in Signal Processing, vol. 1, no. 4, pp. 606-617, Dec. 2007.

[30] J. Breebaart, F. Nater, and A. Kohlrausch, “Spectral and spatial parameter resolution requirements for parametric, filter-bank-based HRTF processing,” J. Audio Eng. Soc., vol. 58, no. 3, pp. 126-140, Mar. 2010.

Page 11

Acknowledgement

This work is supported by the Singapore Ministry of Education Academic Research Fund Tier-2, under research grant MOE2010-T2-2-040.

Page 12

On the Preprocessing and Postprocessing of HRTF Individualization Based on Sparse Representation of Anthropometric Features

Jianjun HE

[email protected]

Nanyang Technological University, Singapore

Thank you!