On the Preprocessing and Postprocessing of HRTF Individualization Based on Sparse Representation of Anthropometric Features
Jianjun HE, Woon-Seng Gan, and Ee-Leng Tan
24th April 2015
[email protected]
Digital Signal Processing Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
1. Head-related transfer functions (HRTFs) are highly individualized;
2. HRTFs are closely related to Anthropometry (torso, head, pinna);
3. Anthropometry can be used for HRTF individualization.
[Figure: Individualization block. Inputs: anthropometry of a new person, anthropometry database, HRTF database. Output: HRTF of the new person.]
In this paper, we aim to answer:
1. Do the preprocessing and postprocessing methods affect the performance of HRTF individualization?
2. If so, what is the best preprocessing and postprocessing combination?
3. And, how good is it?
CIPIC Anthropometric data (35 subjects)
V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, “The CIPIC HRTF database,” in Proc. IEEE WASPAA, New Paltz, NY, Oct. 2001.
Methodology
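[Figure: Individualization maps the anthropometry of the new person, A1, to the unknown HRTF H1, using the anthropometry database A and the HRTF database H.]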
Anthropometry database A: S subjects × 1 set of anthropometry (F features)
Anthropometry of a new person A1: 1 subject × 1 set of anthropometry
HRTF database H: S subjects × 1 set of HRTF (D directions × K points)
HRTF of a new person H1: 1 subject × 1 set of HRTF
P. Bilinski, J. Ahrens, M. R. P. Thomas, I. Tashev, and J. C. Plata, “HRTF magnitude synthesis via sparse representation of anthropometric features,” in Proc. IEEE ICASSP, Florence, Italy, pp. 4501-4505, May 2014.
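Preprocessing of Anthropometry (index i)
1. Direct
2. Min-max normalization
3. Standard score
4. Standard deviation normalization

$$
A_f^{(i)} =
\begin{cases}
A_f^{0}, & i = 1 \\[4pt]
\dfrac{A_f^{0} - \min\left(A_f^{0}\right)}{\max\left(A_f^{0}\right) - \min\left(A_f^{0}\right)}, & i = 2 \\[8pt]
\dfrac{A_f^{0} - \operatorname{mean}\left(A_f^{0}\right)}{\operatorname{std}\left(A_f^{0}\right)}, & i = 3 \\[8pt]
\dfrac{A_f^{0}}{\operatorname{std}\left(A_f^{0}\right)}, & i = 4
\end{cases}
$$

where $A^{0} = [A_1^{0}, A_2^{0}, \ldots, A_F^{0}]$ is the original anthropometry and $f = 1, 2, \ldots, F$.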
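The four anthropometry preprocessing options can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `make_preprocessor`, the example feature values, and the use of the population standard deviation are assumptions, and it also assumes that a new person's feature is scaled with the statistics of the database column.

```python
import statistics

def make_preprocessor(database_column, method):
    """Return a function applying preprocessing option `method` (1-4) to one
    feature value, using statistics of that feature over the database."""
    lo, hi = min(database_column), max(database_column)
    mean = statistics.mean(database_column)
    std = statistics.pstdev(database_column)  # population std: an assumption
    if method == 1:    # direct: leave the feature unchanged
        return lambda a: a
    if method == 2:    # min-max normalization
        return lambda a: (a - lo) / (hi - lo)
    if method == 3:    # standard score
        return lambda a: (a - mean) / std
    if method == 4:    # standard deviation normalization
        return lambda a: a / std
    raise ValueError("method must be 1, 2, 3 or 4")

# Hypothetical values of one feature for three database subjects.
feature_column = [14.0, 15.0, 16.0]
for i in (1, 2, 3, 4):
    pre = make_preprocessor(feature_column, i)
    print(i, [round(pre(a), 3) for a in feature_column])
```

Only the standard score produces negative feature values, which is worth keeping in mind when reading the results for the direct sparse representation.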
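Preprocessing of HRTF (index m)
1. Magnitude
2. Log magnitude
3. Power

$$
H^{(m)}(d,k) =
\begin{cases}
H(d,k), & m = 1 \\
20 \log_{10} H(d,k), & m = 2 \\
H^{2}(d,k), & m = 3
\end{cases}
\qquad d = 1, 2, \ldots, D;\; k = 1, 2, \ldots, K
$$

where $H(d,k)$ is the HRTF magnitude.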
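The three HRTF preprocessing options, together with the inverse mapping needed to return to a magnitude at the synthesis stage, might look as follows. The function names are hypothetical, and the input is assumed to be a strictly positive HRTF magnitude on a linear scale.

```python
import math

def preprocess_hrtf(magnitude, m):
    """Map a linear HRTF magnitude to the representation chosen by m."""
    if m == 1:    # magnitude
        return magnitude
    if m == 2:    # log magnitude in dB
        return 20.0 * math.log10(magnitude)
    if m == 3:    # power
        return magnitude ** 2
    raise ValueError("m must be 1, 2 or 3")

def invert_hrtf(value, m):
    """Map a value in representation m back to a linear magnitude."""
    if m == 1:
        return value
    if m == 2:
        return 10.0 ** (value / 20.0)
    if m == 3:
        return math.sqrt(value)
    raise ValueError("m must be 1, 2 or 3")

# Round trip for a hypothetical magnitude value.
for m in (1, 2, 3):
    print(m, preprocess_hrtf(0.5, m), invert_hrtf(preprocess_hrtf(0.5, m), m))
```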
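Sparse representation (index j)
1. Direct
2. Nonnegative

The preprocessed anthropometry of the new person is represented as a sparse combination of the database, $A_1^{(i)} \approx w_A^{(i,j)} A^{(i)}$, with

$$
w_A^{(i,1)} = \arg\min_{w_A^{(i)}} \left\| A_1^{(i)} - w_A^{(i)} A^{(i)} \right\|_2^2 + \lambda \left\| w_A^{(i)} \right\|_1 ,
$$

$$
w_A^{(i,2)} = \arg\min_{w_A^{(i)}} \left\| A_1^{(i)} - w_A^{(i)} A^{(i)} \right\|_2^2 + \lambda \left\| w_A^{(i)} \right\|_1 , \quad \text{s.t. } w_A^{(i)} \ge 0 ,
$$

where $\lambda$ is the regularization parameter.

Postprocessing of Anthropometry (index l)
1. Direct
2. Normalized

$$
w_H^{(i,j,l)} =
\begin{cases}
w_A^{(i,j)}, & l = 1 \\[4pt]
\dfrac{w_A^{(i,j)}}{\sum_{s=1}^{S} w_A^{(i,j)}(s)}, & l = 2
\end{cases}
$$

Synthesis

$$
\hat{H}_1^{(i,j,l,m)}(d,k) =
\begin{cases}
w_H^{(i,j,l)} H^{(m)}(d,k), & m = 1 \\[4pt]
10^{\, w_H^{(i,j,l)} H^{(m)}(d,k) / 20}, & m = 2 \\[4pt]
\sqrt{w_H^{(i,j,l)} H^{(m)}(d,k)}, & m = 3
\end{cases}
$$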
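The sparse representation, weight postprocessing and synthesis steps can be tried out on toy data with the sketch below. It is not the solver used in this work: the reference list cites an interior-point method for l1-regularized least squares, whereas this sketch substitutes a plain proximal-gradient (ISTA) iteration because it fits in a few lines. All function names, the iteration count and the toy numbers are assumptions.

```python
import math

def sparse_weights(A, a1, lam, nonnegative=False, iters=2000):
    """Approximate argmin_w ||a1 - w A||_2^2 + lam * ||w||_1 with ISTA.
    A: S rows (subjects) of F features; a1: F features of the new person."""
    S, F = len(A), len(a1)
    # Step 1/L, with L bounded by twice the squared Frobenius norm of A.
    step = 1.0 / (2.0 * sum(x * x for row in A for x in row))
    w = [0.0] * S
    for _ in range(iters):
        residual = [a1[f] - sum(w[s] * A[s][f] for s in range(S)) for f in range(F)]
        for s in range(S):
            grad = -2.0 * sum(residual[f] * A[s][f] for f in range(F))
            v = w[s] - step * grad
            if nonnegative:   # j = 2: shrink, then clip at zero
                w[s] = max(v - step * lam, 0.0)
            else:             # j = 1: soft thresholding
                w[s] = math.copysign(max(abs(v) - step * lam, 0.0), v)
    return w

def postprocess_weights(w_a, l):
    """l = 1: use the weights directly; l = 2: normalize them to sum to one."""
    if l == 1:
        return list(w_a)
    if l == 2:
        total = sum(w_a)
        return [x / total for x in w_a]
    raise ValueError("l must be 1 or 2")

def synthesize(w_h, H_m, m):
    """Combine preprocessed database HRTFs (S rows, flattened over d and k)
    and map the result back to a linear magnitude."""
    combined = [sum(w_h[s] * H_m[s][n] for s in range(len(H_m)))
                for n in range(len(H_m[0]))]
    if m == 1:
        return combined
    if m == 2:
        return [10.0 ** (c / 20.0) for c in combined]
    if m == 3:
        return [math.sqrt(c) for c in combined]  # fails if a combination is negative
    raise ValueError("m must be 1, 2 or 3")

# Toy example: three database subjects, two features, two HRTF points in dB.
A_db = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a_new = [2.0, 2.0]
H_db = [[0.0, 3.0], [3.0, 0.0], [6.0, 6.0]]
w_a = sparse_weights(A_db, a_new, lam=0.1, nonnegative=True)
w_h = postprocess_weights(w_a, 2)
print(w_a, w_h, synthesize(w_h, H_db, 2))
```

In this toy case the l1 penalty selects only the third subject, and normalization rescales its weight to one before the log-magnitude HRTFs are combined.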
In total, we have 4 × 3 × 2 × 2 = 48 variants of methods!
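The count follows from combining every option of each stage with every option of the others. A short enumeration makes this explicit; the list names and string labels are only illustrative.

```python
from itertools import product

PRE_A = ["direct", "min-max", "standard score", "standard deviation"]
PRE_H = ["magnitude", "log magnitude", "power"]
SPARSE = ["direct", "nonnegative"]
POST_A = ["direct", "normalized"]

# Every (PreA, PreH, sparse representation, PostA) combination.
variants = list(product(PRE_A, PRE_H, SPARSE, POST_A))
print(len(variants))  # 48
```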
[Figure: Processing flow. The anthropometry database A and the anthropometry of a new person A1 pass through PreprocessingA to give A(i) and A1(i); sparse representation gives wA(i,j); Postprocessing A gives wH(i,j,l). The HRTF database H passes through PreprocessingH to give H(m). Synthesis combines wH(i,j,l) and H(m) into the HRTF of the new person, Ĥ1(i,j,l,m).]
Evaluation
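Spectral distortion (SD):

$$
\mathrm{SD}^{(i,j,l,m)} = \sqrt{\frac{1}{S_{\text{test}}} \frac{1}{D} \frac{1}{K} \sum_{s=1}^{S_{\text{test}}} \sum_{d=1}^{D} \sum_{k=1}^{K} \left[ 20 \log_{10} \frac{\hat{H}_s^{(i,j,l,m)}(d,k)}{H_s(d,k)} \right]^2} \ \text{dB}
$$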
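As a rough illustration of the spectral distortion measure, the function below computes the root-mean-square log-magnitude difference in dB over test subjects, directions and frequency points. The nested-list layout and the function name are assumptions made for this sketch.

```python
import math

def spectral_distortion(measured, estimated):
    """RMS difference in dB between estimated and measured HRTF magnitudes.
    Both arguments are nested lists indexed as [subject][direction][point]."""
    total, count = 0.0, 0
    for subj_h, subj_hat in zip(measured, estimated):
        for dir_h, dir_hat in zip(subj_h, subj_hat):
            for h, h_hat in zip(dir_h, dir_hat):
                total += (20.0 * math.log10(h_hat / h)) ** 2
                count += 1
    return math.sqrt(total / count)

# One subject, one direction, two points: errors of +20 dB and -20 dB.
measured = [[[1.0, 1.0]]]
estimated = [[[10.0, 0.1]]]
print(spectral_distortion(measured, estimated))
```

Because each dB error is squared, overestimates and underestimates of the magnitude contribute equally.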
• CIPIC HRTF database;
• Cross-validation is used to select the regularization parameter;
• Stest = 35 test cases, all 1250 directions, and the full frequency range.
Performance varies among different preprocessing and postprocessing methods!
SD (dB) for each combination; the last four columns are the PreA methods.

Sparse rep.   PostA        PreH      Direct   Min-max   Standard score   Standard deviation
Direct        Direct       Mag       6.37     6.57      81.00            6.23
Direct        Direct       Log mag   6.40     6.50      21.61            6.17
Direct        Direct       Power     6.56     6.60      78.94            6.46
Direct        Normalized   Mag       6.36     6.35      15.97            6.25
Direct        Normalized   Log mag   6.37     6.26      8.89             6.17
Direct        Normalized   Power     6.60     6.77      25.21            6.52
Nonnegative   Direct       Mag       6.32     6.32      6.47             6.23
Nonnegative   Direct       Log mag   6.38     6.47      6.79             6.17
Nonnegative   Direct       Power     6.52     6.37      6.55             6.46
Nonnegative   Normalized   Mag       6.31     6.26      6.10             6.25
Nonnegative   Normalized   Log mag   6.35     6.20      5.86             6.17
Nonnegative   Normalized   Power     6.53     6.54      6.54             6.52
[Figure: SD (dB) versus preprocessing method for A (1 to 4), with one curve each for Mag, Log mag and Power, shown in four panels (a) to (d).]
Results
Direct sparse representation
1. PreA: standard deviation best, standard score worst;
2. PreH: log mag best, power worst;
3. PostA, PostH: minimal effect for good PreA, PreH.
Nonnegative sparse representation
1. Better than corresponding direct sparse representation (especially for standard score);
2. Trend in PreA/PreH not obvious;
3. Normalized PostA can improve the performance (especially for standard score).
(a) Direct sparse; Direct PostA
(b) Direct sparse; Normalized PostA
(c) Nonnegative sparse; Direct PostA
(d) Nonnegative sparse; Normalized PostA
Comparison

Method          Specification                                                                                      SD (dB)
Single best     Select one single set of HRTF with the corresponding closest anthropometry                         8.11
Tashev et al.   Min-max PreA; magnitude PreH; direct sparse representation; no reported postprocessing             6.57
Our best        Standard score PreA; log magnitude PreH; nonnegative sparse representation; normalized PostA       5.86
Lower bound     Linear regression based HRTF individualization                                                     5.12
Lower bound:

$$
w_{\text{opt}} = \arg\min_{w} \left\| H_1 - w H \right\|_2^2
$$
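The "single best" baseline in the comparison picks the database subject whose anthropometry is closest to that of the new person and reuses that subject's HRTF unchanged. A minimal sketch follows, assuming Euclidean distance over the feature vector; the slides do not state the distance measure or whether the features are preprocessed first, and the function name and toy values are hypothetical.

```python
import math

def single_best(A, a1):
    """Index of the database subject with the closest anthropometry to a1.
    Euclidean distance is an assumption of this sketch."""
    def distance(row):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(row, a1)))
    return min(range(len(A)), key=lambda s: distance(A[s]))

# Toy database of three subjects with two features each.
A_db = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
H_db = ["HRTF of subject 0", "HRTF of subject 1", "HRTF of subject 2"]
best = single_best(A_db, [1.2, 0.9])
print(best, H_db[best])
```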
Conclusions
1. Introduced preprocessing and postprocessing in HRTF individualization based on sparse representation of anthropometric features.
2. Investigated 48 variants of preprocessing and postprocessing methods, and found that
a) Preprocessing and postprocessing methods do affect the performance of HRTF individualization, though the effects differ across combinations;
b) Adding nonnegative constraints in sparse representation improves the performance;
c) The best combination for HRTF individualization is standard score PreA, log magnitude PreH, nonnegative sparse representation, and normalized PostA.
3. Established the lower bound for this type of HRTF individualization and verified that the “our best” combination outperforms existing approaches and is quite close to the lower bound.
4. Future work: subjective evaluation of HRTF individualization.
References
[1] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia, Cambridge, MA: Academic Press, 1994.
[2] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, The MIT Press, revised edition, 1996.
[3] H. Møller, “Fundamentals of Binaural Technology,” Applied Acoustics, vol. 36, pp. 171-218, 1992.
[4] W. G. Gardner and K. D. Martin, “HRTF Measurements of a KEMAR,” J. Acoust. Soc. Amer., vol. 97, pp. 3907-3908, 1995. See also http://www.sound.media.mit.edu/KEMAR.html.
[5] H. Møller, M. F. Sørensen, D. Hammershøi, and C. B. Jensen, “Head-Related Transfer Functions of Human Subjects,” J. Audio Eng. Soc., vol. 43, pp. 300-321, 1995.
[6] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, “The CIPIC HRTF database,” in Proc. IEEE WASPAA, New Paltz, NY, USA, Oct. 2001.
[7] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L. Wightman, “Localization Using Non-individualized Head-Related Transfer Functions,” J. Acoust. Soc. Amer., vol. 94, pp. 111-123, 1993.
[8] S. Xu, Z. Li, and G. Salvendy, “Individualization of Head-related transfer function for three-dimensional virtual auditory display: a review,” in R. Shumaker (Ed.): Virtual Reality, HCII 2007, LNCS 4563, pp. 397–407, 2007.
[9] K. Sunder, J. He, E. L. Tan, and W. S. Gan, “Natural sound rendering for headphones,” IEEE Signal Processing Magazine, vol. 32, no.2, pp. 100-113, Mar. 2015.
[26] P. Bilinski, J. Ahrens, M. R. P. Thomas, I. Tashev, and J. C. Plata, “HRTF magnitude synthesis via sparse representation of anthropometric features,” in Proc. IEEE ICASSP, Florence, Italy, pp. 4501-4505, May 2014.
[28] S. J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “An interior-point method for large-scale l1-regularized least squares,” IEEE J. Selected Topics in Signal Processing, vol. 1, no. 4, pp. 606-617, Dec. 2007.
[30] J. Breebaart, F. Nater, and A. Kohlrausch, “Spectral and spatial parameter resolution requirements for parametric, filter-bank-based HRTF processing,” J. Audio Eng. Soc., vol. 58, no. 3, pp. 126-140, Mar. 2010.
Acknowledgement
This work is supported by the Singapore Ministry of Education Academic Research Fund Tier-2, under research grant MOE2010-T2-2-040.