Relaxation of Rank-1 Spatial Constraint in Overdetermined Blind Source Separation
Daichi Kitamura (SOKENDAI), Nobutaka Ono (NII/SOKENDAI), Hiroshi Sawada (NTT), Hirokazu Kameoka (The Univ. of Tokyo/NTT), Hiroshi Saruwatari (The Univ. of Tokyo)
EUSIPCO 2015, 2 Sept., 14:30 - 16:10, SS30 Acoustic scene analysis using microphone array
Research Background
• Blind source separation (BSS)
– Estimation of the original sources from the mixture signal
– We focus only on overdetermined situations
• Number of sources ≤ number of microphones
• e.g., independent component analysis (ICA), independent vector analysis (IVA)
• Applications of BSS
– Acoustic scene analysis, speech enhancement, music analysis, reproduction of sound fields, etc.
[Figure: original sources → mixing system (unknown) → observation (mixture) → BSS → estimated sources]
Problems and Motivations
• For reverberant signals
– ICA-based methods cannot separate the sources well because they assume a linear time-invariant mixing system
– When there are more microphones than sources, PCA is often applied before BSS
• Reverberation is also important information for analyzing acoustic scenes
– We should separate the sources together with their own reverberations
[Figure: original sources → mixing → observed signals → PCA (to remove the weak, reverberant components of all the sources) → dimension-reduced signals → BSS (instantaneous mixing in the time-frequency domain) → estimated sources]
(= linear mixing assumption, as in IVA; modeled by rank-1 matrices (constraint))
Rank-1 Spatial Constraint
• Rank-1 spatial constraint = linear mixing assumption
– Instantaneous mixture in the time-frequency domain
– The mixing system can be represented by a mixing matrix
• Conditions:
1. The sources can be modeled as point sources
2. The reverberation time is shorter than the FFT length
[Figure: observed spectrogram (frequency × time); in each time-frequency bin, the observed signal is modeled as a time-invariant mixing matrix applied to the source signal]
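The rank-1 (linear mixing) model can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' code. The sizes, the random complex sources `S`, and the random per-frequency mixing matrices `A` are all hypothetical, and the determined case (two microphones, two sources) is used for simplicity. Here i indexes frequency and j time frames, a notation assumed for this sketch. The code checks two consequences of the model: the spatial covariance of a single source image is rank 1, and the per-frequency demixing matrix W_i = A_i^{-1} recovers the sources exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_frames, n_ch = 5, 100, 2        # hypothetical sizes; M = N = 2 for simplicity

def crandn(*shape):
    """Complex Gaussian samples (hypothetical test data)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

S = crandn(n_freq, n_frames, n_ch)         # source STFT coefficients s_ij
A = crandn(n_freq, n_ch, n_ch)             # one time-invariant mixing matrix A_i per bin

# Instantaneous mixture in every time-frequency bin: x_ij = A_i s_ij
X = np.einsum("imn,ijn->ijm", A, S)

# Image of source 1 alone, a_i1 * s_ij1; its spatial covariance is a scaled a_i1 a_i1^H (rank 1)
img1 = A[:, None, :, 0] * S[:, :, 0:1]     # shape (freq, frames, mics)
R1 = np.einsum("ijm,ijk->imk", img1, img1.conj()) / n_frames
sv = np.linalg.svd(R1, compute_uv=False)   # singular values per bin, descending
ratio = sv[:, 1] / sv[:, 0]                # close to 0 for a rank-1 matrix

# If A_i were known, W_i = A_i^{-1} would demix every frame of bin i exactly
W = np.linalg.inv(A)
Y = np.einsum("inm,ijm->ijn", W, X)
```

When the conditions above fail, no single time-invariant A_i per bin can describe the observation, so this exact recovery breaks down.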
Problem of Rank-1 Spatial Model
• When the reverberation time is longer than the FFT length,
– the impulse response becomes long
– reverberant components leak into the following time frames
• The mixing system cannot be represented by the time-invariant mixing matrix alone, and the separation performance markedly degrades.
[Figure: observed spectrogram (frequency × time) showing the observed signal, the source signal, and reverberant components leaking into later frames]
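A toy experiment can reproduce the leakage effect. The sketch below assumes white-noise input and a synthetic impulse response with an exponentially decaying envelope, not the RWCP recordings. It fits the best time-invariant gain per frequency bin (the narrowband, rank-1 approximation) and reports the fraction of the observed STFT energy that this fit cannot explain. The helper names `stft` and `rank1_residual`, and all sizes, are hypothetical.

```python
import numpy as np

def stft(x, n_fft, hop):
    """Hann-windowed STFT without padding; returns an array of shape (freq, frames)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * win for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def rank1_residual(ir_len, n_fft=256, hop=64, n=16000, seed=0):
    """Fraction of STFT energy left after the best fit x_ij ~ a_i s_ij (one gain per bin)."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n)                                   # white-noise source
    # Toy impulse response: random taps whose envelope decays by 60 dB over ir_len samples
    h = rng.standard_normal(ir_len) * 10.0 ** (-3.0 * np.arange(ir_len) / ir_len)
    x = np.convolve(s, h)[:n]                                    # reverberant observation
    S, X = stft(s, n_fft, hop), stft(x, n_fft, hop)
    a = np.sum(X * S.conj(), axis=1) / np.sum(np.abs(S) ** 2, axis=1)  # least-squares gain per bin
    resid = X - a[:, None] * S
    return float(np.sum(np.abs(resid) ** 2) / np.sum(np.abs(X) ** 2))

short_err = rank1_residual(ir_len=32)     # impulse response much shorter than the 256-sample frame
long_err = rank1_residual(ir_len=4096)    # impulse response much longer than the frame
print(f"residual: short IR {short_err:.3f}, long IR {long_err:.3f}")
```

The residual should stay small for the short response and grow considerably for the long one. This mirrors the degradation described above: energy from earlier frames cannot be explained by one gain per bin.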
Summary of Conventional Methods
• MNMF [Ozerov, 2010], [Sawada, 2013]
– Full-rank spatial model: does not use the rank-1 spatial constraint
– High computational cost and strong dependence on initial values
• IVA and Rank-1 MNMF
– Rank-1 spatial constraint (linear mixing assumption): separation performance degrades for reverberant signals
– Faster and more stable optimization
To achieve good and stable separation even for reverberant signals, relax the rank-1 spatial constraint while maintaining efficient optimization.
Proposed Approach
• Dimensionality reduction with principal component analysis (PCA)
– PCA removes the reverberant components of all the sources
– But the reverberant components are important!
• Utilize extra observations to model direct and reverberant components simultaneously
– microphones for sources, where
[Figure: original sources → mixing → observed signals → PCA → dimension-reduced signals → BSS → estimated sources]
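For comparison, the conventional PCA preprocessing can be sketched per frequency bin. The spatial covariance of the M-channel observation is eigendecomposed, and only the N principal components are kept, which discards the weaker (mostly reverberant) subspace. This is a minimal NumPy sketch that assumes the array shape (frequency, frames, microphones). `pca_reduce` and the synthetic data are hypothetical, and the small additive noise only stands in for weak reverberant energy.

```python
import numpy as np

def pca_reduce(X, n_keep):
    """Per-frequency PCA. X: (freq, frames, mics). Keeps the n_keep strongest components."""
    n_freq, n_frames, _ = X.shape
    Z = np.empty((n_freq, n_frames, n_keep), dtype=complex)
    kept = np.empty(n_freq)
    for i in range(n_freq):
        R = X[i].T @ X[i].conj() / n_frames          # spatial covariance, (mics, mics)
        evals, evecs = np.linalg.eigh(R)             # eigenvalues in ascending order
        E = evecs[:, ::-1][:, :n_keep]               # principal eigenvectors
        Z[i] = X[i] @ E.conj()                       # z_ij = E^H x_ij for every frame j
        kept[i] = evals[::-1][:n_keep].sum() / evals.sum()  # fraction of power retained
    return Z, kept

# Hypothetical example: 2 sources, 4 microphones, plus weak noise
rng = np.random.default_rng(0)
F, J, M, N = 4, 200, 4, 2

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

X = np.einsum("imn,ijn->ijm", crandn(F, M, N), crandn(F, J, N)) + 0.05 * crandn(F, J, M)
Z, kept = pca_reduce(X, n_keep=N)
```

In the proposed approach this step is skipped. All M channels go to BSS so that the reverberant subspace is modeled rather than discarded.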
[Figure: original sources → mixing → observed signals → BSS (IVA or Rank-1 MNMF) → separated components → reconstruction → estimated sources]
[Figure: original sources → mixing → observed signals → BSS → separated components (a direct and a reverberant component for each source) → reconstruction → estimated sources]
• We assume independence not only between the sources but also between the direct and reverberant components of the same source.
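One possible form of the reconstruction step is sketched below. It assumes projection back (back-projection) to a reference microphone, a common choice in frequency-domain BSS. The slides do not state the exact procedure. Each separated component is rescaled by the corresponding column of the estimated mixing matrix Â_i = W_i^{-1}. The images of the components assigned to the same source (direct plus reverberant) are then summed. The function name, the cluster assignment, and the random data are hypothetical.

```python
import numpy as np

def reconstruct_sources(W, Y, clusters, ref_mic=0):
    """Project separated components back to a reference mic and sum them per source.

    W: (freq, M, M) demixing matrices; Y: (freq, frames, M) separated components;
    clusters: list of component-index lists, one list per source.
    """
    A_hat = np.linalg.inv(W)                       # estimated mixing matrices
    # Image of component k at the reference mic: A_hat[i, ref, k] * Y[i, j, k]
    images = A_hat[:, None, ref_mic, :] * Y        # (freq, frames, M)
    return [images[:, :, idx].sum(axis=2) for idx in clusters]

# Hypothetical 4-channel example: 2 sources x (direct + reverberant) components
rng = np.random.default_rng(1)
F, J, M = 8, 50, 4
X = rng.standard_normal((F, J, M)) + 1j * rng.standard_normal((F, J, M))
W = rng.standard_normal((F, M, M)) + 1j * rng.standard_normal((F, M, M))
Y = np.einsum("inm,ijm->ijn", W, X)                # y_ij = W_i x_ij
clusters = [[0, 2], [1, 3]]                        # assumed output of a clustering step
est = reconstruct_sources(W, Y, clusters)
```

The images of all components sum to the reference-microphone observation, so the reconstructed sources add up to it exactly. Under this scheme, the reverberant part stays in the estimates instead of being discarded.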
Clustering of Separated Components
• Permutation problem of separated components
– The order of the separated components depends on the initial values
• We propose two methods to cluster the components
– 1. Using cross-correlations for IVA
– 2. Sharing basis matrices for Rank-1 MNMF
[Figure: separated components]
Which separated components belong to which source?
[Figure: separated components → clustering → clustered components (direct and reverberant components of source 1; direct and reverberant components of source 2) → reconstruction → estimated sources]
Clustering Using Spectrogram Correlation
• The direct and reverberant components of the same source have a strong cross-correlation.
• Cross-correlation of two power spectrograms
– Calculate it for all combinations of separated components
– Merge the components in descending order of cross-correlation
[Figure: power spectrograms of two separated components being compared]
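The correlation-based clustering can be sketched as follows. The slides give neither the exact correlation formula nor the merging rule. This sketch therefore assumes the Pearson (normalized) correlation of the flattened power spectrograms. It also assumes an agglomerative rule that merges clusters in descending order of pairwise correlation until one cluster per source remains. `power_corr`, `cluster_by_correlation`, `smear`, and the synthetic spectrograms are hypothetical. The "reverberant" components are modeled crudely as attenuated, time-smeared copies of the direct ones.

```python
import numpy as np

def power_corr(P1, P2):
    """Normalized (Pearson) cross-correlation between two power spectrograms."""
    a = P1.ravel() - P1.mean()
    b = P2.ravel() - P2.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def cluster_by_correlation(powers, n_sources):
    """Merge components in descending order of correlation until n_sources clusters remain."""
    K = len(powers)
    pairs = sorted(((power_corr(powers[k], powers[l]), k, l)
                    for k in range(K) for l in range(k + 1, K)), reverse=True)
    label = list(range(K))                 # each component starts as its own cluster
    n_clusters = K
    for _, k, l in pairs:
        if n_clusters == n_sources:
            break
        if label[k] != label[l]:           # merge the two clusters
            old, new = label[l], label[k]
            label = [new if c == old else c for c in label]
            n_clusters -= 1
    clusters = {}
    for k, c in enumerate(label):
        clusters.setdefault(c, []).append(k)
    return sorted(tuple(v) for v in clusters.values())

def smear(P):
    """Toy 'reverberant' version: attenuated and smeared over two frames."""
    return 0.3 * (P + np.roll(P, 1, axis=1)) / 2

rng = np.random.default_rng(0)
A, B = rng.random((64, 200)), rng.random((64, 200))   # direct components of two sources
powers = [A, B, smear(A), smear(B)]                  # arbitrary (permuted) component order
print(cluster_by_correlation(powers, n_sources=2))   # → [(0, 2), (1, 3)]
```

Greedy merging of this kind needs only the K(K-1)/2 pairwise correlations, which is cheap compared with the separation itself.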
Auto-Clustering by Sharing Basis Matrix
• Direct and reverberant components can be modeled by the same bases (spectral patterns)
• Estimate the signals with Basis-Shared Rank-1 MNMF
– Only for Rank-1 MNMF, because IVA does not have an NMF source model
– By imposing a basis-shared source model, Rank-1 MNMF can automatically cluster the components
[Figure: source model of Basis-Shared Rank-1 MNMF. The direct and reverberant components of source 1 share one basis matrix, and those of source 2 share another; separated components → reconstruction → estimated sources]
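The effect of sharing a basis matrix can be illustrated with the source-model part alone. The sketch below fits the direct and reverberant power spectrograms of one source, P_direct ≈ T V_direct and P_reverb ≈ T V_reverb, with a single shared basis T and separate activations. It uses multiplicative updates for the Itakura–Saito divergence in their square-root, majorization-minimization form. This is not the full Basis-Shared Rank-1 MNMF: the demixing-matrix update is omitted and the separated power spectrograms are assumed fixed. The function names and data are hypothetical. The shared-basis cost equals ordinary IS-NMF on the time-concatenated spectrograms, so the cost should not increase across iterations.

```python
import numpy as np

def is_divergence(P, R):
    """Itakura-Saito divergence summed over all bins."""
    Q = P / R
    return float(np.sum(Q - np.log(Q) - 1.0))

def basis_shared_is_nmf(powers, n_bases, n_iter=100, seed=0):
    """Fit powers[k] ~ T @ V[k] with ONE basis T shared by all components of a source."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = powers[0].shape
    T = rng.random((n_freq, n_bases)) + 0.1                        # shared basis (spectral patterns)
    Vs = [rng.random((n_bases, n_frames)) + 0.1 for _ in powers]   # per-component activations
    costs = []
    for _ in range(n_iter):
        # Shared-basis update: numerator and denominator are summed over the components
        num = sum((P / (T @ V) ** 2) @ V.T for P, V in zip(powers, Vs))
        den = sum((1.0 / (T @ V)) @ V.T for V in Vs)
        T *= np.sqrt(num / den)
        # Activation update for each component (in place), using the shared basis
        for P, V in zip(powers, Vs):
            R = T @ V
            V *= np.sqrt((T.T @ (P / R ** 2)) / (T.T @ (1.0 / R)))
        costs.append(sum(is_divergence(P, T @ V) for P, V in zip(powers, Vs)))
    return T, Vs, costs

rng = np.random.default_rng(2)
direct = rng.gamma(2.0, 1.0, size=(32, 60)) + 1e-3          # hypothetical power spectrograms
reverb = 0.3 * rng.gamma(2.0, 1.0, size=(32, 60)) + 1e-3
T, Vs, costs = basis_shared_is_nmf([direct, reverb], n_bases=5)
```

Because both components are explained by the same T, their source assignment is fixed by the model structure rather than recovered afterwards. This is the sense in which the basis-shared model clusters automatically.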
Experiments
• Conditions
Original sources: professionally produced music signals from the SiSEC database
Mixing: JR2 impulse response in the RWCP database; two sources and four microphones
Sampling frequency: downsampled from 44.1 kHz to 16 kHz
FFT length in STFT: 8192 points (128 ms, Hamming window)
Shift length in STFT: 2048 points (64 ms)
Number of bases: 15 bases for each source (30 bases for all the sources)
Number of iterations: 200
Number of trials: 10, with different seeds for random initialization
Evaluation criterion: average SDR improvement and its deviation
[Figure: recording layout for the JR2 impulse response; reverberation time 470 ms; sources 1 and 2 at 2 m (angle labels 80 and 60); microphone spacing 2.83 cm]
• Compared methods (7 methods)
– PCA + 2ch IVA: apply PCA before IVA
– PCA + 2ch Rank-1 MNMF: apply PCA before Rank-1 MNMF
– 4ch IVA + Clustering: apply IVA without PCA, and cluster the components
– 4ch Basis-Shared Rank-1 MNMF