Defeating Ambient Noise: Practical Approaches for Noise Reduction and Suppression
Ivan Tashev, Microsoft Research, Redmond, USA
ICASSP 2006 tutorial, Toulouse, France, May 14th, 2006
Introduction
Why signal enhancement is important:
• Reducing the ambient noise in the captured audio signal is crucial for providing good sound in modern computing systems and critical for real-time communication and speech recognition.
Tutorial goal:
• To present the key theoretical aspects and share our practical experience in noise suppression and reduction for sound capture and processing systems.
Target audience:
• Engineers and researchers in audio signal processing who plan or build sound capture systems.
Introduction (2)
Noise suppression as science and as art:
• It is a science because it uses mathematical models and hypotheses and is repeatable, i.e. the same input data gives the same results
• It is an art because it is about human perception of sound and requires evaluation by a human
For speech signals the process is part of the more general term speech enhancement
Defeating ambient noise: tutorial agenda
• Basics
• Noise suppression
• Directional microphones
• Microphone arrays
• Advanced techniques
• Free joke and conclusions
Basics
• Noise: definition and properties
• Signal: definition and properties
• Noise suppression and reduction, speech enhancement
• Audio processing in frequency domain: weighting, transformation, synthesis
• Bandpass filtering
Basics: noise properties
Statistical model: zero-mean Gaussian random process
• Figure: airplane noise PDF vs. Gaussian PDF
In frequency domain:
• White noise spectrum
• Pink noise: 3 dB/oct decrease
• Colored noise: with a given spectrum
• Hoth noise: typical room noise model
Temporal characteristics:
• Pseudo-stationary compared to speech
• Specific noises may be different: wind noise

In most cases the signal is speech
Statistical model (long term): zero-mean random Gaussian (Laplace, Gamma) process
Frequency domain (short term):
• Voiced, e.g. vowels (harmonic structure)
• Unvoiced, e.g. fricatives (noise type)
Temporal: speech and non-speech segments
Spatial: point sound source (mouth or loudspeaker)
[Figure: Probability distribution function; x-axis: times standard deviation; y-axis: probability; curves: Speech, Gauss, Laplace, Gamma]
Basics: classification
• Noise suppression: removing the noise based on statistical models of the noise and signal, spectral subtraction
• Noise reduction or cancellation: removing the noise based on knowledge or estimation of the corrupting signal
• Signal (speech) enhancement: a more general term for any type of processing aimed at improving some property of the signal
• Active noise cancellation: decreasing the noise level in a certain area by sending opposite-phase sound through loudspeakers; not discussed in this tutorial
Basics: processing flow
Processing in frequency domain
Audio frames: 80-1024 samples, 5-25 ms
Frequency domain transformations:
• Fourier (FFT): symmetric spectra, zero Fs/2 bin, process the first half
• MCLT (Malvar, 1992): shifts bins by ½ frequency bin
• Other: Hartley, wavelet, cepstra; no re-synthesis
Basics: processing flow (2)
Overall process (typical):
• Extract the frame
• Weighting
• Transform
• Process
• Inverse transform
• Synthesis (overlap-add) using ½ of the previous frame
• Move one half frame forward, repeat
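The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's implementation: the function names are hypothetical, a naive DFT stands in for an FFT so the block needs only the standard library, and the window is assumed to be a square-root Hann applied once before the transform and once after the inverse transform.

```python
import cmath
import math


def sqrt_hann(n):
    # Square-root Hann: w[i]^2 + w[i + n/2]^2 == 1, so 50% overlap sums to unity.
    return [math.sin(math.pi * i / n) for i in range(n)]


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def overlap_add(signal, frame_len, process=lambda spectrum: spectrum):
    """Frame, weight, transform, process, inverse transform, overlap-add."""
    hop = frame_len // 2
    window = sqrt_hann(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        processed = idft(process(dft(frame)))
        for i in range(frame_len):
            # Second application of the window, then add into the output.
            out[start + i] += processed[i] * window[i]
    return out
```

With the default identity `process`, every sample covered by two frames is reconstructed exactly; the first and last half frames are covered by only one frame and come out attenuated.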
Basics: processing flow (3)
Weighting function:
• Keeps the spectral peaks less smeared
• Commonly used:
  • Bartlett (triangle)
  • Hann or Hanning (cos-shaped)
  • Modified Hann: sqrt(cos)-shaped, to be applied twice
• If re-synthesis is not required:
  • Natural, Bartlett, Parzen: sinc, sinc², and sinc⁴ in frequency domain
  • Max-Fauque-Bertier (sinc): rectangular in frequency domain
  • Blackman and further generalization as Taylor sequence
[Figure: Weight windows; x-axis: time; y-axis: weight; curves: Bartlett, Hann, Modified Hann]
Basics: bandpass filtering
Bandpass filtering:
• Do not process frequency bins below and above certain frequencies; zero them
• Typical low limit: 100-300 Hz for speech
• Typical high limit: 0.45 Fs, reduces aliasing
Dynamic bandpass filtering:
• Measure SNR per bin
• Adjust the low and high slopes
• Apply the filter
No kidding!
• Increases speech intelligibility
• Avoids artifacts and distortions
• Saves effort and some CPU time
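A minimal sketch of the static variant, zeroing bins outside the pass band. The function name and the default limits (200 Hz, 0.45 Fs) are illustrative choices within the ranges quoted above, and the input is assumed to be the first half of an FFT spectrum (bins 0 to N/2).

```python
def bandpass_bins(half_spectrum, fs, f_low=200.0, f_high_ratio=0.45):
    """Zero the bins below f_low and above f_high_ratio * fs."""
    frame_len = 2 * (len(half_spectrum) - 1)  # bins 0..N/2 of an N-point FFT
    f_high = f_high_ratio * fs
    out = []
    for k, value in enumerate(half_spectrum):
        freq = k * fs / frame_len  # center frequency of bin k
        out.append(value if f_low <= freq <= f_high else 0.0)
    return out
```

For a 512-sample frame at 16 kHz the bin spacing is 31.25 Hz, so bins 0 to 6 and 231 to 256 are zeroed with these defaults.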
Basics: summary
• Noise and signal properties: statistical, frequency, temporal, and spatial
• Suppression vs. reduction vs. enhancement vs. cancellation
• Processing in frequency domain:
  • Breaking into 50% overlapping frames is most common
  • Weighting function is important, sqrt(cos)-shaped is most common
  • Overlap-add processing
• Bandpass filtering: increases intelligibility, reduces artifacts and saves effort
Noise suppression
• Gain based noise suppression
• a priori and a posteriori SNR
• Suppression rules
• ML and Decision Directed approach for a priori SNR estimation
• Uncertain presence of signal
• Voice activity detectors
• Accounting for the temporal characteristics
• Overall architecture
• Demos
Noise suppression: gain based processing
Given signal x_n(t) and noise d_n(t) mixed in y_n(t)
Observed in frequency domain, n-th frame, k-th frequency bin: Y_k = X_k + D_k
Noise suppression:
$$\hat{X}_k = \left(G_k\,|Y_k|\right)\cdot\frac{Y_k}{|Y_k|} = G_k Y_k$$
• G_k: time-varying, non-negative, real-valued gain (or suppression rule)
• The estimator keeps the same phase as Y_k: under Gaussian assumptions the best phase estimator is the observed phase
The goal of noise suppression is to estimate, for each frame, a gain vector G_k that is optimal in a certain sense
Noise suppression: a priori and a posteriori SNR
Signal and noise: statistically independent Gaussian processes
Signal variances: $\lambda_X(k),\ \lambda_D(k),\ \lambda_Y(k)$
a priori and a posteriori SNRs:
$$\xi_k = \frac{\lambda_X(k)}{\lambda_D(k)}, \qquad \gamma_k = \frac{|Y_k|^2}{\lambda_D(k)}$$
The suppression rule is now a function of two parameters: $G_k(\xi_k, \gamma_k)$
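The two SNRs can be computed per bin as in the sketch below. The function name is hypothetical, and the a priori estimate shown is the maximum-likelihood style estimate ξ ≈ max(γ − 1, 0), which follows from λ_Y = λ_X + λ_D for independent signal and noise; it is one common choice, assumed here for illustration, and the noise variance is taken as already known.

```python
def snr_estimates(y_bin, noise_var):
    """Return (xi_ml, gamma) for one complex frequency bin.

    gamma: a posteriori SNR, |Y_k|^2 / lambda_D(k)
    xi_ml: ML-style a priori SNR estimate, floored at zero
    """
    gamma = abs(y_bin) ** 2 / noise_var
    xi_ml = max(gamma - 1.0, 0.0)
    return xi_ml, gamma
```

The floor at zero matters in pauses, where the observed power often drops below the noise variance estimate.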
Noise suppression: suppression rules
Wiener (1945):
MMSE spectral amplitude estimator
• Derivation
• Goal
• Solution
Problems:
• Musical noises in the pauses
• Distortion in the speech segments
Smooth (in time and/or frequency):
• Noise models
• Gains
Simplify:
• Do not use more complex models than necessary
• A simpler model with more precise or faster parameter estimation usually works better
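The formula for the Wiener rule did not survive in this transcript. The sketch below uses the standard textbook form G = ξ / (1 + ξ), which may differ in notation from the original slide; the function names are hypothetical, and the a priori SNR is estimated per bin as max(γ − 1, 0) purely for illustration.

```python
def wiener_gain(xi):
    """Textbook Wiener suppression rule as a function of the a priori SNR."""
    return xi / (1.0 + xi)


def apply_wiener(spectrum, noise_vars):
    """Apply a real, non-negative gain to each bin; the phase of Y_k is kept."""
    out = []
    for y_bin, lam_d in zip(spectrum, noise_vars):
        gamma = abs(y_bin) ** 2 / lam_d   # a posteriori SNR
        xi = max(gamma - 1.0, 0.0)        # crude a priori SNR estimate (assumption)
        out.append(wiener_gain(xi) * y_bin)
    return out
```

Driving the gain from a per-frame estimate like this is exactly what produces musical noise in pauses, which is why the slide recommends smoothing the noise models and gains.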
Noise suppression: demonstrations
Input file, Wiener, MMSE SPE, McAulay/Malpass, Ephraim/Malah

Algorithm          Signal    Noise    SNR     Improvement
Not processed      -21.5     -33.3    11.8
Wiener filtered    -22.1     -52.3    30.2    18.2
McAulay-Malpass    -21.6     -36.0    14.4     2.6
Ephraim-Malah      -22.0     -47.0    25.0    13.2
MMSE SPE           -22.2     -44.8    22.5    10.7

Note: All measurement units are dB
Noise suppression: summary
• Noise suppression is an operation based on a time-varying, real-valued, non-negative gain (or suppression rule)
• Estimation of the a priori and a posteriori SNRs is essential: the decision-directed approach
• Signal may or may not be present: voice activity detectors are critical
• Estimating a precise noise model is highly important
• Smoothing in time improves listening results
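The decision-directed approach mentioned in the summary can be sketched for a single frequency bin as below. This follows the commonly published form, in which the a priori SNR is a weighted mix of the previous frame's clean-signal estimate and the current ML-style estimate; the function name, the smoothing factor `alpha=0.98`, the stationary noise variance, and the use of a Wiener-type gain are all assumptions for illustration.

```python
def decision_directed_gains(y_powers, noise_var, alpha=0.98):
    """Track the a priori SNR across frames for one bin and return the gains.

    y_powers: |Y_k|^2 for consecutive frames of the same bin
    noise_var: noise variance lambda_D(k), assumed known and stationary
    """
    prev_clean_power = 0.0
    gains = []
    for y_power in y_powers:
        gamma = y_power / noise_var
        xi = (alpha * prev_clean_power / noise_var
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))
        gain = xi / (1.0 + xi)              # Wiener-type rule (assumption)
        prev_clean_power = gain * gain * y_power
        gains.append(gain)
    return gains
```

The recursion smooths the a priori SNR over time, which is one way to obtain the "smoothing in time improves listening results" effect listed above.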
Directional microphones
• Microphone types
• Pressure gradient microphone
• Parameters for directional microphones
• First order directional microphones
• Classification and parameters
• Bottom line
Directional microphones: microphone types
A microphone is a device that converts air pressure to an electric signal
Microphone types:
• Carbon: in first phones
• Crystal: piezoelectric effect based
• Dynamic: inverted loudspeaker
• Condenser: measurement grade mics
• Electret: the most common today
• Directivity pattern: $U(f,c)$
• Directivity index:
$$DI(f) = 10\log_{10}\left(\frac{P(f,\varphi_T,\theta_T)}{\dfrac{1}{4\pi}\displaystyle\int_0^{\pi} d\theta \int_0^{2\pi} d\varphi\; P(f,\varphi,\theta)}\right)$$
$$P(f,\varphi,\theta) = |U(f,c)|^2, \qquad \rho = \rho_0 = \text{constant}$$
• Sensitivity: -45 dBV/Pa typical
• SNR: 60 dB typical
• Frequency response: front/back
Directional microphones: summary
• In the Noise suppression section we learned that 6 dB noise suppression is a good achievement
• A cardioid microphone gives 4.8 dB noise reduction without distortions and artifacts
• In real system design, using directional microphones is important
• The microphone directivity pattern is further denoted as U(f,c), where f is the frequency and c = {θ, φ, ρ} is the look-up direction
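The 4.8 dB figure for a cardioid can be checked numerically. The sketch below assumes the usual first-order model U(θ) = α + (1 − α)·cos θ (α = 0.5 for a cardioid, α = 1 for an omnidirectional microphone), frequency-independent and symmetric around the microphone axis, and averages |U|² over the sphere with the sin θ solid-angle weight. The function name and the model are illustrative assumptions, not taken from the slides.

```python
import math


def directivity_index_db(alpha, steps=10000):
    """Directivity index of a first-order pattern alpha + (1 - alpha) * cos(theta)."""
    h = math.pi / steps
    average_power = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h                      # midpoint rule
        u = alpha + (1.0 - alpha) * math.cos(theta)
        average_power += u * u * math.sin(theta) * h
    average_power *= 0.5                           # (1 / 4 pi) * 2 pi from the azimuth integral
    on_axis_power = 1.0                            # U(0) = 1 for this model
    return 10.0 * math.log10(on_axis_power / average_power)
```

For α = 0.5 the average power is 1/3, so the index is 10·log10(3), about 4.77 dB, which matches the rounded 4.8 dB quoted above.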
• linear, planar, 3D
• compact and large
• uniform, nonuniform and random spacing
• near field and far field
Advantage: allow spatial filtering, reducing the noises and reverberation
Disadvantage: require more microphones and more processing time
Microphone arrays: delay-and-sum beamformer
• The most intuitive approach
• Shift the signals to align them and sum
Advantages:
• Simple and efficient
Problems:
• Variable directivity
• Big sidelobes
• Low efficiency
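A minimal frequency-domain sketch of the align-and-sum idea for a linear array. The assumptions are mine: far-field plane waves, omnidirectional microphones on a line, angles measured from broadside, and a speed of sound of 343 m/s; the function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def arrival_delays(mic_positions, angle_deg):
    """Relative arrival times of a far-field plane wave at each microphone."""
    s = math.sin(math.radians(angle_deg))
    return [-x * s / SPEED_OF_SOUND for x in mic_positions]


def delay_and_sum_gain(mic_positions, freq, look_deg, source_deg):
    """Magnitude response of a delay-and-sum beam steered to look_deg."""
    look = arrival_delays(mic_positions, look_deg)
    source = arrival_delays(mic_positions, source_deg)
    total = 0j
    for t_look, t_src in zip(look, source):
        x = cmath.exp(-2j * math.pi * freq * t_src)   # unit plane wave at this mic
        w = cmath.exp(2j * math.pi * freq * t_look)   # undo the delay expected for the look direction
        total += w * x
    return abs(total) / len(mic_positions)
```

Evaluating `delay_and_sum_gain` over a grid of source angles and frequencies shows the frequency-dependent beam and the sidelobes listed under "Problems".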
Microphone arrays: terminology
• Beamforming: making the microphone array listen to a given look-up direction
• Beamsteering: electronically changing the look-up direction the microphone array listens to
• Nullsteering: suppressing the sounds coming from a given direction
• Sound source localization: techniques to detect, localize and track one or multiple sound sources using a microphone array
Microphone arrays: general parameters
Generalized form:
$$Y(f) = \sum_{i=0}^{M-1} W(f,i)\,X_i(f)$$
• M: number of microphones
• X_i(f): spectrum of the i-th channel
• W(f,i): weight coefficients matrix
• Y(f): output signal
Parameters:
• Directivity pattern B:
$$B(f,\theta) = W^{H}(f)\cdot D(f,\theta), \qquad D_m(f,\theta) = \frac{e^{-j 2\pi f \|c - p_m\| / \nu}}{\|c - p_m\|}\,U(f,c)$$
• Main Response Axis (MRA), $\theta_{\max}$: direction of maximum sensitivity, the look-up direction
• Beamwidth: area within -3 dB around the MRA
Microphone arrays: general parameters (2)
Ambient noise gain: isotropic noise reduction
Non-correlated (sensor) noise gain
Total noise gain: combination of the two above
The beamformer design task is to find a weight matrix that satisfies certain criteria and constraints
• Max noise suppression: highly non-linear
• Replaced with directivity pattern matching, reducing the optimization dimensions
• Isotropic noise assumption
Constraints:
• Unit gain and zero phase shift towards the MRA
• Frequently: in the beamwidth area
Two conflicting trends: decreasing the ambient noise gain increases the non-correlated noise gain. Optimum? Minimize the total gain
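The formulas for the noise gains did not survive in this transcript. As a hedged illustration of the non-correlated (sensor) noise gain only, the sketch below uses the standard result that uncorrelated, equal-power noise in each channel passes through the weights with power gain Σ|W(f,i)|². The function name is hypothetical.

```python
import math


def uncorrelated_noise_gain_db(weights):
    """Power gain, in dB, for noise that is uncorrelated across the channels."""
    return 10.0 * math.log10(sum(abs(w) ** 2 for w in weights))


def delay_and_sum_weights(num_mics):
    # Pure phase shifts scaled by 1/M; only the magnitudes matter for this gain.
    return [1.0 / num_mics] * num_mics
```

For delay-and-sum weights the gain is -10·log10(M), about -6 dB for four microphones. Weights with larger magnitudes, as produced by more directive designs, raise this gain, which is the trade-off described on the slide.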
Microphone arrays: time invariant beamformer (2)
Superdirective beamformer (Cox, 1986)
$$\min_{W}\left(W^{H}\Phi_{XX}W\right) \quad \text{subject to} \quad W^{H}D = 1$$
• $\Phi_{XX}$ is the power spectral density matrix of the input signals assuming isotropic noise
• Constrained LMS algorithm, antenna array
• Achieves maximum directivity
• Chu, 1997; Elko, 2000
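The constrained minimization above has the well-known closed-form solution W = Φ⁻¹D / (Dᴴ Φ⁻¹ D). The sketch below works it out for the smallest possible case, two omnidirectional microphones, under assumptions that are mine rather than the slide's: the isotropic noise coherence is modeled as sin(x)/x with x = 2πf·d/c, a small diagonal loading term keeps the matrix well conditioned, and the look angle is measured from the array axis. The function name is hypothetical.

```python
import cmath
import math


def superdirective_weights_2mic(freq, spacing, look_deg, c=343.0, loading=1e-2):
    """Return (weights, steering_vector) for a two-microphone array."""
    x = 2.0 * math.pi * freq * spacing / c
    coherence = math.sin(x) / x if x != 0.0 else 1.0   # isotropic noise field model
    a = 1.0 + loading                                   # diagonal loading (assumption)
    det = a * a - coherence * coherence
    inv = [[a / det, -coherence / det],
           [-coherence / det, a / det]]                 # inverse of the 2x2 PSD matrix
    tau = spacing * math.cos(math.radians(look_deg)) / c
    d = [1.0 + 0j, cmath.exp(-2j * math.pi * freq * tau)]
    v = [inv[0][0] * d[0] + inv[0][1] * d[1],
         inv[1][0] * d[0] + inv[1][1] * d[1]]           # Phi^-1 D
    denom = (d[0].conjugate() * v[0] + d[1].conjugate() * v[1]).real
    return [v[0] / denom, v[1] / denom], d
```

The returned weights satisfy the unit-gain constraint Wᴴ D = 1 by construction; reducing `loading` increases directivity at the cost of larger weights and therefore more sensor-noise amplification.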
Microphone arrays: time invariant beamformer (3)
Comparison: delay-and-sum and superdirective array
Simulation: 5 element linear array, 3 cm distance
[Figure: directivity (linear) vs. frequency (Hz); curves: Superdirective, Delay-and-sum]
[Figure: white noise gain (dB) vs. frequency (Hz); curves: Superdirective, Delay-and-sum]
[Figure: polar directivity patterns at f = 3000 Hz; panels: Superdirective beamformer, Delay-and-sum beamformer]
Microphone arrays: time invariant beamformer (4)
Design example (Tashev/Malvar, 2005)
• Four element linear array
• Beamwidth vs. frequency vs. total noise gain
• Directivity pattern vs. frequency
• Directivity pattern in 3D for 1000 Hz
Advantages:
• No VAD required
• Stable, reliable, predictable, measurable
• Guaranteed parameters
• Fast switching to a different speaker
• Low CPU requirement
Real-world problems:
• Requires a Sound Source Localizer to find and track the desired sound source
• Sensor and equipment noise limits the performance
• Microphone manufacturing tolerances:
  • Calibration during manufacturing
  • Auto calibration during use (Tashev, 2004)
Microphone arrays: source localization
Time delay estimates based
• Cross-correlation function
• Weighting: ML, PHAT (Knapp/Carter, 1976)
• Combining the pairs:
  • Brandstein et al., 1996
  • Burchfield et al., 2001: uses optimization, works in 2D
  • Rui/Florencio, 2003: sum of cross-correlation functions towards hypothesis
Beamsteering based
• Compute the output energy of a set of beams
• Find the maximum
• Interpolate for increased precision
• Variant: two-dimensional search
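A minimal sketch of the time-delay-estimate approach for one microphone pair. It uses a plain time-domain cross-correlation, not the ML or PHAT weightings named above, and finds only integer-sample delays; the function names, the far-field geometry, and the 343 m/s speed of sound are assumptions for illustration.

```python
import math


def estimate_delay(x1, x2, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means x2 is a delayed copy of x1.
    """
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for n in range(len(x1)):
            m = n + lag
            if 0 <= m < len(x2):
                value += x1[n] * x2[m]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def delay_to_angle_deg(lag, fs, spacing, c=343.0):
    """Convert a delay between two microphones to a far-field arrival angle."""
    s = max(-1.0, min(1.0, (lag / fs) * c / spacing))
    return math.degrees(math.asin(s))
```

With closely spaced microphones the maximum delay is only a few samples, so practical systems interpolate around the peak, as the beamsteering list above suggests for its own search.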
Microphone arrays: source localization (2)
• Problems: noise and reverberation
• Post-processing the raw SSL results
• Camera-assisted approach:
  • Face detection software
  • Fusion of SSL and video data
[Figure]
• Real SSL results: raw, post-processed, snapped to 10 degrees beams.
• Two persons talking at 6 and -38 degrees, distance 12 feet, conference room.
• Four element linear array.
Microphone arrays: adaptive algorithms
Frost algorithm (Frost, 1972)
$$\min_{W}\left(W^{H}\Phi_{XX}W\right) \quad \text{subject to} \quad W^{H}D = 1$$
• $\Phi_{XX}$ is the power spectral density matrix of the input signals
• Gradient descent optimization, i.e. constrained LMS algorithm
• Designed for antenna array
Microphone arrays: adaptive algorithms (2)
Generalized Side Lobe Canceller (Griffiths/Jim, 1982)
• Time-invariant beamformer
• Nulls are sharper than beams
• Blocking matrix: place null towards the sound source
• Adaptive filters to minimize residual in the beamformer output
Microphone arrays: adaptive algorithms (3)
Advantages
• Make full use of the array geometry under the specific noise
• Very good with point noise sources
• No calibration required
Real-world problems
• Higher CPU and memory requirements
• More complex to implement
• Slower adaptation and switching to the next sound source
• Non-predictable and non-guaranteed parameters
• Performance similar to fixed beamformers with ambient type of noise
Microphone arrays: non-linear spatial filtering
• Implemented as a non-linear post-processor
• Based on Instantaneous Direction Of Arrival (IDOA) estimation per bin
• Compute the probability and apply it in the same way as in noise suppression under uncertain presence of signal
• Generalized suppression with spatial information and known look-up direction
Demo:
• Recording conditions:
  • Human speaker at 0 degrees, 1.5 m
  • Radio at -45 degrees, 2 m
  • Office: normal noise and reverberation
  • Four element linear microphone array
• Same audio recording, two sequences:
  • video: direction-frequency-power; audio: one microphone
  • video: direction-power for SSL; audio: array output

Concept:
• More energy removed -> more musical noises and distortions
• Masking effects in frequency and time domains in human perception of sound
• Why remove noises we can't hear?
Real-life issues:
• Needs MOS tests for evaluation
• Duplicates codec functionality: the new audio codecs use the same effect
Advanced techniques: noise suppressor for ASR
General idea: optimize a parameterized suppression rule for best recognition rate (Tashev/Droppo/Acero, 2006)
• More training data improves average recognition, harms clean speech recognition
• Rprop optimization algorithm: enhanced version of gradient descent
• Objective function: Maximum Mutual Information (MMI) from ASR, closely related to the recognition accuracy
• Optimization parameters: the suppression rule
• Starting point: MMSE Spectral Power Estimator rule

Advanced techniques: using speech model
General idea:
• Detect and parse the speech signal: fricatives, vowels, glides, nasals, stops
• Measure the parameters
• Synthesize clean speech signal
Real-world issues:
• If we could do the parsing reliably, we would have solved the noise-robust ASR problem ☺
• Even text-to-speech systems do not have very good pronunciation; doing this without a language model is more difficult
Advanced techniques: using speech model (2)
Drucker (1968):
• Detect and parse the speech signal: fricatives, vowels, glides, nasals, stops
• Use separate enhancing filters for each category
• Hard decision for presence and class
McAulay/Malpass (1980) introduced soft decision rules and using several filters in parallel
Some techniques:
• Use the harmonic structure of vowels, time warping to make them flat, clean, un-warp
• Use vocal tract model for generating fricatives and other consonants
• Using language model (too specific)
Advanced techniques: spatial noise suppression
Microphone array for headset (Tashev/Seltzer/Acero, 2005)
• 3-element microphone array
• Bone sensor for reliable VAD
• Working in IDOA space:
$$\Delta(f) = \left[\delta_1(f), \delta_2(f), \ldots, \delta_{M-1}(f)\right], \qquad \delta_{j-1}(f) = \arg(X_1(f)) - \arg(X_j(f))$$
Multidimensional generalization of classic noise suppression
• Building position-dependent noise models
• Apply suppression rule
[Figure: Phase differences for 1000 Hz; axes: D12, D13]
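The IDOA vector for one frequency bin can be computed directly from the phase-difference definition above. In this sketch the function names are hypothetical, and wrapping the differences into [-π, π) is an assumption suggested by the roughly ±3 axis range of the phase-difference plot.

```python
import cmath
import math


def wrap_phase(phase):
    """Wrap an angle in radians into [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def idoa_vector(bin_values):
    """Phase differences between microphone 1 and each other microphone.

    bin_values: complex spectrum values X_1(f), ..., X_M(f) for one bin
    Returns [delta_1(f), ..., delta_{M-1}(f)].
    """
    reference = cmath.phase(bin_values[0])
    return [wrap_phase(reference - cmath.phase(x)) for x in bin_values[1:]]
```

For a three-microphone array each bin gives a point in a two-dimensional space (the D12 and D13 axes of the figure), and sources at different positions fall in different regions of that space.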
Advanced techniques: spatial noise suppression (2)
[Figure: magnitude (dB) vs. frequency (Hz); curves: Mic1, Mic2]
• General architecture
• Beamformer directivity
• Diffraction around the head correction

                 BM      BF      NS      SR
Office, 55 dB    25.2    22.5    29.4    34.7
Café, 75 dB       7.2    12.3    17.5    22.8
Car, 90 dB        3.2     6.4    11.1    16.4
Advanced techniques: summary
• Improving the noise suppression and reduction further increases complexity and requires more information.
• The algorithms become more specialized: for car, for speech, for ASR, for specific noises.
• Use good judgment when using or designing them:
  • Do I need this?
  • How specific is the application?
• Remember: a more complex model with more parameters means slower computation and adaptation. Use with caution.
• Still, there are very exciting new algorithms, solving problems unsolved so far.
Defeating ambient noise: final remarks
• The art of noise suppression is to know when to stop.
• None of the methods is universal, use cascading and make sure not to destroy important properties.
• Build processing blocks, think of the whole system: well balanced suppression across the processing chain.
• Noise suppression is about human perception: use your ears and MOS tests.
Finally
Thank you for choosing this tutorial!
Thank you for the attention!