Robustness, Separation, Pitch - Dan Ellis 2015-03-14

Robustness, Separation & Pitch
or Morgan, Me & Pitch
Dan Ellis, Columbia / ICSI
[email protected], http://labrosa.ee.columbia.edu
Columbia University in the City of New York

1. Robustness and Separation
2. An Academic Journey
3. Future
1953: How To Separate Speech?
• The “Cocktail Party Problem” [Cherry ’53]
  - Spatial information: air traffic control (ATC) over a single loudspeaker
  - Pitch differences via gender differences
• Auditory Scene Analysis [Bregman ’90]
  - Grouping cues: onset, harmonicity, common fate, “schema”
The Usefulness of Pitch
• Common pitch can link energy from a single source
[Figure (Brungart et al. ’01): “Monaural Informational Masking Level Differences”. Level difference can help in speech segregation; listen for the quieter talker. Audio examples: Normal mix, “Pitchless”]
1984: Perception-Inspired Separation
• Model the periodicity information in the auditory nerve
[Figures: Weintraub 1985; Lyon 1984. Audio examples: Mix, Female]
1996: An Academic Journey
• Dan in 1996: The “weft”
[Thesis excerpt, chapter 5: Results, p. 115]
… the energy visible in the full signal quite closely; note, however, the holes in the weft envelopes, particularly around 300 Hz in the second weft; at these points, the total signal energy is fully explained by the background noise element, and, in the absence of strong evidence for periodic energy from the individual correlogram channels, the weft intensity for these channels has been backed off to zero.
Figure 5.7: The “bad dog” sound example, represented in the top panes by its time-frequency intensity envelope and its periodogram (summary autocorrelations for every time step). The noise and click elements explaining the example are displayed as their intensity envelopes; weft elements additionally display their period-track on axes matching the periodogram.
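The caption describes the periodogram as a summary autocorrelation computed at every time step. The sketch below illustrates that idea in its simplest form. It assumes the signal has already been split into frequency channels. A real correlogram would take those channels from an auditory filterbank, which is not implemented here. The `channels` input and all function names are hypothetical. Each channel is half-wave rectified and autocorrelated over a short window, and the per-channel results are summed into one summary autocorrelation per frame.

```python
import math


def frame_autocorr(x, start, win, max_lag):
    """Unnormalized autocorrelation of x over [start, start + win) at lags 0..max_lag.

    Products that would index past the end of x are skipped.
    """
    n = len(x)
    out = []
    for lag in range(max_lag + 1):
        stop = min(start + win, n - lag)
        out.append(sum(x[t] * x[t + lag] for t in range(start, stop)))
    return out


def summary_periodogram(channels, hop, win, max_lag):
    """Return one summary autocorrelation (summed over channels) per time step."""
    # Half-wave rectification: a crude stand-in for hair-cell transduction.
    rect = [[max(v, 0.0) for v in ch] for ch in channels]
    n = len(rect[0])
    frames = []
    for start in range(0, n - win + 1, hop):
        summary = [0.0] * (max_lag + 1)
        for ch in rect:
            for lag, r in enumerate(frame_autocorr(ch, start, win, max_lag)):
                summary[lag] += r
        frames.append(summary)
    return frames


def best_period(summary, min_lag):
    """Lag (in samples) of the largest summary value at or beyond min_lag."""
    return max(range(min_lag, len(summary)), key=lambda lag: summary[lag])


if __name__ == "__main__":
    fs = 8000
    # Two hypothetical channels holding the 1st and 2nd harmonics of a 200 Hz voice.
    ch1 = [math.sin(2 * math.pi * 200 * t / fs) for t in range(800)]
    ch2 = [0.5 * math.sin(2 * math.pi * 400 * t / fs) for t in range(800)]
    frames = summary_periodogram([ch1, ch2], hop=80, win=160, max_lag=60)
    lag = best_period(frames[0], min_lag=25)
    print(f"{len(frames)} frames; frame 0 period = {lag} samples ({1000 * lag / fs:.1f} ms)")
```

Both channels peak at the common period (40 samples, 5 ms at 8 kHz). This shared peak is what lets a common pitch tie energy in different channels to one source.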
1999: “Size Matters”
• All you need is a BDNN
• ... and the data (and patience) to train it
[Figures (Ellis & Morgan ’99): “WER for PLP12N nets vs. net size & training data”: WER (%, 32 to 44) over hidden layer size (500, 1000, 2000, 4000 units) and training set size (9.25, 18.5, 37, 74 hours). “WER vs. frames/weight”: WER (%) against frames/weight (1 to 500), with series labeled 178 GCUP, 356 GCUP, 712 GCUP, 1.42 TCUP, 2.8 TCUP, 5.7 TCUP, 11.4 TCUP]
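The seven cost labels double from 178 GCUP to 11.4 TCUP. A 4×4 grid whose two axes also double has exactly seven diagonals, and 178 × 64 ≈ 11.4 T. This is consistent with training cost scaling as hidden units × training hours. The sketch below is hypothetical and rests on that assumption, anchoring 178 GCUP at 500 units and 9.25 hours. For frames/weight, it also assumes a 10 ms frame hop, a one-hidden-layer MLP, and illustrative input and output layer sizes (`N_INPUT`, `N_OUTPUT`), none of which are taken from the paper.

```python
# Hypothetical sketch: training cost and data/parameter ratio over the 4x4 grid.
HIDDEN_UNITS = [500, 1000, 2000, 4000]
TRAIN_HOURS = [9.25, 18.5, 37.0, 74.0]

BASE_GCUP = 178.0          # smallest label, ASSUMED to be 500 units / 9.25 h
FRAMES_PER_HOUR = 360_000  # ASSUMPTION: 10 ms frame hop (100 frames/s)
N_INPUT = 234              # HYPOTHETICAL input layer size
N_OUTPUT = 56              # HYPOTHETICAL output layer size


def compute_gcup(units, hours):
    """Training cost in GCUP, ASSUMING it scales with hidden units x training hours."""
    return BASE_GCUP * (units / HIDDEN_UNITS[0]) * (hours / TRAIN_HOURS[0])


def frames_per_weight(units, hours):
    """Training frames per weight of a one-hidden-layer MLP (bias terms ignored)."""
    n_weights = units * (N_INPUT + N_OUTPUT)
    return hours * FRAMES_PER_HOUR / n_weights


if __name__ == "__main__":
    for units in HIDDEN_UNITS:
        for hours in TRAIN_HOURS:
            print(f"{units:5d} units, {hours:5.2f} h: "
                  f"{compute_gcup(units, hours):8.0f} GCUP, "
                  f"{frames_per_weight(units, hours):6.1f} frames/weight")
```

Under these assumptions, cost is constant along each diagonal of the grid, while frames/weight changes by a factor of 4 per step. This appears to be the trade-off the second plot examines.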
2001: Overlap Remains
• Meeting Recorder Project
  - natural speech interactions
  - ~10% of speech frames have overlaps
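An overlap percentage like this one depends on how "speech frame" is defined. The sketch below assumes per-talker, per-frame activity labels are available. It counts a frame as speech when at least one talker is active, and as overlapped when two or more are. The function name and the toy labels are hypothetical.

```python
def overlap_fraction(activity):
    """Fraction of speech frames in which two or more talkers are active.

    activity: one list of booleans per talker, all of equal length (one entry per
    frame). A "speech frame" is taken to be any frame with at least one talker.
    """
    speech = overlap = 0
    for frame in zip(*activity):
        active = sum(frame)  # True counts as 1
        if active >= 1:
            speech += 1
        if active >= 2:
            overlap += 1
    return overlap / speech if speech else 0.0


if __name__ == "__main__":
    # Hypothetical per-talker activity labels for five frames.
    a = [True, True, False, False, True]
    b = [False, True, True, False, False]
    print(overlap_fraction([a, b]))
```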
2003: EARS
• “Pushing the envelope (aside)”
2004: Pitch-Based Separation
• Literal implementations of the process described in Bregman 1990:
• compute “regularity” cues:
  - common onset
  - gradual change
  - harmonic patterns
  - common fate
Hu & Wang 2004
[Audio examples: Original (v3n7), Brown 1992, Ellis 1996, Hu & Wang 2004]
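The "harmonic patterns" cue can be illustrated with a deliberately simplified mask. The mask keeps only STFT bins that lie near a multiple of a known pitch. This is not the Brown 1992, Ellis 1996, or Hu & Wang 2004 method, all of which work on auditory-filterbank correlograms. It assumes an f0 track is already given (0 or `None` for unvoiced frames). The function name and the `tol_hz` tolerance are hypothetical choices.

```python
def harmonic_mask(f0_track, n_bins, fs, n_fft, tol_hz=30.0):
    """Binary time-frequency mask keeping bins near harmonics of f0.

    f0_track: one f0 estimate in Hz per frame (0 or None for unvoiced frames).
    Returns one list of booleans per frame, one entry per FFT bin (DC excluded).
    """
    bin_hz = fs / n_fft
    mask = []
    for f0 in f0_track:
        row = [False] * n_bins
        if f0:
            for k in range(1, n_bins):
                freq = k * bin_hz
                h = round(freq / f0)  # nearest harmonic number
                row[k] = h >= 1 and abs(freq - h * f0) <= tol_hz
        mask.append(row)
    return mask


if __name__ == "__main__":
    # 8 kHz audio, 256-point FFT: bins are 31.25 Hz apart.
    m = harmonic_mask([250.0, 0.0], n_bins=129, fs=8000, n_fft=256)
    print([k for k, keep in enumerate(m[0]) if keep])
```

Applying such a mask to the mixture STFT keeps the energy that shares one pitch. This is the grouping-by-harmonicity idea in its crudest form.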
2004: Model-Based Separation
• Data-driven separation
  - Learn codebooks for individual speakers
  - Find best combination of sources
• Pitch gives the “grist”
Roweis ’01; Kristjansson, Attias, Hershey ’04
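"Find best combination of sources" can be sketched as an exhaustive search over pairs of codewords, one from each speaker's codebook. The search uses the log-max approximation, under which a mixture's log spectrum is roughly the element-wise max of its sources' log spectra. This is a hypothetical illustration, not the specific algorithm of Roweis ’01 or Kristjansson et al. ’04. The codebooks are assumed to be pre-trained log-magnitude spectra, and the function name is invented here.

```python
import itertools


def separate_frame(mix, codebook_a, codebook_b):
    """Pick the codeword pair whose log-max combination best explains one mixture frame.

    mix and every codeword are log-magnitude spectra of equal length.
    Returns (i, j, mask_a): the chosen codeword indices and a binary mask that is
    True wherever source A's codeword dominates source B's.
    """
    best = None
    for (i, a), (j, b) in itertools.product(enumerate(codebook_a), enumerate(codebook_b)):
        # Log-max approximation: log|A + B| ~ max(log|A|, log|B|) when one source dominates.
        err = sum((m - max(x, y)) ** 2 for m, x, y in zip(mix, a, b))
        if best is None or err < best[0]:
            best = (err, i, j)
    _, i, j = best
    mask_a = [x > y for x, y in zip(codebook_a[i], codebook_b[j])]
    return i, j, mask_a


if __name__ == "__main__":
    A = [[0.0, 0.0, 10.0, 0.0], [10.0, 0.0, 0.0, 0.0]]  # toy codebook, speaker A
    B = [[0.0, 8.0, 0.0, 0.0], [0.0, 0.0, 0.0, 8.0]]   # toy codebook, speaker B
    print(separate_frame([0.0, 0.0, 10.0, 8.0], A, B))
```

The search cost grows with the product of the two codebook sizes, so this brute-force form is only practical for small codebooks.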
2006: Pitch for VAD
• Pitch is the most robust perceptual cue to speech
Lee & Ellis ’06
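A generic way to turn pitch into a voice activity decision is to check whether a frame shows a strong normalized autocorrelation peak within a plausible pitch range. The sketch below does this. It is not the Lee & Ellis ’06 method. The function name, the 80 to 400 Hz range, and the 0.5 threshold are all hypothetical choices.

```python
import math
import random


def is_voiced(frame, fs, fmin=80.0, fmax=400.0, threshold=0.5):
    """Crude voicing decision: normalized autocorrelation peak within the pitch range."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return False  # silence
    min_lag = int(fs / fmax)
    max_lag = min(int(fs / fmin), n - 1)
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[t] * x[t + lag] for t in range(n - lag))
        best = max(best, r / energy)  # r(lag) / r(0)
    return best >= threshold


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(256)]  # 32 ms frame
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(256)]
    print("tone voiced:", is_voiced(tone, fs))
    print("noise voiced:", is_voiced(noise, fs))
```

Unlike an energy threshold, this decision is invariant to the frame's overall level. This is one reason periodicity can remain informative as noise levels change.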
2004-2011: The Epic
[Book cover: Ben Gold, Nelson Morgan, Dan Ellis, Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Second Edition]
2012: Project Babel
• Noisy speech is a challenge:
• How to disentangle speech and interference?
  - Energy peaks are speech (spectral subtraction)
  - Energy troughs are noise (Wiener, log-MMSE)
  - Speech has a known form (Factorial HMM)
  - Voiced speech is periodic (Pitch-based)
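The first two ideas in this list can be sketched with per-band power values (frames × bands). The noise level in each band is estimated from its troughs, crudely taken here as the minimum over time. That estimate then drives either a power spectral-subtraction gain or a Wiener gain. This is a minimal illustration under those assumptions. The function names and the 0.1 gain floor are hypothetical, and log-MMSE, factorial HMM, and pitch-based methods are not shown.

```python
def noise_floor(band_power):
    """Per-band noise estimate: the minimum power over all frames ("troughs are noise")."""
    return [min(frame[b] for frame in band_power) for b in range(len(band_power[0]))]


def spectral_subtraction_gain(power, noise, floor=0.1):
    """Power spectral subtraction gain 1 - N/P, floored to limit over-subtraction."""
    gains = []
    for p, n in zip(power, noise):
        g = 1.0 - n / p if p > 0.0 else 0.0
        gains.append(max(g, floor))
    return gains


def wiener_gain(power, noise):
    """Wiener gain xi / (1 + xi), with the SNR xi estimated crudely as max(P/N - 1, 0)."""
    gains = []
    for p, n in zip(power, noise):
        if n <= 0.0:
            gains.append(1.0)  # no noise estimated in this band
            continue
        xi = max(p / n - 1.0, 0.0)
        gains.append(xi / (1.0 + xi))
    return gains


if __name__ == "__main__":
    band_power = [[1.0, 4.0], [5.0, 4.0], [1.0, 8.0]]  # toy data: 3 frames x 2 bands
    noise = noise_floor(band_power)
    print("noise:", noise)
    print("subtraction:", spectral_subtraction_gain(band_power[1], noise))
    print("wiener:", wiener_gain(band_power[1], noise))
```

Both gains approach 1 where a band rises well above its noise floor (the energy peaks) and fall toward 0 (or the floor) where it sits at the floor.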