EE 225D LECTURE ON PITCH DETECTION AND VOCODERS
N. MORGAN / B. GOLD, LECTURE 24
University of California, Berkeley
College of Engineering, Department of Electrical Engineering and Computer Sciences
Professors: N. Morgan / B. Gold
EE225D, Spring 1999
Pitch Detection & Vocoders, Lecture 24

Major Question
How to make a "perfect" vocoder? (Can it be done?)
What limitations are encountered for low-bit-rate representation?

Today's Topic
Traditional 2400 bps systems [or at least in that range] and
pitch & voicing detection.

NEXT
Very low rate systems [600 bps]
Higher-quality, more robust systems at 5-30 kbps
NEXT

Difficulties Encountered in Pitch Detection
* The purpose of pitch detection is to automatically obtain a result that is in agreement with a psychoacoustic result for the same stimulus.
* Early researchers preferred to use the term "fundamental frequency estimator", but we saw in Chapter 16 that pitch would be "perceived" even if the stimulus consisted only of harmonics of that frequency (example: shift of virtual pitch).
* What we're really after is the NATURE and quantitative description of the excitation function, and also to make a vocoder sound natural.
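The Chapter 16 point can be checked numerically. Below is a minimal sketch, assuming an 8 kHz sampling rate, a 200 Hz fundamental, and a plain autocorrelation peak-picker (all our choices, not the lecture's): the signal contains only harmonics 2 through 5, yet a periodicity-based estimator still reports 200 Hz, a frequency at which the spectrum has no energy.

```python
import numpy as np

FS = 8000   # assumed sampling rate (Hz)
F0 = 200.0  # the "virtual" pitch; no component is placed at this frequency
N = 800     # 100 ms of signal

def missing_fundamental_signal(f0=F0, harmonics=(2, 3, 4, 5), fs=FS, n=N):
    """Sum of unit cosines at harmonics 2-5 of f0; the fundamental itself is absent."""
    t = np.arange(n) / fs
    return sum(np.cos(2 * np.pi * h * f0 * t) for h in harmonics)

def autocorr_pitch(x, fs=FS, fmin=50.0, fmax=800.0):
    """Return (pitch_hz, lag) for the autocorrelation peak within the 50-800 Hz range."""
    min_lag, max_lag = int(fs / fmax), int(fs / fmin)   # 10 and 160 samples at 8 kHz
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])
    lag = min_lag + int(np.argmax(r[min_lag:max_lag + 1]))
    return fs / lag, lag

x = missing_fundamental_signal()
spectrum = np.abs(np.fft.rfft(x))     # bin spacing FS / N = 10 Hz
print(spectrum[20] / spectrum[40])    # energy at 200 Hz relative to 400 Hz: essentially zero
print(autocorr_pitch(x))              # (200.0, 40)
```

An estimator that looked for the lowest spectral component would answer 400 Hz here; one that looks for periodicity agrees with the perceived pitch.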

* This means:
  1. Detection of the time when the vocal cords are vibrating in a [perhaps rapidly varying] quasi-periodic way, and tracking the period.
  2. Representation of the friction noise caused by a vocal tract constriction.
  3. Representation of the transient excitation during plosives.
  4. Representation of the noise for a whispered vowel.
  5. Representations of various combinations of all the above.
Examples of speech waveforms that make the above analysis difficult:
* Dynamic range of quasi-periodic vocal cord vibrations: as low as 50 Hz for some adults, as high as 800 Hz for children (a 16:1 range).
* Rapid variation in glottal period.
* Sudden change in vocal tract shape [e.g., nasal].
* Transition from unvoiced to voiced.
* Environmental transmission problems.
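The 16:1 range translates directly into the span of periods a detector must search. A small worked calculation, assuming an 8 kHz sampling rate (our assumption, not stated on the slide):

```python
FS = 8000  # assumed sampling rate (Hz)

def period_search_range(f_low_hz=50.0, f_high_hz=800.0, fs=FS):
    """Period range in ms, lag range in samples, and the pitch ratio for a pitch range."""
    periods_ms = (1000.0 / f_high_hz, 1000.0 / f_low_hz)   # shortest (child), longest (adult)
    lags = (round(fs / f_high_hz), round(fs / f_low_hz))
    return periods_ms, lags, f_high_hz / f_low_hz

print(period_search_range())   # ((1.25, 20.0), (10, 160), 16.0)
```

At the high-pitch end a detector has only 10 samples per period to work with, while at the low end a single period lasts 20 ms.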

* With linear assumptions, the speech wave can be represented as the convolution of an excitation function with a vocal tract filter function. In spectral terms: $S(\omega) = E(\omega)H(\omega)$.
* In a channel vocoder analyzer, measurements are made on $S(\omega)$; $H(\omega)$ and $E(\omega)$ are NOT computed separately.
* In the channel vocoder synthesizer, the spectrum is obtained as follows:
  [Diagram: Pulse Generator and White Noise Generator feed a Router, producing the excitation $e(t)$; the synthesized spectrum is $S(\omega) = E(\omega)H(\omega)$.]
  If $E(\omega)$ is a FLAT SPECTRUM, $\hat{S}(\omega) \cong S(\omega)$, although the phases may be different.
* The situation is complicated by the fact that $E(\omega)$ and $H(\omega)$ are time-varying.
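The relation $S(\omega) = E(\omega)H(\omega)$ and the flat-spectrum remark can be checked with the DFT. This sketch uses circular convolution (so the DFT product identity holds exactly) and a toy decaying impulse response as a hypothetical vocal tract; neither choice comes from the lecture.

```python
import numpy as np

def circular_convolve(e, h):
    """s[i] = sum_k e[k] * h[(i - k) mod N], computed directly in the time domain."""
    n = len(e)
    return np.array([sum(e[k] * h[(i - k) % n] for k in range(n)) for i in range(n)])

N = 64
rng = np.random.default_rng(0)
h = np.zeros(N)
h[:8] = 0.8 ** np.arange(8)     # toy "vocal tract" impulse response (hypothetical)
e = rng.standard_normal(N)      # an arbitrary excitation

# Time-domain convolution equals the inverse DFT of the product E(w) H(w).
s_time = circular_convolve(e, h)
s_freq = np.fft.ifft(np.fft.fft(e) * np.fft.fft(h)).real
print(np.allclose(s_time, s_freq))                          # True

# A flat-spectrum excitation (one impulse, |E(w)| = 1) gives |S(w)| = |H(w)|.
impulse = np.zeros(N)
impulse[0] = 1.0
S = np.fft.fft(circular_convolve(impulse, h))
print(np.allclose(np.abs(S), np.abs(np.fft.fft(h))))        # True
```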

In LPC, we start off with
  $S(\omega) = E_x(\omega)\,H(\omega)$,  $s(n) = e_x(n) * h(n)$,
where the change of nomenclature, $e_x(n)$, is the model of the speech excitation signal.
LPC derives an all-pole model $\hat{H}(\omega) \rightarrow \hat{h}(n)$.
It would be nice if $\hat{H}(\omega)$ was really a good representation of $H(\omega)$, the real vocal tract function.
Speech can be perfectly reconstructed by convolving $\hat{h}(n)$ with the error signal $e(n)$:
  $s(n) = e(n) * \hat{h}(n)$,  $S(\omega) = E(\omega)\,\hat{H}(\omega) = E_x(\omega)\,H(\omega)$.
If $\hat{H}(\omega)$ differs greatly from $H(\omega)$, $E(\omega)$ will compensate by being correspondingly different from $E_x(\omega)$.
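The perfect-reconstruction claim holds for any predictor, good or bad, because the inverse filter $A(z)$ and the all-pole synthesizer $1/A(z)$ are exact inverses. A minimal sketch, assuming the autocorrelation method with the Levinson-Durbin recursion (our choice of solver) and a hypothetical two-pole test signal:

```python
import numpy as np

def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] x[n+k], k = 0..max_lag (autocorrelation method)."""
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns predictor coefficients a_1..a_p and error energy."""
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a, err = new_a, err * (1.0 - k * k)
    return a[1:], err

def predict(x, a, n):
    """sum_k a_k x[n-k], treating samples before the frame as zero."""
    return sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n >= k)

def inverse_filter(s, a):
    """Error (residual) signal e(n) = s(n) - sum_k a_k s(n-k)."""
    return np.array([s[n] - predict(s, a, n) for n in range(len(s))])

def all_pole_synthesis(e, a):
    """Drive the all-pole model 1/A(z): s(n) = e(n) + sum_k a_k s(n-k)."""
    s = np.zeros(len(e))
    for n in range(len(e)):
        s[n] = e[n] + predict(s, a, n)
    return s

# Hypothetical test signal: 100 Hz pulse train (fs = 8 kHz) through a two-pole resonator.
excitation = np.zeros(800)
excitation[::80] = 1.0
speech = all_pole_synthesis(excitation, np.array([1.3, -0.8]))

a_hat, err = levinson_durbin(autocorrelation(speech, 10), 10)
e = inverse_filter(speech, a_hat)
print(np.allclose(all_pole_synthesis(e, a_hat), speech))   # True: perfect reconstruction
```

The signal was generated by a 2nd-order model but analyzed with a 10th-order predictor; whatever mismatch exists between $\hat{H}$ and $H$ ends up in $e(n)$, which is the compensation described above.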

Homomorphic analysis has the hypothesis that source-filter separation is manifested as spectrum envelope / spectral fine structure separation.
The model also assumes that these are multiplied in the spectral domain, so that taking the log turns the product into a sum. Finally, the model assumes that the two are separable with liftering. Given this separation, the excitation function and the vocal tract filter function can be represented and then convolved to give the synthesized speech.
* Many LPC systems [multipulse, CELP, etc.] derive their power by searching for an error signal that compensates for $\hat{H}(\omega)$.
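The log-and-lifter step can be sketched in a few lines. Assumptions (ours): an 8 kHz sampling rate, a 512-point FFT, a Hamming-windowed 50 ms frame of a synthetic pulse-excited resonance, and a lifter cutoff of 30 samples separating low time from high time.

```python
import numpy as np

FS = 8000      # assumed sampling rate (Hz)
N_FFT = 512    # FFT size

def real_cepstrum(frame, n_fft=N_FFT):
    """c(n) = IFFT(log|FFT(x)|). The log turns |E||H| into log|E| + log|H|."""
    magnitude = np.abs(np.fft.fft(frame, n_fft))
    return np.fft.ifft(np.log(magnitude + 1e-12)).real

def lifter(cepstrum, cutoff):
    """Split into low-time (envelope) and high-time (fine structure) parts."""
    low = np.zeros_like(cepstrum)
    low[:cutoff] = cepstrum[:cutoff]
    low[len(cepstrum) - cutoff + 1:] = cepstrum[len(cepstrum) - cutoff + 1:]  # negative-time half
    return low, cepstrum - low

def resonator_speech(period=80, n=400, a1=1.3, a2=-0.8):
    """Hypothetical voiced frame: pulse train through one two-pole 'formant', Hamming-windowed."""
    e = np.zeros(n)
    e[::period] = 1.0
    s = np.zeros(n)
    for i in range(n):
        s[i] = e[i] + (a1 * s[i - 1] if i >= 1 else 0.0) + (a2 * s[i - 2] if i >= 2 else 0.0)
    return s * np.hamming(n)

c = real_cepstrum(resonator_speech())
low, high = lifter(c, cutoff=30)
log_envelope = np.fft.fft(low).real    # smooth log spectrum envelope (vocal tract part)
log_fine = np.fft.fft(high).real       # harmonic ripple contributed by the excitation
peak_bin = int(np.argmax(log_envelope[:N_FFT // 2]))
print(peak_bin * FS / N_FFT)           # envelope peak near the resonance, a little under 1 kHz
```

Because liftering is a split of one sequence into two, `log_envelope + log_fine` reproduces the full log spectrum exactly; how cleanly the two parts separate depends on the cutoff and on how far the pitch period lies from the envelope's low-time region.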

In order to achieve low transmission rates (e.g., 2400 bps), all systems rely on the excitation model consisting of a noise source and a variable-period pulse source.
- Both sources are reasonable approximations to flat spectra and take few bits to transmit:
  buzz-hiss switch - 1 bit every 10 msec: 100 bps
  pitch tracker - 6 bits every 10 msec: 600 bps
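The excitation bit budget is simple arithmetic. The last line assumes, for illustration only, that everything else in a 2400 bps budget goes to the spectral envelope.

```python
FRAME_MS = 10  # one parameter update every 10 msec, as on the slide

def bits_per_second(bits_per_frame, frame_ms=FRAME_MS):
    return bits_per_frame * 1000 // frame_ms

voicing_bps = bits_per_second(1)        # buzz-hiss switch: 1 bit per frame
pitch_bps = bits_per_second(6)          # pitch tracker: 6 bits per frame (2**6 = 64 levels)
excitation_bps = voicing_bps + pitch_bps
print(voicing_bps, pitch_bps, excitation_bps)   # 100 600 700
print(2400 - excitation_bps)                     # 1700 (assumed left for the spectrum)
```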

Major Motivation for Research on Vocoders: Past, Present, Future
Past - Secrecy - WWII - Data rates were limited. 2400 bps became a standard.
  Nearly all funding came from DOD to try to improve quality at 2400 bps.
Present - Modems are much better. As cellular phones proliferate, data rate limitations still apply, but 2400 bps is no longer the sole criterion.
  The main direction is still the quality (robustness) - bit rate tradeoff.
Future - Greater robustness - efficient storage of speech (and music) - coding-recognition tie-in.

Two Sides of the Coin
1. Basic models for analysis & synthesis: channel, LPC, homomorphic
2. Waveform methods: PCM, ADPCM; some predictive ability

Complete Channel Vocoder
Remember the basic assumption for all vocoders:
* Synthetic speech is the convolution of an excitation function and a vocal tract filter function.
* Assumption: the synthesizer is a time-variable linear system.
  If this assumption were wrong, and excitation and vocal tract interacted in some nonlinear way, the problem of implementing a "transparent" system would probably become intractable.
Working hypothesis for 2400 bps (and lower) systems:
Excitation is either buzz [variable pulse generator] or hiss [white noise generator].

Now, assume that you have built a great pitch detector that tracks perfectly and records $T_1, T_2, T_3$, etc.
Now, this information is transmitted, and the buzz generator at the synthesizer is forced to produce pulses based on the above measurements.
Consider the spectrum of a jittered pulse train.
[Figure: Spectral distortion introduced by pitch jitter. A pulse train with intervals $T_1, T_2, T_3, T_4$ vs. time, and its spectrum $E(\omega)$ vs. $\omega$, showing the envelope caused by jitter.]
Most of the time, pitch does NOT behave this badly.
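The spectral effect of jitter can be sketched by comparing a perfectly periodic pulse train with one whose periods $T_1, T_2, \dots$ wander randomly. The 8 kHz rate, 80-sample nominal period, and uniform jitter of up to 4 samples per period are hypothetical choices. The measure reported is the share of energy lying exactly on the harmonics of 100 Hz.

```python
import numpy as np

FS = 8000     # assumed sampling rate (Hz)
PERIOD = 80   # nominal pitch period in samples (100 Hz)
N = FS        # 1 s of excitation, so DFT bins are 1 Hz apart

def pulse_train(periods, n=N):
    """Unit pulses separated by the successive periods T1, T2, T3, ... (in samples)."""
    x = np.zeros(n)
    t = 0
    for p in periods:
        if t >= n:
            break
        x[t] = 1.0
        t += int(p)
    return x

def harmonic_energy_fraction(x, f0_hz=FS / PERIOD, fs=FS):
    """Fraction of (one-sided) spectral energy lying exactly on multiples of f0."""
    power = np.abs(np.fft.rfft(x)) ** 2
    step = int(round(f0_hz * len(x) / fs))   # harmonic spacing in DFT bins
    return power[::step].sum() / power.sum()

regular = pulse_train([PERIOD] * 200)
rng = np.random.default_rng(1)
jittered = pulse_train(PERIOD + rng.integers(-4, 5, size=200))   # T_i = 80 +/- 4 samples

print(harmonic_energy_fraction(regular))    # essentially 1.0: all energy on the harmonics
print(harmonic_energy_fraction(jittered))   # far smaller: energy smeared between harmonics
```

Because each jittered interval is added to the previous pulse time, the timing error accumulates, and the higher harmonics lose their line structure first.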

In real life, let's assume that analysis takes place every 20 msec. The analyzer generates a single pitch number, so at the synthesizer each 20 msec period uses a single pitch value.
[Figure: Spectral distortion introduced by pitch jitter. Top: actual excitation during voicing, pulses with intervals $T_1$-$T_4$ vs. time; $S(\omega)$ is the product of the above $E(\omega)$ and $H(\omega)$: $S(\omega) = E(\omega)\,H(\omega)$. Bottom: excitation at the synthesizer, one pitch per 20 msec segment; $e(t) \rightarrow s(t)$, $\hat{S}(\omega) = E_2(\omega)\,H(\omega)$. A partly lost annotation reads "not as bad as".]

Spectral Flattening
Turn the excitation signal into a white signal or white noise.
[Figure: Model of synthetic speech with spectrally flattened excitation, $S(\omega) = E(\omega)\,H(\omega)$. The excitation passes through a bank of B.P. filters, each followed by a hard limiter; the channels are combined with the modulation signals and summed (+).]

Major Question
Does the all-pole synthesizer model the vocal tract envelope function or the complete speech envelope function?
- If the former is true, the excitation should NOT be spectrally flattened.
- If the latter is true, spectral flattening may help.
* Joe Tierney and I did an informal experiment to determine perceived quality. The result was ambiguous.
* In general, existing (low-rate) LPC systems do NOT use spectral flattening.
  It may depend on the ORDER of the predictor & synthesizer: a 10th-order predictor corresponds to five "formants".

Homomorphic Vocoder
* Excitation is modelled in the same way as for channel vocoders & LPC.
  Spectral flattening of the excitation signal has never [I think] been tried, but it should work (in the same ballpark as channel & LPC).
Point C is the cepstrum.
[Figure: cepstrum vs. time. Low-time cepstrum: vocal tract (speech spectrum envelope). High-time cepstrum: excitation, overlapping the high-time component of the envelope.]
Note: if excitation and envelope could be completely separated, synthesis ought to be perfect.

Figure 20.1: Cepstral analysis.
[Block diagram, test points (a)-(d): 512-point FFT → log magnitude → 512-point FFT → separation in time → spectral function and excitation function; the excitation function feeds pattern recognition for pitch.]
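The chain in Figure 20.1 can be sketched as a cepstral pitch estimator. Assumptions (ours, not the figure's): an 8 kHz rate, a Hamming window, an inverse FFT in place of the second forward FFT (for a real, even log-magnitude sequence these differ only by a factor of N), and a peak search over quefrencies corresponding to 50-400 Hz. No voicing decision is attempted; the returned peak height is the quantity one would threshold for that.

```python
import numpy as np

FS = 8000     # assumed sampling rate (Hz)
N_FFT = 512   # as in the 512-point FFTs of Figure 20.1

def cepstral_pitch(frame, fs=FS, n_fft=N_FFT, fmin=50.0, fmax=400.0):
    """FFT -> log magnitude -> inverse FFT -> pick the high-time (quefrency) peak."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(windowed, n_fft)) + 1e-12)
    cepstrum = np.fft.ifft(log_mag).real
    lo, hi = int(fs / fmax), int(fs / fmin)   # 20..160 samples: 2.5 to 20 ms
    peak = lo + int(np.argmax(cepstrum[lo:hi + 1]))
    return fs / peak, cepstrum[peak]          # pitch estimate and peak height

def voiced_frame(period, n=400, a1=1.3, a2=-0.8):
    """Hypothetical 50 ms voiced frame: pulse train through a two-pole resonator."""
    e = np.zeros(n)
    e[::period] = 1.0
    s = np.zeros(n)
    for i in range(n):
        s[i] = e[i] + (a1 * s[i - 1] if i >= 1 else 0.0) + (a2 * s[i - 2] if i >= 2 else 0.0)
    return s

pitch_hz, peak_height = cepstral_pitch(voiced_frame(period=80))
print(pitch_hz, peak_height)
```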

Figure 30.8: Autocorrelation Function of Spectrally Flattened Speech. Successive 30 ms sections with 15 ms overlap.
[Axes: time vs. lag (up to 15 ms).]
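The framing in Figure 30.8 (30 ms sections advancing by 15 ms) can be sketched as a short-time autocorrelation pitch tracker. We assume an 8 kHz rate and that the input has already been spectrally flattened; an ideal pulse train stands in for flattened speech, and the 50-400 Hz search range is our choice.

```python
import numpy as np

FS = 8000
FRAME = 30 * FS // 1000   # 30 ms sections: 240 samples
HOP = 15 * FS // 1000     # successive sections overlap by 15 ms: 120-sample hop

def autocorrelation(x, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one section."""
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])

def pitch_track(flattened, fmin=50.0, fmax=400.0, fs=FS):
    """One pitch estimate (Hz) per 30 ms section, from the largest autocorrelation peak."""
    min_lag, max_lag = int(fs / fmax), int(fs / fmin)
    track = []
    for start in range(0, len(flattened) - FRAME + 1, HOP):
        r = autocorrelation(flattened[start:start + FRAME], max_lag)
        lag = min_lag + int(np.argmax(r[min_lag:max_lag + 1]))
        track.append(fs / lag)
    return track

# An ideal 125 Hz pulse train stands in for spectrally flattened voiced speech.
flattened = np.zeros(FS)
flattened[::64] = 1.0
track = pitch_track(flattened)
print(len(track), set(track))   # 65 {125.0}
```

Flattening matters here because, without it, a strong formant can make the autocorrelation peak at a lag tied to the formant rather than the pitch period.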

Figure 30.7: Spectral Flattening and Its Effect on the Speech Signal.
[Block diagram: the input feeds bandpass filters F1 ... Fn. In channel i, the BPF output is full-wave rectified (FWR) and smoothed to give $A_i$; the delayed BPF output $S_i$ is divided by $A_i$, giving $C_i = S_i / A_i$. The $C_i$ are summed ($\Sigma$) to form the output. Waveform panels: original and flattened speech signal.]
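Figure 30.7's flattener can be sketched channel by channel: band-pass, full-wave rectify and smooth to get $A_i$, then divide the band signal $S_i$ by it. Hypothetical choices: eight ideal FFT band-pass filters, each 400 Hz wide, from 200 to 3400 Hz, and a centered 10 ms moving average as the smoother; because the smoother is centered, no explicit delay is needed to align $S_i$ with $A_i$.

```python
import numpy as np

FS = 8000
BANDS = [(lo, lo + 400) for lo in range(200, 3400, 400)]   # hypothetical 8-channel bank

def bandpass(x, lo, hi, fs=FS):
    """Ideal (FFT brick-wall) band-pass filter standing in for each BPF."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, len(x))

def smooth(x, width):
    """Centered moving average (non-causal, so no compensating delay is needed)."""
    return np.convolve(x, np.ones(width) / width, mode="same")

def spectrally_flatten(x, smooth_width=80, eps=1e-9):
    """Output = sum_i C_i, with C_i = S_i / A_i and A_i = smoothed |S_i| (FWR + smooth)."""
    out = np.zeros(len(x))
    for lo, hi in BANDS:
        s_i = bandpass(x, lo, hi)
        a_i = smooth(np.abs(s_i), smooth_width)
        out += s_i / (a_i + eps)
    return out

def band_energy_spread(x):
    """Largest over smallest channel energy; 1.0 would be perfectly flat."""
    energies = [np.sum(bandpass(x, lo, hi) ** 2) for lo, hi in BANDS]
    return max(energies) / min(energies)

def resonator_speech(n=FS, period=80, a1=1.3, a2=-0.8):
    """Hypothetical voiced test signal: 100 Hz pulse train through one resonance."""
    e = np.zeros(n)
    e[::period] = 1.0
    s = np.zeros(n)
    for i in range(n):
        s[i] = e[i] + (a1 * s[i - 1] if i >= 1 else 0.0) + (a2 * s[i - 2] if i >= 2 else 0.0)
    return s

speech = resonator_speech()
flat = spectrally_flatten(speech)
print(band_energy_spread(speech), band_energy_spread(flat))   # the spread shrinks sharply
```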

Figure 30.3: Extraction of the Period.
[Labels: variable blanking time; variable exponential decay (rundown time); firings.]
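Figure 30.3's period extractor can be sketched as a peak detector with a blanking interval followed by an exponentially decaying threshold. Our reading of "variable" is that both times scale with the most recent period estimate; the factors 0.4 and 0.7 and the 10 ms starting period are hypothetical.

```python
import numpy as np

def extract_periods(x, fs=8000, initial_period_s=0.010, blank_frac=0.4, decay_frac=0.7):
    """Return the sample indices where the detector 'fires'.

    After each firing the detector is blanked for blank_frac * period samples; the
    threshold then runs down exponentially from the last firing amplitude with time
    constant decay_frac * period. 'period' is the latest firing-to-firing interval.
    """
    period = initial_period_s * fs     # current period estimate, in samples
    firings, last_fire, last_value = [], None, 0.0
    for n, v in enumerate(x):
        threshold = 0.0
        if last_fire is not None:
            elapsed = n - last_fire
            blank = blank_frac * period
            if elapsed < blank:
                continue                                   # blanking time
            threshold = last_value * np.exp(-(elapsed - blank) / (decay_frac * period))
        if v > threshold and v > 0:
            if last_fire is not None:
                period = n - last_fire                     # update the period estimate
            firings.append(n)
            last_fire, last_value = n, v
    return firings

# Hypothetical input: main peaks every 80 samples, plus weaker secondary peaks 40 samples later.
x = np.zeros(800)
x[::80] = 1.0
x[40::80] = 0.6
firings = extract_periods(x)
print(firings)              # main peaks only: 0, 80, 160, ..., 720
print(np.diff(firings))     # every period is 80 samples
```

The secondary peaks fall after the blanking interval but below the decaying threshold, so they are rejected; a secondary peak inside the blanking interval would be ignored outright.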

Figure 30.6: Low-pass filtered speech signal.

Figure 30.4: Six Examples of Difficulties in Pitch Detection.
[Panels: dynamic range of pitch; pitch variations in time; vocal tract variations in time; voiced-unvoiced transition; telephone speech; acoustic noise background.]

Figure 30.10: Cepstral Analysis for Pitch Detection.
[Axes: time; 40 dB scale marked.]

Figure 16.9: Block Diagram of the Periodicity Model.
[Signal → filter bank F1, F2, ..., FM → elementary pitch detectors EPD1, EPD2, ..., EPDM (pitch detection, stage 1) → global pitch detection (stage 2, based on correlation with reference patterns).]
Figure 16.10: Block Diagram of the Place Model.
[Signal → 4096-point FFT → spectral peaks → pitch detection algorithm based on separation between peaks.]

Figure 30.13: Harmonic Pitch Detection Algorithm.
[Top: spectral magnitude vs. frequency, with peaks p1-p6 (1050 Hz marked). Bottom: entries p1, p2, ..., p6 and p(n) stacked over numbered frequency quanta, with the "winner" marked.]
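One way to realize the voting suggested by Figure 30.13 is sketched below; it is our interpretation, not necessarily the figure's exact algorithm. Each spectral peak $p$ proposes the candidates $p/1, p/2, \dots$; these are quantized into 10 Hz quanta, and the quantum receiving votes from the most peaks wins. Peak picking itself is assumed done; the divisor limit, quantum width and 50-400 Hz range are hypothetical.

```python
from collections import Counter

def harmonic_pitch(peaks_hz, max_divisor=8, quantum_hz=10.0, fmin=50.0, fmax=400.0):
    """Each spectral peak p votes for the candidates p/1, p/2, ..., p/max_divisor that lie
    in [fmin, fmax], quantized to frequency quanta of width quantum_hz. A peak votes at
    most once per quantum. Returns (winning frequency in Hz, number of votes)."""
    votes = Counter()
    for p in peaks_hz:
        quanta = {round(p / n / quantum_hz) for n in range(1, max_divisor + 1)
                  if fmin <= p / n <= fmax}
        votes.update(quanta)
    # most votes wins; ties go to the higher frequency to avoid choosing a subharmonic
    best_quantum, count = max(votes.items(), key=lambda item: (item[1], item[0]))
    return best_quantum * quantum_hz, count

# Peaks at harmonics 3-8 of 200 Hz: the fundamental itself is not among them.
print(harmonic_pitch([600, 800, 1000, 1200, 1400, 1600]))   # (200.0, 6)
```

As with the virtual-pitch example, the winner is a frequency at which the spectrum itself has no peak.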

Figure 30.14: Goldstein-Duifhuis Optimum Processor Algorithm.
[Labels: spectrum; pitch candidates (P.C.) 1, 2, 3, ..., n; frequency (Hz).]

Figure 31.2: Channel Vocoder Analyzer and Synthesizer (analyzer).
[x(n) → Bandpass Filter i → Magnitude → Lowpass Filter i → Decimate → Encode, for channels 1 ... N (magnitude signals); Pitch Detector → pitch signals; Voicing Detector → V/UV signals; all are encoded and TRANSMITTED.]

Figure 31.2: Channel Vocoder Analyzer and Synthesizer (synthesizer).
[RECEIVE: V/UV signals, pitch signals, magnitude signals. A switch selects the Pulse Generator or the Noise Generator; the excitation feeds Bandpass Filters 1 ... N, whose outputs, weighted by the magnitude signals, are summed (+) to form the vocoder output.]
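Figure 31.2 can be sketched end to end. Simplifications (ours): ideal FFT band-pass filters, eight hypothetical 400 Hz channels, frame averaging as the lowpass-plus-decimate step (10 ms frames), no quantization or encoding, and a single voicing flag and pitch period for the whole utterance rather than per-frame V/UV and pitch signals.

```python
import numpy as np

FS = 8000
BANDS = [(lo, lo + 400) for lo in range(200, 3400, 400)]   # hypothetical 8 channels
FRAME = 80    # one magnitude value per channel every 10 ms (assumed)

def bandpass(x, lo, hi, fs=FS):
    """Ideal FFT band-pass filter standing in for each 'Bandpass Filter i' block."""
    spec = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(f < lo) | (f >= hi)] = 0.0
    return np.fft.irfft(spec, len(x))

def analyze(x):
    """Bandpass -> magnitude -> lowpass -> decimate, per channel (analyzer side)."""
    n_frames = len(x) // FRAME
    mags = np.zeros((len(BANDS), n_frames))
    for i, (lo, hi) in enumerate(BANDS):
        rectified = np.abs(bandpass(x, lo, hi))
        # averaging each 10 ms frame acts as the lowpass filter plus decimation
        mags[i] = rectified[:n_frames * FRAME].reshape(n_frames, FRAME).mean(axis=1)
    return mags

def synthesize(mags, voiced, period):
    """Buzz (pulse train) or hiss (noise) through the same bank, scaled by the magnitudes."""
    n = mags.shape[1] * FRAME
    if voiced:
        excitation = np.zeros(n)
        excitation[::period] = 1.0
    else:
        excitation = np.random.default_rng(0).standard_normal(n)
    out = np.zeros(n)
    for i, (lo, hi) in enumerate(BANDS):
        gain = np.repeat(mags[i], FRAME)          # hold each magnitude for 10 ms
        out += gain * bandpass(excitation, lo, hi)
    return out

t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 1000 * t)                  # test input: a 1 kHz tone
mags = analyze(x)
y = synthesize(mags, voiced=True, period=80)
print(int(np.argmax(mags.mean(axis=1))))          # 2: the 1000-1400 Hz channel
```

The synthesizer never sees the input waveform, only the channel magnitudes plus the excitation choice, which is why the flat-spectrum excitation assumption carries so much weight.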

Figure 31.3: Example of Energy Measurement With a Half-Wave Rectifier.
Figure 31.4: Effect of Pitch Ripple in a Spectral Estimate.