Frequency-Domain Speech Analysis - Sharif · ce.sharif.edu/.../2/ce342/resources/root/Lecture/speechanalysis2.pdf

May 22, 2018

lynhan

Frequency-Domain Speech Analysis

Short-Time Fourier Analysis
  Windowed (short-time) Fourier Transform
  Spectrogram of speech signals
  Filter bank implementation*

Cepstral Analysis
  (Real) cepstrum and complex cepstrum
  Complex cepstrum for speech
  Pitch detection
  Echo hiding


Fourier Transform

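Joseph Fourier (1768-1830)

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt$$

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega$$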

It's a simple and powerful idea: can any signal be represented by a linear combination of sines and cosines?


Deep Impact of FT

A versatile tool for solving many problems in science and engineering
  Mathematics: functional/harmonic analysis
  Physics: thermodynamics, Fourier optics
  Astronomy: radar imaging, FT spectrometer
  Biomedical engineering: MRI, FT infrared spectroscopy
  Electrical engineering: frequency-domain analysis, wireless communication, signal processing


Spectral Analysis

http://130.191.21.201/multimedia/jiracek/dga/spectralanalysis/examples.html


Discrete Fourier Transform (DFT)

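Fourier Transform of a Sequence

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$$

Sampling the frequency axis at

$$\omega = \frac{2k\pi}{N}$$

gives the DFT

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2k\pi}{N} n}$$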
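The DFT sum can be evaluated directly, one output bin at a time. The following is a minimal Python sketch of that evaluation, not an efficient FFT; the function name `dft` and the two test signals are illustrative assumptions.

```python
import cmath

def dft(x):
    """Direct O(N^2) evaluation of X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]

# A unit impulse has a flat spectrum; a constant signal puts everything in bin k = 0.
impulse_spectrum = dft([1, 0, 0, 0])
constant_spectrum = dft([1, 1, 1, 1])
```

Each output bin costs N multiplications, so the whole transform costs on the order of N^2 operations; an FFT computes the same values with far fewer.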


Properties of DFT

Linearity

Periodicity

Shift

Convolution
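For reference, these four properties are commonly stated as follows for length-N sequences. The notation is an assumption made here rather than taken from the slide: $x(n) \leftrightarrow X(k)$ denotes a DFT pair, and time indices are taken modulo N, so the shift and the convolution are circular.

$$
\begin{aligned}
\text{Linearity:}\quad & a\,x_1(n) + b\,x_2(n) \;\leftrightarrow\; a\,X_1(k) + b\,X_2(k) \\
\text{Periodicity:}\quad & X(k+N) = X(k) \\
\text{Shift:}\quad & x\big((n-m) \bmod N\big) \;\leftrightarrow\; e^{-j\frac{2k\pi}{N} m}\, X(k) \\
\text{Convolution:}\quad & \sum_{m=0}^{N-1} x_1(m)\, x_2\big((n-m) \bmod N\big) \;\leftrightarrow\; X_1(k)\, X_2(k)
\end{aligned}
$$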


DFT Under MATLAB

>X=fft(x)

For a length-N real signal x, the output X is a length-N complex sequence with the low frequencies clustered around indices 1 and N

X=fftshift(fft(x)) puts the low frequencies at the center of X instead of at its boundaries.

>X=fft(x,N)

x is padded with zeros if it has fewer than N points and truncated if it has more
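The length handling and the reordering described above can be mimicked in a few lines of Python. This is a sketch of the behaviour, not MATLAB's implementation; the names `dft` and `fftshift` are reused here only for illustration, and Python indices start at 0 where MATLAB's start at 1.

```python
import cmath

def dft(x, n_points=None):
    """DFT of x after zero-padding or truncating it to n_points samples."""
    if n_points is None:
        n_points = len(x)
    # Append zeros, then cut: this pads short inputs and truncates long ones.
    padded = (list(x) + [0] * n_points)[:n_points]
    return [
        sum(padded[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]

def fftshift(spectrum):
    """Rotate the bins so that bin 0 (DC) moves to the middle of the list."""
    split = (len(spectrum) + 1) // 2
    return list(spectrum[split:]) + list(spectrum[:split])

padded_spectrum = dft([1, 2, 3], 8)           # three samples zero-padded to eight
truncated_spectrum = dft([1, 2, 3, 4, 5], 2)  # only the first two samples are used
shifted = fftshift([0, 1, 2, 3])
```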


DFT of Simple Waveforms

[Figure: simple waveforms in the time domain and their DFTs in the frequency domain]


Time-Frequency Localization*

[Figure: Heisenberg box in the time-frequency plane (axes: Time, Frequency)]

Heisenberg's uncertainty principle


Implications for Signal Analysis

You CANNOT arbitrarily improve both the resolution of time analysis and frequency analysis. 
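One common way to quantify this trade-off is the following bound, given here as a standard result rather than something taken from the slide. The symbols $\sigma_t$ and $\sigma_\omega$ are assumptions introduced for illustration: they are the standard deviations of the normalized energy densities $|f(t)|^2$ and $|F(\omega)|^2$ about their means.

$$\sigma_t\,\sigma_\omega \;\ge\; \frac{1}{2}$$

Equality holds for a Gaussian pulse, so no signal can be more concentrated than that in both domains at once.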

[Figure: time-domain representations and the corresponding frequency-domain representations, linked by the FT]

How do we define "instantaneous frequency"?


Speech Signal Analysis

Why is the (long-term) FT not appropriate for speech signals?

The FT is the ideal tool for analyzing periodic or stationary signals: the frequency-domain representation greatly helps the analysis

Like many other phenomena we observe in the natural world, speech is a transient or nonstationary signal whose properties change markedly as a function of time

Due to Heisenberg's uncertainty principle, we can only find a compromise between time and frequency localization


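Short-Time (Windowed) FT

(Normal) Fourier Transform of a Sequence

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$$

Time-Dependent Fourier Transform

$$X_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} w(n-m)\, x(m)\, e^{-j\omega m}$$

where w(n-m) is the window (typically Hamming), n is the time index and ω is the frequency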
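A minimal Python sketch of the time-dependent Fourier transform follows, assuming a Hamming window, a fixed hop between frames, and frequencies sampled on the DFT grid. The names `hamming`, `stft`, `frame_len` and `hop` are illustrative. The exponent uses the frame-relative index rather than the absolute index m, which changes each frame's spectrum only by a phase factor and leaves the magnitudes (and hence the spectrogram) unchanged.

```python
import cmath
import math

def hamming(length):
    """Symmetric Hamming window of the given length (length > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def stft(x, frame_len, hop):
    """Return one half-spectrum (bins 0..frame_len/2) per windowed frame of x."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = [window[i] * x[start + i] for i in range(frame_len)]
        spectrum = [
            sum(segment[m] * cmath.exp(-2j * cmath.pi * k * m / frame_len)
                for m in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames

# A stationary tone centred on bin 8 of a 64-sample frame.
tone = [math.cos(2 * math.pi * 8 * n / 64) for n in range(256)]
frames = stft(tone, frame_len=64, hop=32)
# Index of the strongest bin in each frame.
peak_bins = [max(range(len(f)), key=lambda k: abs(f[k])) for f in frames]
```

For this stationary tone every frame peaks at the same bin; for speech the peak locations move from frame to frame, which is what a spectrogram displays.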


Definition of Spectrogram (Sonogram)

[Figure: speech, window, and windowed speech; the spectrogram is indexed by time and frequency]


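Interpretation of Spectrogram

[Figure: spectrogram with axes Time and Frequency]

Slice at a fixed time n: STFT at time n

Slice at a fixed frequency ω: evolution of the short-time spectrum at frequency ω along time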


History of Sonagraph (Visual Speech)

The spectrogram has been used in almost every phase of speech research for over 70 years (DSP has been around for 57 years)

Before DSP, a device called the sound spectrograph (also called a wave analyzer) was widely used


Example of Spectrogram: Chirps

Chirps are analytic signals which have a particular instantaneous frequency 
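As a small illustration, the sketch below builds a real-valued linear chirp and reads its instantaneous frequency off the phase, taking instantaneous frequency to be the time derivative of the phase divided by 2π. The sampling rate, start frequency and sweep rate are arbitrary assumptions, and all function names are illustrative.

```python
import math

def chirp_phase(n, fs, f0, rate):
    """Phase (radians) at sample n of a chirp starting at f0 Hz and sweeping at rate Hz/s."""
    t = n / fs
    return 2 * math.pi * (f0 * t + 0.5 * rate * t * t)

def linear_chirp(num_samples, fs, f0, rate):
    """Samples of cos(phase) for the linear chirp defined by chirp_phase."""
    return [math.cos(chirp_phase(n, fs, f0, rate)) for n in range(num_samples)]

def instantaneous_frequency(n, fs, f0, rate):
    """Finite-difference estimate of (1/(2*pi)) * d(phase)/dt near sample n, in Hz."""
    step = chirp_phase(n + 1, fs, f0, rate) - chirp_phase(n, fs, f0, rate)
    return step * fs / (2 * math.pi)

fs, f0, rate = 8000.0, 500.0, 2500.0           # assumed values for the example
signal = linear_chirp(8000, fs, f0, rate)      # one second of signal
start_hz = instantaneous_frequency(0, fs, f0, rate)
end_hz = instantaneous_frequency(7999, fs, f0, rate)
```

The estimate rises linearly from about f0 to about f0 + rate over the one-second sweep, which shows up as a slanted line in the spectrogram.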


Example of Speech Spectrogram


Spectrogram Reading

We will use a MATLAB demo to test your spectrogram-reading skills


Spectrogram via Filter Bank*

[Figure: filter bank]


Spectrogram Calculation Under MATLAB

Method 1: Use the COLEA toolbox (it has a nice GUI)

Method 2: Use the demo program below

Cut and paste it and save it as specgram_demo.m

>[x,fs]=wavread(filename);
>specgram_demo(x,fs);

% function specgram_demo(y,fs)
% display the spectrogram of a speech signal
% demo for EE493Q Fall 2006
% note: newer MATLAB releases replace wavread with audioread and
% specgram with spectrogram (which takes its arguments in a different order)

function specgram_demo(y,fs)

% calculate the table of amplitudes
% (1024-point FFT, 256-sample window, 192-sample overlap)
[B,f,t]=specgram(y,1024,fs,256,192);
% calculate amplitude about 50dB down from maximum (20*log10(300) is about 49.5)
bmin=max(max(abs(B)))/300;
% plot top 50dB as image
imagesc(t,f,20*log10(max(abs(B),bmin)/bmin));
% label plot
axis xy;
xlabel('Time (s)');
ylabel('Frequency (Hz)');
% build a grey scale (kept as an alternative; use colormap(lgrays) to apply it)
lgrays=zeros(100,3);
for i=1:100
    lgrays(i,:) = 1-i/100;
end
% the demo displays with the built-in hot colormap
colormap(hot);


/Hello/


/Don’t ask me/


Impact of Window on Spectrogram


From Spectrum to Cepstrum

Recall the fundamental assumption about the speech signal: speech can be represented as the output of a linear filtering system whose excitation and system response vary slowly with time

To separate the excitation e(n) from the system response h(n), we need to perform some kind of deconvolution; in the frequency domain, X(ω)=E(ω)H(ω)

Multiplication is not as easy to deal with as addition; can we convert the product into a sum?


The Power of Logarithm

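$$X(e^{j\omega}) = E(e^{j\omega})\, H(e^{j\omega}) \;\xrightarrow{\;\log(\cdot)\;}\; \hat{X}(e^{j\omega}) = \hat{E}(e^{j\omega}) + \hat{H}(e^{j\omega})$$

Note on Complex Logarithm

Since X(ω) (the FT of x(n)) is typically complex-valued, we need to define the complex logarithm as follows (i.e., take the logarithm of the magnitude and keep the phase as the imaginary part)

$$\hat{X}(e^{j\omega}) = \log[X(e^{j\omega})] = \log|X(e^{j\omega})| + j\arg[X(e^{j\omega})]$$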


Complex Cepstrum
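As a reminder of the usual definition (stated here as an assumption consistent with the neighbouring slides, since only the slide title is given), the complex cepstrum $\hat{x}(n)$ is the inverse Fourier transform of the complex logarithm of the spectrum:

$$\hat{x}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\, e^{j\omega n}\, d\omega, \qquad \hat{X}(e^{j\omega}) = \log|X(e^{j\omega})| + j\arg[X(e^{j\omega})]$$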


Phase Unwrapping Problem*

Note that 


(Real) Cepstrum

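$$c(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|X(e^{j\omega})|\, e^{j\omega n}\, d\omega$$

x(n) -> F() -> X(e^{jω}) -> log magnitude -> log|X(e^{jω})| -> F^{-1}() -> c(n)

Can show that the cepstrum c(n) is the even part of the complex cepstrum, i.e.

$$c(n) = \frac{1}{2}\left[\hat{x}(n) + \hat{x}(-n)\right]$$

Hint: |X(ω)| and arg[X(ω)] are even and odd functions respectively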
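The chain above (transform, log magnitude, inverse transform) can be approximated on a finite frequency grid by replacing the Fourier transform with a zero-padded DFT. The Python sketch below does that; it is an approximation of the integral, and the names `dft`, `real_cepstrum` and `n_fft` are illustrative assumptions. It requires that no DFT bin be exactly zero, since the logarithm is undefined there.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT (or inverse DFT, including the 1/N factor) of a sequence."""
    n_points = len(x)
    sign = 1 if inverse else -1
    scale = 1.0 / n_points if inverse else 1.0
    return [
        scale * sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / n_points)
                    for n in range(n_points))
        for k in range(n_points)
    ]

def real_cepstrum(x, n_fft):
    """x(n) -> DFT -> log|X(k)| -> inverse DFT -> c(n), on an n_fft-point grid."""
    padded = list(x) + [0.0] * (n_fft - len(x))
    log_magnitude = [math.log(abs(value)) for value in dft(padded)]
    return [value.real for value in dft(log_magnitude, inverse=True)]

# Test signal: x(n) = delta(n) + 0.5 * delta(n - 20).
x = [0.0] * 21
x[0], x[20] = 1.0, 0.5
c = real_cepstrum(x, 256)
```

Because the log magnitude of a real signal is even in frequency, the result is even in n (on the DFT grid, c(n) matches c(256 - n)), in line with the even-part relation on the slide.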


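Complex Cepstrum Example

$$x(n) = \delta(n) + a\,\delta(n-N), \quad 0 < a < 1$$

$$X(e^{j\omega}) = 1 + a\, e^{-j\omega N}$$

$$\hat{X}(e^{j\omega}) = \log[X(e^{j\omega})] = \log[1 + a\, e^{-j\omega N}] = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\, a^{n}\, e^{-j\omega n N}$$

$$\hat{x}(n) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}\, a^{k}\, \delta(n-kN)$$

Conclusion: the complex cepstrum of a train of uniformly spaced impulses is also a uniformly spaced impulse train with the same spacing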
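The series for this example can be checked numerically. The Python sketch below samples X on a DFT grid, takes the principal complex logarithm, and inverts the result. No phase unwrapping is needed in this particular case because the real part of 1 + a e^{-jωN} stays positive for 0 < a < 1, so the phase never wraps. The function name, the grid size of 256 and the values a = 0.5, N = 20 are illustrative assumptions.

```python
import cmath

def complex_cepstrum_two_impulses(a, spacing, n_fft):
    """Complex cepstrum of delta(n) + a*delta(n - spacing) on an n_fft-point grid."""
    spectrum = [1 + a * cmath.exp(-2j * cmath.pi * k * spacing / n_fft)
                for k in range(n_fft)]
    # The principal logarithm is continuous here because Re{X} >= 1 - a > 0.
    log_spectrum = [cmath.log(value) for value in spectrum]
    return [
        (sum(log_spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_fft)
             for k in range(n_fft)) / n_fft).real
        for n in range(n_fft)
    ]

a, spacing = 0.5, 20
x_hat = complex_cepstrum_two_impulses(a, spacing, 256)
# Series prediction: impulses at k*spacing with amplitude (-1)^(k+1) * a^k / k.
predicted = {k * spacing: (-1) ** (k + 1) * a ** k / k for k in (1, 2, 3)}
```

The computed values at n = 20, 40 and 60 agree with the series terms to within the small time-aliasing error introduced by the finite grid, and the samples between the impulses are essentially zero.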


Complex Cepstrum of Speech

Model of speech

Voiced speech is produced by a quasi-periodic pulse train exciting a slowly time-varying linear system, i.e. e(n)=p(n)

Unvoiced speech is produced by random noise exciting a slowly time-varying linear system, i.e. e(n)=u(n)

The glottis, vocal tract and radiation at the lips can all be modeled by slowly time-varying linear systems


Full Model for Voiced Speech


Look Inside

Transfer function for voiced speech

Glottal pulse model

Vocal tract model

Radiation model

Note: the combination of glottal pulse, vocal tract and radiation corresponds to the low-time part of the cepstrum (around the origin)


Complex Cepstrum for Voiced Speech

[Figure: phase of the voiced-speech spectrum, panels ARG(X(e^{jω})) and arg(X(e^{jω}))]


Full Model for Unvoiced Speech

Transfer function for unvoiced speech

Note the two differences from voiced speech:
1) Excitation is no longer an impulse train but random noise
2) No glottal pulse model is involved


Complex Cepstrum for Unvoiced Speech


Pitch Detection in Cepstrum Domain
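A minimal Python sketch of one way to do this follows: compute the real cepstrum of a voiced frame and pick the largest peak in a plausible range of pitch periods. Everything concrete here is an assumption made for illustration: the synthetic frame (a decaying pulse train with a 50-sample period convolved with a short decaying impulse response), the 8 kHz sampling rate, the 20 to 200 sample search range, and the function names. The decay on the pulse train keeps the spectrum away from zero so that the logarithm stays finite.

```python
import cmath
import math

def real_cepstrum_at(x, n_fft, quefrencies):
    """Real cepstrum of x at the requested quefrencies, via an n_fft-point DFT."""
    log_magnitude = [
        math.log(abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft)
                         for n in range(len(x)))))
        for k in range(n_fft)
    ]
    # For a real signal the log magnitude is even, so the inverse DFT is a cosine sum.
    return {
        q: sum(log_magnitude[k] * math.cos(2 * math.pi * k * q / n_fft)
               for k in range(n_fft)) / n_fft
        for q in quefrencies
    }

def synthetic_voiced_frame(period, pulses):
    """Decaying pulse train convolved with a short decaying impulse response."""
    excitation = [0.0] * (period * pulses)
    for i in range(pulses):
        excitation[i * period] = 0.9 ** i
    response = [0.9 ** n for n in range(40)]
    frame = [0.0] * (len(excitation) + len(response) - 1)
    for n, e in enumerate(excitation):
        if e != 0.0:
            for m, h in enumerate(response):
                frame[n + m] += e * h
    return frame

fs = 8000.0                                     # assumed sampling rate in Hz
frame = synthetic_voiced_frame(period=50, pulses=8)
cepstrum = real_cepstrum_at(frame, 512, range(20, 201))
pitch_period = max(cepstrum, key=cepstrum.get)  # quefrency of the largest peak
pitch_hz = fs / pitch_period
```

The search starts at 20 samples so that the low-time part of the cepstrum, which reflects the system response rather than the excitation, is excluded from the peak search.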


Background on Music Piracy

An estimated $4.3 billion is lost each year due to piracy of digital music content

That is what the whole Napster story is about: no more free music

Watermarking was proposed as one possible technical solution to copyright protection, but its future remains uncertain


What is Watermarking?


Echo Hiding

Proposed by Gruhl, Bender and Lu of MIT Media Lab in 1999
  Since then, various audio data hiding techniques have been proposed

The basic idea is to exploit the masking property of the human auditory system: when an echoed signal is placed close to the host signal, it is inaudible to human ears but detectable by machine

Decoding is done in the cepstrum domain
  That is why we are interested in it here!


Demo of Speech with Echoes 

Original speech

Modified speech with severe echo

Modified speech with slight echo

Modified speech with five echoes

Conclusion: as long as echoes are inserted to the right place and 


How to Hide One Bit?


Toy Example


Encoding Multiple Bits


Encoding Multiple Bits (Cont'd)


Cepstrum Decoding


Decoding Examples

Bit “1”

Bit “0”


Echo Hiding Summary

Information is embedded by adding echoes located at different positions

By controlling the amplitude and distance of echoes, we can achieve perceptual transparency

Embedded information can be extracted by detecting echoes in the cepstrum domain

The downside of this approach is its lack of security, i.e., an attacker can easily remove the echoes or make them undetectable
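The embed-and-detect loop summarized above can be sketched for a single bit as follows. This is a toy under stated assumptions, not the published scheme: the bit selects one of two echo delays (40 or 60 samples), the echo amplitude of 0.5 is chosen for easy detection rather than inaudibility, the host is seeded pseudo-random noise instead of audio, and the decoder compares the real cepstrum at the two candidate delays. All function names are illustrative.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def embed_bit(host, bit, delays=(40, 60), amplitude=0.5):
    """Add one echo of the host; the delay used encodes the bit."""
    delay = delays[bit]
    marked = list(host) + [0.0] * delay
    for n, value in enumerate(host):
        marked[n + delay] += amplitude * value
    return marked

def decode_bit(marked, delays=(40, 60), n_fft=2048):
    """Pick the delay whose cepstral value is larger (n_fft >= len(marked))."""
    padded = list(marked) + [0.0] * (n_fft - len(marked))
    # The small constant guards against taking the log of an exactly zero bin.
    log_magnitude = [math.log(abs(value) + 1e-12) for value in fft(padded)]

    def cepstrum_at(q):
        return sum(log_magnitude[k] * math.cos(2 * math.pi * k * q / n_fft)
                   for k in range(n_fft)) / n_fft

    return 0 if cepstrum_at(delays[0]) > cepstrum_at(delays[1]) else 1

rng = random.Random(0)                          # fixed seed for repeatability
host = [rng.uniform(-1.0, 1.0) for _ in range(1024)]
decoded = [decode_bit(embed_bit(host, bit)) for bit in (0, 1)]
```

An echo of amplitude a at delay d multiplies the spectrum by 1 + a e^{-jωd}, which adds a cepstral peak of roughly a/2 at quefrency d; that peak is what the decoder looks for. The same reasoning shows the weakness noted above: anyone who knows the candidate delays can find and suppress the peak.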


One-Minute Survey

What is the muddiest point in this week’s lecture?

What is the difference between short­time FT and FT?

What is the use of cepstrum in echo hiding applications?