Pitch Tracking + Prosody January 17, 2012

Jan 20, 2016


nibaw

Transcript
Page 1: Pitch Tracking + Prosody

Pitch Tracking + Prosody

January 17, 2012

Page 2: Pitch Tracking + Prosody

The Plan for Today

• One announcement:

• On Thursday, we’ll meet in Craigie Hall D 428

• We’ll be working on intonation transcription…

• The plan for today:

1. Wrap up A-to-D conversion

2. Automatic Pitch Tracking

3. (Brief) suprasegmentals review

4. The basics of English intonation

Page 3: Pitch Tracking + Prosody

Sample Size Demo

• 11k 16 bits

• 11k 8 bits

• 8k 16 bits

• 8k 8 bits (telephone)

• Note: CDs sample at 44,100 Hz and have 16-bit quantization.

• Also check out bad and actedout examples in Praat.

• Also: look at Praat’s representation of a .sound file.

Page 4: Pitch Tracking + Prosody

Quantization Range

• With 16-bit quantization, we can encode 65,536 different possible amplitude values.

• Remember that I(dB) = 10 * log10(A^2 / r^2)

• Substitute the max and min amplitude values for A and r, respectively, and we get:

• I(dB) = 10 * log10(65536^2 / 1^2) = 96.3 dB

• Some newer machines have 24-bit quantization, which gives 2^24 = 16,777,216 possible amplitude values.

• I(dB) = 10 * log10(16777216^2 / 1^2) = 144.5 dB

• This is bigger than the range of sounds we can listen to without damaging our hearing.
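The dynamic range arithmetic can be checked with a few lines of Python. This is only a sketch of the formula on this slide; the function name `dynamic_range_db` is made up for illustration, and it assumes the smallest amplitude value is 1.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Dynamic range in dB for a given quantization depth.

    Uses I(dB) = 10 * log10(A^2 / r^2), with A = 2**bits (the number of
    possible amplitude values) and r = 1 (assumed smallest value).
    """
    max_amplitude = 2 ** bits
    min_amplitude = 1
    return 10 * math.log10(max_amplitude ** 2 / min_amplitude ** 2)

for bits in (8, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# 8-bit: 48.2 dB
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```

Each extra bit doubles the number of amplitude values, which adds about 6 dB to the range.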

Page 5: Pitch Tracking + Prosody

Problem: Clipping

• Clipping occurs when the pressure in the analog signal exceeds the sample size range in digitization.

• Check out sylvester and normal in Praat.
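Here is a rough sketch of what clipping does to the numbers. It assumes samples are stored as signed 16-bit integers (-32768 to 32767); the `digitize` helper and the 40000 peak value are invented for the example.

```python
import math

def digitize(pressure: float, bits: int = 16) -> int:
    """Round a pressure value to an integer sample, clipping at the range limits."""
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    return max(lowest, min(highest, round(pressure)))

# One cycle of a sine wave whose peak (40000) exceeds the 16-bit maximum.
too_loud = [40000 * math.sin(2 * math.pi * k / 16) for k in range(16)]
samples = [digitize(p) for p in too_loud]

print(max(samples), min(samples))  # 32767 -32768
```

Every value beyond the limit is flattened to the limit itself, so the peaks of the wave get squared off.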

Page 6: Pitch Tracking + Prosody

A Note on Formats

• Digitized sound files come in different formats…

• .wav, .aiff, .au, etc.

• Lossless formats digitize sound in the way I’ve just described.

• They only differ in terms of “header” information and specified limits on file size, etc.

• Lossy formats use algorithms to condense the size of sound files

• …and the sound file loses information in the process.

• For instance: the .mp3 format saves space in part by eliminating some very high frequency information.

• (which is hard for people to hear)

Page 7: Pitch Tracking + Prosody

AIFF vs. MP3

.aiff format

.mp3 format

(encoded at 128 kbps)

• This trick can work pretty well…

Page 8: Pitch Tracking + Prosody

MP3 vs. MP3

.mp3 format

(encoded at 128 kbps)

.mp3 format

(encoded at 64 kbps)

• .mp3 conversion can induce reverb artifacts, and also cut down on temporal resolution (among other things).

Page 9: Pitch Tracking + Prosody

Sound Digitization Summary

• Samples are taken of an analog sound’s pressure value at a recurring sampling rate.

• This digitizes the time dimension in a waveform.

• The sampling frequency needs to be at least twice as high as the highest frequency component you want to capture in the signal.

• E.g., 44100 Hz for speech

• Quantization converts the amplitude value of each sample into a binary number in the computer.

• This digitizes the amplitude dimension in a waveform.

• Round-off errors can lead to quantization noise.

• Excessive amplitude can lead to clipping errors.
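The two digitization steps in this summary can be sketched in Python. This is an illustration under simple assumptions, not how any particular program does it: amplitudes are scaled to the range -1 to 1, clipping is ignored, and the names `nyquist_limit` and `quantize` are made up here.

```python
import math

def nyquist_limit(sampling_rate_hz: float) -> float:
    """Highest frequency that a given sampling rate can capture."""
    return sampling_rate_hz / 2

def quantize(value: float, bits: int) -> float:
    """Round a value to the nearest multiple of the quantization step."""
    step = 2 / (2 ** bits)  # the range -1..1 split into 2**bits steps
    return round(value / step) * step

print(nyquist_limit(44100))  # 22050.0

# Sample a 440 Hz sine at 8000 Hz (time is digitized), then quantize
# each sample (amplitude is digitized) and find the worst round-off error.
sampling_rate = 8000
samples = [math.sin(2 * math.pi * 440 * n / sampling_rate) for n in range(200)]
for bits in (4, 8, 16):
    worst_error = max(abs(s - quantize(s, bits)) for s in samples)
    print(f"{bits} bits: worst round-off error = {worst_error:.6f}")
```

The round-off error never exceeds half a quantization step, so it shrinks as the number of bits goes up. That error is the quantization noise.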

Page 10: Pitch Tracking + Prosody

The Digitization of Pitch

• Praat can give us a representation of speech that looks like:

• The blue line represents the fundamental frequency (F0) of the speaker’s voice.

• Also known as a pitch track

• How can we automatically “track” F0 in a sample of speech?

Page 11: Pitch Tracking + Prosody

Pitch Tracking

• Voicing:

• Air flow through vocal folds

• Rapid opening and closing due to Bernoulli Effect

• Each cycle sends an acoustic shockwave through the vocal tract

• …which takes the form of a complex wave.

• The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.

Page 12: Pitch Tracking + Prosody

Voicing Bars

Page 13: Pitch Tracking + Prosody

Voicing Bars

Individual glottal pulses

Page 14: Pitch Tracking + Prosody

Voicing = Complex Wave

• Note: voicing is not perfectly periodic.

• …always some random variation from one cycle to the next.

• How can we measure the fundamental frequency of a complex wave?

Page 15: Pitch Tracking + Prosody

• The basic idea: figure out the period between successive cycles of the complex wave.

• Fundamental frequency = 1 / period

duration = ???

Page 16: Pitch Tracking + Prosody

Measuring F0

• To figure out where one cycle ends and the next begins…

• The basic idea is to find how well successive “chunks” of a waveform match up with each other.

• One period = the length of the chunk that matches up best with the next chunk.

• Automatic Pitch Tracking parameters to think about:

1. Window size (i.e., chunk size)

2. Step size

3. Frequency range (= period range)

Page 17: Pitch Tracking + Prosody

Window (Chunk) Size

Here’s an example of a small window

Page 18: Pitch Tracking + Prosody

Window (Chunk) Size

Here’s an example of a large(r) window

Page 19: Pitch Tracking + Prosody

Initial window of the waveform is compared to another window (of the same duration) at a later point in the waveform

Page 20: Pitch Tracking + Prosody

Matching

The waveforms in the two windows are compared to see how well they match up.

Correlation = measure of how well the two windows match

???

Page 21: Pitch Tracking + Prosody

Autocorrelation

• The measure of correlation =

• Sum of the point-by-point products of the two chunks.

• The technical name for this is autocorrelation…

• because two parts of the same wave are being matched up against each other.

• (“auto” = self)

Page 22: Pitch Tracking + Prosody

Autocorrelation Example

• Ex: consider window x, with n samples…

• What’s its correlation with window y?

• (Note: window y must also have n samples)

• x1 = first sample of window x

• x2 = second sample of window x

• …

• xn = nth (final) sample of window x

• y1 = first sample of window y, etc.

• Correlation (R) = x1*y1 + x2*y2 + … + xn*yn

• The larger R is, the better the correlation.

Page 23: Pitch Tracking + Prosody

By the Numbers

Sample      1     2     3     4     5     6
x          .8    .3   -.2   -.5    .4    .8
y         -.3   -.1    .1    .3    .1   -.1
product  -.24  -.03  -.02  -.15   .04  -.08

Sum of products = -.48

• These two chunks are poorly correlated with each other.

Page 24: Pitch Tracking + Prosody

By the Numbers, part 2

Sample      1     2     3     4     5     6
x          .8    .3   -.2   -.5    .4    .8
z          .7    .4   -.1   -.4    .1    .4
product   .56   .12   .02   .2    .04   .32

Sum of products = 1.26

• These two chunks are well correlated with each other.

(or at least better than the previous pair)

• Note: matching peaks count for more than matches close to 0.
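The two worked examples can be reproduced in Python. The function below is just the sum of point-by-point products from the Autocorrelation slides; the name `correlation` is chosen here for illustration.

```python
def correlation(window_a, window_b):
    """Sum of the point-by-point products of two equal-length windows."""
    if len(window_a) != len(window_b):
        raise ValueError("both windows must have the same number of samples")
    return sum(a * b for a, b in zip(window_a, window_b))

x = [.8, .3, -.2, -.5, .4, .8]
y = [-.3, -.1, .1, .3, .1, -.1]
z = [.7, .4, -.1, -.4, .1, .4]

print(round(correlation(x, y), 2))  # -0.48
print(round(correlation(x, z), 2))  # 1.26
```

Samples 1 and 6 contribute .56 + .32 = .88 of the 1.26 total for x and z, which is the "matching peaks count for more" effect.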

Page 25: Pitch Tracking + Prosody

Back to (Digital) Reality

The waveforms in the two windows are compared to see how well they match up.

Correlation = measure of how well the two windows match

???

These two windows are poorly correlated

Page 26: Pitch Tracking + Prosody

Next: the pitch tracking algorithm moves further down the waveform and grabs a new window

Page 27: Pitch Tracking + Prosody

The distance the algorithm moves forward in the waveform is called the step size

“step”

Page 28: Pitch Tracking + Prosody

Matching, again

The next window gets compared to the original.

???

Page 29: Pitch Tracking + Prosody

Matching, again

The next window gets compared to the original.

???

These two windows are also poorly correlated

Page 30: Pitch Tracking + Prosody

The algorithm keeps chugging and, eventually…

another “step”

Page 31: Pitch Tracking + Prosody

Matching, again

The best match is found.

???

These two windows are highly correlated

Page 32: Pitch Tracking + Prosody

The fundamental period can be determined by calculating the length of time between the start of window 1 and the start of (well correlated) window 2.

period

Page 33: Pitch Tracking + Prosody

Mopping up

period

• Frequency is 1 / period

• Q: How many possible periods does the algorithm need to check?

• Frequency range (default in Praat: 75 to 600 Hz)
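One way to answer the question about how many periods need checking: convert the frequency range into a range of periods measured in samples. This sketch assumes the algorithm only tests whole-sample shifts; the name `candidate_lags` is invented, and the 44100 Hz sampling rate is just an example value.

```python
import math

def candidate_lags(sampling_rate_hz, f0_min_hz=75.0, f0_max_hz=600.0):
    """Periods (in samples) that correspond to the allowed F0 range."""
    shortest = math.ceil(sampling_rate_hz / f0_max_hz)   # highest F0
    longest = math.floor(sampling_rate_hz / f0_min_hz)   # lowest F0
    return range(shortest, longest + 1)

lags = candidate_lags(44100)
print(lags.start, lags[-1], len(lags))  # 74 588 515
```

A narrower frequency range means fewer candidate periods to compare for every window.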

Page 34: Pitch Tracking + Prosody

Moving on

• Another comparison window is selected and the whole process starts over again.
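Putting the steps together, a bare-bones pitch tracker might look like the sketch below. It follows the procedure on these slides (raw sum of products, fixed window, fixed step, limited frequency range) and is not Praat's actual implementation. The function names, the 8000 Hz sampling rate, and the synthetic test signal are all assumptions made for the example, and nothing here decides whether a stretch of signal is voiced at all.

```python
import math

def correlation(signal, start_a, start_b, size):
    """Sum of point-by-point products of two windows of the same signal."""
    return sum(signal[start_a + i] * signal[start_b + i] for i in range(size))

def track_pitch(signal, sampling_rate, window_s=0.04, step_s=0.01,
                f0_min=75.0, f0_max=600.0):
    """Return a list of (time in seconds, F0 in Hz), one entry per step."""
    window = round(window_s * sampling_rate)
    step = round(step_s * sampling_rate)
    min_lag = math.ceil(sampling_rate / f0_max)    # shortest period checked
    max_lag = math.floor(sampling_rate / f0_min)   # longest period checked
    track = []
    start = 0
    while start + max_lag + window <= len(signal):
        best_lag, best_r = min_lag, float("-inf")
        for lag in range(min_lag, max_lag + 1):
            r = correlation(signal, start, start + lag, window)
            if r > best_r:  # ties keep the shorter period
                best_lag, best_r = lag, r
        track.append((start / sampling_rate, sampling_rate / best_lag))
        start += step
    return track

# Synthetic "voicing": a complex wave that repeats every 40 samples,
# i.e. 8000 / 40 = 200 Hz.
sampling_rate = 8000
one_period = [math.sin(2 * math.pi * k / 40) + 0.5 * math.sin(4 * math.pi * k / 40)
              for k in range(40)]
signal = one_period * 20

for time_s, f0 in track_pitch(signal, sampling_rate):
    print(f"{time_s:.2f} s  {f0:.1f} Hz")
```

On this perfectly periodic signal every step reports 200.0 Hz. Real speech is never perfectly periodic, so the best match is only approximate.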

Page 35: Pitch Tracking + Prosody

[Figure: pitch track of the utterance “Uhm, I would like a flight to Seattle from Albuquerque”. x-axis: Time (s), 1 to 4; y-axis: F0 (Hz), 200 to 400.]

• The algorithm ultimately spits out a pitch track.

• This one shows you the F0 value at each step.

Thanks to Chilin Shih for making these materials available

Page 36: Pitch Tracking + Prosody

Pitch Tracking in Praat

• Play with F0 range.

• Create Pitch Object.

• Also go To Manipulation…Pitch.

• Also check out:

Page 37: Pitch Tracking + Prosody

Summing Up

• Pitch tracking uses three parameters

1. Window size

• Ensures reliability

• In Praat, the window size is always three times the longest possible period.

• E.g.: 3 X 1/75 = .04 sec.

2. Step size

• For temporal precision

3. Frequency range

• Reduces computational load
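The window size rule above is easy to work out in code. This sketch only restates the arithmetic on this slide (three times the longest possible period); the function name and the 44100 Hz sampling rate are assumptions for the example.

```python
def window_size_s(f0_min_hz: float, periods: int = 3) -> float:
    """Window duration: a fixed number of the longest possible periods."""
    return periods / f0_min_hz

sampling_rate = 44100
window_s = window_size_s(75.0)
window_samples = round(window_s * sampling_rate)

print(f"{window_s:.2f} s = {window_samples} samples")  # 0.04 s = 1764 samples
```

Lowering the bottom of the frequency range makes the window longer, which costs some temporal detail.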

Page 38: Pitch Tracking + Prosody

Deep Thought Questions

• What might happen if:

• The shortest period checked is longer than the fundamental period?

• AND two fundamental periods fit inside a window?

• Potential Problem #1: Pitch Halving

• The pitch tracker thinks the fundamental period is twice as long as it is in reality.

• It estimates F0 to be half of its actual value
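Pitch halving can be reproduced with a toy example. This is a sketch under stated assumptions: the signal is synthetic and repeats every 40 samples (200 Hz at an 8000 Hz sampling rate), and `best_f0` is a made-up helper that picks the period with the largest sum of products.

```python
import math

def best_f0(signal, sampling_rate, f0_min, f0_max, window):
    """F0 estimate from the best-matching period within the allowed range."""
    min_lag = math.ceil(sampling_rate / f0_max)
    max_lag = math.floor(sampling_rate / f0_min)

    def r(lag):
        return sum(signal[i] * signal[i + lag] for i in range(window))

    best_lag = max(range(min_lag, max_lag + 1), key=r)  # first best on ties
    return sampling_rate / best_lag

sampling_rate = 8000
one_period = [math.sin(2 * math.pi * k / 40) + 0.5 * math.sin(4 * math.pi * k / 40)
              for k in range(40)]
signal = one_period * 20  # true F0 = 200 Hz

print(best_f0(signal, sampling_rate, 75, 600, window=320))  # 200.0
print(best_f0(signal, sampling_rate, 75, 150, window=320))  # 100.0
```

With the ceiling at 150 Hz the true period is never checked, so the best match left is two periods long and the estimate comes out at half the real F0.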

Page 39: Pitch Tracking + Prosody

Pitch Halving

pitch is halved

Check out normal file in Praat.

Page 40: Pitch Tracking + Prosody

More Deep Thoughts

• What might happen if:

• The shortest period checked is less than half of the fundamental period?

• AND the second half of the fundamental cycle is very similar to the first?

• Potential Problem #2: Pitch doubling

• The pitch tracker thinks the fundamental period is half as long as it actually is.

• It estimates the F0 to be twice as high as it is in reality.
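To see why doubling is a risk, compare how well a wave matches itself at half a period versus a full period. The sketch below uses an invented synthetic signal whose two half-cycles are nearly identical (a weak fundamental plus a strong second harmonic); the numbers are assumptions chosen only to illustrate the point.

```python
import math

period = 80             # samples per true cycle (100 Hz at 8000 Hz sampling)
weak_fundamental = 0.1  # small amplitude, so both half-cycles look alike
one_period = [weak_fundamental * math.sin(2 * math.pi * k / period)
              + math.sin(4 * math.pi * k / period)
              for k in range(period)]
signal = one_period * 10
window = 4 * period

def r(lag):
    """Sum of point-by-point products of the window and its shifted copy."""
    return sum(signal[i] * signal[i + lag] for i in range(window))

ratio = r(period // 2) / r(period)
print(f"match at half period / match at full period = {ratio:.3f}")  # 0.980
```

The half-period match is 98% as good as the true one. Since voicing always varies a little from cycle to cycle, a gap that small can easily flip, and the tracker then reports twice the real F0.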

Page 41: Pitch Tracking + Prosody

Pitch Doubling

pitch is doubled

Page 42: Pitch Tracking + Prosody

Microperturbations

• Another problem:

• Speech waveforms are partly shaped by the type of segment being produced.

• Pitch tracking can become erratic at the juncture of two segments.

• In particular:

• voiced to voiceless segments

• sonorants to obstruents

• These discontinuities in F0 are known as microperturbations.

• Also: transitions between modal and creaky voicing tend to be problematic.

Page 43: Pitch Tracking + Prosody

Back to Language

• F0 is important because it can be used by languages to signal differences in meaning.

• Note:

• Acoustic = Fundamental Frequency

• Perceptual = Pitch

• Linguistic = Tone

Page 44: Pitch Tracking + Prosody

A Typology

• F0 is generally used in three different ways in language:

1. Tone languages (Chinese, Navajo, Igbo)

• Lexically determined tone on every syllable

• “Syllable-based” tone languages

2. Accentual languages (Japanese, Swedish)

• The location of an accent in a particular word is lexically marked.

• “Word-based” tone languages

3. Stress languages (English, Russian)

• It’s complicated.