
Lecture 7 - ece.rochester.edu

Oct 16, 2021


Lecture 7

Pitch Analysis


What is pitch?

• (ANSI) That attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from low to high. Pitch depends mainly on the frequency content of the sound stimulus, but also depends on the sound pressure and waveform of the stimulus.

• (operational) A sound has a certain pitch if it can be reliably matched to a sine tone of a given frequency at 40 dB SPL.

ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 2


What is pitch?

• A perceptual attribute, so subjective

• Only defined for (quasi) harmonic sounds

– Harmonic sounds are periodic, and the period is 1/F0.

• Can be reliably matched to fundamental frequency (F0)

– In audio signal processing, people often do not distinguish pitch from F0

• F0 is a physical attribute, so objective

Harmonic Sound

• A complex sound with strong sinusoidal components at integer multiples of a fundamental frequency. These components are called harmonics or overtones.

• Sine waves and harmonic sounds are the sounds that may give a perception of “pitch”.


Classify Sounds by Harmonicity

• Sine wave

• Strongly harmonic (examples: oboe, clarinet)

• Somewhat harmonic (quasi-harmonic)

– Examples: marimba, human voice

[Figure: waveform (Amplitude vs. Time in ms) and spectrum (Magnitude in dB vs. Frequency in Hz)]

• Inharmonic (example: gong)

How do we perceive pitch?

• Complex tones

– Strongest frequency?

– Lowest frequency?

– Greatest common divisor of the harmonics?

[Example: oboe]

The Missing Fundamental

[Figure: Frequency (linear) vs. Time]
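A minimal sketch of a missing-fundamental tone, not taken from the slides: it sums harmonics 2 to 5 of 200 Hz, so the spectrum has no energy at 200 Hz, yet the waveform still repeats every 1/200 s. The sample rate, the 200 Hz fundamental, the choice of harmonics, and the lag search range are all assumptions made for illustration.

```python
import numpy as np

fs = 8000                          # sample rate in Hz (assumed)
f0 = 200.0                         # fundamental that is left out (assumed)
t = np.arange(int(0.1 * fs)) / fs  # 100 ms of signal, 800 samples

# Sum harmonics 2..5 only (400, 600, 800, 1000 Hz); nothing at f0 itself.
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(2, 6))

# Magnitude spectrum. Bin spacing is fs / len(x) = 10 Hz, so 200 Hz is bin 20.
mag = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
bin_f0 = int(np.argmin(np.abs(freqs - f0)))

# Autocorrelation at non-negative lags; search periods from 1 ms up to 10 ms.
acf = np.correlate(x, x, mode="full")[len(x) - 1:]
lo, hi = int(0.001 * fs), int(0.010 * fs)
period_samples = lo + int(np.argmax(acf[lo:hi]))

print("relative magnitude at 200 Hz:", mag[bin_f0] / mag.max())
print("period-based F0 estimate (Hz):", fs / period_samples)
```

The period of the waveform corresponds to 200 Hz even though that component is absent from the spectrum.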


Another Example

[Figure: spectrum of a tone with partials at 300:200:2100 Hz (300, 500, 700, 900, 1100, 1300, 1500, 1700, 1900, 2100 Hz); annotations: 300 Hz and 200 Hz; axes: Amp vs. Freq (Hz)]

Pitch perception is affected by

• The loudest frequency component

• The greatest common divisor of partials

• The regular frequency spacing between partials
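As a small worked example, assuming the partials 300:200:2100 Hz from the "Another Example" slide, the cues can point to different frequencies. The sketch below computes the lowest partial, the greatest common divisor, and the spacing; partial amplitudes are not given in the transcript, so the loudest component is left out. Variable names are illustrative.

```python
from functools import reduce
from math import gcd

# Partials of the example tone: 300, 500, ..., 2100 Hz (300:200:2100).
partials_hz = list(range(300, 2101, 200))

lowest_hz = min(partials_hz)                 # lowest partial
gcd_hz = reduce(gcd, partials_hz)            # greatest common divisor of partials
gaps = {b - a for a, b in zip(partials_hz, partials_hz[1:])}
spacing_hz = gaps.pop() if len(gaps) == 1 else None  # regular spacing, if any

print(lowest_hz, gcd_hz, spacing_hz)  # 300 100 200
```

The three candidates (300 Hz, 100 Hz, 200 Hz) disagree, which is why such a tone is an interesting test of pitch perception.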


Shepard Tones

Continuous Risset scale

Barber’s pole


Pitch Detection

• Well defined for harmonic and quasi-harmonic sounds

• Estimate the fundamental frequency in each frame of the signal

• Quick facts

– Human speech: from 40 Hz for low-pitched male to 600 Hz for children or high-pitched female

– Piano: 27 Hz – 4,186 Hz

– Human hearing range: 20 Hz – 20,000 Hz
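The piano range quoted in the quick facts can be checked with a short sketch. It assumes a standard 88-key piano in twelve-tone equal temperament tuned to A4 = 440 Hz (key 49); these tuning assumptions and the function name `key_frequency` are not from the slides.

```python
def key_frequency(key_number, a4_hz=440.0):
    """Frequency in Hz of a piano key (1..88), assuming equal temperament
    with A4 (key 49) tuned to a4_hz."""
    return a4_hz * 2.0 ** ((key_number - 49) / 12)

lowest = key_frequency(1)    # A0, 27.5 Hz
highest = key_frequency(88)  # C8, about 4186 Hz
print(round(lowest, 2), round(highest, 2))
```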


Why is pitch detection important?

• Harmonic sounds are ubiquitous

– Music, speech, bird singing

• Pitch (F0) is an important attribute of harmonic sounds, and it relates to other properties

– Music melody: key, scale (e.g., chromatic, diatonic, pentatonic), style, emotion, etc.

– Speech intonation: word disambiguation (for tonal languages), statement/question, emotion, etc.

What scales are used? What emotion?

General Process of Pitch Detection

• Segment audio into time frames

– Pitch changes over time

• Detect pitch (if any) in each frame

– Need to detect whether the frame contains pitch or not

• Post-process to take context information into account

– Pitch contours are often continuous
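The first and last steps can be sketched as follows. This is only an illustration under stated assumptions: `frame_signal` and `median_smooth` are hypothetical helper names, and median filtering is one common way to exploit contour continuity, not necessarily the post-processing used in the lecture. The per-frame detector is left out here.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row.
    Assumes len(x) >= frame_len; trailing samples that do not fill a
    frame are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def median_smooth(track, width=3):
    """Median-filter a pitch track (Hz) to suppress isolated outliers."""
    half = width // 2
    padded = np.pad(np.asarray(track, dtype=float), half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(track))])

# Framing: 1000 samples, 400-sample frames, 200-sample hop.
x = np.arange(1000, dtype=float)
frames = frame_signal(x, frame_len=400, hop=200)

# Post-processing: a raw track with one octave-up and one octave-down error.
raw_track = [200, 201, 400, 202, 203, 101, 204]
smoothed = median_smooth(raw_track, width=3)
print(frames.shape, smoothed)
```

The two isolated octave errors (400 Hz and 101 Hz) are replaced by values close to their neighbors.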



How long should the frame be?

• Too long:

– Contains multiple pitches (low time resolution)

• Too short:

– Can’t obtain reliable detection (low freq resolution)

– Should be longer than 3 periods of the signal

– For speech or music, how long should the frame be?
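One way to turn the 3-period rule into numbers, as a sketch: the frame must cover three periods of the lowest F0 expected. The 40 Hz lower bound comes from the speech quick facts; the 16 kHz sample rate and the function name `min_frame_length` are assumptions.

```python
import math

def min_frame_length(f0_min_hz, fs_hz, n_periods=3):
    """Number of samples needed to cover n_periods periods of f0_min_hz."""
    return math.ceil(n_periods * fs_hz / f0_min_hz)

fs = 16000                                  # assumed sample rate in Hz
speech_samples = min_frame_length(40, fs)   # 3 * 16000 / 40 = 1200 samples
speech_ms = 1000 * speech_samples / fs      # 75 ms
print(speech_samples, speech_ms)
```

A higher lower bound shortens the frame: with 100 Hz as the lowest F0, the same rule gives 480 samples (30 ms) at 16 kHz.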

[Figure: waveform, Amplitude vs. Time (s), 0.74–0.78 s, with a span of 3 periods marked]

Pitch-related Properties

• Time domain signal is periodic.

– F0 = 1/period

• Spectral peaks have harmonic relations.

– F0 is the greatest common divisor.

• Spectral peaks are equally spaced.

– F0 is the frequency gap.


Pitch Detection Methods

• Time domain: detect the period

– Basis: the time-domain signal is periodic; F0 = 1/period

• Frequency domain: detect the divisor

– Basis: spectral peaks have harmonic relations; F0 is the greatest common divisor

• Cepstrum domain: detect the gap

– Basis: spectral peaks are equally spaced; F0 is the frequency gap

Pitch detection in time domain

• Autocorrelation

– Basis: the time-domain signal is periodic

– A periodic signal correlates strongly with itself when offset by the fundamental period.

– Autocorrelation shows peaks at multiples of pitch period.
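A minimal sketch of autocorrelation-based detection for a single frame. The function name `acf_pitch`, the 40–600 Hz search range (borrowed from the speech quick facts), and the test tone are assumptions; practical detectors add normalization and a voicing decision, which are omitted here.

```python
import numpy as np

def acf_pitch(frame, fs, f0_min=40.0, f0_max=600.0):
    """Estimate F0 (Hz) of one frame from the largest autocorrelation
    value within the lag range allowed by [f0_min, f0_max]."""
    frame = frame - np.mean(frame)
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)                      # shortest period considered
    lag_max = min(int(fs / f0_min), len(acf) - 1)   # longest period considered
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / lag

# Test tone: 200 Hz with four harmonics of decaying amplitude, 100 ms long.
fs = 8000
t = np.arange(int(0.1 * fs)) / fs
frame = sum(np.sin(2 * np.pi * k * 200.0 * t) / k for k in range(1, 5))
estimate = acf_pitch(frame, fs)
print(estimate)
```

Because fewer samples overlap at longer lags, the peak at one period is higher than the peaks at two or more periods, so the first multiple is selected.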


Pitch detection in frequency domain

• Calculate the magnitude spectrum

• For each pitch hypothesis, calculate salience by

– Counting the number of peaks located at its harmonic positions, or

– Summing spectral energy at harmonic positions, or

– ……

• Choose the hypothesis with the highest salience
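The "summing spectral energy" variant can be sketched as below. The function name `harmonic_sum_pitch`, the Hann window, the number of harmonics, and the candidate grid are assumptions for illustration, not the lecture's exact settings.

```python
import numpy as np

def harmonic_sum_pitch(frame, fs, candidates_hz, n_harmonics=5):
    """Return the F0 candidate with the most spectral energy at its first
    n_harmonics harmonic positions, plus the salience of every candidate."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    bin_hz = fs / len(frame)
    salience = []
    for f0 in candidates_hz:
        bins = np.round(np.arange(1, n_harmonics + 1) * f0 / bin_hz).astype(int)
        bins = bins[bins < len(power)]          # ignore harmonics above Nyquist
        salience.append(power[bins].sum())
    salience = np.array(salience)
    return candidates_hz[int(np.argmax(salience))], salience

# Test tone: 200 Hz with five equal-amplitude harmonics, 100 ms long.
fs = 8000
t = np.arange(int(0.1 * fs)) / fs
frame = sum(np.sin(2 * np.pi * k * 200.0 * t) for k in range(1, 6))
candidates = np.arange(50, 601, 10)             # hypotheses from 50 to 600 Hz
best, salience = harmonic_sum_pitch(frame, fs, candidates)
print(best)
```

Using a fixed number of harmonics per hypothesis helps here: the 100 Hz hypothesis only collects the energy at 200 and 400 Hz, so it scores lower than 200 Hz.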


Pitch detection in cepstrum domain

• Idea: find the frequency gap between adjacent spectral peaks

– The log-amplitude spectrum of a harmonic sound looks roughly periodic

– The gap can be viewed as the period of the spectrum

– How to find the period then?

– Cepstrum’s idea: Fourier transform!


• Cepstrum = |IFT{log|FT(X)|}|
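A sketch of the formula applied to one frame, with the pitch read off as the quefrency of the largest cepstral peak. The function name `cepstrum_pitch`, the Hann window, the small constant that avoids log(0), and the 40–600 Hz search range are assumptions. The test signal is an impulse train, chosen because it is rich in harmonics.

```python
import numpy as np

def cepstrum_pitch(frame, fs, f0_min=40.0, f0_max=600.0):
    """Estimate F0 (Hz) from the peak of |IFT{log|FT(x)|}| within the
    quefrency range allowed by [f0_min, f0_max]."""
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)  # offset avoids log(0)
    cepstrum = np.abs(np.fft.ifft(log_mag))
    q_min = int(fs / f0_max)                           # shortest period, in samples
    q_max = min(int(fs / f0_min), len(cepstrum) // 2)  # longest period, in samples
    quefrency = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return fs / quefrency

# Test signal: impulse train with a 40-sample period (200 Hz at fs = 8000).
fs = 8000
pulses = np.zeros(800)
pulses[::40] = 1.0
estimate = cepstrum_pitch(pulses, fs)
print(estimate)
```

The peak lies at a quefrency of 40 samples, i.e. the spacing between spectral peaks is fs / 40 = 200 Hz.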


How to evaluate pitch detection?

• Choose some recordings (speech, music)

• Obtain the ground truth

– Listen to the signal and inspect the spectrum to manually annotate (time consuming!)

– Automatic annotation using simultaneously recorded laryngograph signals for speech (not quite reliable!)

• Pitched/non-pitched classification error

• Calculate the difference between estimated pitch and ground-truth

– Threshold for speech: 10% or 20% in Hz

– Threshold for music: 1 quarter-tone (about 3% in Hz)
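The threshold test can be sketched as an error rate over pitched frames. The function name `gross_error_rate`, the convention that 0 Hz marks a non-pitched frame, and the example values are assumptions; the quarter-tone figure follows from 2^(1/24) − 1 ≈ 0.029.

```python
import numpy as np

def gross_error_rate(est_hz, ref_hz, tol):
    """Fraction of pitched reference frames whose estimate deviates from
    the reference by more than tol (relative deviation in Hz)."""
    est_hz = np.asarray(est_hz, dtype=float)
    ref_hz = np.asarray(ref_hz, dtype=float)
    pitched = ref_hz > 0                 # 0 Hz marks non-pitched frames (assumed)
    rel_err = np.abs(est_hz[pitched] - ref_hz[pitched]) / ref_hz[pitched]
    return float(np.mean(rel_err > tol))

quarter_tone = 2 ** (1 / 24) - 1         # relative width of a quarter-tone

ref = [196, 196, 0, 220]                 # made-up ground truth in Hz
est = [197, 392, 150, 230]               # made-up estimates in Hz
music_rate = gross_error_rate(est, ref, tol=0.03)   # music threshold
speech_rate = gross_error_rate(est, ref, tol=0.20)  # speech threshold
print(quarter_tone, music_rate, speech_rate)
```

With the tighter music threshold, both the octave error (392 Hz) and the 230 Hz estimate count as errors; with the 20% speech threshold, only the octave error does. The pitched/non-pitched classification error would be measured separately.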


Different Methods vs. Ground-truth

[Figure: frame 25 and frame 65 are marked]

Frame 65 – Pitched (Voiced)

[Figure: spectrum, Log Magnitude (dB) vs. Frequency (Hz), 0–3000 Hz]

• Has clear harmonic patterns

• Different methods give close results that are consistent with the ground truth of 196 Hz.

Frame 25 – Non-pitched (Unvoiced)

[Figure: spectrum, Log Magnitude (dB) vs. Frequency (Hz), 0–3000 Hz]

• No clear harmonic patterns

• Different methods give inconsistent results.

Pitch Detection with Noise

• Can we still hear pitch if there is some background noise, say in a restaurant?

• Will pitch detection algorithms still work?

• Which domain is less sensitive to what kind of noise?

• How to improve pitch detection in noisy environments?

Example: violin + babble noise

Summary

• Pitch detection is important for many tasks

– Time domain: find the period of waveform

– Frequency domain: find the divisor of peaks

– Cepstrum domain: find the frequency gap between spectral peaks

• Single pitch detection is mature in noiseless conditions.

• Pitch detection in noisy environments (also called robust pitch detection, noise-resilient pitch detection) is an active research topic.

– BaNa [Yang et al., 2014]; PEFAC [Gonzalez & Brookes, 2014]

• Multi-pitch estimation is extremely challenging!