University of Sciences and Technologies of Hanoi ICT Department GROUP PROJECT REPORT Pitch detection algorithms and application in musical key detection Group members NGUYEN Dang Hoa USTHBI4-055 NGUYEN Gia Khang USTHBI4-072 NGUYEN Thi Thu Linh USTHBI4-085 NGUYEN Duc Thang USTHBI4-139 NGUYEN Minh Tuan USTHBI4-155 Supervisor Dr. TRAN Hoang Tung University of Science and Technology of Hanoi February, 2016
and state of the art) provides a literature review on PDAs and basic music theory. Section
4 (Scientific methods and materials) describes the tools and step-by-step methods used during
the project. Section 5 (Results and discussion) explains the results obtained from our
implementations and our comments on them, and Section 6 concludes the report.
2 Project management status
From the start of this project, our group and the supervisor held weekly meetings in the ICT
Lab to discuss the project goals and overall progress. Initially, as we had no prior experience
with mobile programming, the work was divided among the five members: three on literature
review and two on the development of an Android application. We aimed for a pitch detection application
at first, but decided to expand to a wider scope after additional research on music processing
and the need for a related application. Adjustments were made along the way, and the group
eventually settled on a key detection application, since it was feasible with the knowledge we
had at the time, whereas a more complex objective would have been difficult to achieve.
Details on the tasks and achievements are described in Table 2.1.
Task | In charge | Outcome
General research on digital sound processing | Everyone | Basic comprehension of digital signals, sampling, filtering, etc.
Develop basic Android application for sound recording | Thang, Tuan | Runnable application
Research on pitch detection algorithms | Khang, Linh, Hoa | Proposed three suitable algorithms
In-depth research on proposed algorithms and key detection | Khang, Linh, Hoa | Pseudocode and MATLAB tests
Implement proposed algorithms in Java | Thang, Tuan | Done
Research on musical key detection | Hoa, Linh | Proposed a method to detect keys
Develop Android application for key detection with user interface | Thang, Tuan | Runnable application with simple GUI
Putting the report together | Linh, Khang | Done

Table 2.1. Project management and progress.
3 Theoretical background and state of the art
In this section, we provide an overview of the types of PDA chosen for investigation, their
characteristics, basic knowledge of music theory, and some of the most prominent research
regarding these matters.
3.1 Pitch detection algorithms
Accurate and reliable pitch measurement is often extremely difficult for many reasons: the
voice sequence is often not a perfect train of periodic pulses, one sequence can be composed
of a variety of pitch periods (PPs) which are hard to separate, etc. It is therefore necessary
to have a grasp of current studies so we can apply them in this project.
PDAs are most commonly classified into three categories: time-domain, frequency-domain
or hybrid. Time-domain methods operate directly on the speech waveform, frequency-domain
methods exploit the regular peak series that arise in the frequency spectrum, and
hybrid ones incorporate properties of both domains. From each category, we picked one
signature PDA as follows:
• YIN Estimator (time-domain): The autocorrelation method, a prominent representative
of this category, attempts to find the PP by evaluating primary peaks of the input's
autocorrelation. It works well at mid to low frequencies, but makes too many errors in
various applications. The YIN estimator, developed by De Cheveigne and Kawahara in
the early 2000s, is based on the basic principles of the autocorrelation method, with
several modifications to address these errors: it minimizes the difference between
the input and its delayed copy, thus reducing the error rate. De Cheveigne and Kawahara
showed that YIN can be implemented efficiently with low latency, and that it has no upper
bound on its pitch search range.
• Cepstrum Analysis (frequency-domain): The cepstrum, a word play on "spectrum" first
defined by Bogert et al. in a 1963 paper [1], is essentially the inverse discrete Fourier
transform (IDFT) of the log magnitude of the spectrum of a signal. In 1967, Schroeder
and Noll proposed an application of cepstrum analysis to pitch detection, based
on the fact that the Fourier transform of a signal usually has regular peaks
representing its harmonic spectrum [3]. Taking the cepstrum of a signal converts this regular
harmonic structure into a single strong peak, thus removing the effects of overtones in the
human voice and making the pitch much easier to identify.
• Simplified Inverse Filter Tracking (hybrid): This was first proposed by Markel in 1972
[5] as a simple algorithm that could be realized in real time while retaining the positive
traits of both the autocorrelation and cepstral methods. The algorithm offers fast
runtime through a composition of elementary operations, and also classifies regions
of the input as voiced or unvoiced. Its core operation is based on a
simplified version of digital inverse filtering, hence the name "Simplified Inverse Filter
Tracking" (from here on referred to as SIFT).
3.2 Musical key detection
In music, the term note specifies frequencies within a certain range of pitch which the
human ear perceives as similar and can hardly distinguish. Any two notes whose frequency
ratio is a power of two are grouped into a pitch class. Generally, pitches are divided into 12
classes: C, C♯(D♭), D, D♯(E♭), E, F, F♯(G♭), G, G♯(A♭), A, A♯(B♭), and B. Within each pitch
class, notes are distinguished by adding a number after the class name. For instance,
C3 has a lower frequency than C4, C5 and so on.
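To make the mapping above concrete, a frequency in Hz can be assigned to one of the 12 pitch classes by comparing it against a reference of A4 = 440 Hz (MIDI note 69). The sketch below illustrates the idea; the class and method names are ours, not taken from our application:

```java
public class PitchClassDemo {
    public static final String[] CLASSES = {
        "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"
    };

    // Map a frequency in Hz to one of the 12 pitch classes,
    // using A4 = 440 Hz (MIDI note 69) as the reference.
    public static int freqToPitchClass(double freqHz) {
        int midi = (int) Math.round(69 + 12 * (Math.log(freqHz / 440.0) / Math.log(2)));
        return ((midi % 12) + 12) % 12; // 0 = C, 9 = A
    }

    public static void main(String[] args) {
        System.out.println(CLASSES[freqToPitchClass(440.0)]);  // A
        System.out.println(CLASSES[freqToPitchClass(261.63)]); // C (middle C)
    }
}
```

Rounding to the nearest MIDI note is what lets a slightly mistuned pitch still land in the right class.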
A piece of music is an ordered set of notes. However, in order to create good music, this
set is usually limited to fewer than twelve pitch classes; in most cases, this number is around
seven. These specific classes in a song, denoted as its scale, form an abstract concept called
tonality. Tonality derives mostly from the human sense of a song rather than any exact
definition, which means that two pieces of music with the same tonic will be perceived as
relatively similar.
Figure 3.1. Example of main pitch classes within C scale.
Most music is composed in a major or minor scale. Each scale has a "key" note (for example,
the C major scale is in the key of C major), which means there are a total of 24 major/minor
keys. Determining the key of a song is crucial to musicians, yet also extremely difficult,
because there is no mathematical formula to define or even guess it after capturing the set
of notes in the song.
3.3 State of the art
Throughout the history of pitch tracking, few thorough studies comparing different types
of detection methods have been conducted. Most research focuses on the properties and
applications of a single method, due to the difficulties of selecting algorithms to evaluate,
setting a reasonable standard of comparison and compiling a comprehensive database. For
the fundamental part of our study, we decided to look at the papers which introduced the
chosen PDAs:
• YIN, a fundamental frequency estimator for speech and music [2]
• Cepstrum Pitch Determination [6]
• The SIFT Algorithm for Fundamental Frequency Estimation [5]
Musical key detection using pitch class profiles (PCPs), on the other hand, has been under
extensive research, with datasets of various genres generating different base key profiles [7].
The general goal of such research tends to be to model the principle of key perception in the
human brain. For practical purposes, we focused on one algorithm proposed in a 2007 Master's
thesis from the University of Vienna [8].
4 Scientific methods and materials
We discuss in this section our approach to PDA implementation, to key detection and to
mobile application development. The step-by-step process we propose might not be optimal,
but is simple enough to deploy using our current skills and tools.
4.1 Tools
For this study, the following software and tools were used:
• IntelliJ 14.1.5 / Eclipse 4.5.1
• Android Studio 1.5
• Audacity
• Android phones
4.2 Pitch detection
Initial experiments were conducted in Java. We implemented the three PDAs according to
their proposed formulas and ran them on a set of pre-recorded sound samples to see the
margin of difference in their pitch estimates.
The samples used are of a female voice singing "ah" at pitches from G♯3 to B3.
The steps for each of the PDAs are described as follows:
4.2.1 YIN Estimator
First, a difference function is applied to the input signal x_t:

    d_t(τ) = Σ_{j=1}^{W} (x_j − x_{j+τ})²

d_t(τ) is zero at zero lag and, because of imperfect periodicity, often nonzero at the period.
Therefore, a cumulative mean normalized difference function is applied to avoid the zero-lag
dip, normalize the function for the next step and reduce "too high" errors:

    d′_t(τ) = 1                                       if τ = 0
    d′_t(τ) = d_t(τ) / [ (1/τ) Σ_{j=1}^{τ} d_t(j) ]   otherwise
An absolute threshold is applied to reduce "too low" errors, then each local minimum of d′_t
is subjected to parabolic interpolation in order to define the PP estimate.
Finally, for each index t, we search for a minimum of d′_θ(T_θ) for θ within [t − T_max/2,
t + T_max/2], where T_θ is the estimate at time θ and T_max is 25 ms. The best local estimate
obtained is the pitch of x_t.
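The two YIN functions above translate directly into Java. The sketch below is illustrative: the window size W and lag range are chosen for the example, not taken from our implementation, and the thresholding/interpolation steps are omitted:

```java
public class YinSketch {
    // Squared-difference function d_t(tau) over an integration window of W samples.
    // x must hold at least W + maxTau + 1 samples.
    public static double[] differenceFunction(double[] x, int W, int maxTau) {
        double[] d = new double[maxTau + 1];
        for (int tau = 0; tau <= maxTau; tau++) {
            double sum = 0;
            for (int j = 1; j <= W; j++) {
                double diff = x[j] - x[j + tau];
                sum += diff * diff;
            }
            d[tau] = sum;
        }
        return d;
    }

    // Cumulative mean normalized difference d'_t(tau): 1 at tau = 0,
    // otherwise d(tau) divided by the running mean of d(1..tau).
    public static double[] cmnd(double[] d) {
        double[] dPrime = new double[d.length];
        dPrime[0] = 1.0;
        double runningSum = 0;
        for (int tau = 1; tau < d.length; tau++) {
            runningSum += d[tau];
            dPrime[tau] = d[tau] * tau / runningSum;
        }
        return dPrime;
    }

    public static void main(String[] args) {
        int period = 20; // synthetic sine with a 20-sample period
        double[] x = new double[101];
        for (int i = 0; i < x.length; i++) x[i] = Math.sin(2 * Math.PI * i / period);
        double[] dPrime = cmnd(differenceFunction(x, 40, 40));
        System.out.println(dPrime[period]); // near zero at the true period
    }
}
```

Note how the normalization removes the zero-lag dip: dPrime[0] is fixed at 1, so only the dip at the true period survives as a minimum.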
4.2.2 Cepstrum Analysis
The cepstrum of a signal is defined by the following formula:

    c_n = F⁻¹{ log(|F(x_n)|) }
For our purpose of pitch detection, the cepstrum of a windowed frame of the signal is needed,
and it is defined through the discrete Fourier series:

    c_n = Σ_{k=0}^{N−1} log( | Σ_{m=0}^{N−1} x_m e^{−jk(2π/N)m} | ) e^{jk(2π/N)n}
The pitch can then be estimated by picking the peak of the resulting signal.
Figure 4.1. Block diagram of cepstrum analysis: x_n → DFT → X_k → log|·| → IDFT → c_n.
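The cepstrum pipeline can be sketched as follows. This is an illustrative version using naive O(N²) transforms for clarity; a real implementation would use an FFT:

```java
public class CepstrumSketch {
    // Real cepstrum of a frame x: IDFT of the log magnitude of its DFT.
    public static double[] realCepstrum(double[] x) {
        int n = x.length;
        double[] logMag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0, im = 0;
            for (int m = 0; m < n; m++) {
                double ang = -2 * Math.PI * k * m / n;
                re += x[m] * Math.cos(ang);
                im += x[m] * Math.sin(ang);
            }
            logMag[k] = Math.log(Math.hypot(re, im) + 1e-12); // guard against log(0)
        }
        double[] c = new double[n];
        for (int q = 0; q < n; q++) {
            double sum = 0;
            // logMag is real and even, so the IDFT reduces to a cosine sum.
            for (int k = 0; k < n; k++) sum += logMag[k] * Math.cos(2 * Math.PI * k * q / n);
            c[q] = sum / n;
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 128, period = 16;
        double[] x = new double[n];
        for (int i = 0; i < n; i += period) x[i] = 1.0; // pulse train with period 16
        double[] c = realCepstrum(x);
        int best = 10;
        // Search a bounded quefrency range, as a pitch detector would.
        for (int q = 10; q <= 24; q++) if (c[q] > c[best]) best = q;
        System.out.println(best); // peak at the pitch period: 16
    }
}
```

The quefrency index of the cepstral peak is the pitch period in samples, so the pitch in Hz is the sampling rate divided by that index.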
4.2.3 SIFT
First, the input signal s_n, with a sampling frequency of 10 kHz, is low-pass filtered with a
cutoff at f_c = 0.8 kHz. The filter output x_n is downsampled by a 5:1 ratio to reduce the
number of operations in later steps while retaining accuracy.
The signal is then analyzed frame by frame, with a 64-sample frame length and a 32-sample
frame shift. A 4th-order linear predictive analysis is performed to obtain a set of coefficients,
and the frame is inverse filtered using this set to produce a residual signal.
Next, the autocorrelation of the residual is searched for the primary peak, which is used to
determine f_0. Finally, the autocorrelation function is interpolated in the neighborhood of
the calculated pitch to increase the resolution of f_0.
Figure 4.2. Block diagram of the SIFT algorithm: s_n → LPF 0.8 kHz → x_n → 5:1 downsampling → w_n → inverse filter → y_n → autocorrelation → r_n → interpolation → f_0.
Full details on the formulas involved can be found in Appendix A.
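The per-frame core of SIFT (LPC analysis via the Levinson-Durbin recursion, inverse filtering, and a peak search on the residual's autocorrelation) can be sketched as below. This is an assumption-laden illustration: the low-pass filtering, downsampling and final interpolation steps are omitted, and the frame length and lag range are chosen for the example:

```java
public class SiftFrameSketch {
    // Autocorrelation r[0..maxLag] of a frame x.
    public static double[] autocorr(double[] x, int maxLag) {
        double[] r = new double[maxLag + 1];
        for (int lag = 0; lag <= maxLag; lag++)
            for (int i = 0; i + lag < x.length; i++)
                r[lag] += x[i] * x[i + lag];
        return r;
    }

    // Levinson-Durbin recursion: order-p LPC coefficients from the autocorrelation.
    // Convention: x[n] is predicted as sum_{j=1..p} a[j] * x[n-j].
    public static double[] lpc(double[] r, int p) {
        double[] a = new double[p + 1];
        double err = r[0];
        for (int i = 1; i <= p; i++) {
            double acc = r[i];
            for (int j = 1; j < i; j++) acc -= a[j] * r[i - j];
            double k = acc / err;
            double[] prev = a.clone();
            a[i] = k;
            for (int j = 1; j < i; j++) a[j] = prev[j] - k * prev[i - j];
            err *= (1 - k * k);
        }
        return a;
    }

    // Inverse filter: residual e[n] = x[n] minus its linear prediction.
    public static double[] residual(double[] x, double[] a) {
        int p = a.length - 1;
        double[] e = new double[x.length];
        for (int n = 0; n < x.length; n++) {
            double pred = 0;
            for (int j = 1; j <= p && n - j >= 0; j++) pred += a[j] * x[n - j];
            e[n] = x[n] - pred;
        }
        return e;
    }

    public static void main(String[] args) {
        // Synthetic frame: a pulse train (period 25) filtered by a 2-pole resonator.
        int n = 200, period = 25, p = 4;
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            double pulse = (i % period == 0) ? 1.0 : 0.0;
            x[i] = pulse + (i >= 1 ? 1.5 * x[i - 1] : 0) - (i >= 2 ? 0.8 * x[i - 2] : 0);
        }
        double[] a = lpc(autocorr(x, p), p);
        double[] rr = autocorr(residual(x, a), 40);
        int best = 15;
        for (int lag = 15; lag <= 40; lag++) if (rr[lag] > rr[best]) best = lag;
        System.out.println(best); // primary peak at the pitch period
    }
}
```

Inverse filtering flattens the vocal-tract resonances, so the residual's autocorrelation peak reflects the excitation period rather than a formant.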
4.3 Musical key detection
The basic process has three steps: pitch detection, pitch class profile (PCP) generation and
PCP comparison.
4.3.1 Generating a PCP
A pitch class profile (PCP) is a 12-dimensional vector whose components represent the
intensity of each of the 12 pitch classes. Generating a PCP is the first step in the key
detection process, since it is then compared against reference profiles to find the key that
best fits the generated PCP.
4.3.2 PCP comparison
The generated PCP is compared to 24 standard PCPs, one per key, to find the closest one.
In this project, we used the linear comparison algorithm, which several related papers have
shown to give the closest results.
The base key profile used is one derived by Krumhansl and Kessler in 1982 [4].
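The comparison step can be sketched as follows: each of the 24 keys is represented by the base profile rotated to its tonic, and the key whose profile matches the song's PCP best wins. The numeric vectors below are the commonly cited Krumhansl-Kessler major/minor profiles [4]; the class name and the use of Pearson correlation as the linear score are our illustrative choices, not necessarily the exact form used in our application:

```java
public class KeyFinderSketch {
    // Krumhansl-Kessler base profiles, index 0 = tonic.
    public static final double[] MAJOR = {6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88};
    public static final double[] MINOR = {6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17};
    public static final String[] NAMES = {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"};

    // Pearson correlation between two 12-dimensional vectors.
    public static double correlation(double[] u, double[] v) {
        double mu = 0, mv = 0;
        for (int i = 0; i < 12; i++) { mu += u[i]; mv += v[i]; }
        mu /= 12; mv /= 12;
        double num = 0, du = 0, dv = 0;
        for (int i = 0; i < 12; i++) {
            num += (u[i] - mu) * (v[i] - mv);
            du += (u[i] - mu) * (u[i] - mu);
            dv += (v[i] - mv) * (v[i] - mv);
        }
        return num / Math.sqrt(du * dv);
    }

    // Score the PCP against all 24 rotated profiles and return the best key name.
    public static String bestKey(double[] pcp) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int tonic = 0; tonic < 12; tonic++) {
            double[] maj = new double[12], min = new double[12];
            for (int i = 0; i < 12; i++) {
                maj[(i + tonic) % 12] = MAJOR[i]; // rotate profile to this tonic
                min[(i + tonic) % 12] = MINOR[i];
            }
            double sMaj = correlation(pcp, maj), sMin = correlation(pcp, min);
            if (sMaj > bestScore) { bestScore = sMaj; best = NAMES[tonic] + " major"; }
            if (sMin > bestScore) { bestScore = sMin; best = NAMES[tonic] + " minor"; }
        }
        return best;
    }

    public static void main(String[] args) {
        // A PCP with energy only on the C major scale degrees, strongest on C and G.
        double[] pcp = {5, 0, 2, 0, 2, 2, 0, 3, 0, 2, 0, 1};
        System.out.println(bestKey(pcp)); // C major
    }
}
```

Correlating rather than taking a raw dot product makes the score insensitive to the overall loudness of the song.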
Figure 4.3. Example: C minor key profile of Krumhansl and Kessler (intensity per pitch class, C to B).
4.3.3 JAVA implementation/Android application
Before moving on to Android, a test version in Java was developed. The key detection part
of the program runs as follows:
• After obtaining the pitch array from the buffers created, the output is put through
the function intensityNote() to generate a PCP vector (an array of Note objects) for
the whole song:
public Note[] intensityNote(List<Note> noteList) {
    Note[] notes = Note.copy(Note.NOTES);
    for (int i = 0; i < notes.length; i++) {
        double intensity = 0;
        for (Note item : noteList) {
            if (notes[i].equals(item)) {
                intensity += item.getIntensity();
            }
        }