Journal of Signal Processing, Vol.17, No.2, pp.29-38, March 2013

PAPER

Sparseness Criteria of F0-Frequencies Selection for Specmurt-Based Multi-Pitch Analysis without Modeling Harmonic Structure

Daiki Nishimura, Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki

Graduate School of System Informatics, Kobe University, Kobe 657-8501, Japan

E-mail: {nishimura, nakashika}@me.cs.scitec.kobe-u.ac.jp, {takigu, ariki}@kobe-u.ac.jp

Abstract This paper introduces a multi-pitch analysis method using specmurt analysis without modeling the common harmonic structure pattern. Specmurt analysis is based on the idea that the fundamental frequency distribution is expressed as a deconvolution of the observed spectrum by the common harmonic structure pattern. To analyze the fundamental frequency distribution, the common harmonic structure needs to be modeled accurately because it is often unknown while the observed spectrum is known. It is considered impossible, however, to obtain a highly accurate model of the structure since it can vary slightly depending on the pitch. Therefore, we propose a method to analyze the fundamental frequency distribution without modeling the harmonic structure. We note that each peak of the observed spectrum indicates either a fundamental frequency or a harmonic tone. Hence, the fundamental frequency distribution can be regarded as the set that contains only the peaks corresponding to fundamental frequencies. To find this set, we prepare many sets of the peaks of the spectrum and obtain a correspondingly large number of candidate harmonic structures. We evaluate the sparseness of these structures using the L1 or L2 norm, and then select the set that yields the sparsest structure as the solution. Experimental results show the effectiveness of the proposed method.

Keywords: multi-pitch analysis, sparseness criteria, specmurt analysis

1. Introduction

In recent years, music information processing technology has improved dramatically, which gives us many opportunities for creating music. For example, in the past only those who had specific musical skills could compose or arrange music, but now anyone can enjoy these activities by using various music-related software. However, there still remain some fields that rely on people with specific skills, such as perfect pitch. This ability is necessary when attempting to reproduce or score music simply by hearing it, and considerable experience and effort are needed to acquire it. In particular, it is difficult to analyze a signal that contains tones of different pitches at the same time. Therefore, a technology for analyzing multi-pitch signals is required.

Monophonic music can be analyzed with relatively high accuracy [1]-[4]. However, multi-pitch music is more difficult to analyze than a single tone. An acoustic signal carries information on fundamental frequencies and harmonic frequencies, but in the case of multi-pitch sounds, it is unknown which peaks correspond to fundamental frequencies and which to harmonic frequencies. Moreover, the number of fundamental frequencies is not always known. This is one reason for the difficulty of multi-pitch analysis.

Many techniques have been tried in multi-pitch analysis in the past, such as a comb filter [5], statistical information of chords and their progression [6, 7], iterative estimation and separation [8], linear models for the overtone series [9], parameter estimation of superimposed spectrum models [10, 11], and acoustic object modeling using GMM with estimation by an EM algorithm [12]-[14]. Specmurt analysis [15]-[21] is another method of multi-pitch analysis. The method defines the observed spectrum as a convolution of the fundamental frequency distribution and instrumental information. It differs from the methods listed above in that it introduces the specmurt domain, whereas [5] is processed in the time domain and [6]-[14] are processed in the spectrum domain.

The conventional specmurt analysis first obtains instrumental information by iteratively generating a model called the “common harmonic structure” and then gives a fundamental frequency distribution based on the model. This approach rests on the premise that the relative powers of the harmonic components are common and do not depend on the fundamental frequency. However, it is considered impossible to obtain a highly accurate model of the structure since it can vary slightly depending on the pitch. Because the harmonic structure depends on the pitch, a data-driven approach that selects the harmonic structure without assuming a common harmonic structure is needed. Therefore, we propose a new method based on sparseness criteria to analyze the fundamental frequency distribution.

Fig. 1 Positional relationship between fundamental and harmonic frequencies: (a) linear frequency domain, (b) log-frequency domain

Fig. 2 Generation of a multi-pitch spectrum v(x) by convolution of a common harmonic structure h(x) and a fundamental frequency distribution u(x) [18]

2. Specmurt Analysis

2.1 Multi-pitch spectrum in log frequency

In our study, acoustic signals having harmonics are analyzed, and percussive signals such as drums are not targeted. The n-th harmonic frequency is equal to n times the fundamental frequency in the linear-frequency scale. Therefore, when the fundamental frequency shifts by ∆ω, the n-th harmonic frequency shifts by n × ∆ω (Fig. 1(a)). Meanwhile, in the log-frequency scale, the n-th harmonic frequency is located log n away from the fundamental frequency. This means that all harmonic frequencies shift by ∆x when the fundamental frequency shifts by ∆x in the log-frequency scale (Fig. 1(b)).
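The shift invariance described above can be checked numerically. The following is a minimal Python sketch, not part of the original experiments; the function names and the two example fundamentals (220 Hz and 440 Hz) are our own illustrative assumptions.

```python
import math

def harmonic_log_positions(f0_hz, n_harmonics):
    """Natural-log frequency of the first n_harmonics harmonics of f0_hz."""
    return [math.log(n * f0_hz) for n in range(1, n_harmonics + 1)]

def offsets_from_fundamental(positions):
    """Distance of every harmonic from the fundamental on the log axis."""
    return [p - positions[0] for p in positions]

# Two hypothetical tones one octave apart.
low = offsets_from_fundamental(harmonic_log_positions(220.0, 4))
high = offsets_from_fundamental(harmonic_log_positions(440.0, 4))

# In both cases the n-th harmonic sits log(n) above the fundamental,
# so the pattern of offsets does not depend on the pitch.
for n, (a, b) in enumerate(zip(low, high), start=1):
    print(n, round(a, 6), round(b, 6), round(math.log(n), 6))
```

Because the offsets depend only on n, a single pattern shifted along the log-frequency axis can describe tones of any pitch, which is what the common harmonic structure introduced next relies on.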

In specmurt analysis, it is assumed that the relative powers of the harmonic components are common and do not depend on the fundamental frequency. This pattern is called the common harmonic structure h(x), where x represents log-frequency. The fundamental frequency is located at the origin, and its power is normalized to 1. Given a fundamental frequency on the log-frequency axis, any single-pitch spectrum can then be expressed as a shift of h(x) along the x-axis.

A multi-pitch spectrum can be considered to be generated by adding copies of the common harmonic structure h(x), each shifted to a fundamental frequency and multiplied by the power of that fundamental frequency. If the distribution of the power of the fundamental frequencies is defined as a fundamental frequency distribution u(x), a multi-pitch spectrum v(x) is the convolution of h(x) and u(x), as shown in Fig. 2:

$$v(x) = h(x) * u(x) \qquad (1)$$
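As an illustration of Eq. (1), the sketch below builds a toy multi-pitch spectrum on a coarse, discretized log-frequency axis. The bin positions and powers are hypothetical values chosen for readability, and `convolve` is our own helper, not a function from the paper.

```python
def convolve(h, u):
    """Full linear convolution: v[k] = sum over m of u[m] * h[k - m]."""
    v = [0.0] * (len(h) + len(u) - 1)
    for m, um in enumerate(u):
        if um == 0.0:
            continue  # no tone at this bin
        for k, hk in enumerate(h):
            v[m + k] += um * hk
    return v

# Hypothetical harmonic structure: fundamental at bin 0 (power 1),
# harmonics at bins 3 and 5 with smaller relative powers.
h = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]

# Hypothetical fundamental frequency distribution: two tones,
# power 2.0 at bin 1 and power 1.0 at bin 4.
u = [0.0, 2.0, 0.0, 0.0, 1.0]

v = convolve(h, u)
print(v)
```

In this toy case the second tone falls on a harmonic of the first one (bin 4), so the observed peak there mixes both contributions, which is the kind of ambiguity that makes multi-pitch analysis difficult.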

2.2 Analysis of fundamental frequency distribution

If a common harmonic structure h(x) is known, a fundamental frequency distribution u(x) can be estimated by the deconvolution of an observed multi-pitch spectrum v(x) by h(x):

$$u(x) = h(x)^{-1} * v(x) \qquad (2)$$

According to the convolution theorem, Eq. (2) can be expressed as

$$U(y) = \frac{V(y)}{H(y)} \qquad (3)$$

where U(y), H(y) and V(y) are the inverse Fourier transforms of u(x), h(x) and v(x), respectively. We can obtain u(x) by the Fourier transform of U(y) in the y domain as follows:

$$u(x) = \mathcal{F}[U(y)] \qquad (4)$$

As described above, the method of estimating the fundamental frequency distribution by deconvolution in the log-frequency domain is called specmurt analysis [15]-[21], and the y domain (defined as the inverse Fourier transform of the log-frequency spectrum) is called the specmurt domain. In practical calculation, the y domain may be regarded as the Fourier transform of the log-frequency spectrum.
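Eqs. (2)-(4) can be sketched with a plain discrete Fourier transform. The code below is a minimal illustration under our own assumptions: it uses the forward DFT for the y domain and the inverse DFT to return (the practical variant mentioned above), it treats the convolution as circular over the frame, and the `eps` guard against division by a vanishing H(y) is a hypothetical safeguard that the paper does not describe. The toy values of `v` and `h` are illustrative.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def specmurt_deconvolve(v, h, eps=1e-12):
    """Estimate u from v = h * u by division in the transform domain, Eq. (3)."""
    n = len(v)
    h_padded = list(h) + [0.0] * (n - len(h))  # zero-pad h to the frame length
    V = dft(v)
    H = dft(h_padded)
    U = [Vk / Hk if abs(Hk) > eps else 0.0 for Vk, Hk in zip(V, H)]
    return [val.real for val in dft(U, inverse=True)]  # back to log-frequency, Eq. (4)

# Toy observed spectrum and a harmonic structure assumed to be known.
v = [0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.5, 0.5, 0.0, 0.25]
h = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]

u = specmurt_deconvolve(v, h)
print([round(val, 6) for val in u])
```

With the correct h(x), the recovered distribution has power only at the two fundamental bins; the harmonic peak that overlapped the second tone is resolved by the deconvolution.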

In specmurt analysis, a wavelet transform, which can perform an analysis in log-frequency, is used to extract spectra instead of the short-time Fourier transform, since the observed spectrum v(x) is handled in the log-frequency domain.

One characteristic of specmurt analysis is that it can analyze music signals in which pitch changes occur within a short time. Therefore, the analysis result can be obtained as visual information such as a piano roll, where the horizontal axis represents the time index and the vertical axis represents the pitch.

Fig. 3 Example of observed spectrum (piano triad)

2.3 Conventional approach with specmurt

The fundamental frequency distribution u(x) can be obtained using Eq. (2) if the observed spectrum v(x) is given and the common harmonic structure h(x) is known. However, h(x) is generally unknown. For this reason, h(x) has so far been modeled in several ways. In [15, 16], the common harmonic structure is defined so that the power of the n-th harmonic frequency component is 1/n of that of the fundamental frequency component. This is based on the prior knowledge that natural sound spectra commonly have such a shape. However, the optimal fundamental frequency distribution u(x) is not always obtained by such an approach since the harmonic structure varies depending on the tone. In [17, 18, 21], a quasi-optimization with an iterative algorithm is used for estimating h(x), but a more accurate modeling method is required.

3. Sparseness-Based F0 Selection

3.1 Problem with modeling of the common harmonic structure

As mentioned in the previous section, conventional multi-pitch analysis with specmurt focuses on how to model the common harmonic structure h(x). However, it is considered difficult to obtain a strictly correct model of the structure since the harmonic structure is known to vary slightly depending on the pitch. Therefore, we propose a method to analyze the fundamental frequency distribution without modeling the harmonic structure.

Fig. 4 Flowchart of sparseness criteria of F0-frequencies selection for specmurt-based multi-pitch analysis (observed spectrum; generation of candidates for the fundamental frequency distribution; calculation of harmonic structure using specmurt; rejection of non-harmonic structure; finding the optimal harmonic structure based on the sparseness; correct fundamental distribution)

3.2 Outline of proposed method

If there is no noise in the observed spectrum v(x), each peak is believed to correspond to either a fundamental frequency or a harmonic frequency. Fig. 3 shows an example of the spectrum of a piano triad. It is considered possible to obtain the fundamental frequency distribution u(x) by correctly selecting a set of peaks in the observed spectrum. Hence, our method focuses on finding the fundamental frequency distribution u(x) that has all the peaks corresponding to the fundamental frequencies of the multiple tones and none of the peaks corresponding to harmonic frequency components. Fig. 4 shows the flowchart of our method. First, some candidates for the fundamental frequency distribution are generated from the observed spectrum. Second, using specmurt, the harmonic structure corresponding to each candidate is calculated. Among the obtained structures, non-harmonic structures are rejected, and the optimal harmonic structure is found among the remaining structures based on sparseness. Finally, the candidate corresponding to the optimal harmonic structure is selected as the correct fundamental frequency distribution.

Fig. 5 Example of generation u_i(x) from the observed spectrum

Fig. 6 Examples of u_i(x) generated from Fig. 3 (upper row) and h_i(x) corresponding to each u_i(x) (lower row)

Fig. 7 Example of harmonic structure (single tone A3 of piano)

3.3 Generation of candidates for fundamental frequency distribution

It is difficult to extract exclusively the peaks of the fundamental frequencies from the observed spectrum since it cannot be said which peaks correspond to fundamental frequencies and which to harmonic frequency components. Therefore, we first discuss the candidates for u(x).

It is known that the peaks corresponding to the fundamental frequencies have a certain level of power. We therefore take the M major peaks of the observed spectrum and obtain candidates for u(x) by selecting combinations of these M peaks. If the observed signal consists of L tones, the number of candidates is ${}_M C_L$ because the number of peaks of u(x) should be equal to the number of tones. However, the number of tones is often unknown. Thus, the number of candidates λ is expressed as

$$\lambda = \sum_{l=1}^{L} {}_M C_l$$

so that any number of tones from a single tone up to L tones can be handled.

Fig. 5 shows an example of the generation of u_i(x) from the observed spectrum. Each candidate u_i(x) (i = 1, 2, . . . , λ) consists of a combination of peaks obtained from the observed spectrum. Each peak is treated as an impulse whose power is equal to that of the corresponding peak.
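The candidate generation can be sketched as follows. This is only an illustration under our own assumptions: the paper does not specify how the M major peaks are picked, so `major_peaks` simply takes the largest local maxima, and all function names and the toy spectrum are hypothetical.

```python
from itertools import combinations
from math import comb

def major_peaks(v, m):
    """Bins of the m largest local maxima of v, returned in ascending order."""
    peaks = [k for k in range(1, len(v) - 1) if v[k] > v[k - 1] and v[k] >= v[k + 1]]
    peaks.sort(key=lambda k: v[k], reverse=True)
    return sorted(peaks[:m])

def candidates(v, m, max_tones):
    """Yield every candidate u_i(x) built from 1..max_tones of the m major peaks."""
    peaks = major_peaks(v, m)
    for l in range(1, max_tones + 1):
        for subset in combinations(peaks, l):
            u = [0.0] * len(v)
            for k in subset:
                u[k] = v[k]  # impulse with the power of the observed peak
            yield u

def candidate_count(m, max_tones):
    """Number of candidates: sum of C(m, l) for l = 1..max_tones."""
    return sum(comb(m, l) for l in range(1, max_tones + 1))

toy_v = [0.0, 2.0, 0.0, 1.0, 0.0, 0.5, 0.0]
toy_candidates = list(candidates(toy_v, m=2, max_tones=2))
print(len(toy_candidates), candidate_count(2, 2))
print(candidate_count(7, 7))  # size of the search when M = L = 7
```

Since the number of candidates grows quickly with M, the choice of M trades coverage of possible fundamentals against the cost of evaluating every subset.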

3.4 Selection of optimal harmonic structure

3.4.1 Calculation of harmonic structure using specmurt

A harmonic structure is obtained from Eq. (1) as follows:

$$h(x) = u(x)^{-1} * v(x) \qquad (5)$$

One solution for the harmonic structure, h_i(x), is obtained by substituting the candidate u_i(x) for u(x) in Eq. (5).
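To make Eq. (5) concrete, the sketch below recovers h_i(x) from a candidate u_i(x). Note that it swaps in a direct-domain technique: instead of dividing in the specmurt domain, it solves the linear convolution by forward substitution, which is possible here because the candidate is a short impulse train whose lowest impulse defines the origin. This substitution, the function name and the toy values are our own assumptions for illustration.

```python
def deconvolve_by_candidate(v, u, length):
    """Solve v = h * u for the first `length` bins of h by forward substitution."""
    support = [(k, p) for k, p in enumerate(u) if p != 0.0]
    if not support:
        raise ValueError("candidate u must contain at least one impulse")
    k0, p0 = support[0]  # lowest impulse: taken as the origin of h
    h = [0.0] * length
    for x in range(length):
        acc = v[x + k0] if x + k0 < len(v) else 0.0
        for k, p in support[1:]:
            shift = k - k0
            if shift <= x:
                acc -= p * h[x - shift]  # remove the other tones' contributions
        h[x] = acc / p0
    return h

# Toy spectrum: structure [1, 0, 0, 0.5, 0, 0.25] with tones at bins 1 and 2.
v = [0.0, 2.0, 1.0, 0.0, 1.0, 0.5, 0.5, 0.25]

u_correct = [0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # both fundamentals
u_missing = [0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # second tone left out

h_correct = deconvolve_by_candidate(v, u_correct, 6)
h_missing = deconvolve_by_candidate(v, u_missing, 6)
print(h_correct)
print(h_missing)
```

In this toy case the correct candidate returns the sparse structure used to build the spectrum, while the candidate that misses a tone returns a structure with extra peaks between the harmonics, which is the behavior the selection step exploits.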

In this section, we discuss how to select an optimal harmonic structure h(x) among h_i(x) (i = 1, 2, . . . , λ). The figures in the upper row of Fig. 6 illustrate examples of u_i(x) generated from a piano triad (Fig. 3), and those in the lower row show the h_i(x) corresponding to each u_i(x). Fig. 6(a-1) shows a candidate u_i(x) that has all the fundamental frequencies and no harmonic frequency, i.e., the correct combination of spectral peaks u(x). Fig. 6(b-1) and Fig. 6(c-1) are examples of incorrect combinations of spectral peaks, which lack some fundamental frequencies or include some harmonic frequencies. Fig. 6(a-2) shows the h_i(x) corresponding to the correct combination of spectral peaks u(x), i.e., the optimal harmonic structure h(x). As shown in these figures, h(x) is the most similar to the harmonic structure of a single tone (Fig. 7) among those in the lower row of Fig. 6.

On the other hand, the harmonic structure in Fig. 6(b-2) has numerous peaks at positions where the structure in Fig. 7 has none. Also, Fig. 6(c-2) does not have large peaks at the harmonic frequencies. A structure like Fig. 6(c-2) is called a non-harmonic structure in this paper.

3.4.2 Rejection of non-harmonic structures

In order to reduce the computation cost of finding the optimal harmonic structure, which is described in Section 3.4.3, non-harmonic structures are rejected in advance using the technique described in this section.

If the instrument or the pitch varies, the relative power ratio of each harmonic frequency varies, but the position at which each harmonic frequency appears does not. Therefore, the positions of the harmonic frequencies in the harmonic structure (Ω2, Ω3, . . . , ΩN) are regarded as information that is independent of the pitch, where Ωn represents the position of the n-th harmonic component and Ω1 represents the origin, i.e., the position of the fundamental frequency. Based on this information, we check whether there are values at (Ω2, Ω3, . . . , ΩN). For example, any structure that does not have any large peaks at (Ω2, Ω3, . . . , ΩN), like Fig. 6(c-2), is treated as a non-harmonic structure, and such structures are rejected by using a threshold set experimentally.
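The rejection step might look like the following sketch. The paper only states that a threshold is set experimentally, so the exact rule used here (count the harmonic positions whose magnitude reaches `threshold` and require at least `min_hits` of them) and all names and values are hypothetical.

```python
def is_harmonic_structure(h, omega, threshold, min_hits=1):
    """Return True when h has large enough values at the expected harmonic bins.

    omega lists the bins of the 2nd..N-th harmonics (the origin is excluded).
    """
    hits = sum(1 for pos in omega if pos < len(h) and abs(h[pos]) >= threshold)
    return hits >= min_hits

# Hypothetical harmonic positions on a coarse log-frequency grid.
omega = [3, 5]

h_ok = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]   # peaks at the harmonic bins
h_bad = [1.0, 0.4, 0.3, 0.0, 0.2, 0.05]  # energy elsewhere, almost none at harmonics

print(is_harmonic_structure(h_ok, omega, threshold=0.1))
print(is_harmonic_structure(h_bad, omega, threshold=0.1))
```

Rejecting such structures before scoring reduces the number of candidates that need the sparseness evaluation of the next section.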

3.4.3 Finding the optimal harmonic structure based on the sparseness

An ideal harmonic structure has peaks only at the fundamental frequency and the harmonic frequencies, as in Fig. 7. In our method, in order to select the optimal harmonic structure, we calculate the sparseness of each h_i(x) that was not rejected in Section 3.4.2. According to Fig. 6 and Fig. 7, the optimal harmonic structure h(x) is considered to be sparser and to have larger peaks at the harmonics than the other h_i(x). Thus, the sparseness S is defined as

$$S(i) = -\alpha L_a(i) + (1 - \alpha) L_b(i) \qquad (6)$$

where α represents the weight. If the L1 norm is used in the first and the second terms,

$$L_a(i) = \sum_{x=1}^{X} \Bigl\{ 1 - \sum_{j=1}^{N} \delta(\Omega_j - x) \Bigr\} |h_i(x)| \qquad (7)$$

$$L_b(i) = \sum_{x=1}^{X} \sum_{j=1}^{N} \delta(\Omega_j - x) \, |h_i(x)| \qquad (8)$$

where δ is the Kronecker delta. The first term, L_a(i), measures the sparseness (excluding the harmonic components), and the second term, L_b(i), is the sum of the values at the harmonics. If the L2 norm is used in Eq. (6),

$$L_a(i) = \sum_{x=1}^{X} \Bigl\{ 1 - \sum_{j=1}^{N} \delta(\Omega_j - x) \Bigr\} h_i(x)^2 \qquad (9)$$

$$L_b(i) = \sum_{x=1}^{X} \sum_{j=1}^{N} \delta(\Omega_j - x) \, h_i(x)^2 \qquad (10)$$

Writing the optimal harmonic structure as $h(x) = h_{\hat{i}}(x)$, the index $\hat{i}$ is determined by

$$\hat{i} = \arg\max_i S(i) \qquad (11)$$
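The scoring and selection of Eqs. (6)-(11) can be sketched as below. This is a minimal illustration, not the authors' implementation. We assume 0-based bins, so `omega` includes bin 0 for Ω1, and we assume that the harmonic term L_b is rewarded (added) so that larger harmonic peaks raise the score, in line with the statement in Section 4.2 that L_a has to be small and L_b large. The function names and toy structures are hypothetical.

```python
def sparseness(h, omega, alpha, norm_a=2, norm_b=1):
    """Score S(i): penalize off-harmonic energy L_a, reward harmonic energy L_b.

    norm_a and norm_b select the L1 (1) or L2 (2) form of each term.
    """
    harmonic_bins = set(omega)
    la = sum(abs(val) ** norm_a for x, val in enumerate(h) if x not in harmonic_bins)
    lb = sum(abs(val) ** norm_b for x, val in enumerate(h) if x in harmonic_bins)
    return -alpha * la + (1.0 - alpha) * lb

def select_best(structures, omega, alpha, norm_a=2, norm_b=1):
    """Index of the structure with the largest score, as in Eq. (11)."""
    scores = [sparseness(h, omega, alpha, norm_a, norm_b) for h in structures]
    return max(range(len(scores)), key=scores.__getitem__)

omega = [0, 3, 5]                          # origin and two harmonic bins
h_dense = [1.0, 0.5, 0.0, 0.5, 0.25, 0.25]  # extra peaks off the harmonics
h_sparse = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]  # peaks only at the harmonics

best = select_best([h_dense, h_sparse], omega, alpha=0.9)
print(best, sparseness(h_sparse, omega, 0.9), sparseness(h_dense, omega, 0.9))
```

Here the default arguments correspond to the (L2, L1) combination, and the weight α balances how strongly off-harmonic noise is penalized against how strongly harmonic peaks are rewarded.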

3.5 Correct fundamental frequency distribution

As described above, the optimal harmonic structure h(x) is obtained based on the sparseness criteria. Finally, the u(x) corresponding to h(x) is selected uniquely among the u_i(x).

Summing up our method, the steps shown below are processed for each frame.

1. Based on the observed spectrum v(x), the candidates u_i(x) for the optimal fundamental frequency distribution are prepared.

2. The candidates h_i(x) for the optimal harmonic structure are obtained by substituting u_i(x) in Eq. (5).

3. Non-harmonic structures are rejected among h_i(x), and the sparsest remaining h_i(x) is determined as h(x).

4. The u(x) corresponding to h(x) is selected among u_i(x).

This method does not need to learn the pitch or the instrumental information since each step is independent of pitch and instrumental information.
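The four steps above can be put together for a single frame as in the following sketch. It is an illustration under several assumptions of our own: the major peaks are passed in already picked, the deconvolution of Eq. (5) is done by forward substitution in the log-frequency domain rather than by division in the specmurt domain, the rejection rule is a simple threshold on the harmonic bins, the (L2, L1) score adds the harmonic term, and all names and toy values are hypothetical.

```python
from itertools import combinations

def deconvolve(v, u, length):
    """Solve v = h * u for the first `length` bins of h (forward substitution)."""
    support = [(k, p) for k, p in enumerate(u) if p != 0.0]
    k0, p0 = support[0]
    h = [0.0] * length
    for x in range(length):
        acc = v[x + k0] if x + k0 < len(v) else 0.0
        for k, p in support[1:]:
            if k - k0 <= x:
                acc -= p * h[x - (k - k0)]
        h[x] = acc / p0
    return h

def score(h, omega, alpha):
    """(L2, L1) sparseness: squared off-harmonic noise vs. summed harmonic peaks."""
    bins = set(omega)
    la = sum(val * val for x, val in enumerate(h) if x not in bins)
    lb = sum(abs(val) for x, val in enumerate(h) if x in bins)
    return -alpha * la + (1.0 - alpha) * lb

def analyze_frame(v, peaks, omega, alpha=0.9, threshold=0.1, max_tones=None):
    """Return the peak bins selected as fundamental frequencies for one frame."""
    length = max(omega) + 1
    best_bins, best_score = None, float("-inf")
    for l in range(1, (max_tones or len(peaks)) + 1):
        for subset in combinations(peaks, l):            # step 1: candidate u_i
            u = [0.0] * len(v)
            for k in subset:
                u[k] = v[k]
            h = deconvolve(v, u, length)                  # step 2: h_i from Eq. (5)
            if not any(abs(h[pos]) >= threshold for pos in omega[1:]):
                continue                                  # step 3a: reject non-harmonic
            s = score(h, omega, alpha)                    # step 3b: sparseness
            if s > best_score:
                best_bins, best_score = subset, s
    return best_bins                                      # step 4: selected u

# Toy frame: harmonics at offsets 4 and 7, tones at bins 1 and 3.
v = [0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.5, 0.5, 0.0, 0.25]
selected = analyze_frame(v, peaks=[1, 3, 5], omega=[0, 4, 7])
print(selected)
```

In this toy frame the peak at bin 5 is a harmonic of the tone at bin 1, and the candidate that treats it as a fundamental scores lower than the candidate containing only bins 1 and 3.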

Fig. 8 (a) Piano-roll of test MIDI (data A) and (b) piano-roll of test MIDI (data B); the horizontal axis is time and the vertical axis is pitch (C3 to C6 in (a), C4 to C6 in (b))

4. Experiments

4.1 Conditions

To evaluate our method, we use two songs from the RWC Music Database (footnote 1) as the test data (Table 1); Fig. 8 shows the piano rolls of data A and data B. The test signals are recorded at a 16 kHz sampling rate using MIDI instruments: piano, violin or acoustic guitar. A wavelet transform with a Gabor function is applied to the test data to obtain the spectrum. The parameter M described in Section 3.3 is set to 7. This means that we can analyze an observed signal having up to 7 tones at the same time. The parameter N is set to 6, since the value at ΩN tends to be unobservable as N increases, and too large an N might cause all h_i(x) to be rejected.

Table 1 List of experimental data

Symbol   Title           Catalog number
data A   Sicilienne      RWC-MDB-C-2001 No.43
data B   Gavotte E-Dur   RWC-MDB-C-2001 No.36

4.2 Results

Fig. 9 depicts the analysis result of data A as a piano roll, where (L2, L1) and a weight parameter of 0.9 are used. Almost all the notes are estimated correctly, but some notes are mistaken for notes that differ by an octave.

Fig. 9 An example of analysis result (data A): Red circles indicate some mistaken notes.

1 http://staff.aist.go.jp/m.goto/RWC-MDB/

Fig. 10 and Fig. 11 show the accuracies for data A and data B, respectively, for piano, violin, and guitar using our proposed method (without modeling the harmonic structure). In the figures, (L1, L2), for example, indicates that the L1 norm is used in the first term of Eq. (6) and the L2 norm is used in the second term. The weight parameter α in Eq. (6) is varied from 0.0 to 1.0. The accuracy is calculated as follows:

$$\mathrm{Accuracy}(\%) = \frac{N_{all} - (N_{ins} + N_{del})}{N_{all}} \times 100 \qquad (12)$$

where N_all, N_ins and N_del represent the total number of notes, the number of insertion errors and the number of deletion errors, respectively. In our experiments, the note duration is not evaluated, and we permit the onset time to shift by up to τ seconds (τ = 0.3 in the experiments) since the onset time and the duration of each tone are not exactly equal to those in the score.
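Eq. (12) translates directly into code. The helper below is a small sketch with a hypothetical name, and the counts in the usage line are made-up numbers, not results from our experiments.

```python
def note_accuracy(n_all, n_ins, n_del):
    """Accuracy in percent from Eq. (12): total notes, insertion and deletion errors."""
    if n_all <= 0:
        raise ValueError("n_all must be positive")
    return (n_all - (n_ins + n_del)) / n_all * 100.0

# Hypothetical counts: 200 notes, 10 insertion errors, 6 deletion errors.
print(round(note_accuracy(200, 10, 6), 1))
```

Note that the measure can become negative when the number of insertion and deletion errors exceeds the number of notes.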

As shown in Fig. 10 and Fig. 11, the optimal parameter varies depending on the instrument. For piano, the results with a large weight show higher accuracy (Fig. 10(a) and 11(a)). This means that L_a may work effectively for instruments with frequency structures similar to that of a piano, where the largest peak is observed at the origin (fundamental frequency) and the peak value becomes smaller as the frequency becomes higher. For violin, the use of a small weight resulted in higher accuracy (Fig. 10(b) and 11(b)). This means that L_b, which calculates the sum of the values at the harmonics, may work well for instruments with frequency structures similar to that of a violin, whose structure differs from that of a piano in having a larger peak at the second harmonic than at the fundamental frequency. We will need to investigate the effectiveness of L_a and L_b further in future work. For guitar, Fig. 10(c) shows that the use of a middle weight resulted in higher accuracy, and Fig. 11(c) shows that the use of a small weight resulted in higher accuracy. In the case of guitar, the largest peak is observed at the fundamental frequency, similar to piano, but a guitar sometimes produces peaks of the attack sound at a lower frequency than the fundamental frequency. Therefore, occasionally, the attack sound is regarded as the fundamental frequency, and the correct fundamental frequency is regarded as the second harmonic. For that reason, some guitar notes may be regarded as those of a violin. As a consequence, the optimal weight for a guitar varies depending on the number of notes regarded as those of a violin.

Fig. 10 Accuracy results (data A): (a) piano, (b) violin, (c) guitar

Fig. 11 Accuracy results (data B): (a) piano, (b) violin, (c) guitar

In all results for data A (except violin), the combination (L2, L1) resulted in the best accuracy, where the L2 norm is used in the first term, L_a, of Eq. (6). In order to increase the value of Eq. (6), L_a, which is the sum of the noise in the harmonic structure, has to be small, and L_b, which is the sum of the harmonics, has to be large. The L2 norm reduces the value of L_a (the first term in Eq. (6)) more than the L1 norm does because squaring makes a value that is less than 1 even smaller (all noise values are smaller than 1). On the other hand, in order to increase the value of L_b (the second term in Eq. (6)), the L1 norm is better than the L2 norm because most harmonics are also smaller than 1.

Table 2 shows the comparison between the specmurt-based method with modeling of the common harmonic structure [18] and the proposed method on data A, where the optimal parameters are selected for each method. The proposed method (without modeling the common harmonic structure) obtained higher accuracy than the method with the common harmonic structure for every instrument.

Fig. 12 shows an observed spectrum of a multi-pitch sound (D4 and B4). Fig. 13(a-1) and Fig. 13(a-2) show the fundamental frequency distribution and the harmonic structure obtained from the observed spectrum by modeling the harmonic structure, and Fig. 13(b-1) and Fig. 13(b-2) show the results obtained by the proposed method. The modeled harmonic structure (Fig. 13(a-2)) has no noise, but the corresponding fundamental frequency distribution (Fig. 13(a-1)) is incorrect. Some mistaken peaks of the distribution may be eliminated using threshold processing; however, the larger peak circled in black in Fig. 13(a-1) may not be excluded. On the other hand, the harmonic structure produced by the proposed method (Fig. 13(b-2)) has some small noise, but the fundamental frequency distribution (Fig. 13(b-1)) is correct. The noise in the harmonic structure comes from the difference between the harmonic structures of D4 and B4. Since the noise absorbs this difference, the optimal fundamental frequency distribution can be obtained.

Fig. 12 Observed spectrum (multi-pitch D4 and B4 of piano)

Fig. 13 The fundamental frequency distribution and the harmonic structure obtained from Fig. 12 by conventional specmurt (left) and proposed (right)

Fig. 14 The harmonic structure of piano (D4)

Fig. 14 shows the harmonic structure of a piano (D4). There are some differences between this structure and Fig. 13(a-2), although Fig. 13(a-2) is the modeled harmonic structure. This may be because it is difficult to model the optimal common harmonic structure in multi-pitch music since the harmonic structure can vary slightly depending on the pitch. In our future work, we will study how best to obtain the optimal common harmonic structure in multi-pitch situations.

Table 2 Comparison of a specmurt-based method with modeling harmonic structure to the proposed method

Instrument   With modeling harmonics   W/o modeling harmonics
Piano        89.2%                     92.7%
Guitar       74.3%                     79.7%
Violin       65.0%                     71.7%

5. Conclusion

In this paper, we proposed a specmurt-based multi-pitch analysis method that does not model the common harmonic structure. Instead of modeling the structure, the optimal harmonic structure is selected among candidates based on sparseness criteria. The experiments show that our method is effective for multi-pitch analysis. The results in Fig. 10 and Fig. 11 indicate that the optimal parameter α varies depending on the instrument and the music. Since multi-pitch analysis in a real environment deals with multiple instruments and pitches without instrument information, in our future work we will study how to determine the optimal parameter. We will also improve the method by adding other criteria to avoid octave errors and to make it applicable to vocal harmony.


References

[1] L. R. Rabiner: On the use of autocorrelation analysis for pitch detection, IEEE Trans. ASSP, Vol. ASSP-25, No. 1, pp. 24-33, 1977.

[2] D. J. Hermes: Measurement of pitch by subharmonic summation, Journal of ASA, Vol. 83, No. 1, pp. 257-264, 1988.

[3] Y. Takasawa: Transcription with Computer, IPSJ, Vol. 29, No. 6, pp. 593-598, 1988.

[4] P. Cuadra, A. Master and C. Sapp: Efficient pitch detection techniques for interactive music, International Computer Music Conference, 2001.

[5] T. Miwa, Y. Tadokoro and T. Saito: The pitch estimation of different musical instruments sounds using comb filters for transcription, IEICE Trans. D-II, Vol. J81-D-II, No. 9, pp. 1965-1974, 1998.

[6] K. Kashino, K. Nakadai, T. Kinoshita and H. Tanaka: Organization of hierarchical perceptual sounds: Music scene analysis with autonomous processing modules and a quantitative information integration mechanism, Proc. IJCAI, Vol. 1, pp. 158-164, 1995.

[7] K. Kashino, T. Kinoshita, K. Nakadai and H. Tanaka: Chord recognition mechanisms in the OPTIMA processing architecture for music scene analysis, IEICE Trans. D-II, Vol. J79-D-II, No. 11, pp. 1762-1770, 1996.

[8] A. Klapuri, T. Virtanen and J. Holm: Robust multipitch estimation for the analysis and manipulation of polyphonic musical signals, Proc. COST-G6 Conference on Digital Audio Effects, pp. 233-236, 2000.

[9] T. Virtanen and A. Klapuri: Separation of harmonic sounds using linear models for the overtone series, Proc. ICASSP2002, Vol. 2, pp. 1757-1760, 2002.

[10] M. Goto: F0 estimation of melody and bass lines in musical audio signals, IEICE Trans. D-II, Vol. J84-D-II, No. 1, pp. 12-22, 2001.

[11] M. Goto: A real-time music scene description system: Predominant-F0 estimation for detecting melody and bass lines in real-world audio signals, ISCA Journal, Vol. 43, No. 4, pp. 311-329, 2004.

[12] K. Miyamoto, H. Kameoka, T. Nishino, N. Ono and S. Sagayama: Harmonic, temporal and timbral unified clustering for multi-instrumental music signal analysis, IPSJ SIG Technical Report, 2005-MUS, Vol. 82, pp. 71-78, 2005.

[13] H. Kameoka, J. Le Roux, N. Ono and S. Sagayama: Harmonic temporal structured clustering: A new approach to CASA, ASJ, Vol. 36, No. 7, pp. 575-580, 2006.

[14] K. Miyamoto, H. Kameoka, T. Nishimoto, N. Ono and S. Sagayama: Harmonic-temporal-timbral clustering (HTTC) for the analysis of multi-instrument polyphonic music signals, Proc. ICASSP2008, pp. 113-116, 2008.

[15] K. Takahashi, T. Nishimoto and S. Sagayama: Multi-pitch analysis using deconvolution of log-frequency spectrum, IPSJ SIG Technical Report, 2003-MUS, Vol. 127, pp. 113-116, 2008.

[16] S. Sagayama, K. Takahashi, H. Kameoka and T. Nishino: Specmurt analysis: A piano-roll-visualization of polyphonic music signal by deconvolution of log-frequency spectrum, Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing (SAPA2004), to appear, 2004.

[17] H. Kameoka, S. Saito, T. Nishino and S. Sagayama: Recursive estimation of quasi-optimal common harmonic structure pattern for specmurt analysis: Piano-roll visualization and MIDI conversion of polyphonic music signal, IPSJ SIG Technical Report, 2004-MUS, Vol. 84, pp. 41-48, 2004.

[18] S. Saito, H. Kameoka, T. Nishimoto and S. Sagayama: Specmurt analysis of multi-pitch music signals with adaptive estimation of common harmonic structure, Proc. International Conference on Music Information Retrieval (ISMIR2005), pp. 84-91, 2005.

[19] S. Saito, H. Kameoka, N. Ono and S. Sagayama: POCS-based common harmonic structure estimation for specmurt analysis, IPSJ SIG Technical Report, 2006-MUS, Vol. 45, pp. 13-18, 2006.

[20] S. Saito, H. Kameoka, N. Ono and S. Sagayama: Iterative multipitch estimation algorithm for MAP specmurt analysis, IPSJ SIG Technical Report, 2006-MUS, Vol. 90, pp. 85-92, 2006.

[21] S. Saito, H. Kameoka, K. Takahashi, T. Nishimoto and S. Sagayama: Specmurt analysis of polyphonic music signals, IEEE Trans. ASLP, Vol. 16, No. 3, pp. 639-650, 2008.

Daiki Nishimura received his B.E. degree in computer science from Kobe University in 2011. His current research interests include acoustic signal processing. He is a member of ASJ.

Toru Nakashika received his B.E. and M.E. degrees in computer science from Kobe University in 2009 and 2011, respectively. In the same year, he continued his research as a doctoral student. From September 2011 to August 2012 he studied at INSA de Lyon in France. He is currently a 2nd-year doctoral student at Kobe University. His research interests are speech and image recognition and statistical signal processing. He is a member of IEEE and ASJ.


Tetsuya Takiguchi received his B.S. degree in applied mathematics from Okayama University of Science, Okayama, Japan, in 1994, and his M.E. and Dr. Eng. degrees in information science from Nara Institute of Science and Technology, Nara, Japan, in 1996 and 1999, respectively. From 1999 to 2004, he was a researcher at IBM Research, Tokyo Research Laboratory, Kanagawa, Japan. He is currently an Associate Professor at Kobe University. His research interests include statistical signal processing and pattern recognition. He received the Awaya Award from the Acoustical Society of Japan in 2002. He is a member of IEEE, IPSJ and ASJ.

Yasuo Ariki received his B.E., M.E. and Ph.D. degrees in information science from Kyoto University in 1974, 1976 and 1979, respectively. He was an Assistant Professor at Kyoto University from 1980 to 1990, and stayed at Edinburgh University as a visiting academic from 1987 to 1990. From 1990 to 1992 he was an Associate Professor and from 1992 to 2003 a Professor at Ryukoku University. Since 2003 he has been a Professor at Kobe University. He is mainly engaged in speech and image recognition and is interested in information retrieval and databases. He is a member of IEEE, IPSJ, JSAI, ITE and IIEEJ.

(Received July 17, 2012; revised January 7, 2013)
