Ph.D. Defense - IRCAM · imtr.ircam.fr/imtr/images/PhD_Defense.pdf


Ph.D. Defense

Computational Methods of Information Geometry with Real-Time Applications in Audio Signal Processing

Arnaud Dessein

December 13th 2012

- Mr. Gérard Assayag, Thesis director
- Mr. Arshia Cont, Advisor
- Mr. Francis Bach, Reviewer
- Mr. Frank Nielsen, Reviewer
- Mr. Roland Badeau, Examiner
- Mr. Silvère Bonnabel, Examiner
- Mr. Jean-Luc Zarader, Examiner

December 13th 2012, Ph.D. Defense




Introduction

From information geometry theory:
- Study of statistics with concepts from:
  - Differential geometry, such as smooth manifolds.
  - Information theory, such as statistical divergences.
- Parametric statistical models possess an intrinsic geometrical structure.

To computational information geometry:
- Broad community around the development and application of computational methods based on information geometry theory.
- Many techniques in machine learning and signal processing rely on statistical models or distance functions: principal component analysis, independent component analysis, centroid computation, k-means, expectation-maximization, nearest neighbor search, range search, smallest enclosing balls, Voronoi diagrams.

Objectives of the thesis:
- Employ this framework for audio signal processing.
- Primary motivations from real-time machine listening.





















Outline

I. Computational Methods of Information Geometry
1. Preliminaries on Information Geometry
2. Sequential Change Detection with Exponential Families
3. Non-Negative Matrix Factorization with Convex-Concave Divergences

II. Real-Time Applications in Audio Signal Processing
4. Real-Time Audio Segmentation
5. Real-Time Polyphonic Music Transcription





Outline
- Separable divergences on the space of discrete positive measures
- Exponential families of probability distributions













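Basic notions and common information divergences

Divergences, or distance functions, generalize metric distances:
- Divergence: $D(y\|y') \geq 0$ and $D(y\|y) = 0$.
- Separable divergence: $D(y\|y') = \sum_{i=1}^{m} d(y_i\|y'_i)$.
- Convex-concave divergence: $d(y\|y') = \check{d}(y\|y') + \hat{d}(y\|y')$, where $\check{d}$ is convex and $\hat{d}$ is concave in the second argument $y'$.
- Squared Euclidean distance on $\mathbb{R}$: $d_E(y\|y') = (y - y')^2$.
- Kullback-Leibler divergence on $\mathbb{R}_+^*$: $d_{KL}(y\|y') = y \log(y/y') - y + y'$.
- Itakura-Saito divergence on $\mathbb{R}_+^*$: $d_{IS}(y\|y') = y/y' - \log(y/y') - 1$.

Separable divergences on $(\mathbb{R}_+^*)^n$ generated by a smooth convex function $\varphi$:
- Csiszár: $d^{(C)}_\varphi(y\|y') = y\,\varphi(y'/y)$, where $\varphi(1) = \varphi'(1) = 0$.
  - $\alpha$-divergences: $d^{(a)}_\alpha(y\|y') = \frac{1}{\alpha(1-\alpha)}\left(\alpha y + (1-\alpha) y' - y^\alpha y'^{1-\alpha}\right)$.
  - Special cases: Kullback-Leibler, dual Kullback-Leibler, Hellinger, Pearson's $\chi^2$, Neyman's $\chi^2$.
- Bregman: $d^{(B)}_\varphi(y\|y') = \varphi(y) - \varphi(y') - (y - y')\varphi'(y')$.
  - $\beta$-divergences: $d^{(b)}_\beta(y\|y') = \frac{1}{\beta(\beta-1)}\left(y^\beta + (\beta - 1) y'^\beta - \beta y y'^{\beta-1}\right)$.
  - Special cases: squared Euclidean (up to a factor 1/2, for $\beta = 2$), Kullback-Leibler, Itakura-Saito.
- Skew Jeffreys-Bregman: $d^{(JB)}_{\varphi,\lambda}(y\|y') = \lambda\, d_\varphi(y\|y') + (1-\lambda)\, d_\varphi(y'\|y)$.
- Skew Jensen-Bregman: $d^{(JB')}_{\varphi,\lambda}(y\|y') = \lambda\, d_\varphi(y\|\lambda y + (1-\lambda) y') + (1-\lambda)\, d_\varphi(y'\|\lambda y + (1-\lambda) y')$.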
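The scalar $\alpha$- and $\beta$-divergences above can be sketched in a few lines of Python, with a separable divergence obtained by summing them coordinate-wise. This is an illustrative sketch, not code from the thesis: the function names are ours, inputs are assumed strictly positive, and the cases $\alpha \in \{0, 1\}$ and $\beta \in \{0, 1\}$, where the general formulas divide by zero, are filled in with the limits they converge to (dual Kullback-Leibler, Kullback-Leibler, Itakura-Saito).

```python
import math


def alpha_div(y, yp, alpha):
    """Scalar alpha-divergence d_alpha(y || y') for y, y' > 0."""
    if alpha == 1:  # limit: Kullback-Leibler divergence
        return y * math.log(y / yp) - y + yp
    if alpha == 0:  # limit: dual Kullback-Leibler divergence
        return yp * math.log(yp / y) - yp + y
    return (alpha * y + (1 - alpha) * yp - y**alpha * yp**(1 - alpha)) / (alpha * (1 - alpha))


def beta_div(y, yp, beta):
    """Scalar beta-divergence d_beta(y || y') for y, y' > 0."""
    if beta == 1:  # limit: Kullback-Leibler divergence
        return y * math.log(y / yp) - y + yp
    if beta == 0:  # limit: Itakura-Saito divergence
        return y / yp - math.log(y / yp) - 1
    return (y**beta + (beta - 1) * yp**beta - beta * y * yp**(beta - 1)) / (beta * (beta - 1))


def separable(d, y, yp, param):
    """Separable divergence D(y || y') = sum_i d(y_i || y'_i)."""
    return sum(d(a, b, param) for a, b in zip(y, yp))


# beta = 2 gives half the squared Euclidean distance: (3 - 1)^2 / 2.
print(beta_div(3.0, 1.0, 2))   # 2.0
# alpha = 2 gives a scaled Pearson chi^2: equals (2 - 3)^2 / (2 * 3) up to rounding.
print(alpha_div(2.0, 3.0, 2))
```

The same `separable` helper accepts either scalar divergence, which mirrors how the separable form decouples the coordinates.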












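Skew (α, β, λ)-divergences

Separable divergences on $(\mathbb{R}_+^*)^n$ (continued):
- $(\alpha, \beta)$-divergence: $d^{(ab)}_{\alpha,\beta}(y\|y') = \frac{1}{\alpha\beta(\alpha+\beta)}\left(\alpha y^{\alpha+\beta} + \beta y'^{\alpha+\beta} - (\alpha+\beta) y^\alpha y'^\beta\right)$.
  - $\alpha$-divergence for $\alpha + \beta = 1$.
  - $(\beta + 1)$-divergence for $\alpha = 1$.
- Skew $(\alpha, \beta, \lambda)$-divergence: $d^{(ab)}_{\alpha,\beta,\lambda}(y\|y') = \lambda\, d^{(ab)}_{\alpha,\beta}(y\|y') + (1-\lambda)\, d^{(ab)}_{\alpha,\beta}(y'\|y)$.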
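A similar sketch for the $(\alpha, \beta)$-divergence and its skew version, assuming $\alpha$, $\beta$ and $\alpha + \beta$ are all nonzero so that the closed form applies (the remaining cases are limits that are not handled here). The last lines compare the $\alpha + \beta = 1$ reduction listed above with the $\alpha$-divergence formula.

```python
def ab_div(y, yp, alpha, beta):
    """Scalar (alpha, beta)-divergence for y, y' > 0 and alpha, beta, alpha + beta != 0."""
    s = alpha + beta
    return (alpha * y**s + beta * yp**s - s * y**alpha * yp**beta) / (alpha * beta * s)


def skew_ab_div(y, yp, alpha, beta, lam):
    """Skew (alpha, beta, lambda)-divergence: weights both argument orders by lam and 1 - lam."""
    return lam * ab_div(y, yp, alpha, beta) + (1 - lam) * ab_div(yp, y, alpha, beta)


# alpha + beta = 1 recovers the alpha-divergence with the same alpha (two nearly equal numbers).
a = 0.3
print(ab_div(2.0, 3.0, a, 1 - a),
      (a * 2.0 + (1 - a) * 3.0 - 2.0**a * 3.0**(1 - a)) / (a * (1 - a)))
```

With `lam = 0.5` the skew version is symmetric in its two arguments, which is one way to obtain a symmetrized divergence from the family.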






Basic notions and properties

Exponential family: $p_\theta(x) = \exp\big(\theta^\top x - \psi(\theta)\big)$.
- The sufficient observations $x$ belong to $\mathbb{R}^m$.
- The natural parameters $\theta$ belong to a convex set $\mathcal{N} \subseteq \mathbb{R}^m$.
- The log-normalizer $\psi$ is convex on $\mathcal{N}$ and smooth on $\operatorname{int}\mathcal{N}$.

Many common models: Bernoulli, Dirichlet, Gaussian, Laplace, Pareto, Poisson, Rayleigh, von Mises-Fisher, Weibull, Wishart, log-normal, exponential, beta, gamma, geometric, binomial, negative binomial, categorical, multinomial.

Legendre-Fenchel conjugate: $\phi(\eta) = \sup_{\theta \in \mathbb{R}^m} \theta^\top \eta - \psi(\theta)$.
- The expectation parameters $\eta$ belong to the convex set $\operatorname{int}\mathcal{K}$.
- Natural and expectation parameters are in duality through $\nabla\psi$ and $\nabla\phi$.

Maximum likelihood: $\hat{\eta}_{\mathrm{ml}}(x_1, \dots, x_n) = \frac{1}{n}\sum_{j=1}^{n} x_j$.
- Simple arithmetic mean in expectation parameters.
- Natural parameters obtained by convex duality.
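To make the duality concrete, the sketch below instantiates these objects for the Poisson family, an example we chose because its formulas are short: $\theta = \log \lambda$, $\psi(\theta) = e^\theta$, $\phi(\eta) = \eta \log \eta - \eta$, with the base measure $1/x!$ absorbed into the dominating measure. Maximum likelihood is the sample mean in $\eta$, mapped back to $\theta$ through $\nabla\phi$; the sketch assumes a strictly positive sample mean.

```python
import math

# Poisson family as p_theta(x) = exp(theta * x - psi(theta)) with respect to the base
# measure nu({x}) = 1 / x! on {0, 1, 2, ...}; here m = 1 and theta = log(rate).


def psi(theta):
    """Log-normalizer, convex and smooth on the natural parameter space R."""
    return math.exp(theta)


def grad_psi(theta):
    """Natural -> expectation parameter: eta = psi'(theta) = rate."""
    return math.exp(theta)


def phi(eta):
    """Legendre-Fenchel conjugate of psi, defined here for eta > 0."""
    return eta * math.log(eta) - eta


def grad_phi(eta):
    """Expectation -> natural parameter: theta = phi'(eta) = log(rate)."""
    return math.log(eta)


def ml_estimate(xs):
    """Maximum likelihood: arithmetic mean in expectation parameters, then convex duality."""
    eta = sum(xs) / len(xs)
    return eta, grad_phi(eta)


eta, theta = ml_estimate([1, 2, 3, 6])
print(eta, theta, grad_psi(theta))   # grad_psi undoes grad_phi, up to rounding
```

At a dual pair the Fenchel-Young inequality is tight, $\psi(\theta) + \phi(\eta) = \theta\eta$, which is a quick consistency check for any other family plugged into the same template.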






































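Dually flat geometry

Canonical divergences:
- Kullback-Leibler divergence: $D_{KL}(P_\theta\|P_{\theta'}) = \int p_\theta \log(p_\theta/p_{\theta'})\, d\nu$.
- Bregman divergences: $B_\varphi(\xi\|\xi') = \varphi(\xi) - \varphi(\xi') - (\xi - \xi')^\top \nabla\varphi(\xi')$.
- Relation: $D_{KL}(P_\theta\|P_{\theta'}) = B_\psi(\theta'\|\theta) = B_\phi\big(\eta(\theta)\|\eta(\theta')\big)$.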
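The relation can be checked numerically on the Poisson family (again our illustrative choice), comparing a truncated direct summation of the Kullback-Leibler divergence with the two Bregman divergences. Truncating the series at 200 terms is an assumption that is harmless for moderate rates.

```python
import math


def bregman(f, grad_f, u, v):
    """Scalar Bregman divergence B_f(u || v) = f(u) - f(v) - (u - v) f'(v)."""
    return f(u) - f(v) - (u - v) * grad_f(v)


def psi(theta):
    """Poisson log-normalizer, theta = log(rate)."""
    return math.exp(theta)


def phi(eta):
    """Its conjugate, eta = rate > 0."""
    return eta * math.log(eta) - eta


def kl_poisson_sum(rate1, rate2, terms=200):
    """KL(Poisson(rate1) || Poisson(rate2)) by direct, truncated summation of p log(p / q)."""
    total = 0.0
    for x in range(terms):
        logp = x * math.log(rate1) - rate1 - math.lgamma(x + 1)
        logq = x * math.log(rate2) - rate2 - math.lgamma(x + 1)
        total += math.exp(logp) * (logp - logq)
    return total


r1, r2 = 2.5, 4.0
kl = kl_poisson_sum(r1, r2)
b_psi = bregman(psi, math.exp, math.log(r2), math.log(r1))   # B_psi(theta' || theta)
b_phi = bregman(phi, math.log, r1, r2)                       # B_phi(eta || eta')
print(kl, b_psi, b_phi)  # three values that agree up to rounding
```

Note the swapped arguments in natural coordinates versus expectation coordinates, exactly as in the relation above.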





Outline
- Context
- Statistical framework
- Methods for exponential families
- Discussion













Background

Principle:
- Decide whether the process presents some structural modifications along time.
- Find the time instants corresponding to the different change points.
- Characterize the properties within the respective segments.

Applications:
- Quality control in industrial production.
- Fault detection in technological processes.
- Automatic surveillance for intrusion and abnormal behavior in security monitoring.
- Signal processing in geophysics, econometrics, audio, medicine, image.

Approaches:
- Statistical modeling and monitoring of the distributions [Basseville & Nikiforov, 1993, Lai, 1995, Poor & Hadjiliadis, 2009, Chen & Gupta, 2012, Polunchenko & Tartakovsky, 2012].
- Machine learning techniques relying on distance functions [Harchaoui & Lévy-Leduc, 2008, Harchaoui & Lévy-Leduc, 2010, Vert & Bleakley, 2010, Desobry et al., 2005, Harchaoui et al., 2009].















Motivations and contributions

Issues of statistical online approaches:
- Either approximations of the exact statistics with unknown parameters for tractability.
- Or restrictions on the data and scenarios [Siegmund & Venkatraman, 1995, Mei, 2006].
- With the exception of a full Bayesian framework for exponential families [Lai & Xing, 2010].

Goals in the non-Bayesian framework:
- Known or unknown parameters.
- Additive or non-additive changes.
- Topology of the parameters and data.
- Exact inference for online schemes.

Contributions in this context:
- Study of the generalized likelihood ratios within the dually flat information geometry.
- Estimation with arbitrary estimators compared to maximum likelihood.
- Alternative expression of the statistics through convex duality.
- Attractive simplification for exact inference with maximum likelihood.
























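Multiple hypotheses

Problem formulation:
- $X_1, \dots, X_n$ are mutually independent random variables from a parametric model $\mathcal{P} = \{P_\xi\}_{\xi \in \Xi}$.
- Observe a sample $\underline{x} = (x_1, \dots, x_n) \in \mathcal{X}^n$.
- Decide whether $X_1, \dots, X_n$ are i.i.d. or not.

Multiple hypotheses:
- $H_0$: $X_1, \dots, X_n \sim P_{\xi_0}$, with $\xi_0 \in \Xi_0$.
- $H_1$: $X_1, \dots, X_i \sim P_{\xi^i_0}$, with $\xi^i_0 \in \Xi^i_0$, and $X_{i+1}, \dots, X_n \sim P_{\xi^i_1}$, with $\xi^i_1 \in \Xi^i_1$, for some $i \in [\![1, n-1]\!]$.
- $H^i_1$: $X_1, \dots, X_i \sim P_{\xi^i_0}$, with $\xi^i_0 \in \Xi^i_0$, and $X_{i+1}, \dots, X_n \sim P_{\xi^i_1}$, with $\xi^i_1 \in \Xi^i_1$.

. . .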












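Test statistics and decision rules

Likelihood ratio for known parameters:
$$\Lambda_i(\underline{x}) = -2 \log \frac{\prod_{j=1}^{n} p_{\xi_{\mathrm{bef}}}(x_j)}{\prod_{j=1}^{i} p_{\xi_{\mathrm{bef}}}(x_j) \prod_{j=i+1}^{n} p_{\xi_{\mathrm{aft}}}(x_j)}.$$
- Simplification as cumulative sum statistics: $\frac{1}{2}\Lambda_i(\underline{x}) = \sum_{j=i+1}^{n} \log \frac{p_{\xi_{\mathrm{aft}}}(x_j)}{p_{\xi_{\mathrm{bef}}}(x_j)}$.
- Efficient recursive implementation for online procedures.

Generalized likelihood ratio:
$$\hat{\Lambda}_i(\underline{x}) = -2 \log \frac{\prod_{j=1}^{n} p_{\hat{\xi}_0(\underline{x})}(x_j)}{\prod_{j=1}^{i} p_{\hat{\xi}^i_0(\underline{x})}(x_j) \prod_{j=i+1}^{n} p_{\hat{\xi}^i_1(\underline{x})}(x_j)}.$$
- Two cumulative sums: $\frac{1}{2}\hat{\Lambda}_i(\underline{x}) = \sum_{j=1}^{i} \log \frac{p_{\hat{\xi}^i_0(\underline{x})}(x_j)}{p_{\hat{\xi}_0(\underline{x})}(x_j)} + \sum_{j=i+1}^{n} \log \frac{p_{\hat{\xi}^i_1(\underline{x})}(x_j)}{p_{\hat{\xi}_0(\underline{x})}(x_j)}$.
- Computationally more demanding, so usually approximated for online procedures.

Non-Bayesian decision rule for a change:
$$\max_{1 \le i \le n-1} \hat{\Lambda}_i(\underline{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \lambda.$$
- Comparison of the maximum statistic to a threshold $\lambda$.
- Change point estimated as the first time point where the maximum is reached.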
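For known parameters, the cumulative sums admit the recursive form usually attributed to Page's CUSUM. The sketch below is a hedged illustration with Poisson observations of known rates before and after the change (our choice of model and data); the recursion maximizes over all split points $0 \le i \le n$, trivial ones included, which is slightly broader than the range $1 \le i \le n-1$ of the decision rule above.

```python
import math


def log_ratio(x, rate_bef, rate_aft):
    """log p_aft(x) / p_bef(x) for Poisson observations (the 1 / x! terms cancel)."""
    return x * math.log(rate_aft / rate_bef) - rate_aft + rate_bef


def cusum(xs, rate_bef, rate_aft, threshold):
    """Recursive CUSUM: g_n = max(0, g_{n-1} + log_ratio(x_n)).

    g_n equals the largest cumulative sum sum_{j=i+1}^{n} log(p_aft / p_bef) over
    0 <= i <= n (the empty sum counts as 0), i.e. half the likelihood ratio statistic
    maximized over split points. Returns the path of g_n and the first n with 2 g_n > threshold."""
    g, path, alarm = 0.0, [], None
    for n, x in enumerate(xs, start=1):
        g = max(0.0, g + log_ratio(x, rate_bef, rate_aft))
        path.append(g)
        if alarm is None and 2 * g > threshold:
            alarm = n
    return path, alarm


path, alarm = cusum([1, 0, 2, 1, 5, 6, 4, 7], rate_bef=1.0, rate_aft=5.0, threshold=10.0)
print(alarm)  # 6
```

Each new observation costs O(1), which is what makes this statistic attractive online; the GLR with unknown parameters does not reduce to such a simple recursion in general.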



















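Generic scheme

Theorem (2.1)

For an exponential family, the generalized likelihood ratio satisfies:
$$\frac{1}{2}\hat{\Lambda}_i(\underline{x}) = i\Big\{D_{KL}\big(P_{\hat{\theta}^{i}_{0,\mathrm{ml}}(\underline{x})}\big\|P_{\hat{\theta}_0(\underline{x})}\big) - D_{KL}\big(P_{\hat{\theta}^{i}_{0,\mathrm{ml}}(\underline{x})}\big\|P_{\hat{\theta}^i_0(\underline{x})}\big)\Big\} + (n-i)\Big\{D_{KL}\big(P_{\hat{\theta}^{i}_{1,\mathrm{ml}}(\underline{x})}\big\|P_{\hat{\theta}_0(\underline{x})}\big) - D_{KL}\big(P_{\hat{\theta}^{i}_{1,\mathrm{ml}}(\underline{x})}\big\|P_{\hat{\theta}^i_1(\underline{x})}\big)\Big\}.$$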
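Theorem (2.1) can be sanity-checked numerically: for an exponential family, the half GLR computed from its definition with arbitrary plug-in estimates should match the Kullback-Leibler expression, where segment means play the role of the maximum likelihood estimates. The sketch below does so for Poisson observations; the model, data, and rates are arbitrary illustrative choices.

```python
import math


def kl_poisson(r1, r2):
    """Closed-form KL(Poisson(r1) || Poisson(r2)) for r1, r2 > 0."""
    return r1 * math.log(r1 / r2) - r1 + r2


def loglik(xs, rate):
    """Poisson log-likelihood of a segment."""
    return sum(x * math.log(rate) - rate - math.lgamma(x + 1) for x in xs)


def half_glr_direct(xs, i, r0, r0i, r1i):
    """Half GLR from its definition, for arbitrary plug-in rates r0, r0i, r1i."""
    return loglik(xs[:i], r0i) + loglik(xs[i:], r1i) - loglik(xs, r0)


def half_glr_theorem(xs, i, r0, r0i, r1i):
    """Same quantity through Theorem (2.1); segment means are the ML estimates."""
    n = len(xs)
    m0i, m1i = sum(xs[:i]) / i, sum(xs[i:]) / (n - i)
    return (i * (kl_poisson(m0i, r0) - kl_poisson(m0i, r0i))
            + (n - i) * (kl_poisson(m1i, r0) - kl_poisson(m1i, r1i)))


xs = [2, 3, 1, 4, 8, 7, 9]
print(half_glr_direct(xs, 4, 4.2, 2.0, 9.0), half_glr_theorem(xs, 4, 4.2, 2.0, 9.0))
```

When every plug-in estimate is the maximum likelihood one, the second divergence in each brace vanishes, which is the simplification used in the examples that follow.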






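Specific cases

Various scenarios:
- Known parameters before and after change.
- Known parameter before change, unknown parameter after change.
- Unknown parameters before and after change.

Example (2.4)

Exact statistics and maximum likelihood:
$$\frac{1}{2}\hat{\Lambda}_i(\underline{x}) = i\, D_{KL}\big(P_{\hat{\theta}^{i}_{0,\mathrm{ml}}(\underline{x})}\big\|P_{\hat{\theta}_{0,\mathrm{ml}}(\underline{x})}\big) + (n-i)\, D_{KL}\big(P_{\hat{\theta}^{i}_{1,\mathrm{ml}}(\underline{x})}\big\|P_{\hat{\theta}_{0,\mathrm{ml}}(\underline{x})}\big).$$

Example (2.3)

Approximate statistics:
$$\frac{1}{2}\hat{\Lambda}_i(\underline{x}) = (n-i)\, D_{KL}\big(P_{\hat{\theta}^{i}_{1,\mathrm{ml}}(\underline{x})}\big\|P_{\hat{\theta}_0(\underline{x})}\big).$$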
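A sketch contrasting Examples (2.4) and (2.3) for multivariate spherical normal distributions with a known, shared variance, where the Kullback-Leibler divergence reduces to a scaled squared Euclidean distance between means. Using the whole-window mean as the fixed estimate in the approximate statistic is one possible choice (assumed here); with it, the exact statistic exceeds the approximate one by $i$ times the divergence of the before-change estimate from the whole-window estimate. The frames are made-up two-dimensional values.

```python
def segment_mean(frames):
    """Componentwise mean of d-dimensional frames (ML estimate of the mean)."""
    return [sum(c) / len(frames) for c in zip(*frames)]


def kl_gauss(m1, m2, var):
    """KL between N(m1, var I) and N(m2, var I) with a shared known variance."""
    return sum((a - b) ** 2 for a, b in zip(m1, m2)) / (2 * var)


def half_glr_exact(frames, i, var):
    """Example (2.4): ML estimates for the whole window and for both sub-windows."""
    n, m = len(frames), segment_mean(frames)
    return (i * kl_gauss(segment_mean(frames[:i]), m, var)
            + (n - i) * kl_gauss(segment_mean(frames[i:]), m, var))


def half_glr_approx(frames, i, var, m_before):
    """Example (2.3): a fixed estimate m_before stands in for the before-change parameter."""
    n = len(frames)
    return (n - i) * kl_gauss(segment_mean(frames[i:]), m_before, var)


frames = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0], [2.0, 1.9], [2.1, 2.2]]
whole = segment_mean(frames)
for i in range(1, len(frames)):
    print(i, half_glr_exact(frames, i, 1.0), half_glr_approx(frames, i, 1.0, whole))
```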












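Revisiting through convex duality

Proposition (2.3)

$$\frac{1}{2}\hat{\Lambda}_i(\underline{x}) = i\,\phi\big(\hat{\eta}^i_0(\underline{x})\big) + (n-i)\,\phi\big(\hat{\eta}^i_1(\underline{x})\big) - n\,\phi\big(\hat{\eta}_0(\underline{x})\big) + \Delta^i_{\mathrm{ml}}(\underline{x}),$$

where the corrective term $\Delta^i_{\mathrm{ml}}$ with respect to maximum likelihood estimation equals:

$$\Delta^i_{\mathrm{ml}}(\underline{x}) = i\,\big(\hat{\eta}^i_{0,\mathrm{ml}}(\underline{x}) - \hat{\eta}^i_0(\underline{x})\big)^{\top}\nabla\phi\big(\hat{\eta}^i_0(\underline{x})\big) + (n-i)\,\big(\hat{\eta}^i_{1,\mathrm{ml}}(\underline{x}) - \hat{\eta}^i_1(\underline{x})\big)^{\top}\nabla\phi\big(\hat{\eta}^i_1(\underline{x})\big) - n\,\big(\hat{\eta}_{0,\mathrm{ml}}(\underline{x}) - \hat{\eta}_0(\underline{x})\big)^{\top}\nabla\phi\big(\hat{\eta}_0(\underline{x})\big).$$

Example (2.5)

The exact generalized likelihood ratio for unknown parameters and maximum likelihood estimation satisfies:

$$\frac{1}{2}\hat{\Lambda}_i(\underline{x}) = i\,\phi\big(\hat{\eta}^i_{0,\mathrm{ml}}(\underline{x})\big) + (n-i)\,\phi\big(\hat{\eta}^i_{1,\mathrm{ml}}(\underline{x})\big) - n\,\phi\big(\hat{\eta}_{0,\mathrm{ml}}(\underline{x})\big).$$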
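Example (2.5) makes the exact statistic cheap to evaluate: only running sums of the sufficient statistics are needed. The following sketch scans all splits for Poisson observations (our choice of family, with the convention $0 \log 0 = 0$) and applies the non-Bayesian decision rule with a hypothetical threshold; the data are made up.

```python
import math


def phi(eta):
    """Poisson conjugate phi(eta) = eta log eta - eta, with 0 log 0 = 0."""
    return eta * math.log(eta) - eta if eta > 0 else 0.0


def exact_glr_scan(xs, threshold):
    """Exact half GLR of Example (2.5) for every split 1 <= i <= n - 1.

    Only prefix sums of the sufficient statistic are needed, so each split costs O(1).
    Returns the statistics and the detected change point (or None)."""
    n = len(xs)
    prefix = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
    total = prefix[n]
    stats = []
    for i in range(1, n):
        eta_before = prefix[i] / i                  # ML mean of x_1..x_i
        eta_after = (total - prefix[i]) / (n - i)   # ML mean of x_{i+1}..x_n
        stats.append(i * phi(eta_before) + (n - i) * phi(eta_after) - n * phi(total / n))
    best = max(stats)
    # Decision rule: 2 * max statistic against the threshold; first maximizer as change point.
    change = stats.index(best) + 1 if 2 * best > threshold else None
    return stats, change


stats, change = exact_glr_scan([1, 2, 1, 0, 2, 8, 9, 7, 10], threshold=10.0)
print(change)  # 5
```

Because the statistic is a difference of values of the convex function $\phi$ at segment means, it is non-negative by Jensen's inequality, which the test below checks.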












Discussion

Summary:
- Standard non-Bayesian approach to sequential change detection.
- Dually flat information geometry of exponential families.
- Generalized likelihood ratios with arbitrary estimators.
- Attractive scheme for exact inference with unknown parameters.

Perspectives:
- Direct extensions:
  - Non-steep or curved exponential families.
  - Maximum a posteriori estimators.
- Asymptotic properties:
  - Distribution of the test statistics.
  - Optimality formulation and analysis.
- Statistical dependence:
  - Autoregressive models.
  - Non-linear systems and particle filtering.
- Alternative test statistics:
  - Reversing the problem and starting from geometric considerations.
  - Information divergences and more robust estimators.











Outline
- Context
- Proposed approach
- Experimental results
- Discussion













Background

Principle:
- Determine time boundaries that partition a sound into homogeneous and continuous temporal segments, such that adjacent segments exhibit inhomogeneities.
- Define a criterion to quantify the homogeneity of the segments.

Approaches:
- Supervised: high-level classes and automatic classification.
- Unsupervised: statistical and distance-based approaches:
  - Musical onset detection [Bello et al., 2005, Dixon, 2006].
  - Speaker segmentation [Kemp et al., 2000, Kotti et al., 2008].













Motivations and contributions

Issues of unsupervised approaches to audio segmentation:
- Often tailored to particular types of signal and homogeneity criterion.
- Specific distance functions or models.
- Some are offline.
- Others approximate the exact statistics.

Goals towards a unifying framework for audio segmentation:
- Arbitrary types of signals and homogeneity criteria.
- Large choice of distance functions or models.
- Real-time constraints.
- Exact online inference.

Contributions in this context:
- Generic framework for real-time audio segmentation.
- Unification of several standard approaches.
- Online change detection with exponential families.
- Exact generalized likelihood ratios and maximum likelihood.
























System architecture

Segmentation scheme:
1. Represent frames with a short-time sound description.
2. Model the observations with probability distributions.
3. Sequentially detect changes in the distribution parameters.

Short-time sound representation:
- Energy for information on loudness.
- Fourier transform for information on spectral content.
- Mel-frequency cepstral coefficients for information on timbre.
- Many other possibilities.

Statistical modeling:
- Exponential families and generalized likelihood ratios.
- Unknown parameters and maximum likelihood.

[Figure: Audio segmentation (online): auditory scene → short-time sound representation → statistical modeling → change detection.]
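The three stages can be sketched as a small online loop. This is a minimal sketch under several assumptions that are ours, not necessarily those of the actual system: observations are positive frame energies modeled by Rayleigh distributions (sufficient statistic $x^2$, expectation parameter $\eta = E[x^2]$, and $\phi(\eta) = \log 2 - \log \eta - 1$, so the constants cancel in the exact GLR of Example (2.5)); the analysis window grows until a change is detected and then restarts at the estimated change point; `threshold` and `min_len` are hypothetical tuning parameters. Each new frame triggers a scan of all admissible splits of the current window.

```python
import math


def rayleigh_half_glr(sq_prefix, start, i, n):
    """Exact half GLR for Rayleigh frames in the window [start, n), split after start + i.

    With eta = E[x^2] and phi(eta) = log 2 - log(eta) - 1, Example (2.5) reduces to
    w log(mean) - i log(mean_before) - (w - i) log(mean_after), where w = n - start."""
    w = n - start
    total = sq_prefix[n] - sq_prefix[start]
    before = sq_prefix[start + i] - sq_prefix[start]
    after = total - before
    return (w * math.log(total / w) - i * math.log(before / i)
            - (w - i) * math.log(after / (w - i)))


def segment_online(frames, threshold, min_len=2):
    """Growing-window online segmentation; returns the indices of detected change points."""
    sq_prefix = [0.0]          # running sums of the sufficient statistic x^2
    start, changes = 0, []
    for n, x in enumerate(frames, start=1):
        sq_prefix.append(sq_prefix[-1] + x * x)
        w = n - start
        if w < 2 * min_len:    # not enough frames on both sides of any split yet
            continue
        stats = [rayleigh_half_glr(sq_prefix, start, i, n)
                 for i in range(min_len, w - min_len + 1)]
        best = max(stats)
        if 2 * best > threshold:                          # Lambda = 2 * half statistic
            change = start + min_len + stats.index(best)  # first maximizing split
            changes.append(change)
            start = change                                # restart the window there
    return changes


# Quiet frames followed by louder frames (made-up energies).
frames = [0.1, 0.12, 0.09, 0.11, 0.1, 1.0, 1.2, 0.9, 1.1, 1.05, 0.95]
changes = segment_online(frames, threshold=20.0)
print(changes)  # [5]
```

A growing window keeps the statistics exact but makes each step linear in the window length; capping the window size is a common compromise when latency matters.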














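Change detection

Clarification of the relations between statistical approaches:
- Likelihood statistics: $-2\log\big(p(\underline{x}|H_0)/p(\underline{x}|H^i_1)\big) > \lambda$.
  - Exact GLR: $\hat{\eta}_{0,\mathrm{ml}}(\underline{x}) = \frac{1}{n}\sum_{j=1}^{n} x_j$, $\hat{\eta}^i_{0,\mathrm{ml}}(\underline{x}) = \frac{1}{i}\sum_{j=1}^{i} x_j$, $\hat{\eta}^i_{1,\mathrm{ml}}(\underline{x}) = \frac{1}{n-i}\sum_{j=i+1}^{n} x_j$.
  - Approximate GLR on the whole window: $\hat{\eta}^i_{0,\mathrm{ml}}(\underline{x}) \approx \hat{\eta}_{0,\mathrm{ml}}(\underline{x}) = \frac{1}{n}\sum_{j=1}^{n} x_j$.
  - Approximate GLR in a dead region: $\hat{\eta}^i_{0,\mathrm{ml}}(\underline{x}) \approx \hat{\eta}_{0,\mathrm{ml}}(\underline{x}) \approx \frac{1}{n_0}\sum_{j=1}^{n_0} x_j$.
- Model selection: $-2\log\big(p(\underline{x}|H_0)/p(\underline{x}|H^i_1)\big) > \lambda$.
  - AIC: $\lambda = 2d$.
  - BIC: $\lambda = d \log n$.
  - Penalized BIC: $\lambda = \gamma\, d \log n$.

Links with distance-based approaches:
$$\frac{1}{2}\hat{\Lambda}_i(\underline{x}) = i\, D_{KL}\big(P_{\hat{\theta}^{i}_{0,\mathrm{ml}}(\underline{x})}\big\|P_{\hat{\theta}_{0,\mathrm{ml}}(\underline{x})}\big) + (n-i)\, D_{KL}\big(P_{\hat{\theta}^{i}_{1,\mathrm{ml}}(\underline{x})}\big\|P_{\hat{\theta}_{0,\mathrm{ml}}(\underline{x})}\big).$$
- Heuristics:
  - Threshold on the observations.
  - Distance between the observations at successive frames.
- Kernel methods:
  - Equivalence between one-class support vector machines for novelty detection and approximate GLR statistics [Canu & Smola, 2006].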
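The information criteria above only change the threshold applied to the likelihood statistic. A minimal sketch, assuming $d$ denotes the number of free parameters of the model (the slide does not define it) and reusing the non-Bayesian rule on precomputed half statistics; the function names and criterion strings are hypothetical.

```python
import math


def info_threshold(criterion, d, n, gamma=1.0):
    """Threshold lambda on -2 log(p(x | H0) / p(x | H1^i)) given by an information criterion."""
    if criterion == "aic":
        return 2.0 * d
    if criterion == "bic":
        return d * math.log(n)
    if criterion == "penalized_bic":
        return gamma * d * math.log(n)
    raise ValueError(f"unknown criterion: {criterion}")


def decide(half_stats, lam):
    """First split i (1-based) maximizing the statistic if 2 * max > lam, else None."""
    best = max(half_stats)
    return half_stats.index(best) + 1 if 2 * best > lam else None


# 12-dimensional frames in a window of 100 frames (illustrative numbers).
print(info_threshold("aic", 12, 100), info_threshold("bic", 12, 100))
```

Unlike AIC, the BIC thresholds grow with the window length $n$, so longer windows need stronger evidence before a change is declared.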












Segmentation into silence and activity

Parameters:
- Short-time sound representation: energy in a Mel-frequency filter bank at 11025 Hz.
- Statistical model: Rayleigh distributions.
- Topology: 1 dimension, continuous non-negative values.

[Figure: original audio, short-time sound representation, and reference annotation over about 4.5 seconds, with alternating silence and activity segments.]






Segmentation into music and speech

Parameters:
- Short-time sound representation: Mel-frequency cepstral coefficients at 11025 Hz.
- Parametric statistical model: multivariate spherical normal distributions with fixed variance.
- Topology: 12 dimensions, continuous real values.

[Figure: original audio over about 50 seconds, segmented into Music 1, Music 2, Music 3, Speech 1, and Speech 2.]






Segmentation into different speakers

Parameters:
- Short-time sound representation: Mel-frequency cepstral coefficients at 11025 Hz.
- Parametric statistical model: multivariate spherical normal distributions with fixed variance.
- Topology: 12 dimensions, continuous real values.

[Figure: original audio over about 15 seconds and estimated Mel-frequency cepstral coefficients, segmented into Speaker 1 to Speaker 5.]






Segmentation into polyphonic note slices

Parameters:
- Short-time sound representation: normalized magnitude spectrum at 11025 Hz.
- Parametric statistical model: categorical distributions.
- Topology: 257 dimensions, discrete frequency histograms.

[Figure: original audio over about 35 seconds and reference piano roll (pitch from F2 to A5 versus time in seconds).]
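For categorical models, $\phi$ is the negative Shannon entropy of the expectation parameters, so the exact statistic of Example (2.5) becomes an entropy difference between the pooled window and its two parts. The sketch assumes each normalized magnitude spectrum is used directly as an observed histogram, pooled by frame averaging; the tiny four-bin spectra are made up for illustration and stand in for the 257-bin spectra of the experiment.

```python
import math


def entropy(p):
    """Shannon entropy with the convention 0 log 0 = 0."""
    return -sum(v * math.log(v) for v in p if v > 0)


def normalize(spectrum):
    """Turn a magnitude spectrum into a frequency histogram summing to one."""
    s = sum(spectrum)
    return [v / s for v in spectrum]


def pooled(frames):
    """Frame average: the ML expectation parameter of the segment."""
    return [sum(c) / len(frames) for c in zip(*frames)]


def categorical_half_glr(frames, i):
    """Example (2.5) with phi = negative entropy: n H(all) - i H(before) - (n - i) H(after)."""
    n = len(frames)
    return (n * entropy(pooled(frames)) - i * entropy(pooled(frames[:i]))
            - (n - i) * entropy(pooled(frames[i:])))


frames = [normalize(s) for s in ([4, 1, 0, 1], [5, 1, 1, 0], [0, 1, 6, 2], [1, 0, 5, 3])]
stats = [categorical_half_glr(frames, i) for i in range(1, len(frames))]
print(stats)  # largest at the split i = 2, between the two pairs of similar spectra
```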






Evaluation on musical onset detection


Evaluation of generalized likelihood ratios (GLR) and spectral flux (SF):
- Difficult dataset [Leveau et al., 2004].
- Standard methodology.

Algorithm   Threshold   P (%)   R (%)   F (%)   Distance function
GLR         5.00        60.93   68.55   64.52   Kullback-Leibler
SF          0.06        22.56   33.87   27.08   Euclidean
SF          0.10        34.42   41.26   37.53   Kullback-Leibler
SF          0.17        40.20   42.74   41.43   Half-wave rectified difference

(P: precision, R: recall, F: F-measure.)

Comparison to the state of the art:
- LFSF: offline spectral flux with a logarithmic frequency scale and filtering [Böck et al., 2012].
- TSPC: online spectral peak classification into transients and non-transients [Röbel, 2011].

Algorithm   P (%)   R (%)   F (%)
GLR         60.93   68.55   64.52
LFSF        79.60   86.02   82.69
TSPC        76.46   84.27   80.17
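Under the usual convention for onset detection evaluation, the F column is the harmonic mean of precision and recall; the GLR and SF rows above can be reproduced from their P and R values with a one-line function (the name is ours).

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(60.93, 68.55), 2))  # 64.52, the GLR row
```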


















Discussion

Summary:
- Real-time system for audio segmentation.
- Various types of signals and of homogeneity criteria.
- Sequential change detection with exponential families.
- Unification and generalization of several statistical and distance-based approaches.

Perspectives:
- Dependent observations:
  - Autoregressive models.
  - Non-linear systems and particle filtering.
- Improved robustness:
  - Post-processing by smoothing, adaptation.
  - Growing and sliding window heuristics.
- Consideration of prior information:
  - Maximum a posteriori.
  - Full Bayesian framework.
- Further applications:
  - Audio processing and music information retrieval.
  - Other domains in signal processing.











Outline
- Context
- Optimization framework
- Methods for convex-concave divergences
- Discussion













Background

Principle:
- Decompose the observed non-negative data as a linear combination of basis elements.
- Constrain the dictionary and encoding to be non-negative.

Applications:
- Bioinformatics, spectroscopy, email surveillance.
- Signal processing in text, image, audio.

Approaches:
- Euclidean cost function and alternating non-negative least squares [Paatero & Tapper, 1994, Paatero, 1997].
- Euclidean or Kullback-Leibler cost functions and multiplicative updates [Lee & Seung, 1999, Lee & Seung, 2001].
- Many extensions [Berry et al., 2007, Cichocki et al., 2009].
- Statistical inference [Févotte & Cemgil, 2009, Cemgil, 2009].
















Motivations and contributions

Issues of approaches to optimizing factored models:
- Often particular cost functions or statistical models.
- Potential problems in convergence.
- With the exception of updates for classes of divergences [Cichocki et al., 2006, Cichocki et al., 2008, Dhillon & Sra, 2006, Dhillon & Sra, 2006, Kompass, 2007, Nakano et al., 2010, Févotte & Idier, 2011, Cichocki et al., 2011].

Goals for providing generic methods:
- Common information divergences.
- Multiplicative updates.
- Convergence guarantees.

Contributions in this context:
- Generic auxiliary functions for arbitrary convex-concave divergences.
- Majorization-minimization scheme with these auxiliary functions.
- Known and novel updates for common information divergences.
- Multiplicative updates for α-divergences, β-divergences, (α, β)-divergences.
- Extension to skew divergences.
























Cost function minimization problem

Problem formulation:
- $y \in \mathbb{R}^m_+$ is an observation vector.
- $A \in \mathbb{R}^{m \times r}_+$ is a dictionary matrix.
- Find an encoding vector $x \in \mathbb{R}^r_+$ of $y$ into $A$ such that $y \approx Ax$.

Optimization formulation:
- Separable divergence: $D(y\|y') = \sum_{i=1}^{m} d(y_i\|y'_i)$.
- Cost function: $C(x) = D(y\|Ax)$.
- Problem: minimize $C(x)$ subject to $x \in \mathcal{X}$.












Variational bounding and auxiliary functions

Variational bounding:
- Iterative technique for minimization problems.
- Cost function replaced at each step with a surrogate majorizing function.

Auxiliary function: $G(\bar{x}|\bar{x}) = C(\bar{x})$ and $G(x|\bar{x}) \ge C(x)$.
- Iterative scheme: if $G(x|\bar{x}) \le G(\bar{x}|\bar{x})$, then $C(x) \le C(\bar{x})$.
- Majorization-minimization scheme: minimize $G(x|\bar{x})$ subject to $x \in \mathcal{X}$.

Figure 3.2: Auxiliary function for the cost function. The auxiliary function defines a majorizing function above the current solution, which can be used as a surrogate for optimizing the cost.

Lemma 3.1. Let $x, \bar{x} \in \mathcal{X}$. If $G(x|\bar{x}) \le G(\bar{x}|\bar{x})$, then $C(x) \le C(\bar{x})$.

Proof. Let $x, \bar{x} \in \mathcal{X}$. By definition, we have $C(x) \le G(x|\bar{x})$ and $C(\bar{x}) = G(\bar{x}|\bar{x})$. Now if $G(x|\bar{x}) \le G(\bar{x}|\bar{x})$, then $C(x) \le G(x|\bar{x}) \le G(\bar{x}|\bar{x}) = C(\bar{x})$, which proves the lemma.

Remark 3.8. The cost function also decreases strictly as soon as we choose a vector $x$ that makes the auxiliary function decrease strictly, that is, $G(x|\bar{x}) < G(\bar{x}|\bar{x})$.

Remark 3.9. This justifies the use of an auxiliary function to minimize, or at least decrease, the original cost function. Indeed, if the current solution is given by $\bar{x} \in \mathcal{X}$, then choosing a point $x \in \mathcal{X}$ such that $G(x|\bar{x}) \le G(\bar{x}|\bar{x})$ provides a solution that is at least as good. This may be iterated until a termination criterion is met. In general, when it is possible, the point $x$ is chosen as a minimizer of the auxiliary function at $\bar{x}$, so that we need to solve the following optimization problem:

$$\text{minimize } G(x|\bar{x}) \text{ subject to } x \in \mathcal{X}. \qquad (3.5)$$

The minimization can be done on an arbitrary subset $\mathcal{X}' \subseteq \mathcal{X}$ as soon as $\bar{x} \in \mathcal{X}'$. It is also possible to equalize the auxiliary function instead, or to choose any point in between the minimization and the equalization.

We show in the sequel that we can build auxiliary functions for a wide range of common information divergences presented in Chapter 1. We can therefore optimize the respective cost functions by variational bounding. We will focus on majorization-minimization schemes where the auxiliary function is iteratively minimized to update the solution as discussed above.
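The majorization-minimization loop itself is generic. The sketch below runs it on a toy one-dimensional problem that is not from the thesis: $C(x) = \log(1 + x^2)$ has $C'' \le 2$, so the quadratic $G(x|\bar{x}) = C(\bar{x}) + C'(\bar{x})(x - \bar{x}) + (x - \bar{x})^2$ is an auxiliary function in the sense above, and its minimizer gives a closed-form update.

```python
import math


def mm_minimize(cost, argmin_aux, x0, iters=50):
    """Majorization-minimization: repeatedly set x <- argmin_x G(x | x_bar).

    Returns the iterates and their costs; by Lemma 3.1 the costs never increase
    when G is a valid auxiliary function."""
    xs, costs = [x0], [cost(x0)]
    for _ in range(iters):
        xs.append(argmin_aux(xs[-1]))
        costs.append(cost(xs[-1]))
    return xs, costs


def cost(x):
    """Toy cost C(x) = log(1 + x^2), whose second derivative is at most 2."""
    return math.log1p(x * x)


def argmin_aux(x_bar):
    """Minimizer of G(x | x_bar) = C(x_bar) + C'(x_bar)(x - x_bar) + (x - x_bar)^2."""
    grad = 2 * x_bar / (1 + x_bar * x_bar)
    return x_bar - grad / 2


xs, costs = mm_minimize(cost, argmin_aux, 3.0)
print(costs[0], costs[-1])
```

Only `argmin_aux` depends on the problem; the auxiliary functions for convex-concave divergences plug into the same loop.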


Abstract

This thesis proposes novel computational methods of information geometry with real-time applications in audio signal processing. In this context, we address in parallel the applicative problems of real-time audio segmentation, and of real-time polyphonic music transcription. This is achieved by developing theoretical frameworks respectively for sequential change detection with exponential families, and for non-negative matrix factorization with convex-concave divergences. On the one hand, sequential change detection is studied in the light of the dually flat information geometry of exponential families. We notably develop a generic and unifying statistical framework relying on multiple hypothesis testing with decision rules based on exact generalized likelihood ratios. This is applied to devise a modular system for real-time audio segmentation with arbitrary types of signals and of homogeneity criteria. The proposed system controls the information rate of the audio stream as it unfolds in time to detect changes. On the other hand, non-negative matrix factorization is investigated by way of convex-concave divergences on the space of discrete positive measures. In particular, we formulate a generic and unifying optimization framework for non-negative matrix factorization based on variational bounding with auxiliary functions. This is employed to design a real-time system for polyphonic music transcription with an explicit control on the frequency compromise during the analysis. The developed system decomposes the music signal as it arrives in time onto a dictionary of note spectral templates. These contributions provide interesting insights and directions for future research in the realm of audio signal processing, and more generally of machine learning and signal processing, in the relatively young but nonetheless prolific field of computational information geometry.

Keywords: computational methods, information geometry, real-time applications, audio signal processing, change detection, exponential families, non-negative matrix factorization, convex-concave divergences, audio segmentation, polyphonic music transcription.













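Generic updates

Proposition (3.3)

For a convex-concave divergence, we have the following auxiliary function:
$$G(x|\bar{x}) = \sum_{i=1}^{m} \Bigg\{ \hat{d}\Big(y_i \Big\| \sum_{l=1}^{r} a_{il}\bar{x}_l\Big) + \sum_{k=1}^{r} \frac{a_{ik}\bar{x}_k}{\sum_{l=1}^{r} a_{il}\bar{x}_l}\, \check{d}\Big(y_i \Big\| \sum_{l=1}^{r} a_{il}\bar{x}_l \frac{x_k}{\bar{x}_k}\Big) + \sum_{k=1}^{r} a_{ik}(x_k - \bar{x}_k)\, \frac{\partial \hat{d}}{\partial y'}\Big(y_i \Big\| \sum_{l=1}^{r} a_{il}\bar{x}_l\Big) \Bigg\}.$$

Theorem (3.4)

For a convex-concave divergence, the cost function decreases monotonically by iteratively solving:
$$\sum_{i=1}^{m} a_{ik}\, \frac{\partial \check{d}}{\partial y'}\Big(y_i \Big\| \sum_{l=1}^{r} a_{il}\bar{x}_l \frac{x_k}{\bar{x}_k}\Big) = -\sum_{i=1}^{m} a_{ik}\, \frac{\partial \hat{d}}{\partial y'}\Big(y_i \Big\| \sum_{l=1}^{r} a_{il}\bar{x}_l\Big).$$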
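Theorem (3.4) can also be applied numerically when no closed form is at hand. This sketch is our own construction rather than the thesis implementation: it solves the equation for each $k$ by bisection on the ratio $x_k/\bar{x}_k$ (the left-hand side is non-decreasing in that ratio because the convex part has a non-decreasing derivative in $y'$). It is shown for the Itakura-Saito divergence, split into a convex part $y/y'$ (plus constants) and a concave part $\log y'$, and compared with the analytical solution of the same equation; data are assumed strictly positive.

```python
import math


def mm_step(y, A, x_bar, dcheck, dhat, iters=100):
    """One update of Theorem (3.4), solved numerically for each k.

    dcheck(y, y') and dhat(y, y') are the derivatives in y' of the convex part and the
    concave part of d. With z = A x_bar and r = x_k / x_bar_k, we solve
        sum_i a_ik dcheck(y_i, z_i r) = - sum_i a_ik dhat(y_i, z_i)
    by bisection on log r."""
    m, rank = len(A), len(A[0])
    z = [sum(A[i][l] * x_bar[l] for l in range(rank)) for i in range(m)]
    x_new = []
    for k in range(rank):
        rhs = -sum(A[i][k] * dhat(y[i], z[i]) for i in range(m))

        def f(r):
            return sum(A[i][k] * dcheck(y[i], z[i] * r) for i in range(m)) - rhs

        lo, hi = 1.0, 1.0
        while f(lo) > 0 and lo > 1e-300:   # shrink until the root is bracketed below
            lo /= 2
        while f(hi) < 0 and hi < 1e300:    # grow until the root is bracketed above
            hi *= 2
        for _ in range(iters):
            mid = math.sqrt(lo * hi)
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        x_new.append(x_bar[k] * math.sqrt(lo * hi))
    return x_new


def is_dcheck(y, yp):
    """Derivative in y' of the convex part y / y' of the Itakura-Saito divergence."""
    return -y / yp**2


def is_dhat(y, yp):
    """Derivative in y' of the concave part log y'."""
    return 1.0 / yp


def is_closed_form(y, A, x_bar):
    """Analytical solution for Itakura-Saito: x_k = x_bar_k sqrt(sum a y z^-2 / sum a z^-1)."""
    z = [sum(a * b for a, b in zip(row, x_bar)) for row in A]
    return [xk * math.sqrt(sum(row[k] * yi / zi**2 for row, yi, zi in zip(A, y, z))
                           / sum(row[k] / zi for row, zi in zip(A, z)))
            for k, xk in enumerate(x_bar)]


def is_cost(y, A, x):
    """C(x) = sum_i d_IS(y_i || (A x)_i)."""
    total = 0.0
    for yi, row in zip(y, A):
        v = sum(a * b for a, b in zip(row, x))
        total += yi / v - math.log(yi / v) - 1
    return total


y = [1.0, 2.0, 3.0]
A = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]]
x0 = [1.0, 1.0]
x1 = mm_step(y, A, x0, is_dcheck, is_dhat)
print(x1, is_closed_form(y, A, x0))
print(is_cost(y, A, x0), is_cost(y, A, x1))
```

The square root in the closed form is the exponent $1/2$ that multiplicative updates for the Itakura-Saito divergence are commonly reported with.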












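Specific cases

Various information divergences:
- Csiszár $\varphi$-divergences.
- $\alpha$-divergences.
- Left-sided Bregman $\varphi$-divergences.
- Skew Jensen-Bregman $(\varphi, \lambda)$-divergences.
- Skew Jeffreys $(\beta, \lambda)$-divergences.
- Skew Jensen $\beta$-divergences.
- $(\alpha, \beta)$-divergences.
- Skew $(\alpha, \beta, \lambda)$-divergences.

Example (3.8)

Multiplicative updates for $(\alpha, \beta)$-divergences:
$$x_k = \bar{x}_k \times \left( \frac{\sum_{i=1}^{m} a_{ik}\, y_i^{\alpha} \big(\sum_{l=1}^{r} a_{il}\bar{x}_l\big)^{\beta-1}}{\sum_{i=1}^{m} a_{ik} \big(\sum_{l=1}^{r} a_{il}\bar{x}_l\big)^{\alpha+\beta-1}} \right)^{p_{\alpha,\beta}}.$$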
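The multiplicative update of Example (3.8) can be sketched as follows. The slide does not give $p_{\alpha,\beta}$; the piecewise exponent used here is our own derivation from Theorem (3.4) for $\alpha > 0$, obtained by checking which of the two terms of $d^{(ab)}_{\alpha,\beta}$ that depend on $y'$ is convex or concave: $1/(1-\beta)$ when $\beta < 1 - \alpha$, $1/\alpha$ when $1 - \alpha \le \beta \le 1$, and $1/(\alpha + \beta - 1)$ when $\beta > 1$. Treat it as an assumption to be checked against the thesis; the data and $(\alpha, \beta)$ pairs are arbitrary.

```python
def ab_cost(y, A, x, alpha, beta):
    """C(x) = sum_i d_ab(y_i || (A x)_i), for alpha, beta, alpha + beta nonzero."""
    s = alpha + beta
    total = 0.0
    for yi, row in zip(y, A):
        v = sum(a * b for a, b in zip(row, x))
        total += (alpha * yi**s + beta * v**s - s * yi**alpha * v**beta) / (alpha * beta * s)
    return total


def exponent(alpha, beta):
    """Assumed exponent p_{alpha,beta} for alpha > 0 (our derivation, see the lead-in)."""
    if alpha <= 0:
        raise ValueError("this sketch only covers alpha > 0")
    if beta < 1 - alpha:
        return 1.0 / (1 - beta)
    if beta <= 1:
        return 1.0 / alpha
    return 1.0 / (alpha + beta - 1)


def ab_update(y, A, x_bar, alpha, beta):
    """One multiplicative update of Example (3.8)."""
    z = [sum(a * b for a, b in zip(row, x_bar)) for row in A]
    p = exponent(alpha, beta)
    x_new = []
    for k, xk in enumerate(x_bar):
        num = sum(row[k] * yi**alpha * zi**(beta - 1) for row, yi, zi in zip(A, y, z))
        den = sum(row[k] * zi**(alpha + beta - 1) for row, zi in zip(A, z))
        x_new.append(xk * (num / den) ** p)
    return x_new


def run_mm(y, A, x0, alpha, beta, iters=30):
    """Iterate the update and record the cost after each step."""
    x, costs = list(x0), [ab_cost(y, A, x0, alpha, beta)]
    for _ in range(iters):
        x = ab_update(y, A, x, alpha, beta)
        costs.append(ab_cost(y, A, x, alpha, beta))
    return costs


y = [1.0, 2.0, 3.0, 0.5]
A = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.3], [0.1, 0.4]]
for alpha, beta in [(1.0, 0.5), (0.5, 2.0), (2.0, -2.5), (0.5, 0.5)]:
    costs = run_mm(y, A, [1.0, 1.0], alpha, beta)
    print(alpha, beta, costs[0], costs[-1])
```

The three branches agree at their boundaries ($1/\alpha$ at both $\beta = 1 - \alpha$ and $\beta = 1$), and for $\alpha = 1$ they reduce to the exponents usually reported for $\beta$-divergences.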












Discussion

Summary:
- Non-negative matrix factorization with convex-concave divergences.
- Many common information divergences.
- Variational bounding with auxiliary functions.
- Symmetrized and skew divergences.

Perspectives:
- Enhanced factorization model:
  - Convex models, non-negative tensor models.
  - Convolutive models.
- Extended cost functions:
  - Penalty terms for regularization.
  - $\ell_p$-norms for sparsity regularization.
- Other generic updates:
  - Majorization-equalization and tempered updates.
  - Further assumptions on convexity properties.
- Strong convergence properties:
  - Convergence of the cost function to a minimum.
  - Convergence of the updates.






Background

Principle:
- Convert a raw music signal into a symbolic representation such as a score.
- From a low-level audio waveform to high-level symbolic information.

Approaches:
- Many methods for multiple pitch estimation [de Cheveigné, 2006, Klapuri & Davy, 2006].
- Non-negative matrix factorization [Smaragdis & Brown, 2003, Abdallah & Plumbley, 2004, Virtanen & Klapuri, 2006, Raczyński et al., 2007, Niedermayer, 2008, Marolt, 2009, Grindlay & Ellis, 2009, Vincent et al., 2010].
- Probabilistic latent component analysis [Smaragdis et al., 2008, Mysore & Smaragdis, 2009, Grindlay & Ellis, 2011, Hennequin et al., 2011, Fuentes et al., 2011, Benetos & Dixon, 2011].






Issues of approaches based on non-negative matrix factorization:
- Inherently suited to offline processing.
  - Need for structuring the dictionary.
  - With the exception of systems for audio decomposition [Sha & Saul, 2005, Cheng et al., 2008, Cont, 2006, Cont et al., 2007].
- Specific cost functions.
  - No suitable controls on the decomposition.
  - Potential convergence issues.

Contributions in this context:
- Parametric family of (α, β)-divergences for decomposition.
- Insights into their relevance for audio analysis.
- Flexible control on the frequency compromise during the decomposition.
- Multiplicative updates tailored to real time.
- Monotonic decrease of the cost function.






System architecture

Transcription scheme:
1. Learn a dictionary of note templates offline.
2. Decompose the music signal online onto the dictionary.
3. Output the activations over time.

Note template learning:
- Characteristic and discriminative templates.
- Short-term magnitude or power spectrum.
- Non-negative matrix factorization.

[Figure: learned note templates, one per note from A0 to A7 (horizontal axis), over frequency in Hz from 0 to 6275 (vertical axis), color scale from −11 to 0. Block diagram: note template learning (offline), from isolated note samples through a short-term sound representation and non-negative matrix factorization to note templates; music signal decomposition (online), from the auditory scene through a short-time sound representation and non-negative decomposition to note activations.]
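The online half of this scheme can be sketched as a per-frame loop: each incoming spectrum is decomposed onto the fixed note templates, and notes whose activation exceeds a threshold are reported. This is a simplified illustration under stated assumptions. Templates are the columns of a hypothetical matrix `W`. The decomposition uses the Kullback-Leibler member of the family ($\alpha = 1$, $\beta = 0$), for which the update of Example 3.8 reduces to the classic multiplicative update with exponent 1. Activations are reinitialized to ones at every frame. `decompose_frame`, `transcribe`, and `threshold` are hypothetical names, not the thesis's code.

```python
def decompose_frame(W, y, n_iter=50, eps=1e-12):
    """Non-negative decomposition of one spectrum y onto fixed templates W (m x r)
    with KL multiplicative updates (alpha = 1, beta = 0, exponent 1)."""
    m, r = len(W), len(W[0])
    col_sums = [sum(W[i][k] for i in range(m)) for k in range(r)]
    x = [1.0] * r  # assumed initialization, reset at every frame
    for _ in range(n_iter):
        # Current reconstruction A x, clamped to avoid division by zero.
        yhat = [max(sum(W[i][k] * x[k] for k in range(r)), eps) for i in range(m)]
        x = [x[k] * sum(W[i][k] * y[i] / yhat[i] for i in range(m)) / col_sums[k]
             for k in range(r)]
    return x


def transcribe(W, frames, threshold, n_iter=50):
    """For each incoming frame, yield the indices of templates whose activation
    exceeds the (hypothetical) detection threshold."""
    for y in frames:
        x = decompose_frame(W, y, n_iter)
        yield [k for k, xk in enumerate(x) if xk > threshold]
```

With three disjoint toy templates, a frame built as twice the first template plus the third yields activations close to 2, 0, and 1, so a threshold of 0.5 reports notes 0 and 2. Only `transcribe` runs per frame, which is what makes the online part compatible with real-time processing once the dictionary is learned.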






Non-negative decomposition

Scaling property of (α, β)-divergences:

$$
d^{(ab)}_{\alpha,\beta}(\gamma y \,\|\, \gamma y') = \gamma^{\alpha+\beta}\, d^{(ab)}_{\alpha,\beta}(y \,\|\, y').
$$

- Different emphasis is put on the coefficients depending on their magnitude.
- α + β > 0: more emphasis is put on the higher-magnitude coefficients.
- α + β < 0: the converse effect.
- Compromise between fundamental frequencies, first partials, and higher partials.
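The scaling property can be checked numerically with the generic $(\alpha, \beta)$-divergence formula of Cichocki et al. (2011), $d^{(ab)}_{\alpha,\beta}(y\|y') = -\frac{1}{\alpha\beta}\big(y^{\alpha}y'^{\beta} - \frac{\alpha}{\alpha+\beta}y^{\alpha+\beta} - \frac{\beta}{\alpha+\beta}y'^{\alpha+\beta}\big)$, valid for $\alpha, \beta, \alpha + \beta \neq 0$. Every term is homogeneous of degree $\alpha + \beta$, which is why the factor $\gamma^{\alpha+\beta}$ appears. The sketch (function name hypothetical) also compares the same 20 % mismatch at two magnitudes.

```python
def ab_divergence(y, yp, alpha, beta):
    """Scalar (alpha, beta)-divergence, generic case alpha, beta, alpha + beta != 0."""
    s = alpha + beta
    return -(y ** alpha * yp ** beta
             - alpha / s * y ** s
             - beta / s * yp ** s) / (alpha * beta)


# Same relative mismatch (estimate 20 % too high) at two magnitudes,
# with alpha + beta = 0.5 > 0: the louder pair weighs 10 ** 0.5 times more.
small = ab_divergence(1.0, 1.2, 2.5, -2.0)
large = ab_divergence(10.0, 12.0, 2.5, -2.0)
```

With α + β < 0 the ratio `large / small` falls below 1 instead, so low-magnitude coefficients, such as higher partials, gain relative weight in the cost.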

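Tailored updates in vector form:

$$
\mathbf{x} \leftarrow \mathbf{x} \otimes \left( \frac{\big(\mathbf{A}^{\top} \otimes (\mathbf{y}^{\circ\alpha}\,\mathbf{e}_r^{\top})^{\top}\big)\,(\mathbf{A}\mathbf{x})^{\circ(\beta-1)}}{\mathbf{A}^{\top}(\mathbf{A}\mathbf{x})^{\circ(\alpha+\beta-1)}} \right)^{\circ p_{\alpha,\beta}},
$$

where ⊗ denotes element-wise multiplication, the fraction is an element-wise division, $(\cdot)^{\circ\gamma}$ is an element-wise power, and $\mathbf{e}_r$ is the vector of $r$ ones.

Per time frame:
- 1 element-wise matrix multiplication.
- 1 vector transposed replication.

Per iteration:
- 3 matrix-vector multiplications.
- 1 element-wise vector multiplication.
- 1 element-wise vector division.
- 3 element-wise vector powers.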
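These operation counts come from precomputing, once per frame, the matrix $\mathbf{A}^{\top} \otimes (\mathbf{y}^{\circ\alpha}\mathbf{e}_r^{\top})^{\top}$, which does not depend on $\mathbf{x}$. A plain-Python sketch of this split might look as follows. It assumes that $\mathbf{A}^{\top}$ is also transposed once offline, since the dictionary is fixed. Helper names are hypothetical, and `p` is a placeholder for $p_{\alpha,\beta}$.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def precompute_frame(A, y, alpha):
    """Once per frame: B = A^T (x) (y^alpha e_r^T)^T, so B[k][i] = a_ik * y_i**alpha."""
    ya = [yi ** alpha for yi in y]                 # element-wise power of y
    replicated = [ya for _ in range(len(A[0]))]    # transposed replication, r x m
    At = transpose(A)
    return [[a * v for a, v in zip(a_row, y_row)]  # element-wise matrix product
            for a_row, y_row in zip(At, replicated)]


def iterate(A, At, B, x, alpha, beta, p):
    """Once per iteration: 3 matrix-vector products, 3 powers, 1 division, 1 product."""
    yhat = matvec(A, x)                                        # A x
    num = matvec(B, [v ** (beta - 1) for v in yhat])           # B (Ax)^(beta-1)
    den = matvec(At, [v ** (alpha + beta - 1) for v in yhat])  # A^T (Ax)^(alpha+beta-1)
    ratio = [n / d for n, d in zip(num, den)]                  # element-wise division
    return [xk * rk ** p for xk, rk in zip(x, ratio)]          # power, then product
```

In a streaming setting, `precompute_frame` runs when a new spectrum arrives and `iterate` runs a fixed number of times on it, so the per-iteration cost stays independent of $\mathbf{y}^{\circ\alpha}$.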






Sample example of piano music

[Figure: note activations from A0 to A7 over 10 seconds of piano music, in a grid of panels with α ∈ {−1, 0, 1, 2} (columns) and β ∈ {2, 1, 0, −1} (rows); horizontal axis: time in seconds; color scale from 0 to 1.]






Evaluation on multiple fundamental frequency estimation

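Evaluation:
- Standard piano dataset [Emiya et al., 2010].
- Methodology from MIREX [Bay et al., 2009].

Comparison of ABND, the proposed system, to the state of the art:
- BHNMF: offline unsupervised non-negative matrix factorization with the β-divergences and a harmonic model exploiting spectral smoothness [Vincent et al., 2010].
- SACS: offline sinusoidal analysis with a candidate selection exploiting spectral features [Yeh et al., 2010].

| Algorithm | P | R | F | A | Esub | Emis | Efal | Etot |
|---|---|---|---|---|---|---|---|---|
| ABND | 67.23 | 73.85 | 70.39 | 54.31 | 6.24 | 19.91 | 29.76 | 55.91 |
| BHNMF | 61.00 | 66.74 | 63.74 | 46.78 | 10.38 | 22.88 | 32.30 | 65.56 |
| SACS | 60.03 | 70.84 | 64.99 | 48.13 | 16.35 | 12.81 | 30.83 | 59.99 |

P: precision, R: recall, F: F-measure, A: accuracy; Esub, Emis, Efal, Etot: substitution, miss, false alarm, and total error rates. All values in %.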
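The scores in this table can be computed from per-frame sets of active pitches. The sketch assumes the frame-level definitions used in MIREX multiple-F0 evaluation as we understand them [Bay et al., 2009]. Precision, recall, F-measure, and accuracy come from counts of correct, false, and missed pitches. The substitution, miss, false-alarm, and total error rates are normalized by the number of reference pitches. The function name and data layout are hypothetical.

```python
def frame_metrics(ref_frames, est_frames):
    """Frame-level multi-F0 scores; each frame is a set of active pitches."""
    tp = fp = fn = 0
    n_ref = e_sub = e_mis = e_fal = e_tot = 0
    for ref, est in zip(ref_frames, est_frames):
        n_corr = len(ref & est)                 # correctly detected pitches
        tp += n_corr
        fp += len(est) - n_corr
        fn += len(ref) - n_corr
        n_ref += len(ref)
        e_sub += min(len(ref), len(est)) - n_corr
        e_mis += max(0, len(ref) - len(est))
        e_fal += max(0, len(est) - len(ref))
        e_tot += max(len(ref), len(est)) - n_corr
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    scale = 1.0 / n_ref if n_ref else 0.0
    return {"P": precision, "R": recall, "F": f, "A": accuracy,
            "Esub": e_sub * scale, "Emis": e_mis * scale,
            "Efal": e_fal * scale, "Etot": e_tot * scale}
```

As a consistency check on the table, Esub + Emis + Efal = Etot holds for each row (for ABND, 6.24 + 19.91 + 29.76 = 55.91), and F = 2PR/(P + R) gives 70.39 from P = 67.23 and R = 73.85.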







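| α + β | α | β | Threshold | F | Distance function |
|---|---|---|---|---|---|
| 0.0 | −1.0 | +1.0 | 0.007 | 55.62 | |
| 0.0 | −0.5 | +0.5 | 0.008 | 60.13 | |
| 0.0 | 0.0 | 0.0 | | | |
| 0.0 | +0.5 | −0.5 | 0.011 | 64.00 | |
| 0.0 | +1.0 | −1.0 | 0.013 | 64.92 | Itakura-Saito |
| 0.0 | +1.5 | −1.5 | 0.015 | 65.67 | |
| 0.0 | +2.0 | −2.0 | 0.016 | 66.19 | |
| 0.0 | +2.5 | −2.5 | 0.018 | 66.51 | |
| 0.0 | +3.0 | −3.0 | 0.019 | 66.75 | |
| 0.0 | +3.5 | −3.5 | 0.021 | 66.86 | |
| 0.0 | +4.0 | −4.0 | 0.022 | 66.94 | |
| 0.0 | +4.5 | −4.5 | 0.023 | 66.91 | |
| 0.0 | +5.0 | −5.0 | 0.024 | 66.87 | |
| 0.5 | −1.0 | +1.5 | 0.009 | 61.13 | |
| 0.5 | −0.5 | +1.0 | 0.011 | 65.52 | |
| 0.5 | 0.0 | +0.5 | | | |
| 0.5 | +0.5 | 0.0 | 0.015 | 69.19 | |
| 0.5 | +1.0 | −0.5 | 0.017 | 69.92 | β-divergence with β = 0.5 |
| 0.5 | +1.5 | −1.0 | 0.018 | 70.27 | |
| 0.5 | +2.0 | −1.5 | 0.020 | 70.37 | |
| 0.5 | +2.5 | −2.0 | 0.022 | 70.39 | |
| 0.5 | +3.0 | −2.5 | 0.023 | 70.35 | |
| 0.5 | +3.5 | −3.0 | 0.025 | 70.27 | |
| 0.5 | +4.0 | −3.5 | 0.026 | 70.17 | |
| 0.5 | +4.5 | −4.0 | 0.027 | 70.04 | |
| 0.5 | +5.0 | −4.5 | 0.028 | 69.92 | |
| 1.0 | −1.0 | +2.0 | 0.013 | 62.89 | Neyman's χ² |
| 1.0 | −0.5 | +1.5 | 0.016 | 65.76 | α-divergence with α = −0.5 |
| 1.0 | 0.0 | +1.0 | 0.018 | 66.92 | Dual Kullback-Leibler |
| 1.0 | +0.5 | +0.5 | 0.021 | 67.19 | Hellinger |
| 1.0 | +1.0 | 0.0 | 0.023 | 67.19 | Kullback-Leibler |
| 1.0 | +1.5 | −0.5 | 0.024 | 67.09 | α-divergence with α = 1.5 |
| 1.0 | +2.0 | −1.0 | 0.026 | 66.94 | Pearson's χ² |
| 1.0 | +2.5 | −1.5 | 0.028 | 66.78 | α-divergence with α = 2.5 |
| 1.0 | +3.0 | −2.0 | 0.028 | 66.58 | α-divergence with α = 3 |
| 1.0 | +3.5 | −2.5 | 0.030 | 66.37 | α-divergence with α = 3.5 |
| 1.0 | +4.0 | −3.0 | 0.031 | 66.19 | α-divergence with α = 4 |
| 1.0 | +4.5 | −3.5 | 0.031 | 66.00 | α-divergence with α = 4.5 |
| 1.0 | +5.0 | −4.0 | 0.032 | 65.78 | α-divergence with α = 5 |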
















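Evaluation on multiple fundamental frequency tracking

Similar evaluation framework.

| α + β | α | β | Pruning | Threshold | F1 | M1 |
|---|---|---|---|---|---|---|
| 0.5 | +2.5 | −2.0 | 1 | 0.069 | 68.80 | 53.36 |
| 0.5 | +2.5 | −2.0 | 2 | 0.065 | 71.84 | 54.19 |
| 0.5 | +2.5 | −2.0 | 3 | 0.046 | 74.47 | 56.42 |
| 0.5 | +2.5 | −2.0 | 4 | 0.046 | 76.43 | 56.68 |
| 0.5 | +2.5 | −2.0 | 5 | 0.038 | 77.82 | 57.15 |

| Algorithm | P1 | R1 | F1 | M1 | P2 | R2 | F2 | M2 |
|---|---|---|---|---|---|---|---|---|
| ABND | 77.73 | 77.90 | 77.82 | 57.15 | 28.93 | 28.99 | 28.96 | 77.08 |
| BHNMF | 58.09 | 73.71 | 64.98 | 57.66 | 20.72 | 26.29 | 23.17 | 78.64 |
| SACS | 33.00 | 58.83 | 42.29 | 55.10 | 11.59 | 20.67 | 14.86 | 82.17 |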

2010 MIREX evaluation campaign.
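For note tracking, estimated notes are compared with reference notes rather than frames. As we understand the MIREX convention, subscript 1 denotes onset-only matching and M denotes the mean overlap ratio of matched notes. The sketch illustrates that setting under explicit assumptions. Notes are `(onset, offset, midi_pitch)` tuples in seconds. A match requires the same MIDI pitch and an onset within ±50 ms, and matching is greedy and one-to-one. The overlap ratio of a matched pair is its common duration divided by its total span. Exact MIREX tolerances and matching rules may differ, and all names are hypothetical.

```python
def match_notes(ref, est, onset_tol=0.05):
    """Greedy one-to-one matching on equal pitch and |onset difference| <= onset_tol."""
    used, matches = set(), []
    for i, (on_r, _, pitch_r) in enumerate(ref):
        best = None
        for j, (on_e, _, pitch_e) in enumerate(est):
            if j in used or pitch_e != pitch_r:
                continue
            dist = abs(on_e - on_r)
            if dist <= onset_tol and (best is None or dist < best[0]):
                best = (dist, j)  # keep the closest unused onset
        if best is not None:
            used.add(best[1])
            matches.append((i, best[1]))
    return matches


def overlap_ratio(r, e):
    """Common duration of two notes divided by their total time span."""
    return (min(r[1], e[1]) - max(r[0], e[0])) / (max(r[1], e[1]) - min(r[0], e[0]))


def onset_only_scores(ref, est, onset_tol=0.05):
    """Precision, recall, F-measure, and mean overlap ratio of matched notes."""
    matches = match_notes(ref, est, onset_tol)
    n = len(matches)
    precision = n / len(est) if est else 0.0
    recall = n / len(ref) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    overlap = sum(overlap_ratio(ref[i], est[j]) for i, j in matches) / n if n else 0.0
    return precision, recall, f, overlap
```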






Discussion

Summary:
- Real-time system for polyphonic music transcription.
- Non-negative decomposition with (α, β)-divergences.
- Control on the frequency compromise and convergence guarantees.

Perspectives:
- Non-stationary spectral templates:
  - Extended models for time-varying objects.
  - State representation of sounds.
- Other representations:
  - Non-linear frequency scales, wavelet transforms, modulation spectrum.
  - Multi-channel information with tensors.
- Improved robustness:
  - Post-processing by modeling the activations.
  - Other information divergences.
- Generalization capacities:
  - Hierarchical instrument basis.
  - Adaptive templates and online dictionary learning.






Conclusion

Summary of the present work:
- Study of the application of computational information geometry to audio signal processing.
- From sequential change detection to audio segmentation.
- From non-negative matrix factorization to polyphonic music transcription.

Perspectives for future work:
- Sequential change detection and audio segmentation:
  - Generalization to address statistical dependence for more accurate models.
  - Estimators other than maximum likelihood for a priori knowledge or improved robustness.
- Non-negative matrix factorization and polyphonic music transcription:
  - Direct extensions to convex models and tensors for richer information.
  - More elaborate cost functions for alternative controls.
- General directions of investigation:
  - Apply other novel or existing computational methods to audio signal processing.
  - Apply the proposed computational methods to broader applications and domains.

Scientific exchange:
- Japanese-French Laboratory for Informatics, Tokyo University.
- Organization of the Brillouin Seminar Series.
- Geometric Science of Information 2013.






Bibliography

Abdallah, S. A. & Plumbley, M. D. (2004). Polyphonic music transcription by non-negative sparse coding of power spectra. In 5th International Conference on Music Information Retrieval (ISMIR) (pp. 318–325). Barcelona, Spain.

Basseville, M. & Nikiforov, I. V. (1993). Detection of Abrupt Changes: Theory and Application. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.

Bay, M., Ehmann, A. F., & Downie, J. S. (2009). Evaluation of multiple-F0 estimation and tracking systems. In 10th International Society for Music Information Retrieval Conference (ISMIR) (pp. 315–320). Kobe, Japan.

Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. B. (2005). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1047.

Benetos, E. & Dixon, S. (2011). Multiple-instrument polyphonic music transcription using a convolutive probabilistic model. In 8th Sound and Music Computing Conference (SMC) (pp. 19–24). Padova, Italy.

Berry, M. W., Browne, M., Langville, A. N., Pauca, V. P., & Plemmons, R. J. (2007). Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 52(1), 155–173.

Böck, S., Krebs, F., & Schedl, M. (2012). Evaluating the online capabilities of onset detection methods. In 13th International Society for Music Information Retrieval Conference (ISMIR) (pp. 49–54). Porto, Portugal.





Canu, S. & Smola, A. (2006). Kernel methods and the exponential family. Neurocomputing, 69(7–9), 714–720.

Cemgil, A. T. (2009). Bayesian inference for nonnegative matrix factorisation models. Computational Intelligence and Neuroscience, 2009, 17 pages.

Chen, J. & Gupta, A. K. (2012). Parametric Statistical Change Point Analysis: With Applications to Genetics, Medicine, and Finance. New York, NY, USA: Birkhäuser, second edition.

Cheng, C.-C., Hu, D. J., & Saul, L. K. (2008). Nonnegative matrix factorization for real time musical analysis and sight-reading evaluation. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 2017–2020). Las Vegas, NV, USA.

Cichocki, A., Cruces, S., & Amari, S.-i. (2011). Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy, 13(1), 134–170.

Cichocki, A., Lee, H., Kim, Y.-D., & Choi, S. (2008). Non-negative matrix factorization with α-divergence. Pattern Recognition Letters, 29(9), 1433–1440.





Cichocki, A., Zdunek, R., & Amari, S.-i. (2006). Csiszár's divergences for non-negative matrix factorization: Family of new algorithms. In J. Rosca, D. Erdogmus, J. C. Príncipe, & S. Haykin (Eds.), Independent Component Analysis and Blind Signal Separation: 6th International Conference, ICA 2006, Charleston, SC, USA, March 5-8, 2006, Proceedings, volume 3889 of Lecture Notes in Computer Science (pp. 32–39). Berlin/Heidelberg, Germany: Springer.

Cichocki, A., Zdunek, R., Phan, A. H., & Amari, S.-i. (2009). Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Chichester, UK: Wiley.

Cont, A. (2006). Realtime multiple pitch observation using sparse non-negative constraints. In 7th International Conference on Music Information Retrieval (ISMIR) (pp. 206–211). Victoria, Canada.

Cont, A., Dubnov, S., & Wessel, D. (2007). Realtime multiple-pitch and multiple-instrument recognition for music signals using sparse non-negative constraints. In 10th International Conference on Digital Audio Effects (DAFx) (pp. 85–92). Bordeaux, France.

de Cheveigné, A. (2006). Multiple F0 estimation. In D. Wang & G. J. Brown (Eds.), Computational Auditory Scene Analysis: Principles, Algorithms and Applications, chapter 2 (pp. 45–79). Hoboken, NJ, USA: Wiley-IEEE Press.

Desobry, F., Davy, M., & Doncarli, C. (2005). An online kernel change detection algorithm. IEEE Transactions on Signal Processing, 53(8), 2961–2974.





Dhillon, I. S. & Sra, S. (2006). Generalized nonnegative matrix approximations with Bregman divergences. In Y. Weiss, B. Schölkopf, & J. Platt (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 18 (pp. 283–290). Cambridge, MA, USA: MIT Press.

Dixon, S. (2006). Onset detection revisited. In 9th International Conference on Digital Audio Effects (DAFx) (pp. 133–137). Montreal, Canada.

Emiya, V., Badeau, R., & David, B. (2010). Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1643–1654.

Févotte, C. & Cemgil, A. T. (2009). Nonnegative matrix factorizations as probabilistic inference in composite models. In 17th European Signal Processing Conference (EUSIPCO) (pp. 1913–1917). Glasgow, UK.

Févotte, C. & Idier, J. (2011). Algorithms for nonnegative matrix factorization with the β-divergence. Neural Computation, 23(9), 2421–2456.

Fuentes, B., Badeau, R., & Richard, G. (2011). Adaptive harmonic time-frequency decomposition of audio using shift-invariant PLCA. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 401–404). Prague, Czech Republic.

Grindlay, G. & Ellis, D. P. W. (2009). Multi-voice polyphonic music transcription using eigeninstruments. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 53–56). New Paltz, NY, USA.





Grindlay, G. & Ellis, D. P. W. (2011). Transcribing multi-instrument polyphonic music with hierarchical eigeninstruments. IEEE Journal of Selected Topics in Signal Processing, 5(6), 1159–1169.

Harchaoui, Z., Bach, F., & Moulines, E. (2009). Kernel change-point analysis. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 21 (pp. 609–616). La Jolla, CA, USA: NIPS Foundation.

Harchaoui, Z. & Lévy-Leduc, C. (2008). Catching change-points with lasso. In J. C. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 20 (pp. 617–624). Cambridge, MA, USA: MIT Press.

Harchaoui, Z. & Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association, 105(492), 1480–1493.

Hennequin, R., Badeau, R., & David, B. (2011). Scale-invariant probabilistic latent component analysis. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 129–132). New Paltz, NY, USA.

Kemp, T., Schmidt, M., Westphal, M., & Waibel, A. (2000). Strategies for automatic segmentation of audio data. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3 (pp. 1423–1426). Istanbul, Turkey.





Klapuri, A. & Davy, M., Eds. (2006). Signal Processing Methods for Music Transcription. New York, NY, USA: Springer.

Kompass, R. (2007). A generalized divergence measure for nonnegative matrix factorization. Neural Computation, 19(3), 780–791.

Kotti, M., Moschou, V., & Kotropoulos, C. (2008). Speaker segmentation and clustering. Signal Processing, 88(5), 1091–1124.

Lai, T. L. (1995). Sequential changepoint detection in quality control and dynamical systems. Journal of the Royal Statistical Society: Series B (Methodological), 57(4), 613–658.

Lai, T. L. & Xing, H. (2010). Sequential change-point detection when the pre- and post-change parameters are unknown. Sequential Analysis: Design Methods and Applications, 29(2), 162–175.

Lee, D. D. & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.

Lee, D. D. & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 13 (pp. 556–562). Cambridge, MA, USA: MIT Press.





Leveau, P., Daudet, L., & Richard, G. (2004). Methodology and tools for the evaluation of automatic onset detection algorithms in music. In 5th International Conference on Music Information Retrieval (ISMIR) (pp. 72–75). Barcelona, Spain.

Marolt, M. (2009). Non-negative matrix factorization with selective sparsity constraints for transcription of bell chiming recordings. In 6th Sound and Music Computing Conference (SMC) (pp. 137–142). Porto, Portugal.

Mei, Y. (2006). Sequential change-point detection when unknown parameters are present in the pre-change distribution. The Annals of Statistics, 34(1), 92–122.

Mysore, G. J. & Smaragdis, P. (2009). Relative pitch estimation of multiple instruments. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 313–316). Taipei, Taiwan.

Nakano, M., Kameoka, H., Le Roux, J., Kitano, Y., Ono, N., & Sagayama, S. (2010). Convergence-guaranteed multiplicative algorithms for nonnegative matrix factorization with β-divergence. In IEEE International Workshop on Machine Learning for Signal Processing (MLSP) (pp. 283–288). Kittilä, Finland.

Niedermayer, B. (2008). Non-negative matrix division for the automatic transcription of polyphonic music. In 9th International Conference on Music Information Retrieval (ISMIR) (pp. 544–549). Philadelphia, PA, USA.

Paatero, P. (1997). Least squares formulation of robust non-negative factor analysis. Chemometrics and Intelligent Laboratory Systems, 37(1), 23–35.





Paatero, P. & Tapper, U. (1994). Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2), 111–126.

Polunchenko, A. S. & Tartakovsky, A. G. (2012). State-of-the-art in sequential change-point detection. Methodology and Computing in Applied Probability, 14(3), 649–684.

Poor, H. V. & Hadjiliadis, O. (2009). Quickest Detection. New York, NY, USA: Cambridge University Press.

Raczyński, S. A., Ono, N., & Sagayama, S. (2007). Multipitch analysis with harmonic nonnegative matrix approximation. In 8th International Conference on Music Information Retrieval (ISMIR) (pp. 381–386). Vienna, Austria.

Röbel, A. (2011). Onset detection by means of transient peak classification. In Music Information Retrieval Evaluation eXchange (MIREX).

Sha, F. & Saul, L. K. (2005). Real-time pitch determination of one or more voices by nonnegative matrix factorization. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 17 (pp. 1233–1240). Cambridge, MA, USA: MIT Press.

Siegmund, D. & Venkatraman, E. S. (1995). Using the generalized likelihood ratio statistic for sequential detection of a change-point. The Annals of Statistics, 23(1), 255–271.





Smaragdis, P. & Brown, J. C. (2003). Non-negative matrix factorization for polyphonic music transcription. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 177–180). New Paltz, NY, USA.

Smaragdis, P., Raj, B., & Shashanka, M. (2008). Sparse and shift-invariant feature extraction from non-negative data. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 2069–2072). Las Vegas, NV, USA.

Vert, J.-P. & Bleakley, K. (2010). Fast detection of multiple change-points shared by many signals using group LARS. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 23 (pp. 2343–2351). La Jolla, CA, USA: NIPS Foundation.

Vincent, E., Bertin, N., & Badeau, R. (2010). Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Transactions on Audio, Speech, and Language Processing, 18(3), 528–537.

Virtanen, T. & Klapuri, A. (2006). Analysis of polyphonic audio using source-filter model and non-negative matrix factorization. In NIPS Workshop on Advances in Models for Acoustic Processing. Whistler, Canada.

Yeh, C., Röbel, A., & Rodet, X. (2010). Multiple fundamental frequency estimation and polyphony inference of polyphonic music signals. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1116–1126.

