Preliminaries on Information Geometry
Sequential Change Detection with Exponential Families
Real-Time Audio Segmentation
Non-Negative Matrix Factorization with Convex-Concave Divergences
Real-Time Polyphonic Music Transcription
Ph.D. Defense
Computational Methods of Information Geometry with Real-Time Applications in Audio Signal Processing

Arnaud Dessein
December 13th, 2012

Jury:
M. Gérard Assayag, Supervisor
M. Arshia Cont, Advisor
M. Francis Bach, Reviewer
M. Frank Nielsen, Reviewer
M. Roland Badeau, Examiner
M. Silvère Bonnabel, Examiner
M. Jean-Luc Zarader, Examiner
Introduction

From information geometry theory:
- Study of statistics with concepts from differential geometry, such as smooth manifolds, and from information theory, such as statistical divergences.
- Parametric statistical models possess an intrinsic geometrical structure.

To computational information geometry:
- A broad community has formed around the development and application of computational methods based on information geometry theory.
- Many techniques in machine learning and signal processing rely on statistical models or distance functions: principal component analysis, independent component analysis, centroid computation, k-means, expectation-maximization, nearest neighbor search, range search, smallest enclosing balls, Voronoi diagrams.

Objectives of the thesis:
- Employ this framework for audio signal processing.
- Primary motivations come from real-time machine listening.
Separable divergences on the space of discrete positive measures
Exponential families of probability distributions

Basic notions and properties

Exponential family: pθ(x) = exp(θ⊤x − ψ(θ)).
- The sufficient observations x belong to ℝᵐ.
- The natural parameters θ belong to a convex set N ⊆ ℝᵐ.
- The log-normalizer ψ is convex on N and smooth on int N.
Legendre-Fenchel conjugate: φ(η) = sup_{θ∈ℝᵐ} θ⊤η − ψ(θ).
- The expectation parameters η belong to the convex set int K.
- We have duality between natural and expectation parameters through ∇ψ and ∇φ.

Maximum likelihood: η̂ml(x₁, …, xₙ) = (1/n) Σⁿⱼ₌₁ xⱼ.
- Simple arithmetic mean in expectation parameters.
- Natural parameters obtained by convex duality.
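As a concrete illustration (not from the thesis), the exponential distribution with rate λ can be written as a one-dimensional exponential family with θ = −λ and ψ(θ) = −log(−θ); a minimal sketch of maximum likelihood via expectation parameters and convex duality, assuming NumPy:

```python
import numpy as np

# Exponential distribution with rate lambda > 0 as a 1-D exponential family:
#   p_theta(x) = exp(theta * x - psi(theta)),  theta = -lambda < 0,
#   psi(theta) = -log(-theta)   (log-normalizer on N = (-inf, 0)).
# Hence eta = psi'(theta) = -1/theta (the mean), and convex duality
# gives back theta = phi'(eta) = -1/eta.

def grad_psi(theta):
    return -1.0 / theta   # natural -> expectation parameters

def grad_phi(eta):
    return -1.0 / eta     # expectation -> natural parameters

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)  # true mean eta* = 2

eta_ml = x.mean()             # MLE: arithmetic mean of the x_j
theta_ml = grad_phi(eta_ml)   # natural parameter by convex duality

assert np.isclose(grad_psi(theta_ml), eta_ml)  # duality round-trip
```

The closed-form mean makes the inference exact and trivially sequential: updating η̂ml with a new observation is a running average, and θ̂ml follows by ∇φ.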
Context
Statistical framework
Methods for exponential families
Discussion

Background

Principle:
- Decide whether the process presents structural modifications over time.
- Find the time instants corresponding to the different change points.
- Characterize the properties within the respective segments.

Applications:
- Quality control in industrial production.
- Fault detection in technological processes.
- Automatic surveillance for intrusion and abnormal behavior in security monitoring.
- Signal processing in geophysics, econometrics, audio, medicine, and imaging.

Approaches:
- Statistical modeling and monitoring of the distributions [Basseville & Nikiforov, 1993, Lai, 1995, Poor & Hadjiliadis, 2009, Chen & Gupta, 2012, Polunchenko & Tartakovsky, 2012].
- Machine learning techniques relying on distance functions [Harchaoui & Lévy-Leduc, 2008, Harchaoui & Lévy-Leduc, 2010, Vert & Bleakley, 2010, Desobry et al., 2005, Harchaoui et al., 2009].
Motivations and contributions

Issues of statistical online approaches:
- Either approximations of the exact statistics with unknown parameters, for tractability.
- Or restrictions on the data and scenarios [Siegmund & Venkatraman, 1995, Mei, 2006].
- With the exception of a full Bayesian framework for exponential families [Lai & Xing, 2010].

Goals in the non-Bayesian framework:
- Known or unknown parameters.
- Additive or non-additive changes.
- Topology of the parameters and data.
- Exact inference for online schemes.

Contributions in this context:
- Study of the generalized likelihood ratios within the dually flat information geometry.
- Estimation with arbitrary estimators compared to maximum likelihood.
- Alternative expression of the statistics through convex duality.
- Attractive simplification for exact inference with maximum likelihood.
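To make the generalized likelihood ratio idea concrete, here is a hedged sketch (an illustration, not the thesis implementation) for a unit-variance Gaussian mean change. With ψ(θ) = θ²/2 and conjugate φ(η) = η²/2, the maximized log-likelihood of n points with empirical mean m reduces to n·φ(m) up to parameter-free constants, so the GLR statistic at a candidate change point k takes a simple dual form:

```python
import numpy as np

# GLR for a change in the mean of a unit-variance Gaussian, seen as an
# exponential family with psi(theta) = theta^2 / 2, phi(eta) = eta^2 / 2.
# The maximized log-likelihood of n points with empirical mean m is
# n * phi(m) (up to constants), so the GLR at candidate change point k is
#   k*phi(m_{1:k}) + (n-k)*phi(m_{k+1:n}) - n*phi(m_{1:n}).

def phi(eta):
    return 0.5 * eta ** 2

def glr(x):
    """GLR statistic at every candidate change point 1 <= k < n."""
    n = len(x)
    stats = np.empty(n - 1)
    for k in range(1, n):
        m1, m2, m = x[:k].mean(), x[k:].mean(), x.mean()
        stats[k - 1] = k * phi(m1) + (n - k) * phi(m2) - n * phi(m)
    return stats

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 200),   # mean 0 before ...
                    rng.normal(1.5, 1.0, 200)])  # ... mean 1.5 after
k_hat = 1 + int(np.argmax(glr(x)))  # estimated change point, near 200
```

The same structure holds for any exponential family once ψ and φ are specified, which is what makes the dual expression attractive for exact online inference.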
Discussion

Summary:
- Standard non-Bayesian approach to sequential change detection.
- Dually flat information geometry of exponential families.
- Generalized likelihood ratios with arbitrary estimators.
- Attractive scheme for exact inference with unknown parameters.

Perspectives:
- Direct extensions: non-steep or curved exponential families; maximum a posteriori estimators.
- Asymptotic properties: distribution of the test statistics; optimality formulation and analysis.
- Statistical dependence: autoregressive models; non-linear systems and particle filtering.
- Alternative test statistics: reversing the problem and starting from geometric considerations; information divergences and more robust estimators.
Principle:
- Determine time boundaries that partition a sound into homogeneous and continuous temporal segments, such that adjacent segments exhibit inhomogeneities.
- Define a criterion to quantify the homogeneity of the segments.

Approaches:
- Supervised: high-level classes and automatic classification.
- Unsupervised: statistical and distance-based approaches:
  - Musical onset detection [Bello et al., 2005, Dixon, 2006].
  - Speaker segmentation [Kemp et al., 2000, Kotti et al., 2008].
Issues of unsupervised approaches to audio segmentation:
- Often tailored to particular types of signals and homogeneity criteria.
- Specific distance functions or models.
- Some are offline.
- Others approximate the exact statistics.

Goals towards a unifying framework for audio segmentation:
- Arbitrary types of signals and homogeneity criteria.
- Large choice of distance functions or models.
- Real-time constraints.
- Exact online inference.

Contributions in this context:
- Generic framework for real-time audio segmentation.
- Unification of several standard approaches.
- Online change detection with exponential families.
- Exact generalized likelihood ratios and maximum likelihood.
Segmentation scheme:
1. Represent frames with a short-time sound description.
2. Model the observations with probability distributions.
3. Sequentially detect changes in the distribution parameters.

Short-time sound representation:
- Energy for information on loudness.
- Fourier transform for information on spectral content.
- Mel-frequency cepstral coefficients for information on timbre.
- Many other possibilities.

Statistical modeling:
- Exponential families and generalized likelihood ratios.
- Unknown parameters and maximum likelihood.
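The three steps can be sketched on a synthetic signal. This toy example (all names assumed, not the thesis system) uses short-time energy as the description and a crude two-window mean-shift test as a stand-in for the exact GLR machinery with exponential families:

```python
import numpy as np

# (1) Short-time energy per frame, (2) energies treated as samples of a
# distribution, (3) a change is flagged when the recent-window mean jumps
# with respect to the past window (a crude stand-in for the exact GLR test).

def frame_energy(signal, frame_len=512):
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    return (frames ** 2).mean(axis=1)

def detect_changes(features, window=20, threshold=2.0):
    """Frame indices where the recent window's mean jumps vs. the past window."""
    changes = []
    for t in range(2 * window, len(features)):
        past = features[t - 2 * window : t - window]
        recent = features[t - window : t]
        if abs(recent.mean() - past.mean()) > threshold * (past.std() + 1e-12):
            changes.append(t - window)
    return changes

rng = np.random.default_rng(2)
quiet = 0.1 * rng.standard_normal(512 * 100)  # low-energy segment
loud = 1.0 * rng.standard_normal(512 * 100)   # high-energy segment
energies = frame_energy(np.concatenate([quiet, loud]))
changes = detect_changes(energies)            # detections cluster around frame 100
```

Swapping the feature (energy, spectrum, MFCCs) or the detector (exact GLR with a chosen exponential family instead of this heuristic) changes the homogeneity criterion without changing the pipeline.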
Parameters:
- Short-time sound representation: energy in a Mel-frequency filter bank at 11025 Hz.
- Statistical model: Rayleigh distributions.
- Topology: 1 dimension, continuous non-negative values.

Comparison to the state of the art:
- LFSF: offline spectral flux with a logarithmic frequency scale and filtering [Böck et al., 2012].
- TSPC: online spectral peak classification into transients and non-transients [Röbel, 2011].
Summary:
- Real-time system for audio segmentation.
- Various types of signals and of homogeneity criteria.
- Sequential change detection with exponential families.
- Unification and generalization of several statistical and distance-based approaches.

Perspectives:
- Dependent observations: autoregressive models; non-linear systems and particle filtering.
- Improved robustness: post-processing by smoothing, adaptation; growing and sliding window heuristics.
- Consideration of prior information: maximum a posteriori; full Bayesian framework.
- Further applications: audio processing and music information retrieval; other domains in signal processing.
Context
Optimization framework
Methods for convex-concave divergences
Discussion

Motivations and contributions

Issues of approaches to optimizing factored models:
- Often particular cost functions or statistical models.
- Potential problems in convergence.
- With the exception of updates for classes of divergences [Cichocki et al., 2006, Cichocki et al., 2008, Dhillon & Sra, 2006, Kompass, 2007, Nakano et al., 2010, Févotte & Idier, 2011, Cichocki et al., 2011].

Goals for providing generic methods:
- Common information divergences.
- Multiplicative updates.
- Convergence guarantees.

Contributions in this context:
- Generic scheme for arbitrary convex-concave divergences.
- Majorization-minimization scheme with these auxiliary functions.
- Known and novel updates for common information divergences.
- Multiplicative updates for α-divergences, β-divergences, and (α, β)-divergences.
- Extension to skew divergences.
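As an illustration of such multiplicative updates (a standard sketch following the heuristic form popularized in [Févotte & Idier, 2011], not code from the thesis), the β-divergence family covers the Euclidean (β = 2), Kullback-Leibler (β = 1), and Itakura-Saito (β = 0) costs with a single update rule:

```python
import numpy as np

def beta_div(V, WH, b):
    """Beta-divergence D_b(V || WH), summed over entries."""
    if b == 1:   # generalized Kullback-Leibler
        return float(np.sum(V * np.log(V / WH) - V + WH))
    if b == 0:   # Itakura-Saito
        return float(np.sum(V / WH - np.log(V / WH) - 1))
    return float(np.sum((V**b + (b - 1) * WH**b - b * V * WH**(b - 1))
                        / (b * (b - 1))))

def mult_update(V, W, H, b):
    """One pass of heuristic multiplicative updates for D_b(V || WH):
    H <- H * (W^T((WH)^(b-2) * V)) / (W^T (WH)^(b-1)), symmetrically for W."""
    WH = W @ H
    H *= (W.T @ (WH**(b - 2) * V)) / (W.T @ WH**(b - 1))
    WH = W @ H
    W *= ((WH**(b - 2) * V) @ H.T) / (WH**(b - 1) @ H.T)

rng = np.random.default_rng(4)
V = rng.random((16, 24)) + 0.1
results = {}
for b in (0.0, 1.0, 2.0):
    W = rng.random((16, 4)) + 1e-2
    H = rng.random((4, 24)) + 1e-2
    first = beta_div(V, W @ H, b)
    for _ in range(100):
        mult_update(V, W, H, b)
    results[b] = (first, beta_div(V, W @ H, b))
```

Monotone decrease is guaranteed by majorization-minimization for β in [1, 2]; outside that range the heuristic updates still decrease the cost in practice, which is the kind of gap the convex-concave framework addresses.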
Variational bounding and auxiliary functions

Variational bounding:
- Iterative technique for minimization problems.
- Cost function replaced at each step with a surrogate majorizing function.

Auxiliary function: G(x̄|x̄) = C(x̄) and G(x|x̄) ≥ C(x).
- Iterative scheme: if G(x|x̄) ≤ G(x̄|x̄), then C(x) ≤ C(x̄).
- Majorization-minimization scheme: minimize G(x|x̄) subject to x ∈ X.

Figure 3.2.: Auxiliary function for the cost function. The auxiliary function defines a majorizing function above the current solution, which can be used as a surrogate for optimizing the cost.

Lemma 3.1. Let x, x̄ ∈ X. If G(x|x̄) ≤ G(x̄|x̄), then C(x) ≤ C(x̄).

Proof. Let x, x̄ ∈ X. By definition, we have C(x) ≤ G(x|x̄) and C(x̄) = G(x̄|x̄). Now if G(x|x̄) ≤ G(x̄|x̄), then we have C(x) ≤ G(x|x̄) ≤ G(x̄|x̄) = C(x̄), which proves the lemma.

Remark 3.8. We also have strict decrease of the cost function as soon as we choose a vector x that makes the auxiliary function strictly decrease, that is, G(x|x̄) < G(x̄|x̄).

Remark 3.9. This justifies the use of an auxiliary function to minimize, or at least make decrease, the original cost function. Indeed, if the current solution is given by x̄ ∈ X, then choosing a point x ∈ X such that G(x|x̄) ≤ G(x̄|x̄) provides a better solution. This may be iterated until a termination criterion is met. In general, when it is possible, the point x is chosen as a minimizer of the auxiliary function at x̄, so that we need to solve the following optimization problem:

minimize G(x|x̄) subject to x ∈ X. (3.5)

The minimization can be done on an arbitrary subset X′ ⊆ X as soon as x̄ ∈ X′. It is also possible to equalize the auxiliary function instead, or to choose any point in between the minimization and the equalization.

We show in the sequel that we can build auxiliary functions for a wide range of common information divergences presented in Chapter 1. We can therefore optimize the respective cost functions by variational bounding. We will focus on majorization-minimization schemes where the auxiliary function is iteratively minimized to update the solution as discussed above.
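As a sanity check of Lemma 3.1 in action (an illustrative sketch, not code from the thesis), the classical Lee-Seung multiplicative updates for NMF under the generalized KL divergence arise from exactly such an auxiliary function, built with Jensen's inequality, so the cost is non-increasing at every iteration:

```python
import numpy as np

def kl_div(V, WH):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    return float(np.sum(V * np.log(V / WH) - V + WH))

def mm_step(V, W, H):
    """One majorization-minimization pass: each multiplicative update
    minimizes an auxiliary function G(.|current solution), so by the
    lemma the cost C cannot increase."""
    H *= (W.T @ (V / (W @ H))) / W.sum(axis=0)[:, None]
    W *= ((V / (W @ H)) @ H.T) / H.sum(axis=1)[None, :]

rng = np.random.default_rng(3)
V = rng.random((20, 30)) + 0.1
W = rng.random((20, 5)) + 1e-3
H = rng.random((5, 30)) + 1e-3

costs = [kl_div(V, W @ H)]
for _ in range(50):
    mm_step(V, W, H)
    costs.append(kl_div(V, W @ H))

# Monotone decrease, as guaranteed by the auxiliary-function argument:
assert all(a >= b - 1e-9 for a, b in zip(costs, costs[1:]))
```

The same recipe, with a convex-concave split of the divergence in place of Jensen's inequality, yields the generic updates of this chapter.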
Remark 3.9. This justifies the use of an auxiliary function to minimize or at leastmake the original cost function decrease. Indeed, if the current solution is given bysx ! X , then choosing a point x ! X such that G(x|sx) " G(sx|sx) provides a bettersolution. This may be iterated until a termination criterion is met. In general, whenit is possible, the point x is chosen as a minimizer of the auxiliary function at sx, sothat we need to solve the following optimization problem:
minimize G(x, sx) subject to x ! X . (3.5)
The minimization can be done on an arbitrary subset X ! # X as soon as sx ! X !. Itis also possible to equalize the auxiliary function instead, or to choose any point inbetween the minimization and the equalization.
We show in the sequel that we can build auxiliary functions for a wide range ofcommon information divergences presented in Chapter 1. We can therefore op-timize the respective cost functions by variational bounding. We will focus onmaximization-minimization schemes where the auxiliary function is iteratively min-imized to update the solution as discussed above.
48
Abstract

This thesis proposes novel computational methods of information geometry with real-time applications in audio signal processing. In this context, we address in parallel the applicative problems of real-time audio segmentation, and of real-time polyphonic music transcription. This is achieved by developing theoretical frameworks respectively for sequential change detection with exponential families, and for non-negative matrix factorization with convex-concave divergences. On the one hand, sequential change detection is studied in the light of the dually flat information geometry of exponential families. We notably develop a generic and unifying statistical framework relying on multiple hypothesis testing with decision rules based on exact generalized likelihood ratios. This is applied to devise a modular system for real-time audio segmentation with arbitrary types of signals and of homogeneity criteria. The proposed system controls the information rate of the audio stream as it unfolds in time to detect changes. On the other hand, non-negative matrix factorization is investigated by way of convex-concave divergences on the space of discrete positive measures. In particular, we formulate a generic and unifying optimization framework for non-negative matrix factorization based on variational bounding with auxiliary functions. This is employed to design a real-time system for polyphonic music transcription with an explicit control on the frequency compromise during the analysis. The developed system decomposes the music signal as it arrives in time onto a dictionary of note spectral templates. These contributions provide interesting insights and directions for future research in the realm of audio signal processing, and more generally of machine learning and signal processing, in the relatively young but nonetheless prolific field of computational information geometry.
Preliminaries on Information Geometry
Sequential Change Detection with Exponential Families
Real-Time Audio Segmentation
Non-Negative Matrix Factorization with Convex-Concave Divergences
Real-Time Polyphonic Music Transcription

Context
Optimization framework
Methods for convex-concave divergences
Discussion
Variational bounding and auxiliary functions
Variational bounding:
Iterative technique for minimization problems.
Cost function replaced at each step with a surrogate majorizing function.

Auxiliary function: G(x̄|x̄) = C(x̄) and G(x|x̄) ≥ C(x).
Iterative scheme: if G(x|x̄) ≤ G(x̄|x̄), then C(x) ≤ C(x̄).
Majorization-minimization scheme: minimize G(x|x̄) subject to x ∈ X.
Figure 3.2.: Auxiliary function for the cost function. The auxiliary function defines a majorizing function above the current solution, which can be used as a surrogate for optimizing the cost.

Lemma 3.1. Let x, x̄ ∈ X. If G(x|x̄) ≤ G(x̄|x̄), then C(x) ≤ C(x̄).

Proof. Let x, x̄ ∈ X. By definition, we have C(x) ≤ G(x|x̄) and C(x̄) = G(x̄|x̄). Now if G(x|x̄) ≤ G(x̄|x̄), then we have C(x) ≤ G(x|x̄) ≤ G(x̄|x̄) = C(x̄), which proves the lemma.

Remark 3.8. We also have strict decrease of the cost function as soon as we choose a vector x that makes the auxiliary function strictly decrease, that is, G(x|x̄) < G(x̄|x̄).

Remark 3.9. This justifies the use of an auxiliary function to minimize, or at least decrease, the original cost function. Indeed, if the current solution is given by x̄ ∈ X, then choosing a point x ∈ X such that G(x|x̄) ≤ G(x̄|x̄) provides a better solution. This may be iterated until a termination criterion is met. In general, when possible, the point x is chosen as a minimizer of the auxiliary function at x̄, so that we need to solve the following optimization problem:

minimize G(x|x̄) subject to x ∈ X. (3.5)

The minimization can be done on an arbitrary subset X′ ⊆ X as soon as x̄ ∈ X′. It is also possible to equalize the auxiliary function instead, or to choose any point between minimization and equalization.

We show in the sequel that we can build auxiliary functions for a wide range of common information divergences presented in Chapter 1. We can therefore optimize the respective cost functions by variational bounding. We will focus on majorization-minimization schemes where the auxiliary function is iteratively minimized to update the solution as discussed above.
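The majorization-minimization loop above can be sketched on a toy problem. Everything here is an illustrative assumption, not one of the thesis's divergences: a scalar cost C(x) = (x − 3)² + |x| is majorized with the standard quadratic bound |x| ≤ x²/(2|x̄|) + |x̄|/2, which touches |x| at x = x̄.

```python
# Minimal majorization-minimization (variational bounding) sketch on a
# toy scalar cost; the cost, the quadratic bound on |x|, and all names
# are illustrative assumptions, not the thesis's divergences.

def cost(x):
    return (x - 3.0) ** 2 + abs(x)

def aux(x, x_bar, eps=1e-12):
    # Auxiliary function G(x | x_bar): majorizes C everywhere and
    # touches it at x = x_bar, via |x| <= x^2 / (2|x_bar|) + |x_bar|/2.
    return (x - 3.0) ** 2 + x ** 2 / (2.0 * (abs(x_bar) + eps)) + abs(x_bar) / 2.0

def mm_step(x_bar, eps=1e-12):
    # Closed-form minimizer of the quadratic G(. | x_bar), cf. (3.5).
    return 3.0 / (1.0 + 1.0 / (2.0 * (abs(x_bar) + eps)))

x = 10.0
for _ in range(50):
    x_new = mm_step(x)
    # Lemma 3.1: G(x_new|x) <= G(x|x) implies C(x_new) <= C(x).
    assert aux(x_new, x) <= aux(x, x) and cost(x_new) <= cost(x)
    x = x_new
# x converges to 2.5, the minimizer of C on x > 0
```

Each step minimizes the auxiliary function exactly, so the cost decreases monotonically, illustrating the lemma.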
Discussion
Summary:
Non-negative matrix factorization with convex-concave divergences.
Many common information divergences.
Variational bounding with auxiliary functions.
Symmetrized and skew divergences.
Principle:
Convert a raw music signal into a symbolic representation such as a score.
From a low-level audio waveform to high-level symbolic information.

Approaches:
Many methods for multiple pitch estimation [de Cheveigné, 2006, Klapuri & Davy, 2006].
Non-negative matrix factorization [Smaragdis & Brown, 2003, Abdallah & Plumbley, 2004, Virtanen & Klapuri, 2006, Raczyński et al., 2007, Niedermayer, 2008, Marolt, 2009, Grindlay & Ellis, 2009, Vincent et al., 2010].
Probabilistic latent component analysis [Smaragdis et al., 2008, Mysore & Smaragdis, 2009, Grindlay & Ellis, 2011, Hennequin et al., 2011, Fuentes et al., 2011, Benetos & Dixon, 2011].
Issues of approaches based on non-negative matrix factorization:
Inherently suited to offline processing.
Need for structuring the dictionary.
With the exception of systems for audio decomposition [Sha & Saul, 2005, Cheng et al., 2008, Cont, 2006, Cont et al., 2007].
Specific cost functions.
No suitable controls on the decomposition.
Potential convergence issues.

Contributions in this context:
Parametric family of (α, β)-divergences for decomposition.
Insights into their relevance for audio analysis.
Flexible control on the frequency compromise during the decomposition.
Multiplicative updates tailored to real time.
Monotonic decrease of the cost function.
Cost: the (α, β)-divergence D_{α,β}(y‖y′).
Different emphasis is put on the coefficients depending on their magnitude.
α + β > 0: more emphasis is put on the higher-magnitude coefficients.
α + β < 0: converse effect.
Compromise between fundamental frequencies, first partials, and higher partials.

Tailored updates in vector form:

x ← x ⊗ ( ((A^⊤ ⊗ (y^{⊙α} e^⊤)^⊤) (Ax)^{⊙(β−1)}) ⊘ (A^⊤ (Ax)^{⊙(α+β−1)}) )^{⊙ p_{α,β}}.

Computational cost:
1 element-wise matrix multiplication per time frame.
1 vector transposed replication per time frame.
3 matrix-vector multiplications per iteration.
1 element-wise vector multiplication per iteration.
1 element-wise vector division per iteration.
3 element-wise vector powers per iteration.
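Assuming the update has the standard AB-divergence multiplicative form of [Cichocki et al., 2011], it can be sketched in numpy for a single time frame. The exponent p_ab = 1/α, the random dictionary A, the spectrum y, and all names are placeholder assumptions, not the thesis's exact implementation or note templates.

```python
import numpy as np

# Numpy sketch of a multiplicative (alpha, beta)-divergence update for a
# non-negative decomposition y ~ Ax on one time frame. Assumptions:
# AB-divergence update form [Cichocki et al., 2011], exponent
# p_ab = 1/alpha; A and y are random placeholders.

rng = np.random.default_rng(0)
m, n = 64, 8
A = rng.random((m, n)) + 1e-3        # dictionary (spectral templates)
y = rng.random(m) + 1e-3             # observed magnitude spectrum
x = np.ones(n)                       # non-negative activations

alpha, beta = 1.0, 0.5
p_ab = 1.0 / alpha                   # assumed exponent for alpha != 0

# Per-frame precomputation (1 element-wise matrix multiplication):
# fold y**alpha into the rows of A.T once, since y is fixed.
At_ya = A.T * (y ** alpha)

for _ in range(100):
    Ax = A @ x                                     # matvec 1
    num = At_ya @ (Ax ** (beta - 1.0))             # matvec 2
    den = A.T @ (Ax ** (alpha + beta - 1.0))       # matvec 3
    x *= (num / den) ** p_ab                       # stays non-negative
```

The multiplicative form keeps the activations non-negative by construction, and only three matrix-vector products are needed per iteration, matching the operation counts listed above.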
Evaluation on multiple fundamental frequency estimation
Evaluation:
Standard piano dataset [Emiya et al., 2010].
Methodology from MIREX [Bay et al., 2009].

Comparison of ABND to the state of the art:
BHNMF: offline unsupervised non-negative matrix factorization with the β-divergences and a harmonic model exploiting spectral smoothness [Vincent et al., 2010].
SACS: offline sinusoidal analysis with a candidate selection exploiting spectral features [Yeh et al., 2010].
Summary:
Real-time system for polyphonic music transcription.
Non-negative decomposition with (α, β)-divergences.
Control on the frequency compromise and convergence guarantees.

Perspectives:
Non-stationary spectral templates: extended models for time-varying objects; state representation of sounds.
Other representations: non-linear frequency scales, wavelet transforms, modulation spectrum; multi-channel information with tensors.
Improved robustness: post-processing by modeling the activations; other information divergences.
Generalization capacities: hierarchical instrument bases; adaptive templates and online dictionary learning.
Conclusion

Summary of the present work:
Study of the application of computational information geometry to audio signal processing.
From sequential change detection to audio segmentation.
From non-negative matrix factorization to polyphonic music transcription.

Perspectives for future work:
Sequential change detection and audio segmentation: generalization to address statistical dependence for more accurate models; other estimators than maximum likelihood for a priori knowledge or improved robustness.
Non-negative matrix factorization and polyphonic music transcription: direct extensions to convex models and tensors for richer information; more elaborate cost functions for alternative controls.

General directions of investigation:
Apply other novel or existing computational methods to audio signal processing.
Apply the proposed computational methods to broader applications and domains.

Scientific exchange:
Japanese-French Laboratory for Informatics, Tokyo University.
Organization of the Brillouin Seminar Series.
Geometric Science of Information 2013.
Bibliography I
Abdallah, S. A. & Plumbley, M. D. (2004). Polyphonic music transcription by non-negative sparse coding of power spectra. In 5th International Conference on Music Information Retrieval (ISMIR) (pp. 318–325). Barcelona, Spain.

Basseville, M. & Nikiforov, I. V. (1993). Detection of Abrupt Changes: Theory and Application. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.

Bay, M., Ehmann, A. F., & Downie, J. S. (2009). Evaluation of multiple-F0 estimation and tracking systems. In 10th International Society for Music Information Retrieval Conference (ISMIR) (pp. 315–320). Kobe, Japan.

Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. B. (2005). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1047.

Benetos, E. & Dixon, S. (2011). Multiple-instrument polyphonic music transcription using a convolutive probabilistic model. In 8th Sound and Music Computing Conference (SMC) (pp. 19–24). Padova, Italy.

Berry, M. W., Browne, M., Langville, A. N., Pauca, V. P., & Plemmons, R. J. (2007). Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 52(1), 155–173.

Böck, S., Krebs, F., & Schedl, M. (2012). Evaluating the online capabilities of onset detection methods. In 13th International Society for Music Information Retrieval Conference (ISMIR) (pp. 49–54). Porto, Portugal.
Bibliography II
Canu, S. & Smola, A. (2006). Kernel methods and the exponential family. Neurocomputing, 69(7–9), 714–720.
Cemgil, A. T. (2009). Bayesian inference for nonnegative matrix factorisation models. Computational Intelligence and Neuroscience, 2009, 17 pages.
Chen, J. & Gupta, A. K. (2012). Parametric Statistical Change Point Analysis: With Applications to Genetics, Medicine, and Finance. New York, NY, USA: Birkhäuser, second edition.
Cheng, C.-C., Hu, D. J., & Saul, L. K. (2008). Nonnegative matrix factorization for real time musical analysis and sight-reading evaluation. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 2017–2020). Las Vegas, NV, USA.
Cichocki, A., Cruces, S., & Amari, S.-i. (2011). Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy, 13(1), 134–170.
Cichocki, A., Lee, H., Kim, Y.-D., & Choi, S. (2008). Non-negative matrix factorization with α-divergence. Pattern Recognition Letters, 29(9), 1433–1440.
Bibliography III
Cichocki, A., Zdunek, R., & Amari, S.-i. (2006). Csiszár's divergences for non-negative matrix factorization: Family of new algorithms. In J. Rosca, D. Erdogmus, J. C. Príncipe, & S. Haykin (Eds.), Independent Component Analysis and Blind Signal Separation: 6th International Conference, ICA 2006, Charleston, SC, USA, March 5–8, 2006, Proceedings, volume 3889 of Lecture Notes in Computer Science (pp. 32–39). Berlin/Heidelberg, Germany: Springer.
Cichocki, A., Zdunek, R., Phan, A. H., & Amari, S.-i. (2009). Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Chichester, UK: Wiley.
Cont, A. (2006). Realtime multiple pitch observation using sparse non-negative constraints. In 7th International Conference on Music Information Retrieval (ISMIR) (pp. 206–211). Victoria, Canada.
Cont, A., Dubnov, S., & Wessel, D. (2007). Realtime multiple-pitch and multiple-instrument recognition for music signals using sparse non-negative constraints. In 10th International Conference on Digital Audio Effects (DAFx) (pp. 85–92). Bordeaux, France.
de Cheveigné, A. (2006). Multiple F0 estimation. In D. Wang & G. J. Brown (Eds.), Computational Auditory Scene Analysis: Principles, Algorithms and Applications, chapter 2 (pp. 45–79). Hoboken, NJ, USA: Wiley-IEEE Press.
Desobry, F., Davy, M., & Doncarli, C. (2005). An online kernel change detection algorithm. IEEE Transactions on Signal Processing, 53(8), 2961–2974.
Bibliography IV
Dhillon, I. S. & Sra, S. (2006). Generalized nonnegative matrix approximations with Bregman divergences. In Y. Weiss, B. Schölkopf, & J. Platt (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 18 (pp. 283–290). Cambridge, MA, USA: MIT Press.
Dixon, S. (2006). Onset detection revisited. In 9th International Conference on Digital Audio Effects (DAFx) (pp. 133–137). Montreal, Canada.
Emiya, V., Badeau, R., & David, B. (2010). Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1643–1654.
Févotte, C. & Cemgil, A. T. (2009). Nonnegative matrix factorizations as probabilistic inference in composite models. In 17th European Signal Processing Conference (EUSIPCO) (pp. 1913–1917). Glasgow, UK.
Févotte, C. & Idier, J. (2011). Algorithms for nonnegative matrix factorization with the β-divergence. Neural Computation, 23(9), 2421–2456.
Fuentes, B., Badeau, R., & Richard, G. (2011). Adaptive harmonic time-frequency decomposition of audio using shift-invariant PLCA. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 401–404). Prague, Czech Republic.
Grindlay, G. & Ellis, D. P. W. (2009). Multi-voice polyphonic music transcription using eigeninstruments. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 53–56). New Paltz, NY, USA.
Bibliography V
Grindlay, G. & Ellis, D. P. W. (2011). Transcribing multi-instrument polyphonic music with hierarchical eigeninstruments. IEEE Journal of Selected Topics in Signal Processing, 5(6), 1159–1169.
Harchaoui, Z., Bach, F., & Moulines, E. (2009). Kernel change-point analysis. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 21 (pp. 609–616). La Jolla, CA, USA: NIPS Foundation.
Harchaoui, Z. & Lévy-Leduc, C. (2008). Catching change-points with lasso. In J. C. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 20 (pp. 617–624). Cambridge, MA, USA: MIT Press.
Harchaoui, Z. & Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association, 105(492), 1480–1493.
Hennequin, R., Badeau, R., & David, B. (2011). Scale-invariant probabilistic latent component analysis. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 129–132). New Paltz, NY, USA.
Kemp, T., Schmidt, M., Westphal, M., & Waibel, A. (2000). Strategies for automatic segmentation of audio data. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3 (pp. 1423–1426). Istanbul, Turkey.
Bibliography VI
Klapuri, A. & Davy, M., Eds. (2006). Signal Processing Methods for Music Transcription. New York, NY, USA: Springer.
Kompass, R. (2007). A generalized divergence measure for nonnegative matrix factorization. Neural Computation, 19(3), 780–791.
Kotti, M., Moschou, V., & Kotropoulos, C. (2008). Speaker segmentation and clustering. Signal Processing, 88(5), 1091–1124.
Lai, T. L. (1995). Sequential changepoint detection in quality control and dynamical systems. Journal of the Royal Statistical Society: Series B (Methodological), 57(4), 613–658.
Lai, T. L. & Xing, H. (2010). Sequential change-point detection when the pre- and post-change parameters are unknown. Sequential Analysis: Design Methods and Applications, 29(2), 162–175.
Lee, D. D. & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.
Lee, D. D. & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 13 (pp. 556–562). Cambridge, MA, USA: MIT Press.
Bibliography VII
Leveau, P., Daudet, L., & Richard, G. (2004). Methodology and tools for the evaluation of automatic onset detection algorithms in music. In 5th International Conference on Music Information Retrieval (ISMIR) (pp. 72–75). Barcelona, Spain.
Marolt, M. (2009). Non-negative matrix factorization with selective sparsity constraints for transcription of bell chiming recordings. In 6th Sound and Music Computing Conference (SMC) (pp. 137–142). Porto, Portugal.
Mei, Y. (2006). Sequential change-point detection when unknown parameters are present in the pre-change distribution. The Annals of Statistics, 34(1), 92–122.
Mysore, G. J. & Smaragdis, P. (2009). Relative pitch estimation of multiple instruments. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 313–316). Taipei, Taiwan.
Nakano, M., Kameoka, H., Le Roux, J., Kitano, Y., Ono, N., & Sagayama, S. (2010). Convergence-guaranteed multiplicative algorithms for nonnegative matrix factorization with β-divergence. In IEEE International Workshop on Machine Learning for Signal Processing (MLSP) (pp. 283–288). Kittilä, Finland.
Niedermayer, B. (2008). Non-negative matrix division for the automatic transcription of polyphonic music. In 9th International Conference on Music Information Retrieval (ISMIR) (pp. 544–549). Philadelphia, PA, USA.
Paatero, P. (1997). Least squares formulation of robust non-negative factor analysis. Chemometrics and Intelligent Laboratory Systems, 37(1), 23–35.
Bibliography VIII
Paatero, P. & Tapper, U. (1994). Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2), 111–126.
Polunchenko, A. S. & Tartakovsky, A. G. (2012). State-of-the-art in sequential change-point detection. Methodology and Computing in Applied Probability, 14(3), 649–684.
Poor, H. V. & Hadjiliadis, O. (2009). Quickest Detection. New York, NY, USA: Cambridge University Press.
Raczyński, S. A., Ono, N., & Sagayama, S. (2007). Multipitch analysis with harmonic nonnegative matrix approximation. In 8th International Conference on Music Information Retrieval (ISMIR) (pp. 381–386). Vienna, Austria.
Röbel, A. (2011). Onset detection by means of transient peak classification. In Music Information Retrieval Evaluation eXchange (MIREX).
Sha, F. & Saul, L. K. (2005). Real-time pitch determination of one or more voices by nonnegative matrix factorization. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 17 (pp. 1233–1240). Cambridge, MA, USA: MIT Press.
Siegmund, D. & Venkatraman, E. S. (1995). Using the generalized likelihood ratio statistic for sequential detection of a change-point. The Annals of Statistics, 23(1), 255–271.
Bibliography IX
Smaragdis, P. & Brown, J. C. (2003). Non-negative matrix factorization for polyphonic music transcription. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 177–180). New Paltz, NY, USA.
Smaragdis, P., Raj, B., & Shashanka, M. (2008). Sparse and shift-invariant feature extraction from non-negative data. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 2069–2072). Las Vegas, NV, USA.
Vert, J.-P. & Bleakley, K. (2010). Fast detection of multiple change-points shared by many signals using group LARS. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems (NIPS), volume 23 (pp. 2343–2351). La Jolla, CA, USA: NIPS Foundation.
Vincent, E., Bertin, N., & Badeau, R. (2010). Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Transactions on Audio, Speech, and Language Processing, 18(3), 528–537.
Virtanen, T. & Klapuri, A. (2006). Analysis of polyphonic audio using source-filter model and non-negative matrix factorization. In NIPS Workshop on Advances in Models for Acoustic Processing, Whistler, Canada.
Yeh, C., Röbel, A., & Rodet, X. (2010). Multiple fundamental frequency estimation and polyphony inference of polyphonic music signals. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1116–1126.