Transcript
Page 1: CS 59000 Statistical Machine learning Lecture 15

CS 59000 Statistical Machine Learning, Lecture 15

Yuan (Alan) Qi, Purdue CS

Oct. 21, 2008

Outline

• Review of Gaussian Processes (GPs)
• From linear regression to GP
• GP for regression
• Learning hyperparameters
• Automatic Relevance Determination
• GP for classification

Gaussian Processes

How do kernels arise naturally in a Bayesian setting?

Instead of assigning a prior on the parameters w, we assign a prior directly on the function values y. In theory this is an infinite-dimensional space; in practice the space is finite, since we only need the function values at the finite set of training and test points.

Linear Regression Revisited

Let y(x) = w^T φ(x), with a Gaussian prior on the parameters: p(w) = N(w | 0, α^{-1} I).

We have y = Φw, where Φ is the design matrix with elements Φ_{nk} = φ_k(x_n). Since y is a linear function of the Gaussian variable w, y is itself Gaussian, with mean 0 and covariance (1/α) Φ Φ^T.

From Prior on Parameter to Prior on Function

The prior on the function values: p(y) = N(y | 0, K), where K = (1/α) Φ Φ^T, i.e., K_{nm} = (1/α) φ(x_n)^T φ(x_m) = k(x_n, x_m).

Stochastic Process

A stochastic process is specified by giving the joint distribution for any finite set of function values in a consistent manner. (Loosely speaking, consistency means that marginalizing the joint distribution down to a subset of the variables gives the same distribution as the one defined directly on that subset.)
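This consistency property can be checked numerically. A minimal sketch, not from the slides, assuming a squared-exponential kernel with illustrative parameters: the covariance the GP assigns directly to a subset of points equals the corresponding sub-block of the covariance over the full set, which is exactly what marginalizing a Gaussian produces.

```python
import numpy as np

def rbf_kernel(X1, X2, alpha=1.0, ell=1.0):
    # Squared-exponential kernel (an illustrative choice; any valid kernel works).
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return alpha * np.exp(-0.5 * d2 / ell**2)

# Consistency: the covariance over a subset of points is the corresponding
# sub-block of the covariance over the full set.
X = np.linspace(0, 1, 5).reshape(-1, 1)
K_full = rbf_kernel(X, X)            # joint covariance over all 5 points
idx = [0, 2, 4]                      # pick a subset of the points
K_sub = rbf_kernel(X[idx], X[idx])   # covariance defined directly on the subset
assert np.allclose(K_full[np.ix_(idx, idx)], K_sub)
```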

Gaussian Processes

The joint distribution of any finite set of function values is a multivariate Gaussian distribution.

Without any prior knowledge, we often set the mean to zero. The GP is then fully specified by the covariance: E[y(x_n) y(x_m)] = k(x_n, x_m).

Impact of Kernel Function

Covariance matrix: given by the kernel function.

Application: economics & finance.

Gaussian Process for Regression

Likelihood: p(t | y) = N(t | y, β^{-1} I_N)

Prior: p(y) = N(y | 0, K)

Marginal distribution: p(t) = ∫ p(t | y) p(y) dy = N(t | 0, C), where C = K + β^{-1} I_N.
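The marginal p(t) = N(t | 0, C) can be verified by Monte Carlo. The sketch below (the kernel, length scale, and β value are assumed for illustration, not taken from the slides) draws function values y from the prior, adds observation noise, and compares the empirical covariance of t against C = K + β^{-1} I.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 4.0                                     # noise precision (illustrative)
X = np.array([[0.0], [0.3], [1.0]])
K = np.exp(-0.5 * (X - X.T) ** 2 / 0.5**2)     # prior covariance of y
C = K + np.eye(3) / beta                       # marginal covariance of t

# Monte Carlo check: t = y + noise, with y ~ N(0, K), has covariance C.
L = np.linalg.cholesky(K + 1e-10 * np.eye(3))  # tiny jitter for stability
y = (L @ rng.normal(size=(3, 200_000))).T      # samples from the prior
t = y + rng.normal(0, 1 / np.sqrt(beta), y.shape)
C_hat = np.cov(t, rowvar=False)                # empirical covariance, close to C
```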

Samples of Data Points

Predictive Distribution

p(t_{N+1} | t) is a Gaussian distribution with mean and variance:

m(x_{N+1}) = k^T C_N^{-1} t
σ²(x_{N+1}) = c − k^T C_N^{-1} k

where k is the vector with elements k(x_n, x_{N+1}), c = k(x_{N+1}, x_{N+1}) + β^{-1}, and C_N = K + β^{-1} I_N.

Predictive Mean

The predictive mean can be written as m(x_{N+1}) = Σ_n a_n k(x_n, x_{N+1}), where a_n is the nth component of C_N^{-1} t. We see the same form as in kernel ridge regression and kernel PCA.

GP Regression

Discussion: what is the difference between GP regression and Bayesian regression with Gaussian basis functions?

Computational Complexity

GP prediction for a new data point:

GP: O(N^3), where N is the number of data points.
Basis function model: O(M^3), where M is the dimension of the feature expansion.

When N is large, GP prediction is computationally expensive. Sparsification: make the prediction based on only a few data points (essentially making N small).

Learning Hyperparameters

Empirical Bayes methods: choose the hyperparameters θ (e.g., kernel parameters and noise precision) by maximizing the marginal likelihood p(t | θ).
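The Gaussian marginal p(t | θ) = N(t | 0, C) gives ln p(t | θ) = −(1/2) ln|C| − (1/2) t^T C^{-1} t − (N/2) ln(2π) in closed form. A sketch, not from the slides: it compares this quantity over a small grid of length scales (gradient-based optimization is what is used in practice; a grid keeps the sketch short, and the kernel and β value are assumed).

```python
import numpy as np

def log_marginal_likelihood(X, t, ell, beta):
    # ln p(t | theta) = -1/2 ln|C| - 1/2 t^T C^{-1} t - N/2 ln(2*pi)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    C = np.exp(-0.5 * d2 / ell**2) + np.eye(len(X)) / beta
    sign, logdet = np.linalg.slogdet(C)
    quad = t @ np.linalg.solve(C, t)
    return -0.5 * logdet - 0.5 * quad - 0.5 * len(t) * np.log(2 * np.pi)

# Empirical Bayes by grid search over the length scale.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (30, 1))
t = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, 30)
ells = [0.01, 0.1, 1.0, 10.0]
best_ell = max(ells, key=lambda e: log_marginal_likelihood(X, t, e, beta=100.0))
```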

Automatic Relevance Determination

Consider two-dimensional problems with the kernel k(x, x') = θ_0 exp{−(1/2) Σ_i η_i (x_i − x'_i)²}.

Maximizing the marginal likelihood will make certain η_i small, reducing the relevance of the corresponding input dimension to prediction.

Example

t = sin(2π x_1)
x_2 = x_1 + n, where n is additive noise
x_3 = ε, where ε is noise unrelated to the target

Gaussian Processes for Classification

Likelihood: p(t | a) = σ(a)^t (1 − σ(a))^{1−t}

GP Prior: p(a_{N+1}) = N(a_{N+1} | 0, C_{N+1})

Covariance function: C(x_n, x_m) = k(x_n, x_m) + ν δ_{nm}, where the term ν δ_{nm} ensures the covariance matrix is positive definite.

Sample from GP Prior
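The figure on this slide can be reproduced in a few lines: draw the latent function a(x) from N(0, K) via a Cholesky factor and squash it through the logistic sigmoid. The kernel and its length scale below are assumed, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 100).reshape(-1, 1)
K = np.exp(-0.5 * (x - x.T) ** 2 / 0.2**2)       # assumed SE covariance
L = np.linalg.cholesky(K + 1e-6 * np.eye(100))   # jitter for numerical stability
samples = L @ rng.normal(size=(100, 3))          # three draws of a(x) ~ N(0, K)
probs = 1 / (1 + np.exp(-samples))               # squash through the sigmoid
```

Each column of `probs` is one sampled function of class probabilities over the grid.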

Predictive Distribution

No analytical solution. Approximate this integration with:

• Laplace's method
• Variational Bayes
• Expectation propagation

Laplace’s method for GP Classification (1)

Goal: approximate the posterior p(a_N | t_N) ∝ p(t_N | a_N) p(a_N) with a Gaussian.

Laplace’s method for GP Classification (2)

Taylor expansion of Ψ(a_N) = ln p(t_N | a_N) + ln p(a_N):

∇Ψ(a_N) = t_N − σ_N − C_N^{-1} a_N
−∇∇Ψ(a_N) = W_N + C_N^{-1}, where W_N = diag(σ_n (1 − σ_n))

Laplace’s method for GP Classification (3)

Newton-Raphson update:

a_N^{new} = C_N (I + W_N C_N)^{-1} { t_N − σ_N + W_N a_N }
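This update can be iterated directly as written. A sketch on a toy problem (the kernel, length scale, and data are assumptions for illustration): at convergence, the gradient t_N − σ_N − C_N^{-1} a_N vanishes at the mode.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def laplace_mode(C, t, n_iter=20):
    """Find the mode of p(a|t) by Newton-Raphson for GP classification.

    Iterates  a_new = C (I + W C)^{-1} (t - sigma + W a),
    where W = diag(sigma * (1 - sigma)).
    """
    N = len(t)
    a = np.zeros(N)
    for _ in range(n_iter):
        s = sigmoid(a)
        W = np.diag(s * (1 - s))
        a = C @ np.linalg.solve(np.eye(N) + W @ C, t - s + W @ a)
    return a

# Toy problem: binary labels from a thresholded smooth function.
rng = np.random.default_rng(4)
X = np.sort(rng.uniform(-1, 1, 25))
C = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.3**2) + 1e-6 * np.eye(25)
t = (X > 0).astype(float)
a_mode = laplace_mode(C, t)
```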

Laplace’s method for GP Classification (4)

Gaussian approximation:

q(a_N) = N(a_N | a*_N, H^{-1}), where a*_N is the mode found above and H = W_N + C_N^{-1}.

Laplace’s method for GP Classification (5)

Question: How to get the mean and the variance above?

Predictive Distribution

p(t_{N+1} = 1 | t_N) ≈ ∫ σ(a_{N+1}) q(a_{N+1}) da_{N+1}, where q(a_{N+1}) is Gaussian with mean k^T (t_N − σ_N) and variance k(x_{N+1}, x_{N+1}) + ν − k^T (W_N^{-1} + C_N)^{-1} k.

Example
