CS485/685 Lecture 6: Jan 21, 2016
Linear Regression by Maximum Likelihood, Maximum A Posteriori and Bayesian Learning
[B] Sections 3.1 – 3.3, [M] Chapt. 7
(c) 2016 P. Poupart
Transcript
Page 1: CS485/685 Lecture 6: Jan 21, 2016

Linear Regression by Maximum Likelihood, Maximum A Posteriori and Bayesian Learning
[B] Sections 3.1 – 3.3, [M] Chapt. 7

Page 2: Noisy Linear Regression

• Assume the output $y$ is obtained from the input $\mathbf{x}$ by a deterministic function $f(\mathbf{x};\mathbf{w}) = \mathbf{w}^T\mathbf{x}$ that has been perturbed by noise (i.e., a noisy measurement):
  $$y = \mathbf{w}^T\mathbf{x} + \epsilon$$

• Gaussian noise: $\epsilon \sim \mathcal{N}(0, \sigma^2)$, so that
  $$\Pr(y \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}\big(y \mid \mathbf{w}^T\mathbf{x}, \sigma^2\big)$$
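As an illustration of this generative model, here is a minimal sketch that samples noisy targets from a linear function with Gaussian noise. The names and values (w_true, sigma, the input range) are made-up examples, not from the lecture:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical ground-truth weights; the bias term is folded in as a constant first feature.
    w_true = np.array([1.0, 2.0, -0.5])
    sigma = 0.3                       # standard deviation of the Gaussian noise

    n = 100
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])
    y = X @ w_true + rng.normal(0.0, sigma, size=n)   # y = w^T x + eps, eps ~ N(0, sigma^2)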

Page 3: Maximum Likelihood

• Possible objective: find the best $\mathbf{w}^*$ by maximizing the likelihood of the data:
  $$\mathbf{w}^* = \arg\max_{\mathbf{w}} \prod_{n=1}^{N} \mathcal{N}\big(y_n \mid \mathbf{w}^T\mathbf{x}_n, \sigma^2\big)$$

• We arrive at the original least-squares problem!
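The step behind the last bullet, reconstructed here since the slide's equations were not captured: taking the negative log of the Gaussian likelihood turns the product into a sum of squared errors, and the additive and multiplicative constants do not change the minimizer.

$$\begin{aligned}
\mathbf{w}^* &= \arg\max_{\mathbf{w}} \prod_{n=1}^{N} \mathcal{N}\big(y_n \mid \mathbf{w}^T\mathbf{x}_n, \sigma^2\big)
 = \arg\min_{\mathbf{w}} \; \sum_{n=1}^{N} \frac{\big(y_n - \mathbf{w}^T\mathbf{x}_n\big)^2}{2\sigma^2} + \frac{N}{2}\log\big(2\pi\sigma^2\big) \\
 &= \arg\min_{\mathbf{w}} \; \tfrac{1}{2}\sum_{n=1}^{N} \big(y_n - \mathbf{w}^T\mathbf{x}_n\big)^2
\end{aligned}$$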

Page 4: Maximum A Posteriori

• Alternative objective: find the $\mathbf{w}^*$ with the highest posterior probability

• Consider a Gaussian prior: $\Pr(\mathbf{w}) = \mathcal{N}\big(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}\big)$

• Posterior: $\Pr(\mathbf{w} \mid \text{data}) \propto \Pr(\text{data} \mid \mathbf{w}) \, \Pr(\mathbf{w})$
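Written out with Bayes' rule in the notation above (a reconstruction; the slide's own formula was not captured), where $X$ is the matrix of training inputs and $\mathbf{y}$ the vector of targets, and the normalizer does not depend on $\mathbf{w}$:

$$\Pr(\mathbf{w} \mid X, \mathbf{y}) = \frac{\Pr(\mathbf{y} \mid X, \mathbf{w})\,\Pr(\mathbf{w})}{\Pr(\mathbf{y} \mid X)}
\;\propto\; \Big[\prod_{n=1}^{N}\mathcal{N}\big(y_n \mid \mathbf{w}^T\mathbf{x}_n, \sigma^2\big)\Big]\,\mathcal{N}\big(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}\big)$$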

Page 5: Maximum A Posteriori

• Optimization:
  $$\mathbf{w}^* = \arg\max_{\mathbf{w}} \Pr(\mathbf{w} \mid \text{data})
  = \arg\min_{\mathbf{w}} \; \frac{1}{2\sigma^2}\sum_{n=1}^{N}\big(y_n - \mathbf{w}^T\mathbf{x}_n\big)^2 + \frac{\alpha}{2}\,\mathbf{w}^T\mathbf{w}$$

• Let $\lambda = \alpha\sigma^2$; then
  $$\mathbf{w}^* = \arg\min_{\mathbf{w}} \; \tfrac{1}{2}\sum_{n=1}^{N}\big(y_n - \mathbf{w}^T\mathbf{x}_n\big)^2 + \frac{\lambda}{2}\,\|\mathbf{w}\|^2$$

• We arrive at the original regularized least-squares problem!
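A minimal sketch of both estimators in closed form (illustrative code, not from the lecture; the function names fit_ml and fit_map are made up here):

    import numpy as np

    def fit_ml(X, y):
        # Maximum likelihood = ordinary least squares: w = argmin ||y - Xw||^2
        return np.linalg.lstsq(X, y, rcond=None)[0]

    def fit_map(X, y, lam):
        # MAP with a zero-mean isotropic Gaussian prior = regularized least squares (ridge):
        # w = (lam * I + X^T X)^{-1} X^T y
        d = X.shape[1]
        return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

With the noisy data sketched after the Noisy Linear Regression slide, fit_ml(X, y) recovers weights close to w_true, while fit_map(X, y, lam=1.0) shrinks them toward zero.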

Page 6: Expected Squared Loss

• Even though we use a statistical framework, it is interesting to evaluate the expected squared loss:
  $$\mathbb{E}[L] = \iint \big(f(\mathbf{x};\mathbf{w}) - y\big)^2 \,\Pr(\mathbf{x}, y)\, d\mathbf{x}\, dy$$

• Let $h(\mathbf{x}) = \mathbb{E}[y \mid \mathbf{x}]$. Writing $f(\mathbf{x};\mathbf{w}) - y = \big(f(\mathbf{x};\mathbf{w}) - h(\mathbf{x})\big) + \big(h(\mathbf{x}) - y\big)$ and using that the cross term has expectation 0:
  $$\mathbb{E}[L] = \underbrace{\int \big(f(\mathbf{x};\mathbf{w}) - h(\mathbf{x})\big)^2 \Pr(\mathbf{x})\, d\mathbf{x}}_{\text{error (depends on } \mathbf{w})}
  \;+\; \underbrace{\iint \big(h(\mathbf{x}) - y\big)^2 \Pr(\mathbf{x}, y)\, d\mathbf{x}\, dy}_{\text{noise (constant)}}$$
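The "expectation is 0" annotation refers to the cross term of this expansion. Spelled out (a reconstruction of the standard step, cf. [B] Section 3.2), integrating out $y$ first makes it vanish:

$$2\iint \big(f(\mathbf{x};\mathbf{w}) - h(\mathbf{x})\big)\big(h(\mathbf{x}) - y\big)\,\Pr(\mathbf{x}, y)\, d\mathbf{x}\, dy
= 2\int \big(f(\mathbf{x};\mathbf{w}) - h(\mathbf{x})\big)\,\underbrace{\big(h(\mathbf{x}) - \mathbb{E}[y \mid \mathbf{x}]\big)}_{=\,0}\,\Pr(\mathbf{x})\, d\mathbf{x} = 0$$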

Page 7: Expected Squared Loss

• Let's focus on the error part, which depends on $\mathbf{w}$:
  $$\int \big(f(\mathbf{x};\mathbf{w}) - h(\mathbf{x})\big)^2 \Pr(\mathbf{x})\, d\mathbf{x}$$

• But the choice of $\mathbf{w}$ depends on the dataset $D$
• Instead, consider the expectation with respect to $D$:
  $$\mathbb{E}_D\!\Big[\big(f(\mathbf{x};\mathbf{w}_D) - h(\mathbf{x})\big)^2\Big]$$
  where $\mathbf{w}_D$ is the weight vector obtained based on $D$

Page 8: Bias-Variance Decomposition

• Decompose the squared loss by adding and subtracting $\mathbb{E}_D[f(\mathbf{x};\mathbf{w}_D)]$; the cross term again has expectation 0:
  $$\mathbb{E}_D\!\Big[\big(f(\mathbf{x};\mathbf{w}_D) - h(\mathbf{x})\big)^2\Big]
  = \underbrace{\big(\mathbb{E}_D[f(\mathbf{x};\mathbf{w}_D)] - h(\mathbf{x})\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\Big[\big(f(\mathbf{x};\mathbf{w}_D) - \mathbb{E}_D[f(\mathbf{x};\mathbf{w}_D)]\big)^2\Big]}_{\text{variance}}$$
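The vanishing cross term is the same trick as before, now with the expectation over datasets (reconstructed step):

$$2\,\mathbb{E}_D\!\Big[f(\mathbf{x};\mathbf{w}_D) - \mathbb{E}_D[f(\mathbf{x};\mathbf{w}_D)]\Big]\,\big(\mathbb{E}_D[f(\mathbf{x};\mathbf{w}_D)] - h(\mathbf{x})\big) = 0$$

since the second factor does not depend on $D$ and the first factor has expectation $\mathbb{E}_D[f(\mathbf{x};\mathbf{w}_D)] - \mathbb{E}_D[f(\mathbf{x};\mathbf{w}_D)] = 0$.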

Page 9: Bias-Variance Decomposition

• Hence:
  $$\text{expected squared loss} = \text{bias}^2 + \text{variance} + \text{noise}$$
  where
  $$\text{bias}^2 = \int \big(\mathbb{E}_D[f(\mathbf{x};\mathbf{w}_D)] - h(\mathbf{x})\big)^2 \Pr(\mathbf{x})\, d\mathbf{x}$$
  $$\text{variance} = \int \mathbb{E}_D\!\Big[\big(f(\mathbf{x};\mathbf{w}_D) - \mathbb{E}_D[f(\mathbf{x};\mathbf{w}_D)]\big)^2\Big] \Pr(\mathbf{x})\, d\mathbf{x}$$
  $$\text{noise} = \iint \big(h(\mathbf{x}) - y\big)^2 \Pr(\mathbf{x}, y)\, d\mathbf{x}\, dy$$

• Picture: [figure not captured in the transcript]
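To make the decomposition concrete, here is a small simulation. The setup is assumed for this note, not taken from the lecture: $h(x) = \sin(2\pi x)$, Gaussian noise, and a degree-9 polynomial fit with ridge regularization, in the spirit of the example in [B] Section 3.2. It estimates bias$^2$ and variance by refitting on many resampled datasets; larger $\lambda$ increases bias and decreases variance.

    import numpy as np

    rng = np.random.default_rng(0)

    def h(x):
        # "True" underlying function (assumed for this demo)
        return np.sin(2 * np.pi * x)

    def sample_dataset(n=25, sigma=0.3):
        x = rng.uniform(0, 1, n)
        y = h(x) + rng.normal(0, sigma, n)
        return x, y

    def design(x, degree=9):
        # Polynomial features 1, x, x^2, ..., x^degree
        return np.vander(x, degree + 1, increasing=True)

    def fit_ridge(x, y, lam, degree=9):
        Phi = design(x, degree)
        d = Phi.shape[1]
        return np.linalg.solve(lam * np.eye(d) + Phi.T @ Phi, Phi.T @ y)

    x_test = np.linspace(0, 1, 200)
    Phi_test = design(x_test)

    for lam in [1e-6, 1e-2, 10.0]:
        # Refit on 200 independently sampled datasets and average the predictions
        preds = np.array([Phi_test @ fit_ridge(*sample_dataset(), lam) for _ in range(200)])
        mean_pred = preds.mean(axis=0)
        bias2 = np.mean((mean_pred - h(x_test)) ** 2)
        variance = np.mean(preds.var(axis=0))
        print(f"lambda={lam:g}  bias^2={bias2:.4f}  variance={variance:.4f}")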

Page 10: Bias-Variance Decomposition

• Example [figures not captured in the transcript]

Page 11: Bayesian Linear Regression

• We don't know whether $\mathbf{w}^*$ is the true underlying weight vector
• Instead of making predictions according to $\mathbf{w}^*$ alone, compute the weighted average prediction according to the posterior $\Pr(\mathbf{w} \mid \text{data})$,
  where $\Pr(\mathbf{w} \mid \text{data}) \propto \Pr(\text{data} \mid \mathbf{w}) \, \Pr(\mathbf{w})$
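Spelled out in the notation above (a reconstruction; the slide's formula was not captured), the weighted average prediction is the marginal over the posterior:

$$\Pr(y^* \mid \mathbf{x}^*, X, \mathbf{y}) = \int \Pr(y^* \mid \mathbf{x}^*, \mathbf{w}) \, \Pr(\mathbf{w} \mid X, \mathbf{y}) \, d\mathbf{w}$$

This integral is evaluated on the Bayesian Prediction slide (page 14).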

Page 12: Bayesian Learning

Page 13: Bayesian Learning
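The equations on these two slides were not captured in the transcript. For reference, with the Gaussian likelihood and the prior $\Pr(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$ used above, the posterior derived in [B] Section 3.3.1 is again Gaussian:

$$\Pr(\mathbf{w} \mid X, \mathbf{y}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}, \mathbf{S}), \qquad
\mathbf{S}^{-1} = \alpha\mathbf{I} + \tfrac{1}{\sigma^2}\, X^T X, \qquad
\mathbf{m} = \tfrac{1}{\sigma^2}\, \mathbf{S}\, X^T \mathbf{y}$$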

Page 14: Bayesian Prediction

• Let $\mathbf{x}^*$ be the input for which we want a prediction and $y^*$ be the corresponding prediction:

  $$\Pr(y^* \mid \mathbf{x}^*, X, \mathbf{y}) = \int \Pr(y^* \mid \mathbf{x}^*, \mathbf{w}) \, \Pr(\mathbf{w} \mid X, \mathbf{y}) \, d\mathbf{w}$$

  $$= \int \mathcal{N}\big(y^* \mid \mathbf{w}^T\mathbf{x}^*, \sigma^2\big) \, \mathcal{N}(\mathbf{w} \mid \mathbf{m}, \mathbf{S}) \, d\mathbf{w}$$

  $$= \mathcal{N}\big(y^* \mid \mathbf{m}^T\mathbf{x}^*, \; \sigma^2 + \mathbf{x}^{*T} \mathbf{S}\, \mathbf{x}^*\big)$$
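A minimal numerical sketch of these two formulas, assuming the zero-mean isotropic Gaussian prior with precision $\alpha$ used above (illustrative code; the function names posterior and predict are made up here):

    import numpy as np

    def posterior(X, y, alpha, sigma2):
        # Posterior over w: N(w | m, S) with
        #   S^{-1} = alpha * I + (1/sigma2) * X^T X,   m = (1/sigma2) * S X^T y
        d = X.shape[1]
        S_inv = alpha * np.eye(d) + (X.T @ X) / sigma2
        S = np.linalg.inv(S_inv)
        m = S @ (X.T @ y) / sigma2
        return m, S

    def predict(x_star, m, S, sigma2):
        # Predictive distribution: N(y* | m^T x*, sigma2 + x*^T S x*)
        mean = m @ x_star
        var = sigma2 + x_star @ S @ x_star
        return mean, var

The predictive variance combines the observation noise $\sigma^2$ with the remaining uncertainty about $\mathbf{w}$, so it is larger for inputs that the training data constrain poorly.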