
Model Assessment & Selection

Jan 01, 2016

Transcript
Page 1: Model Assessment & Selection

Model Assessment & Selection

Prof. Liqing Zhang

Dept. Computer Science & Engineering,

Shanghai Jiaotong University

Page 2: Model Assessment & Selection

23/04/20 Model Assessment & Selection 2

Outline

• Bias, Variance and Model Complexity

• The Bias-Variance Decomposition

• Optimism of the Training Error Rate

• Estimates of In-Sample Prediction Error

• The Effective Number of Parameters

• The Bayesian Approach and BIC

• Minimum Description Length

• Vapnik-Chervonenkis Dimension

• Cross-Validation

• Bootstrap Methods

Page 3: Model Assessment & Selection


Bias, Variance & Model Complexity

Page 4: Model Assessment & Selection

Bias, Variance & Model Complexity

• The standard for model assessment: the generalization performance of a learning method.

– Model: a target $Y$ and a vector of inputs $X$
– Prediction model: $\hat f(X)$, estimated from a training set
– Loss function:

$$L(Y, \hat f(X)) = \begin{cases} (Y - \hat f(X))^2 & \text{squared error} \\ |Y - \hat f(X)| & \text{absolute error} \end{cases}$$

Page 5: Model Assessment & Selection

Bias, Variance & Model Complexity

• Errors: training error vs. generalization error

$$\overline{\mathrm{err}} = \frac{1}{N}\sum_{i=1}^{N} L(y_i, \hat f(x_i)) \quad \text{(training error)}$$

$$\mathrm{Err} = E[L(Y, \hat f(X))] \quad \text{(generalization error)}$$

• Typical loss functions for a categorical response $G$ taking $K$ values:

$$L(G, \hat G(X)) = I(G \ne \hat G(X)) \quad \text{(0-1 loss)}$$

$$L(G, \hat p(X)) = -2\sum_{k=1}^{K} I(G = k)\log \hat p_k(X) = -2\log \hat p_G(X) \quad \text{(log-likelihood)}$$

Page 6: Model Assessment & Selection

Bias-Variance Decomposition

• Basic model: $Y = f(X) + \varepsilon$, with $\varepsilon \sim N(0, \sigma_\varepsilon^2)$.

• The expected prediction error of a regression fit $\hat f(X)$ at a point $X = x_0$:

$$\mathrm{Err}(x_0) = E[(Y - \hat f(x_0))^2 \mid X = x_0] = \sigma_\varepsilon^2 + [E\hat f(x_0) - f(x_0)]^2 + E[\hat f(x_0) - E\hat f(x_0)]^2$$

$$= \sigma_\varepsilon^2 + \mathrm{Bias}^2(\hat f(x_0)) + \mathrm{Var}(\hat f(x_0)) = \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}$$

• The more complex the model, the lower the (squared) bias but the higher the variance.
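The decomposition above can be checked by simulation. A minimal sketch, assuming a toy setup chosen here (true function $f(x) = 2x$, and an intentionally biased estimator that always fits a constant, the sample mean of the responses); none of these choices come from the slides:

```python
import random

# Estimate Err(x0), Bias^2 and Var for a deliberately biased estimator
# (the constant model f_hat = mean(y)) by refitting on many training sets.

def simulate_decomposition(x0=1.5, sigma=0.5, n=50, reps=4000, seed=0):
    rng = random.Random(seed)
    f = lambda x: 2.0 * x                      # assumed true function
    preds = []
    for _ in range(reps):
        xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
        ys = [f(x) + rng.gauss(0.0, sigma) for x in xs]
        preds.append(sum(ys) / n)              # constant-model fit at x0
    mean_pred = sum(preds) / reps
    var = sum((p - mean_pred) ** 2 for p in preds) / reps
    bias2 = (mean_pred - f(x0)) ** 2
    err = sigma ** 2 + bias2 + var             # irreducible + bias^2 + var
    return err, sigma ** 2, bias2, var
```

Here $E\hat f(x_0)$ is the mean of $2X$ over $U(0,2)$, i.e. 2, while $f(1.5) = 3$, so the squared bias should come out near 1, with a small variance term.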

Page 7: Model Assessment & Selection

Bias-Variance Decomposition

• For the k-NN regression fit, the prediction error is

$$\mathrm{Err}(x_0) = E[(Y - \hat f_k(x_0))^2 \mid X = x_0] = \sigma_\varepsilon^2 + \Big[f(x_0) - \frac{1}{k}\sum_{\ell=1}^{k} f(x_{(\ell)})\Big]^2 + \frac{\sigma_\varepsilon^2}{k}$$

• For the linear model fit $\hat f_p(x) = x^T \hat\beta$:

$$\mathrm{Err}(x_0) = E[(Y - \hat f_p(x_0))^2 \mid X = x_0] = \sigma_\varepsilon^2 + [f(x_0) - E\hat f_p(x_0)]^2 + \|h(x_0)\|^2 \sigma_\varepsilon^2$$

where $h(x_0)$ is the vector of linear weights that produce the fit at $x_0$.
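The variance term $\sigma_\varepsilon^2/k$ in the k-NN formula follows because, with the k nearest inputs held fixed, the k-NN prediction is an average of $k$ independent noisy responses. A minimal numerical check, assuming arbitrary illustrative values for $f(x_{(\ell)})$, $\sigma$, and $k$:

```python
import random

# Hold the k nearest training inputs fixed and redraw only the noise; the
# k-NN prediction averages k noisy responses, so Var = sigma^2 / k.

def knn_pred_variance(k=5, sigma=1.0, reps=20000, seed=1):
    rng = random.Random(seed)
    f_neighbors = [0.3, 0.5, 0.7, 0.9, 1.1][:k]   # fixed f(x_(l)) values
    preds = []
    for _ in range(reps):
        ys = [fx + rng.gauss(0.0, sigma) for fx in f_neighbors]
        preds.append(sum(ys) / k)                  # k-NN fit at x0
    m = sum(preds) / reps
    return sum((p - m) ** 2 for p in preds) / reps  # ~ sigma^2 / k
```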

Page 8: Model Assessment & Selection

Bias-Variance Decomposition

• The average in-sample error of the linear model:

$$\frac{1}{N}\sum_{i=1}^{N} \mathrm{Err}(x_i) = \sigma_\varepsilon^2 + \frac{1}{N}\sum_{i=1}^{N} [f(x_i) - E\hat f(x_i)]^2 + \frac{p}{N}\sigma_\varepsilon^2$$

– The model complexity is directly related to the number of parameters $p$.

• For ridge regression, the squared bias splits into model bias and estimation bias:

$$E_{x_0}[f(x_0) - E\hat f_\alpha(x_0)]^2 = E_{x_0}[f(x_0) - x_0^T\beta_*]^2 + E_{x_0}[x_0^T\beta_* - E x_0^T\hat\beta_\alpha]^2$$

$$= \text{Ave[Model Bias]}^2 + \text{Ave[Estimation Bias]}^2$$

where $\beta_*$ is the best-fitting linear approximation to $f$.

Page 9: Model Assessment & Selection

Bias-Variance Decomposition

• Schematic of the behavior of bias and variance:

[Figure: the truth lies outside the MODEL SPACE; the closest fit in population sits at model-bias distance from the truth; within the RESTRICTED MODEL SPACE of a regularized fit, the closest fit incurs an additional estimation bias, while individual realizations scatter around it with estimation variance.]

Page 10: Model Assessment & Selection

Optimism of the Training Error Rate

• Typically, Training Error < True Error:

$$\overline{\mathrm{err}} = \frac{1}{N}\sum_{i=1}^{N} L(y_i, \hat f(x_i)) \quad \text{(training error)}$$

$$\mathrm{Err} = E[L(Y, \hat f(X))] \quad \text{(true error)}$$

• $\mathrm{Err}$ is extra-sample error: the test point need not coincide with a training input.

• The in-sample error fixes the training inputs and draws new responses at them:

$$\mathrm{Err}_{in} = \frac{1}{N}\sum_{i=1}^{N} E_{Y^{new}}\big[L(Y_i^{new}, \hat f(x_i))\big]$$

• Optimism: $op \equiv \mathrm{Err}_{in} - \overline{\mathrm{err}}$.

Page 11: Model Assessment & Selection

Optimism of the Training Error Rate

• For squared error, 0-1, and other loss functions, the average optimism is

$$\omega = \frac{2}{N}\sum_{i=1}^{N} \mathrm{Cov}(\hat y_i, y_i), \qquad \text{so} \qquad E_y[\mathrm{Err}_{in}] = E_y[\overline{\mathrm{err}}] + \frac{2}{N}\sum_{i=1}^{N} \mathrm{Cov}(\hat y_i, y_i)$$

• If $\hat y_i$ is obtained by a linear fit with $d$ inputs or basis functions, a simplification is

$$\sum_{i=1}^{N} \mathrm{Cov}(\hat y_i, y_i) = d\,\sigma_\varepsilon^2, \qquad E_y[\mathrm{Err}_{in}] = E_y[\overline{\mathrm{err}}] + 2\frac{d}{N}\sigma_\varepsilon^2$$

• The optimism grows as the number of inputs or basis functions increases, and shrinks as the number of training samples increases.
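The identity $\sum_i \mathrm{Cov}(\hat y_i, y_i) = d\,\sigma_\varepsilon^2$ can be checked by Monte Carlo. A minimal sketch, assuming simple linear regression (intercept plus slope, so $d = 2$) on a fixed design; the design, noise level, and true coefficients are illustrative choices:

```python
import random

# Repeatedly redraw y on a fixed design, refit OLS, and estimate the sum
# of covariances between fitted and observed values; for d = 2 parameters
# and sigma = 1 it should be near d * sigma^2 = 2.

def estimate_cov_sum(N=20, sigma=1.0, reps=5000, seed=2):
    rng = random.Random(seed)
    xs = [i / (N - 1) for i in range(N)]
    xbar = sum(xs) / N
    sxx = sum((x - xbar) ** 2 for x in xs)
    ys_all, yhats_all = [], []
    for _ in range(reps):
        ys = [2.0 + 3.0 * x + rng.gauss(0.0, sigma) for x in xs]
        ybar = sum(ys) / N
        b1 = sum((x - xbar) * y for x, y in zip(xs, ys)) / sxx
        b0 = ybar - b1 * xbar
        yhats_all.append([b0 + b1 * x for x in xs])
        ys_all.append(ys)
    total = 0.0
    for i in range(N):                      # empirical sum_i Cov(yhat_i, y_i)
        yi = [ys_all[r][i] for r in range(reps)]
        yh = [yhats_all[r][i] for r in range(reps)]
        mi, mh = sum(yi) / reps, sum(yh) / reps
        total += sum((a - mi) * (b - mh) for a, b in zip(yi, yh)) / reps
    return total
```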

Page 12: Model Assessment & Selection

Estimates of In-sample Prediction Error

• The general form of the in-sample estimates is

$$\widehat{\mathrm{Err}}_{in} = \overline{\mathrm{err}} + \hat\omega$$

• When $d$ parameters are fit under squared error loss, this gives the $C_p$ statistic:

$$C_p = \overline{\mathrm{err}} + 2\frac{d}{N}\hat\sigma_\varepsilon^2$$

• Using a log-likelihood function to estimate $\mathrm{Err}_{in}$:

$$-2\,E[\log \Pr\nolimits_{\hat\theta}(Y)] \approx -\frac{2}{N}E[\mathrm{loglik}] + 2\frac{d}{N}, \qquad \mathrm{loglik} = \sum_{i=1}^{N}\log \Pr\nolimits_{\hat\theta}(y_i)$$

– This relationship introduces the Akaike Information Criterion.

Page 13: Model Assessment & Selection

Akaike Information Criterion

• The Akaike Information Criterion is a similar but more generally applicable estimate of $\mathrm{Err}_{in}$:

$$\mathrm{AIC} = -\frac{2}{N}\mathrm{loglik} + 2\frac{d}{N}$$

• Given a set of models $f_\alpha(x)$ indexed by a tuning parameter $\alpha$:

$$\mathrm{AIC}(\alpha) = \overline{\mathrm{err}}(\alpha) + 2\frac{d(\alpha)}{N}\hat\sigma_\varepsilon^2$$

where $\overline{\mathrm{err}}(\alpha)$ is the training error and $d(\alpha)$ the number of parameters.

• $\mathrm{AIC}(\alpha)$ provides an estimate of the test error curve; we choose the tuning parameter $\hat\alpha$ that minimizes it, giving the final model $f_{\hat\alpha}(x)$.
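AIC-based selection can be sketched in a few lines. A minimal example, assuming two nested candidate models (constant mean vs. straight line) and the Gaussian profile form $\mathrm{AIC} = N\log(\mathrm{RSS}/N) + 2d$; the simulated data and true line are illustrative, not from the slides:

```python
import math, random

# Fit each candidate model, compute its Gaussian-form AIC, pick the minimum.

def fit_rss(xs, ys, linear):
    N = len(xs)
    if not linear:                      # constant model, d = 1
        m = sum(ys) / N
        return sum((y - m) ** 2 for y in ys)
    xbar, ybar = sum(xs) / N, sum(ys) / N
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

def select_by_aic(xs, ys):
    N = len(xs)
    aics = {}
    for name, linear, d in [("constant", False, 1), ("line", True, 2)]:
        rss = fit_rss(xs, ys, linear)
        aics[name] = N * math.log(rss / N) + 2 * d
    return min(aics, key=aics.get), aics

rng = random.Random(3)
xs = [i / 49 for i in range(50)]
ys = [1.0 + 4.0 * x + rng.gauss(0.0, 0.3) for x in xs]   # truly linear data
best, aics = select_by_aic(xs, ys)
```

With genuinely linear data, the line's much smaller RSS easily pays its extra-parameter penalty, so AIC should select it.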

Page 14: Model Assessment & Selection

Akaike Information Criterion

• For the logistic regression model, using the binomial log-likelihood:

$$\mathrm{AIC} = -\frac{2}{N}E[\log \mathrm{lik}] + 2\frac{d}{N}$$

• For the Gaussian model, the AIC statistic equals the $C_p$ statistic:

$$\mathrm{AIC} = C_p = \overline{\mathrm{err}} + 2\frac{d}{N}\hat\sigma_\varepsilon^2$$

Page 15: Model Assessment & Selection

Akaike Information Criterion

• Phoneme recognition example: the model is a basis expansion in $M$ basis functions,

$$f(X) = \sum_{k=1}^{M} h_k(X)\,\theta_k, \qquad d(\alpha) = M$$

Page 16: Model Assessment & Selection

Effective Number of Parameters

• A linear fitting method:

$$\hat{\mathbf y} = \mathbf S \mathbf y$$

where $\mathbf S$ is an $N \times N$ matrix depending on the inputs $x_i$ but not on the $y_i$.

• Effective number of parameters:

$$d(\mathbf S) = \mathrm{trace}(\mathbf S)$$

– If $\mathbf S$ is an orthogonal projection matrix onto a basis set spanned by $M$ features, then $\mathrm{trace}(\mathbf S) = M$.

– $\mathrm{trace}(\mathbf S)$ is the correct quantity to replace $d$ in the $C_p$ statistic.
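For ordinary least squares with an intercept and one input, the smoother $\mathbf S$ is the hat matrix, whose diagonal has the closed form $S_{ii} = 1/N + (x_i - \bar x)^2 / S_{xx}$, so its trace should equal the parameter count $p = 2$ exactly. A minimal sketch (the design points are arbitrary):

```python
# Effective number of parameters = trace(S) for the OLS projection matrix.

def ols_effective_df(xs):
    N = len(xs)
    xbar = sum(xs) / N
    sxx = sum((x - xbar) ** 2 for x in xs)
    # Sum the hat-matrix diagonal: 1/N terms add to 1, quadratic terms to 1.
    return sum(1.0 / N + (x - xbar) ** 2 / sxx for x in xs)

xs = [0.1 * i for i in range(15)]
df = ols_effective_df(xs)   # trace of the projection matrix
```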

Page 17: Model Assessment & Selection

Bayesian Approach & BIC

• The Bayesian Information Criterion (BIC):

$$\mathrm{BIC} = -2\,\mathrm{loglik} + (\log N)\,d$$

• Gaussian model with variance $\sigma_\varepsilon^2$:

$$-2\,\mathrm{loglik} = \sum_{i=1}^{N}\big(y_i - \hat f(x_i)\big)^2 / \sigma_\varepsilon^2 = N\cdot\overline{\mathrm{err}}/\sigma_\varepsilon^2 \quad \text{(up to a constant)}$$

– So

$$\mathrm{BIC} = \frac{N}{\sigma_\varepsilon^2}\Big[\overline{\mathrm{err}} + (\log N)\frac{d}{N}\sigma_\varepsilon^2\Big]$$

– BIC is proportional to AIC ($C_p$), with the factor 2 replaced by $\log N$. Since $\log N > 2$ for $N > e^2 \approx 7.4$, BIC tends to favor simpler models and penalize complex ones.
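The practical effect of the $\log N$ vs. 2 penalty can be seen on a small numerical example. A sketch, assuming the Gaussian forms $N\log(\mathrm{RSS}/N) + 2d$ for AIC and $N\log(\mathrm{RSS}/N) + (\log N)\,d$ for BIC; the RSS and parameter counts below are made-up illustrations:

```python
import math

# For N > e^2, BIC charges more per parameter than AIC, so a bigger model
# must cut RSS by more to win under BIC than under AIC.

def gaussian_ic(rss, N, d):
    base = N * math.log(rss / N)
    return {"aic": base + 2 * d, "bic": base + math.log(N) * d}

# A modest RSS improvement that pays AIC's penalty but not BIC's:
small = gaussian_ic(rss=10.0, N=100, d=2)
big   = gaussian_ic(rss=9.3,  N=100, d=5)
```

Here AIC prefers the 5-parameter model while BIC prefers the 2-parameter one, illustrating BIC's bias toward simplicity for large $N$.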

Page 18: Model Assessment & Selection

Bayesian Model Selection

• BIC is derived from Bayesian model selection.

• Candidate models $M_m$, $m = 1, \ldots, \mathcal{M}$, with model parameters $\theta_m$ and a prior distribution $\Pr(\theta_m \mid M_m)$.

• Posterior probability, where $Z = \{x_i, y_i\}_{i=1}^{N}$ represents the training data:

$$\Pr(M_m \mid Z) \propto \Pr(M_m)\cdot\Pr(Z \mid M_m)$$

$$\Pr(Z \mid M_m) = \int \Pr(Z \mid \theta_m, M_m)\,\Pr(\theta_m \mid M_m)\,d\theta_m$$

Page 19: Model Assessment & Selection

Bayesian Model Selection

• Compare two models $M_m$ and $M_\ell$ via the posterior odds:

$$\frac{\Pr(M_m \mid Z)}{\Pr(M_\ell \mid Z)} = \frac{\Pr(M_m)}{\Pr(M_\ell)}\cdot\frac{\Pr(Z \mid M_m)}{\Pr(Z \mid M_\ell)}$$

• If the odds are greater than 1, model $m$ is chosen; otherwise choose model $\ell$.

• Bayes factor:

$$\mathrm{BF}(Z) = \frac{\Pr(Z \mid M_m)}{\Pr(Z \mid M_\ell)}$$

– The contribution of the data to the posterior odds.

Page 20: Model Assessment & Selection

Bayesian Model Selection

• If the prior over models is uniform, $\Pr(M_m)$ is constant, and a Laplace approximation gives

$$\log \Pr(Z \mid M_m) = \log \Pr(Z \mid \hat\theta_m, M_m) - \frac{d_m}{2}\log N + O(1)$$

where $\hat\theta_m$ is the maximum-likelihood estimate of the model parameters and $d_m$ is the model dimension.

• Minimizing BIC is therefore equivalent to maximizing the (approximate) posterior model probability.

• Advantage: if the candidate set contains the true model, the probability that BIC selects it tends to one as the sample size goes to infinity.

Page 21: Model Assessment & Selection

Minimum Description Length (MDL)

• Origin: optimal coding.
• Messages: z1 z2 z3 z4
• Code 1: 0 10 110 111
• Code 2: 110 10 111 0
• Principle: use the shortest codes for the most frequently sent messages.

• Probability of sending message $z_i$: $\Pr(z_i)$.
• Shannon's theorem says to use code lengths $l_i = -\log_2 \Pr(z_i)$.

Page 22: Model Assessment & Selection

Minimum Description Length (MDL)

• The expected message length satisfies

$$E(\text{length}) \ge -\sum_i \Pr(z_i)\log_2 \Pr(z_i)$$

with equality when $\Pr(z_i) = A\,2^{-l_i}$.

• For example, with $\Pr(z_i) = 1/2,\ 1/4,\ 1/8,\ 1/8$, the code above attains the bound, and equality holds.
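The slide's example can be verified directly: with probabilities $(1/2, 1/4, 1/8, 1/8)$ and the code lengths $(1, 2, 3, 3)$ of code 0, 10, 110, 111, the expected code length exactly matches the entropy lower bound.

```python
import math

# Shannon bound: E[length] >= -sum_i Pr(z_i) * log2 Pr(z_i), with equality
# when l_i = -log2 Pr(z_i), as in the slide's example.

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]          # codewords 0, 10, 110, 111
```

Both quantities come out to 1.75 bits, confirming the code is optimal for this source.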

Page 23: Model Assessment & Selection

Model Selection via MDL

• Model $M$ with parameters $\theta$; data $Z = (X, \mathbf y)$ of inputs and outputs.

• Assume the model gives the conditional probability of the outputs, $\Pr(\mathbf y \mid \theta, M, X)$.

• The message length for transmitting the outputs is

$$\text{length} = -\log \Pr(\mathbf y \mid \theta, M, X) - \log \Pr(\theta \mid M)$$

– The first term is the cost of transmitting the discrepancy between the model and the actual target values; the second is the average cost of transmitting the model parameters.

Page 24: Model Assessment & Selection

Model Selection via MDL

• If $y \sim N(\theta, \sigma^2)$ with parameter $\theta \sim N(0, 1)$, then

$$\text{length} = \text{const} + \log \sigma + \frac{(y - \theta)^2}{2\sigma^2}$$

so the smaller $\sigma$ is, the shorter the message length.

• The MDL principle: choose the model that minimizes the total description length

$$\text{length} = -\log \Pr(\mathbf y \mid \theta, M, X) - \log \Pr(\theta \mid M)$$

Page 25: Model Assessment & Selection

Vapnik-Chervonenkis Dimension

• Question: how should we choose the number of model parameters $d$?
• The parameter count is meant to represent the model's complexity.
• The VC dimension is an important measure of model complexity.

• Consider a family of indicator functions $\{f(x, \alpha)\}$ indexed by a parameter $\alpha$:

– Linear indicator functions $f(x, \alpha) = I(\alpha_0 + \alpha^T x > 0)$, $x \in \mathbb{R}^p$, have $p + 1$ parameters, matching their complexity.

– But $f(x, \alpha) = I(\sin(\alpha x) > 0)$, $x \in \mathbb{R}$, has only one parameter, yet its complexity is far higher, so counting parameters can be misleading.

Page 26: Model Assessment & Selection

VC Dimension

• The VC dimension of the class $\{f(x, \alpha)\}$ is defined as the largest number of points that can be shattered by members of $\{f(x, \alpha)\}$.

• The class of lines in the plane has VC dimension 3.
• $\sin(\alpha x)$ has infinite VC dimension.

[Figure: a rapidly oscillating $\sin(\alpha x)$ curve on $[0, 1]$, taking values in $[-1, 1]$.]

Page 27: Model Assessment & Selection

VC Dimension

• The VC dimension of a real-valued function class $\{g(x, \alpha)\}$ is defined as the VC dimension of the indicator class $\{I(g(x, \alpha) > 0)\}$.

• The VC dimension provides a bound on the generalization error. Let $\{f(x, \alpha)\}$ have VC dimension $h$, with $N$ samples. Then, with probability at least $1 - \eta$:

$$\text{(binary classification)} \quad \mathrm{Err} \le \overline{\mathrm{err}} + \frac{\epsilon}{2}\Big(1 + \sqrt{1 + \frac{4\,\overline{\mathrm{err}}}{\epsilon}}\Big)$$

$$\text{(regression)} \quad \mathrm{Err} \le \frac{\overline{\mathrm{err}}}{(1 - c\sqrt{\epsilon})_+}$$

$$\text{where} \quad \epsilon = a_1\,\frac{h[\log(a_2 N/h) + 1] - \log(\eta/4)}{N}$$

Page 28: Model Assessment & Selection

Cross-Validation

• Split the data into K parts (here K = 5); each part serves once as the validation set while the remaining parts form the training set:

    1            2            3            4            5
    Training     Training     Validation   Training     Training
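The splitting scheme above can be sketched as a generic K-fold routine. A minimal version, where `fit` and `loss` are placeholder callables supplied by the caller (the constant-mean example model is an illustration, not from the slides):

```python
# K-fold cross-validation: hold each contiguous fold out in turn, fit on
# the rest, and average the validation losses.

def k_fold_cv(xs, ys, K, fit, loss):
    N = len(xs)
    folds = [list(range(k * N // K, (k + 1) * N // K)) for k in range(K)]
    total, count = 0.0, 0
    for held_out in folds:
        held = set(held_out)
        train_x = [xs[i] for i in range(N) if i not in held]
        train_y = [ys[i] for i in range(N) if i not in held]
        model = fit(train_x, train_y)          # fit on K-1 training folds
        for i in held_out:                     # score on the held-out fold
            total += loss(ys[i], model(xs[i]))
            count += 1
    return total / count

# Example: CV error of the constant-mean model under squared error.
mean_fit = lambda tx, ty: (lambda x, m=sum(ty) / len(ty): m)
sq = lambda y, yhat: (y - yhat) ** 2
```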

Page 29: Model Assessment & Selection

Bootstrap Methods

• Basic idea: draw datasets from the training data by random sampling with replacement, each dataset the same size as the original training set.
• This produces B bootstrap datasets.
• How can these datasets be used for prediction?

• Bootstrap error estimate: if $\hat f^{*b}(x_i)$ is the prediction at $x_i$ from the model fit to the $b$-th bootstrap dataset,

$$\widehat{\mathrm{Err}}_{boot} = \frac{1}{B}\,\frac{1}{N}\sum_{b=1}^{B}\sum_{i=1}^{N} L\big(y_i,\, \hat f^{*b}(x_i)\big)$$
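The estimate above can be sketched directly from the formula. A minimal version, assuming a placeholder model (the constant mean, which ignores the inputs) and squared-error loss; a real use would substitute an actual fitting procedure:

```python
import random

# Err_boot = (1/B)(1/N) sum_b sum_i L(y_i, fhat*_b(x_i)): fit on each
# with-replacement resample, then evaluate on ALL original points.

def bootstrap_error(xs, ys, B, seed=4):
    rng = random.Random(seed)
    N = len(xs)
    total = 0.0
    for _ in range(B):
        idx = [rng.randrange(N) for _ in range(N)]   # sample with replacement
        m = sum(ys[i] for i in idx) / N              # constant-mean "fit"
        total += sum((y - m) ** 2 for y in ys) / N   # score on all N points
    return total / B
```

Note that each bootstrap fit is evaluated on the full original sample, including points it was trained on, which is why this estimator tends to be optimistic.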

Page 30: Model Assessment & Selection

Bootstrap Methods

• Schematic of the bootstrap process:

[Figure: from the training sample $Z = (z_1, z_2, \ldots, z_N)$, draw bootstrap replications $Z^{*1}, Z^{*2}, \ldots, Z^{*B}$, and compute the statistic of interest $S(Z^{*1}), S(Z^{*2}), \ldots, S(Z^{*B})$ on each.]

Page 31: Model Assessment & Selection

Summary

• Bias, Variance and Model Complexity
• The Bias-Variance Decomposition
• Optimism of the Training Error Rate
• Estimates of In-Sample Prediction Error
• The Effective Number of Parameters
• The Bayesian Approach and BIC
• Minimum Description Length
• Vapnik-Chervonenkis Dimension
• Cross-Validation
• Bootstrap Methods