Data mining and statistical learning - lecture 11: Neural networks - a model class providing a joint framework for prediction and classification

Dec 22, 2015

Page 1:

Neural networks

- a model class providing a joint framework for prediction and classification

Relationship to other prediction models

Some simple examples of neural networks

Parameter estimation

Joint framework for prediction and classification

Features of neural networks

Page 2:

Ordinary least squares regression (OLS)

Diagram: the inputs x1, x2, …, xp are connected directly to the output y

Model:

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \text{error}$$

$$y = \beta_0 + \beta^T X + \text{error}$$

Terminology:

$\beta_0$: intercept (or bias)

$\beta_1, \dots, \beta_p$: regression coefficients (or weights)

The response variable responds directly and linearly to changes in the inputs
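As a minimal sketch of how the intercept and a regression coefficient are estimated, the Python snippet below fits the one-input case $y = \beta_0 + \beta_1 x + \text{error}$ by least squares. The function name and the toy data are illustrative assumptions, not part of the lecture.

```python
def fit_ols_one_input(x, y):
    """Least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x + error."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) * (xi - x_mean) for xi in x)
    beta1 = sxy / sxx                 # regression coefficient (weight)
    beta0 = y_mean - beta1 * x_mean   # intercept (bias)
    return beta0, beta1

# Hypothetical data lying exactly on y = 1 + 2x, so the error term is zero.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
beta0, beta1 = fit_ols_one_input(x, y)
```

With several inputs the same criterion is minimised, but the estimates come from solving a system of linear equations rather than from these two closed-form expressions.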

Page 3:

Principal components regression (PCR)

Extract principal components (linear combinations of the inputs) as derived features, and then model the target (response) as a linear function of these features

$$Z_m = \alpha_m^T X, \quad m = 1, \dots, M$$

$$y = \beta_0 + \beta^T Z$$

Diagram: inputs x1, x2, …, xp → derived features z1, z2, …, zM → output y

The response variable responds indirectly and linearly to changes in the inputs
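The two PCR steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: it extracts a single component ($M = 1$) by power iteration on the sample covariance matrix, works with centered inputs, and uses made-up data. The function name `first_principal_component` is hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the sample covariance matrix, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [[sum(c[j] * c[k] for c in centered) / (n - 1) for k in range(p)]
           for j in range(p)]
    alpha = [1.0] * p   # starting vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        alpha = [sum(cov[j][k] * alpha[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(a * a for a in alpha))
        alpha = [a / norm for a in alpha]
    return alpha, centered

# Hypothetical data: two strongly correlated inputs and one response.
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
y = [2.0, 4.1, 6.2, 7.9, 10.1]

# Step 1: derived feature Z_1 = alpha_1^T X (computed on centered inputs).
alpha1, Xc = first_principal_component(X)
z1 = [sum(a * v for a, v in zip(alpha1, row)) for row in Xc]

# Step 2: linear model y = beta0 + beta1 * Z_1, fitted by least squares.
y_mean = sum(y) / len(y)
beta1 = sum(z * (yi - y_mean) for z, yi in zip(z1, y)) / sum(z * z for z in z1)
beta0 = y_mean   # z1 has mean zero, so the intercept equals the mean of y
fitted = [beta0 + beta1 * z for z in z1]
```

Because both steps are linear, the fitted response is still a linear function of the original inputs.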

Page 4:

Neural network with a single target

Diagram: inputs x1, x2, …, xp → hidden layer of neurons z1, z2, …, zM → output y

The response to changes in inputs is indirect and nonlinear

Page 5:

Neuron

$$\alpha_{0m} + \alpha_m^T X$$

$$Z_m = \sigma(\alpha_{0m} + \alpha_m^T X)$$

Figure: sigmoid activation function $\sigma$ (horizontal axis from -5 to 5, vertical axis from -1.5 to 1.5)
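A single neuron can be sketched in a few lines of Python. The slide does not say which sigmoid is plotted; the vertical axis running from -1.5 to 1.5 and the later instruction to "take tanh" suggest the hyperbolic tangent, so `tanh` is used as the default here and the logistic function is included for comparison. The function names, inputs, and weights are illustrative assumptions.

```python
import math

def logistic(v):
    """Logistic sigmoid 1 / (1 + exp(-v)), with values between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron(x, alpha0, alpha, activation=math.tanh):
    """Z_m = sigma(alpha_0m + alpha_m^T X) for one hidden neuron."""
    v = alpha0 + sum(a * xi for a, xi in zip(alpha, x))
    return activation(v)

x = [0.5, -1.0, 2.0]   # hypothetical inputs
z_tanh = neuron(x, alpha0=0.1, alpha=[1.0, 0.5, -0.25])
z_logistic = neuron(x, alpha0=0.1, alpha=[1.0, 0.5, -0.25], activation=logistic)
```

The two choices differ only by a shift and rescaling, since $\tanh(v) = 2\,\mathrm{logistic}(2v) - 1$.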

Page 6:

Neural networks with a single target

Extract linear combinations of the inputs

$$\alpha_{0m} + \alpha_m^T X, \quad m = 1, \dots, M$$

as derived features, and then model the target (response) as a linear function of a sigmoid function (activation function) of these features

$$Z_m = \sigma(\alpha_{0m} + \alpha_m^T X), \quad m = 1, \dots, M$$

$$y = \beta_0 + \beta^T Z$$

Diagram: inputs x1, x2, …, xp → hidden units z1, z2, …, zM → output y
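The model on this slide amounts to one forward pass through the network. The sketch below assumes `tanh` as the sigmoid and uses made-up weights for $p = 2$ inputs and $M = 3$ hidden neurons; the function name `predict` is an assumption.

```python
import math

def predict(x, alpha0, alpha, beta0, beta, activation=math.tanh):
    """y = beta0 + beta^T Z, where Z_m = activation(alpha0[m] + alpha[m]^T x)."""
    z = [activation(a0 + sum(w * xi for w, xi in zip(a, x)))
         for a0, a in zip(alpha0, alpha)]
    return beta0 + sum(b * zm for b, zm in zip(beta, z))

# Hypothetical weights for p = 2 inputs and M = 3 hidden neurons.
alpha0 = [0.0, 0.5, -0.5]
alpha = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]
beta0, beta = 0.2, [1.0, -0.5, 0.25]

y_hat = predict([0.0, 0.0], alpha0, alpha, beta0, beta)

# With the identity as activation the model collapses to a linear function of the inputs.
y_linear = predict([1.0, 2.0], alpha0, alpha, beta0, beta, activation=lambda v: v)
```

The only structural difference from a linear model in derived features is the nonlinear activation applied to each linear combination.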

Page 7:

Neural network with one input, one neuron, and one target

$$Z = \sigma(\alpha_0 + \alpha_1 X)$$

$$y = \beta_0 + \beta_1 Z$$

Diagram: input x → neuron z → output y

Figure: plot with horizontal axis from -5 to 5 and vertical axis from -1.5 to 1.5

Page 8:

Neural network with one input, one neuron, and one target

$$Z = \sigma(\alpha_0 + \alpha_1 X)$$

$$y = \beta_0 + \beta_1 Z$$

Diagram: input x → neuron z → output y

Figure: four panels of y plotted against x (x from -20 to 20), labelled with the parameter settings $(\alpha_0, \alpha_1, \beta_0, \beta_1) = (0, 0.1, 0, 1)$ and $(\alpha_0, \alpha_1, \beta_0, \beta_1) = (0, 1, 0, 1)$

Page 9:

Neural network with one input, one neuron, and one target

- a simple example

Select Advanced user interface

Select 1 hidden node

Tick Outputs from Training,…

Figure: y plotted against x (x from -15 to 15, y from -1.5 to 1.5)

Page 10:

Neural network with one input, one neuron, and one target

Figure: y and its predicted value P_y (horizontal axis from -15 to 15, vertical axis from -1.5 to 1.5)

Page 11:

Output from proc Neural

- one input, one neuron, one target

Parameter Estimates

| N | Parameter | Estimate | Gradient Objective Function |
|---|---|---|---|
| 1 | x_H11 | -5.851506 | 0.000000103 |
| 2 | BIAS_H11 | -0.032606 | -0.000001516 |
| 3 | H11_y | -1.017515 | 1.8123827E-8 |
| 4 | BIAS_y | -0.006434 | 1.2814216E-8 |

Value of Objective Function = 0.0106538302

H11 = Hidden layer 1, neuron 1

Page 12:

Neural network with one input, one neuron, and one target

- manual calculation of predicted values

Parameter estimates: x_H11 = -5.851506, BIAS_H11 = -0.032606, H11_y = -1.017515, BIAS_y = -0.006434

Standardize x to mean zero and variance one

Compute xstand*x_H11+BIAS_H11

Take tanh to compute z

Compute z*H11_y+BIAS_y
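The four steps above can be sketched in Python as follows. The estimates are copied from the proc Neural output; the x values are hypothetical because the lecture data are not reproduced here, and standardizing with the sample standard deviation (divisor n - 1) is an assumption about how the software scales the input.

```python
import math
import statistics

# Estimates copied from the proc Neural output for the one-neuron model.
X_H11, BIAS_H11 = -5.851506, -0.032606
H11_Y, BIAS_Y = -1.017515, -0.006434

def predict(xs):
    """Predicted values for a list of raw x values."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)   # sample standard deviation (divisor n - 1, an assumption)
    predictions = []
    for x in xs:
        xstand = (x - mean) / sd                     # standardize x
        z = math.tanh(xstand * X_H11 + BIAS_H11)     # hidden neuron
        predictions.append(z * H11_Y + BIAS_Y)       # output
    return predictions

xs = [float(v) for v in range(-15, 16)]   # hypothetical x values
p_y = predict(xs)
```

Because both x_H11 and H11_y are negative, the predicted value increases with x and levels off near -1 and 1 at the two ends.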

Page 13:

Neural networks with one input, two neurons, and one target

$$Z_m = \sigma(\alpha_{0m} + \alpha_{1m} X), \quad m = 1, 2$$

$$y = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2$$

Diagram: input x → neurons z1, z2 → output y

Figure: y plotted against x (x from -20 to 20, y from -0.4 to 0.4), labelled with the parameter values $\beta_0 = 0$; $\alpha_{01} = 0, \alpha_{11} = 1, \beta_1 = 1$; $\alpha_{02} = 0, \alpha_{12} = 0.5, \beta_2 = 1$ (any minus signs are not preserved in this transcript)
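To see how two neurons can produce a response that is not monotone in x, the sketch below evaluates $y = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2$ on a grid. The parameter values are hypothetical and chosen only for illustration (in particular the negative $\beta_2$); `tanh` is assumed as the activation function.

```python
import math

def two_neuron_response(x, a01, a11, b1, a02, a12, b2, b0=0.0):
    """y = b0 + b1 * tanh(a01 + a11 * x) + b2 * tanh(a02 + a12 * x)."""
    z1 = math.tanh(a01 + a11 * x)
    z2 = math.tanh(a02 + a12 * x)
    return b0 + b1 * z1 + b2 * z2

# Hypothetical parameters: the second neuron reacts more slowly and enters with opposite sign.
params = dict(a01=0.0, a11=1.0, b1=1.0, a02=0.0, a12=0.5, b2=-1.0)
xs = [x / 2 for x in range(-40, 41)]   # -20, -19.5, ..., 20
ys = [two_neuron_response(x, **params) for x in xs]
```

With these values the two sigmoids cancel far from zero, so y rises to a peak for small positive x, dips for small negative x, and returns to zero at both ends. A single neuron cannot produce such a shape.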

Page 14:

Output from proc Neural

- one input, two neurons, one target

Parameter Estimates

| N | Parameter | Estimate | Gradient Objective Function |
|---|---|---|---|
| 1 | x_H11 | -4.040296 | -0.000006221 |
| 2 | x_H12 | -4.755015 | 0.000008922 |
| 3 | BIAS_H11 | 0.449445 | -0.000046905 |
| 4 | BIAS_H12 | 0.176599 | 0.000092579 |
| 5 | H11_y | 0.767115 | 0.000009568 |
| 6 | H12_y | -1.781053 | 0.000026628 |
| 7 | BIAS_y | -0.014300 | -0.000086070 |

Value of Objective Function = 0.0104173896

Page 15:

Absorbance records for ten samples of chopped meat

Figure: absorbance (0.0 to 5.0) plotted against channel (1 to 100), one curve for each of Sample_1 to Sample_10

1 response variable (fat)

100 predictors (absorbance at 100 wavelengths or channels)

The predictors are strongly correlated with each other

Page 16:

Absorbance records for 215 samples of chopped meat

The target is only weakly correlated with each individual predictor

Figure: fat (%) plotted against absorbance in channel 50 (absorbance from 0 to 6, fat from 0 to 60)

Page 17:

Neural networks with a single target and many inputs

- the fat content and absorbance dataset

$$Z_m = \sigma(\alpha_{0m} + \alpha_m^T X), \quad m = 1, \dots, 3$$

$$y = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + \beta_3 Z_3$$

Diagram: inputs x1, x2, …, xp → neurons z1, z2, z3 → output y

A total of (p+2)*3+1 parameters are estimated
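The parameter count can be checked with a short calculation. The sketch below uses the general count for a single hidden layer, M(p+1)+K(M+1), which reduces to (p+2)*3+1 for three neurons and one target; the function name is an assumption.

```python
def n_parameters(p, M, K=1):
    """Number of weights in a network with p inputs, M hidden neurons and K targets."""
    hidden = M * (p + 1)   # p weights and one bias per hidden neuron
    output = K * (M + 1)   # M weights and one bias per target
    return hidden + output

p = 100                    # absorbance channels in the fat dataset
total = n_parameters(p, M=3)
```

For p = 100 this gives 307 parameters, estimated from 215 samples, which is one reason validation matters for this dataset. The same function gives 4 and 7 parameters for the one-input models with one and two neurons.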

Page 18:

Neural networks with a single target and many inputs

- parameter estimates for a model with three neurons

| N | Parameter | Estimate | Gradient Objective Function |
|---|---|---|---|
| … | … | … | … |
| 291 | Channel90_H13 | -0.534226 | -0.243706 |
| 292 | Channel91_H13 | -0.590502 | -0.245327 |
| 293 | Channel92_H13 | -0.482705 | -0.246851 |
| 294 | Channel93_H13 | -0.528643 | -0.248195 |
| 295 | Channel94_H13 | -0.333949 | -0.249403 |
| 296 | Channel95_H13 | -0.258637 | -0.250348 |
| 297 | Channel96_H13 | 0.162351 | -0.250953 |
| 298 | Channel97_H13 | 0.273746 | -0.251128 |
| 299 | Channel98_H13 | 0.711445 | -0.250887 |
| 300 | Channel99_H13 | 0.879623 | -0.250285 |
| 301 | BIAS_H11 | -2.144805 | 0.003961 |
| 302 | BIAS_H12 | 0.738894 | 0.095724 |
| 303 | BIAS_H13 | -0.771776 | 0.587769 |
| 304 | H11_Fat | -1.504744 | 0.054906 |
| 305 | H12_Fat | -15.057170 | -0.025459 |
| 306 | H13_Fat | -18.345040 | 0.006471 |
| 307 | BIAS_Fat | 16.856496 | -0.029187 |

Value of Objective Function = 0.3045279048

A total of 307 parameters

Page 19:

Neural networks with a single target and many inputs

- output from a model with three neurons

Page 20:

Neural networks with a single target and many inputs

- output from models with 1 to 10 neurons

Figure: Root ASE (0 to 6) plotted against # neurons (0 to 10) for two series, "Root ASE" and "Test: Root ASE"; annotation: "Convergence problems"

Page 21:

Neural networks with multiple targets

Extract linear combinations of the inputs

$$\alpha_{0m} + \alpha_m^T X, \quad m = 1, \dots, M$$

as derived features, and then model each target (response) as a linear function of a sigmoid function (activation function) of these features

$$Z_m = \sigma(\alpha_{0m} + \alpha_m^T X), \quad m = 1, \dots, M$$

$$y_k = \beta_{0k} + \beta_k^T Z, \quad k = 1, \dots, K$$

Diagram: inputs x1, x2, …, xp → hidden units z1, z2, …, zM → outputs y1, …, yK

Page 22:

Neural networks for K-class classification

Diagram: inputs x1, x2, …, xp → hidden units z1, z2, …, zM → outputs y1, …, yK

With the softmax activation function

$$g_k(Y) = \frac{\exp(y_k)}{\sum_{l=1}^{K} \exp(y_l)}$$

and the deviance (cross-entropy) error function

$$R(\alpha, \beta) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log f_k(x_i)$$

the neural network model is exactly a logistic regression model in the hidden units, and all the parameters are estimated by maximum likelihood
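The softmax function and the cross-entropy error can be sketched directly from their definitions. The network outputs and class indicators below are made up, and subtracting the maximum before exponentiating is a common numerical safeguard rather than something stated on the slide.

```python
import math

def softmax(y):
    """g_k(Y) = exp(y_k) / sum_l exp(y_l); the maximum is subtracted for numerical stability."""
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(targets, probabilities):
    """R = -sum_i sum_k y_ik * log f_k(x_i), with 0/1 class indicators y_ik."""
    return -sum(t * math.log(f)
                for t_row, f_row in zip(targets, probabilities)
                for t, f in zip(t_row, f_row)
                if t != 0)

outputs = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]   # hypothetical network outputs, K = 3
targets = [[1, 0, 0], [0, 0, 1]]                # hypothetical class indicators
probabilities = [softmax(row) for row in outputs]
error = cross_entropy(targets, probabilities)
```

Each row of `probabilities` is positive and sums to one, so it can be read as a set of class probabilities, and minimising `error` is the same as maximising the multinomial log-likelihood.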

Page 23:

Neural networks for regression and K-class classification

Diagram: inputs x1, x2, …, xp → hidden units z1, z2, …, zM → outputs y1, …, yK

Softmax activation function:

$$g_k(Y) = \frac{\exp(y_k)}{\sum_{l=1}^{K} \exp(y_l)}$$

For regression, we use the sum-of-squared errors

$$R(\alpha, \beta) = \sum_{i=1}^{N} \sum_{k=1}^{K} (y_{ik} - f_k(x_i))^2$$

as our measure of fit

For classification, we normally use the deviance (cross-entropy) error function

$$R(\alpha, \beta) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log f_k(x_i)$$

and the corresponding classifier is

$$G(x) = \arg\max_k f_k(x)$$
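The sum-of-squared errors and the classification rule can be sketched in a few lines. The fitted values and targets are made up, and the function names are assumptions.

```python
def sum_of_squares(targets, fitted):
    """R = sum_i sum_k (y_ik - f_k(x_i))^2."""
    return sum((y - f) * (y - f)
               for y_row, f_row in zip(targets, fitted)
               for y, f in zip(y_row, f_row))

def classify(fitted_row):
    """G(x) = argmax_k f_k(x), with classes numbered 1, ..., K."""
    best = max(range(len(fitted_row)), key=lambda k: fitted_row[k])
    return best + 1

fitted = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]   # hypothetical f_k(x_i) for two observations
targets = [[1, 0, 0], [0, 0, 1]]              # hypothetical class indicators
sse = sum_of_squares(targets, fitted)
classes = [classify(row) for row in fitted]
```

The classifier only needs the ordering of the fitted values, so it gives the same class whether it is applied to the outputs before or after the softmax transformation.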

Page 24:

Fitting neural networks

Diagram: inputs x1, x2, …, xp → hidden units z1, z2, …, zM → outputs y1, …, yK

M(p+1)+K(M+1) parameters (weights)

We don’t want the global minimizer of the deviance (cross-entropy) function, because it is likely to be an overfit solution.

Instead we use early stopping or a penalty term
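Both remedies can be illustrated on the smallest possible network. The sketch below is not the procedure used by proc Neural. It assumes one input, one `tanh` neuron, squared error instead of deviance, plain gradient descent, a quadratic penalty on all four weights, and made-up training and validation data.

```python
import math

def predict(x, params):
    a0, a1, b0, b1 = params
    return b0 + b1 * math.tanh(a0 + a1 * x)

def mean_squared_error(data, params):
    total = 0.0
    for x, y in data:
        residual = y - predict(x, params)
        total += residual * residual
    return total / len(data)

def gradient(data, params):
    """Gradient of the mean squared error with respect to (a0, a1, b0, b1)."""
    a0, a1, b0, b1 = params
    grad = [0.0, 0.0, 0.0, 0.0]
    for x, y in data:
        z = math.tanh(a0 + a1 * x)
        residual = (b0 + b1 * z) - y
        dz = b1 * (1.0 - z * z)            # derivative of the output w.r.t. the neuron's input
        grad[0] += 2.0 * residual * dz
        grad[1] += 2.0 * residual * dz * x
        grad[2] += 2.0 * residual
        grad[3] += 2.0 * residual * z
    return [g / len(data) for g in grad]

def fit(train, valid, penalty=0.0, rate=0.05, max_epochs=2000, patience=50):
    """Gradient descent on the training data; returns the weights with the lowest validation error."""
    params = [0.0, 0.5, 0.0, 0.5]          # hypothetical starting values
    best = (mean_squared_error(valid, params), 0, list(params))
    for epoch in range(1, max_epochs + 1):
        grad = gradient(train, params)
        # Penalty term: penalty * sum(p^2) adds 2 * penalty * p to each gradient component.
        params = [p - rate * (g + 2.0 * penalty * p) for p, g in zip(params, grad)]
        error = mean_squared_error(valid, params)
        if error < best[0]:
            best = (error, epoch, list(params))
        elif epoch - best[1] >= patience:  # early stopping: no improvement for `patience` epochs
            break
    return best

# Hypothetical data: a smooth curve plus an alternating disturbance.
train = [(-2.0 + 0.25 * i, math.tanh(1.5 * (-2.0 + 0.25 * i)) + 0.1 * (-1) ** i)
         for i in range(17)]
valid = [(-1.9 + 0.25 * i, math.tanh(1.5 * (-1.9 + 0.25 * i)) - 0.1 * (-1) ** i)
         for i in range(16)]

start_error = mean_squared_error(valid, [0.0, 0.5, 0.0, 0.5])
best_error, best_epoch, best_params = fit(train, valid)
penalized_error, penalized_epoch, penalized_params = fit(train, valid, penalty=0.01)
```

The returned weights are those from the epoch with the lowest validation error, not necessarily the final ones, which is what distinguishes early stopping from running the optimizer to convergence.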

Page 25:

Neural networks

Provide a joint framework for prediction and classification

Can describe both linear and nonlinear responses

Can accommodate multidimensional correlated inputs

Are normally over-fitted – validation is a must

Are difficult to interpret

Convergence problems are not uncommon

Page 26:

Some characteristics of different learning methods

| Characteristic | Neural networks | Trees |
|---|---|---|
| Natural handling of data of “mixed” type | | |
| Handling of missing values | | |
| Robustness to outliers in input space | | |
| Insensitive to monotone transformations of inputs | | |
| Computational scalability (large N) | | |
| Ability to deal with irrelevant inputs | | |
| Ability to extract linear combinations of features | Good | Poor |
| Interpretability | Poor | Fair/good |
| Predictive power | Good | Poor |