Regression and Classification: An Artificial Neural Network Approach


Welcome to my presentation on

Regression and Classification: An Artificial Neural Network Approach

Presented by Md. Menhazul Abedin

Research Student, Dept. of Statistics

University of Rajshahi, Rajshahi-6205

Dedication

• This presentation is dedicated to my honorable supervisor


Three pioneers of ANN

Warren McCulloch, Walter Pitts, Frank Rosenblatt

Outline
• Motivation/Why this study?
• Objectives
• Methodology
• Findings
• Conclusion
• Limitations
• Areas of further research

Motivation/Why this study?

• Data arrive as vectors, matrices, sound, images, waves, strings, text, etc.
• How to analyze such data has been a challenge for several decades.

Objectives

• To study neural network as a technique for regression and classification.

• To compare neural network with classical regression and classification techniques.

• To study the limitations of neural network.


• Structure of a neuron


What is an ANN?

• Biological neural network
• Artificial neural network

• How many hidden layers should be considered? More hidden layers approximate nonlinearity better.
• More hidden layers need more time to converge.
• Weights are adjusted by an iterative method (backpropagation).

• Analogy between biological and artificial neural networks


Historical Background of Artificial Neural Network

• In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on how neurons might work.

• In 1949, Donald Hebb wrote The Organization of Behavior (the ways in which humans learn)

• M. Minsky (1951) built a reinforcement-based network learning system.
• F. Rosenblatt (1958) built the first practical Artificial Neural Network (ANN), the perceptron.
• B. Widrow & M. E. Hoff (1960) introduced an adaptive perceptron-like network using the Least Mean Square (LMS) error algorithm.
• 1969 – Marvin Minsky and Seymour Papert showed that the perceptron model is not capable of representing many important problems.
• 1973 – Christoph von der Malsburg used a neuron model that was nonlinear and biologically more motivated.
• 1974 – Paul Werbos developed a learning procedure called backpropagation of error.

Historical Background of Artificial Neural Network

• 1986 – The application area of MLP networks remained rather limited until the breakthrough when a general backpropagation algorithm for the multi-layer perceptron was introduced by Rumelhart and McClelland.

• 1988 – Radial Basis Function (RBF) networks were first introduced by Broomhead & Lowe. Although the basic idea of RBF had been developed about 30 years earlier under the name "method of potential functions", the work by Broomhead & Lowe opened a new frontier in the neural network community.


ANN regression

• A linear activation function at the output gives continuous values.

ANN classification

• For two classes: sigmoid function (output > 0.5 gives one class, output < 0.5 the other).
• For more classes: softmax function (gives a probability for each class).
• The tanh function may also be used as an activation function.

Activation functions

• Linear function: $f(\eta) = \eta$
• Sigmoid function: $f(\eta) = \dfrac{1}{1+e^{-\eta}}$, where $\eta = x\theta$.
• Softmax function: $f(\eta)_k = \dfrac{e^{\eta_k}}{\sum_{j} e^{\eta_j}}$ (a small code sketch follows)

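A minimal NumPy sketch of these three activation functions; the function names and example inputs are illustrative assumptions, not from the original slides:

```python
import numpy as np

def linear(eta):
    """Linear (identity) activation: f(eta) = eta."""
    return eta

def sigmoid(eta):
    """Sigmoid activation: f(eta) = 1 / (1 + exp(-eta)), with eta = x @ theta."""
    return 1.0 / (1.0 + np.exp(-eta))

def softmax(eta):
    """Softmax over the last axis: exp(eta_k) / sum_j exp(eta_j)."""
    e = np.exp(eta - np.max(eta, axis=-1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

x = np.array([0.5, -1.0, 2.0])
theta = np.array([0.2, 0.4, -0.1])
eta = x @ theta
print(linear(eta), sigmoid(eta), softmax(np.array([eta, 0.0])))
```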

Construction of cost function: sigmoid formulation

The perceptron learning model specifies the probability of a binary output $y_i \in \{0,1\}$ given the input $x_i$ as follows:

$p(y_i \mid x_i, w) = \mathrm{Ber}\left(y_i \mid \mathrm{sigm}(x_i w)\right)$

$p(y \mid X, w) = \prod_{i=1}^{n} \mathrm{Ber}\left(y_i \mid \mathrm{sigm}(x_i w)\right) = \prod_{i=1}^{n} \left(\frac{1}{1+e^{-x_i w}}\right)^{y_i} \left(1 - \frac{1}{1+e^{-x_i w}}\right)^{1-y_i}$

where $\mu_i \equiv p(y_i = 1 \mid x_i, w) = \mathrm{sigm}(x_i w) = \dfrac{1}{1+e^{-x_i w}}$ and the decision boundary is $x_i w = 0$.

Cost function (cross entropy):

$c(w) = -\log p(y \mid X, w) = -\sum_{i=1}^{n} \left[ y_i \log \mu_i + (1-y_i) \log(1-\mu_i) \right]$
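A minimal NumPy sketch of this cross-entropy cost; the function and variable names and the tiny example data are illustrative assumptions:

```python
import numpy as np

def sigmoid_cross_entropy(w, X, y):
    """c(w) = -sum_i [ y_i log mu_i + (1 - y_i) log(1 - mu_i) ], with mu_i = sigm(x_i w)."""
    mu = 1.0 / (1.0 + np.exp(-X @ w))
    eps = 1e-12                                          # avoid log(0)
    return -np.sum(y * np.log(mu + eps) + (1 - y) * np.log(1 - mu + eps))

X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])     # first column acts as the bias term
y = np.array([0, 0, 1])
print(sigmoid_cross_entropy(np.array([0.1, 0.3]), X, y))
```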

Softmax formulation

[Slide diagram: inputs $x_{i1}$, $x_{i2}$ and a bias $+1$ feed two summation units $u_{i1}$, $u_{i2}$ through weights $w_{11}, w_{21}, w_{12}, w_{22}$ and biases $b_1 = w_{10}$, $b_2 = w_{20}$, followed by a softmax layer.]

$\mu_{i1} = \dfrac{e^{x_i w_1}}{e^{x_i w_1} + e^{x_i w_2}}, \qquad \mu_{i2} = \dfrac{e^{x_i w_2}}{e^{x_i w_1} + e^{x_i w_2}}, \qquad \mu_{i1} + \mu_{i2} = 1$

Construction of cost function: softmax formulation

Indicator: $I_c(y_i) = 1$ if $y_i = c$, and $0$ otherwise.

$p(y_i \mid x_i, w) = \mu_{i1}^{\,I_0(y_i)} \, \mu_{i2}^{\,I_1(y_i)}$

$p(y \mid X, w) = \prod_{i=1}^{n} \mu_{i1}^{\,I_0(y_i)} \, \mu_{i2}^{\,I_1(y_i)}$

$p(y_i \mid x_i, w) = \begin{cases} \dfrac{e^{x_i w_1}}{e^{x_i w_1} + e^{x_i w_2}} & \text{if } y_i = 0 \\[1ex] \dfrac{e^{x_i w_2}}{e^{x_i w_1} + e^{x_i w_2}} & \text{if } y_i = 1 \end{cases}$

Cost function:

$c(w) = -\log p(y \mid X, w) = -\sum_{i=1}^{n} \left[ I_0(y_i) \log \mu_{i1} + I_1(y_i) \log \mu_{i2} \right]$

Pipeline: X → Linear layer → Log-softmax layer → NLL → C(w)
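A matching NumPy sketch of this softmax cost as a linear layer, a log-softmax layer and the negative log-likelihood; the names and example data are illustrative assumptions:

```python
import numpy as np

def softmax_nll(W, X, y):
    """C(w) = -sum_i log softmax(x_i W)[y_i]; W holds one weight column per class."""
    scores = X @ W                                                # linear layer
    scores = scores - scores.max(axis=1, keepdims=True)          # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))  # log-softmax layer
    return -log_probs[np.arange(len(y)), y].sum()                # negative log-likelihood

X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([0, 0, 1])                                          # class labels 0/1
W = np.zeros((2, 2))
print(softmax_nll(W, X, y))
```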

Weight update (Backpropagation)

• Derivative of the cost w.r.t. the inputs, computed layer by layer.
• Information flows forward from the input to the cost c (forward message).
• The error propagates backward (backward message) and each layer updates its weights, as in the sketch below.

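A minimal backpropagation sketch for a network with one hidden layer, sigmoid activations and the cross-entropy cost; the layer sizes, learning rate and simulated data are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

W1 = rng.normal(scale=0.1, size=(3, 5))        # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(5, 1))        # hidden -> output weights
lr = 0.5

for epoch in range(200):
    # forward message: input -> hidden -> output -> cost
    h = sigmoid(X @ W1)
    p = sigmoid(h @ W2)
    # backward message: error propagated layer by layer
    d_out = p - y                              # derivative of the cross-entropy cost w.r.t. output pre-activation
    d_hid = (d_out @ W2.T) * h * (1 - h)       # error pushed back through the hidden layer
    # weight updates (gradient descent)
    W2 -= lr * h.T @ d_out / len(X)
    W1 -= lr * X.T @ d_hid / len(X)

print("training accuracy:", ((p > 0.5) == y).mean())
```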

Optimization

Our goal is to minimize the cost function. Different optimization techniques (a small sketch follows below):
• Gradient descent algorithm
• Newton's algorithm
• Stochastic gradient descent (SGD)
• Online learning, batch & mini-batch optimization

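A short sketch contrasting batch gradient descent with mini-batch stochastic gradient descent for the sigmoid cross-entropy cost; the simulated data, learning rates and batch size are assumptions:

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the cross-entropy cost: X^T (sigm(Xw) - y)."""
    mu = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (mu - y)

rng = np.random.default_rng(1)
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])   # bias column plus two predictors
y = (X[:, 1] > 0).astype(float)

# batch gradient descent: use the full data set at every step
w = np.zeros(3)
for step in range(100):
    w -= 0.01 * grad(w, X, y)

# stochastic (mini-batch) gradient descent: use a random mini-batch at every step
w_sgd = np.zeros(3)
for step in range(1000):
    idx = rng.choice(len(X), size=20, replace=False)
    w_sgd -= 0.05 * grad(w_sgd, X[idx], y[idx]) / 20

print(w, w_sgd)
```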

Regression (Findings)

• Data sets used = 7 (regression = 4, classification = 3)
• Pharmaceuticals data:

Size 26

No. of variables 4 (one dependent and three independent)

Outlier Present (6th, 10th, and 26th)

Autocorrelation Absence

Multicollinearity Absence

Normality Present

Data type Real

Cross validation LOOCV

Applied methods Linear model, Polynomial & ANN

Regression (cont…)

• ANN is the best regression model

Regression (cont…)

• Yacht Hydrodynamics data:

Size 308

No. of variables 7 (one dependent and six independent)

Outlier Absence

Autocorrelation Absence

Multicollinearity Absence

Normality Absence (Clustered)

Data type Real

Cross validation Training set and test set

Applied methods Linear model, Polynomial & ANN


• Results of the Yacht Hydrodynamics data


• Repeated 100 times for different training and test sets (a sketch of this setup follows below).
• Box plots of the test error give a sense of the error variation.

• ANN is the best regression model
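A sketch of repeating a random training/test split 100 times and comparing the test errors of a linear model and an ANN regressor; the scikit-learn estimators, their settings and the synthetic stand-in data (used because the Yacht Hydrodynamics data are not bundled here) are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(308, 6))                      # placeholder for the six predictors
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=308)

lm_err, ann_err = [], []
for seed in range(100):                            # 100 different random splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    lm = LinearRegression().fit(X_tr, y_tr)
    ann = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=seed).fit(X_tr, y_tr)
    lm_err.append(mean_squared_error(y_te, lm.predict(X_te)))
    ann_err.append(mean_squared_error(y_te, ann.predict(X_te)))

print("Linear model mean test MSE:", np.mean(lm_err))
print("ANN mean test MSE:", np.mean(ann_err))
# Box plots of lm_err and ann_err (e.g. with matplotlib) show the test-error variation.
```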

Regression (cont…)

• Simulated data-1:

Size 1000

No. of variables 10 (one dependent and nine independent)

Outlier Absence

Autocorrelation Absence

Multicollinearity Absence

Normality Present

Data type Real

Cross validation Training set and test set

Applied methods Linear model & ANN


• Results of Simulated data-1


• Repeated 100 times for different training and test sets.
• Box plots of the test error give a sense of the error variation.

• ANN is the best regression model

Regression (cont…)

• Simulated data-2:

Size 20000

No. of variables 20 (one dependent and nineteen independent)

Outlier Absence

Autocorrelation Absence

Multicollinearity Strong multicollinearity

Normality Present

Data type Real

Cross validation Training set and test set

Applied methods Linear model & ANN


• Results of Simulated data-2


• Repeated 100 times for different training and test sets.
• Box plots of the test error give a sense of the error variation.

• ANN is the best regression model

Classification

• IRIS data:

Size 150

No. of variables 5 (one dependent and four independent)

No. of classes Three (Setosa, Versicolor, Virginica)

Type Balanced

Data type Real

Cross validation LOOCV

Applied methods Logistic, LDA, QDA, KNN, NB & ANN

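A sketch of this comparison on the IRIS data with leave-one-out cross-validation using scikit-learn; the particular estimators and their default settings are assumptions rather than the models actually fitted in the study:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
models = {
    "Logistic": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "ANN": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
}
loo = LeaveOneOut()                                      # fit each model once per held-out observation
for name, model in models.items():
    rate = cross_val_score(model, X, y, cv=loo).mean()   # classification rate
    print(f"{name}: classification rate {rate:.2f}, misclassification rate {1 - rate:.2f}")
```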

Classification (cont…)

• Results:

• ANN is the best classifier

Methods Classification rate Misclassification rate

Logistic 0.98 0.02

LDA 0.98 0.02

QDA 0.98 0.02

KNN 0.95 0.05

NB 0.95 0.05

ANN 0.99 0.01


Classification (cont…)

• Fertility data

Size 100

No. of variables 5 (one dependent and four independent)

No. of classes Two (Normal & Altered)

Type Imbalanced

Data type Real

Cross validation LOOCV

Applied methods Logistic, LDA, KNN, NB & ANN


Classification (cont…)

• Results

• ANN is the best classifier

Methods Accuracy Sensitivity Specificity PPV NPV

Logistic 0.84 0.87 0.00 0.96 0.00

LDA 0.83 0.95 0.00 0.87 0.00

KNN 0.81 0.90 0.16 0.88 0.20

NB 0.82 0.94 0.00 0.87 0.00

ANN 0.88 0.95 0.34 0.91 0.50


Classification (cont…)

• Leukemia data:

Size 72

No. of variables 7130 (one dependent and 7129 independent)

No. of classes Two (ALL & AML)

Type Balanced

Data type Real

Cross validation LOOCV

Applied methods Logistic, LDA, QDA, KNN, NB & ANN


Classification (cont…)

• Results:

• ANN is the best classifier

Methods Accuracy Sensitivity Specificity

Logistic 0.47 0.62 0.31

LDA 0.62 0.68 0.52

QDA 0.65 1.00 0.00

KNN 0.54 0.65 0.32

NB 0.65 1.00 0.00

ANN 0.64 0.68 0.56


Conclusion

• In all cases, ANN is the best.

Data Problems ANN status

Pharmaceuticals Outliers Best regression model

Yacht Hydrodynamics Clustered Best regression model

Simulated data-1 Fresh (no data problems) Best regression model

Simulated data-2 Strong multicollinearity Best regression model

IRIS Balanced Best classifier

Fertility Imbalanced Best classifier

Leukemia Large (7129 variables) Best classifier


Limitations

• Backpropagation gives no guarantee of reaching the absolute minimum.
• The VC dimension is unclear.
• Weights are initialized randomly, so the result is not unique.
• If some weights are zero, the network may not converge.
• Computation of confidence intervals is hard.
• It does not provide t-tests or F-tests.


Areas of further research
• Robust, generalized ridge, principal component, latent root, lasso and stepwise regression
• Multivariate regression, time series analysis
• Application of artificial neural networks to unsupervised learning
• Study of semi-supervised learning
• Comparative study with other machine learning and data mining techniques
• Improvement of the backpropagation algorithm


THANK YOU ALL

