Pattern recognition and Machine Learning
Introduction to Neural Networks
CUONG TUAN NGUYEN
SEIJI HOTTA
MASAKI NAKAGAWA
Tokyo University of Agriculture and Technology
Copyright by Nguyen, Hotta and Nakagawa
Pattern classification
Which category does an input belong to?
Example: character recognition of input images
Classifier
Outputs the category of an input
[Figure: feature extraction converts an input image into a feature vector (x1, x2, …, xm); the classifier maps this vector to an output category (a, b, c, …, x, y, z).]
Supervised learning
Learning from a training dataset:
pairs of <input, target>
Testing on an unseen dataset measures
generalization ability
[Figure: a training dataset of <input image, target label> pairs for the characters a, b, c.]
Supervised learning
[Figure: the classifier is trained on <input, target> pairs (learning) and then predicts the output category for new inputs (prediction).]
Learning from a training dataset:
pairs of <input, target>
Testing on an unseen dataset measures
generalization ability
Human neuron
Neural Networks, A Simple Explanation: https://www.youtube.com/watch?v=gcK_5x2KsLA
Artificial neuron
Inputs: x1, x2, …, xm
Weights (weighted connections): w1, w2, …, wm
net = Σ_{i=1}^{m} x_i · w_i
y = f(net)
f: activation function
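The formulas above can be sketched directly in Python; the sigmoid activation and the sample inputs and weights below are illustrative assumptions, not from the slides.

```python
# A minimal artificial neuron: net = sum_i x_i * w_i, then y = f(net).
import math

def neuron(x, w, f):
    """Weighted sum of inputs followed by an activation function f."""
    net = sum(xi * wi for xi, wi in zip(x, w))
    return f(net)

# Sigmoid chosen here as an example activation function.
sigmoid = lambda net: 1.0 / (1.0 + math.exp(-net))

# net = 1.0*0.5 + 2.0*(-0.25) = 0, and sigmoid(0) = 0.5
y = neuron([1.0, 2.0], [0.5, -0.25], sigmoid)
```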
Activation function
Controls when the neuron is activated
Examples: linear, sigmoid, tanh, ReLU, Leaky ReLU
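A minimal Python sketch of the activation functions named above; the Leaky ReLU slope `alpha=0.01` is a common choice assumed here, not taken from the slides.

```python
# The activation functions listed on the slide, in plain Python.
import math

def linear(x):
    return x                              # identity: no non-linearity

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))     # squashes to (0, 1)

def tanh(x):
    return math.tanh(x)                   # squashes to (-1, 1)

def relu(x):
    return max(0.0, x)                    # zero for negative inputs

def leaky_relu(x, alpha=0.01):            # alpha is an assumed default
    return x if x > 0 else alpha * x      # small slope for negative inputs
```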
Weighted connection + Activation function
A neuron is a feature detector: it is activated by a specific feature.
[Figure (generated by https://playground.tensorflow.org): a ReLU neuron with inputs x1, x2 and weights −0.82, 0.49 activates on one side of the decision line −0.82·x1 + 0.49·x2 = 0.]
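The slide's numbers can be checked directly: a ReLU neuron with weights −0.82 and 0.49 outputs zero on one side of the line −0.82·x1 + 0.49·x2 = 0 and is active on the other. A small sketch:

```python
# ReLU neuron with the weights shown on the slide.
def relu_neuron(x1, x2, w1=-0.82, w2=0.49):
    net = w1 * x1 + w2 * x2
    return max(0.0, net)

print(relu_neuron(1.0, 0.0))  # net = -0.82 -> output 0.0 (not activated)
print(relu_neuron(0.0, 1.0))  # net =  0.49 -> output 0.49 (activated)
```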
Multi-layer perceptron (MLP)
Neurons are arranged into layers
Each neuron in a layer shares the same input from the
preceding layer
[Figure (generated by https://playground.tensorflow.org): layers of neurons; early layers detect simple features, later layers combine them into complex features.]
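A forward pass through such layers can be sketched with NumPy; the layer sizes and random weights below are illustrative assumptions.

```python
# Two-layer MLP forward pass: every neuron in a layer reads the same input.
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def mlp_forward(x, W1, W2):
    h = relu(W1 @ x)   # hidden layer: simple features of the input
    return W2 @ h      # output layer: combines hidden features

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 2))   # 2 inputs  -> 4 hidden neurons
W2 = rng.normal(size=(3, 4))   # 4 hidden  -> 3 outputs
z = mlp_forward(np.array([0.5, -1.0]), W1, W2)   # 3-dimensional output
```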
MLP as a learnable classifier
The output corresponding to an input is constrained
by the weighted connections
These weights are learnable (adjustable)
[Figure: input layer (x1, x2, …, xm) → hidden layer → output layer (z1, z2, …), connected by weights W.]
Z = h(X, W)
Z: output, X: input, W: weights
Learning ability of neural networks
Linear vs non-linear activation functions
With a linear activation function, the network can only learn
linear functions
With a non-linear activation function (sigmoid, tanh, ReLU), it can learn
non-linear functions
Learning ability of neural networks
Universal approximation theorem [Hornik, 1991]:
an MLP with a single hidden layer can approximate
arbitrary functions
For complex functions, however, it may require a very large
hidden layer
Deep neural network
Contains many hidden layers and can extract complex
features
[Figure: input layer → many hidden layers → output layer.]
Learning in Neural Networks
The weighted connections are tuned using the training
data <input, target>
Objective: the network outputs the correct target
corresponding to each input
[Figure: a training dataset of input patterns and their targets.]
Learning in Neural Networks
Loss function (objective function):
the difference between output and target
Learning is an optimization process:
minimize the loss (make the output match the target)
[Figure: the network maps input (x1, …, xm) through weights W to outputs (z1, …, zn), which are compared with targets (t1, …, tn) to give the loss L.]
L = Z − T = h(X, W) − T = g(W)
For fixed training data, the loss is a function g of the weights W alone.
Learning in Neural Networks
Gradient vector of L with respect to W: ∇W L = ∂L/∂W
Weight update: move in the reverse (negative) gradient direction
W_updated = W_current − η · ∂L/∂W
η: learning rate
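The update rule above, sketched in NumPy with an illustrative gradient and learning rate:

```python
# One gradient-descent step: W_new = W - eta * dL/dW.
import numpy as np

def gd_step(W, grad, eta=0.1):
    """Move the weights against the gradient; eta is the learning rate."""
    return W - eta * grad

W = np.array([1.0, -2.0])
grad = np.array([0.5, -0.5])   # a made-up gradient for illustration
W = gd_step(W, grad)           # -> [0.95, -1.95]
```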
Loss function
Probabilistic loss functions:
Binary cross entropy (logistic regression)
Cross entropy (multinomial)
Mean square error
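NumPy sketches of the losses listed above, applied to illustrative predictions and one-hot targets:

```python
# Mean squared error and cross entropy (binary cross entropy is the
# two-class special case).
import numpy as np

def mse(z, t):
    return np.mean((z - t) ** 2)

def cross_entropy(z, t, eps=1e-12):
    """t is a one-hot target, z a vector of predicted probabilities."""
    return -np.sum(t * np.log(z + eps))

def binary_cross_entropy(z, t, eps=1e-12):
    return -(t * np.log(z + eps) + (1 - t) * np.log(1 - z + eps))

mse_val = mse(np.array([0.9, 0.1]), np.array([1.0, 0.0]))            # 0.01
ce_val = cross_entropy(np.array([0.9, 0.1]), np.array([1.0, 0.0]))   # -log(0.9)
```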
Learning & convergence
By updating the weights using the gradient, the loss is reduced
and converges to a minimum
[Figure: loss curve L(w); successive updates w0 → w1 → w2 → w3, each step Δw following the negative gradient, descend toward the minimum.]
Learning through all training samples
After updating the weights, new training samples are
fed to the network to continue learning
When all training samples have been learnt, the network has
completed one epoch. The network must run
through many epochs to converge.
Weight update strategies
Stochastic gradient descent (SGD): update per sample
Batch update: update once per pass over the whole dataset
Mini-batch: update per small batch of samples
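The three strategies differ only in how many samples contribute to each gradient step. A sketch on an illustrative linear model with mean-squared-error loss (`batch_size=1` gives SGD, `batch_size=len(X)` gives batch update, values in between give mini-batch):

```python
# Epoch-based training: each epoch is one pass over all samples,
# split into batches of a chosen size.
import numpy as np

def train(X, T, w, eta=0.1, batch_size=2, epochs=200):
    n = len(X)
    for _ in range(epochs):                        # one epoch per pass
        for i in range(0, n, batch_size):
            xb, tb = X[i:i+batch_size], T[i:i+batch_size]
            z = xb @ w                             # model output
            grad = 2 * xb.T @ (z - tb) / len(xb)   # gradient of MSE
            w = w - eta * grad                     # reverse-gradient update
    return w

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
T = X @ np.array([2.0, -1.0])       # targets from a known linear rule
w = train(X, T, w=np.zeros(2))      # w approaches [2, -1] over the epochs
```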
Momentum Optimizer
Learning may get stuck in a local
minimum.
Momentum: Δw retains the latest
optimization direction. It may help
the optimizer overcome local
minima.
[Figure: loss curve L(w); the momentum term carries the weights past a shallow local minimum.]
W_updated = W_current − η · ∂L/∂W + α · Δw
η: learning rate, α: momentum parameter
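The momentum update above, sketched for a single weight with illustrative gradients: even when the gradient drops to zero, the retained step α·Δw keeps the weights moving.

```python
# Momentum update: new step = reverse gradient + a fraction of the old step.
import numpy as np

def momentum_step(w, grad, dw, eta=0.1, alpha=0.9):
    dw = -eta * grad + alpha * dw   # retain alpha of the previous direction
    return w + dw, dw

w, dw = np.array([1.0]), np.zeros(1)
w, dw = momentum_step(w, np.array([0.5]), dw)   # plain step: w = 0.95
w, dw = momentum_step(w, np.array([0.0]), dw)   # zero gradient, momentum
                                                # still moves w to 0.905
```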
Overfitting & Generalization
While training, model complexity increases
with each epoch
Overfitting:
• The model becomes over-complex
• Poor generalization: good performance on the training set but poor
performance on the test set
[Figure: accuracy vs. epochs; training accuracy keeps rising toward 1.0 while test accuracy levels off or degrades.]
Prevent overfitting: Regularization
Weight decay
Weight noise
Early stopping
Evaluate performance on a validation set
Stop when there is no improvement on the validation set
[Figure: training loss keeps decreasing while validation loss starts to rise; stop at the minimum of the validation loss.]
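A sketch of early stopping on a list of validation losses; the `patience` threshold is an assumption, a common refinement of "stop when there is no improvement":

```python
# Early stopping: remember the best validation loss and stop after
# `patience` epochs without improvement.
def early_stopping(val_losses, patience=2):
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch       # new best: keep going
        elif epoch - best_epoch >= patience:
            return best_epoch                    # stop: no recent improvement
    return best_epoch

# Validation loss improves until epoch 2, then rises: stop there.
stop = early_stopping([1.0, 0.6, 0.4, 0.5, 0.55, 0.6])
```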
Prevent overfitting: Regularization
Dropout
Randomly drop neurons with a predefined
probability
Good regularization: trains a large ensemble of subnetworks
(Bayesian perspective)
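A NumPy sketch of (inverted) dropout: each activation is kept with probability 1 − p and rescaled by 1/(1 − p) so its expected value is unchanged; the keep/drop pattern is random at each training step.

```python
# Inverted dropout on a vector of hidden activations.
import numpy as np

def dropout(h, p=0.5, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(h.shape) >= p    # drop each neuron with probability p
    return h * mask / (1.0 - p)        # rescale kept neurons

h = np.ones(8)          # hidden activations
h_train = dropout(h)    # each entry is either 0.0 (dropped) or 2.0 (kept)
```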
Adaptive learning rate
Adam optimizer
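The standard Adam update combines momentum-like and adaptive-learning-rate ideas: running averages of the gradient (m) and squared gradient (v) give each weight its own effective step size. A sketch with illustrative values:

```python
# One Adam step (standard formulation; hyperparameters are common defaults).
import numpy as np

def adam_step(w, grad, m, v, t, eta=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)             # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
w, m, v = adam_step(w, np.array([0.5]), m, v, t=1)  # first step moves w
                                                    # by roughly eta
```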