
May 20, 2020

Transcript
Page 1: Introduction to Neural Networks (web.tuat.ac.jp/~s-hotta/IR/NN_intro.pdf)

Copyright by Nguyen, Hotta and Nakagawa 1

Pattern recognition and Machine Learning Introduction to Neural Networks

Introduction to Neural Networks

CUONG TUAN NGUYEN

SEIJI HOTTA

MASAKI NAKAGAWA

Tokyo University of Agriculture and Technology

Page 2

Pattern classification

Which category does an input belong to?

Example: Character recognition for input images

Classifier

Outputs the category of an input

[Figure: input image → feature extraction (x_1, x_2, …, x_n) → classifier → output category (a, b, c, …)]

Page 3

Supervised learning

Learning from a training dataset:

pairs of <input, target>

Testing on unseen dataset

Generalization ability

[Figure: a training dataset of character images (inputs) paired with their targets a, b, c]

Page 4

Supervised learning

[Figure: the classifier learns from <input, target> pairs and predicts an output category (a, b, c, …) for a new input]

Learning from a training dataset:

pairs of <input, target>

Testing on unseen dataset

Generalization ability

Page 5

Human neuron

Neural Networks, A Simple Explanation: https://www.youtube.com/watch?v=gcK_5x2KsLA

Page 6

Artificial neuron

[Figure: inputs x_1, x_2, …, x_n connect to a neuron through weighted connections w_1, w_2, …, w_n; the neuron applies activation function f]

net = Σ_{i=1}^{n} x_i w_i

y = f(net)
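The weighted sum and activation above can be sketched in plain Python. The input values, weights, and the choice of sigmoid as f are illustrative, not from the slide:

```python
import math

def neuron(xs, ws, f):
    """Artificial neuron: weighted sum of inputs, then activation f."""
    net = sum(x * w for x, w in zip(xs, ws))  # net = sum_i x_i * w_i
    return f(net)                             # y = f(net)

# Example with a sigmoid activation (one of the activation functions shown later)
sigmoid = lambda t: 1.0 / (1.0 + math.exp(-t))
y = neuron([1.0, 0.5], [0.8, -0.2], sigmoid)  # net = 0.7, y = sigmoid(0.7)
```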

Page 7

Activation function

Controls when the neuron should be activated

[Figure: plots of the activation functions: linear, sigmoid, tanh, ReLU, Leaky ReLU]
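The five activation functions named above can be written as one-liners; the 0.01 slope for Leaky ReLU is a common default, not stated on the slide:

```python
import math

def linear(t):
    return t

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def tanh(t):
    return math.tanh(t)

def relu(t):
    return max(0.0, t)

def leaky_relu(t, slope=0.01):  # slope is a common default, not from the slide
    return t if t > 0 else slope * t
```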

Page 8

Weighted connection + Activation function

A neuron is a feature detector: it is activated by a specific feature

[Figure: a ReLU neuron with inputs x_1, x_2 and weights −0.82 and 0.49; it activates on one side of the decision boundary −0.82 x_1 + 0.49 x_2 = 0. Generated by: https://playground.tensorflow.org]

Page 9

Multi-layer perceptron (MLP)

Neurons are arranged into layers

Each neuron in a layer shares the same input from the preceding layer

π‘₯1

π‘₯2

Generated by:

https://playground.tensorflow.org

Layers of neurons

Complex featuresSimple features

Page 10

MLP as a learnable classifier

The output corresponding to an input is constrained by the weighted connections

These weights are learnable (adjustable)

[Figure: input layer (x_1, x_2, …, x_n) → hidden layer → output layer (z_1, z_2), connected by weights W]

Z = h(X, W)

Z: output, X: input, W: weights
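The forward pass Z = h(X, W) can be sketched for one hidden layer in plain Python; the sigmoid activation and layer sizes are illustrative choices:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def layer(xs, W, f):
    """One layer: every neuron sees the same input xs; W holds one weight row per neuron."""
    return [f(sum(x * w for x, w in zip(xs, row))) for row in W]

def h(X, W_hidden, W_out):
    """Forward pass Z = h(X, W): input layer -> hidden layer -> output layer."""
    hidden = layer(X, W_hidden, sigmoid)
    return layer(hidden, W_out, sigmoid)
```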

Page 11

Learning ability of neural networks

Linear vs Non-linear

With a linear activation function: can only learn linear functions

With a non-linear activation function: can learn non-linear functions

[Figure: decision regions learned with linear, sigmoid, tanh, and ReLU activations]

Page 12

Learning ability of neural networks

Universal approximation theorem [Hornik, 1991]: an MLP with a single hidden layer can learn arbitrary functions

For complex functions, however, it may require a very large hidden layer

Deep neural network

Contains many hidden layers and can extract complex features

[Figure: input layer → many hidden layers → output layer]

Page 13

Learning in Neural Networks

The weighted connections are tuned using the training data <input, target>

Objective: the network should output the correct target corresponding to each input

[Figure: a training dataset of input patterns and their targets, e.g. a character image with target b]

Page 14

Learning in Neural Networks

Loss function (objective function)

Difference between output and target

Learning: optimization process

Minimize the loss (make output match target)

π‘₯1

π‘₯𝑛

π‘₯2

inputoutput

𝑧1

π‘§π‘˜

Target

𝑑1

π‘‘π‘˜

Loss(LοΌ‰

𝐿 = 𝑇 βˆ’ 𝑍= 𝑇 βˆ’ β„Ž 𝑋,π‘Š= 𝑙(π‘Š)

Loss

InputWeights

OutputTarget

Input

layer

Hidden

layer Output

layer

Weights

(W)
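As a concrete instance of a loss measuring the difference between output and target, the mean square error (one of the losses listed on a later slide) can be sketched as:

```python
def mse_loss(targets, outputs):
    """Mean square error between target T and output Z:
    the average of (t_i - z_i)^2 over the output units."""
    return sum((t - z) ** 2 for t, z in zip(targets, outputs)) / len(targets)
```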

Page 15

Learning in Neural Networks

Gradient vector of l with respect to W: ∇_W l = ∂l(W)/∂W

Weight update

Move in the reverse gradient direction:

W_updated = W_current − η ∂l(W)/∂W

η: learning rate

[Figure: the gradient ∇ of l(W) points uphill; the update steps W downhill against it]

Page 16

Loss function

Logistic regression

Probabilistic loss function

Binary entropy

Cross entropy

Multimodal

Mean square error
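The two entropy-based losses listed above can be sketched as follows; the one-hot-target assumption for cross entropy is an illustrative simplification:

```python
import math

def binary_entropy_loss(t, z):
    """Binary entropy loss (logistic regression) for a single
    target t in {0, 1} and predicted probability z in (0, 1)."""
    return -(t * math.log(z) + (1 - t) * math.log(1 - z))

def cross_entropy_loss(targets, outputs):
    """Cross entropy between a one-hot target vector and a
    probability output vector (multi-class case)."""
    return -sum(t * math.log(z) for t, z in zip(targets, outputs) if t > 0)
```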

Page 17

Learning & convergence

By updating the weights using the gradient, the loss is reduced and converges to a minimum

[Figure: successive updates w_0 → w_1 → w_2 → w_3, each by a step Δw, move down the loss curve l(w)]

Page 18

Learning through all training samples

After updating the weights, a new training sample is fed to the network to continue learning

When all training samples have been learnt, the network has completed one epoch. The network must run through many epochs to converge.

Weight update strategy

Stochastic gradient descent (SGD)

Batch update

Mini-batch
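The epoch loop and the three update strategies differ only in how many samples feed one weight update. A minimal sketch, where `update` is a hypothetical callback performing one weight update:

```python
def run_epochs(samples, batch_size, update, n_epochs):
    """One epoch = one full pass over the training samples.

    batch_size = 1            -> stochastic gradient descent (SGD)
    batch_size = len(samples) -> batch update
    anything in between       -> mini-batch
    """
    for _ in range(n_epochs):
        for i in range(0, len(samples), batch_size):
            update(samples[i:i + batch_size])  # one weight update per batch
```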

Page 19

Momentum Optimizer

Learning may get stuck in a local minimum.

Momentum: Δw retains the latest optimizing direction. It may help the optimizer overcome the local minimum.

W_updated = W_current − η ∂l(W)/∂W + α Δw

η: learning rate, α: momentum parameter

[Figure: with momentum, the update Δw carries the weight past a local minimum of l(w)]

Page 20

Overfitting & Generalization

While training, model complexity increases with each epoch

Overfitting:

• Model is over-complex

• Poor generalization: good performance on the training set but poor on the test set

[Figure: loss and accuracy over epochs; the train and test curves diverge as the model overfits]

Page 21

Prevent overfitting: Regularization

Weight decay

Weight noise

Early stopping

Evaluate performance on a validation set

Stop when there is no improvement on the validation set

[Figure: training loss keeps decreasing while validation loss starts to rise; stop at the divergence point]
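Early stopping can be sketched over a sequence of per-epoch validation losses; the `patience` threshold is a common knob, not named on the slide:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which to stop: the first epoch where the
    validation loss has not improved for `patience` epochs in a row
    (patience is a common hyperparameter, not from the slide)."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1
```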

Page 22

Prevent overfitting: Regularization

Dropout

Randomly drop the neurons with a predefined probability

Good regularization: large ensembles of networks

Bayesian perspective
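A minimal dropout sketch: each neuron's activation is dropped with probability p during training. The 1/(1−p) rescaling ("inverted dropout") is a standard detail assumed here, not stated on the slide:

```python
import random

def dropout(activations, p, training=True):
    """Randomly drop each activation with probability p. Survivors are
    scaled by 1/(1-p) (inverted dropout) so the expected activation is
    unchanged, and nothing is dropped at test time."""
    if not training:
        return list(activations)
    return [0.0 if random.random() < p else a / (1.0 - p) for a in activations]
```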

Page 23

Adaptive learning rate

Adam optimizer
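Adam adapts a per-weight learning rate from running estimates of the gradient's first and second moments. A single-weight sketch, using the commonly cited default constants (lr = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single weight w given its gradient.
    m, v: running first- and second-moment estimates; t: timestep from 1."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)               # bias correction for the warm-up phase
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```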

Page 24

Practice

GPU implementation

Keras + Tensorflow