
Empirical studies on the online learning algorithms based on combining weight noise injection and weight decay

Advisor: Dr. John Sum

Student: Allen Liang

Outline

• Introduction

• Learning Algorithms

• Experiments

• Conclusion


Introduction


Background


• A neural network (NN) is a network system composed of interconnected neurons.

• Learning aims to make a NN achieve good generalization (small prediction error).

• Fault tolerance is an unavoidable issue that must be considered in hardware implementation.

– Multiplicative weight noise or additive weight noise.

– Weights could randomly break down.

– Hidden nodes could fail (stuck-at-zero and stuck-at-one).

• The goal is to keep the network workable, with graceful degradation, in the presence of noise/faults.

Weight noise injection during training

• Murray & Edwards (1993): Modify BPA by injecting weight noise during training for MLP–By simulation: convergence, fault tolerance–By theoretical analysis: effect of weight noise on

the prediction error of a MLP

• A.F. Murray and P.J. Edwards. Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements. IEEE Transactions on Neural Networks, Vol.4(4), 722-725, 1993.

• A.F. Murray and P.J. Edwards. Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training. IEEE Transactions on Neural Networks, Vol.5(5), 792-802, 1994. 6

Weight noise injection during training (Cont.)

• Jim, Giles & Horne (1996): Modify RTRL by injecting weight noise during training for RNN.

– By simulation: convergence and generalization.

– By theoretical analysis: effect of weight noise on the prediction error of a RNN.

• K.C. Jim, C.L. Giles and B.G. Horne. An analysis of noise in recurrent neural networks: convergence and generalization. IEEE Transactions on Neural Networks, Vol.7, 1424-1438, 1996.

Regularization

• Bernier and co-workers (2000): Add an explicit regularizer to the training MSE as the objective function to be minimized.

– An online learning algorithm is developed based on the idea of gradient descent.

– No noise is injected during training.

• J.L. Bernier, J. Ortega, I. Rojas, and A. Prieto. Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations. Neurocomputing, Vol.31, pp.87-103, Jan. 2000.

• J.L. Bernier, J. Ortega, I. Rojas, E. Ros, and A. Prieto. Obtaining fault tolerant multilayer perceptrons using an explicit regularization. Neural Processing Letters, Vol.12, No.2, pp.107-113, Oct. 2000.

Regularization (Cont.)

• Ho, Leung, & Sum (2009): Adding regularizer term to training MSE as the objective function–Similar to Bernier et al approach. But, the weighting

factor for the regularizer can be determined by the noise variance

–Online learning is developed by the idea of gradient descent

–No noise is injected during training• J. Sum, C.S. Leung, and K. Ho. On objective function, regularizer and prediction error

of a learning algorithm for dealing with multiplicative weight noise. IEEE Transactions on Neural Networks Vol.20(1), Jan, 2009, 2009.

9

Misconception

• Ho, Leung & Sum (2009-): Convergence?

– Show that the work by G. An (1996) is incomplete.

• Essentially, his work is identical to the works done by Murray & Edwards (1993, 1994) and Bernier et al. (2000). Only the effect of weight noise on the prediction error of a MLP has been derived.

– By theoretical analysis, injecting weight noise during training a RBF network has no use.

– By simulation, the MSE converges but the weights might not converge.

– Injecting weight noise and adding weight decay during training can improve convergence.

• K. Ho, C.S. Leung and J. Sum. Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks. IEEE Transactions on Neural Networks, in press.

• K. Ho, C.S. Leung, and J. Sum. On weight-noise-injection training. In M. Koeppen, N. Kasabov and G. Coghill (eds.), Advances in Neuro-Information Processing, Springer LNCS 5507, pp. 919-926, 2009.

• J. Sum and K. Ho. SNIWD: Simultaneous weight noise injection with weight decay for MLP training. Proc. ICONIP 2009, Bangkok, Thailand, 2009.

Objective

• Investigate the fault tolerance and convergence of a NN that is trained by the method of

– combining weight noise injection and adding weight decay during BPA training.

• Compare the results with the NN being trained by

– BPA training

– weight noise injection during BPA training

– adding weight decay during BPA training

• Focus on the multilayer perceptron (MLP) network.

• Multiplicative and additive weight noise injections.

Learning Algorithms

• BPA for linear output MLP (BPA1)

• BPA1 with weight decay

• BPA for sigmoid output MLP (BPA2)

• BPA2 with weight decay

• Weight noise injection training algorithms


BPA 1

• Data set:

• Hidden node output:

• MLP output:

– ps.
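The equations for this slide did not survive extraction. As a hedged reconstruction, a common setup for a linear-output MLP with m hidden nodes would be the following (the symbols x_t, y_t, z_j, d_j, c_j and theta_j are my notation, not necessarily the slide's):

\[
\mathcal{D} = \{(x_t, y_t)\}_{t=1}^{N}, \qquad x_t \in \mathbb{R}^d,\ y_t \in \mathbb{R},
\]
\[
z_j(x_t) = \tanh\!\big(d_j^{T} x_t + c_j\big), \qquad
f(x_t, w) = \sum_{j=1}^{m} \theta_j\, z_j(x_t),
\]

where w collects all the weights (d_j, c_j, theta_j).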


BPA 1 (Cont.)

• Objective function:

• Update equation:

– For j = 1, ... , n
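The objective and update equations are likewise missing. A hedged sketch, assuming the usual online BPA that minimizes the squared error of the current sample with learning rate \mu:

\[
E_t(w) = \tfrac{1}{2}\,\big(y_t - f(x_t, w)\big)^2,
\]
\[
w(t+1) = w(t) - \mu\,\nabla_w E_t\big(w(t)\big)
        = w(t) + \mu\,\big(y_t - f(x_t, w(t))\big)\,\nabla_w f\big(x_t, w(t)\big),
\]

applied to each weight component j = 1, ..., n.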


BPA 1 with weight decay

• Objective function:

• Update equation:

– For j = 1, ... , n
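A hedged sketch of the weight-decay version, assuming a decay constant \lambda > 0 added to the per-sample objective:

\[
E_t(w) = \tfrac{1}{2}\,\big(y_t - f(x_t, w)\big)^2 + \tfrac{\lambda}{2}\,\|w\|^2,
\]
\[
w(t+1) = w(t) + \mu\,\Big[\big(y_t - f(x_t, w(t))\big)\,\nabla_w f\big(x_t, w(t)\big) - \lambda\, w(t)\Big].
\]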


BPA 2

• Data set:

• Hidden node output:

• MLP output:

– where

• ps.
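Again the equations are missing; presumably BPA 2 differs from BPA 1 only in passing the output through a sigmoid. A hedged sketch, reusing the notation assumed above:

\[
f(x_t, w) = \sigma\!\Big(\sum_{j=1}^{m} \theta_j\, z_j(x_t)\Big), \qquad
\sigma(u) = \frac{1}{1 + e^{-u}}.
\]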


BPA 2 (Cont.)

• Objective function:

• Update equation:

– For j = 1, ... , n
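Assuming the same squared-error objective as BPA 1, the update gains the sigmoid derivative factor f(1 - f). A hedged sketch, with a_t denoting the pre-sigmoid activation:

\[
a_t = \sum_{j=1}^{m} \theta_j\, z_j(x_t), \qquad f_t = \sigma(a_t),
\]
\[
w(t+1) = w(t) + \mu\,\big(y_t - f_t\big)\, f_t\,(1 - f_t)\,\nabla_w a_t, \qquad j = 1, \dots, n.
\]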


BPA 2 with weight decay

• Objective function:

• Update equation:

– For j = 1, ... , n
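Under the same assumptions, the weight-decay variant would simply add the \tfrac{\lambda}{2}\|w\|^2 penalty and the corresponding decay term:

\[
E_t(w) = \tfrac{1}{2}\,\big(y_t - f_t\big)^2 + \tfrac{\lambda}{2}\,\|w\|^2, \qquad
w(t+1) = w(t) + \mu\,\Big[\big(y_t - f_t\big)\, f_t\,(1 - f_t)\,\nabla_w a_t - \lambda\, w(t)\Big].
\]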


Weight noise injection training algorithms

• Update equation:

–For multiplicative weight noise injection

–For additive weight noise injection
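The update equations did not survive extraction. A hedged sketch of the usual weight-noise-injection scheme: at each step the gradient is evaluated at a noise-corrupted copy \tilde{w}(t) of the weights while the clean weights are stored, with b(t) a zero-mean (e.g. Gaussian) noise vector (symbols assumed, not from the slide):

\[
\text{multiplicative:}\quad \tilde{w}_i(t) = w_i(t)\,\big(1 + b_i(t)\big), \qquad
\text{additive:}\quad \tilde{w}_i(t) = w_i(t) + b_i(t),
\]
\[
w(t+1) = w(t) + \mu\,\big(y_t - f(x_t, \tilde{w}(t))\big)\,\nabla_w f\big(x_t, \tilde{w}(t)\big),
\]

with an extra -\mu\lambda w(t) term when weight noise injection is combined with weight decay (the SNIWD case studied here).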


Experiments

• Data sets

• Methodology

• Results


Data sets

• 2D mapping

• Mackey-Glass

• NAR

• Astrophysical Data

• XOR

• Character Recognition

Methodology

• Training

– BPA

– BPA with weight noise injection

– BPA with adding weight decay

– BPA with weight noise injection and weight decay

• Fault tolerance

– MWNI-based training: effect of multiplicative weight noise on the prediction error of the trained MLP

– AWNI-based training: effect of additive weight noise on the prediction error of the trained MLP

• Convergence of the weight vectors
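To make the fault-tolerance measurement concrete, below is a minimal Python sketch of how a trained weight vector could be evaluated under injected multiplicative or additive weight noise. The network layout, function names and noise levels are my own illustrative assumptions, not the actual code used in these experiments.

```python
import numpy as np

def mlp_predict(w, X, m):
    """Linear-output MLP with m tanh hidden nodes.
    w packs [hidden weights (m*d), hidden biases (m), output weights (m)]."""
    d = X.shape[1]
    D = w[:m * d].reshape(m, d)           # input-to-hidden weights
    c = w[m * d:m * d + m]                # hidden biases
    theta = w[m * d + m:]                 # hidden-to-output weights
    return np.tanh(X @ D.T + c) @ theta

def fault_tolerance_curve(w, X, y, m, noise_levels, mode="mult", trials=100):
    """Mean test MSE of trained weights w under injected weight noise."""
    rng = np.random.default_rng(0)
    curve = []
    for s in noise_levels:                # s: noise standard deviation
        errs = []
        for _ in range(trials):
            b = rng.normal(0.0, s, size=w.shape)
            w_noisy = w * (1.0 + b) if mode == "mult" else w + b   # MWN / AWN
            errs.append(np.mean((y - mlp_predict(w_noisy, X, m)) ** 2))
        curve.append(np.mean(errs))
    return np.array(curve)

# Usage sketch: compute curves for MLPs trained by plain BPA, BPA with weight
# decay, weight noise injection, and weight noise injection plus weight decay,
# then compare them over the same range of noise levels.
```

Weight convergence can be checked in the same framework by recording the norm of the weight change, ||w(t+1) - w(t)||, over the training iterations.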

Results

2D mapping (MWN)

2D mapping (AWN)

Mackey-Glass (MWN)

Mackey-Glass (AWN)

NAR (MWN)

NAR (AWN)

Astrophysical (MWN)

Astrophysical (AWN)

XOR (MWN)

XOR (AWN)

Character recognition (MWN)

Character recognition (AWN)

Summary

Astrophysical data

Conclusion

• For convergence: injecting appropriate weight noise and adding appropriate weight decay during training can ensure that the weights converge.

• The fault tolerance of a MLP can also be improved for most data sets.

Thank You
