Transcript
Page 1: Empirical studies on the online learning algorithms based on combining weight noise injection and weight decay

Advisor: Dr. John Sum
Student: Allen Liang

Page 2: Outline

• Introduction

• Learning Algorithms

• Experiments

• Conclusion

Page 3: Introduction

Page 4: Background

• A neural network (NN) is a network system composed of interconnected neurons.

• Learning aims to make a NN achieve good generalization (a small prediction error).

Page 5: Background (Cont.)

• Fault tolerance is an unavoidable issue that must be considered in hardware implementation.

– Multiplicative or additive weight noise.

– Weights could randomly break down.

– Hidden nodes could fail (stuck-at-zero & stuck-at-one).

• The goal is to keep the network workable, with graceful degradation, in the presence of noise/faults.

Page 6: Weight noise injection during training

• Murray & Edwards (1993): Modify BPA by injecting weight noise during training for MLP
– By simulation: convergence, fault tolerance
– By theoretical analysis: effect of weight noise on the prediction error of a MLP

• A.F. Murray and P.J. Edwards, "Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements," IEEE Transactions on Neural Networks, Vol. 4(4), pp. 722-725, 1993.

• A.F. Murray and P.J. Edwards, "Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training," IEEE Transactions on Neural Networks, Vol. 5(5), pp. 792-802, 1994.

Page 7: Weight noise injection during training (Cont.)

• Jim, Giles & Horne (1996): Modify RTRL by injecting weight noise during training for RNN
– By simulation: convergence and generalization
– By theoretical analysis: effect of weight noise on the prediction error of a RNN

• K.C. Jim, C.L. Giles and B.G. Horne, "An analysis of noise in recurrent neural networks: convergence and generalization," IEEE Transactions on Neural Networks, Vol. 7, pp. 1424-1438, 1996.

Page 8: Regularization

• Bernier and co-workers (2000): Add an explicit regularizer to the training MSE as the objective function to be minimized.
– An online learning algorithm is developed by the idea of gradient descent
– No noise is injected during training

• J.L. Bernier, J. Ortega, I. Rojas, and A. Prieto, "Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations," Neurocomputing, Vol. 31, pp. 87-103, Jan. 2000.

• J.L. Bernier, J. Ortega, I. Rojas, E. Ros, and A. Prieto, "Obtaining fault tolerant multilayer perceptrons using an explicit regularization," Neural Processing Letters, Vol. 12, No. 2, pp. 107-113, Oct. 2000.

Page 9: Regularization (Cont.)

• Ho, Leung & Sum (2009): Add a regularizer term to the training MSE as the objective function
– Similar to Bernier et al.'s approach, but the weighting factor for the regularizer can be determined by the noise variance
– Online learning is developed by the idea of gradient descent
– No noise is injected during training

• J. Sum, C.S. Leung, and K. Ho, "On objective function, regularizer and prediction error of a learning algorithm for dealing with multiplicative weight noise," IEEE Transactions on Neural Networks, Vol. 20(1), Jan. 2009.

Page 10: Misconception

• Ho, Leung & Sum (2009-): Convergence?
– Show that the work by G. An (1996) is incomplete.
• Essentially, his work is identical to the works done by Murray & Edwards (1993, 1994) and Bernier et al. (2000). Only the effect of weight noise on the prediction error of a MLP has been derived.
– By theoretical analysis, injecting weight noise while training a RBF is of no use.
– By simulation, the MSE converges but the weights might not converge.
– Injecting weight noise together with weight decay during training can improve convergence.

• K. Ho, C.S. Leung and J. Sum, "Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks," IEEE Transactions on Neural Networks, in press.

• K. Ho, C.S. Leung, and J. Sum, "On weight-noise-injection training," in M. Koeppen, N. Kasabov and G. Coghill (eds.), Advances in Neuro-Information Processing, Springer LNCS 5507, pp. 919-926, 2009.

• J. Sum and K. Ho, "SNIWD: Simultaneous weight noise injection with weight decay for MLP training," Proc. ICONIP 2009, Bangkok, Thailand, 2009.

Page 11: Objective

• Investigate the fault tolerance and convergence of a NN that is trained by the method of
– combining weight noise injection and adding weight decay during BPA training.

• Compare the results with the NN being trained by
– BPA training
– weight noise injection during BPA training
– adding weight decay during BPA training

• Focus on the multiple layer perceptron (MLP) network
• Multiplicative and additive weight noise injections

Page 12: Learning Algorithms

• BPA for linear output MLP (BPA1)

• BPA1 with weight decay

• BPA for sigmoid output MLP (BPA2)

• BPA2 with weight decay

• Weight noise injection training algorithms

Page 13: BPA 1

• Data set:

• Hidden node output:

• MLP output:

– ps.
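
The formulas for these items did not survive the transcript. A minimal sketch of the usual setup for a single-hidden-layer MLP with linear output, with the notation (m hidden nodes, tanh hidden activation) assumed here rather than taken from the slide:

\[
\mathcal{D} = \{(x_t, y_t)\}_{t=1}^{N}, \qquad x_t \in \mathbb{R}^{d},\ y_t \in \mathbb{R},
\]
\[
z_j(x_t) = \tanh\!\big(c_j^{\top} x_t + b_j\big), \qquad j = 1, \dots, m,
\]
\[
f(x_t, w) = \sum_{j=1}^{m} a_j\, z_j(x_t),
\]

where w denotes the vector collecting all the weights (a_j, b_j, c_j).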

Page 14: BPA 1 (Cont.)

• Objective function:

• Update equation:

– For j = 1, ..., n
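
The objective and update are also missing here; assuming the training MSE as the objective and a step size mu, the standard online BPA form would be:

\[
E(w) = \frac{1}{N} \sum_{t=1}^{N} \big(y_t - f(x_t, w)\big)^{2},
\]
\[
w_j(t+1) = w_j(t) + \mu\,\big(y_t - f(x_t, w(t))\big)\,\frac{\partial f(x_t, w(t))}{\partial w_j}, \qquad j = 1, \dots, n,
\]

with n the number of weights and each step using one sample (x_t, y_t) at a time.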

Page 15: BPA 1 with weight decay

• Objective function:

• Update equation:

– For j = 1, ..., n
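
A sketch under the assumption that weight decay enters as a quadratic penalty with regularization constant lambda:

\[
E_{\mathrm{wd}}(w) = \frac{1}{N} \sum_{t=1}^{N} \big(y_t - f(x_t, w)\big)^{2} + \lambda \|w\|^{2},
\]
\[
w_j(t+1) = w_j(t) + \mu\Big[\big(y_t - f(x_t, w(t))\big)\,\frac{\partial f(x_t, w(t))}{\partial w_j} - \lambda\, w_j(t)\Big], \qquad j = 1, \dots, n.
\]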

Page 16: BPA 2

• Data set:

• Hidden node output:

• MLP output:

– where

• ps.
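
For the sigmoid output MLP, the only change from BPA 1 is presumably the output nonlinearity; a sketch, assuming the logistic function:

\[
f(x_t, w) = \sigma\Big(\sum_{j=1}^{m} a_j\, z_j(x_t)\Big), \qquad \text{where } \sigma(s) = \frac{1}{1 + e^{-s}}.
\]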

Page 17: BPA 2 (Cont.)

• Objective function:

• Update equation:

– For j = 1, ..., n
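
Assuming the objective is again the training MSE, the update picks up the extra sigmoid-derivative factor f(1 - f); a sketch, with s_t denoting the pre-sigmoid output:

\[
w_j(t+1) = w_j(t) + \mu\,\big(y_t - f(x_t, w(t))\big)\, f(x_t, w(t))\big(1 - f(x_t, w(t))\big)\,\frac{\partial s_t}{\partial w_j}, \qquad j = 1, \dots, n.
\]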

Page 18: BPA 2 with weight decay

• Objective function:

• Update equation:

– For j = 1, ..., n
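
As with BPA 1, the weight decay variant presumably adds the penalty lambda * ||w||^2 to the objective, giving the extra decay term in the update:

\[
w_j(t+1) = w_j(t) + \mu\Big[\big(y_t - f(x_t, w(t))\big)\, f(x_t, w(t))\big(1 - f(x_t, w(t))\big)\,\frac{\partial s_t}{\partial w_j} - \lambda\, w_j(t)\Big].
\]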

Page 19: Weight noise injection training algorithms

• Update equation:

– For multiplicative weight noise injection

– For additive weight noise injection
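
The two update equations are missing from the transcript. In the weight-noise-injection algorithms studied here (cf. the SNIWD reference on page 10), the gradient is evaluated at noise-corrupted weights; a sketch, with b_j(t) i.i.d. zero-mean Gaussian noise and lambda = 0 unless weight decay is also applied:

\[
\tilde{w}_j(t) = w_j(t)\,\big(1 + b_j(t)\big) \quad \text{(multiplicative)}, \qquad \tilde{w}_j(t) = w_j(t) + b_j(t) \quad \text{(additive)},
\]
\[
w_j(t+1) = w_j(t) + \mu\Big[\big(y_t - f(x_t, \tilde{w}(t))\big)\,\frac{\partial f(x_t, \tilde{w}(t))}{\partial \tilde{w}_j} - \lambda\, w_j(t)\Big].
\]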

Page 20: Experiments

• Data sets

• Methodology

• Results

Page 21: Data sets

Page 22: 2D mapping

Page 23: Mackey-Glass

Page 24: NAR

Page 25: Astrophysical Data

Page 26: XOR

Page 27: Character Recognition

Page 28: Methodology

• Training
– BPA
– BPA with weight noise injection
– BPA with weight decay
– BPA with weight noise injection and weight decay (a sketch of one update step follows below)

• Fault tolerance
– MWNI-based training: effect of multiplicative weight noise on the prediction error of the trained MLP
– AWNI-based training: effect of additive weight noise on the prediction error of the trained MLP

• Convergence of the weight vectors
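
The four training variants differ only in whether noise is injected into the weights before computing the gradient and whether a decay term is applied. A minimal sketch in Python, assuming a squared-error objective; mlp_output, mlp_grad, and all parameter names are illustrative, not taken from the slides:

import numpy as np

# One online update step covering the four training variants compared here.
# mlp_output(x, w) returns the MLP prediction; mlp_grad(x, w) returns the
# gradient of the prediction with respect to the weight vector w.
def train_step(w, x, y, mlp_output, mlp_grad, mu=0.01, lam=0.0,
               noise="none", sigma_b=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    if noise == "mult":                 # multiplicative weight noise injection
        w_used = w * (1.0 + sigma_b * rng.standard_normal(w.shape))
    elif noise == "add":                # additive weight noise injection
        w_used = w + sigma_b * rng.standard_normal(w.shape)
    else:                               # plain BPA (no injected noise)
        w_used = w
    err = y - mlp_output(x, w_used)     # prediction error at the (possibly noisy) weights
    grad = err * mlp_grad(x, w_used)    # gradient step for the instantaneous squared error
    return w + mu * (grad - lam * w)    # lam = 0 recovers BPA / pure noise injection

Setting noise="none" and lam=0 gives plain BPA; noise="mult" or "add" with lam=0 gives the two weight-noise-injection algorithms; the same choices with lam > 0 give the weight-decay and combined variants.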

Page 29: Methodology

Page 30: 2D mapping (MWN)

Page 31: 2D mapping (MWN)

Page 32: 2D mapping (AWN)

Page 33: 2D mapping (AWN)

Page 34: Mackey-Glass (MWN)

Page 35: Mackey-Glass (MWN)

Page 36: Mackey-Glass (AWN)

Page 37: Mackey-Glass (AWN)

Page 38: NAR (MWN)

Page 39: NAR (MWN)

Page 40: NAR (AWN)

Page 41: NAR (AWN)

Page 42: Astrophysical (MWN)

Page 43: Astrophysical (MWN)

Page 44: Astrophysical (AWN)

Page 45: Astrophysical (AWN)

Page 46: XOR (MWN)

Page 47: XOR (MWN)

Page 48: XOR (AWN)

Page 49: XOR (AWN)

Page 50: Character recognition (MWN)

Page 51: Character recognition (MWN)

Page 52: Character recognition (AWN)

Page 53: Character recognition (AWN)

Page 54: Summary

Page 55

Page 56: Astrophysical data

Page 57: Conclusion

• For convergence, injecting appropriate weight noise and adding appropriate weight decay during training ensures that the weights will converge.

• The fault tolerance of a MLP can also be improved for most data sets.

Page 58: Thank You