Transcript
Page 1: Empirical studies on the online learning algorithms based on combining weight noise injection and weight decay

Advisor: Dr. John Sum
Student: Allen Liang

Page 2: Outline

• Introduction

• Learning Algorithms

• Experiments

• Conclusion

Page 3: Introduction

Page 4: Background

• A neural network (NN) is a network system composed of interconnected neurons.

• Learning aims to make a NN achieve good generalization (a small prediction error).

Page 5: Background (Cont.)

• Fault tolerance is an unavoidable issue that must be considered in hardware implementation.

– Multiplicative or additive weight noise.

– Weights could randomly break down.

– Hidden nodes could fail (stuck-at-zero & stuck-at-one).

• The goal is to keep the network workable, with graceful degradation, in the presence of noise/faults.

Page 6: Weight noise injection during training

• Murray & Edwards (1993): Modify BPA by injecting weight noise during training for MLP
– By simulation: convergence, fault tolerance
– By theoretical analysis: effect of weight noise on the prediction error of a MLP

• A.F. Murray and P.J. Edwards, "Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements," IEEE Transactions on Neural Networks, Vol. 4(4), pp. 722-725, 1993.

• A.F. Murray and P.J. Edwards, "Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training," IEEE Transactions on Neural Networks, Vol. 5(5), pp. 792-802, 1994.

Page 7: Weight noise injection during training (Cont.)

• Jim, Giles & Horne (1996): Modify RTRL by injecting weight noise during training for RNN
– By simulation: convergence and generalization
– By theoretical analysis: effect of weight noise on the prediction error of a RNN

• K.C. Jim, C.L. Giles and B.G. Horne, "An analysis of noise in recurrent neural networks: convergence and generalization," IEEE Transactions on Neural Networks, Vol. 7, pp. 1424-1438, 1996.

Page 8: Regularization

• Bernier and co-workers (2000): Add an explicit regularizer to the training MSE as the objective function to be minimized.
– An online learning algorithm is developed by the idea of gradient descent
– No noise is injected during training

• J.L. Bernier, J. Ortega, I. Rojas, and A. Prieto, "Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations," Neurocomputing, Vol. 31, pp. 87-103, Jan. 2000.

• J.L. Bernier, J. Ortega, I. Rojas, E. Ros, and A. Prieto, "Obtaining fault tolerant multilayer perceptrons using an explicit regularization," Neural Processing Letters, Vol. 12, No. 2, pp. 107-113, Oct. 2000.

Page 9: Regularization (Cont.)

• Ho, Leung & Sum (2009): Add a regularizer term to the training MSE as the objective function
– Similar to Bernier et al.'s approach, but the weighting factor for the regularizer can be determined by the noise variance
– Online learning is developed by the idea of gradient descent
– No noise is injected during training

• J. Sum, C.S. Leung, and K. Ho, "On objective function, regularizer and prediction error of a learning algorithm for dealing with multiplicative weight noise," IEEE Transactions on Neural Networks, Vol. 20(1), Jan. 2009.

Page 10: Misconception

• Ho, Leung & Sum (2009-): Convergence?
– Show that the work by G. An (1996) is incomplete.
• Essentially, his work is identical to the works done by Murray & Edwards (1993, 1994) and Bernier et al. (2000). Only the effect of weight noise on the prediction error of a MLP has been derived.
– By theoretical analysis, injecting weight noise while training a RBF is of no use.
– By simulation, the MSE converges but the weights might not converge.
– Injecting weight noise together with weight decay during training can improve convergence.

• K. Ho, C.S. Leung and J. Sum, "Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks," IEEE Transactions on Neural Networks, in press.

• K. Ho, C.S. Leung, and J. Sum, "On weight-noise-injection training," in M. Koeppen, N. Kasabov and G. Coghill (eds.), Advances in Neuro-Information Processing, Springer LNCS 5507, pp. 919-926, 2009.

• J. Sum and K. Ho, "SNIWD: Simultaneous weight noise injection with weight decay for MLP training," Proc. ICONIP 2009, Bangkok, Thailand, 2009.

Page 11: Objective

• Investigate the fault tolerance and convergence of a NN that is trained by the method of
– combining weight noise injection and adding weight decay during BPA training.

• Compare the results with the NN being trained by
– BPA training
– weight noise injection during BPA training
– adding weight decay during BPA training

• Focus on the multiple layer perceptron (MLP) network
• Multiplicative and additive weight noise injections

Page 12: Learning Algorithms

• BPA for linear output MLP (BPA1)

• BPA1 with weight decay

• BPA for sigmoid output MLP (BPA2)

• BPA2 with weight decay

• Weight noise injection training algorithms

Page 13: BPA 1

• Data set:

• Hidden node output:

• MLP output:

– ps.
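
The formulas for these items did not survive the transcript. A minimal sketch of the usual setup for a single-hidden-layer MLP with linear output, with the notation (m hidden nodes, tanh hidden activation) assumed here rather than taken from the slide:

\[
\mathcal{D} = \{(x_t, y_t)\}_{t=1}^{N}, \qquad x_t \in \mathbb{R}^{d},\ y_t \in \mathbb{R},
\]
\[
z_j(x_t) = \tanh\!\big(c_j^{\top} x_t + b_j\big), \qquad j = 1, \dots, m,
\]
\[
f(x_t, w) = \sum_{j=1}^{m} a_j\, z_j(x_t),
\]

where w denotes the vector collecting all the weights (a_j, b_j, c_j).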

Page 14: BPA 1 (Cont.)

• Objective function:

• Update equation:

– For j = 1, ..., n
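
The objective and update are also missing here; assuming the training MSE as the objective and a step size mu, the standard online BPA form would be:

\[
E(w) = \frac{1}{N} \sum_{t=1}^{N} \big(y_t - f(x_t, w)\big)^{2},
\]
\[
w_j(t+1) = w_j(t) + \mu\,\big(y_t - f(x_t, w(t))\big)\,\frac{\partial f(x_t, w(t))}{\partial w_j}, \qquad j = 1, \dots, n,
\]

with n the number of weights and each step using one sample (x_t, y_t) at a time.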

Page 15: BPA 1 with weight decay

• Objective function:

• Update equation:

– For j = 1, ..., n
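
A sketch under the assumption that weight decay enters as a quadratic penalty with regularization constant lambda:

\[
E_{\mathrm{wd}}(w) = \frac{1}{N} \sum_{t=1}^{N} \big(y_t - f(x_t, w)\big)^{2} + \lambda \|w\|^{2},
\]
\[
w_j(t+1) = w_j(t) + \mu\Big[\big(y_t - f(x_t, w(t))\big)\,\frac{\partial f(x_t, w(t))}{\partial w_j} - \lambda\, w_j(t)\Big], \qquad j = 1, \dots, n.
\]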

Page 16: BPA 2

• Data set:

• Hidden node output:

• MLP output:

– where

• ps.
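
For the sigmoid output MLP, the only change from BPA 1 is presumably the output nonlinearity; a sketch, assuming the logistic function:

\[
f(x_t, w) = \sigma\Big(\sum_{j=1}^{m} a_j\, z_j(x_t)\Big), \qquad \text{where } \sigma(s) = \frac{1}{1 + e^{-s}}.
\]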

Page 17: BPA 2 (Cont.)

• Objective function:

• Update equation:

– For j = 1, ..., n
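
Assuming the objective is again the training MSE, the update picks up the extra sigmoid-derivative factor f(1 - f); a sketch, with s_t denoting the pre-sigmoid output:

\[
w_j(t+1) = w_j(t) + \mu\,\big(y_t - f(x_t, w(t))\big)\, f(x_t, w(t))\big(1 - f(x_t, w(t))\big)\,\frac{\partial s_t}{\partial w_j}, \qquad j = 1, \dots, n.
\]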

Page 18: BPA 2 with weight decay

• Objective function:

• Update equation:

– For j = 1, ..., n
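
As with BPA 1, the weight decay variant presumably adds the penalty lambda * ||w||^2 to the objective, giving the extra decay term in the update:

\[
w_j(t+1) = w_j(t) + \mu\Big[\big(y_t - f(x_t, w(t))\big)\, f(x_t, w(t))\big(1 - f(x_t, w(t))\big)\,\frac{\partial s_t}{\partial w_j} - \lambda\, w_j(t)\Big].
\]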

Page 19: Weight noise injection training algorithms

• Update equation:

– For multiplicative weight noise injection

– For additive weight noise injection
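
The two update equations are missing from the transcript. In the weight-noise-injection algorithms studied here (cf. the SNIWD reference on page 10), the gradient is evaluated at noise-corrupted weights; a sketch, with b_j(t) i.i.d. zero-mean Gaussian noise and lambda = 0 unless weight decay is also applied:

\[
\tilde{w}_j(t) = w_j(t)\,\big(1 + b_j(t)\big) \quad \text{(multiplicative)}, \qquad \tilde{w}_j(t) = w_j(t) + b_j(t) \quad \text{(additive)},
\]
\[
w_j(t+1) = w_j(t) + \mu\Big[\big(y_t - f(x_t, \tilde{w}(t))\big)\,\frac{\partial f(x_t, \tilde{w}(t))}{\partial \tilde{w}_j} - \lambda\, w_j(t)\Big].
\]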

Page 20: Experiments

• Data sets

• Methodology

• Results

Page 21: Data sets

Page 22: 2D mapping

Page 23: Mackey-Glass

Page 24: NAR

Page 25: Astrophysical Data

Page 26: XOR

Page 27: Character Recognition

Page 28: Methodology

• Training
– BPA
– BPA with weight noise injection
– BPA with weight decay
– BPA with weight noise injection and weight decay (a sketch of one update step follows below)

• Fault tolerance
– MWNI-based training: effect of multiplicative weight noise on the prediction error of the trained MLP
– AWNI-based training: effect of additive weight noise on the prediction error of the trained MLP

• Convergence of the weight vectors
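
The four training variants differ only in whether noise is injected into the weights before computing the gradient and whether a decay term is applied. A minimal sketch in Python, assuming a squared-error objective; mlp_output, mlp_grad, and all parameter names are illustrative, not taken from the slides:

import numpy as np

# One online update step covering the four training variants compared here.
# mlp_output(x, w) returns the MLP prediction; mlp_grad(x, w) returns the
# gradient of the prediction with respect to the weight vector w.
def train_step(w, x, y, mlp_output, mlp_grad, mu=0.01, lam=0.0,
               noise="none", sigma_b=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    if noise == "mult":                 # multiplicative weight noise injection
        w_used = w * (1.0 + sigma_b * rng.standard_normal(w.shape))
    elif noise == "add":                # additive weight noise injection
        w_used = w + sigma_b * rng.standard_normal(w.shape)
    else:                               # plain BPA (no injected noise)
        w_used = w
    err = y - mlp_output(x, w_used)     # prediction error at the (possibly noisy) weights
    grad = err * mlp_grad(x, w_used)    # gradient step for the instantaneous squared error
    return w + mu * (grad - lam * w)    # lam = 0 recovers BPA / pure noise injection

Setting noise="none" and lam=0 gives plain BPA; noise="mult" or "add" with lam=0 gives the two weight-noise-injection algorithms; the same choices with lam > 0 give the weight-decay and combined variants.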

Page 29: Methodology

Page 30: 2D mapping (MWN)

Page 31: 2D mapping (MWN)

Page 32: 2D mapping (AWN)

Page 33: 2D mapping (AWN)

Page 34: Mackey-Glass (MWN)

Page 35: Mackey-Glass (MWN)

Page 36: Mackey-Glass (AWN)

Page 37: Mackey-Glass (AWN)

Page 38: NAR (MWN)

Page 39: NAR (MWN)

Page 40: NAR (AWN)

Page 41: NAR (AWN)

Page 42: Astrophysical (MWN)

Page 43: Astrophysical (MWN)

Page 44: Astrophysical (AWN)

Page 45: Astrophysical (AWN)

Page 46: XOR (MWN)

Page 47: XOR (MWN)

Page 48: XOR (AWN)

Page 49: XOR (AWN)

Page 50: Character recognition (MWN)

Page 51: Character recognition (MWN)

Page 52: Character recognition (AWN)

Page 53: Character recognition (AWN)

Page 54: Summary

Page 55

Page 56: Astrophysical data

Page 57: Conclusion

• For convergence, injecting appropriate weight noise and adding appropriate weight decay during training ensures that the weights will converge.

• The fault tolerance of a MLP can also be improved for most data sets.

Page 58: Thank You