INTELLIGENT SYSTEMS AND CONTROL: slides for electrical and electronic engineering students, by Gopinath Pillai.
Transcript
Page 1: INTELLIGENT

How is generalization possible?

Necessary conditions for good generalization:

1. The function you are trying to learn must be, in some sense, smooth. In other words, a small change in the inputs should, most of the time, produce a small change in the outputs.

2. The training cases must be a sufficiently large and representative subset of the set of all cases to which you want to generalize.

Page 2: INTELLIGENT

Eleven data points obtained by sampling h(x) at equal intervals of x and adding random noise. The solid curve shows the output of a linear network.

Page 3: INTELLIGENT

Here we use a network which has more free parameters than the earlier one. This network is more flexible, and the approximation improves.

Page 4: INTELLIGENT

Here we use a network which has many more free parameters than the earlier ones. This complex network gives a perfect fit to the training data, but a poor representation of the underlying function.

Neither too simple nor too complex a model is desirable. Complexity can be controlled by controlling the number of free parameters, as the sketch below illustrates.
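As a concrete illustration of pages 2 to 4, the following minimal Python sketch fits polynomials of increasing degree to 11 noisy samples; the polynomials stand in for networks of increasing flexibility, and the form h(x) = 0.5 + 0.4 sin(2πx) is an assumption, since the slides do not give h explicitly.

import numpy as np

rng = np.random.default_rng(0)

# 11 data points: h(x) sampled at equal intervals of x, plus random noise.
# h(x) = 0.5 + 0.4*sin(2*pi*x) is an assumed form, not given in the slides.
x = np.linspace(0.0, 1.0, 11)
t = 0.5 + 0.4 * np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.05, size=x.shape)

for degree in (1, 3, 10):                 # too simple, adequate, too flexible
    coeffs = np.polyfit(x, t, degree)     # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x) - t) ** 2)
    print(f"degree {degree:2d}: training MSE = {train_mse:.6f}")

# The degree-10 polynomial interpolates all 11 points (near-zero training
# error) yet represents the underlying h(x) poorly between them.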

Page 5: INTELLIGENT

Regularization

Adding a penalty term to the error function is one way to control model complexity. Assume a network with many free parameters. The total error is then

Ẽ = E + νΩ

where Ω is called a regularization term (a regularizer). The parameter ν controls the extent to which Ω influences the form of the solution.

In the figure, the highly flexible function has large oscillations, and hence regions of large curvature. We might therefore choose a regularization function which is large for functions with large values of the second derivative, such as

Ω = ½ ∫ (d²y/dx²)² dx
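A minimal Python sketch of this curvature penalty, assuming a hypothetical fitted function model(x); the integral is approximated by finite differences on a dense grid:

import numpy as np

def total_error(model, x, t, nu):
    """Regularized error E~ = E + nu * Omega with a curvature penalty."""
    E = 0.5 * np.sum((model(x) - t) ** 2)          # sum-of-squares data term
    grid = np.linspace(x.min(), x.max(), 200)      # dense grid for the integral
    y = model(grid)
    d2y = np.gradient(np.gradient(y, grid), grid)  # approximate d2y/dx2
    omega = 0.5 * np.trapz(d2y ** 2, grid)         # Omega = 1/2 * integral (y'')^2 dx
    return E + nu * omega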

Page 6: INTELLIGENT

Weight Decay

• Weight decay adds a penalty term to the error function. The usual penalty is the sum of squared weights times a decay constant. Weight decay is one of the regularization methods; its penalty term, by definition, penalizes large weights.

• The weight decay penalty term causes the weights to converge to smaller absolute values than they otherwise would. Large weights can hurt generalization in two different ways: excessively large weights leading into the hidden units can make the output function too rough, possibly with near-discontinuities, while excessively large weights leading into the output units can cause wild outputs far beyond the range of the data if the output activation function is not bounded to the same range as the data.

In the notation of the previous page, Ω = ½ Σᵢ wᵢ², where the sum runs over all weights and biases, so the total error becomes Ẽ = E + (ν/2) Σᵢ wᵢ².
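A minimal sketch of one gradient-descent step under weight decay; grad_E is a hypothetical gradient of the unpenalized error, and lr and nu are assumed values:

import numpy as np

def weight_decay_step(w, grad_E, lr=0.1, nu=0.01):
    """One step on E~ = E + (nu/2) * sum(w**2): dE~/dw = dE/dw + nu * w."""
    return w - lr * (grad_E(w) + nu * w)   # the nu*w term shrinks the weights

Each update effectively multiplies the weights by (1 - lr*nu) before applying the error gradient, which is why they are said to "decay" toward zero.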

Page 7: INTELLIGENT

Adding noise to improve generalization (Jitter)

Heuristically, we might expect that the noise will 'smear out' each data point and make it difficult for the network to fit individual data points precisely, and hence will reduce over-fitting.
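A minimal sketch of jitter in Python: fresh Gaussian noise is drawn for the inputs on every epoch, so the network never sees exactly the same data point twice. The train_step helper and the noise level sigma are assumptions, not from the slides:

import numpy as np

rng = np.random.default_rng(0)

def jittered_epoch(train_step, X, T, sigma=0.05):
    """Train for one epoch on a noise-perturbed copy of the inputs."""
    X_noisy = X + rng.normal(0.0, sigma, size=X.shape)
    train_step(X_noisy, T)   # targets are left unchanged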

Early Stopping
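Early stopping halts training once the error on a held-out validation set stops improving. A minimal sketch, assuming hypothetical train_one_epoch and validation_error callables and an assumed patience threshold:

def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=1000, patience=10):
    """Stop when validation error has not improved for `patience` epochs."""
    best_err, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()
        err = validation_error()
        if err < best_err:
            best_err, epochs_without_improvement = err, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break        # validation error has stopped improving
    return best_err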

Page 8: INTELLIGENT

Evaluation Methods

Page 9: INTELLIGENT
Page 10: INTELLIGENT
Page 11: INTELLIGENT
Page 12: INTELLIGENT
Page 13: INTELLIGENT
Page 14: INTELLIGENT

• Various networks are trained by minimizing an appropriate error function defined with respect to a training data set. The performance of the networks is then compared by evaluating the error function using an independent validation set, and the network having the smallest error with respect to the validation set is selected.
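A minimal sketch of this selection procedure; fit and validation_error are hypothetical helpers that train a candidate network on the training set and evaluate it on the validation set:

def select_network(candidates, fit, validation_error):
    """Return the trained candidate with the smallest validation error."""
    best_net, best_err = None, float("inf")
    for spec in candidates:
        net = fit(spec)                # minimize error on the training set
        err = validation_error(net)    # evaluate on the independent validation set
        if err < best_err:
            best_net, best_err = net, err
    return best_net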

Page 15: INTELLIGENT

Optimization methods

• Gradient Descent

• Conjugate Gradient

• Levenberg-Marquardt

• Quasi-Newton

• Evolutionary methods
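Of these, gradient descent is the simplest: the weights repeatedly move a small step against the gradient of the error, while the other methods refine the search direction or step size. A minimal sketch, where grad_E is a hypothetical gradient function and lr an assumed learning rate:

def gradient_descent(w, grad_E, lr=0.1, n_steps=100):
    """Repeated update w <- w - lr * dE/dw."""
    for _ in range(n_steps):
        w = w - lr * grad_E(w)
    return w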