Transcript
Page 1: Instance Based Learning

Instance Based Learning

Page 2: Instance Based Learning

Nearest Neighbor

• Remember all your data

• When someone asks a question
– Find the nearest old data point
– Return the answer associated with it

• In order to say what point is nearest, we have to define what we mean by "near".

• Typically, we use Euclidean distance between two points.

d(x^{(1)}, x^{(2)}) = \sqrt{(a_1^{(1)} - a_1^{(2)})^2 + (a_2^{(1)} - a_2^{(2)})^2 + \cdots + (a_k^{(1)} - a_k^{(2)})^2}

Nominal attributes: distance is set to 1 if values are different, 0 if they are equal
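A minimal sketch of this rule in Python (the function names and data layout are my own, not from the slides): numeric attributes contribute squared differences, and nominal (string) attributes contribute 1 when the values differ, matching the rule above.

import math

def distance(x, y):
    # Sum squared differences for numeric attributes; nominal
    # attributes add 1 if the values differ, 0 if they are equal.
    total = 0.0
    for a, b in zip(x, y):
        if isinstance(a, str):              # nominal attribute
            total += 0.0 if a == b else 1.0
        else:                               # numeric attribute
            total += float(a - b) ** 2
    return math.sqrt(total)

def nearest_neighbor(train, query):
    # train: list of (attributes, label) pairs.
    # Find the nearest old data point and return its answer.
    attrs, label = min(train, key=lambda point: distance(point[0], query))
    return label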

Page 3: Instance Based Learning

Predicting Bankruptcy

Page 4: Instance Based Learning

Predicting Bankruptcy

• Now, let's say we have a new person with R equal to 0.3 and L equal to 2.

• What y value should we predict?

• The nearest old data point carries the answer "no", and so our answer would be "no".
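With the sketch above, this query is just nearest_neighbor(train, (0.3, 2.0)); the transcript doesn't reproduce the training points from the slide's figure, but the nearest one evidently carries the answer "no".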

Page 5: Instance Based Learning

Scaling

• The naïve Euclidean distance isn't always appropriate.

• Consider the case where we have two features describing a car.

– f1 = weight in pounds

– f2 = number of cylinders.

• Any effect of f2 will be completely lost because of the relative scales.

• So, rescale the inputs to put all of the features on about equal footing:

a_i = \frac{v_i - \min v_i}{\max v_i - \min v_i}
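A sketch of this rescaling in Python (column-wise min-max normalization; the car numbers are invented for illustration):

def rescale(data):
    # data: list of rows of numeric attribute values.
    # Map each attribute to [0, 1] via (v - min) / (max - min).
    columns = list(zip(*data))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in data]

# f1 = weight in pounds, f2 = number of cylinders (made-up values)
cars = [[3200, 4], [4100, 8], [2600, 4]]
print(rescale(cars))   # both features now lie on [0, 1]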

Page 6: Instance Based Learning

Time and Space

• Learning is fast
– We just have to remember the training data.

• Space is proportional to n, the number of training instances.

• What takes longer is answering a query.

• If we do it naively, then for each of the n points in our training set we compute the distance to the query point, which takes about m operations, since there are m features to compare.

• So, overall, this takes about m * n time.

Page 7: Instance Based Learning

Noise

Someone with an apparently healthy financial record goes bankrupt.

Page 8: Instance Based Learning

Remedy: K-Nearest Neighbors

• k-nearest neighbor algorithm:
– Just like the old algorithm, except that when we get a query, we'll search for the k closest points to the query point.

• Output what the majority says.
– In this case, we've chosen k to be 3.
– The three closest points consist of two "no"s and a "yes", so our answer would be "no".

• Find the optimal k using cross-validation (a sketch follows below).
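A sketch of the k-nearest-neighbor variant, reusing the distance function from the earlier sketch, with k = 3 as on the slide. Finding the optimal k by cross-validation then means evaluating this classifier for several values of k on held-out folds and keeping the best.

from collections import Counter

def knn(train, query, k=3):
    # Take the k training points closest to the query...
    neighbors = sorted(train, key=lambda point: distance(point[0], query))[:k]
    # ...and output what the majority of them says.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]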

Page 9: Instance Based Learning

Other Variants

• IB2: save memory, speed up classification
– Works incrementally
– Only incorporates misclassified instances
– Problem: noisy data gets incorporated

• IB3: deal with noise
– Discard instances that don't perform well
– Keep a record of the number of correct and incorrect classification decisions that each exemplar makes.
– Two predetermined thresholds are set on the success ratio.
• If the performance of an exemplar falls below the lower threshold, it is deleted.
• If the performance exceeds the upper threshold, it is used for prediction.

Page 10: Instance Based Learning

Instance-based learning: IB2

• IB2: save memory, speed up classification
– Works incrementally
– Only incorporates misclassified instances
– Problem: noisy data gets incorporated

Data: “Who buys gold jewelry”

(25,60,no) (45,60,no) (50,75,no) (50,100,no)

(50,120,no) (70,110,yes) (85,140,yes) (30,260,yes)

(25,400,yes) (45,350,yes) (50,275,yes) (60,260,yes)

Page 11: Instance Based Learning

Instance-based learning: IB2

• Data:
– (25,60,no)
– (85,140,yes)
– (45,60,no)
– (30,260,yes)
– (50,75,no)
– (50,120,no)
– (70,110,yes)
– (25,400,yes)
– (50,100,no)
– (45,350,yes)
– (50,275,yes)
– (60,260,yes)

This is the final answer: we memorize only 5 of these points. Let's, however, build up the classifier step by step.

Page 12: Instance Based Learning

Instance-based learning: IB2

• Data:

– (25,60,no)

Page 13: Instance Based Learning

Instance-based learning: IB2

• Data:
– (25,60,no) – (85,140,yes)

Since so far the model has only the first instance memorized, this second instance gets wrongly classified. So we memorize it as well.

Page 14: Instance Based Learning

Instance-based learning: IB2

• Data:
– (25,60,no) – (85,140,yes) – (45,60,no)

So far the model has the first two instances memorized. The third instance gets properly classified, since it happens to be closer to the first. So we don't memorize it.

Page 15: Instance Based Learning

Instance-based learning: IB2

• Data:
– (25,60,no) – (85,140,yes) – (45,60,no) – (30,260,yes)

So far the model has the first two instances memorized. The fourth instance gets properly classified, since it happens to be closer to the second. So we don't memorize it.

Page 16: Instance Based Learning

Instance-based learning: IB2

• Data:
– (25,60,no) – (85,140,yes) – (45,60,no) – (30,260,yes) – (50,75,no)

So far the model has the first two instances memorized. The fifth instance gets properly classified, since it happens to be closer to the first. So we don't memorize it.

Page 17: Instance Based Learning

Instance-based learning: IB2

• Data:
– (25,60,no) – (85,140,yes) – (45,60,no) – (30,260,yes) – (50,75,no) – (50,120,no)

So far the model has the first two instances memorized. The sixth instance gets wrongly classified, since it happens to be closer to the second. So we memorize it.

Page 18: Instance Based Learning

Instance-based learning: IB2

• Continuing in a similar way, we finally get the figure on the right.
– The colored points are the ones that get memorized.

This is the final answer, i.e. we memorize only these 5 points.
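The whole IB2 pass fits in a few lines; here is a sketch, reusing nearest_neighbor from the earlier sketch and the "who buys gold jewelry" data from the slides. Depending on tie-breaking (and on whether attributes are rescaled first), the memorized set this code produces may differ slightly from the colored points in the original figure.

def ib2(stream):
    memory = []                                  # exemplars kept so far
    for attrs, label in stream:
        if not memory:
            memory.append((attrs, label))        # the first instance is always kept
        elif nearest_neighbor(memory, attrs) != label:
            memory.append((attrs, label))        # misclassified, so memorize it
    return memory

data = [((25, 60), "no"), ((85, 140), "yes"), ((45, 60), "no"),
        ((30, 260), "yes"), ((50, 75), "no"), ((50, 120), "no"),
        ((70, 110), "yes"), ((25, 400), "yes"), ((50, 100), "no"),
        ((45, 350), "yes"), ((50, 275), "yes"), ((60, 260), "yes")]
print(ib2(data))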

Page 19: Instance Based Learning

Instance-based learning: IB3

• IB3: deal with noise
– Discard instances that don't perform well
– Keep a record of the number of correct and incorrect classification decisions that each exemplar makes.
– Two predetermined thresholds are set on the success ratio.
– An instance is used for classification:
• if its number of incorrect classifications is at most the first (lower) threshold, and
• if its number of correct classifications is at least the second (upper) threshold.

Page 20: Instance Based Learning

Instance-based learning: IB3

• Suppose the lower threshold is 0, and the upper threshold is 1.

• Shuffle the data first:
– (25,60,no)
– (85,140,yes)
– (45,60,no)
– (30,260,yes)
– (50,75,no)
– (50,120,no)
– (70,110,yes)
– (25,400,yes)
– (50,100,no)
– (45,350,yes)
– (50,275,yes)
– (60,260,yes)

Page 21: Instance Based Learning

Instance-based learning: IB3

• Suppose the lower threshold is 0, and the upper threshold is 1.

• Shuffle the data first (records shown as [incorrect, correct]):
– (25,60,no) [1,1]
– (85,140,yes) [1,1]
– (45,60,no) [0,1]
– (30,260,yes) [0,2]
– (50,75,no) [0,1]
– (50,120,no) [0,1]
– (70,110,yes) [0,0]
– (25,400,yes) [0,1]
– (50,100,no) [0,0]
– (45,350,yes) [0,0]
– (50,275,yes) [0,1]
– (60,260,yes) [0,0]

Page 22: Instance Based Learning

Instance-based learning: IB3

• The points that will be used in classification are:
– (45,60,no) [0,1]
– (30,260,yes) [0,2]
– (50,75,no) [0,1]
– (50,120,no) [0,1]
– (25,400,yes) [0,1]
– (50,275,yes) [0,1]
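A sketch of the IB3 bookkeeping shown above: each exemplar carries an [incorrect, correct] record, and only exemplars that clear both thresholds are kept for classification. The transcript doesn't spell out exactly how the slide's records were updated (and the published IB3 uses confidence intervals on the success ratio rather than fixed thresholds), so this simplified version just scores the single nearest stored exemplar on each step.

def ib3(stream, lower=0, upper=1):
    exemplars = []   # each entry: [attrs, label, incorrect, correct]
    for attrs, label in stream:
        if exemplars:
            # score the nearest stored exemplar's prediction for this instance
            e = min(exemplars, key=lambda ex: distance(ex[0], attrs))
            if e[1] == label:
                e[3] += 1          # correct classification decision
            else:
                e[2] += 1          # incorrect classification decision
        exemplars.append([attrs, label, 0, 0])
    # keep exemplars with incorrect <= lower threshold and correct >= upper threshold
    return [(a, l) for a, l, inc, cor in exemplars if inc <= lower and cor >= upper]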

Page 23: Instance Based Learning

Rectangular generalizations

• When a new exemplar is classified correctly, it is generalized by simply merging it with the nearest exemplar.

• The nearest exemplar may be either a single instance or a hyper-rectangle.

Page 24: Instance Based Learning

Rectangular generalizations

• Data:
– (25,60,no)
– (85,140,yes)
– (45,60,no)
– (30,260,yes)
– (50,75,no)
– (50,120,no)
– (70,110,yes)
– (25,400,yes)
– (50,100,no)
– (45,350,yes)
– (50,275,yes)
– (60,260,yes)

Page 25: Instance Based Learning

Classification

[Figure: rectangles for "Class 1" and "Class 2", with a separation line.]

• If the new instance lies within a rectangle, output that rectangle's class.

• If the new instance lies in the overlap of several rectangles, output the class of the rectangle whose center is closest to the new instance.

• If the new instance lies outside all rectangles, output the class of the rectangle closest to the instance (a sketch follows below).

• The distance of a point from a rectangle is:

1. If the instance lies within the rectangle, d = 0.

2. If it lies outside, d = the distance to the closest part of the rectangle, i.e. to some point on the rectangle boundary.
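A sketch of this distance and of the three classification rules (the rectangle representation and names are my own): clamping the query point into the box gives the closest point of the rectangle, so the distance is 0 exactly when the point lies inside.

import math

def rect_distance(point, rect):
    # rect = (mins, maxs), the rectangle's corner vectors.
    mins, maxs = rect
    closest = [min(max(p, lo), hi) for p, lo, hi in zip(point, mins, maxs)]
    return math.dist(point, closest)   # 0 when the point is inside

def classify(point, rects):
    # rects: list of (mins, maxs, label).
    inside = [r for r in rects if rect_distance(point, r[:2]) == 0]
    if len(inside) == 1:
        return inside[0][2]            # within exactly one rectangle
    if inside:
        # overlap: the rectangle whose center is closest wins
        center = lambda r: [(lo + hi) / 2 for lo, hi in zip(r[0], r[1])]
        return min(inside, key=lambda r: math.dist(point, center(r)))[2]
    # outside every rectangle: the closest rectangle wins
    return min(rects, key=lambda r: rect_distance(point, r[:2]))[2]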