Learning From Data, Lecture 17: Memory and Efficiency in Nearest Neighbor
M. Magdon-Ismail, CSCI 4100/6100


Jun 27, 2020

Transcript
Page 1:

Learning From Data, Lecture 17

Memory and Efficiency in Nearest Neighbor

Memory, Efficiency

M. Magdon-Ismail, CSCI 4100/6100

Page 2:

recap: Similarity and Nearest Neighbor

Similarity

d(x,x′) = ||x− x′ ||

[Figures: 1-NN rule and 21-NN rule decision boundaries]

1. Simple.

2. No training.

3. Near optimal E_out: k → ∞, k/N → 0 ⇒ E_out → E*_out.

4. Good ways to choose k: k = 3; k = ⌈√N⌉; validation/cross-validation.

5. Easy to justify classification to customer.

6. Can easily do multi-class.

7. Can easily adapt to regression or logistic regression:

regression: g(x) = (1/k) Σ_{i=1}^{k} y_[i](x)

logistic: g(x) = (1/k) Σ_{i=1}^{k} ⟦y_[i](x) = +1⟧

8. Computationally demanding.
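Points 1-7 above can be sketched in a few lines; here is a minimal illustration of the k-NN rule for classification, regression, and logistic-style probability estimation (function names like `knn_classify` are my own, not from the lecture):

```python
# Minimal k-NN sketch following the formulas above; labels y are in {-1, +1}.
import math

def k_nearest(data, x, k):
    """Return the k points of `data` (list of (vector, y) pairs) nearest to x."""
    return sorted(data, key=lambda p: math.dist(p[0], x))[:k]

def knn_classify(data, x, k):
    # majority vote over the k nearest labels
    votes = sum(y for _, y in k_nearest(data, x, k))
    return 1 if votes >= 0 else -1

def knn_regress(data, x, k):
    # g(x) = (1/k) * sum of the k nearest target values
    return sum(y for _, y in k_nearest(data, x, k)) / k

def knn_probability(data, x, k):
    # logistic-style output: fraction of the k nearest labels equal to +1
    return sum(1 for _, y in k_nearest(data, x, k) if y == +1) / k

data = [((0.0, 0.0), -1), ((1.0, 0.0), -1), ((3.0, 3.0), +1), ((3.0, 4.0), +1)]
print(knn_classify(data, (3.0, 3.5), k=3))   # prints 1: two of the three nearest are +1
```

Note that "no training" shows up directly: the only work happens at query time, inside `k_nearest`.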

© AML Creator: Malik Magdon-Ismail, Memory and Efficiency in Nearest Neighbor, 2/25

Page 3:

Computational Demands of Nearest Neighbor

Memory.

Need to store all the data, O(Nd) memory.

N = 10^6, d = 100, double precision ≈ 1GB

Finding the nearest neighbor of a test point.

Need to compute distance to every data point, O(Nd).

N = 10^6, d = 100, 3GHz processor ≈ 3ms (compute g(x))

N = 10^6, d = 100, 3GHz processor ≈ 1hr (compute CV error)

N = 10^6, d = 100, 3GHz processor > 1 month (choose best k from among 1000 using CV)
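These costs are easy to reproduce at small scale. A scaled-down numpy sketch of the O(Nd) per-query work (the sizes here are illustrative, shrunk from the slide's N = 10^6 so it runs quickly):

```python
# One query against a stored data set: O(Nd) distance work.
# At the slide's N = 10^6, d = 100, the array alone is
# 10^6 * 100 * 8 bytes = 800 MB, matching the ~1GB memory figure above.
import numpy as np

N, d = 10_000, 100                 # scaled down from N = 10^6
X = np.random.randn(N, d)          # stored data set: O(Nd) memory
x = np.random.randn(d)             # one test point

dists = np.linalg.norm(X - x, axis=1)   # O(Nd) subtractions and squares
nearest = int(np.argmin(dists))         # index of the nearest neighbor
print(nearest, dists[nearest])
```

Cross validation repeats this for every training point, which is where the hours and months on the slide come from.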


Page 4:

Two Basic Approaches

Reduce the amount of data.

The five-year-old does not remember every horse he has seen, only a few representative horses.

Store the data in a specialized data structure.

Ongoing research field to develop geometric data structures to make finding nearest neighbors fast.


Page 5:

Throw Away Irrelevant Data

[Figure: full data set → condensed data set (k = 1)]


Page 6:

Decision Boundary Consistent

[Figure: condensing that leaves g(x) unchanged]


Page 7:

Training Set Consistent

[Figure: condensing that leaves g(xn) unchanged on the training points]


Page 8:

Decision Boundary Vs. Training Set Consistent

[Figures: DB-consistent condensing (g(x) unchanged) versus TS-consistent condensing (g(xn) unchanged)]


Page 9:

Consistent Does Not Mean g(xn) = yn

[Figures: DB- and TS-consistent condensed sets for k = 3]


Page 10:

Training Set Consistent (k = 3)

[Figure: TS-consistent condensing for k = 3; g(xn) unchanged]


Page 11:

CNN: Condensed Nearest Neighbor (k = 3)

[Figure: CNN adding a point]

Consider the solid blue point:
i. blue w.r.t. selected points
ii. red w.r.t. D

Add a red point:
i. not already selected
ii. closest to the inconsistent point

1. Randomly select k data points into S.

2. Classify all data according to S.

3. Let x∗ be an inconsistent point and y∗ its class w.r.t. D.

4. Add the closest point to x∗ not in S that has class y∗.

5. Iterate until S classifies all points consistently with D.
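The loop above can be sketched in code; a minimal version for k = 1 (the slide animates k = 3), with illustrative names:

```python
# CNN sketch; D and S are lists of (point, label) pairs.
import math
import random

def nn_label(S, x):
    """Label of x under the 1-NN rule using only the points in S."""
    return min(S, key=lambda p: math.dist(p[0], x))[1]

def condense(D, seed=0):
    rng = random.Random(seed)
    S = [rng.choice(D)]                               # 1. seed S randomly
    while True:
        # 2./3. find a point whose S-classification disagrees with D
        bad = [(x, y) for x, y in D if nn_label(S, x) != y]
        if not bad:
            return S                                  # 5. S is TS-consistent
        x_star, y_star = bad[0]
        # 4. closest point to x_star, not in S, with class y_star
        # (for k = 1 this is x_star itself, since x_star has class y_star in D)
        cands = [(x, y) for x, y in D if y == y_star and (x, y) not in S]
        S.append(min(cands, key=lambda p: math.dist(p[0], x_star)))
```

Each pass adds one point to S, so the loop terminates, and by construction the returned S is training set consistent with D.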



Page 13:

CNN: Condensed Nearest Neighbor


Minimum consistent set (MCS)? ← NP-hard


Page 14:

Nearest Neighbor on Digits Data

[Figures: 1-NN rule and 21-NN rule on the digits data]


Page 15:

Condensing the Digits Data

[Figures: condensed 1-NN rule and 21-NN rule on the digits data]


Page 16:

Finding the Nearest Neighbor

1. S1, S2 are 'clusters' with centers µ1, µ2 and radii r1, r2.

2. [Branch] Search S1 first → x̂[1].

3. The distance from x to any point in S2 is at least ||x − µ2|| − r2.

4. [Bound] So we are done if ||x − x̂[1]|| ≤ ||x − µ2|| − r2.

A branch and bound algorithm; can be applied recursively.

[Figure: query point x with clusters S1 and S2]
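The search can be sketched for a flat list of clusters (the recursive version nests the same idea on subclusters); `Cluster` and `bb_nearest` are illustrative names, not from the lecture:

```python
# Branch-and-bound nearest neighbor over precomputed clusters.
import math
from dataclasses import dataclass

@dataclass
class Cluster:
    center: tuple      # mu_j
    radius: float      # r_j = max distance from mu_j within the cluster
    points: list       # the points of S_j

def brute_nearest(points, x):
    return min(points, key=lambda p: math.dist(p, x))

def bb_nearest(clusters, x):
    # [Branch] search the cluster with the closest center first
    order = sorted(clusters, key=lambda c: math.dist(c.center, x))
    best = brute_nearest(order[0].points, x)
    for c in order[1:]:
        # [Bound] every point of c is at least ||x - mu|| - r from x,
        # so c cannot improve on `best` when the bound below holds
        if math.dist(best, x) <= math.dist(c.center, x) - c.radius:
            continue
        cand = brute_nearest(c.points, x)
        if math.dist(cand, x) < math.dist(best, x):
            best = cand
    return best
```

When the bound triggers, an entire cluster is skipped without computing any of its distances, which is the whole speedup.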


Page 17:

When Does the Bound Hold?

Bound condition: ||x− x̂[1] || ≤ ||x− µ2 || − r2.

||x− x̂[1] || ≤ ||x− µ1 ||+ r1

So, it suffices that

r1 + r2 ≤ ||x− µ2 || − ||x− µ1 ||.

||x − µ1|| ≈ 0 means ||x − µ2|| ≈ ||µ2 − µ1||.

It suffices that

r1 + r2 ≤ ||µ2 − µ1||.

[Figure: clusters S1, S2 and query point x]

within-cluster spread should be less than between-cluster spread


Page 18:

Finding Clusters – Lloyd’s Algorithm

1. Pick well separated centers for each cluster.

2. Compute Voronoi regions as the clusters.

3. Update the Centers.

4. Update the Voronoi regions.

5. Compute centers and radii:

µj = (1/|Sj|) Σ_{xn∈Sj} xn;    rj = max_{xn∈Sj} ||xn − µj||.
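Steps 1-5 can be sketched with numpy; a minimal version that assumes no cluster goes empty during the iterations (function names are my own):

```python
# Lloyd's algorithm sketch: alternate Voronoi assignment and center updates,
# then compute each cluster's radius for the branch-and-bound search.
import numpy as np

def lloyd(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # 1. pick centers
    for _ in range(iters):
        # 2./4. Voronoi regions: assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. update each center to the mean of its region
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    # 5. radii: r_j = max distance from mu_j within cluster j
    radii = np.array([np.linalg.norm(X[labels == j] - centers[j], axis=1).max()
                      for j in range(k)])
    return centers, radii, labels
```

Step 1's "well separated" initialization (the farthest-point picking the animation shows) is replaced here by a plain random choice of data points, so this is a simplification of the slide's procedure.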


Page 25:

Radial Basis Functions (RBF)

k-Nearest Neighbor: only considers the k nearest neighbors; each neighbor gets equal weight.

What about using all data to compute g(x)?

RBF: use all data; data farther from x get less weight.
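A minimal sketch of the RBF idea previewed here, using a Gaussian weight on distance; the bandwidth r is an illustrative parameter, not fixed by the slide:

```python
# RBF-style prediction: kernel-weighted average over ALL data points.
import math

def rbf_predict(data, x, r=1.0):
    """data is a list of (vector, y) pairs; nearer points get larger weight."""
    weights = [math.exp(-0.5 * (math.dist(p, x) / r) ** 2) for p, _ in data]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, data)) / total

data = [((0.0, 0.0), -1.0), ((1.0, 0.0), -1.0), ((3.0, 3.0), 1.0)]
print(rbf_predict(data, (3.0, 3.0)))   # close to +1: the nearby point dominates
```

Unlike k-NN, every point contributes to g(x), but the exponential decay means distant points contribute almost nothing, which smooths the k-NN rule rather than replacing it.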
