Page 1

Evolutionary Search

Artificial Intelligence

CSPP 56553

January 28, 2004

Page 2

Agenda

• Motivation:
  – Evolving a solution

• Genetic Algorithms:
  – Modeling search as evolution

• Mutation

• Crossover

• Survival of the fittest

• Survival of the most diverse

• Conclusions

Page 3

Genetic Algorithms Applications

• Search the parameter space for an optimal assignment
  – Not guaranteed to find the optimum, but can approach it

• Classic optimization problems:
  – E.g. the Traveling Salesman Problem

• Program design (“Genetic Programming”)

• Aircraft carrier landings

Page 4

Genetic Algorithms Procedure

• Create an initial population (1 chromosome)
• Mutate 1+ genes in 1+ chromosomes
  – Produce one offspring for each chromosome
• Mate 1+ pairs of chromosomes with crossover
• Add mutated & offspring chromosomes to the population
• Create the new population
  – Best + randomly selected (biased by fitness)
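
To make the procedure concrete, here is a minimal sketch in Python. It is not from the slides: the two-gene, values-1-9 encoding anticipates the cookie example on the later pages, and the quality function is a placeholder.

    import random

    GENE_VALUES = range(1, 10)   # each gene takes values 1-9, as in the cookie example

    def quality(chrom):
        # Placeholder quality function; the cookie slides use a lookup table.
        return sum(chrom)

    def mutate(chrom):
        # Randomly select one gene and give it a new random value.
        i = random.randrange(len(chrom))
        new = list(chrom)
        new[i] = random.choice(GENE_VALUES)
        return tuple(new)

    def crossover(a, b):
        # Cross at the middle, as the design-choice slide specifies.
        mid = len(a) // 2
        return a[:mid] + b[mid:], b[:mid] + a[mid:]

    def step(population, max_size=4):
        # Mutate each chromosome, producing one offspring apiece.
        candidates = list(population) + [mutate(c) for c in population]
        # Mate one random pair with crossover when possible.
        if len(population) >= 2:
            candidates += list(crossover(*random.sample(population, 2)))
        candidates = list(dict.fromkeys(candidates))        # no duplicates
        # New population: best survivor + random selection biased by fitness.
        candidates.sort(key=quality, reverse=True)
        survivors, rest = candidates[:1], candidates[1:]
        weights = [max(quality(c), 1e-9) for c in rest]     # floor avoids all-zero weights
        while rest and len(survivors) < max_size:
            i = random.choices(range(len(rest)), weights=weights)[0]
            survivors.append(rest.pop(i))
            weights.pop(i)
        return survivors

    population = [(1, 1)]
    for generation in range(20):
        population = step(population)

The small floor on the weights is only there so that all-zero-quality populations (as in the Moat problem later) do not break the fitness-biased selection.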

Page 5

Fitness

• Natural selection: Most fit survive

• Fitness = probability of survival to the next generation

• Question: How do we measure fitness?
  – “Standard method”: Relate fitness to quality
    • Quality: 1-9; fitness: 0-1

  Chromosome   Quality   Fitness
  (1, 4)       4         0.4
  (3, 1)       3         0.3
  (1, 2)       2         0.2
  (1, 1)       1         0.1

  f_i = q_i / Σ_j q_j
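
A quick check of the table above; a minimal sketch of the standard method, using the quality values from this slide:

    def standard_fitness(qualities):
        # f_i = q_i / sum_j q_j
        total = sum(qualities)
        return [q / total for q in qualities]

    print(standard_fitness([4, 3, 2, 1]))   # [0.4, 0.3, 0.2, 0.1]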

Page 6

Crossover

• Genetic design:
  – Identify sets of features: 2 genes: flour + sugar; values 1-9

• Population: How many chromosomes?
  – 1 initial, 4 max

• Mutation: How frequent?
  – 1 gene randomly selected, randomly mutated

• Crossover: Allowed?
  – Yes; select random mates; cross at middle

• Duplicates? No

• Survival: Standard method

Page 7

Basic Cookie GA+Crossover Results

• Results are for 1000 random trials
  – Initial state: one (1, 1) chromosome, quality 1

• On average, reaches max quality (9) in 14 generations

• Conclusion:
  – Faster with crossover: combines the good values found in each gene
  – Key: the global max is achievable by maximizing each dimension independently, reducing dimensionality

Page 8

Solving the Moat Problem

• Problem:
  – No single-step mutation can reach the optimal values under standard fitness (quality = 0 => probability = 0)

• Solution A:
  – Crossover can combine parents that are fit in EACH gene

• However, still slow: 155 generations on average

Quality landscape (flour gene across, sugar gene down; the optimum is surrounded by a “moat” of 0-quality states):

  1 2 3 4 5 4 3 2 1
  2 0 0 0 0 0 0 0 2
  3 0 0 0 0 0 0 0 3
  4 0 0 7 8 7 0 0 4
  5 0 0 8 9 8 0 0 5
  4 0 0 7 8 7 0 0 4
  3 0 0 0 0 0 0 0 3
  2 0 0 0 0 0 0 0 2
  1 2 3 4 5 4 3 2 1

Page 9

Questions

• How can we avoid the 0 quality problem?

• How can we avoid local maxima?

Page 10

Rethinking Fitness

• Goal: Explicit bias toward the best
  – Remove implicit biases based on the quality scale

• Solution: Rank method
  – Ignore actual quality values except for ranking

• Step 1: Rank candidates by quality
• Step 2: The probability of selecting the ith candidate, given that candidates 1..i-1 were not selected, is a constant p
  – Step 2b: The last candidate is selected if no other has been
• Step 3: Select candidates using these probabilities
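
A minimal sketch of the rank method's selection probabilities. The value p = 2/3 is an assumption; it reproduces the Rank Fitness column of the table on the next slide.

    def rank_fitness(n, p=2/3):
        # P(select candidate i) = p * (1 - p)^(i-1) for i = 1..n-1;
        # the last candidate absorbs what's left (selected if no other was).
        probs = [p * (1 - p) ** i for i in range(n - 1)]
        probs.append((1 - p) ** (n - 1))
        return probs

    print(rank_fitness(5))   # ~[0.667, 0.222, 0.074, 0.025, 0.012]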

Page 11

Rank Method

Chromosome   Quality   Rank   Std. Fitness   Rank Fitness
(1, 4)       4         1      0.4            0.667
(1, 3)       3         2      0.3            0.222
(1, 2)       2         3      0.2            0.074
(5, 2)       1         4      0.1            0.025
(7, 5)       0         5      0.0            0.012

Results: Average over 1000 random runs on the Moat problem: 75 generations (vs 155 for the standard method)

No 0-probability entries: based on rank, not absolute quality

Page 12

Diversity

• Diversity:
  – Degree to which chromosomes exhibit different genes
  – The Rank & Standard methods look only at quality
  – Need diversity: escape local maxima, provide variety for crossover
  – “As good to be different as to be fit”

Page 13

Rank-Space Method

• Combines diversity and quality in fitness
• Diversity measure:
  – Sum of inverse squared distances to the other chromosomes’ genes:

    d = Σ_i 1 / d_i²

• Diversity rank: Avoids inadvertent bias
• Rank-space:
  – Sort on the sum of the diversity AND quality ranks
  – Best: lower left: high diversity & quality
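
A hedged sketch of the rank-space ordering. The reference set for the distance computation and the tie-breaking follow the notes on the next slide, so treat the details as assumptions:

    def dist2(a, b):
        # Squared Euclidean distance between two gene vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def diversity(chrom, selected):
        # Sum of inverse squared distances to already-selected chromosomes;
        # a SMALLER sum means the candidate sits farther away (more diverse).
        return sum(1 / dist2(chrom, s) for s in selected)

    def rank_space_order(chroms, quality, selected):
        q_rank = {c: r for r, c in enumerate(sorted(chroms, key=quality, reverse=True), 1)}
        d_rank = {c: r for r, c in enumerate(sorted(chroms, key=lambda c: diversity(c, selected)), 1)}
        # Sort on the sum of the two ranks; the diversity rank breaks ties.
        return sorted(chroms, key=lambda c: (q_rank[c] + d_rank[c], d_rank[c]))

The rank-method probabilities from the previous sketch can then be applied to this combined order.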

Page 14

Rank-Space Method

Chromosome   Q   D      D Rank   Q Rank   Comb Rank   R-S Fitness
(1, 4)       4   0.04   1        1        1           0.667
(3, 1)       3   0.25   5        2        4           0.025
(1, 2)       2   0.059  3        3        2           0.222
(1, 1)       1   0.062  4        4        5           0.012
(7, 5)       0   0.05   2        5        3           0.074

– Diversity rank breaks ties
– After selecting others, sum distances to both
– W.r.t. highest ranked 5-1

Results: Average (Moat) 15 generations

Page 15

Genetic Algorithms

• Evolution mechanisms as a search technique
  – Produce offspring with variation
    • Mutation, Crossover
  – Select the “fittest” to continue to the next generation

• Fitness: Probability of survival
  – Standard: Quality values only
  – Rank: Quality rank only
  – Rank-space: Rank of the sum of quality & diversity ranks

• A large population can be robust to local maxima

Page 16

Machine Learning: Nearest Neighbor & Information Retrieval Search

Artificial Intelligence

CSPP 56553

January 28, 2004

Page 17

Agenda

• Machine learning: Introduction
• Nearest neighbor techniques
  – Applications: Robotic motion, Credit rating
  – Information retrieval search
• Efficient implementations:
  – k-d trees, parallelism
• Extensions: k-nearest neighbor
• Limitations:
  – Distance, dimensions, & irrelevant attributes

Page 18

Machine Learning

• Learning: Acquiring a function from inputs to values, based on past input/value pairs, that can be applied to new inputs.

• Learn concepts, classifications, values– Identify regularities in data

Page 19

Machine Learning Examples

• Pronunciation: – Spelling of word => sounds

• Speech recognition:– Acoustic signals => sentences

• Robot arm manipulation:– Target => torques

• Credit rating:– Financial data => loan qualification

Page 20

Machine Learning Characterization

• Distinctions:
  – Are output values known for any inputs?
    • Supervised vs unsupervised learning
      – Supervised: training consists of inputs + true output values
        » E.g. letters + pronunciation
      – Unsupervised: training consists only of inputs
        » E.g. letters only
    • This course studies supervised methods

Page 21

Machine Learning Characterization

• Distinctions:
  – Are output values discrete or continuous?
    • Discrete: “Classification”
      – E.g. Qualified/Unqualified for a loan application
    • Continuous: “Regression”
      – E.g. Torques for robot arm motion
  – Characteristic of the task

Page 22

Machine Learning Characterization

• Distinctions:
  – What form of function is learned?
    • Also called “inductive bias”
    • Graphically, the decision boundary
    • E.g. a single linear separator
      – Rectangular boundaries: ID trees
      – Voronoi spaces…etc…

[Figure: a single linear separator dividing + examples from - examples]

Page 23

Machine Learning Functions

• Problem: Can the representation effectively model the class to be learned?

• Motivates selection of learning algorithm

[Figure: a diagonal band of + examples alongside - examples]

For this function, a linear discriminant is GREAT! Rectangular boundaries (e.g. ID trees) are TERRIBLE!

Pick the right representation!

Page 24

Machine Learning Features

• Inputs:
  – E.g. words, acoustic measurements, financial data
  – Vectors of features:
    • E.g. word: letters
      – ‘cat’: L1 = c; L2 = a; L3 = t
    • Financial data: F1 = # late payments/yr : Integer
    • F2 = Ratio of income to expense : Real

Page 25

Machine Learning Features

• Questions:
  – Which features should be used?
  – How should they relate to each other?

• Issue 1: How do we define relations in feature space if features have different scales?
  – Solution: Scaling/normalization

• Issue 2: Which features are important?
  – If instances differ only in an irrelevant feature, the difference should be ignored

Page 26

Complexity & Generalization

• Goal: Predict values accurately on new inputs
• Problem:
  – We train on sample data
  – We can make an arbitrarily complex model to fit it
  – BUT, it will probably perform badly on NEW data
• Strategy:
  – Limit the complexity of the model (e.g. the degree of the equation)
  – Split training and validation sets
    • Hold out data to check for overfitting

Page 27

Nearest Neighbor

• Memory-based or case-based learning

• Supervised method: Training
  – Record labeled instances and their feature-value vectors

• For each new, unlabeled instance:
  – Identify the “nearest” labeled instance
  – Assign the same label

• Consistency heuristic: Assume that a property is the same as that of the nearest reference case

Page 28

Nearest Neighbor Example

• Problem: Robot arm motion
  – Difficult to model analytically
    • Kinematic equations: relate joint angles and manipulator positions
    • Dynamics equations: relate motor torques to joint angles
  – Difficult to achieve good results modeling robotic or human arms
    • Many factors & measurements

Page 29

Nearest Neighbor Example

• Solution:
  – Move the robot arm around
  – Record parameters and trajectory segments
    • Table: torques, positions, velocities, squared velocities, velocity products, accelerations
  – To follow a new path:
    • Break it into segments
    • Find the closest segments in the table
    • Use those torques (interpolating as necessary)

Page 30

Nearest Neighbor Example

• Issue: Big table
  – The first time with a new trajectory:
    • “Closest” isn’t close
    • The table is sparse: few entries
• Solution: Practice
  – With each attempt at the trajectory, fill in more of the table
    • After a few attempts, very close

Page 31

Roadmap

• Problem:
  – Matching Topics and Documents

• Methods:
  – Classic: Vector Space Model

• Challenge I: Beyond literal matching
  – Expansion Strategies

• Challenge II: Authoritative sources
  – PageRank
  – Hubs & Authorities

Page 32

Matching Topics and Documents

• Two main perspectives:
  – Pre-defined, fixed, finite topics:
    • “Text Classification”
  – Arbitrary topics, typically defined by a statement of information need (aka a query):
    • “Information Retrieval”

Page 33

Three Steps to IR

● Three phases:
  – Indexing: Build a collection of document representations
  – Query construction:
    ● Convert the query text to a vector
  – Retrieval:
    ● Compute the similarity between the query and each document representation
    ● Return the closest matches

Page 34

Matching Topics and Documents

• Documents are “about” some topic(s)
• Question: What is evidence of “aboutness”?
  – Words!!
    • Possibly also meta-data in documents
      – Tags, etc.

• The model encodes how words capture topic
  – E.g. “Bag of words” model, Boolean matching
  – What information is captured?
  – How is similarity computed?

Page 35

Models for Retrieval and Classification

• A plethora of models is in use

• Here:
  – Vector Space Model

Page 36

Vector Space Information Retrieval

• Task:– Document collection– Query specifies information need: free text– Relevance judgments: 0/1 for all docs

• Word evidence: Bag of words– No ordering information

Page 37

Vector Space Model

[Figure: document vectors in a 3-D term space with axes Computer, Tv, Program]

Two documents: “computer program” and “tv program”.
Query “computer program”: matches the 1st doc exactly (distance 0, vs 2 for the 2nd).
Query “educational program”: matches both equally (distance 1 to each).

Page 38

Vector Space Model

• Represent documents and queries as
  – Vectors of term-based features
    • Features: tied to the occurrence of terms in the collection
  – E.g.:

• Solution 1: Binary features: t = 1 if the term is present, 0 otherwise
  – Similarity: number of terms in common
    • Dot product

d_j = (t_{1,j}, t_{2,j}, …, t_{N,j});  q_k = (t_{1,k}, t_{2,k}, …, t_{N,k})

sim(q_k, d_j) = Σ_{i=1}^{N} t_{i,k} · t_{i,j}
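
A minimal sketch of the binary dot-product similarity (whitespace tokenization is a simplification):

    def binary_sim(query, doc):
        # Dot product of binary term vectors = number of distinct shared terms.
        return len(set(query.lower().split()) & set(doc.lower().split()))

    print(binary_sim("computer program", "tv program"))   # 1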

Page 39

Question

• What’s wrong with this?

Page 40

Vector Space Model II

• Problem: Not all terms are equally interesting
  – E.g. the vs dog vs Levow

• Solution: Replace binary term features with weights
  – Document collection: term-by-document matrix
  – View each as a vector in a multidimensional space
    • Nearby vectors are related
  – Normalize for vector length

d_j = (w_{1,j}, w_{2,j}, …, w_{N,j});  q_k = (w_{1,k}, w_{2,k}, …, w_{N,k})

Page 41

Vector Similarity Computation

• Similarity = Dot product

  sim(q_k, d_j) = q_k · d_j = Σ_{i=1}^{N} w_{i,k} · w_{i,j}

• Normalization:
  – Normalize weights in advance
  – Normalize post-hoc:

  sim(q_k, d_j) = Σ_{i=1}^{N} w_{i,k} w_{i,j} / ( sqrt(Σ_{i=1}^{N} w_{i,k}²) · sqrt(Σ_{i=1}^{N} w_{i,j}²) )
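
The post-hoc normalized similarity above, as a short sketch over plain weight lists:

    import math

    def cosine_sim(q, d):
        # Dot product divided by the product of the two vector lengths.
        dot = sum(qi * di for qi, di in zip(q, d))
        norm = math.sqrt(sum(qi * qi for qi in q)) * math.sqrt(sum(di * di for di in d))
        return dot / norm if norm else 0.0

    print(cosine_sim([1, 0, 1], [0, 1, 1]))   # 0.5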

Page 42

Term Weighting

• “Aboutness”
  – To what degree is this term what the document is about?
  – A within-document measure
  – Term frequency (tf): # occurrences of term i in doc j

• “Specificity”
  – How surprised are you to see this term?
  – Collection frequency
  – Inverse document frequency (idf):

  idf_i = log(N / n_i)

  w_{i,j} = tf_{i,j} × idf_i
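
A sketch of tf-idf weighting over a toy collection. The slide leaves the log base unspecified; natural log is an assumption here:

    import math
    from collections import Counter

    docs = [["computer", "program"], ["tv", "program"]]

    N = len(docs)
    df = Counter(t for d in docs for t in set(d))   # n_i: # of docs containing term i

    def tfidf(doc):
        tf = Counter(doc)
        return {t: tf[t] * math.log(N / df[t]) for t in tf}

    print(tfidf(docs[0]))   # 'program' gets weight 0: it appears in every doc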

Page 43

Term Selection & Formation

• Selection:
  – Some terms are truly useless
    • Too frequent, no content
      – E.g. the, a, and, …
  – Stop words: ignore such terms altogether

• Creation:
  – Too many surface forms for the same concepts
    • E.g. inflections of words: verb conjugations, plurals
  – Stem terms: treat all forms as the same underlying term

Page 44

Key Issue

• All approaches operate on term matching
  – If a synonym, rather than the original term, is used, the approach fails

• Develop more robust techniques
  – Match “concept” rather than term
    • Expansion approaches
      – Add in related terms to enhance matching
    • Mapping techniques
      – Associate terms to concepts
        » Aspect models, stemming

Page 45

Expansion Techniques

• Can apply to the query or the documents

• Thesaurus expansion
  – Use a linguistic resource (thesaurus, WordNet) to add synonyms/related terms

• Feedback expansion
  – Add terms that “should have appeared”
    • User interaction
      – Direct or relevance feedback
    • Automatic pseudo relevance feedback

Page 46

Query Refinement

• Typical queries are very short and ambiguous
  – Cat: animal/Unix command
  – Add more terms to disambiguate and improve results

• Relevance feedback
  – Retrieve with the original query
  – Present the results
    • Ask the user to tag relevant/non-relevant
  – “Push” the query toward the relevant vectors, away from the non-relevant ones
  – “Rocchio” expansion formula:

  q_{i+1} = q_i + (β/R) Σ_{j=1}^{R} r_j − (γ/S) Σ_{k=1}^{S} s_k

  – β + γ = 1 (e.g. 0.75, 0.25); r_j: relevant docs, s_k: non-relevant docs
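
A hedged sketch of the Rocchio update over plain weight vectors (the list representation is an assumption; β = 0.75, γ = 0.25 as on the slide):

    def rocchio(q, rel, nonrel, beta=0.75, gamma=0.25):
        # q' = q + (beta/R) * sum(rel) - (gamma/S) * sum(nonrel)
        new_q = list(q)
        for i in range(len(q)):
            if rel:
                new_q[i] += beta / len(rel) * sum(r[i] for r in rel)
            if nonrel:
                new_q[i] -= gamma / len(nonrel) * sum(s[i] for s in nonrel)
        return new_q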

Page 47

Compression Techniques

• Reduce surface term variation to concepts

• Stemming
  – Map inflectional variants to a root
    • E.g. see, sees, seen, saw -> see
    • Crucial for highly inflected languages (Czech, Arabic)

• Aspect models
  – Matrix representations are typically very sparse
  – Reduce dimensionality to a small # of key aspects
    • Mapping contextually similar terms together
    • Latent semantic analysis

Page 48

Authoritative Sources

• Based on the vector space alone, what would you expect to get searching for “search engine”?
  – Would you expect to get Google?

Page 49

Issue

Text isn’t always the best indicator of content

Example: “search engine”
  – A text search -> reviews of search engines
    • The term doesn’t appear on the search engines’ own pages
    • The term probably appears on many pages that point to many search engines

Page 50

Hubs & Authorities

• Not all sites are created equal
  – Finding “better” sites

• Question: What defines a good site?
  – Authoritative
  – Not just content, but connections!
    • One that many other sites think is good
    • A site that is pointed to by many other sites
      – Authority

Page 51

Conferring Authority

• Authorities rarely link to each other
  – Competition

• Hubs:
  – Relevant sites that point to prominent sites on the topic
    • Often not prominent themselves
    • Professional or amateur

• Good Hubs -> Good Authorities

Page 52

Computing HITS

• Finding Hubs and Authorities

• Two steps:
  – Sampling:
    • Find potential authorities
  – Weight-propagation:
    • Iteratively estimate the best hubs and authorities

Page 53

Sampling

• Identify potential hubs and authorities
  – Connected subsections of the web

• Select a root set with a standard text query

• Construct the base set:
  – All nodes pointed to by the root set
  – All nodes that point to the root set
  – Drop within-domain links
  – 1000-5000 pages

Page 54

Weight-propagation

• Weights:
  – Authority weight: x_p
  – Hub weight: y_p

• All weights are relative

• Updating:

  x_p = Σ_{q: q→p} y_q
  y_p = Σ_{q: p→q} x_q

• Converges

• Pages with high x: good authorities; with high y: good hubs
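
A minimal sketch of the weight-propagation loop. The graph representation and the per-pass renormalization are assumptions (the slide only says weights are relative):

    def hits(links, iters=20):
        # links: dict mapping each page to the set of pages it points to.
        pages = set(links) | {p for tgts in links.values() for p in tgts}
        x = {p: 1.0 for p in pages}   # authority weights
        y = {p: 1.0 for p in pages}   # hub weights
        for _ in range(iters):
            # x_p: sum of hub weights of the pages pointing to p
            x = {p: sum(y[q] for q in pages if p in links.get(q, ())) for p in pages}
            # y_p: sum of authority weights of the pages p points to
            y = {p: sum(x[q] for q in links.get(p, ())) for p in pages}
            for w in (x, y):                      # weights are relative: renormalize
                s = sum(w.values()) or 1.0
                for p in w:
                    w[p] /= s
        return x, y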

Page 55

Google’s PageRank

• Identifies authorities
  – Important pages are those pointed to by many other pages
    • Better pointers, higher rank
  – Ranks search results
  – t_i: a page pointing to A; C(t_i): number of outbound links of t_i; d: damping measure
  – Actual ranking is on a logarithmic scale
  – Iterate:

  pr(A) = (1 − d) + d · (pr(t_1)/C(t_1) + … + pr(t_n)/C(t_n))
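
A minimal sketch of the iteration. The damping value d = 0.85 is the commonly cited default, an assumption here:

    def pagerank(links, d=0.85, iters=50):
        # links: dict mapping each page to the list of pages it points to.
        pages = set(links) | {p for tgts in links.values() for p in tgts}
        pr = {p: 1.0 for p in pages}
        for _ in range(iters):
            new = {}
            for a in pages:
                # Sum pr(t)/C(t) over the pages t that point to a.
                incoming = sum(pr[t] / len(links[t]) for t in links if a in links[t])
                new[a] = (1 - d) + d * incoming
            pr = new
        return pr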

Page 56

Contrasts

• Internal links
  – Large sites carry more weight
    • If well-designed
  – H&A ignores site-internal links

• Outbound links explicitly penalized

• Lots of tweaks….

Page 57

Web Search

• Search by content
  – Vector space model
    • Word-based representation
    • “Aboutness” and “Surprise”
    • Enhancing matches
    • Simple learning model

• Search by structure
  – Authorities identified by the link structure of the web
    • Hubs confer authority

Page 58

Nearest Neighbor Example II

• Credit Rating:
  – Classifier: Good / Poor
  – Features:
    • L = # late payments/yr
    • R = Income/Expenses

Name   L    R     G/P
A      0    1.2   G
B      25   0.4   P
C      5    0.7   G
D      20   0.8   P
E      30   0.85  P
F      11   1.2   G
G      7    1.15  G
H      15   0.8   P

Page 59

Nearest Neighbor Example II

Name   L    R     G/P
A      0    1.2   G
B      25   0.4   P
C      5    0.7   G
D      20   0.8   P
E      30   0.85  P
F      11   1.2   G
G      7    1.15  G
H      15   0.8   P

[Figure: scatter plot of the instances, with L (0-30) on one axis and R (up to ~1.2) on the other, points labeled A-H]

Page 60

Nearest Neighbor Example II

[Figure: the same L-R scatter plot with new unlabeled points I, J, K]

Name   L    R     G/P
I      6    1.15  G
J      22   0.45  P
K      15   1.2   ??

Distance Measure:

  sqrt( (L1 − L2)² + (sqrt(10) · (R1 − R2))² )

  – Scaled distance
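
A minimal sketch of the classifier with the scaled distance above; the training data are the table from the earlier slide:

    import math

    train = {"A": (0, 1.2, "G"), "B": (25, 0.4, "P"), "C": (5, 0.7, "G"),
             "D": (20, 0.8, "P"), "E": (30, 0.85, "P"), "F": (11, 1.2, "G"),
             "G": (7, 1.15, "G"), "H": (15, 0.8, "P")}

    def dist(l1, r1, l2, r2):
        # Scaled distance: differences in R are stretched by sqrt(10).
        return math.sqrt((l1 - l2) ** 2 + (math.sqrt(10) * (r1 - r2)) ** 2)

    def classify(l, r):
        # Label of the nearest training instance.
        nearest = min(train.values(), key=lambda t: dist(l, r, t[0], t[1]))
        return nearest[2]

    print(classify(6, 1.15), classify(22, 0.45))   # G P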

Page 61

Efficient Implementations

• Classification cost:
  – Find the nearest neighbor: O(n)
    • Compute the distance between the unknown and all instances
    • Compare the distances
  – Problematic for large data sets

• Alternative:
  – Use binary search to reduce to O(log n)

Page 62

Efficient Implementation: K-D Trees

• Divide instances into sets based on features
  – Binary branching: e.g. feature > value
  – 2^d leaves with d splits; setting 2^d = n gives d = O(log n)
  – To split cases into sets (a sketch follows):
    • If there is one element in the set, stop
    • Otherwise pick a feature to split on
      – Find the average position of the two middle objects on that dimension
        » Split the remaining objects based on that average position
        » Recursively split the subsets
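
A hedged sketch of the construction. The average-of-the-two-middle-objects split follows the bullets above; cycling through the features by depth is an assumption:

    def build_kdtree(points, depth=0):
        # points: list of (feature_vector, label) pairs.
        if len(points) <= 1:
            return points[0] if points else None    # leaf: a single instance
        axis = depth % len(points[0][0])            # cycle through the features
        points = sorted(points, key=lambda p: p[0][axis])
        mid = len(points) // 2
        # Average position of the two middle objects on this dimension.
        split = (points[mid - 1][0][axis] + points[mid][0][axis]) / 2
        return {"axis": axis, "split": split,
                "left": build_kdtree(points[:mid], depth + 1),
                "right": build_kdtree(points[mid:], depth + 1)}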

Page 63

K-D Trees: Classification

[Figure: k-d tree for the credit data. Root: R > 0.825? Next level: L > 17.5? (No branch) and L > 9? (Yes branch). Third level: R > 0.6?, R > 0.75?, R > 1.025?, R > 1.175? Leaves are labeled Good or Poor.]

Page 64

Efficient Implementation: Parallel Hardware

• Classification cost:
  – # of distance computations
    • Constant time with O(n) processors
  – Cost of finding the closest:
    • Compute pairwise minima, successively
    • O(log n) time

Page 65

Nearest Neighbor: Issues

• Prediction can be expensive if there are many features

• Affected by classification & feature noise
  – One entry can change the prediction

• Definition of the distance metric
  – How to combine different features
    • Different types, ranges of values

• Sensitive to feature selection

Page 66

Nearest Neighbor Analysis

• Problem:
  – Ambiguous labeling, training noise

• Solution:
  – K-nearest neighbors
    • Not just the single nearest instance
    • Compare to the K nearest neighbors (see the sketch below)
      – Label according to the majority of the K
    • What should K be?
      – Often 3; it can be trained as well
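
A minimal k-nearest-neighbor extension of the earlier credit-rating sketch (it reuses that sketch's train table and dist function):

    from collections import Counter

    def knn_classify(l, r, k=3):
        # Vote among the k nearest training instances.
        nearest = sorted(train.values(), key=lambda t: dist(l, r, t[0], t[1]))[:k]
        return Counter(label for _, _, label in nearest).most_common(1)[0][0]

    print(knn_classify(15, 1.2))   # majority label of K's 3 nearest neighbors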

Page 67

Nearest Neighbor: Analysis

• Issues:
  – What is a good distance metric?
  – How should features be combined?

• Strategy:
  – (Typically weighted) Euclidean distance
  – Feature scaling: Normalization

• Good starting point (sketched below):
  – (Feature − Feature_mean) / Feature_standard_deviation
  – Rescales all values: centered on 0 with std dev 1
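
The rescaling in the last bullet, as a short sketch:

    import statistics

    def zscore(values):
        # (feature - feature_mean) / feature_standard_deviation
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        return [(v - mean) / sd for v in values]

    print(zscore([0, 25, 5, 20, 30, 11, 7, 15]))   # L values from the credit table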

Page 68

Nearest Neighbor: Analysis

• Issue:
  – What features should we use?
    • E.g. Credit rating: many possible features
      – Tax bracket, debt burden, retirement savings, etc.
  – Nearest neighbor uses ALL of them
  – Irrelevant feature(s) can mislead

• A fundamental problem with nearest neighbor

Page 69

Nearest Neighbor: Advantages

• Fast training:
  – Just record the feature vector / output value pairs

• Can model a wide variety of functions
  – Complex decision boundaries
  – Weak inductive bias

• Very generally applicable

Page 70

Summary

• Machine learning:
  – Acquire a function from input features to values
    • Based on prior training instances
  – Supervised vs Unsupervised learning
    • Classification and Regression
  – Inductive bias:
    • Representation of the function to learn
    • Complexity, Generalization, & Validation

Page 71

Summary: Nearest Neighbor

• Nearest neighbor:
  – Training: record input vectors + output values
  – Prediction: the closest training instance to the new data

• Efficient implementations

• Pros: fast training, very general, little bias

• Cons: distance metric (scaling), sensitivity to noise & extraneous features