Concept Learning Examples Word meanings Edible foods Abstract structures (e.g., irony) glorch glorch not glorch not glorch

Jan 14, 2016


Ralph West

Concept Learning

Examples

Word meanings

Edible foods

Abstract structures (e.g., irony)

(Figure: example objects labeled glorch and not glorch)


Supervised Approach To Concept Learning

Both positive and negative examples provided

Typical models (both in ML and Cog Sci) circa 2000 required both positive and negative examples

(Figure: positive (+) and negative (-) examples)


Contrast With Human Learning Abilities

Learning from positive examples only

Learning from a small number of examples

E.g., word meanings

E.g., learning appropriate social behavior

E.g., instruction on some skill

What would it mean to learn from a small number of positive examples?

(Figure: three positive (+) examples)


Tenenbaum (1999)

Two-dimensional continuous feature space

Concepts defined by axis-parallel rectangles

e.g., feature dimensions: cholesterol level, insulin level

e.g., concept: healthy


(Figure: three positive (+) examples)

Learning Problem

Given a set of n examples, X = {x1, x2, x3, …, xn}, which are instances of the concept, will some unknown example y also be an instance of the concept?

Problem of generalization



Hypothesis (Model) Space

H: all rectangles on the plane, parameterized by (l1, l2, s1, s2)

h: one particular hypothesis

Note: |H| = ∞

Consider all hypotheses in parallel, in contrast to the non-Bayesian approach of maintaining only the best hypothesis at any point in time.
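A single hypothesis can be made concrete in a few lines. This is a minimal Python sketch, assuming (l1, l2) is the lower-left corner of the rectangle and (s1, s2) are its side lengths; the class name `Rectangle` and its methods are illustrative and not taken from Tenenbaum (1999):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rectangle:
    """One hypothesis h = (l1, l2, s1, s2) from the space H of axis-parallel rectangles.

    Assumed convention: (l1, l2) is the lower-left corner, (s1, s2) the side lengths.
    """

    l1: float
    l2: float
    s1: float
    s2: float

    def contains(self, point):
        """True if a 2-D point lies inside (or on the border of) the rectangle."""
        x1, x2 = point
        return (self.l1 <= x1 <= self.l1 + self.s1
                and self.l2 <= x2 <= self.l2 + self.s2)

    @property
    def area(self):
        return self.s1 * self.s2

    def consistent_with(self, examples):
        """h is consistent with X when every positive example falls inside it."""
        return all(self.contains(x) for x in examples)


# |H| is infinite, so a program must integrate over (l1, l2, s1, s2)
# analytically or work with a finite subset of rectangles.
h = Rectangle(l1=0.0, l2=0.0, s1=2.0, s2=3.0)
print(h.area, h.consistent_with([(0.5, 1.0), (1.5, 2.5)]))
```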


Prediction Via Model Averaging

Will some unknown input y be in the concept given examples X = {x1, x2, x3, …, xn}?

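Q: y is a positive example of the concept; domain(Q) = {true, false}

$$P(Q \mid X) = \int_h p(Q, h \mid X)\, dh \qquad \text{(marginalization)}$$

$$p(Q, h \mid X) = P(Q \mid h, X)\, p(h \mid X) \qquad \text{(chain rule)}$$

$$P(Q \mid h, X) = P(Q \mid h) = \begin{cases} 1 & \text{if } y \in h \\ 0 & \text{otherwise} \end{cases} \qquad \text{(conditional independence and deterministic concepts)}$$

$$p(h \mid X) \propto \underbrace{P(X \mid h)}_{\text{likelihood}}\, \underbrace{p(h)}_{\text{prior}} \qquad \text{(Bayes rule)}$$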
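The four steps of the derivation turn into a short computation once H is replaced by a finite set. This is a minimal sketch, assuming a uniform prior over a grid of rectangles with integer corners in a 10 × 10 window (a crude stand-in for the continuous H) and the size-principle likelihood p(X | h) = 1/|h|^n; the function names are hypothetical:

```python
from itertools import product


def grid_intervals(lo=0.0, hi=10.0, step=1.0):
    """All intervals [a, b] with grid endpoints lo <= a < b <= hi."""
    n = int(round((hi - lo) / step))
    pts = [lo + i * step for i in range(n + 1)]
    return [(a, b) for a in pts for b in pts if a < b]


def grid_rectangles(lo=0.0, hi=10.0, step=1.0):
    """Finite stand-in for H: an axis-parallel rectangle is one interval per dimension."""
    ivs = grid_intervals(lo, hi, step)
    return list(product(ivs, ivs))


def contains(rect, point):
    (a1, b1), (a2, b2) = rect
    x1, x2 = point
    return a1 <= x1 <= b1 and a2 <= x2 <= b2


def area(rect):
    (a1, b1), (a2, b2) = rect
    return (b1 - a1) * (b2 - a2)


def prob_in_concept(y, X, H, prior=lambda h: 1.0):
    """P(Q | X): average P(Q | h) over hypotheses, weighted by p(h | X)."""
    numer = 0.0  # posterior mass of hypotheses that contain y
    denom = 0.0  # total unnormalized posterior mass
    for h in H:
        if not all(contains(h, x) for x in X):
            continue  # P(X | h) = 0: some example falls outside h
        weight = prior(h) / area(h) ** len(X)  # p(h) * P(X | h), size principle
        denom += weight
        if contains(h, y):  # P(Q | h) = 1 exactly when y lies in h
            numer += weight
    return numer / denom


H = grid_rectangles()
X = [(4.0, 4.0), (5.0, 5.0), (6.0, 4.0)]
for y in [(5.0, 4.5), (7.0, 4.5), (9.0, 4.5)]:
    print(y, round(prob_in_concept(y, X, H), 3))
```

Passing a different `prior` function (for example, one that depends only on `area(h)`) swaps in another prior without changing the averaging step.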


Prior p(h)

Prior should be location invariant

Uninformative prior

depends only on rectangle area

Expected size prior

Other possibilities too…

(Figure: expected-size prior)


Likelihood Function p(X | h)

X = set of n examples

Size principle
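Stated as an equation (assuming, as in Tenenbaum's model, that the n examples are sampled independently and uniformly at random from the concept), the size principle is:

```latex
p(X \mid h) =
\begin{cases}
  \dfrac{1}{|h|^{n}} & \text{if } x_1, \dots, x_n \in h \\[4pt]
  0 & \text{otherwise}
\end{cases}
```

where |h| is the size (area) of h. For example, with n = 3 examples, a consistent rectangle of area 1 has 2³ = 8 times the likelihood of a consistent rectangle of area 2, and the advantage of smaller hypotheses grows exponentially with n.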


Generalization Gradients

MIN: smallest hypothesis consistent with data

weak Bayes: instead of using the size principle, assumes examples are produced by a process independent of the true class

(Figure: generalization gradients; dark line = 50% probability)
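The rules can be compared in one dimension. This is a minimal sketch, assuming the hypotheses are intervals on a 0.5-unit grid over [0, 10] with a uniform prior; `strong` uses the size principle, `weak` gives every consistent hypothesis the same likelihood, and `MIN` returns the smallest consistent interval (all names are illustrative):

```python
def grid_intervals(lo=0.0, hi=10.0, step=0.5):
    """Finite stand-in for the hypothesis space: intervals [a, b] on a grid."""
    n = int(round((hi - lo) / step))
    pts = [lo + i * step for i in range(n + 1)]
    return [(a, b) for a in pts for b in pts if a < b]


def generalize(y, X, H, rule):
    """Probability that y belongs to the concept under one of three rules."""
    if rule == "MIN":
        # Smallest hypothesis consistent with the data: the bounding interval.
        return 1.0 if min(X) <= y <= max(X) else 0.0
    numer = denom = 0.0
    for a, b in H:
        if not all(a <= x <= b for x in X):
            continue  # inconsistent hypotheses get zero likelihood
        if rule == "strong":
            weight = 1.0 / (b - a) ** len(X)  # size principle
        elif rule == "weak":
            weight = 1.0  # likelihood ignores the size of h
        else:
            raise ValueError(f"unknown rule: {rule}")
        denom += weight
        if a <= y <= b:
            numer += weight
    return numer / denom


H = grid_intervals()
X = [4.0, 5.0, 6.0]
for rule in ("MIN", "weak", "strong"):
    print(rule, [round(generalize(y, X, H, rule), 3) for y in (5.0, 6.5, 7.0, 8.0)])
```

Under the `weak` rule, adding examples inside the current range leaves the prediction unchanged, whereas under the `strong` rule more examples from the same range pull the gradient in toward the `MIN` boundary.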


Experimental Design

Subjects shown n dots on screen that are “randomly chosen examples from some rectangle of healthy levels”

n drawn from {2, 3, 4, 6, 10, 50}

Dots varied in horizontal and vertical range

r drawn from {0.25, 0.5, 1, 2, 4, 8} units in a 24-unit window

Task

draw the ‘true’ rectangle around the dots


Experimental Results


Number Game

Experimenter picks integer arithmetic concept C

E.g., prime number

E.g., number between 10 and 20

E.g., multiple of 5

Experimenter presents positive examples drawn at random from C, say, in the range [1, 100]

Participant asked whether some new test case belongs in C


Empirical Predictive Distributions


Hypothesis Space

Even numbers

Odd numbers

Squares

Multiples of n

Ends in n

Powers of n

All numbers

Intervals [n, m] for n>0, m<101

Powers of 2, plus 37

Powers of 2, except for 32


• Observation: 16

• Likelihood function: size principle

• Prior: intuition


• Observations: 16, 8, 2, 64

• Likelihood function: size principle

• Prior: intuition
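A small version of the number game makes the size principle and the prior concrete. This is a minimal sketch, assuming a reduced hypothesis space (evens, odds, squares, multiples of 3 to 10, last digit, powers of 2 to 10, all numbers) with a uniform prior, plus the two "unnatural" hypotheses from the list given a hypothetical prior weight of 0.01; intervals are omitted to keep it short, and none of these prior values come from the original model:

```python
NUMBERS = range(1, 101)


def build_hypotheses():
    """Map each hypothesis name to (extension, prior weight). Prior weights are assumptions."""
    H = {
        "even": ({n for n in NUMBERS if n % 2 == 0}, 1.0),
        "odd": ({n for n in NUMBERS if n % 2 == 1}, 1.0),
        "squares": ({k * k for k in range(1, 11)}, 1.0),
        "all numbers": (set(NUMBERS), 1.0),
    }
    for m in range(3, 11):
        H[f"multiples of {m}"] = ({n for n in NUMBERS if n % m == 0}, 1.0)
    for d in range(10):
        H[f"ends in {d}"] = ({n for n in NUMBERS if n % 10 == d}, 1.0)
    for b in range(2, 11):
        H[f"powers of {b}"] = ({b ** k for k in range(1, 8) if b ** k <= 100}, 1.0)
    pow2 = {2 ** k for k in range(1, 7)}
    # "Unnatural" hypotheses from the list, given a low (hypothetical) prior weight.
    H["powers of 2, plus 37"] = (pow2 | {37}, 0.01)
    H["powers of 2, except for 32"] = (pow2 - {32}, 0.01)
    return H


def posterior(X, H):
    """p(h | X) proportional to p(X | h) p(h), with size principle p(X | h) = (1/|h|)^n."""
    weights = {}
    for name, (extension, prior) in H.items():
        if all(x in extension for x in X):  # inconsistent hypotheses get likelihood 0
            weights[name] = prior * (1.0 / len(extension)) ** len(X)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def prob_in_concept(y, X, H):
    """P(y in C | X): sum the posterior of every hypothesis that contains y."""
    return sum(p for name, p in posterior(X, H).items() if y in H[name][0])


H = build_hypotheses()
for X in ([16], [16, 8, 2, 64]):
    print(X, {y: round(prob_in_concept(y, X, H), 3) for y in (10, 32, 37)})
```

With 16 alone, several hypotheses (powers of 4, powers of 2, squares, numbers ending in 6, and others) share the posterior, so 32 gets only a moderate probability; after 16, 8, 2, 64, "powers of 2" dominates, so 32 becomes very likely while 10 does not.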


Posterior Distribution After Observing 16


Model Vs. Human Data

(Figure: model predictions alongside human data)


Summary of Tenenbaum (1999)

Method

Pick prior distribution (includes hypothesis space)

Pick likelihood function (size principle)

Leads to predictions for generalization as a function of r (range) and n (number of examples)

Claims people generalize optimally given assumptions about priors and likelihood

Bayesian approach provides the best description of how people generalize on the rectangle task.

Explains how people can learn from a small number of examples, and only positive examples.


Important Ideas in Bayesian Models

Generative theory: captures the process that produces observations

Prior

Likelihood

Consideration of multiple hypotheses in parallel

Potentially infinite hypothesis space

Inference

Role of priors diminishes with amount of evidence

Prediction via model (hypothesis) averaging

Explaining away

Learning

Just another form of inference

Trade-off between model simplicity and fit to data: Bayesian Occam’s razor
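The claim that the prior's role shrinks with evidence can be checked with a conjugate coin model, swapped in here for illustration (it is not part of Tenenbaum's rectangle or number models). This is a minimal sketch, assuming Beta priors on a coin's heads probability and data with a 30% heads rate:

```python
def posterior_mean(a, b, heads, n):
    """Posterior mean of the heads probability under a Beta(a, b) prior."""
    return (a + heads) / (a + b + n)


def prior_gap(n):
    """Difference between the posterior means from a flat and a skewed prior after n flips."""
    heads = 3 * n // 10  # data with a 30% heads rate
    flat = posterior_mean(1, 1, heads, n)     # Beta(1, 1): uniform prior
    skewed = posterior_mean(20, 2, heads, n)  # Beta(20, 2): strong belief in heads
    return abs(flat - skewed)


for n in (10, 100, 1000):
    print(n, round(prior_gap(n), 4))
```

The two priors disagree sharply after 10 flips, but their posteriors nearly coincide after 1000, because the data terms `heads` and `n` swamp the prior pseudo-counts `a` and `b`.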


Ockham's Razor

If two hypotheses are equally consistent with the data, prefer the simpler one.

Simplicity

can accommodate fewer observations

smoother

fewer parameters

restricts predictions more (“sharper” predictions)

Examples: 1st- vs. 4th-order polynomial; small rectangle vs. large rectangle in Tenenbaum's model

(Figures: hypotheses H0 and H1 compared; Ockham, medieval philosopher and monk; a razor, a tool for cutting (metaphorical))


Motivating Ockham's Razor

Aesthetic considerations

A theory with mathematical beauty is more likely to be right (or believed) than an ugly one, given that both fit the same data.

Past empirical success of the principle

Develop inference techniques (e.g., Bayesian reasoning) that automatically incorporate Ockham's razor

(Figure: two theories, H1 and H2, compared by their priors and their likelihoods)


Ockham's Razor with Priors

Jeffreys (1939), probability text: more complex hypotheses should have lower priors

Requires a numerical rule for assessing complexity

e.g., number of free parameters

e.g., Vapnik-Chervonenkis (VC) dimension


Subjective vs. Objective Priors

Subjective or informative prior: specific, definite information about a random variable

Objective or uninformative prior: vague, general information

Philosophical arguments for certain priors as uninformative: maximum entropy / least commitment

e.g., interval [a, b]: uniform

e.g., interval [0, ∞) with mean 1/λ: exponential distribution

e.g., mean μ and std deviation σ: Gaussian

Independence of measurement scale

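e.g., the prior 1/(θ(1-θ)) for θ in [0, 1] is uniform in the log-odds log(θ/(1-θ)), so it expresses the same belief whether we talk about θ or its log-odds (this is Haldane's prior; the Jeffreys prior for θ is 1/√(θ(1-θ)))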


Ockham’s Razor Via Likelihoods

Coin flipping example

H1: coin has two heads

H2: coin has a head and a tail (assumed fair)

Consider 5 flips producing HHHHH

H1 could produce only this sequence

H2 could produce HHHHH, but also HHHHT, HHHTH, ... TTTTT

P(HHHHH | H1) = 1, P(HHHHH | H2) = 1/32

H2 pays the price of a lower likelihood because it can accommodate a greater range of observations

H1 is more readily rejected by observations
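The coin example can be checked exactly with rational arithmetic. This is a minimal sketch, assuming H2 is a fair coin (which is what P(HHHHH | H2) = 1/32 implies) and, for the posterior, hypothetical equal priors on H1 and H2:

```python
from fractions import Fraction
from itertools import product


def likelihood(seq, hypothesis):
    """P(seq | hypothesis) for a string of 'H'/'T' outcomes."""
    if hypothesis == "H1":  # coin has two heads: only all-heads sequences are possible
        return Fraction(1) if all(c == "H" for c in seq) else Fraction(0)
    if hypothesis == "H2":  # assumed fair coin: all 2**n sequences equally likely
        return Fraction(1, 2 ** len(seq))
    raise ValueError(f"unknown hypothesis: {hypothesis}")


def posterior(seq, priors):
    """Bayes rule over a finite set of hypotheses."""
    joint = {h: p * likelihood(seq, h) for h, p in priors.items()}
    total = sum(joint.values())
    return {h: j / total for h, j in joint.items()}


all_seqs = ["".join(s) for s in product("HT", repeat=5)]
equal = {"H1": Fraction(1, 2), "H2": Fraction(1, 2)}  # hypothetical equal priors
print(posterior("HHHHH", equal))
print(posterior("HHHHT", equal))
```

Each hypothesis spreads a total probability of 1 over the 32 possible sequences: H1 puts all of it on HHHHH, while H2 spreads it thinly. That is why HHHHH favors H1 (posterior 32/33 under equal priors) and a single tail rules H1 out entirely.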


Simple and Complex Hypotheses

(Figure: hypotheses H1 and H2)


Bayes Factor

BIC (Bayesian information criterion) is an approximation to the Bayes factor

A.k.a. the (marginal) likelihood ratio, P(D | H1) / P(D | H2)

Note: “model” and “hypothesis” are generally interchangeable
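For hypothesis classes with free parameters, the Bayes factor compares marginal likelihoods, each obtained by integrating the class's likelihood over its parameter prior. This is a minimal sketch that swaps in a coin example for illustration: a "fair" class with no free parameters versus a "biased" class whose heads probability θ has a uniform prior (both classes and all function names are assumptions, not from the slides):

```python
from fractions import Fraction
from math import factorial


def marginal_fair(heads, n):
    """P(D | fair): no free parameters; theta is fixed at 1/2, so each sequence has prob 2**-n."""
    return Fraction(1, 2 ** n)


def marginal_biased(heads, n):
    """P(D | biased): integral of theta**k * (1 - theta)**(n - k) under a uniform prior on theta.

    This Beta integral equals k! (n - k)! / (n + 1)! for one specific sequence.
    """
    return Fraction(factorial(heads) * factorial(n - heads), factorial(n + 1))


def bayes_factor(heads, n):
    """Ratio of marginal likelihoods, P(D | biased) / P(D | fair)."""
    return marginal_biased(heads, n) / marginal_fair(heads, n)


print(bayes_factor(5, 5))  # five heads in five flips
print(bayes_factor(3, 6))  # three heads in six flips
```

For five heads in five flips the factor is 16/3 in favor of the biased class; for a balanced sequence with three heads in six flips it is 16/35, so the simpler fair class is favored because the biased class spreads its probability over many more outcomes.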


Hypothesis Classes Varying in Complexity

E.g., 1st-, 2nd-, and 3rd-order polynomials

Hypothesis class is parameterized by w



Rissanen (1976): Minimum Description Length

Prefer models that can communicate the data in the smallest number of bits.

The preferred hypothesis H for explaining data D minimizes:

(1) the length of the description of the hypothesis

(2) the length of the description of the data with the help of the chosen theory

L: length


MDL & Bayes

L: some measure of length (complexity)

MDL: prefer the hypothesis that minimizes L(H) + L(D|H)

Bayes rule implies the MDL principle:

$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}$$

$$-\log P(H \mid D) = -\log P(D \mid H) - \log P(H) + \log P(D) = L(D \mid H) + L(H) + \text{const}$$
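The equivalence can be checked numerically: with L = -log2 P measured in bits, the hypothesis with the shortest two-part code is the one with the highest posterior. This is a minimal sketch reusing the coin example with hypothetical priors (P(H1) = 0.01 for a two-headed coin, P(H2) = 0.99) and data D = HHHHH:

```python
from math import log2


def description_lengths(priors, likelihoods):
    """Two-part code length L(H) + L(D | H) in bits, using L = -log2 P.

    Assumes every probability is nonzero (log2(0) raises ValueError).
    """
    return {h: -log2(priors[h]) - log2(likelihoods[h]) for h in priors}


def map_hypothesis(priors, likelihoods):
    """Hypothesis with the largest unnormalized posterior P(D | H) P(H)."""
    return max(priors, key=lambda h: priors[h] * likelihoods[h])


# Hypothetical numbers: coin example with D = HHHHH.
priors = {"H1": 0.01, "H2": 0.99}        # two-headed coins assumed rare
likelihoods = {"H1": 1.0, "H2": 1 / 32}  # P(HHHHH | H1), P(HHHHH | H2)

lengths = description_lengths(priors, likelihoods)
mdl_choice = min(lengths, key=lengths.get)
print(lengths, mdl_choice, map_hypothesis(priors, likelihoods))
```

Here H1 costs about 6.6 bits to state and 0 bits for the data, while H2 costs about 0.01 bits plus 5 bits for the data, so both MDL and the posterior prefer H2; with equal priors (1 bit each) the ranking flips to H1.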


Relativity Example

Explain the deviation in Mercury's orbit at perihelion with respect to the prevailing theory

E: Einstein's theory

F: fudged Newtonian theory

α = true deviation

a = observed deviation


Relativity Example (Continued)

Subjective Ockham's razor: result depends on one's belief about P(α|F)

Objective Ockham's razor

For the Mercury example, the RHS is 15.04

Applies to generic situation