Part III Hierarchical Bayesian Models

Dec 20, 2015


Part III Hierarchical Bayesian Models

Universal Grammar: hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)

Grammar:
S → NP VP
NP → Det [Adj] Noun [RelClause]
RelClause → [Rel] NP V
VP → VP NP
VP → Verb

Phrase structure

Utterance

Speech signal


(Han and Zhu, 2006)

Vision

Word learning

Principles: whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias

Structure

Data


Hierarchical Bayesian models

• Can represent and reason about knowledge at multiple levels of abstraction.

• Have been used by statisticians for many years.

• Have been applied to many cognitive problems:
– causal reasoning (Mansinghka et al, 06)
– language (Chater and Manning, 06)
– vision (Fei-Fei, Fergus, Perona, 03)
– word learning (Kemp, Perfors, Tenenbaum, 06)
– decision making (Lee, 06)

Outline

• A high-level view of HBMs

• A case study: Semantic knowledge

Universal Grammar: hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)
    P(grammar | UG)
Grammar
    P(phrase structure | grammar)
Phrase structure
    P(utterance | phrase structure)
Utterance
    P(speech | utterance)
Speech signal

Hierarchical Bayesian model

Universal Grammar: U
    P(G|U)
Grammar: G
    P(s|G)
Phrase structure: s1 s2 s3 s4 s5 s6
    P(u|s)
Utterance: u1 u2 u3 u4 u5 u6

Hierarchical Bayesian model

A hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy:

P({u_i}, {s_i}, G | U) = P({u_i} | {s_i}) P({s_i} | G) P(G | U)
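As a concrete illustration of this factorization, the following minimal Python sketch evaluates the joint probability for a toy hierarchy. The probability tables and the names `P_G_GIVEN_U`, `P_S_GIVEN_G`, `P_U_GIVEN_S` and `joint` are hypothetical and chosen only for illustration. The sketch also assumes that utterances are independent given their phrase structures, and phrase structures independent given the grammar.

```python
from itertools import product

# Hypothetical toy tables (illustrative assumptions, not from the slides).
P_G_GIVEN_U = {"G1": 0.7, "G2": 0.3}                   # P(G | U)
P_S_GIVEN_G = {"G1": {"sA": 0.9, "sB": 0.1},           # P(s | G)
               "G2": {"sA": 0.2, "sB": 0.8}}
P_U_GIVEN_S = {"sA": {"uX": 0.8, "uY": 0.2},           # P(u | s)
               "sB": {"uX": 0.3, "uY": 0.7}}


def joint(utterances, structures, grammar):
    """P({u_i}, {s_i}, G | U) = P(G | U) * prod_i P(s_i | G) P(u_i | s_i)."""
    p = P_G_GIVEN_U[grammar]
    for u, s in zip(utterances, structures):
        p *= P_S_GIVEN_G[grammar][s] * P_U_GIVEN_S[s][u]
    return p


# Sanity check: the joint sums to 1 over every configuration of two utterances.
total = sum(
    joint(us, ss, g)
    for g in P_G_GIVEN_U
    for ss in product(["sA", "sB"], repeat=2)
    for us in product(["uX", "uY"], repeat=2)
)
```

Because every level is an ordinary conditional distribution, the three inference problems discussed next are all different ways of conditioning this one joint distribution.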

Knowledge at multiple levels

1. Top-down inferences:
– How does abstract knowledge guide inferences at lower levels?

2. Bottom-up inferences:
– How can abstract knowledge be acquired?

3. Simultaneous learning at multiple levels of abstraction

Top-down inferences

Given grammar G and a collection of utterances, construct a phrase structure for each utterance.

Top-down inferences

Infer {s_i} given {u_i}, G:

P({s_i} | {u_i}, G) ∝ P({u_i} | {s_i}) P({s_i} | G)
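The top-down inference above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the grammar is fixed, each utterance is parsed independently, and the candidate structures and probability tables (`P_S_GIVEN_G`, `P_U_GIVEN_S`) are hypothetical values, not data from the slides.

```python
# Hypothetical tables for one fixed grammar G (illustrative assumptions).
P_S_GIVEN_G = {"sA": 0.9, "sB": 0.1}                   # P(s | G)
P_U_GIVEN_S = {"sA": {"uX": 0.8, "uY": 0.2},           # P(u | s)
               "sB": {"uX": 0.3, "uY": 0.7}}


def structure_posterior(u):
    """P(s | u, G) ∝ P(u | s) P(s | G), normalized over candidate structures."""
    scores = {s: P_U_GIVEN_S[s][u] * P_S_GIVEN_G[s] for s in P_S_GIVEN_G}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}


# One posterior over phrase structures per observed utterance.
posteriors = [structure_posterior(u) for u in ["uX", "uY", "uY"]]
```

With these made-up numbers the grammar's preference for `sA` dominates even for utterance `uY`, which is more typical of `sB`: abstract knowledge shapes the lower-level interpretation.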

Bottom-up inferences

Given a collection of phrase structures, learn a grammar G.

Bottom-up inferences

Infer G given {s_i} and U:

P(G | {s_i}, U) ∝ P({s_i} | G) P(G | U)
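A minimal Python sketch of the bottom-up inference above follows. It assumes a finite set of candidate grammars and phrase structures that are independent given the grammar; the tables and names are hypothetical.

```python
import math

# Hypothetical tables (illustrative assumptions, not from the slides).
P_G_GIVEN_U = {"G1": 0.7, "G2": 0.3}                   # P(G | U)
P_S_GIVEN_G = {"G1": {"sA": 0.9, "sB": 0.1},           # P(s | G)
               "G2": {"sA": 0.2, "sB": 0.8}}


def grammar_posterior(structures):
    """P(G | {s_i}, U) ∝ P({s_i} | G) P(G | U), computed in log space."""
    log_scores = {
        g: math.log(P_G_GIVEN_U[g])
        + sum(math.log(P_S_GIVEN_G[g][s]) for s in structures)
        for g in P_G_GIVEN_U
    }
    m = max(log_scores.values())           # subtract the max for stability
    weights = {g: math.exp(v - m) for g, v in log_scores.items()}
    z = sum(weights.values())
    return {g: w / z for g, w in weights.items()}


posterior = grammar_posterior(["sB", "sB", "sA"])
```

Here the prior favors `G1`, but three observed structures are enough for the likelihood to shift most of the posterior mass to `G2`.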

Simultaneous learning at multiple levels

Given a set of utterances {u_i} and innate knowledge U, construct a grammar G and a phrase structure for each utterance.

Simultaneous learning at multiple levels

• A chicken-or-egg problem:
– Given a grammar, phrase structures can be constructed
– Given a set of phrase structures, a grammar can be learned

Simultaneous learning at multiple levels

Infer G and {s_i} given {u_i} and U:

P(G, {s_i} | {u_i}, U) ∝ P({u_i} | {s_i}) P({s_i} | G) P(G | U)
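For a small enough hypothesis space, the joint inference above can be carried out by brute-force enumeration, which sidesteps the chicken-or-egg problem by scoring every (G, {s_i}) pair at once. The Python sketch below assumes hypothetical tables and conditional independence across utterances; realistic grammars would need approximate search or sampling instead.

```python
from itertools import product

# Hypothetical tables (illustrative assumptions, not from the slides).
P_G_GIVEN_U = {"G1": 0.7, "G2": 0.3}
P_S_GIVEN_G = {"G1": {"sA": 0.9, "sB": 0.1},
               "G2": {"sA": 0.2, "sB": 0.8}}
P_U_GIVEN_S = {"sA": {"uX": 0.8, "uY": 0.2},
               "sB": {"uX": 0.3, "uY": 0.7}}


def joint_posterior(utterances):
    """P(G, {s_i} | {u_i}, U) by enumerating every (G, {s_i}) pair."""
    scores = {}
    for g in P_G_GIVEN_U:
        for ss in product(P_S_GIVEN_G[g], repeat=len(utterances)):
            p = P_G_GIVEN_U[g]
            for u, s in zip(utterances, ss):
                p *= P_U_GIVEN_S[s][u] * P_S_GIVEN_G[g][s]
            scores[(g, ss)] = p
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}


post = joint_posterior(["uY", "uY", "uY"])
best_grammar, best_structures = max(post, key=post.get)
```

With these toy numbers the most probable joint hypothesis pairs grammar `G2` with structure `sB` for every utterance, even though the prior P(G | U) prefers `G1`.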


Outline

• A high-level view of HBMs

• A case study: Semantic knowledge

Folk Biology

R: principles
The relationships between living kinds are well described by tree-structured representations

S: structure
[Figure: tree over mouse, squirrel, chimp, gorilla]

D: data
“Gorillas have hands”

Folk Biology

R: principles
Structural form: tree

S: structure
[Figure: tree over mouse, squirrel, chimp, gorilla]

D: data

          F1  F2  F3  F4
mouse     ●   ○   ○   ●
squirrel  ●   ○   ○   ?
chimp     ○   ●   ●   ?
gorilla   ○   ●   ●   ?

Outline

• A high-level view of HBMs

• A case study: Semantic knowledge
– Property induction
– Learning structured representations
– Learning the abstract organizing principles of a domain

Property induction

R: principles
Structural form: tree

S: structure
[Figure: tree over mouse, squirrel, chimp, gorilla]

D: data

          F1  F2  F3  F4
mouse     ●   ○   ○   ●
squirrel  ●   ○   ○   ?
chimp     ○   ●   ●   ?
gorilla   ○   ●   ●   ?

Property Induction

R: principles
Structural form: tree
Stochastic process: diffusion

S: structure
[Figure: tree over mouse, squirrel, chimp, gorilla]

D: data

mouse     ●
squirrel  ?
chimp     ?
gorilla   ?

Approach: work with the distribution P(D|S,R)

Property Induction

Horses have T4 cells.
Elephants have T4 cells.
--------
All mammals have T4 cells.

Horses have T4 cells.
Seals have T4 cells.
--------
All mammals have T4 cells.

Previous approaches: Rips (75), Osherson et al (90), Sloman (93), Heit (98)

Bayesian Property Induction

          Data  Hypotheses
Horse     ●     ○     ●     ●     …  ●      …  ○
Cow       ?     ○     ○     ●        ○         ○
Chimp     ?     ○     ○     ○        ●         ○
Gorilla   ?     ○     ○     ○        ○         ○
Mouse     ?     ○     ○     ○        ○         ○
Squirrel  ?     ○     ○     ○        ○         ○
Dolphin   ?     ○     ○     ○        ○         ○
Seal      ?     ○     ○     ○        ○         ○
Rhino     ?     ○     ○     ○        ○         ○
Elephant  ●     ○     ○     ○     …  ○      …  ○
Prior           0.04  0.01  0.02     0.001     0.04

Bayesian Property Induction

          Data  Hypotheses
Horse     ●     ○     ●     ●     …  ●      …  ●
Cow       ?     ○     ○     ●        ○         ●
Chimp     ?     ○     ○     ○        ●         ●
Gorilla   ?     ○     ○     ○        ○         ●
Mouse     ?     ○     ○     ○        ○         ●
Squirrel  ?     ○     ○     ○        ○         ●
Dolphin   ?     ○     ○     ○        ○         ●
Seal      ?     ○     ○     ○        ○         ●
Rhino     ?     ○     ○     ○        ○         ●
Elephant  ●     ○     ○     ○     …  ○      …  ●
Prior           0.04  0.01  0.02     0.001     0.04

          Data  Hypotheses
Horse     ●     ○     ●     ●     …  ●      …  ●
Cow       ?     ○     ○     ●        ○         ●
Chimp     ?     ○     ○     ○        ●         ●
Gorilla   ?     ○     ○     ○        ○         ●
Mouse     ?     ○     ○     ○        ○         ●
Squirrel  ?     ○     ○     ○        ○         ●
Dolphin   ?     ○     ○     ○        ○         ●
Seal      ?     ○     ○     ○        ○         ●
Rhino     ?     ○     ○     ○        ○         ●
Elephant  ●     ○     ○     ○     …  ○      …  ●
Prior           0.04  0.01  0.02     0.001     0.04

D: Horses have T4 cells.
   Elephants have T4 cells.
C: Cows have T4 cells.
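The argument above can be scored by hypothesis averaging: sum the prior over hypotheses that are consistent with the premises and also include the conclusion category, then divide by the prior mass of all hypotheses consistent with the premises. The Python sketch below is a minimal illustration. The hypothesis sets and prior weights are hypothetical, and it assumes a likelihood of 1 for any hypothesis that contains every premise category and 0 otherwise.

```python
# Hypothetical hypothesis space: each hypothesis is the set of species that
# have the property, paired with a prior weight (illustrative values only).
HYPOTHESES = [
    ({"horse", "cow"}, 0.04),
    ({"horse", "cow", "elephant", "rhino"}, 0.02),
    ({"horse", "elephant"}, 0.01),
    ({"chimp", "gorilla"}, 0.04),
    ({"horse", "cow", "chimp", "gorilla", "elephant", "rhino"}, 0.001),
]


def generalization(premises, conclusion):
    """P(conclusion has the property | premises do), by hypothesis averaging."""
    # Keep only hypotheses that contain every premise category.
    consistent = [(h, p) for h, p in HYPOTHESES if premises <= h]
    z = sum(p for _, p in consistent)
    return sum(p for h, p in consistent if conclusion in h) / z


p_cow = generalization({"horse", "elephant"}, "cow")
p_chimp = generalization({"horse", "elephant"}, "chimp")
```

Everything that distinguishes a strong argument from a weak one comes from the prior weights, which is why the choice of prior is the central modeling question.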

Choosing a prior

Chimps have T4 cells.
--------
Gorillas have T4 cells.
(Taxonomic similarity)

Poodles can bite through wire.
--------
Dobermans can bite through wire.
(Jaw strength)

Salmon carry E. Spirus bacteria.
--------
Grizzly bears carry E. Spirus bacteria.
(Food web relations)

Bayesian Property Induction

• A challenge:
– We have to specify the prior, which typically includes many numbers

• An opportunity:
– The prior can capture knowledge about the problem.

Property Induction

R: principles
Structural form: tree
Stochastic process: diffusion

S: structure
[Figure: tree over mouse, squirrel, chimp, gorilla]

D: data

mouse     ●
squirrel  ?
chimp     ?
gorilla   ?

Biological properties

• Structure:
– Living kinds are organized into a tree

• Stochastic process:
– Nearby species in the tree tend to share properties

Structure: [figure]

Stochastic Process

• Nearby species in the tree tend to share properties.

• In other words, properties tend to be smooth over the tree.

[Figure panels: Smooth, Not smooth]

Stochastic process

          Data  Hypotheses
Horse     ●     ○     ●     ●     …  ●      …  ●
Cow       ?     ○     ○     ●        ○         ●
Chimp     ?     ○     ○     ○        ●         ●
Gorilla   ?     ○     ○     ○        ○         ●
Mouse     ?     ○     ○     ○        ○         ●
Squirrel  ?     ○     ○     ○        ○         ●
Dolphin   ?     ○     ○     ○        ○         ●
Seal      ?     ○     ○     ○        ○         ●
Rhino     ?     ○     ○     ○        ○         ●
Elephant  ●     ○     ○     ○     …  ○      …  ●
Prior           0.04  0.01  0.02     0.001     0.04

Generating a property

y → h (threshold), where y tends to be smooth over the tree:

            y     h
Horse      0.5    ●
Cow        0.4    ●
Chimp     -0.7    ○
Gorilla   -0.1    ○
Mouse     -0.4    ○
Squirrel  -0.4    ○
Dolphin   -1.5    ○
Seal      -1.5    ○
Rhino     -0.1    ○
Elephant  -0.1    ○

The diffusion process

h_i = Θ(y_i), where Θ(y_i) is 1 if y_i ≥ 0 and 0 otherwise

The covariance K encourages y to be smooth over the graph S

p(y|S,R): Generating a property

Let y_i be the feature value at node i

[Figure: nodes i and j]

(Zhu, Lafferty, Ghahramani 03)
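The equation for p(y|S,R) is not preserved in this transcript. As a rough illustration of what "smooth over the graph" can mean, the Python sketch below assumes a Gaussian-field style energy that sums squared differences of y across edges, in the spirit of the Zhu, Lafferty and Ghahramani work cited above. The chain-shaped graph, the unit edge weights, and the function names are all assumptions made for this example.

```python
import math

# Hypothetical chain-shaped graph S over four species (illustrative only).
EDGES = [("mouse", "squirrel"), ("squirrel", "chimp"), ("chimp", "gorilla")]


def energy(y):
    """Half the sum of squared differences of y across edges (lower = smoother)."""
    return 0.5 * sum((y[i] - y[j]) ** 2 for i, j in EDGES)


def unnormalized_density(y):
    """exp(-energy): larger for feature vectors that vary smoothly over S."""
    return math.exp(-energy(y))


def threshold(y):
    """h_i = 1 if y_i >= 0 and 0 otherwise."""
    return {k: int(v >= 0) for k, v in y.items()}


smooth = {"mouse": 1.0, "squirrel": 0.8, "chimp": -0.7, "gorilla": -0.9}
rough = {"mouse": 1.0, "squirrel": -0.8, "chimp": 0.7, "gorilla": -0.9}
```

Under this assumed energy, `smooth` receives a much higher unnormalized density than `rough`, and thresholding it yields a property shared by the two neighboring rodents only.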

Biological properties

R: principles
Structural form: tree
Stochastic process: diffusion

S: structure
[Figure: tree over mouse, squirrel, chimp, gorilla]

D: data

mouse     ●
squirrel  ?
chimp     ?
gorilla   ?

Approach: work with the distribution P(D|S,R)

          Data  Hypotheses
Horse     ●     ○     ●     ●     …  ●      …  ●
Cow       ?     ○     ○     ●        ○         ●
Chimp     ?     ○     ○     ○        ●         ●
Gorilla   ?     ○     ○     ○        ○         ●
Mouse     ?     ○     ○     ○        ○         ●
Squirrel  ?     ○     ○     ○        ○         ●
Dolphin   ?     ○     ○     ○        ○         ●
Seal      ?     ○     ○     ○        ○         ●
Rhino     ?     ○     ○     ○        ○         ●
Elephant  ●     ○     ○     ○     …  ○      …  ●
Prior           0.04  0.01  0.02     0.001     0.04

D: Horses have T4 cells.
   Elephants have T4 cells.
C: Cows have T4 cells.

Results

Dolphins have property P.
Seals have property P.
--------
Horses have property P.

Cows have property P.
Elephants have property P.
--------
Horses have property P.

(Osherson et al)

[Figure axes: Model, Human]

Results

Cows have property P.
Elephants have property P.
Horses have property P.
--------
All mammals have property P.

Gorillas have property P.
Mice have property P.
Seals have property P.
--------
All mammals have property P.

[Figure axes: Model, Human]

Spatial model

R: principles
Structural form: 2D space
Stochastic process: diffusion

S: structure
[Figure: mouse, squirrel, chimp, gorilla placed in a 2D space]

D: data

mouse     ●
squirrel  ?
chimp     ?
gorilla   ?

Structure: [figure]

Tree vs 2D

[Figure: columns “horse” and “all mammals”; rows Tree + diffusion and 2D + diffusion]

Biological Properties

R: principles
Structural form: tree
Stochastic process: diffusion

S: structure
[Figure: tree over mouse, squirrel, chimp, gorilla]

D: data

mouse     ●
squirrel  ?
chimp     ?
gorilla   ?

Three inductive contexts

R (principles)                   S (structure over Classes A to G)   Example property
tree + diffusion process         tree                                “has T4 cells”
chain + drift process            chain                               “can bite through wire”
network + causal transmission    network                             “carries E. Spirus bacteria”

Threshold properties

• “can bite through wire”

• “has skin that is more resistant to penetration than most synthetic fibers”

Hippo, Cat, Lion, Camel, Elephant

Poodle, Collie, Doberman

(Osherson et al; Blok et al)

Threshold properties

• Structure:
– The categories can be organized along a single dimension

• Stochastic process:
– Categories towards one end of the dimension are more likely to have the novel property
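The qualitative behavior described above can be illustrated with a deliberately simplified stand-in for the drift process: a deterministic threshold model in which each hypothesis says "every category at or beyond position t has the property". The positions, the uniform prior over thresholds, and the function name are assumptions made for this sketch, not the model from the slides.

```python
# Hypothetical positions of categories along one dimension (e.g. jaw strength).
POSITION = {"poodle": 1, "collie": 2, "doberman": 3}


def generalization(premise, conclusion):
    """P(conclusion has property | premise has it), uniform prior on thresholds.

    Hypothesis t: exactly the categories with position >= t have the property.
    """
    thresholds = sorted(POSITION.values())
    # Thresholds under which the premise category has the property.
    consistent = [t for t in thresholds if POSITION[premise] >= t]
    # Of those, thresholds under which the conclusion category has it too.
    hits = [t for t in consistent if POSITION[conclusion] >= t]
    return len(hits) / len(consistent)


weak_to_strong = generalization("poodle", "doberman")
strong_to_weak = generalization("doberman", "poodle")
```

The asymmetry is the point: learning that a category near the weak end has the property licenses a strong inference to categories further along the dimension, but not the reverse.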

Results

“has skin that is more resistant to penetration than most synthetic fibers”

(Blok et al, Smith et al)

[Figure rows: 1D + drift, 1D + diffusion]


Causally transmitted properties

Salmon carry E. Spirus bacteria.
--------
Grizzly bears carry E. Spirus bacteria.

[Figure: Salmon, Grizzly bear]

(Medin et al; Shafto and Coley)

Causally transmitted properties

• Structure:
– The categories can be organized into a directed network

• Stochastic process:
– Properties are generated by a noisy transmission process
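A two-node version of such a network makes the directional asymmetry concrete. The Python sketch below assumes a single prey → predator link, a base rate `b` at which any species acquires the property spontaneously, and a probability `t` that the property is passed along the link. Both parameters, their values, and the noisy-OR combination are assumptions for illustration.

```python
def predator_given_prey(b, t):
    """P(predator has property | prey has it): spontaneous (b) or transmitted (t)."""
    return 1 - (1 - b) * (1 - t)


def prey_given_predator(b, t):
    """P(prey has property | predator has it), by Bayes' rule."""
    p_prey = b
    # Marginal for the predator: spontaneous, or transmitted from an infected prey.
    p_predator = 1 - (1 - b) * (1 - b * t)
    return p_prey * predator_given_prey(b, t) / p_predator


b, t = 0.1, 0.5  # hypothetical base rate and transmission probability
up_the_chain = predator_given_prey(b, t)     # e.g. salmon -> grizzly bear
down_the_chain = prey_given_predator(b, t)   # e.g. grizzly bear -> salmon
```

With these values, inference from prey to predator is stronger than inference from predator to prey, which is the kind of asymmetry a symmetric similarity measure cannot produce.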

Experiment: disease properties

[Figure panels: Island, Mammals]

(Shafto et al)

Results: disease properties

[Figure panels: Mammals, Island; model: Web + transmission]


Conclusions: property induction

• Hierarchical Bayesian models help to explain how abstract knowledge can be used for induction

Outline

• A high-level view of HBMs

• A case study: Semantic knowledge
– Property induction
– Learning structured representations
– Learning the abstract organizing principles of a domain

Structure learning

R: principles
Structural form: tree
Stochastic process: diffusion

S: structure
[Figure: tree over mouse, squirrel, chimp, gorilla]

D: data

          F1  F2  F3  F4
mouse     ●   ○   ○   ●
squirrel  ●   ○   ○   ?
chimp     ○   ●   ●   ?
gorilla   ○   ●   ●   ?


Structure learning

R: principles
Structural form: tree
Stochastic process: diffusion

S: structure
?

D: data

          F1  F2  F3  F4
mouse     ●   ○   ○   ●
squirrel  ●   ○   ○   ?
chimp     ○   ●   ●   ?
gorilla   ○   ●   ●   ?

Goal: find S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R)

P(D|S,R) is the distribution previously used for property induction

Generating features over the tree

[Figure: tree over mouse, squirrel, chimp, gorilla]

          F1  F2  F3  F4
mouse     ●   ○   ○   ●
squirrel  ●   ○   ○   ?
chimp     ○   ●   ●   ?
gorilla   ○   ●   ●   ?


P(S|R): Generating structures

[Figure: three candidate structures over mouse, squirrel, chimp, gorilla; labels: “Consistent with R”, “Inconsistent with R”]

P(S|R): Generating structures

[Figure: three candidate structures over mouse, squirrel, chimp, gorilla; labels: “Complex”, “Simple”]

P(S|R): Generating structures

P(S|R) ∝ 0 if S is inconsistent with R
P(S|R) ∝ θ^|S| otherwise

• Each structure is weighted by the number of nodes it contains, where |S| is the number of nodes in S

[Figure: three candidate structures over mouse, squirrel, chimp, gorilla]
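The equation on this slide is not preserved in the transcript. The Python sketch below assumes one natural reading of the surviving text: structures inconsistent with R get zero weight, and consistent ones get a weight that shrinks geometrically with the number of nodes. The penalty `THETA`, the candidate list, and the node counts are hypothetical.

```python
THETA = 0.5  # hypothetical per-node penalty, assumed to satisfy 0 < THETA < 1

# Hypothetical candidates: (name, number of nodes, consistent with R?)
CANDIDATES = [
    ("tree_7_nodes", 7, True),
    ("tree_5_nodes", 5, True),
    ("ring_4_nodes", 4, False),
]


def structure_prior(candidates, theta=THETA):
    """Normalize theta**|S| over consistent structures; inconsistent ones get 0."""
    weights = {name: (theta ** n if ok else 0.0) for name, n, ok in candidates}
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}


prior = structure_prior(CANDIDATES)
```

Under this assumed weighting the five-node tree gets four times the prior mass of the seven-node tree, and the ring is ruled out regardless of its size because it violates the tree constraint.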

Structure Learning

R: principles

S: structure

D: data

Aim: find S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R)

P(S|D,R) will be high when:
– The features in D vary smoothly over S
– S is a simple graph (a graph with few nodes)


Structure learning example

• Participants rated the goodness of 85 features for 48 animals

• E.g., elephant: gray, hairless, toughskin, big, bulbous, longleg, tail, chewteeth, tusks, smelly, walks, slow, strong, muscle, quadrapedal, inactive, vegetation, grazer, oldworld, bush, jungle, ground, timid, smart, group

(Osherson et al)

Biological Data

[Figure: data matrix; axes: Animals, Features]

Tree: [figure]

Spatial model

R: principles
Structural form: 2D space
Stochastic process: diffusion

S: structure
[Figure: mouse, squirrel, chimp, gorilla placed in a 2D space]

D: data

          F1  F2  F3  F4
mouse     ●   ○   ○   ●
squirrel  ●   ○   ○   ?
chimp     ○   ●   ●   ?
gorilla   ○   ●   ●   ?

2D space: [figure]


Conclusions: structure learning

• Hierarchical Bayesian models provide a unified framework for the acquisition and use of structured representations

Outline

• A high-level view of HBMs

• A case study: Semantic knowledge
– Property induction
– Learning structured representations
– Learning the abstract organizing principles of a domain

Learning structural form

R: principles
Structural form: tree
Stochastic process: diffusion

S: structure
[Figure: tree over mouse, squirrel, chimp, gorilla]

D: data

          F1  F2  F3  F4
mouse     ●   ○   ○   ●
squirrel  ●   ○   ○   ?
chimp     ○   ●   ●   ?
gorilla   ○   ●   ●   ?

Which form is best?

Ostrich, Robin, Crocodile, Snake, Bat, Orangutan, Turtle

Structural forms

Order, Chain, Ring, Partition

Hierarchy, Tree, Grid, Cylinder

Learning structural form

R: principles
Structural form: F (could be tree, 2D space, ring, ….)
Stochastic process: diffusion

S: structure
?

D: data

          F1  F2  F3  F4
mouse     ●   ○   ○   ●
squirrel  ●   ○   ○   ?
chimp     ○   ●   ●   ?
gorilla   ○   ●   ●   ?

Aim: find S,F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F)

– P(D|S): the distribution used for property induction
– P(S|F): the distribution used for structure learning
– P(F): uniform distribution on the set of forms

Page 91: Part III Hierarchical Bayesian Models. Phrase structure Utterance Speech signal Grammar Universal Grammar Hierarchical phrase structure grammars (e.g.,

P(S|F): Generating structures from forms

• Each structure is weighted by the number of nodes it contains:

P(S|F) = 0 if S is inconsistent with F

P(S|F) ∝ θ^|S| otherwise

where |S| is the number of nodes in S

[Figure: three example structures over mouse, squirrel, chimp, and gorilla]
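
The node-count weighting described on this slide can be sketched as below. The weight base `theta` (a value between 0 and 1), the candidate structures, and their consistency flags are assumptions for illustration. The slide itself only states that structures inconsistent with F are treated separately from the rest, and that each structure is weighted by the number of nodes it contains.

```python
# Hypothetical candidate structures for one form F:
# name -> (number of nodes in S, whether S is consistent with F).
# Values are illustrative only.
STRUCTURES = {
    "one_cluster":   (1, True),
    "two_clusters":  (2, True),
    "four_clusters": (4, True),
    "wrong_form":    (2, False),
}

def structure_prior(structures, theta=0.5):
    """P(S|F) over an enumerated candidate set.

    Inconsistent structures get probability 0. Consistent ones get weight
    theta ** n_nodes (theta is an assumed parameter in (0, 1)), and the
    weights are then normalized to sum to 1.
    """
    weights = {
        name: (theta ** n_nodes if consistent else 0.0)
        for name, (n_nodes, consistent) in structures.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

prior = structure_prior(STRUCTURES)
for name, p in prior.items():
    print(f"{name}: {p:.4f}")
```

With `theta` below 1, every extra node multiplies the weight by a factor smaller than 1, so structures with fewer nodes receive more prior mass.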

Page 92

P(S|F): Generating structures from forms

• Simpler forms are preferred

[Figure: P(S|F) over all possible graph structures S for the Chain and Grid forms, with structures labeled A, B, C, D]

Page 93

Learning structural form

F: form

S: structure

D: data

          F1  F2  F3  F4
mouse     ●   ○   ○   ●
squirrel  ●   ○   ○   ?
chimp     ○   ●   ●   ?
gorilla   ○   ●   ●   ?

Goal: find S,F that maximize P(S,F|D)

Page 94

Learning structural form

• P(S,F|D) will be high when:

– The features in D vary smoothly over S

– S is a simple graph (a graph with few nodes)

– F is a simple form (a form that can generate only a few structures)

F: form

S: structure

D: data

Aim: find S,F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F)
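
To make "vary smoothly over S" concrete, one common proxy is the sum of squared differences of a feature across the edges of S (the graph Laplacian quadratic form). The sketch below uses that proxy. It is an assumption standing in for the diffusion-based likelihood P(D|S), which the slides do not spell out, and the chain-shaped graph and the second feature are made up.

```python
# A chain-shaped structure S over four animals (the edges are an assumption).
EDGES = [("mouse", "squirrel"), ("squirrel", "chimp"), ("chimp", "gorilla")]

def roughness(feature, edges):
    """Sum of squared differences of a feature across the edges of S.

    Lower values mean the feature varies more smoothly over the graph.
    """
    return sum((feature[a] - feature[b]) ** 2 for a, b in edges)

# F1 from the slide's table: mouse and squirrel have it, chimp and gorilla do not.
smooth_feature = {"mouse": 1, "squirrel": 1, "chimp": 0, "gorilla": 0}
# A made-up feature that alternates along the chain.
rough_feature = {"mouse": 1, "squirrel": 0, "chimp": 1, "gorilla": 0}

print(roughness(smooth_feature, EDGES))  # only one edge joins differing values
print(roughness(rough_feature, EDGES))   # every edge joins differing values
```

A likelihood that decreases with this roughness score would favor structures on which most observed features change across few edges, which is the first condition in the list above.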

Page 95

Learning structural form

• P(S,F|D) will be high when:

– The features in D vary smoothly over S

– S is a simple graph (a graph with few nodes)

– F is a simple form (a form that can generate only a few structures)

F: form

S: structure

D: data

Aim: find S,F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F)

Page 96

Form learning: Biological Data

[Figure: data matrix with axes labeled Animals and Features]

• 33 animals, 110 features

Page 97

Form learning: Biological Data

Page 98

Supreme Court (Spaeth)

• Votes on 1600 cases (1987-2005)

Page 99

Color (Ekman)

Page 100

Outline

• A high-level view of HBMs

• A case study: Semantic knowledge

– Property induction

– Learning structured representations

– Learning the abstract organizing principles of a domain

Page 101

Where do priors come from?

mouse     ●
squirrel  ?
chimp     ?
gorilla   ?

Page 102

[Figure: structure over mouse, squirrel, chimp, and gorilla]

mouse     ●
squirrel  ?
chimp     ?
gorilla   ?

Stochastic process: diffusion
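
One way to see how a structure plus a stochastic process yields a prior for property induction is to enumerate labelings of a small tree. The sketch below is an assumption-laden stand-in, not the diffusion process from the talk: it uses a hypothetical tree with internal nodes `root`, `rodent`, and `primate`, and a prior that penalizes each edge whose endpoints disagree, with an assumed strength `beta`.

```python
import itertools
import math

# Assumed tree over the four animals with three hypothetical internal nodes.
EDGES = [
    ("root", "rodent"), ("root", "primate"),
    ("rodent", "mouse"), ("rodent", "squirrel"),
    ("primate", "chimp"), ("primate", "gorilla"),
]
NODES = ["root", "rodent", "primate", "mouse", "squirrel", "chimp", "gorilla"]

def unnormalized_prior(assignment, beta=1.0):
    """Weight exp(-beta * number of edges whose endpoints disagree).

    A simple stand-in for a diffusion process: labelings that change
    value across few tree edges get more prior mass.
    """
    disagreements = sum(assignment[a] != assignment[b] for a, b in EDGES)
    return math.exp(-beta * disagreements)

def predict(observed, beta=1.0):
    """P(node has the property | observed labels), by brute-force enumeration."""
    totals = {node: 0.0 for node in NODES}
    normalizer = 0.0
    for values in itertools.product([0, 1], repeat=len(NODES)):
        assignment = dict(zip(NODES, values))
        if any(assignment[n] != v for n, v in observed.items()):
            continue  # skip labelings that contradict the observation
        weight = unnormalized_prior(assignment, beta)
        normalizer += weight
        for node in NODES:
            totals[node] += weight * assignment[node]
    return {node: totals[node] / normalizer for node in NODES}

posterior = predict({"mouse": 1})
for animal in ["squirrel", "chimp", "gorilla"]:
    print(animal, round(posterior[animal], 3))
```

Under these assumptions, observing the property in the mouse raises the prediction for the squirrel more than for the chimp or gorilla, because the squirrel is closer to the mouse in the tree.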

Page 103

[Figure: tree structure over mouse, squirrel, chimp, and gorilla]

          F1  F2  F3  F4
mouse     ●   ○   ○   ●
squirrel  ●   ○   ○   ?
chimp     ○   ●   ●   ?
gorilla   ○   ●   ●   ?

Structural form: tree

Stochastic process: diffusion

Page 105

Where do structural forms come from?

[Figure: eight structural forms: Order, Chain, Ring, Partition, Hierarchy, Tree, Grid, Cylinder]

Page 106

Where do structural forms come from?

[Figure: two panels, each with columns labeled Form and Process]

Page 107

Node-replacement graph grammars

[Figure: panels labeled Production (Chain) and Derivation]
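
A node-replacement production rewrites one node of a graph into a small subgraph and reattaches the surrounding edges. The sketch below shows one assumed reading of a chain production on a directed graph: a node is split in two, incoming edges stay on the original node, and outgoing edges move to the new node. The function names and the random choice of which node to split are hypothetical, since the slide's figure is not recoverable from the transcript.

```python
import random

def apply_chain_production(edges, node, new_node):
    """Split `node` into `node -> new_node` in a directed graph.

    Edges that pointed into `node` still do; edges that left `node` now
    leave `new_node`. This is one assumed reading of a chain production.
    """
    rewired = [((new_node if a == node else a), b) for a, b in edges]
    rewired.append((node, new_node))
    return rewired

def derive_chain(n_nodes, seed=0):
    """Derive a chain with n_nodes nodes by repeatedly splitting a random node."""
    rng = random.Random(seed)
    nodes, edges = [0], []
    while len(nodes) < n_nodes:
        target = rng.choice(nodes)   # node to replace at this derivation step
        new_node = len(nodes)        # fresh integer label for the new node
        edges = apply_chain_production(edges, target, new_node)
        nodes.append(new_node)
    return nodes, edges

nodes, edges = derive_chain(5)
print(edges)
```

Starting from a single node, every application keeps each node's in-degree and out-degree at most 1, so any derivation produces a chain regardless of which node is chosen at each step.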

Page 110

Where do structural forms come from?

[Figure: two panels, each with columns labeled Form and Process]

Page 111

The complete space of grammars

[Figure: grammars numbered 1 to 4096]

Page 112

When can we stop adding levels?

• When the knowledge at the top level is simple or general enough that it can be plausibly assumed to be innate.

Page 113

Conclusions

• Hierarchical Bayesian models provide a unified framework that can

– Explain how abstract knowledge is used for induction

– Explain how abstract knowledge can be acquired

Page 114

Learning abstract knowledge

Applications of hierarchical Bayesian models at this conference:

1. Semantic knowledge: Schmidt et al.

– Learning the M-constraint

2. Syntax: Perfors et al.

– Learning that language is hierarchically organized

3. Word learning: Kemp et al.

– Learning the shape bias