Grammar Induction Through Machine Learning, Part 1
Linguistic Nativism Reconsidered

Alexander Clark (Department of Computer Science, Royal Holloway, University of London)
Shalom Lappin (Department of Philosophy, King's College, London)

February 5, 2008

Outline

1 The Machine Learning Paradigm
2 Supervised vs. Unsupervised Learning
3 Supervised Learning with a Probabilistic Grammar
4 A Bayesian Reply to the APS (the Argument from the Poverty of the Stimulus)

Machine Learning Algorithms and Models of the Learning Domain

A machine learning system implements a learning algorithm that defines a function from a domain of input samples to a range of output values.
A corpus of examples is divided into a training set and a test set.
The learning algorithm is specified in conjunction with a model of the phenomenon to be learned.
This model defines the space of possible hypotheses that the algorithm can generate from the input data.
When the values of the model's parameters are set by training the algorithm on the training set, an element of the hypothesis space is selected.
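
The following toy Python sketch (not from the original slides; the task and model are assumptions for illustration) shows this setup: a corpus is divided into training and test sets, and fitting the model's single parameter on the training portion selects one element of its hypothesis space.

import random

# Toy labelled corpus: inputs are numbers, outputs are booleans.
corpus = [(x, x > 5.0) for x in (random.uniform(0, 10) for _ in range(200))]
random.shuffle(corpus)
train, test = corpus[:150], corpus[150:]   # divide the corpus into training and test sets

def fit_threshold(data):
    # Select an element of the hypothesis space (a threshold value) from the training data:
    # the candidate threshold with the fewest training errors.
    candidates = [x for x, _ in data]
    return min(candidates, key=lambda t: sum((x > t) != y for x, y in data))

def accuracy(threshold, data):
    return sum((x > threshold) == y for x, y in data) / len(data)

theta = fit_threshold(train)               # parameter value set through training
print("learned threshold:", round(theta, 2), "test accuracy:", accuracy(theta, test))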

Evaluating a Parsing Algorithm (Supervised Learning)

If one has a gold standard of correct parses for a corpus, then it is possible to compute the percentage of correct parses that the algorithm produces for a blind test subpart of this corpus.
A more common procedure for scoring an ML algorithm on a test set is to determine its performance for recall and precision.
The recall of a parsing algorithm A is the percentage of labelled brackets of the test set that it correctly identifies.
A's precision is the percentage of the brackets that it returns which correspond to those in the gold standard.
A unified score for A, known as an F-score, can be computed as an average (standardly the harmonic mean) of its recall and its precision.
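
A small sketch of these measures over labelled brackets (illustrative only; the representation of a bracket as a (label, start, end) span is an assumption, not taken from the slides).

def bracket_scores(gold_brackets, predicted_brackets):
    # Precision, recall, and F-score computed over sets of labelled brackets.
    gold, pred = set(gold_brackets), set(predicted_brackets)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score

gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(bracket_scores(gold, pred))   # (0.75, 0.75, 0.75)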

Learning Biases and Priors

The choice of parameters and their possible values defines a bias for the language model by imposing prior constraints on the set of learnable hypotheses.
All learning requires some sort of bias to restrict the set of possible hypotheses for the phenomenon to be learned.
This bias can express strong assumptions about the nature of the domain of learning.
Alternatively, it can define comparatively weak domain-specific constraints, with learning driven primarily by domain-general procedures and conditions.

Prior Probability Distributions on a Hypothesis Space

One way of formalising this learning bias is as a prior probability distribution on the elements of the hypothesis space that favours some hypotheses as more likely than others.
The paradigm of Bayesian learning in cognitive science implements this approach.
The simplicity and compactness measure that Perfors et al. (2006) use is an example of a very general prior.

Learning Bias and the Poverty of Stimulus

The poverty of stimulus issue can be formulated as follows:
What are the minimal domain-specific linguistic biases that must be assumed for a reasonable learning algorithm to support language acquisition on the basis of the training set available to the child?
If a model with relatively weak language-specific biases can sustain effective grammar induction, then this result undermines poverty of stimulus arguments for a rich theory of universal grammar.

Supervised Learning

When the samples of the training set are annotated with the classifications and structures that the learning algorithm is intended to produce as output for the test set, then learning is supervised.
Supervised grammar induction involves training an ML procedure on a corpus annotated with the parse structures of the gold standard.
The learning algorithm infers a function for assigning input sentences to appropriate parse outputs on the basis of a training set of argument-value pairs, where each sentence argument is paired with its parse value.

Unsupervised Learning

If the training set is not marked with the properties to be returned as output for the test set, then learning is unsupervised.
Unsupervised learning involves using clustering patterns and distributional regularities in a training set to identify structure in the data.

Supervised Learning and Language Acquisition

It could be argued that supervised grammar induction is not directly relevant to poverty of stimulus arguments.
It requires that target parse structures be represented in the training set, while children have no access to such representations in the data they are exposed to.
If negative evidence of the sort identified by Saxton (1997) and Chouinard and Clark (2003) is available and plays a role in grammar induction, then it is possible to model the acquisition process as a type of supervised learning.
If, however, children acquire language solely on the basis of positive evidence, then it is necessary to treat acquisition as unsupervised learning.

Probabilistic Context-Free Grammars

A Probabilistic Context-Free Grammar (PCFG) conditions the probability of a child symbol sequence on the parent nonterminal.
It provides conditional probabilities of the form P(X1 ... Xn | N) for each nonterminal N and sequence X1 ... Xn of items from the vocabulary of the grammar.
It also specifies a probability distribution Ps(N) over the label of the root of the tree.
The conditional probabilities P(X1 ... Xn | N) correspond to probabilistic parameters that govern the expansion of a node in a parse tree according to a context-free rule N → X1 ... Xn.
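
A brief sketch of how such a model assigns a probability to a parse tree (the tree representation and the toy rule probabilities are assumptions for illustration): the probability is the root distribution Ps(N) times one conditional probability P(X1 ... Xn | N) per node expansion.

# Rule probabilities P(X1 ... Xn | N), keyed by (parent, children); root_prob is the distribution Ps.
rule_prob = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("D", "N")):   0.6,
    ("NP", ("N",)):       0.4,
    ("VP", ("V", "NP")):  1.0,
}
root_prob = {"S": 1.0}

# A tree node is (label, children); preterminals have no children here (words omitted).
tree = ("S", [("NP", [("N", [])]),
              ("VP", [("V", []), ("NP", [("D", []), ("N", [])])])])

def tree_probability(node, at_root=True):
    label, children = node
    p = root_prob.get(label, 0.0) if at_root else 1.0
    if not children:
        return p                                    # no expansion to score at a preterminal
    child_labels = tuple(c[0] for c in children)
    p *= rule_prob.get((label, child_labels), 0.0)  # P(children | parent)
    for child in children:
        p *= tree_probability(child, at_root=False)
    return p

print(tree_probability(tree))
# 1.0 (root) * 1.0 (S -> NP VP) * 0.4 (NP -> N) * 1.0 (VP -> V NP) * 0.6 (NP -> D N) = 0.24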

Probabilistic Context-Free Grammars

The probabilistic parameter values of a PCFG can be learned from a parse-annotated training corpus by computing the relative frequency of CFG rules in accordance with a Maximum Likelihood Estimation (MLE) condition:

P(A → β1 ... βk) = c(A → β1 ... βk) / Σγ c(A → γ)

where c(·) is the number of occurrences of a rule in the training corpus.

Statistical models of this kind have achieved F-measures in the low 70% range against the Penn Treebank.
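
A small sketch of this relative-frequency estimation (the toy treebank rules and their representation are assumptions for illustration):

from collections import Counter, defaultdict

# Rules read off a parse-annotated corpus, as (parent, children) pairs.
treebank_rules = [
    ("S", ("NP", "VP")), ("NP", ("D", "N")), ("VP", ("V", "NP")), ("NP", ("N",)),
    ("S", ("NP", "VP")), ("NP", ("N",)),     ("VP", ("V",)),
]

counts = Counter(treebank_rules)
parent_totals = defaultdict(int)
for (parent, _), c in counts.items():
    parent_totals[parent] += c

# P(A -> beta) = c(A -> beta) / sum over gamma of c(A -> gamma)
rule_prob = {rule: c / parent_totals[rule[0]] for rule, c in counts.items()}
for rule, p in sorted(rule_prob.items()):
    print(rule, round(p, 3))
# e.g. ('NP', ('N',)) gets 2/3, ('VP', ('V', 'NP')) gets 1/2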

Lexicalized Probabilistic Context-Free Grammars

It is possible to significantly improve the performance of a PCFG by adding additional bias to the language model that it defines.
Collins (1999) constructs a Lexicalized Probabilistic Context-Free Grammar (LPCFG) in which the probabilities of the CFG rules are conditioned on the lexical heads of the phrases that the nonterminal symbols represent.
In Collins' LPCFGs nonterminals are replaced by nonterminal/head pairs.

Lexicalized Probabilistic Context-Free Grammars

The probability distributions of the model are of the form Ps(N/h) and P(X1/h1 ... H/h ... Xn/hn | N/h).
Collins' LPCFG achieves an F-measure performance of approximately 88%.
Charniak and Johnson (2005) present an LPCFG with an F-score of approximately 91%.
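
An illustrative rendering of the nonterminal/head pairing (toy categories, head words, and probabilities; not Collins' actual parameterisation): under lexicalization the rule S → NP VP becomes, for a particular sentence, something like S/bought → NP/IBM VP/bought, and its probability is conditioned on the head word as well as the parent category.

# Lexicalized rule probabilities P(X1/h1 ... H/h ... Xn/hn | N/h), with each
# nonterminal given as a (category, head word) pair.  Numbers are made up.
lexicalized_rule_prob = {
    (("S", "bought"),  (("NP", "IBM"), ("VP", "bought"))):  0.9,
    (("VP", "bought"), (("V", "bought"), ("NP", "Lotus"))): 0.7,
}

parent = ("S", "bought")
children = (("NP", "IBM"), ("VP", "bought"))
print(lexicalized_rule_prob[(parent, children)])   # 0.9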

Bias in an LPCFG

Rather than encoding a particular categorical bias into his language model by excluding certain context-free rules, Collins allows all such rules.
He incorporates bias by adjusting the prior distribution of probabilities over the lexicalized CFG rules.
The model imposes the requirements that
sentences have hierarchical constituent structure,
constituents have heads that select for their siblings, and
this selection is determined by the head words of the siblings.

LPCFG as a Weak Bias Model

The biases that Collins, and Charniak and Johnson, specify for their respective LPCFGs do not express the complex syntactic parameters that have been proposed as elements of a strong bias view of universal grammar.
So, for example, these models do not contain a head-complement directionality parameter.
However, they still learn the correct generalizations concerning head-complement order.
The bias of a statistical parsing model has implications for the design of UG.

Bayesian Learning

Let D be data, and H a hypothesis.
Maximum likelihood chooses the H which makes D most likely: argmax_H P(D | H).
The posterior probability is proportional to the prior probability times the likelihood: P(H | D) ∝ P(H) P(D | H).
The maximum a posteriori (MAP) approach chooses the H which maximises the posterior probability: argmax_H P(H) P(D | H).
The bias is explicitly represented in the prior P(H).
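
A toy sketch of the distinction (the hypotheses and numbers are made up purely to illustrate how the prior, i.e. the bias, can change which hypothesis is selected):

# Hypotheses with a prior P(H) and a likelihood P(D | H) for some fixed data D.
hypotheses = {
    "large_overfitted_grammar": {"prior": 0.01, "likelihood": 2e-28},
    "compact_grammar":          {"prior": 0.30, "likelihood": 1e-28},
    "flat_grammar":             {"prior": 0.69, "likelihood": 1e-40},
}

# Maximum likelihood ignores the prior; MAP maximises prior times likelihood.
ml_choice  = max(hypotheses, key=lambda h: hypotheses[h]["likelihood"])
map_choice = max(hypotheses, key=lambda h: hypotheses[h]["prior"] * hypotheses[h]["likelihood"])

print("maximum likelihood choice:", ml_choice)      # large_overfitted_grammar
print("maximum a posteriori choice:", map_choice)   # compact_grammar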

Acquisition with a Weak Bias (Perfors et al. 2006)

Perfors et al. define a very general prior that does not have a bias towards constituent structure.
It includes both grammars that impose hierarchical constituent structure and those that don't.
In general it favours smaller, simpler grammars, as expressed in terms of the number of rules and symbols.
Perfors et al. compute the posterior probability of three types of grammar for a subset of the CHILDES corpus.
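
A rough sketch of a simplicity prior of this general kind (the scoring function is an assumption for illustration, not Perfors et al.'s exact measure):

import math

def grammar_size(rules):
    # Size measured as the number of rules plus the number of symbols on their right-hand sides.
    return len(rules) + sum(len(rhs) for _, rhs in rules)

def log_prior(rules):
    # Log-prior proportional to minus the description length, so smaller grammars score higher.
    return -grammar_size(rules) * math.log(2)

small_grammar = [("S", ("NP", "VP")), ("NP", ("N",)), ("VP", ("V", "NP"))]
big_grammar   = small_grammar + [("NP", ("D", "N")), ("VP", ("V",)), ("NP", ("NP", "PP"))]

print(log_prior(small_grammar) > log_prior(big_grammar))   # True: the smaller grammar is favoured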

Acquisition without a Constituent Structure Bias

The three types of grammar that Perfors et al. consider are (toy illustrations of each follow below):
a flat grammar that generates strings directly from S without intermediate non-terminal symbols,
a probabilistic regular grammar (PRG), and
a probabilistic context-free grammar (PCFG).
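
Toy rule sets of each type (the specific rules are only illustrative and the probabilities are omitted; these are not the grammars Perfors et al. evaluated):

# Flat grammar: S rewrites directly to whole word strings, with no intermediate nonterminals.
flat_grammar = [
    ("S", ("the", "dog", "barks")),
    ("S", ("dogs", "bark")),
]

# Probabilistic regular grammar: right-linear rules, a terminal followed by at most one nonterminal.
regular_grammar = [
    ("S",  ("the", "X1")),
    ("X1", ("dog", "X2")),
    ("X2", ("barks",)),
]

# Probabilistic context-free grammar: hierarchical constituent structure.
context_free_grammar = [
    ("S",  ("NP", "VP")),
    ("NP", ("D", "N")),
    ("VP", ("V",)),
]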

Acquisition without a Constituent Structure Bias

The PCFG receives a higher posterior probability value and covers significantly more sentence types in the corpus than either the PRG or the flat grammar.
The grammar with maximum a posteriori probability makes the correct generalisation.
This result suggests that it may be possible to decide among radically distinct types of grammars on the basis of a probabilistic model with relatively weak learning priors, for a realistic data set.