Part II: Statistical NLP
Advanced Artificial Intelligence: Introduction and Grammar Models
Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting
Some slides taken from Helmut Schmid, Rada Mihalcea, Bonnie Dorr, Leila Kosseim, Peter Flach and others
Jan 05, 2016
Topic
Statistical Natural Language Processing applies:
• Machine Learning / Statistics. Learning is the ability to improve one's behaviour at a specific task over time; it involves the analysis of data (statistics).
• Natural Language Processing
Following parts of the book:
• Statistical NLP (Manning and Schütze), MIT Press, 1999
Contents
Motivation: Zipf's law
Some natural language processing tasks
Non-probabilistic NLP models
• Regular grammars and finite state automata
• Context-Free Grammars
• Definite Clause Grammars
Motivation for statistical NLP
Overview of the rest of this part
Rationalism versus Empiricism
Rationalist:
• Noam Chomsky: innate language structures
• AI: hand-coding NLP
• Dominant view 1960-1985
• Cf. e.g. Steven Pinker's The Language Instinct (popular science book)
Empiricist:
• Ability to learn is innate
• AI: language is learned from corpora
• Dominant 1920-1960, and becoming increasingly important
Rationalism versus Empiricism
Noam Chomsky:
• "But it must be recognized that the notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of this term."
Fred Jelinek (IBM, 1988):
• "Every time a linguist leaves the room, the recognition rate goes up."
• (Alternative: "Every time I fire a linguist, the recognizer improves.")
This course
Empiricist approach:
• Focus will be on probabilistic models for learning of natural language
No time to treat natural language in depth:
• (though this would be quite useful and interesting)
• Deserves a full course by itself
Covered in more depth in Logic, Language and Learning (SS 05, probably SS 06)
Ambiguity
Statistical Disambiguation
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative
NLP and Statistics
Statistical methods deal with uncertainty. They predict the future behaviour of a system based on the behaviour observed in the past.
Statistical methods require training data.
The data in Statistical NLP are the corpora.
NLP and Statistics
Corpus: a text collection for linguistic purposes
Tokens: How many words are contained in Tom Sawyer? 71,370
Types: How many different words are contained in Tom Sawyer? 8,018
Hapax Legomena: words appearing only once
Corpora
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f: frequency; nf: how many words appear f times

f        nf
1        3993
2        1292
3        664
4        410
5        243
6        199
7        172
8        131
9        82
10       91
11-50    540
51-100   99
> 100    102
Word Counts
About half of the words occur just once.
About half of the text consists of the 100 most common words …
Word Counts (Brown corpus)

word    f     r    f·r     word        f    r     f·r
the     3332  1    3332    turned      51   200   10200
and     2972  2    5944    you'll      30   300   9000
a       1775  3    5325    name        21   400   8400
he      877   10   8770    comes       16   500   8000
but     410   20   8200    group       13   600   7800
be      294   30   8820    lead        11   700   7700
there   222   40   8880    friends     10   800   8000
one     172   50   8600    begin       9    900   8100
about   158   60   9480    family      8    1000  8000
more    138   70   9660    brushed     4    2000  8000
never   124   80   9920    sins        2    3000  6000
Oh      116   90   10440   Could       2    4000  8000
two     104   100  10400   Applausive  1    8000  8000
Zipf's Law: f ~ 1/r (i.e. f·r = const)
Zipf's Law
Minimize effort
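Zipf's law is easy to check empirically: count word frequencies, sort by rank r, and inspect the product f·r down the list. A minimal sketch in Python (the corpus file name is a placeholder; the tokenization is deliberately crude):

    from collections import Counter
    import re

    def zipf_table(text, ranks=(1, 10, 100, 1000)):
        # Crude tokenization: lowercase words, keeping internal apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        by_rank = Counter(words).most_common()
        for r in ranks:
            if r <= len(by_rank):
                word, f = by_rank[r - 1]
                # Under Zipf's law f ~ 1/r, so f*r should stay roughly constant.
                print(f"{word:12s} f={f:6d} r={r:5d} f*r={f * r}")

    # "tom_sawyer.txt" is a hypothetical file name; any plain-text corpus works.
    with open("tom_sawyer.txt", encoding="utf-8") as fh:
        zipf_table(fh.read())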
Language and sequences
Natural language processing:
• is concerned with the analysis of sequences of words / sentences
• construction of language models
Two types of models:
• Non-probabilistic
• Probabilistic
Human language is highly ambiguous, at all levels:
• acoustic level: recognize speech vs. wreck a nice beach
• morphological level: saw: to see (past), saw (noun), to saw (present, inf)
• syntactic level: I saw the man on the hill with a telescope
• semantic level: One book has to be read by every student
Key NLP Problem: Ambiguity
Language Model
A formal model about language. Two types:
• Non-probabilistic: allows one to compute whether a certain sequence (sentence or part thereof) is possible; often grammar-based
• Probabilistic: allows one to compute the probability of a certain sequence; often extends grammars with probabilities
Example of a bad language model
(figure slides: cartoon examples of a bad language model)
A good language model
Non-probabilistic:
• "I swear to tell the truth" is possible
• "I swerve to smell de soup" is impossible
Probabilistic:
• P(I swear to tell the truth) ≈ 0.001
• P(I swerve to smell de soup) ≈ 0
Why language models?
Consider a Shannon game:
• Predicting the next word in a sequence:
  Statistical natural language …
  The cat is thrown out of the …
  The large green …
  Sue swallowed the large green …
  …
Model at the sentence level
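The Shannon game is exactly what an n-gram language model formalizes: estimate P(next word | previous words) from counts and predict the most probable continuation. A minimal bigram sketch (toy training text, no smoothing of unseen events):

    from collections import Counter, defaultdict

    def train_bigrams(tokens):
        # counts[prev][w] = how often w follows prev in the training text.
        counts = defaultdict(Counter)
        for prev, w in zip(tokens, tokens[1:]):
            counts[prev][w] += 1
        return counts

    def predict_next(counts, prev):
        # argmax_w P(w | prev) = argmax_w count(prev, w) / count(prev);
        # the denominator is constant, so the raw count already decides.
        return counts[prev].most_common(1)[0][0] if prev in counts else None

    tokens = "the cat is thrown out of the house and the cat sleeps".split()
    model = train_bigrams(tokens)
    print(predict_next(model, "the"))  # -> 'cat'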
Applications
• Spelling correction
• Mobile phone texting
• Speech recognition
• Handwriting recognition
• Disabled users
• …
Spelling errors
They are leaving in about fifteen minuets to go to her house.
The study was conducted mainly be John Black.
Hopefully all with continue smoothly in my absence.
Can they lave him my messages?
I need to notified the bank of …
He is trying to fine out.
(Each error is itself a valid English word, so a dictionary lookup alone will not catch it.)
Handwriting recognition
Assume a note is given to a bank teller, which the teller reads as "I have a gub" (cf. Woody Allen).
NLP to the rescue …
• "gub" is not a word
• gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank
For Spell Checkers
Collect a list of commonly substituted words:
• piece/peace, whether/weather, their/there
Example:
• "On Tuesday, the whether …"
• "On Tuesday, the weather …"
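Given such confusion pairs, a context-sensitive checker can score every variant of the sentence with a language model and keep the most probable one. A sketch, assuming some sentence-scoring function score() is available (e.g. a product of bigram probabilities as above):

    CONFUSION_SETS = [{"piece", "peace"}, {"whether", "weather"}, {"their", "there"}]

    def correct(words, score):
        # For each word in a known confusion set, substitute every
        # alternative and keep the variant the language model prefers.
        words = list(words)
        for i, w in enumerate(words):
            for conf in CONFUSION_SETS:
                if w in conf:
                    words[i] = max(conf, key=lambda v: score(words[:i] + [v] + words[i + 1:]))
        return words

    # correct("on tuesday the whether was fine".split(), score)
    # should return [... 'weather' ...] whenever score prefers that reading.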
Another dimension in language models
Do we mainly want to infer (probabilities of) legal sentences / sequences?
• So far
Or do we want to infer properties of these sentences?
• E.g. parse tree, part-of-speech tagging
• Needed for understanding NL
Let's look at some tasks.
Sequence Tagging
Part-of-speech tagging:
• He drives with his bike
• N V PR PN N (noun, verb, preposition, pronoun, noun)
Text extraction:
• The job is that of a programmer.
• X X X X X X JobType
• The seminar is taking place from 15:00 to 16:00.
• X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins, mRNA, …
• X = AFARLMMA…
• Y = hehestststhesthe…
Parsing
Given a sentence, find its parse tree. An important step in understanding NL.
Parsing
In bioinformatics, parsing allows one to predict (elements of) structure from sequence.
Language models based on Grammars
Grammar types:
• Regular grammars and Finite State Automata
• Context-Free Grammars
• Definite Clause Grammars: a particular type of Unification-Based Grammar (Prolog)
Distinguish lexicon from grammar:
• Lexicon (dictionary): contains information about words, e.g. word → possible tags (and possibly additional information); flies → V(erb) or N(oun)
• Grammar: encodes rules
Grammars and parsing
Syntactic level best understood and formalized.
Derivation of grammatical structure: parsing (more than just recognition).
Result of parsing: mostly a parse tree, showing the constituents of a sentence, e.g. verb or noun phrases.
Syntax usually specified in terms of a grammar consisting of grammar rules.
Regular Grammars and Finite State Automata
Lexical information: which words are
• Det(erminer)
• N(oun)
• Pn (pronoun)
• Vi (intransitive verb): takes no argument
• Vt (transitive verb): takes an argument
• Adj (adjective)
Now accept:
• The cat slept
• Det N Vi
As regular grammar:
• S -> [Det] S1   ([ ] marks a terminal)
• S1 -> [N] S2
• S2 -> [Vi]
Lexicon:
• the → Det
• cat → N
• slept → Vi
• …
Finite State Automaton
Sentences
• John smiles: Pn Vi
• The cat disappeared: Det N Vi
• These new shoes hurt: Det Adj N Vi
• John liked the old cat: Pn Vt Det Adj N
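The regular grammar above corresponds directly to a finite state automaton over tags. A minimal recognizer sketch (the state names are invented, and the lexicon extends the slide's with a pronoun entry so the first example sentence passes; both are assumptions):

    # Word -> tag lexicon (toy; "john" is an added entry, not from the slide).
    LEXICON = {"the": "Det", "cat": "N", "slept": "Vi",
               "john": "Pn", "smiles": "Vi"}

    # FSA transitions: (state, tag) -> next state; s3 is the accepting state.
    TRANSITIONS = {("s0", "Det"): "s1", ("s0", "Pn"): "s2",
                   ("s1", "N"): "s2", ("s2", "Vi"): "s3"}

    def accepts(sentence):
        state = "s0"
        for word in sentence.lower().split():
            state = TRANSITIONS.get((state, LEXICON.get(word)))
            if state is None:      # no matching transition: reject
                return False
        return state == "s3"

    print(accepts("The cat slept"))  # True  (Det N Vi)
    print(accepts("John smiles"))    # True  (Pn Vi)
    print(accepts("Slept the cat"))  # False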
Phrase structure
Parse tree for "the dog chased a cat into the garden":
[S [NP [D the] [N dog]]
   [VP [V chased]
       [NP [D a] [N cat]]
       [PP [P into] [NP [D the] [N garden]]]]]
Notation
S: sentence; D or Det: determiner (e.g. articles); N: noun; V: verb; P: preposition; NP: noun phrase; VP: verb phrase; PP: prepositional phrase
Context Free Grammar
S -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D -> [the]
D -> [a]
N -> [dog]
N -> [cat]
N -> [garden]
V -> [chased]
V -> [saw]
P -> [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …
Recursion:
• "The girl thought the dog chased the cat"
VP -> V S
N -> [girl]
V -> [thought]
Top-down parsing
S -> NP VP
S -> Det N VP
S -> The N VP
S -> The dog VP
S -> The dog V NP
S -> The dog chased NP
S -> The dog chased Det N
S -> The dog chased the N
S -> The dog chased the cat
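Mechanically, a top-down parser keeps a list of symbols still to be expanded, tries each matching rule, and backtracks on failure; the derivation above is one successful path. A minimal recursive sketch for the toy CFG from the earlier slide (a recognizer rather than a tree builder, and it would loop on left-recursive rules):

    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["D", "N"]],
        "VP": [["V", "NP"], ["V", "NP", "PP"]],
        "PP": [["P", "NP"]],
        "D": [["the"], ["a"]], "N": [["dog"], ["cat"], ["garden"]],
        "V": [["chased"], ["saw"]], "P": [["into"]],
    }

    def parse(symbols, words):
        # Succeed iff the symbol list derives exactly the word list.
        if not symbols:
            return not words
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:  # nonterminal: try every rule, backtracking via any()
            return any(parse(body + rest, words) for body in GRAMMAR[first])
        return bool(words) and words[0] == first and parse(rest, words[1:])

    print(parse(["S"], "the dog chased a cat into the garden".split()))  # True
    print(parse(["S"], "the dog chased".split()))                        # False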
Context-free grammar
S --> NP VP
NP --> PN          (proper noun)
NP --> Art Adj N
NP --> Art N
VP --> VI          (intransitive verb)
VP --> VT NP       (transitive verb)
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN --> [achilles]
N --> [turtle]
VI --> [sleeps]
VT --> [beats]
Parse tree
Parse tree for "the rapid turtle beats achilles":
[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [Vt beats] [NP [PN achilles]]]]
Definite Clause Grammars
Non-terminals may have arguments:
S --> NP(N) VP(N)
NP(N) --> Art(N) N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural) --> [the]
N(singular) --> [turtle]
N(plural) --> [turtles]
VI(singular) --> [sleeps]
VI(plural) --> [sleep]
Number Agreement
DCGs
Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)
Parsing needs to be adapted:
• using unification
Unification in a nutshell (cf. AI course)
Substitutions, e.g.
• {Num / singular}
• {T / vp(V, NP)}
Applying a substitution:
• simultaneously replace variables by the corresponding terms
• S(Num) {Num / singular} = S(singular)
Unification
Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.
• Art(singular) and Art(Num): gives {Num / singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1 / Num2}
• PN(Num, accusative) and PN(singular, Case): {Num / singular, Case / accusative}
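A tiny sketch of this unification for the flat argument lists used here (variables are capitalized strings; structured terms and the occurs check are left out):

    def is_var(t):
        return isinstance(t, str) and t[:1].isupper()

    def unify(args1, args2, subst=None):
        # Unify two argument tuples; return the substitution (a dict) or None.
        if len(args1) != len(args2):
            return None
        subst = dict(subst or {})
        for a, b in zip(args1, args2):
            a, b = subst.get(a, a), subst.get(b, b)  # apply bindings so far
            if a == b:
                continue
            if is_var(a):
                subst[a] = b
            elif is_var(b):
                subst[b] = a
            else:
                return None                          # two distinct constants: fail
        return subst

    print(unify(("singular",), ("Num",)))                     # {'Num': 'singular'}
    print(unify(("singular",), ("plural",)))                  # None
    print(unify(("Num", "accusative"), ("singular", "Case"))) # both bindings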
Parsing with DCGs
Now require successful unification at each step
S -> NP(N) VP(N)
S -> Art(N) N(N) VP(N)    {N / singular}
S -> a N(singular) VP(singular)
S -> a turtle VP(singular)
S -> a turtle sleeps

S -> a turtle sleep: fails
Case Marking
PN(singular, nominative) --> [he]; [she]
PN(singular, accusative) --> [him]; [her]
PN(plural, nominative) --> [they]
PN(plural, accusative) --> [them]
S --> NP(Number, nominative) VP(Number)
VP(Number) --> V(Number) NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any) --> Det N(Number)

He sees her. She sees him. They see her.
But not: Them see he.
DCGs
Are strictly more expressive than CFGs. Can represent, for instance, the language aⁿbⁿcⁿ (which is not context-free):
• S(N) -> A(N) B(N) C(N)
• A(0) -> []
• B(0) -> []
• C(0) -> []
• A(s(N)) -> A(N) [a]
• B(s(N)) -> B(N) [b]
• C(s(N)) -> C(N) [c]
Probabilistic Models
Traditional grammar models are very rigid:
• essentially a yes / no decision
Probabilistic grammars:
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative
Illustrated on:
• the Shannon game
• spelling correction
• parsing
Illustration
Wall Street Journal Corpus: 3,000,000 words, with the correct parse tree for each sentence known:
• constructed by hand
• can be used to derive stochastic context-free grammars
• SCFGs assign probabilities to parse trees
Compute the most probable parse tree.
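In an SCFG, each rule carries a probability (the alternatives for one left-hand side sum to one), the probability of a parse tree is the product of the probabilities of all rules used in it, and disambiguation picks the tree with the highest product. A toy sketch with invented rule probabilities:

    # Hypothetical rule probabilities; in practice they are estimated from a
    # treebank by relative frequency of each rule per left-hand side.
    RULE_PROB = {
        ("S", ("NP", "VP")): 1.0,
        ("NP", ("D", "N")): 1.0,
        ("VP", ("V", "NP")): 0.7, ("VP", ("V", "NP", "PP")): 0.3,
        ("PP", ("P", "NP")): 1.0,
        ("D", ("the",)): 0.6, ("D", ("a",)): 0.4,
        ("N", ("dog",)): 0.5, ("N", ("cat",)): 0.5,
        ("V", ("chased",)): 1.0, ("P", ("into",)): 1.0,
    }

    def tree_prob(tree):
        # A tree is (label, children); a leaf child is a plain word string.
        label, children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        p = RULE_PROB[(label, rhs)]
        for child in children:
            if not isinstance(child, str):
                p *= tree_prob(child)
        return p

    t = ("S", [("NP", [("D", ["the"]), ("N", ["dog"])]),
               ("VP", [("V", ["chased"]), ("NP", [("D", ["a"]), ("N", ["cat"])])])])
    print(tree_prob(t))  # 0.6 * 0.5 * 0.7 * 0.4 * 0.5 = 0.042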
Sequences are omni-present
Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …
Rest of the Course
Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and Finite State Automata:
  Markov Models using n-grams, (Hidden) Markov Models, Conditional Random Fields (as an example of using undirected graphical models)
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars
All use the principles of Part I on Graphical Models.
Topic
Statistical Natural Language Processing Applies
bull Machine Learning Statistics to Learning the ability to improve onersquos behaviour at a specific
task over time - involves the analysis of data (statistics)
bull Natural Language Processing Following parts of the book
bull Statistical NLP (Manning and Schuetze) MIT Press 1999
Contents
Motivation Zipfrsquos law Some natural language processing tasks Non-probabilistic NLP models
bull Regular grammars and finite state automatabull Context-Free Grammarsbull Definite Clause Grammars
Motivation for statistical NLP Overview of the rest of this part
Rationalism versus Empiricism
Rationalist bull Noam Chomsky - innate language structuresbull AI hand coding NLPbull Dominant view 1960-1985bull Cf eg Steven Pinkerrsquos The language instinct (popular
science book) Empiricist
bull Ability to learn is innatebull AI language is learned from corporabull Dominant 1920-1960 and becoming increasingly important
Rationalism versus Empiricism
Noam Chomskybull But it must be recognized that the notion of ldquoprobability of
a sentencerdquo is an entirely useless one under any known interpretation of this term
Fred Jelinek (IBM 1988)bull Every time a linguist leaves the room the recognition rate
goes upbull (Alternative Every time I fire a linguist the recognizer
improves)
This course
Empiricist approach bull Focus will be on probabilistic models for learning
of natural language No time to treat natural language in depth
bull (though this would be quite useful and interesting)
bull Deserves a full course by itself Covered in more depth in Logic Language and
Learning (SS 05 prob SS 06)
Ambiguity
Statistical Disambiguation
bull Define a probability model for the data
bull Compute the probability of each alternative
bull Choose the most likely alternative
NLP and Statistics
Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past
Statistical Methods require training data
The data in Statistical NLP are the Corpora
NLP and Statistics
Corpus text collection for linguistic purposes
TokensHow many words are contained in Tom Sawyer 71370
TypesHow many different words are contained in TS 8018
Hapax Legomenawords appearing only once
Corpora
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Contents
Motivation Zipfrsquos law Some natural language processing tasks Non-probabilistic NLP models
bull Regular grammars and finite state automatabull Context-Free Grammarsbull Definite Clause Grammars
Motivation for statistical NLP Overview of the rest of this part
Rationalism versus Empiricism
Rationalist bull Noam Chomsky - innate language structuresbull AI hand coding NLPbull Dominant view 1960-1985bull Cf eg Steven Pinkerrsquos The language instinct (popular
science book) Empiricist
bull Ability to learn is innatebull AI language is learned from corporabull Dominant 1920-1960 and becoming increasingly important
Rationalism versus Empiricism
Noam Chomskybull But it must be recognized that the notion of ldquoprobability of
a sentencerdquo is an entirely useless one under any known interpretation of this term
Fred Jelinek (IBM 1988)bull Every time a linguist leaves the room the recognition rate
goes upbull (Alternative Every time I fire a linguist the recognizer
improves)
This course
Empiricist approach bull Focus will be on probabilistic models for learning
of natural language No time to treat natural language in depth
bull (though this would be quite useful and interesting)
bull Deserves a full course by itself Covered in more depth in Logic Language and
Learning (SS 05 prob SS 06)
Ambiguity
Statistical Disambiguation
bull Define a probability model for the data
bull Compute the probability of each alternative
bull Choose the most likely alternative
NLP and Statistics
Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past
Statistical Methods require training data
The data in Statistical NLP are the Corpora
NLP and Statistics
Corpus text collection for linguistic purposes
TokensHow many words are contained in Tom Sawyer 71370
TypesHow many different words are contained in TS 8018
Hapax Legomenawords appearing only once
Corpora
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Rationalism versus Empiricism
Rationalist bull Noam Chomsky - innate language structuresbull AI hand coding NLPbull Dominant view 1960-1985bull Cf eg Steven Pinkerrsquos The language instinct (popular
science book) Empiricist
bull Ability to learn is innatebull AI language is learned from corporabull Dominant 1920-1960 and becoming increasingly important
Rationalism versus Empiricism
Noam Chomskybull But it must be recognized that the notion of ldquoprobability of
a sentencerdquo is an entirely useless one under any known interpretation of this term
Fred Jelinek (IBM 1988)bull Every time a linguist leaves the room the recognition rate
goes upbull (Alternative Every time I fire a linguist the recognizer
improves)
This course
Empiricist approach bull Focus will be on probabilistic models for learning
of natural language No time to treat natural language in depth
bull (though this would be quite useful and interesting)
bull Deserves a full course by itself Covered in more depth in Logic Language and
Learning (SS 05 prob SS 06)
Ambiguity
Statistical Disambiguation
bull Define a probability model for the data
bull Compute the probability of each alternative
bull Choose the most likely alternative
NLP and Statistics
Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past
Statistical Methods require training data
The data in Statistical NLP are the Corpora
NLP and Statistics
Corpus text collection for linguistic purposes
TokensHow many words are contained in Tom Sawyer 71370
TypesHow many different words are contained in TS 8018
Hapax Legomenawords appearing only once
Corpora
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Rationalism versus Empiricism
Noam Chomskybull But it must be recognized that the notion of ldquoprobability of
a sentencerdquo is an entirely useless one under any known interpretation of this term
Fred Jelinek (IBM 1988)bull Every time a linguist leaves the room the recognition rate
goes upbull (Alternative Every time I fire a linguist the recognizer
improves)
This course
Empiricist approach bull Focus will be on probabilistic models for learning
of natural language No time to treat natural language in depth
bull (though this would be quite useful and interesting)
bull Deserves a full course by itself Covered in more depth in Logic Language and
Learning (SS 05 prob SS 06)
Ambiguity
Statistical Disambiguation
bull Define a probability model for the data
bull Compute the probability of each alternative
bull Choose the most likely alternative
NLP and Statistics
Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past
Statistical Methods require training data
The data in Statistical NLP are the Corpora
NLP and Statistics
Corpus text collection for linguistic purposes
TokensHow many words are contained in Tom Sawyer 71370
TypesHow many different words are contained in TS 8018
Hapax Legomenawords appearing only once
Corpora
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now accept:
• The cat slept
• Det N Vi
As a regular grammar:
• S -> [Det], S1    ([...] marks a terminal)
• S1 -> [N], S2
• S2 -> [Vi]
Lexicon:
• the - Det
• cat - N
• slept - Vi
• …
Finite State Automaton
Sentences
• John smiles - Pn Vi
• The cat disappeared - Det N Vi
• These new shoes hurt - Det Adj N Vi
• John liked the old cat - Pn Vt Det Adj N
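A minimal sketch of such an acceptor in Python: the lexicon maps words to tags and a transition table mirrors the S -> S1 -> S2 chain above. Merging in a Pn path, and all names, are my reconstruction, purely illustrative.

    LEXICON = {"the": "Det", "cat": "N", "slept": "Vi",
               "john": "Pn", "smiles": "Vi"}

    # state -> {tag: next state}; "ACCEPT" is the final state after Vi
    TRANSITIONS = {"S":   {"Det": "S1", "Pn": "S1b"},
                   "S1":  {"N": "S1b"},
                   "S1b": {"Vi": "ACCEPT"}}

    def accepts(sentence):
        """Run the automaton over the tag sequence of the sentence."""
        state = "S"
        for word in sentence.lower().split():
            tag = LEXICON.get(word)
            state = TRANSITIONS.get(state, {}).get(tag)
            if state is None:
                return False
        return state == "ACCEPT"

    print(accepts("The cat slept"))  # True
    print(accepts("John smiles"))    # True
    print(accepts("Slept the cat"))  # False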
Phrase structure
Parse tree for "the dog chased a cat into the garden":
[S [NP [D the] [N dog]]
   [VP [V chased]
       [NP [D a] [N cat]]
       [PP [P into] [NP [D the] [N garden]]]]]
Notation
• S: sentence
• D or Det: determiner (e.g. articles)
• N: noun
• V: verb
• P: preposition
• NP: noun phrase
• VP: verb phrase
• PP: prepositional phrase
Context-Free Grammar:
S -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D -> [the]
D -> [a]
N -> [dog]
N -> [cat]
N -> [garden]
V -> [chased]
V -> [saw]
P -> [into]
Terminals ~ lexicon
Phrase structure
Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …
Recursion:
• "The girl thought the dog chased the cat"
VP -> V S
N -> [girl]
V -> [thought]
Top-down parsing
S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
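A sketch of this top-down strategy as a recursive-descent recognizer in Python, using the toy CFG from the slides above; the generator-based interface is my choice, not the course's.

    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["D", "N"]],
        "VP": [["V", "NP", "PP"], ["V", "NP"]],
        "PP": [["P", "NP"]],
        "D":  [["the"], ["a"]],
        "N":  [["dog"], ["cat"], ["garden"]],
        "V":  [["chased"], ["saw"]],
        "P":  [["into"]],
    }

    def derive(symbol, tokens, pos):
        """Yield every position up to which `symbol` can derive tokens[pos:]."""
        if symbol not in GRAMMAR:  # terminal symbol
            if pos < len(tokens) and tokens[pos] == symbol:
                yield pos + 1
            return
        for production in GRAMMAR[symbol]:
            ends = [pos]
            for part in production:  # expand left to right, as in the trace above
                ends = [new for e in ends for new in derive(part, tokens, e)]
            yield from ends

    def accepts(sentence):
        tokens = sentence.lower().split()
        return len(tokens) in derive("S", tokens, 0)

    print(accepts("the dog chased a cat into the garden"))  # True
    print(accepts("the dog chased garden the"))             # False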
Context-free grammar:
S --> NP, VP
NP --> PN            (proper noun)
NP --> Art, Adj, N
NP --> Art, N
VP --> VI            (intransitive verb)
VP --> VT, NP        (transitive verb)
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN --> [achilles]
N --> [turtle]
VI --> [sleeps]
VT --> [beats]
Parse tree for "the rapid turtle beats achilles":
[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [VT beats] [NP [PN achilles]]]]
Definite Clause Grammars: non-terminals may have arguments
S --> NP(N), VP(N)
NP(N) --> Art(N), N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural) --> [the]
N(singular) --> [turtle]
N(plural) --> [turtles]
VI(singular) --> [sleeps]
VI(plural) --> [sleep]
The shared argument N enforces number agreement.
DCGs
Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)
Parsing needs to be adapted:
• using unification
Unification in a nutshell (cf. the AI course)
Substitutions
• e.g. {Num / singular}, {T / vp(V, NP)}
Applying a substitution:
• simultaneously replace the variables by the corresponding terms
• S(Num) {Num / singular} = S(singular)
Unification
Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num / singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): gives {Num1 / Num2}
• PN(Num, accusative) and PN(singular, Case): gives {Num / singular, Case / accusative}
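A minimal sketch of this unification for flat argument lists in Python (capitalized strings as variables, lower-case strings as constants); full Prolog unification also handles nested terms and deeper binding chains, which this toy version deliberately omits.

    def is_var(x):
        return isinstance(x, str) and x[:1].isupper()

    def unify(args1, args2):
        """Unify two argument tuples; return a substitution dict or None."""
        if len(args1) != len(args2):
            return None
        subst = {}
        for a, b in zip(args1, args2):
            a = subst.get(a, a)  # apply bindings found so far
            b = subst.get(b, b)
            if a == b:
                continue
            if is_var(a):
                subst[a] = b
            elif is_var(b):
                subst[b] = a
            else:                # two different constants: fail
                return None
        return subst

    print(unify(("singular",), ("Num",)))      # {'Num': 'singular'}
    print(unify(("singular",), ("plural",)))   # None (fails)
    print(unify(("Num", "accusative"), ("singular", "Case")))
    # {'Num': 'singular', 'Case': 'accusative'}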
Parsing with DCGs
Now require successful unification at each step
S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)        with {N / singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps
S -> a turtle sleep: fails
Case Marking
PN(singular, nominative) --> [he] ; [she]
PN(singular, accusative) --> [him] ; [her]
PN(plural, nominative) --> [they]
PN(plural, accusative) --> [them]
S --> NP(Number, nominative), VP(Number)
VP(Number) --> V(Number), NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any) --> Det, N(Number)
Accepts: He sees her. She sees him. They see her.
But not: Them see he.
DCGs
Are strictly more expressive than CFGs. They can represent, for instance, the non-context-free language aⁿbⁿcⁿ:
• S(N) -> A(N), B(N), C(N)
• A(0) -> []
• B(0) -> []
• C(0) -> []
• A(s(N)) -> A(N), [a]
• B(s(N)) -> B(N), [b]
• C(s(N)) -> C(N), [c]
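The generated language is exactly aⁿbⁿcⁿ, which no context-free grammar can produce; a quick Python recognizer (my illustration, mirroring the counting done by the s(N) terms) makes the claim easy to test.

    def accepts_anbncn(s):
        """Accept exactly the strings a^n b^n c^n, n >= 0."""
        n = len(s) // 3
        return s == "a" * n + "b" * n + "c" * n

    print(accepts_anbncn("aabbcc"))  # True
    print(accepts_anbncn("aabbc"))   # False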
Probabilistic Models
Traditional grammar models are very rigid:
• essentially a yes/no decision
Probabilistic grammars:
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative
Illustrated on:
• the Shannon game
• spelling correction
• parsing
Illustration
Wall Street Journal corpus: 3,000,000 words, with the correct parse tree for each sentence known.
• constructed by hand
• can be used to derive stochastic context-free grammars (SCFGs)
• an SCFG assigns a probability to each parse tree
• compute the most probable parse tree
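A toy sketch of how an SCFG scores a parse tree: the tree's probability is the product of the probabilities of the rules used. The grammar fragment and its probabilities below are invented for illustration, not estimated from the WSJ.

    # (lhs, rhs) -> probability; rules with the same lhs sum to 1
    RULE_PROB = {
        ("S", ("NP", "VP")): 1.0,
        ("NP", ("D", "N")): 1.0,
        ("VP", ("V", "NP")): 0.7,
        ("VP", ("V", "NP", "PP")): 0.3,
        ("D", ("the",)): 1.0,
        ("N", ("dog",)): 0.5,
        ("N", ("cat",)): 0.5,
        ("V", ("chased",)): 1.0,
    }

    def tree_prob(tree):
        """Tree = (label, children); a leaf is (label, word)."""
        label, children = tree
        if isinstance(children, str):  # lexical rule, e.g. N -> dog
            return RULE_PROB[(label, (children,))]
        rhs = tuple(child[0] for child in children)
        p = RULE_PROB[(label, rhs)]
        for child in children:
            p *= tree_prob(child)
        return p

    t = ("S", [("NP", [("D", "the"), ("N", "dog")]),
               ("VP", [("V", "chased"),
                       ("NP", [("D", "the"), ("N", "cat")])])])
    print(tree_prob(t))  # 0.7 * 0.5 * 0.5 = 0.175

Picking the most probable parse then means maximizing this product over all trees the grammar allows for the sentence.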
Sequences are omnipresent
Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …
Rest of the Course
Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and finite state automata
  • Markov models using n-grams
  • (Hidden) Markov models
  • Conditional random fields, as an example of using undirected graphical models
• Probabilistic context-free grammars
• Probabilistic definite clause grammars
All use principles of Part I on graphical models.
This course
Empiricist approach bull Focus will be on probabilistic models for learning
of natural language No time to treat natural language in depth
bull (though this would be quite useful and interesting)
bull Deserves a full course by itself Covered in more depth in Logic Language and
Learning (SS 05 prob SS 06)
Ambiguity
Statistical Disambiguation
bull Define a probability model for the data
bull Compute the probability of each alternative
bull Choose the most likely alternative
NLP and Statistics
Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past
Statistical Methods require training data
The data in Statistical NLP are the Corpora
NLP and Statistics
Corpus text collection for linguistic purposes
TokensHow many words are contained in Tom Sawyer 71370
TypesHow many different words are contained in TS 8018
Hapax Legomenawords appearing only once
Corpora
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Ambiguity
Statistical Disambiguation
bull Define a probability model for the data
bull Compute the probability of each alternative
bull Choose the most likely alternative
NLP and Statistics
Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past
Statistical Methods require training data
The data in Statistical NLP are the Corpora
NLP and Statistics
Corpus text collection for linguistic purposes
TokensHow many words are contained in Tom Sawyer 71370
TypesHow many different words are contained in TS 8018
Hapax Legomenawords appearing only once
Corpora
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Statistical Disambiguation
bull Define a probability model for the data
bull Compute the probability of each alternative
bull Choose the most likely alternative
NLP and Statistics
Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past
Statistical Methods require training data
The data in Statistical NLP are the Corpora
NLP and Statistics
Corpus text collection for linguistic purposes
TokensHow many words are contained in Tom Sawyer 71370
TypesHow many different words are contained in TS 8018
Hapax Legomenawords appearing only once
Corpora
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past
Statistical Methods require training data
The data in Statistical NLP are the Corpora
NLP and Statistics
Corpus text collection for linguistic purposes
TokensHow many words are contained in Tom Sawyer 71370
TypesHow many different words are contained in TS 8018
Hapax Legomenawords appearing only once
Corpora
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Corpus text collection for linguistic purposes
TokensHow many words are contained in Tom Sawyer 71370
TypesHow many different words are contained in TS 8018
Hapax Legomenawords appearing only once
Corpora
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree

Parse tree for "the rapid turtle beats achilles":

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [VT beats] [NP [PN achilles]]]]
Definite Clause Grammars
Non-terminals may have arguments

S     -> NP(N) VP(N)
NP(N) -> Art(N) N(N)
VP(N) -> VI(N)
Art(singular) -> [a]
Art(singular) -> [the]
Art(plural)   -> [the]
N(singular)   -> [turtle]
N(plural)     -> [turtles]
VI(singular)  -> [sleeps]
VI(plural)    -> [sleep]

Number agreement: the shared argument N forces article, noun, and verb to agree.
DCGs
Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V,NP)
Parsing needs to be adapted:
• Using unification
Unification in a nutshell (cf. AI course)
Substitutions
E.g. θ = {Num/singular, T/vp(V,NP)}

Applying a substitution:
• Simultaneously replace variables by the corresponding terms
• S(Num){Num/singular} = S(singular)
Unification
Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num/singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): gives {Num1/Num2}
• PN(Num, accusative) and PN(singular, Case): gives {Num/singular, Case/accusative}
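As a rough illustration of this matching step, here is a toy unifier for flat argument tuples like the ones above. The is_var and unify helpers are invented for this sketch; real Prolog unification additionally handles nested terms and full dereferencing.

# Toy unification over flat argument tuples (a sketch, not Prolog's algorithm).
def is_var(t):
    """Variables start with a capital letter, as in the slides."""
    return isinstance(t, str) and t[:1].isupper()

def unify(args1, args2):
    """Return a substitution (dict) making the tuples identical, or None."""
    subst = {}
    for a, b in zip(args1, args2):
        a, b = subst.get(a, a), subst.get(b, b)   # apply bindings made so far
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None                           # two different constants
    return subst

print(unify(("singular",), ("Num",)))                      # {'Num': 'singular'}
print(unify(("singular",), ("plural",)))                   # None
print(unify(("Num", "accusative"), ("singular", "Case")))  # {'Num': 'singular', 'Case': 'accusative'}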
Parsing with DCGs
Now require successful unification at each step
S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)          % unifying Art(N) with Art(singular): {N/singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps

S -> a turtle sleep: fails (number agreement)
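The same agreement check can be sketched in Python. Everything below is invented for illustration (the LEXICON table and the s / np_vp helpers), and the code is hard-wired to the three-word Art N VI pattern of the grammar above.

# Number agreement, DCG-style: the number feature must match across the parse.
LEXICON = {
    ("art", "singular"): ["a", "the"],
    ("art", "plural"):   ["the"],
    ("n",   "singular"): ["turtle"],
    ("n",   "plural"):   ["turtles"],
    ("vi",  "singular"): ["sleeps"],
    ("vi",  "plural"):   ["sleep"],
}

def np_vp(words, num):
    """S -> NP(num) VP(num), with NP -> Art N and VP -> VI, all sharing num."""
    if len(words) != 3:
        return False
    art, n, vi = words
    return (art in LEXICON[("art", num)]
            and n in LEXICON[("n", num)]
            and vi in LEXICON[("vi", num)])

def s(words):
    return any(np_vp(words, num) for num in ("singular", "plural"))

print(s("a turtle sleeps".split()))    # True
print(s("a turtle sleep".split()))     # False: agreement fails
print(s("the turtles sleep".split()))  # True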
Case Marking
PN(singular, nominative) -> [he] ; [she]
PN(singular, accusative) -> [him] ; [her]
PN(plural, nominative) -> [they]
PN(plural, accusative) -> [them]

S -> NP(Number, nominative) VP(Number)
VP(Number) -> V(Number) NP(Any, accusative)
NP(Number, Case) -> PN(Number, Case)
NP(Number, Any) -> Det N(Number)

Accepts: He sees her. She sees him. They see her.
But not: Them see he.
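Case marking fits the same pattern: pronouns carry a (number, case) pair, only nominative pronouns may fill the subject slot, and the verb agrees with the subject's number. The PRONOUNS and VERBS tables below are made up to cover just these examples.

# A sketch of the case-marked fragment: subject must be nominative and agree
# in number with the verb; the object must be accusative.
PRONOUNS = {
    "he":   ("singular", "nominative"), "him":  ("singular", "accusative"),
    "she":  ("singular", "nominative"), "her":  ("singular", "accusative"),
    "they": ("plural",   "nominative"), "them": ("plural",   "accusative"),
}
VERBS = {"sees": "singular", "see": "plural"}  # verb -> required subject number

def s(words):
    if len(words) != 3 or words[0] not in PRONOUNS or words[2] not in PRONOUNS:
        return False
    subj_num, subj_case = PRONOUNS[words[0]]
    _, obj_case = PRONOUNS[words[2]]
    return (subj_case == "nominative" and obj_case == "accusative"
            and VERBS.get(words[1]) == subj_num)

print(s("he sees her".split()))   # True
print(s("they see her".split()))  # True
print(s("them see he".split()))   # False: wrong cases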
DCGs
Are strictly more expressive than CFGs. Can represent, for instance, the language aⁿbⁿcⁿ, which is not context-free:
• S(N)    -> A(N) B(N) C(N)
• A(0)    -> []
• B(0)    -> []
• C(0)    -> []
• A(s(N)) -> A(N) [a]
• B(s(N)) -> B(N) [b]
• C(s(N)) -> C(N) [c]
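What the rules above accept is exactly the strings aⁿbⁿcⁿ: the shared argument N counts the a's, b's, and c's and forces the three counts to be equal, which no CFG can do. A quick sketch of that membership test (the helper name is invented):

# Membership test for a^n b^n c^n, the language the DCG above derives.
def in_anbncn(s):
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

print(in_anbncn("aabbcc"))  # True  (n = 2)
print(in_anbncn("aabbc"))   # False (counts disagree)
print(in_anbncn(""))        # True  (n = 0)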
Probabilistic Models
Traditional grammar models are very rigid:
• essentially a yes/no decision
Probabilistic grammars:
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative
Illustrated on:
• Shannon Game
• Spelling correction
• Parsing
Illustration
Wall Street Journal Corpus: 3,000,000 words, with the correct parse tree for each sentence known:
• Constructed by hand
• Can be used to derive stochastic context-free grammars
• SCFGs assign probabilities to parse trees
Compute the most probable parse tree.
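As a sketch of how an SCFG scores a tree: the probability of a parse tree is the product of the probabilities of the rules used in it. The RULE_PROB numbers below are made up, lexical rule probabilities are omitted, and tree_prob is an invented helper.

# Scoring a parse tree under a toy SCFG (illustrative probabilities only).
RULE_PROB = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("D", "N")):   0.7,
    ("NP", ("PN",)):      0.3,
    ("VP", ("V", "NP")):  1.0,
}

def tree_prob(tree):
    """tree = (label, children); children is a list of subtrees or a word."""
    label, children = tree
    if isinstance(children, str):        # pre-terminal -> word
        return 1.0                       # lexical probabilities omitted here
    body = tuple(child[0] for child in children)
    p = RULE_PROB[(label, body)]
    for child in children:
        p *= tree_prob(child)
    return p

t = ("S", [("NP", [("D", "the"), ("N", "dog")]),
           ("VP", [("V", "chased"), ("NP", [("PN", "achilles")])])])
print(tree_prob(t))  # 1.0 * 0.7 * 1.0 * 0.3 = 0.21 (up to float rounding)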
Sequences are omnipresent

Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …
Rest of the Course
Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and Finite State Automata: Markov Models using n-grams, (Hidden) Markov Models, Conditional Random Fields
  • As an example of using undirected graphical models
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars
All use principles of Part I on Graphical Models.
The most frequent words are function words
word freq word freq
the 3332 in 906
and 2972 that 877
a 1775 he 877
to 1725 I 783
of 1440 his 772
was 1161 you 686
it 1027 Tom 679
Word Counts
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
f nf
1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102
How many words appear f times
Word Counts
About half of the words occurs just onceAbout half of the text consists of the
100 most common wordshellip
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Word Counts (Brown corpus)
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Word Counts (Brown corpus)
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000
Zipflsquos Law f~1r (fr = const)
Zipflsquos Law
Minimize effort
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Language and sequences
Natural language processingbull Is concerned with the analysis of
sequences of words sentencesbullConstruction of language models
Two types of modelsbullNon-probabilisticbull Probabilistic
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities of) legal sentences / sequences?
• So far, yes.
Or do we want to infer properties of these sentences?
• e.g. parse tree, part-of-speech tagging
• needed for understanding NL
Let's look at some tasks.
Sequence Tagging
Part-of-speech tagging
• He drives with his bike
• PN V PR PN N (pronoun, verb, preposition, pronoun, noun)

Text extraction
• The job is that of a programmer
• X X X X X X JobType
• The seminar is taking place from 15:00 to 16:00
• X X X X X X Start End

Predicting the secondary structure of proteins, mRNA, …
• X = AFARLMMA…
• Y = he he st st st he st he …
Parsing

Given a sentence, find its parse tree: an important step in understanding NL.
In bioinformatics, parsing allows one to predict (elements of) structure from sequence.
Language models based on Grammars
Grammar types
• Regular grammars and finite state automata
• Context-free grammars
• Definite clause grammars: a particular type of unification-based grammar (Prolog)

Distinguish the lexicon from the grammar
• The lexicon (dictionary) contains information about words, e.g. word -> possible tags (and possibly additional information): flies -> V(erb), N(oun)
• The grammar encodes the rules
Grammars and parsing

The syntactic level is the best understood and formalized.
Derivation of grammatical structure: parsing (more than just recognition).
The result of parsing is mostly a parse tree, showing the constituents of a sentence, e.g. verb or noun phrases.
Syntax is usually specified in terms of a grammar consisting of grammar rules.
Regular Grammars and Finite State Automata
Lexical information - which words are
• Det(erminer)
• N(oun)
• Pn (pronoun)
• Adj (adjective)
• Vi (intransitive verb) - takes no argument
• Vt (transitive verb) - takes an argument

Now accept
• The cat slept
• Det N Vi

As a regular grammar ([ ] marks a terminal; see the runnable sketch below)
• S -> [Det] S1
• S1 -> [N] S2
• S2 -> [Vi]

Lexicon
• the: Det
• cat: N
• slept: Vi
• …
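The same grammar and lexicon as a runnable SWI-Prolog DCG sketch; nonterminals are lowercased because capitalized names are variables in Prolog, and the lexicon is folded into the tag rules:

s  --> det, s1.
s1 --> n, s2.
s2 --> vi.

% Lexicon: each tag rule lists the words it covers.
det --> [the].
n   --> [cat].
vi  --> [slept].

% ?- phrase(s, [the, cat, slept]).   % true
% ?- phrase(s, [cat, the, slept]).   % false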
Finite State Automaton
Sentences
• John smiles - Pn Vi
• The cat disappeared - Det N Vi
• These new shoes hurt - Det Adj N Vi
• John liked the old cat - Pn Vt Det Adj N
Phrase structure
Parse tree for "the dog chased a cat into the garden":

[S [NP [D the] [N dog]]
   [VP [V chased]
       [NP [D a] [N cat]]
       [PP [P into] [NP [D the] [N garden]]]]]
Notation
• S: sentence
• D or Det: determiner (e.g. articles)
• N: noun
• V: verb
• P: preposition
• NP: noun phrase
• VP: verb phrase
• PP: prepositional phrase
Context Free Grammar

S -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D -> [the]
D -> [a]
N -> [dog]
N -> [cat]
N -> [garden]
V -> [chased]
V -> [saw]
P -> [into]

Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammars
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion (runnable sketch below)
• "The girl thought the dog chased the cat"
• VP -> V S
• N -> [girl]
• V -> [thought]
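The full grammar runs directly as an SWI-Prolog DCG (lowercased for Prolog, where capitalized names are variables), including the recursive VP -> V S rule; a sketch:

s  --> np, vp.
np --> d, n.
vp --> v, np.
vp --> v, np, pp.
vp --> v, s.        % recursion: embedded sentence
pp --> p, np.
d --> [the].  d --> [a].
n --> [dog].  n --> [cat].  n --> [garden].  n --> [girl].
v --> [chased].  v --> [saw].  v --> [thought].
p --> [into].

% ?- phrase(s, [the, dog, chased, a, cat, into, the, garden]).      % true
% ?- phrase(s, [the, girl, thought, the, dog, chased, the, cat]).   % true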
Top-down parsing
S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
Context-free grammar

S --> NP, VP
NP --> PN            % proper noun
NP --> Art, Adj, N
NP --> Art, N
VP --> VI            % intransitive verb
VP --> VT, NP        % transitive verb
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN --> [achilles]
N --> [turtle]
VI --> [sleeps]
VT --> [beats]
Parse tree
For "the rapid turtle beats achilles":

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [VT beats] [NP [PN achilles]]]]
Definite Clause Grammars
Non-terminals may have arguments:

S --> NP(N), VP(N)
NP(N) --> Art(N), N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural) --> [the]
N(singular) --> [turtle]
N(plural) --> [turtles]
VI(singular) --> [sleeps]
VI(plural) --> [sleep]

Number agreement (runnable sketch below)
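In SWI-Prolog the same agreement grammar loads and runs as-is once the symbols are lowercased (capitalized names are variables); a sketch:

s --> np(Num), vp(Num).
np(Num) --> art(Num), n(Num).
vp(Num) --> vi(Num).
art(singular) --> [a].
art(singular) --> [the].
art(plural)   --> [the].
n(singular) --> [turtle].
n(plural)   --> [turtles].
vi(singular) --> [sleeps].
vi(plural)   --> [sleep].

% ?- phrase(s, [a, turtle, sleeps]).    % true
% ?- phrase(s, [a, turtle, sleep]).     % false: number agreement fails
% ?- phrase(s, [the, turtles, sleep]).  % true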
DCGs
Non-terminals may have arguments
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)

Parsing needs to be adapted
• using unification
Unification in a nutshell (cf. AI course)

Substitutions: mappings from variables to terms
• e.g. {Num/singular}, {T/vp(V, NP)}

Applying a substitution
• simultaneously replace the variables by the corresponding terms
• S(Num){Num/singular} = S(singular)
Unification
Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.
• Art(singular) and Art(Num): gives {Num/singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1/Num2}
• PN(Num, accusative) and PN(singular, Case): {Num/singular, Case/accusative}
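Unification is Prolog's built-in =/2, so these examples can be tried directly at the prompt (art and pn are ordinary terms here):

?- art(singular) = art(Num).
Num = singular.

?- art(singular) = art(plural).
false.

?- art(Num1) = art(Num2).
Num1 = Num2.

?- pn(Num, accusative) = pn(singular, Case).
Num = singular,
Case = accusative.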
Parsing with DCGs
Now require successful unification at each step:

S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)            with {N/singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps

S -> a turtle sleep: fails.
Case Marking
PN(singular, nominative) --> [he] | [she]
PN(singular, accusative) --> [him] | [her]
PN(plural, nominative) --> [they]
PN(plural, accusative) --> [them]

S --> NP(Number, nominative), VP(Number)
VP(Number) --> V(Number), NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any) --> Det, N(Number)

He sees her. She sees him. They see her.
But not: Them see he.
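A runnable sketch of the case-marking grammar in SWI-Prolog, assuming a small verb lexicon (sees/see) that the slide leaves implicit:

s --> np(Num, nominative), vp(Num).
vp(Num) --> v(Num), np(_Any, accusative).
np(Num, Case) --> pn(Num, Case).
np(Num, _Case) --> det, n(Num).

pn(singular, nominative) --> [he] | [she].
pn(singular, accusative) --> [him] | [her].
pn(plural, nominative)   --> [they].
pn(plural, accusative)   --> [them].
v(singular) --> [sees].
v(plural)   --> [see].
det --> [the].
n(singular) --> [turtle].

% ?- phrase(s, [he, sees, her]).    % true
% ?- phrase(s, [they, see, her]).   % true
% ?- phrase(s, [them, see, he]).    % false: case constraints fail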
DCGs
Are strictly more expressive than CFGs. They can represent, for instance, the language a^n b^n c^n:
• S(N) --> A(N), B(N), C(N)
• A(0) --> []
• B(0) --> []
• C(0) --> []
• A(s(N)) --> A(N), [a]
• B(s(N)) --> B(N), [b]
• C(s(N)) --> C(N), [c]
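A runnable SWI-Prolog sketch; the terminals are moved in front of the recursive calls to avoid left recursion when parsing, and the nonterminals are renamed so the block loads alongside the earlier sketches:

abc(N) --> as(N), bs(N), cs(N).
as(0) --> [].
as(s(N)) --> [a], as(N).
bs(0) --> [].
bs(s(N)) --> [b], bs(N).
cs(0) --> [].
cs(s(N)) --> [c], cs(N).

% ?- phrase(abc(N), [a,a,b,b,c,c]).   % N = s(s(0))
% ?- phrase(abc(_), [a,b,b,c,c]).     % false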
Probabilistic Models
Traditional grammar models are very rigid
• essentially a yes/no decision

Probabilistic grammars
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative

Illustrated on
• the Shannon game
• spelling correction
• parsing
Illustration
Wall Street Journal corpus: 3,000,000 words; the correct parse tree for each sentence is known
• constructed by hand
• can be used to derive stochastic context-free grammars (SCFGs)
• an SCFG assigns a probability to each parse tree

Compute the most probable parse tree.
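A toy sketch of the scoring step in SWI-Prolog; rule_p/3 and all probabilities are invented, not WSJ estimates. An SCFG scores a tree by multiplying the probabilities of the rules it uses, and the parse tree with the highest product wins:

% rule_p(Head, Body, P): invented rule probabilities.
rule_p(s,  [np, vp], 1.0).
rule_p(np, [d, n],   0.7).
rule_p(vp, [v, np],  0.6).
rule_p(d,  [the],    0.5).
rule_p(n,  [dog],    0.1).
rule_p(n,  [cat],    0.1).
rule_p(v,  [chased], 0.2).

% tree_p(+Tree, -P): Tree is node(Cat, Children) with word(W) leaves.
tree_p(word(_), 1.0).
tree_p(node(Cat, Kids), P) :-
    maplist(label, Kids, Body),
    rule_p(Cat, Body, P0),
    foldl(times_p, Kids, P0, P).

label(node(Cat, _), Cat).
label(word(W), W).
times_p(Kid, Acc0, Acc) :- tree_p(Kid, PK), Acc is Acc0 * PK.

% ?- tree_p(node(np, [node(d, [word(the)]), node(n, [word(dog)])]), P).
% P ≈ 0.035 (= 0.7 * 0.5 * 0.1).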
Sequences are omnipresent

Therefore the techniques we will see also apply to
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …
Rest of the Course
Limitations of traditional grammar models motivate probabilistic extensions
• Regular grammars and finite state automata
  - Markov models using n-grams
  - (Hidden) Markov models
  - Conditional random fields, as an example of using undirected graphical models
• Probabilistic context-free grammars
• Probabilistic definite clause grammars

All use principles of Part I on graphical models.
Human Language is highly ambiguous at all levels
bull acoustic levelrecognize speech vs wreck a nice beach
bull morphological levelsaw to see (past) saw (noun) to saw (present inf)
bull syntactic levelI saw the man on the hill with a telescope
bull semantic levelOne book has to be read by every student
Key NLP Problem Ambiguity
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Language Model
A formal model about language Two types
bull Non-probabilistic Allows one to compute whether a certain sequence
(sentence or part thereof) is possible Often grammar based
bull Probabilistic Allows one to compute the probability of a certain
sequence Often extends grammars with probabilities
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Example of bad language model
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
A bad language model
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
A bad language model
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
A good language model
Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible
Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0
Why language models
Consider a Shannon Gamebull Predicting the next word in the sequence
Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip
Model at the sentence level
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num/singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): gives {Num1/Num2}
• PN(Num, accusative) and PN(singular, Case): gives {Num/singular, Case/accusative}
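Prolog's =/2 computes exactly this most general unifier, so the slide's examples can be replayed at the top level (terms written in lower case, as Prolog requires):

% ?- art(singular) = art(Num).                    % Num = singular
% ?- art(singular) = art(plural).                 % fails
% ?- art(Num1) = art(Num2).                       % Num1 = Num2
% ?- pn(Num, accusative) = pn(singular, Case).    % Num = singular, Case = accusative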
Parsing with DCGs
Now require successful unification at each step:
S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)        {N/singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps
S -> a turtle sleep: fails
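With the agreement grammar sketched above, both derivations can be checked directly:

% ?- phrase(s, [a, turtle, sleeps]).   % succeeds: Num unifies with singular throughout
% ?- phrase(s, [a, turtle, sleep]).    % fails: vi(singular) cannot match [sleep]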
Case Marking
PN(singular, nominative) --> [he] ; [she]
PN(singular, accusative) --> [him] ; [her]
PN(plural, nominative) --> [they]
PN(plural, accusative) --> [them]
S --> NP(Number, nominative) VP(Number)
VP(Number) --> V(Number) NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any) --> Det N(Number)
Accepts: He sees her. She sees him. They see her.
But not: Them see he.
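A runnable sketch of the case-marking grammar; the verb entries sees/see are assumptions inferred from the example sentences, not shown on the slide:

s             --> np(Num, nominative), vp(Num).
vp(Num)       --> v(Num), np(_, accusative).
np(Num, Case) --> pn(Num, Case).
pn(singular, nominative) --> [he].
pn(singular, nominative) --> [she].
pn(singular, accusative) --> [him].
pn(singular, accusative) --> [her].
pn(plural, nominative)   --> [they].
pn(plural, accusative)   --> [them].
v(singular) --> [sees].    % assumed lexicon entry
v(plural)   --> [see].     % assumed lexicon entry

% ?- phrase(s, [he, sees, her]).    % succeeds
% ?- phrase(s, [them, see, he]).    % fails: wrong cases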
DCGs
Are strictly more expressive than CFGs
Can represent, for instance, the non-context-free language a^n b^n c^n:
• S(N) --> A(N) B(N) C(N)
• A(0) --> []
• B(0) --> []
• C(0) --> []
• A(s(N)) --> A(N) [A]
• B(s(N)) --> B(N) [B]
• C(s(N)) --> C(N) [C]
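A runnable sketch of this grammar; the terminal is moved in front of the recursive call, because the slide's left-recursive form loops under Prolog's depth-first execution. The counting argument N, shared across all three non-terminals, is what a CFG cannot express:

s(N)    --> a(N), b(N), c(N).
a(0)    --> [].
a(s(N)) --> [a], a(N).
b(0)    --> [].
b(s(N)) --> [b], b(N).
c(0)    --> [].
c(s(N)) --> [c], c(N).

% ?- phrase(s(N), [a, a, b, b, c, c]).   % N = s(s(0))
% ?- phrase(s(_), [a, b, b, c]).         % fails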
Probabilistic Models
Traditional grammar models are very rigid:
• essentially a yes/no decision
Probabilistic grammars:
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative
Illustrated on:
• Shannon Game
• spelling correction
• parsing
Illustration
Wall Street Journal Corpus: 3,000,000 words
The correct parse tree for each sentence is known:
• constructed by hand
• can be used to derive stochastic context-free grammars (SCFGs)
• SCFGs assign probabilities to parse trees
Compute the most probable parse tree
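A toy sketch of the idea in Prolog DCG form: each rule carries a probability, a derivation multiplies them, and the most probable analysis of an ambiguous sentence wins. The grammar and all probabilities below are invented for illustration, not derived from the WSJ corpus:

% Toy SCFG; the extra argument accumulates the derivation probability.
s(P)    --> np(P1), vp(P2),           { P is P1 * P2 }.
np(0.4) --> [i].
np(P)   --> [the], n(P1),             { P is 0.3 * P1 }.
np(P)   --> [the], n(P1), pp(P2),     { P is 0.3 * P1 * P2 }.
vp(P)   --> v(P1), np(P2),            { P is 0.7 * P1 * P2 }.
vp(P)   --> v(P1), np(P2), pp(P3),    { P is 0.3 * P1 * P2 * P3 }.
pp(P)   --> [with], np(P).
n(1.0)  --> [man].
n(1.0)  --> [telescope].
v(1.0)  --> [saw].

% Both attachments of "with the telescope" are found; pick the best:
% ?- findall(P, phrase(s(P), [i, saw, the, man, with, the, telescope]), Ps),
%    max_list(Ps, Best).
% Ps = [0.0252, 0.0108], Best = 0.0252   (NP attachment wins here)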
Sequences are omnipresent
Therefore the techniques we will see also apply to:
• bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• robotics: sequences of actions, states, …
• …
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Why language models?
Consider the Shannon Game:
• predicting the next word in a sequence
Statistical natural language …
The cat is thrown out of the …
The large green …
Sue swallowed the large green …
…
Models language at the sentence level
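The simplest model for this game is a bigram model: predict the continuation with the highest count after the current word. A minimal sketch with invented counts (bigram/3 and predict/2 are our own names, and the numbers are not from a real corpus):

% Toy bigram counts (illustrative only).
bigram(natural, language, 340).
bigram(natural, gas, 110).
bigram(language, processing, 400).
bigram(language, models, 90).

% Predict the most frequent continuation of a word.
predict(Word, Next) :-
    findall(C-W, bigram(Word, W, C), Pairs),
    max_member(_-Next, Pairs).

% ?- predict(natural, W).   % W = language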
Applications
Spelling correction
Mobile phone texting
Speech recognition
Handwriting recognition
Disabled users
…
Spelling errors
They are leaving in about fifteen minuets to go to her house.
The study was conducted mainly be John Black.
Hopefully all with continue smoothly in my absence.
Can they lave him my messages?
I need to notified the bank of …
He is trying to fine out.
Handwriting recognition
Assume a note is given to a bank teller, which the teller reads as "I have a gub." (cf. Woody Allen)
NLP to the rescue:
• "gub" is not a word
• gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank
For Spell Checkers
Collect a list of commonly substituted words:
• piece/peace, whether/weather, their/there, …
Example:
"On Tuesday, the whether …"
"On Tuesday, the weather …"
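Combining the confusion sets with bigram counts gives a small context-sensitive corrector. A hedged sketch under the same assumptions as above (confusable/2 and correct/3 are illustrative names, counts invented):

confusable(whether, weather).
confusable(weather, whether).
bigram(the, weather, 120).    % illustrative counts
bigram(the, whether, 1).

% Pick whichever candidate (the word itself or a confusable
% variant) is most frequent after the preceding word.
correct(Prev, Word, Best) :-
    findall(C-W,
            ( (W = Word ; confusable(Word, W)),
              bigram(Prev, W, C) ),
            Pairs),
    max_member(_-Best, Pairs).

% ?- correct(the, whether, W).   % W = weather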
Another dimension in language models
Do we mainly want to infer (probabilities of) legal sentences/sequences?
• so far
Or do we want to infer properties of these sentences?
• e.g., parse tree, part-of-speech tagging
• needed for understanding NL
Let's look at some tasks
Sequence Tagging
Part-of-speech tagging:
• He drives with his bike
• N V PR PN N (noun, verb, preposition, pronoun, noun)
Text extraction:
• The job is that of a programmer
• X X X X X X JobType
• The seminar is taking place from 15:00 to 16:00
• X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins, mRNA, …:
• X = AFARLMMA…
• Y = hehestststhesthe…
Parsing
Given a sentence, find its parse tree
An important step in understanding NL
Parsing
In bioinformatics, this allows one to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Applications
Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Spelling errors
They are leaving in about fifteen minuets to go to her house
The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Handwriting recognition
Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)
NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a
higher probability in the context of a bank
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
For Spell Checkers
Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere
ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Another dimension in language models
Do we mainly want to infer (probabilities) of legal sentences sequences bull So far
Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL
Letrsquos look at some tasks
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Sequence Tagging
Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun
Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Sequence Tagging
Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
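A runnable version of this regular grammar in Prolog DCG notation (a sketch, assuming SWI-Prolog; names are lowercase because capitalized names would be variables):

  % one non-terminal per state; each rule consumes one terminal
  s  --> det, s1.
  s1 --> n, s2.
  s2 --> vi.

  % lexicon
  det --> [the].
  n   --> [cat].
  vi  --> [slept].

  % ?- phrase(s, [the, cat, slept]).   succeeds
  % ?- phrase(s, [the, cat]).          fails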
Finite State Automaton
Sentences:
• John smiles - Pn Vi
• The cat disappeared - Det N Vi
• These new shoes hurt - Det Adj N Vi
• John liked the old cat - Pn Vt Det Adj N
Phrase structure
[S [NP [D the] [N dog]] [VP [V chased] [NP [D a] [N cat]] [PP [P into] [NP [D the] [N garden]]]]]
(bracketed parse tree of "the dog chased a cat into the garden")
Notation:
• S - sentence
• D or Det - determiner (e.g. articles)
• N - noun
• V - verb
• P - preposition
• NP - noun phrase
• VP - verb phrase
• PP - prepositional phrase
Context Free Grammar
S -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D -> [the]
D -> [a]
N -> [dog]
N -> [cat]
N -> [garden]
V -> [chased]
V -> [saw]
P -> [into]
Terminals ~ lexicon
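This CFG is directly executable as a Prolog DCG; a minimal sketch (assuming SWI-Prolog's phrase/2):

  s  --> np, vp.
  np --> d, n.
  vp --> v, np.
  vp --> v, np, pp.
  pp --> p, np.

  d --> [the].      d --> [a].
  n --> [dog].      n --> [cat].      n --> [garden].
  v --> [chased].   v --> [saw].
  p --> [into].

  % ?- phrase(s, [the,dog,chased,a,cat,into,the,garden]).
  % true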
Phrase structure
Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …
Recursion:
• "The girl thought the dog chased the cat"
VP -> V S
N -> [girl]
V -> [thought]
Top-down parsing
S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
Context-free grammar
S --> NP VP
NP --> PN            (proper noun)
NP --> Art Adj N
NP --> Art N
VP --> VI            (intransitive verb)
VP --> VT NP         (transitive verb)
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN --> [achilles]
N --> [turtle]
VI --> [sleeps]
VT --> [beats]
Parse tree
[S [NP [Art the] [Adj rapid] [N turtle]] [VP [Vt beats] [NP [PN achilles]]]]
(bracketed parse tree of "the rapid turtle beats achilles")
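To have the parser return such a tree, a standard DCG idiom threads the tree as an extra argument; a minimal sketch for the turtle grammar above (the tree-term shapes are my own, not from the slides):

  s(s(NP, VP))      --> np(NP), vp(VP).
  np(np(A, Adj, N)) --> art(A), adj(Adj), n(N).
  np(np(PN))        --> pn(PN).
  vp(vp(V, NP))     --> vt(V), np(NP).

  art(art(the))    --> [the].
  adj(adj(rapid))  --> [rapid].
  n(n(turtle))     --> [turtle].
  vt(vt(beats))    --> [beats].
  pn(pn(achilles)) --> [achilles].

  % ?- phrase(s(T), [the, rapid, turtle, beats, achilles]).
  % T = s(np(art(the), adj(rapid), n(turtle)), vp(vt(beats), np(pn(achilles)))).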
Definite Clause Grammars - non-terminals may have arguments
S --> NP(N) VP(N)
NP(N) --> Art(N) N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural) --> [the]
N(singular) --> [turtle]
N(plural) --> [turtles]
VI(singular) --> [sleeps]
VI(plural) --> [sleep]
Number agreement: the shared argument N forces article, noun, and verb to agree in number (a runnable sketch follows below).
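The same grammar in directly runnable form (lowercase names, commas between body goals; a sketch assuming SWI-Prolog):

  s     --> np(N), vp(N).
  np(N) --> art(N), n(N).
  vp(N) --> vi(N).

  art(singular) --> [a].
  art(singular) --> [the].
  art(plural)   --> [the].
  n(singular)   --> [turtle].
  n(plural)     --> [turtles].
  vi(singular)  --> [sleeps].
  vi(plural)    --> [sleep].

  % ?- phrase(s, [a, turtle, sleeps]).    % true
  % ?- phrase(s, [a, turtle, sleep]).     % false: number clash
  % ?- phrase(s, [the, turtles, sleep]).  % true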
DCGs
Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)
Parsing needs to be adapted:
• using unification
Unification in a nutshell (cf. the AI course)
Substitutions, e.g. {Num/singular, T/vp(V, NP)}
Applying a substitution:
• simultaneously replace the variables by the corresponding terms
• S(Num) {Num/singular} = S(singular)
Unification
Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g. (these can be tried directly in Prolog, see below):
• Art(singular) and Art(Num): gives {Num/singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): gives {Num1/Num2}
• PN(Num, accusative) and PN(singular, Case): gives {Num/singular, Case/accusative}
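Prolog's built-in =/2 performs exactly this unification, so the examples can be replayed at the top level (with lowercase functors, since capitalized Art and PN would be variables):

  ?- art(singular) = art(Num).
  Num = singular.

  ?- art(singular) = art(plural).
  false.

  ?- art(Num1) = art(Num2).
  Num1 = Num2.

  ?- pn(Num, accusative) = pn(singular, Case).
  Num = singular, Case = accusative.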
Parsing with DCGs
Now require successful unification at each step:
S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)   {N/singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps
The corresponding derivation of "a turtle sleep" fails.
Case Marking
PN(singular, nominative) --> [he] ; [she]
PN(singular, accusative) --> [him] ; [her]
PN(plural, nominative) --> [they]
PN(plural, accusative) --> [them]
S --> NP(Number, nominative) VP(Number)
VP(Number) --> V(Number) NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any) --> Det N(Number)
Accepts: He sees her. / She sees him. / They see her.
But not: Them see he.
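A runnable sketch of the case-marking grammar; the verb entries for sees/see are assumed here, since the slide leaves the verb lexicon implicit:

  pn(singular, nominative) --> [he].
  pn(singular, nominative) --> [she].
  pn(singular, accusative) --> [him].
  pn(singular, accusative) --> [her].
  pn(plural,   nominative) --> [they].
  pn(plural,   accusative) --> [them].

  v(singular) --> [sees].    % assumed lexicon entry
  v(plural)   --> [see].     % assumed lexicon entry

  s             --> np(Number, nominative), vp(Number).
  vp(Number)    --> v(Number), np(_Any, accusative).
  np(Num, Case) --> pn(Num, Case).

  % ?- phrase(s, [he, sees, her]).    % true
  % ?- phrase(s, [they, see, her]).   % true
  % ?- phrase(s, [them, see, he]).    % false: case clash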
DCGs
DCGs are strictly more expressive than CFGs. They can represent, for instance, the non-context-free language a^n b^n c^n (runnable sketch below):
• S(N) --> A(N) B(N) C(N)
• A(0) --> []
• B(0) --> []
• C(0) --> []
• A(s(N)) --> A(N) [a]
• B(s(N)) --> B(N) [b]
• C(s(N)) --> C(N) [c]
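The same grammar as a runnable sketch; the recursion is written terminal-first (right-recursive) so that Prolog's top-down execution terminates - declaratively it describes the same language:

  s(N) --> a(N), b(N), c(N).

  a(0)    --> [].
  a(s(N)) --> [a], a(N).
  b(0)    --> [].
  b(s(N)) --> [b], b(N).
  c(0)    --> [].
  c(s(N)) --> [c], c(N).

  % ?- phrase(s(N), [a,a,b,b,c,c]).
  % N = s(s(0))            % two of each
  % ?- phrase(s(_), [a,a,b,c,c]).
  % false                  % counts disagree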
Probabilistic Models
Traditional grammar models are very rigid:
• essentially a yes/no decision
Probabilistic grammars:
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative
Illustrated on:
• the Shannon game
• spelling correction
• parsing
Illustration
Wall Street Journal corpus: 3,000,000 words, with the correct parse tree for each sentence known:
• constructed by hand
• can be used to derive stochastic context-free grammars (SCFGs)
• an SCFG assigns a probability to each parse tree
Compute the most probable parse tree.
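For reference, the standard way to obtain rule probabilities from such a hand-parsed corpus is relative-frequency estimation (an assumed but standard detail, not spelled out on the slide):

  P(A -> beta) = count(A -> beta) / count(A)

i.e. how often the rule is used in the treebank, normalized over all expansions of the same non-terminal A; the probability of a parse tree is then the product of the probabilities of the rules it uses.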
Sequences are omnipresent
The techniques we will see therefore also apply to:
• bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• robotics: sequences of actions, states, …
• …
Rest of the Course
The limitations of traditional grammar models motivate probabilistic extensions; all of them use the principles of Part I on graphical models:
• Regular grammars and finite state automata: Markov models using n-grams, (hidden) Markov models, conditional random fields (the latter as an example of undirected graphical models)
• Probabilistic context-free grammars
• Probabilistic definite clause grammars
Parsing
Given a sentence find its parse tree Important step in understanding NL
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Parsing
In bioinformatics allows to predict (elements of) structure from sequence
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Language models based on Grammars
Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars
A particular type of Unification Based Grammar (Prolog)
Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about
words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)
bull Grammar encode rules
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing
(more than just recognition) Result of parsing mostly parse tree
showing the constituents of a sentence eg verb or noun phrases
Syntax usually specified in terms of a grammar consisting of grammar rules
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Regular Grammars and Finite State Automata
Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no
argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an
argumentbull Adj (adjective)
Now acceptbull The cat sleptbull Det N Vi
As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]
Lexicon bull The - Detbull Cat - Nbull Slept - Vi
bull hellip
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Finite State Automaton
Sentences
bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Phrase structure
S
NP
D N
VP
NPV
D N
PP
P NP
D N
the dog chased a cat into the garden
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Notation
S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]
Terminals ~ Lexicon
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammarSS --gt --gt NPNPVPVP
NPNP --gt PN --gt PN Proper nounProper noun
NPNP --gt Art Adj N--gt Art Adj N
NPNP --gt ArtN--gt ArtN
VPVP --gt VI --gt VI intransitive verbintransitive verb
VPVP --gt VT --gt VT NPNP transitive verbtransitive verb
ArtArt --gt [the]--gt [the]
AdjAdj --gt [lazy]--gt [lazy]
AdjAdj --gt [rapid]--gt [rapid]
PNPN --gt [achilles]--gt [achilles]
NN --gt [turtle]--gt [turtle]
VIVI --gt [sleeps]--gt [sleeps]
VTVT --gt [beats]--gt [beats]
Parse tree
SS
NPNP VPVP
ArtArt AdjAdj NN VtVt NPNP
PNPN
achillesachillesbeatsbeatsturtleturtlerapidrapidthethe
Definite Clause GrammarsNon-terminals may have arguments
SS --gt --gt NPNP((NN))VPVP((NN))
NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))
VP(VP(NN)) --gt VI(--gt VI(NN))
Art(Art(singularsingular)) --gt [a]--gt [a]
Art(Art(singularsingular)) --gt [the]--gt [the]
Art(Art(pluralplural)) --gt [the]--gt [the]
N(N(singularsingular)) --gt [turtle]--gt [turtle]
N(N(pluralplural)) --gt [turtles]--gt [turtles]
VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]
VI(VI(pluralplural)) --gt [sleep]--gt [sleep]
Number Agreement
DCGs
Non-terminals may have argumentsbull Variables (start with capital)
Eg Number Any
bull Constants (start with lower case) Eg singular plural
bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)
Parsing needs to be adapted bull Using unification
Unification in a nutshell (cf AI course)
Substitutions
Eg Num singular T vp(VNP)
Applying substitution bull Simultaneously replace variables by
corresponding termsbull S(Num) Num singular = S(singular)
Unification
Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)
Gives Num singular
bull Art(singular) and Art(plural) Fails
bull Art(Num1) and Art(Num2) Num1 Num2
bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative
Parsing with DCGs
Now require successful unification at each step
S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps
S-gt a turtle sleep fails
Case Marking
PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]
PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]
PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]
PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]
S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)
VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)
VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)
VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)
He sees her She sees him They see her
But not Them see he
DCGs
Are strictly more expressive than CFGs Can represent for instance
bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]
Probabilistic Models
Traditional grammar models are very rigid bull essentially a yes no decision
Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative
Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing
Illustration
Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known
bull Constructed by handbull Can be used to derive stochastic context free
grammarsbull SCFG assign probability to parse trees
Compute the most probable parse tree
Sequences are omni-present
Therefore the techniques we will see also apply tobull Bioinformatics
DNA proteins mRNA hellip can all be represented as strings
bullRobotics Sequences of actions states hellip
bullhellip
Rest of the Course
Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata
All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields
bull As an example of using undirected graphical models
bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars
Phrase structure
Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the
Recursionbull bdquoThe girl thought the dog chased the catldquo
VP -gt V SN -gt [girl]V -gt [thought]
Top-down parsing
S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat
Context-free grammar
S --> NP VP
NP --> PN             (proper noun)
NP --> Art Adj N
NP --> Art N
VP --> VI             (intransitive verb)
VP --> VT NP          (transitive verb)
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN --> [achilles]
N --> [turtle]
VI --> [sleeps]
VT --> [beats]
Parse tree (for "the rapid turtle beats achilles"):
[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [VT beats] [NP [PN achilles]]]]
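A sketch of how such a tree can be computed: giving each non-terminal one extra argument (the DCG technique introduced on the next slide) lets the parser assemble the tree as a Prolog term. The functor names here are illustrative, not taken from the slides:

s(s(NP, VP)) --> np(NP), vp(VP).
np(np(A, ADJ, N)) --> art(A), adj(ADJ), n(N).
np(np(PN)) --> pn(PN).
vp(vp(V, NP)) --> vt(V), np(NP).
art(art(the)) --> [the].
adj(adj(rapid)) --> [rapid].
n(n(turtle)) --> [turtle].
vt(vt(beats)) --> [beats].
pn(pn(achilles)) --> [achilles].

?- phrase(s(T), [the, rapid, turtle, beats, achilles]).
T = s(np(art(the), adj(rapid), n(turtle)), vp(vt(beats), np(pn(achilles)))).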
Definite Clause Grammars
Non-terminals may have arguments:
S --> NP(N) VP(N)
NP(N) --> Art(N) N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural) --> [the]
N(singular) --> [turtle]
N(plural) --> [turtles]
VI(singular) --> [sleeps]
VI(plural) --> [sleep]
Number Agreement
DCGs
Non-terminals may have arguments:
• Variables (start with a capital letter), e.g. Number, Any
• Constants (start with a lower-case letter), e.g. singular, plural
• Structured terms (start with a lower-case letter and take arguments themselves), e.g. vp(V, NP)
Parsing needs to be adapted:
• Using unification
Unification in a nutshell (cf. AI course)
Substitutions
E.g. {Num/singular}, {T/vp(V, NP)}
Applying a substitution:
• Simultaneously replace the variables by the corresponding terms
• S(Num){Num/singular} = S(singular)
Unification
Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num/singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): gives {Num1/Num2}
• PN(Num, accusative) and PN(singular, Case): gives {Num/singular, Case/accusative}
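These four cases can be checked directly in Prolog, whose =/2 performs exactly this unification (a sketch; functor names are lowercased to form valid Prolog terms):

?- art(singular) = art(Num).
Num = singular.

?- art(singular) = art(plural).
false.

?- art(Num1) = art(Num2).
Num1 = Num2.

?- pn(Num, accusative) = pn(singular, Case).
Num = singular,
Case = accusative.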
Parsing with DCGs
Now require a successful unification at each step:
S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)
  -> a N(singular) VP(singular)    (Art(N) unifies with Art(singular): {N/singular})
  -> a turtle VP(singular)
  -> a turtle sleeps
"a turtle sleep" fails: VI(plural) does not unify with VI(singular).
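A minimal runnable sketch of this grammar as a Prolog DCG (non-terminals lowercased to be valid Prolog; phrase/2 is the standard query interface):

s --> np(N), vp(N).
np(N) --> art(N), n(N).
vp(N) --> vi(N).
art(singular) --> [a].
art(singular) --> [the].
art(plural) --> [the].
n(singular) --> [turtle].
n(plural) --> [turtles].
vi(singular) --> [sleeps].
vi(plural) --> [sleep].

?- phrase(s, [a, turtle, sleeps]).
true .
?- phrase(s, [a, turtle, sleep]).   % number agreement fails
false.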
Case Marking
PN(singular, nominative) --> [he] ; [she]
PN(singular, accusative) --> [him] ; [her]
PN(plural, nominative) --> [they]
PN(plural, accusative) --> [them]
S --> NP(Number, nominative) VP(Number)
VP(Number) --> V(Number) NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any) --> Det N(Number)
Accepts: He sees her / She sees him / They see her
But not: Them see he
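As a runnable sketch of the case-marking grammar (the verb entries for sees/see are assumptions, since the slide shows only the example sentences):

s --> np(Number, nominative), vp(Number).
vp(Number) --> v(Number), np(_Any, accusative).
np(Number, Case) --> pn(Number, Case).
pn(singular, nominative) --> [he].
pn(singular, nominative) --> [she].
pn(singular, accusative) --> [him].
pn(singular, accusative) --> [her].
pn(plural, nominative) --> [they].
pn(plural, accusative) --> [them].
v(singular) --> [sees].     % assumed lexicon entry
v(plural) --> [see].        % assumed lexicon entry

?- phrase(s, [she, sees, him]).
true .
?- phrase(s, [them, see, he]).   % wrong case on both pronouns
false.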
DCGs
Are strictly more expressive than CFGs. They can represent, for instance, the language a^n b^n c^n (equal numbers of As, Bs and Cs), which is not context-free:
S(N) --> A(N) B(N) C(N)
A(0) --> []
B(0) --> []
C(0) --> []
A(s(N)) --> A(N) [A]
B(s(N)) --> B(N) [B]
C(s(N)) --> C(N) [C]
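In executable form (a sketch; terminals lowercased, with the count carried as a successor term s(...)), the shared argument N forces all three counts to be equal:

s(N) --> a(N), b(N), c(N).
a(0) --> [].
a(s(N)) --> a(N), [a].
b(0) --> [].
b(s(N)) --> b(N), [b].
c(0) --> [].
c(s(N)) --> c(N), [c].

?- phrase(s(N), [a, a, b, b, c, c]).
N = s(s(0)) .
?- phrase(s(_), [a, b, b, c]).   % counts differ, so parsing fails
false.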
Probabilistic Models
Traditional grammar models are very rigid:
• essentially a yes/no decision
Probabilistic grammars:
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative
Illustrated on:
• the Shannon game
• spelling correction
• parsing
Illustration
Wall Street Journal Corpus: 3,000,000 words
The correct parse tree for each sentence is known:
• constructed by hand
• can be used to derive stochastic context-free grammars (SCFGs)
• SCFGs assign a probability to each parse tree
Compute the most probable parse tree.
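A standard way to derive the rule probabilities from such a treebank (assumed here; the slide does not spell it out) is relative-frequency estimation: P(N --> beta) = Count(N --> beta) / Count(N), so the probabilities of all rules with the same left-hand side N sum to 1. For example, if NP --> Art N occurred 600 times among 1000 NP nodes in the trees, that rule would get probability 0.6.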
Sequences are omnipresent
Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …
Rest of the Course
Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and Finite State Automata:
  - Markov Models using n-grams
  - (Hidden) Markov Models
  - Conditional Random Fields, as an example of using undirected graphical models
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars
All use principles of Part I on Graphical Models.