Dependency Language Models Joseph Gubbins
Language Models
Assign a probability to a sentence or phrase
$P(w_1 w_2 \ldots w_n)$
Language Models
Are used in:
- Machine translation
- Speech recognition
- Information Retrieval
- Predictive text entry
- Handwriting recognition
N-gram Language Models
Chain rule decomposition:
$P(w_1 w_2 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 w_2 \ldots w_{i-1})$
Assumption: Markov property
$P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-N+1} \ldots w_{i-1})$
N-gram Language Models
Estimate from corpus
Problem: unobserved N-grams cause probability estimate to be zero
Solution: use smoothing techniques
$\tilde{P}(w_i \mid w_{i-N+1} \ldots w_{i-1}) = \frac{\mathrm{Count}(w_{i-N+1} \ldots w_{i-1} w_i)}{\mathrm{Count}(w_{i-N+1} \ldots w_{i-1})}$
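The maximum-likelihood estimate above can be sketched in a few lines of Python (unsmoothed, so unseen N-grams get probability zero; a real system would apply one of the smoothing techniques mentioned earlier):

```python
from collections import defaultdict

def ngram_mle(tokens, n):
    """Maximum-likelihood N-gram estimates: Count(context, w) / Count(context)."""
    counts = defaultdict(int)          # full n-gram counts
    context_counts = defaultdict(int)  # (n-1)-gram context counts
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        counts[gram] += 1
        context_counts[gram[:-1]] += 1

    def prob(context, word):
        c = context_counts[tuple(context)]
        return counts[tuple(context) + (word,)] / c if c else 0.0

    return prob

# Toy corpus; "the" is followed by "cat" twice and "mat" once.
corpus = "the cat sat on the mat the cat ate".split()
p = ngram_mle(corpus, 2)   # bigram model
```

Here `p(("the",), "cat")` is 2/3 and any unobserved bigram scores 0.0, which is exactly the zero-probability problem that smoothing addresses.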
N-gram Language Models
Weak point of N-gram language models:
Long range syntactic dependencies are ignored
Sentence Completion Problems
Choose the most probable from a list of possible sentences
Used in standardised tests such as SAT and GRE
Sentence Completion Problems
When his body had been carried from the cellar, we found ourselves with a problem which was almost as ____ as that with which we had started.
- tall
- loud
- invisible
- quick
- formidable
Source: Microsoft Sentence Completion Challenge
Sentence Completion Problems
When his body had been carried from the cellar, we found ourselves with a problem which was almost as formidable as that with which we had started.
Source: Microsoft Sentence Completion Challenge
Sentence Completion with 5-grams
5-gram probability: context of 4 words before and after
When his body had been carried from the cellar, we found ourselves with a problem which was almost as ____ as that with which we had started.
-> The relationship between "problem" and "formidable" is missed: the two words are more than four words apart.
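To see concretely why the 5-gram model misses the cue, we can enumerate every 5-token window that contains the blank (a small illustrative sketch): none of them reaches back far enough to include "problem".

```python
def blank_windows(tokens, blank_index, n=5):
    """All n-token windows that contain the blank position."""
    windows = []
    for start in range(blank_index - n + 1, blank_index + 1):
        if start >= 0 and start + n <= len(tokens):
            windows.append(tokens[start:start + n])
    return windows

sentence = ("when his body had been carried from the cellar , we found ourselves "
            "with a problem which was almost as ____ as that with which we had "
            "started").split()
windows = blank_windows(sentence, sentence.index("____"))
# "problem" appears in none of the five windows around the blank.
```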
Dependency Grammar
Syntactic analysis of sentence
Each word “depends” on another word
For example:
- Subject and object depend on the verb
- Adjectives depend on what they describe
Dependency Grammar
Dependency relations form a tree structure
For example, for the sentence:
When his body had been carried from the cellar, we found ourselves with a problem which was almost as formidable as that with which we had started.
Dependency Grammar
[Figure: dependency tree of the example sentence]
On the dependency tree, "problem" and "formidable" are adjacent.
-> Idea: create a dependency language model
Dependency Language Model
Model for the “lexicalisation” of a given dependency tree.
Takes inspiration from N-gram language models.
Dependency Language Model
We denote the ancestor sequence of a word $w$ (its head, its head's head, and so on, up to the root) by $A(w)$.
Dependency Language Model
We assume:
1. Each word is conditionally independent of the words outside of its ancestor sequence, given the ancestor sequence.
2. The words are independent of the grammatical labels.
Dependency Language Model
Let $w_1, \ldots, w_n$ be a breadth-first enumeration of the words in the dependency tree.
Under our assumptions, using the chain rule, we have:
$P(w_1 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid A(w_i))$
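Both ingredients of the decomposition, the ancestor sequence and the breadth-first enumeration, can be computed directly from a tree stored as a child-to-head map. The parse fragment below is a toy illustration, not the output of a real parser:

```python
from collections import deque

# Toy fragment of the example sentence's dependency tree (child -> head);
# the exact attachments are assumed for illustration.
heads = {"we": "found", "problem": "found", "a": "problem",
         "formidable": "problem", "as": "formidable"}

def ancestors(word):
    """Ancestor sequence A(w): head, head's head, ..., up to the root."""
    seq = []
    while word in heads:
        word = heads[word]
        seq.append(word)
    return seq

def bfs_words(root):
    """Breadth-first enumeration of the words in the tree."""
    children = {}
    for child, head in heads.items():
        children.setdefault(head, []).append(child)
    order, queue = [], deque([root])
    while queue:
        w = queue.popleft()
        order.append(w)
        queue.extend(children.get(w, []))
    return order
```

Breadth-first order guarantees that every word appears after all of its ancestors, so each conditional $P(w_i \mid A(w_i))$ only conditions on words already enumerated.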
Dependency Language Model
Markov Assumption:
$P(w_i \mid A(w_i)) \approx P(w_i \mid A^{(N-1)}(w_i))$
where $A^{(N-1)}(w)$ is the sequence of the (N – 1) closest ancestors of $w$.
This leads to:
$P(w_1 \ldots w_n) \approx \prod_{i=1}^{n} P(w_i \mid A^{(N-1)}(w_i))$
Training Dependency LM
Dependency parse a large corpus.
Count sequences in dependency tree.
Estimate probability by maximum likelihood estimator:
$\tilde{P}(w_i \mid A^{(N-1)}(w_i)) = \frac{\#\text{observations}(A^{(N-1)}(w_i), w_i)}{\#\text{observations}(A^{(N-1)}(w_i))}$
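The training procedure can be sketched as follows, representing each parsed sentence as a child-to-head dictionary (a simplification of real parser output):

```python
from collections import defaultdict

def train_dependency_lm(trees, n):
    """MLE over (ancestor context, word) observations.
    Each tree maps child -> head; the context of a word is the
    sequence of its N-1 closest ancestors."""
    num, den = defaultdict(int), defaultdict(int)
    for heads in trees:
        for word in heads:
            ctx, h = [], word
            while h in heads and len(ctx) < n - 1:
                h = heads[h]
                ctx.append(h)
            num[(tuple(ctx), word)] += 1
            den[tuple(ctx)] += 1

    def prob(ctx, word):
        c = den[tuple(ctx)]
        return num[(tuple(ctx), word)] / c if c else 0.0

    return prob

# Toy "corpus" of one parsed sentence fragment.
trees = [{"we": "found", "problem": "found", "formidable": "problem"}]
p = train_dependency_lm(trees, 2)  # order-2 dependency LM
```

For the toy tree, the context `("found",)` is observed twice (for "we" and "problem"), so each gets probability 0.5. As with N-gram models, the raw estimate gives zero to unseen observations, so smoothing is needed in practice.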
Using Labels
Our model assigns the same probability to "an apple ate you" and "you ate an apple".
Using Labels
Solution: incorporate labels
Assume that each word/label pair is conditionally independent of the rest of the tree given the words/labels in its ancestor sequence.
Use maximum likelihood estimators.
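The labelled estimator has the same shape as the unlabelled one, with (word, label) pairs in place of words. The label names (`SBJ`, `OBJ`, `ROOT`) and the flat observation format below are assumptions for illustration; a real system would extract the observations from labelled parses:

```python
from collections import defaultdict

def labelled_mle(observations):
    """MLE for P((word, label) | labelled ancestor context).
    observations: iterable of (context, (word, label)) pairs, where the
    context is a tuple of (word, label) ancestor pairs."""
    num, den = defaultdict(int), defaultdict(int)
    for ctx, pair in observations:
        num[(ctx, pair)] += 1
        den[ctx] += 1

    def prob(ctx, pair):
        return num[(ctx, pair)] / den[ctx] if den[ctx] else 0.0

    return prob

# "you ate an apple": "you" attaches to "ate" as subject, "apple" as object.
obs = [((("ate", "ROOT"),), ("you", "SBJ")),
       ((("ate", "ROOT"),), ("apple", "OBJ"))]
p = labelled_mle(obs)
```

Because "an apple ate you" would produce the opposite attachments (("apple", "SBJ") and ("you", "OBJ")), the two sentences no longer receive the same score.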
Microsoft Research Sentence Completion Challenge
1040 sentence completion problems, each with 5 possible answers
Training data set of 520 19th century novels
Parsed with MaltParser; trained unlabelled and labelled order-N dependency language models for N = 2, 3, 4, 5
Microsoft Research Sentence Completion Challenge
Best result of any method apart from Neural Networks
Conclusion
Developed two new language models based on dependency grammar
Competitive results on MSR Sentence Completion Challenge
Questions?