Dependency Language Models
Joseph Gubbins
Transcript
Page 1: Interview presentation

Dependency Language Models
Joseph Gubbins

Page 2: Interview presentation

Language Models

Assign a probability to a sentence or phrase

$P(w_1 w_2 \ldots w_n)$

Page 3: Interview presentation

Language Models

Are used in:

- Machine translation

Page 4: Interview presentation

Language Models

Are used in:

- Speech recognition

Page 5: Interview presentation

Language Models

Are used in:

- Information Retrieval

- Predictive text entry

- Handwriting recognition

Page 6: Interview presentation

N-gram Language Models

Chain rule decomposition:

$$P(w_1 w_2 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 w_2 \ldots w_{i-1})$$

Assumption: Markov property

$$P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-N+1} \ldots w_{i-1})$$
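To make both steps concrete, here is a minimal Python sketch of a toy bigram model (N = 2); the probabilities are invented for illustration only:

bigram_prob = {
    ("<s>", "we"): 0.4,
    ("we", "found"): 0.3,
    ("found", "ourselves"): 0.2,
    ("ourselves", "</s>"): 0.5,
}

def sentence_probability(words):
    # Chain rule with a first-order Markov assumption:
    # P(w_1 ... w_n) ~= product of P(w_i | w_{i-1}).
    padded = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, word in zip(padded, padded[1:]):
        p *= bigram_prob.get((prev, word), 0.0)  # unseen bigram -> 0
    return p

print(sentence_probability(["we", "found", "ourselves"]))  # 0.4*0.3*0.2*0.5 = 0.012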

Page 7: Interview presentation

N-gram Language Models

Estimate from corpus

Problem: unobserved N-grams receive a probability estimate of zero

Solution: use smoothing techniques

$$\tilde{P}(w_i \mid w_{i-N+1} \ldots w_{i-1}) = \frac{\text{Count}(w_{i-N+1} \ldots w_{i-1}\, w_i)}{\text{Count}(w_{i-N+1} \ldots w_{i-1})}$$
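A sketch of this estimator with add-one (Laplace) smoothing as one representative smoothing technique; the slides do not specify which scheme is used, and the corpus here is a toy:

from collections import Counter

corpus = "we found a problem . we found a solution .".split()
N = 2  # bigram model for brevity

ngrams = Counter(tuple(corpus[i:i + N]) for i in range(len(corpus) - N + 1))
contexts = Counter(tuple(corpus[i:i + N - 1]) for i in range(len(corpus) - N + 1))
vocab = set(corpus)

def p_mle(word, context):
    # Unsmoothed maximum likelihood estimate: zero for unseen N-grams.
    return ngrams[context + (word,)] / contexts[context] if contexts[context] else 0.0

def p_laplace(word, context):
    # Add-one smoothing: every possible N-gram gets a pseudo-count of 1.
    return (ngrams[context + (word,)] + 1) / (contexts[context] + len(vocab))

print(p_mle("problem", ("a",)))      # 1 / 2 = 0.5
print(p_laplace("problem", ("a",)))  # (1 + 1) / (2 + 6) = 0.25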

Page 8: Interview presentation

N-gram Language Models

Weak point of N-gram language models:

Long range syntactic dependencies are ignored

Page 9: Interview presentation

Sentence Completion Problems

Choose the most probable sentence from a list of possibilities

Used in standardised tests such as SAT and GRE

Page 10: Interview presentation

Sentence Completion Problems

When his body had been carried from the cellar, we found ourselves with a problem which was almost as ____ as that with which we had started.

- tall
- loud
- invisible
- quick
- formidable

Source: Microsoft Sentence Completion Challenge


Page 12: Interview presentation

Sentence Completion with 5-grams

5-gram probability: context of 4 words before and after

When his body had been carried from the cellar, we found ourselves with a problem which was almost as ____ as that with which we had started.

-> The relationship between problem and formidable is missed.

Page 13: Interview presentation

Dependency Grammar

Syntactic analysis of sentence

Each word “depends” on another word

For example (represented concretely in the sketch below):

- The subject and object depend on the verb
- Adjectives depend on the words they describe
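A minimal data-structure sketch of these relations; the parse of this toy sentence is hand-written for illustration, not parser output:

# Dependency analysis of the toy sentence "you ate an apple",
# stored as child -> head. The root of the sentence ("ate") has no head.
head = {
    "you": "ate",    # subject depends on the verb
    "apple": "ate",  # object depends on the verb
    "an": "apple",   # determiner depends on its noun
    "ate": None,     # root
}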

Page 14: Interview presentation

Dependency Grammar

Dependency relations form a tree structure

For example, for the sentence:

When his body had been carried from the cellar, we found ourselves with a problem which was almost as formidable as that with which we had started.

Page 15: Interview presentation

Dependency Grammar

[Dependency tree of the example sentence]

Page 16: Interview presentation

Dependency Grammar

On the dependency tree, problem and formidable are adjacent.

-> Idea: create a dependency language model

Page 17: Interview presentation

Dependency Language Model

Model for the “lexicalisation” of a given dependency tree.

Takes inspiration from N-gram language models.

Page 18: Interview presentation

Dependency Language Model

We denote the ancestor sequence of a word w by A(w): the head of w, then the head of that head, and so on up to the root. For example, in the toy tree above, A(an) = (apple, ate).
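A minimal sketch of computing A(w) over the toy head map introduced earlier; the helper name ancestor_sequence is mine, and the toy assumes each word occurs once per sentence:

def ancestor_sequence(word, head):
    # A(w): the head of w, then that word's head, and so on up to the root.
    ancestors = []
    current = head[word]
    while current is not None:
        ancestors.append(current)
        current = head[current]
    return ancestors

head = {"you": "ate", "apple": "ate", "an": "apple", "ate": None}
print(ancestor_sequence("an", head))  # ['apple', 'ate']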

Page 19: Interview presentation

Dependency Language Model

We assume:

1. Each word is conditionally independent of the words outside its ancestor sequence, given the ancestor sequence.

2. The words are independent of the grammatical labels.

Page 20: Interview presentation

Dependency Language Model

Let $w_1, \ldots, w_n$ be a breadth-first enumeration of the words in the dependency tree $T$.

Under our assumptions, using the chain rule, we have

$$P(w_1 \ldots w_n \mid T) = \prod_{i=1}^{n} P(w_i \mid A(w_i))$$
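As a sketch, the decomposition is a straightforward product over the words; probs is a hypothetical lookup table from (word, ancestor tuple) to a probability, ancestor_sequence is the helper above, and because multiplication is commutative, any enumeration order gives the same value:

def tree_probability(words, head, probs):
    # P(w_1 ... w_n | T) = product over i of P(w_i | A(w_i)).
    p = 1.0
    for w in words:
        p *= probs.get((w, tuple(ancestor_sequence(w, head))), 0.0)
    return p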

Page 21: Interview presentation

Dependency Language Model

Markov Assumption:

$$P(w_i \mid A(w_i)) \approx P(w_i \mid A^{(N-1)}(w_i))$$

where $A^{(N-1)}(w)$ is the sequence of the (N - 1) closest ancestors of w.

This leads to:

$$P(w_1 \ldots w_n \mid T) \approx \prod_{i=1}^{n} P(w_i \mid A^{(N-1)}(w_i))$$
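In terms of the ancestor_sequence helper above, the truncated context is just the first N - 1 ancestors:

def truncated_ancestors(word, head, n):
    # A^(N-1)(w): the N-1 ancestors closest to w.
    return ancestor_sequence(word, head)[:n - 1]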

Page 22: Interview presentation

Training Dependency LM

Dependency parse a large corpus.

Count sequences in dependency tree.

Estimate probability by maximum likelihood estimator:

$$\tilde{P}(w_i \mid A^{(N-1)}(w_i)) = \frac{\#\,\text{observations}(A^{(N-1)}(w_i),\, w_i)}{\#\,\text{observations}(A^{(N-1)}(w_i))}$$
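A minimal training sketch under this estimator, assuming the parsed corpus is given as a list of (words, head) pairs, one per sentence; that format and the helper names are mine, and truncated_ancestors is defined above:

from collections import Counter

def train(parsed_corpus, n):
    # Count (context, word) and context observations over the treebank.
    # Toy assumption: words are unique within a sentence (keys of `head`).
    joint, ctx = Counter(), Counter()
    for words, head in parsed_corpus:
        for w in words:
            a = tuple(truncated_ancestors(w, head, n))
            joint[(a, w)] += 1
            ctx[a] += 1
    return joint, ctx

def p_dep(word, ancestors, joint, ctx):
    # Maximum likelihood estimate of P(w | A^(N-1)(w)).
    a = tuple(ancestors)
    return joint[(a, word)] / ctx[a] if ctx[a] else 0.0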

Page 23: Interview presentation

Using Labels

Our model assigns the same probability to "an apple ate you" and to "you ate an apple"

Page 24: Interview presentation

Using Labels

Solution: incorporate labels

Assume that each word/label pair is conditionally independent of the rest of the tree given the words/labels in its ancestor sequence.

Use maximum likelihood estimators.
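One way to realise this, sketched below: pair every ancestor with the grammatical label of its own arc, so that contexts distinguish subjects from objects. The representation and the label names are my assumptions; the slides fix only the conditional-independence assumption:

head  = {"you": "ate", "apple": "ate", "an": "apple", "ate": None}
label = {"you": "SBJ", "apple": "OBJ", "an": "DET", "ate": "ROOT"}

def labelled_context(word, head, label, n):
    # A^(N-1)(w) with each ancestor paired with its own arc label.
    return [(a, label[a]) for a in ancestor_sequence(word, head)[:n - 1]]

# The model then estimates P((w, label[w]) | labelled_context(w)) by
# counting, exactly as in the unlabelled case.
print(labelled_context("an", head, label, 3))  # [('apple', 'OBJ'), ('ate', 'ROOT')]

Because apple carries OBJ in "you ate an apple" but would carry SBJ in "an apple ate you", the two sentences now yield different counts, hence different probabilities.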

Page 25: Interview presentation

Microsoft Research Sentence Completion Challenge

1040 sentence completion problems, each with 5 possible answers

Training set of 520 19th-century novels

Parsed with MaltParser; trained unlabelled and labelled order-N dependency language models for N = 2, 3, 4, 5
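A sketch of how a trained model answers a problem: substitute each candidate into the blank, parse and score the resulting sentence, and return the argmax. Here score_sentence is a hypothetical stand-in for the dependency parser plus the trained model:

def answer(question, candidates, score_sentence):
    # score_sentence(sentence) -> model probability of that sentence
    # (hypothetical wrapper around the parser and the dependency LM).
    return max(candidates,
               key=lambda c: score_sentence(question.replace("____", c)))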

Page 26: Interview presentation

Microsoft Research Sentence Completion Challenge

Best result of any method apart from neural networks

Page 27: Interview presentation

Conclusion

Developed two new language models based on dependency grammar

Competitive results on MSR Sentence Completion Challenge

Questions?