CS 479, section 1: Natural Language Processing
Lecture #31: Dependency Parsing
Thanks to Joakim Nivre and Sandra Kübler for many of the materials used in this lecture, with additions by Dan Roth.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Project #4: propose (possibly as a team)
Project #5: no proposal – just decide
Proposals – early: today; due: Friday
Note: you must discuss with me before submitting a written proposal
Objectives
Become acquainted with dependency parsing, in contrast to constituent parsing
See the relationship between the two approaches
Understand an algorithm for non-projective dependency parsing
Have a starting point for understanding the rest of the dependency parsing literature
Think about uses of dependency parsing
Your Questions
Big Ideas from McDonald et al., 2006?
Dependency parsing: non-projective vs. projective parse trees; generalization to other languages; labeled vs. unlabeled dependencies
Problem: maximum spanning tree
Algorithm: Chu-Liu-Edmonds; edge scores
Machine learning: MIRA; large-margin learners; online vs. batch learning; feature engineering
Formalization by Lucien Tesnière [Tesnière, 1959]
The idea was known long before (e.g., Pāṇini, India, >2000 years ago)
Studied extensively in the Prague School approach to syntax
(In the US, research was focused more on the constituent formalism)
Phrase Structure (or Constituent Structure)
Constituent vs. Dependency
There are advantages to dependency structures:
natural for free (or semi-free) word order languages
easier to convert to predicate-argument structure
...
But there are drawbacks too: you can try to convert one representation into another, but, in general, these formalisms are not equivalent.
Dependency Structures for NLP Tasks
Most approaches have focused on constituent tree-based features.
But now dependency parsing is in the spotlight:
Machine translation (e.g., Menezes & Quirk, 07)
Summarization and sentence compression (e.g., Filippova & Strube, 08)
Opinion mining (e.g., Lerman et al., 08)
Information extraction, question answering (e.g., Bouma et al., 06)
All these conditions will be violated for the semantic dependency graphs we will consider later.
You can think of projectivity as (related to) planarity: no two arcs may cross.
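As a concrete illustration, here is a minimal Python sketch (not from the lecture; the head-array encoding and the helper name are assumptions for illustration) that tests whether a dependency tree contains crossing arcs:

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i] is the head of word i (words are 1-indexed; index 0 is the
    artificial root, so heads[0] is unused). A tree is projective iff no
    two arcs cross when drawn above the sentence.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # two arcs cross iff one endpoint of the second lies strictly
            # inside the first arc's span and the other lies outside it
            if l1 < l2 < r1 < r2:
                return False
    return True

# arcs (1,3) and (2,4) cross, so this tree is non-projective:
print(is_projective([0, 0, 3, 1, 2]))  # False
```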
Algorithms
Global inference algorithms: graph-based approaches
Transition-based approaches
We will not consider: rule-based systems, constraint satisfaction
Converting to Constituent Formalism
Idea: convert dependency structures to constituent structures; this is easy for projective dependency structures.
Then apply algorithms for constituent parsing to them, e.g., CKY / PCKY.
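To make the conversion idea concrete, here is a minimal sketch under assumptions: a projective tree in head-array form, and one simple bracketing scheme in which each head projects a constituent over itself and its dependents' subtrees (not necessarily the exact scheme from the lecture):

```python
def dep_to_constituents(words, heads):
    """Convert a projective dependency tree to a nested bracketing."""
    children = {i: [] for i in range(len(words) + 1)}
    for d, h in enumerate(heads):
        if d > 0:
            children[h].append(d)

    def build(i):
        # projectivity guarantees the yield of i's subtree is contiguous,
        # so left and right dependents bracket neatly around the head
        parts = [build(d) for d in children[i] if d < i]
        parts.append(words[i - 1])
        parts += [build(d) for d in children[i] if d > i]
        return parts[0] if len(parts) == 1 else parts

    return build(children[0][0])  # assumes a single child of the root

words = ["the", "dog", "barks"]
heads = [0, 2, 3, 0]  # the <- dog <- barks <- ROOT
print(dep_to_constituents(words, heads))  # [['the', 'dog'], 'barks']
```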
Converting to Constituent Formalism
Different independence assumptions lead to different statistical models; both accuracy and parsing time (under dynamic programming) vary.
Features f(i, j) can include dependence on any words in the sentence, i.e., f(i, j, sent).
But the score still decomposes over the edges of the graph: a strong independence assumption.
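The decomposition can be written out in a few lines. This sketch (an illustration with made-up feature templates and weight names, not the lecture's feature set) shows how a tree's score is just a sum of per-edge scores, each of which may look at the whole sentence:

```python
def edge_score(weights, sent, h, d):
    """Score one candidate arc h -> d. sent[0] is the root symbol.
    Features may condition on any words, i.e. f(h, d, sent)."""
    feats = [
        f"head={sent[h]}|dep={sent[d]}",
        f"distance={h - d}",
        f"midword={sent[(h + d) // 2]}",  # context beyond the endpoints
    ]
    return sum(weights.get(f, 0.0) for f in feats)

def tree_score(weights, sent, heads):
    """The score of a whole tree decomposes over its edges --
    the strong independence assumption of graph-based parsing."""
    return sum(edge_score(weights, sent, h, d)
               for d, h in enumerate(heads) if d > 0)
```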
Online Learning: Structured Perceptron
Joint feature representation f(x, y): we will talk about it more later.
Algorithm:
  w ← 0
  for each training example (x, y):
    ŷ = argmax_y′ w · f(x, y′)   ← here we run MST or Eisner's algorithm
    if ŷ ≠ y: w ← w + f(x, y) − f(x, ŷ)
Features are over edges only.
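A minimal training-loop sketch, assuming a black-box decoder (MST / Chu-Liu-Edmonds or Eisner's algorithm, as on the slide) and an edge-factored feature map; the function names and templates are illustrative:

```python
from collections import defaultdict

def features(sent, heads):
    """Joint feature representation f(x, y): since features are over
    edges only, f of a tree is the sum of its per-edge features."""
    feats = defaultdict(float)
    for d, h in enumerate(heads):
        if d > 0:
            feats[f"{sent[h]}->{sent[d]}"] += 1.0
            feats[f"dist={h - d}"] += 1.0
    return feats

def perceptron_train(data, decode, epochs=5):
    """data: list of (sent, gold_heads) pairs; decode(weights, sent)
    runs MST or Eisner's algorithm to return the best-scoring tree."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for sent, gold in data:
            pred = decode(weights, sent)  # argmax over trees
            if pred != gold:
                # standard structured-perceptron update:
                # w += f(x, gold) - f(x, pred)
                for k, v in features(sent, gold).items():
                    weights[k] += v
                for k, v in features(sent, pred).items():
                    weights[k] -= v
    return weights
```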
Parsing Algorithms
Here, when we say parsing algorithm (= derivation order), we often mean a mapping: given a tree, map it to the sequence of actions which creates this tree.
A tree T is equivalent to this sequence of actions d1, ..., dn. Therefore:
P(T) = P(d1, ..., dn) = P(d1) P(d2 | d1) ... P(dn | dn-1, ..., d1)
Ambiguous: sometimes "parsing algorithm" refers to the decoding algorithm used to find the most likely sequence.
You can use classifiers here and search for the most likely sequence.
Most algorithms are restricted to projective structures, but not all.
The algorithm shown here can handle only projective structures.
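The transcript does not preserve the specific transition system from the slides; below is a minimal arc-standard sketch (one common projective system, assumed here for illustration) that parses greedily given an action-choosing function:

```python
def arc_standard_parse(sent, choose_action):
    """Greedy arc-standard parsing: SHIFT moves the front of the queue
    onto the stack; LEFT-ARC / RIGHT-ARC attach the top two stack items.
    This transition system can only produce projective trees."""
    stack, queue = [0], list(range(1, len(sent)))  # index 0 is the root
    heads = [0] * len(sent)
    while queue or len(stack) > 1:
        legal = []
        if queue:
            legal.append("SHIFT")
        if len(stack) > 2:           # the root must never get a head
            legal.append("LEFT-ARC")
        if len(stack) > 1:
            legal.append("RIGHT-ARC")
        action = choose_action(stack, queue, legal)
        if action == "SHIFT":
            stack.append(queue.pop(0))
        elif action == "LEFT-ARC":   # second-from-top becomes dep of top
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        else:                        # RIGHT-ARC: top becomes dep of next
            dep = stack.pop()
            heads[dep] = stack[-1]
    return heads
```

In practice, choose_action is a trained classifier over features of the stack and queue (greedy inference), or the loop is widened into a beam.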
How to learn in this case? Your training examples are collections of parsing contexts, and you want to predict the correct action in each context.
How to define a feature representation of a parsing context? You can think of it in terms of:
the partial tree corresponding to it
the current contents of the queue (Q) and stack (S)
The most important features are the top of S and the front of Q (only between them can you potentially create links).
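A minimal sketch of such a feature function (the templates are illustrative, not the lecture's exact set), reading off the top of the stack, the front of the queue, and the partial tree:

```python
def context_features(sent, pos, stack, queue, heads):
    """Features of a parsing context, defined over the stack (S),
    the queue (Q), and the partial tree built so far (heads)."""
    feats = []
    if stack:
        s0 = stack[-1]                          # top of S
        feats += [f"s0.word={sent[s0]}", f"s0.pos={pos[s0]}"]
        # partial-tree feature: how many dependents s0 already has
        feats.append(f"s0.ndeps={sum(1 for h in heads if h == s0)}")
    if queue:
        q0 = queue[0]                           # front of Q
        feats += [f"q0.word={sent[q0]}", f"q0.pos={pos[q0]}"]
    if stack and queue:
        # the most important pair: new links are created between them
        feats.append(f"s0q0={sent[stack[-1]]}|{sent[queue[0]]}")
    return feats
```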
Inference: greedily, or with beam search
Results: Transition-Based vs. Graph-Based
CoNLL-2006 Shared Task, average over 12 languages (Labeled Attachment Score):
McDonald et al. (MST): 80.27
Nivre et al. (Transitions): 80.19
The results are essentially the same. There is a lot of research in both directions, e.g., latent variable models for transition-based parsing (Titov and Henderson, 07) – the best single-model system in CoNLL-2007 (third overall).
Non-Projective Parsing
Graph-based algorithms (McDonald)
Post-processing of projective algorithms (Hall and Novak, 05)
Transition-based algorithms which handle non-projectivity (Attardi, 06; Titov et al., 08; Nivre et al., 08)
Pseudo-projective parsing: removing non-projective (crossing) links and encoding them in labels (Nivre and Nilsson, 05)
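To make the pseudo-projective idea concrete, here is a rough sketch in the spirit of Nivre and Nilsson (2005), not their exact encoding scheme: each non-projective arc is lifted to the grandparent, and the skipped head's label is recorded in the new arc label so the move can be undone after parsing:

```python
def arc_is_projective(heads, h, d):
    """An arc h -> d is projective iff every word strictly between
    h and d is a descendant of h."""
    for k in range(min(h, d) + 1, max(h, d)):
        a = k
        while a not in (0, h):   # walk up toward the root
            a = heads[a]
        if a != h:
            return False
    return True

def pseudo_projectivize(heads, labels):
    """Lift non-projective arcs to the grandparent, encoding the
    skipped head's label so the lift can be reversed later."""
    heads, labels = list(heads), list(labels)
    lifted = True
    while lifted:
        lifted = False
        for d in range(1, len(heads)):
            h = heads[d]
            if h != 0 and not arc_is_projective(heads, h, d):
                labels[d] = f"{labels[d]}|{labels[h]}"  # encode in label
                heads[d] = heads[h]                     # reattach upward
                lifted = True
    return heads, labels
```

Each lift moves an arc strictly closer to the root, so the loop terminates with a projective tree whose labels carry enough information for approximate de-projectivization.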