CS 479, section 1: Natural Language Processing

Feb 24, 2016

Transcript
Page 1: CS 479, section 1: Natural Language Processing

CS 479, section 1: Natural Language Processing

Lecture #31: Dependency Parsing

Thanks to Joakim Nivre and Sandra Kübler for many of the materials used in this lecture, with additions by Dan Roth.

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Page 2: CS 479, section 1: Natural Language Processing

Announcements

Final Project: three options
- Propose your own project (possibly as a team)
- Project #4 (no proposal; just decide)
- Project #5 (no proposal; just decide)

Proposals: early submission today; due Friday
Note: you must discuss with me before submitting a written proposal

Page 3: CS 479, section 1: Natural Language Processing

Objectives
- Become acquainted with dependency parsing, in contrast to constituent parsing
- See the relationship between the two approaches
- Understand an algorithm for non-projective dependency parsing
- Have a starting point for understanding the rest of the dependency parsing literature
- Think about uses of dependency parsing

Page 4: CS 479, section 1: Natural Language Processing

Your Questions

Page 5: CS 479, section 1: Natural Language Processing

Big Ideas from McDonald et al., 2006?

Page 6: CS 479, section 1: Natural Language Processing

- Dependency parsing
- Non-projective vs. projective parse trees
- Generalization to other languages
- Labeled vs. unlabeled dependencies
- Problem: Maximum Spanning Tree
- Algorithm: Chu-Liu-Edmonds
- Edge scores
- Machine learning: MIRA
- Large margin learners
- Online vs. batch learning
- Feature engineering

Big Ideas from McDonald et al., 2006?

Page 7: CS 479, section 1: Natural Language Processing

Outline

- Dependency Parsing: Formalism
- Dependency Parsing: algorithms
- Semantic Role Labeling

Dependency Formalism


Page 8: CS 479, section 1: Natural Language Processing


- Formalization by Lucien Tesnière [Tesnière, 1959]
- The idea was known long before (e.g., Pāṇini, India, >2000 years ago)
- Studied extensively in the Prague School approach to syntax
- (In the US, research focused more on the constituent formalism)

Page 9: CS 479, section 1: Natural Language Processing


Page 10: CS 479, section 1: Natural Language Processing


Page 11: CS 479, section 1: Natural Language Processing

(or Constituent Structure)

Page 12: CS 479, section 1: Natural Language Processing
Page 13: CS 479, section 1: Natural Language Processing

Constituent vs. Dependency

There are advantages to dependency structures:
- better suited to languages with free (or semi-free) word order
- easier to convert to predicate-argument structure
- ...

But there are drawbacks too: you can try to convert one representation into the other, but, in general, these formalisms are not equivalent.


Page 14: CS 479, section 1: Natural Language Processing

Dependency structures for NLP tasks

Most approaches have focused on constituent tree-based features, but dependency parsing is now in the spotlight:
- Machine translation (e.g., Menezes & Quirk, 07)
- Summarization and sentence compression (e.g., Filippova & Strube, 08)
- Opinion mining (e.g., Lerman et al., 08)
- Information extraction and question answering (e.g., Bouma et al., 06)


Page 15: CS 479, section 1: Natural Language Processing
Page 16: CS 479, section 1: Natural Language Processing
Page 17: CS 479, section 1: Natural Language Processing

All of these conditions will be violated by the semantic dependency graphs we will consider later.

Page 18: CS 479, section 1: Natural Language Processing

You can think of projectivity as (being related to) planarity.
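The slides do not include code, but as a minimal sketch (my own Python, not from the lecture): with the artificial root placed at position 0, a dependency tree is projective exactly when no two of its arcs cross, which is easy to test directly.

def is_projective(heads):
    """heads[i-1] is the head of word i (1-based); 0 denotes the artificial root.
    With the root at position 0, the tree is projective iff no two arcs cross
    when drawn above the sentence."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            # two arcs cross iff exactly one endpoint of one lies strictly inside the other
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

print(is_projective([2, 3, 0]))      # "the dog barked": projective -> True
print(is_projective([3, 0, 2, 2]))   # the arc 3 -> 1 crosses the root arc -> False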

Page 19: CS 479, section 1: Natural Language Processing

Algorithms

We will consider:
- graph-based approaches (global inference)
- transition-based approaches

We will not consider:
- rule-based systems
- constraint satisfaction


Page 20: CS 479, section 1: Natural Language Processing

Converting to Constituent Formalism

Idea:
- Convert dependency structures to constituent structures (easy for projective dependency structures)
- Apply algorithms for constituent parsing to them, e.g., CKY / PCKY


Page 21: CS 479, section 1: Natural Language Processing

Converting to Constituent Formalism

Different independence assumptions lead to different statistical models; both accuracy and parsing time (with dynamic programming) vary.


Page 22: CS 479, section 1: Natural Language Processing
Page 23: CS 479, section 1: Natural Language Processing

- Features f(i, j) can include dependence on any words in the sentence, i.e., f(i, j, sent)
- But the score still decomposes over edges in the graph
- This is a strong independence assumption
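A minimal sketch of such an edge-factored score (my own Python; the feature templates are toy stand-ins, not the ones from McDonald et al.): the score of a tree is simply the sum of per-edge scores w · f(i, j, sent).

from collections import Counter

def edge_features(h, d, sent):
    """Toy feature function for the arc head -> dependent.
    It may look at any words in the sentence, but only describes this one edge."""
    head = sent[h - 1] if h > 0 else "<ROOT>"
    dep = sent[d - 1]
    return Counter({
        f"head={head}": 1.0,
        f"dep={dep}": 1.0,
        f"pair={head}_{dep}": 1.0,
        f"dist={h - d}": 1.0,
    })

def edge_score(weights, h, d, sent):
    return sum(weights.get(name, 0.0) * value
               for name, value in edge_features(h, d, sent).items())

def tree_score(weights, heads, sent):
    """Edge-factored score: the tree score decomposes into a sum over its arcs."""
    return sum(edge_score(weights, h, d, sent)
               for d, h in enumerate(heads, start=1))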

Page 24: CS 479, section 1: Natural Language Processing

Online Learning: Structured Perceptron

- Joint feature representation: we will talk about it more later
- Algorithm: the decoding step runs the MST or Eisner's algorithm
- Features over edges only
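The algorithm box on the slide is not reproduced in the transcript; the following is a minimal sketch of the usual structured perceptron loop (my own Python; decode stands in for the MST or Eisner decoder, and edge_features is the toy feature function from the sketch above).

from collections import Counter

def tree_features(heads, sent):
    # Sum edge features over all arcs of the tree (edge-factored representation).
    feats = Counter()
    for d, h in enumerate(heads, start=1):
        feats.update(edge_features(h, d, sent))
    return feats

def perceptron_train(corpus, decode, epochs=5):
    """Structured perceptron: decode with the current weights, then update
    towards the gold tree and away from the predicted tree.
    corpus: list of (sent, gold_heads) pairs; decode(weights, sent) returns a
    head list, e.g. via Chu-Liu-Edmonds MST or Eisner's algorithm (assumed given)."""
    weights = Counter()
    for _ in range(epochs):
        for sent, gold_heads in corpus:
            pred_heads = decode(weights, sent)
            if pred_heads != gold_heads:
                # reward features of the gold tree, penalize the predicted one
                weights.update(tree_features(gold_heads, sent))
                weights.subtract(tree_features(pred_heads, sent))
    return weights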

Page 25: CS 479, section 1: Natural Language Processing

Parsing Algorithms

- Here, when we say "parsing algorithm" (= derivation order), we often mean a mapping: given a tree, map it to the sequence of actions that creates this tree.
- A tree T is equivalent to this sequence of actions d1, ..., dn. Therefore P(T) = P(d1, ..., dn) = P(d1) P(d2 | d1) ... P(dn | dn-1, ..., d1).
- The term is ambiguous: sometimes "parsing algorithm" refers to the decoding algorithm for finding the most likely sequence.
- You can use classifiers here and search for the most likely sequence.

Page 26: CS 479, section 1: Natural Language Processing
Page 27: CS 479, section 1: Natural Language Processing

• Most algorithms are restricted to projective structures, but not all

Page 28: CS 479, section 1: Natural Language Processing

It can handle only projective structures

Page 29: CS 479, section 1: Natural Language Processing
Page 30: CS 479, section 1: Natural Language Processing
Page 31: CS 479, section 1: Natural Language Processing
Page 32: CS 479, section 1: Natural Language Processing
Page 33: CS 479, section 1: Natural Language Processing
Page 34: CS 479, section 1: Natural Language Processing
Page 35: CS 479, section 1: Natural Language Processing
Page 36: CS 479, section 1: Natural Language Processing

How to learn in this case?
- Your training examples are collections of parsing contexts; you want to predict the correct actions.
- How to define a feature representation of a context? You can think of it in terms of:
  - the partial tree corresponding to it
  - the current contents of the queue (Q) and the stack (S)
- The most important features are the top of S and the front of Q (only between them can you potentially create links).
- Inference: greedily, or with beam search (see the sketch below).
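A minimal sketch of greedy transition-based parsing in this spirit (my own Python, using the arc-standard transition system; score_action stands in for a trained classifier over features of the stack top and queue front).

def greedy_parse(sent, score_action):
    """Greedy arc-standard parsing.
    sent: list of tokens (1-based positions); returns heads[i-1] = head of word i,
    with 0 for the root. score_action(action, stack, queue, sent) would normally
    be a trained classifier; here it is assumed to be given."""
    heads = [0] * len(sent)
    stack, queue = [0], list(range(1, len(sent) + 1))   # node 0 is the root
    while queue or len(stack) > 1:
        actions = []
        if queue:
            actions.append("SHIFT")                     # push front of Q onto S
        if len(stack) > 1 and stack[-2] != 0:
            actions.append("LEFT-ARC")                  # stack[-2] becomes a dependent of stack[-1]
        if len(stack) > 1:
            actions.append("RIGHT-ARC")                 # stack[-1] becomes a dependent of stack[-2]
        best = max(actions, key=lambda a: score_action(a, stack, queue, sent))
        if best == "SHIFT":
            stack.append(queue.pop(0))
        elif best == "LEFT-ARC":
            dep = stack.pop(-2)
            heads[dep - 1] = stack[-1]
        else:                                           # RIGHT-ARC
            dep = stack.pop()
            heads[dep - 1] = stack[-1]
    return heads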

Page 37: CS 479, section 1: Natural Language Processing

Results: Transition-based vs. Graph-based

CoNLL-2006 Shared Task, average over 12 languages (Labeled Attachment Score):
- McDonald et al. (MST): 80.27
- Nivre et al. (transitions): 80.19

The results are essentially the same. There is a lot of research in both directions, e.g., latent variable models for transition-based parsing (Titov and Henderson, 07), the best single-model system in CoNLL-2007 (third overall).

Page 38: CS 479, section 1: Natural Language Processing

Non-Projective Parsing

- Graph-based algorithms (McDonald)
- Post-processing of projective algorithms (Hall and Novak, 05)
- Transition-based algorithms that handle non-projectivity (Attardi, 06; Titov et al., 08; Nivre et al., 08)
- Pseudo-projective parsing: removing non-projective (crossing) links and encoding them in labels (Nivre and Nilsson, 05)
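For the graph-based route, here is a minimal sketch of the first step of the maximum spanning tree approach (my own Python): each word greedily picks its best-scoring head; full Chu-Liu-Edmonds would then contract any resulting cycle and repeat, which is not shown here.

def greedy_heads(score):
    """First step of MST (Chu-Liu-Edmonds) parsing over a dense score matrix.
    score[h][d] is the score of the arc h -> d; node 0 is the root.
    If the result has no cycles it is already the maximum spanning tree;
    otherwise full Chu-Liu-Edmonds contracts each cycle and recurses."""
    n = len(score)
    heads = [0] * n                      # heads[0] is unused (the root)
    for d in range(1, n):
        heads[d] = max((h for h in range(n) if h != d), key=lambda h: score[h][d])
    return heads

def find_cycle(heads):
    """Return the nodes of some cycle in the chosen head graph, or None if it is a tree."""
    for start in range(1, len(heads)):
        seen, node = [], start
        while node != 0 and node not in seen:
            seen.append(node)
            node = heads[node]
        if node != 0:                    # the walk stopped on a repeated node
            return seen[seen.index(node):]
    return None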

Page 39: CS 479, section 1: Natural Language Processing


Page 40: CS 479, section 1: Natural Language Processing
Page 41: CS 479, section 1: Natural Language Processing
Page 42: CS 479, section 1: Natural Language Processing

Next

- Document clustering
- Unsupervised learning: Expectation Maximization (EM)
- Machine translation! Word alignment, phrase alignment
- Semantics
- Co-reference