CS 479, section 1: Natural Language Processing
Lecture #31: Dependency Parsing
Thanks to Joakim Nivre and Sandra Kübler for many of the materials used in this lecture, with additions by Dan Roth.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Project #4: propose (possibly as a team)
Project #5: no proposal – just decide
Proposals – early: today; due: Friday
Note: you must discuss with me before submitting a written proposal
Objectives
Become acquainted with dependency parsing, in contrast to constituent parsing
See the relationship between the two approaches
Understand an algorithm for non-projective dependency parsing
Have a starting point for understanding the rest of the dependency parsing literature
Think about uses of dependency parsing
Your Questions
Big Ideas from McDonald et al., 2006?
Dependency parsing: non-projective vs. projective parse trees; generalization to other languages; labeled vs. unlabeled dependencies
Problem: maximum spanning tree
Algorithm: Chu-Liu-Edmonds; edge scores
Machine learning: MIRA; large-margin learners; online vs. batch learning; feature engineering
Formalization by Lucien Tesnière [Tesnière, 1959]
The idea was known long before (e.g., Pāṇini, India, >2000 years ago)
Studied extensively in the Prague School approach to syntax
(In the US, research was focused more on the constituent formalism)
Phrase Structure (or Constituent Structure)
Constituent vs. Dependency
There are advantages to dependency structures:
natural for free (or semi-free) word order languages
easier to convert to predicate-argument structure
...
But there are drawbacks too: you can try to convert one representation into another, but, in general, these formalisms are not equivalent.
Dependency Structures for NLP Tasks
Most approaches have focused on constituent tree-based features.
But now dependency parsing is in the spotlight:
Machine translation (e.g., Menezes & Quirk, 07)
Summarization and sentence compression (e.g., Filippova & Strube, 08)
Opinion mining (e.g., Lerman et al., 08)
Information extraction, question answering (e.g., Bouma et al., 06)
All these conditions will be violated for the semantic dependency graphs we will consider later.
You can think of projectivity as (related to) planarity: no two arcs may cross.
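As a concrete illustration, here is a minimal Python sketch (not from the lecture; the head-array encoding and the helper name are assumptions for illustration) that tests whether a dependency tree contains crossing arcs:

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i] is the head of word i (words are 1-indexed; index 0 is the
    artificial root, so heads[0] is unused). A tree is projective iff no
    two arcs cross when drawn above the sentence.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # two arcs cross iff one endpoint of the second lies strictly
            # inside the first arc's span and the other lies outside it
            if l1 < l2 < r1 < r2:
                return False
    return True

# arcs (1,3) and (2,4) cross, so this tree is non-projective:
print(is_projective([0, 0, 3, 1, 2]))  # False
```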
Algorithms
Global inference algorithms: graph-based approaches
Transition-based approaches
We will not consider: rule-based systems, constraint satisfaction
Converting to Constituent Formalism
Idea: convert dependency structures to constituent structures; this is easy for projective dependency structures.
Then apply algorithms for constituent parsing to them, e.g., CKY / PCKY.
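To make the conversion idea concrete, here is a minimal sketch under assumptions: a projective tree in head-array form, and one simple bracketing scheme in which each head projects a constituent over itself and its dependents' subtrees (not necessarily the exact scheme from the lecture):

```python
def dep_to_constituents(words, heads):
    """Convert a projective dependency tree to a nested bracketing."""
    children = {i: [] for i in range(len(words) + 1)}
    for d, h in enumerate(heads):
        if d > 0:
            children[h].append(d)

    def build(i):
        # projectivity guarantees the yield of i's subtree is contiguous,
        # so left and right dependents bracket neatly around the head
        parts = [build(d) for d in children[i] if d < i]
        parts.append(words[i - 1])
        parts += [build(d) for d in children[i] if d > i]
        return parts[0] if len(parts) == 1 else parts

    return build(children[0][0])  # assumes a single child of the root

words = ["the", "dog", "barks"]
heads = [0, 2, 3, 0]  # the <- dog <- barks <- ROOT
print(dep_to_constituents(words, heads))  # [['the', 'dog'], 'barks']
```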
Converting to Constituent Formalism
Different independence assumptions lead to different statistical models; both accuracy and parsing time (under dynamic programming) vary.
Features f(i, j) can include dependence on any words in the sentence, i.e., f(i, j, sent).
But the score still decomposes over the edges of the graph: a strong independence assumption.
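The decomposition can be written out in a few lines. This sketch (an illustration with made-up feature templates and weight names, not the lecture's feature set) shows how a tree's score is just a sum of per-edge scores, each of which may look at the whole sentence:

```python
def edge_score(weights, sent, h, d):
    """Score one candidate arc h -> d. sent[0] is the root symbol.
    Features may condition on any words, i.e. f(h, d, sent)."""
    feats = [
        f"head={sent[h]}|dep={sent[d]}",
        f"distance={h - d}",
        f"midword={sent[(h + d) // 2]}",  # context beyond the endpoints
    ]
    return sum(weights.get(f, 0.0) for f in feats)

def tree_score(weights, sent, heads):
    """The score of a whole tree decomposes over its edges --
    the strong independence assumption of graph-based parsing."""
    return sum(edge_score(weights, sent, h, d)
               for d, h in enumerate(heads) if d > 0)
```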
Online Learning: Structured Perceptron
Joint feature representation f(x, y): we will talk about it more later.
Algorithm:
  w ← 0
  for each training example (x, y):
    ŷ = argmax_y′ w · f(x, y′)   ← here we run MST or Eisner's algorithm
    if ŷ ≠ y: w ← w + f(x, y) − f(x, ŷ)
Features are over edges only.
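A minimal training-loop sketch, assuming a black-box decoder (MST / Chu-Liu-Edmonds or Eisner's algorithm, as on the slide) and an edge-factored feature map; the function names and templates are illustrative:

```python
from collections import defaultdict

def features(sent, heads):
    """Joint feature representation f(x, y): since features are over
    edges only, f of a tree is the sum of its per-edge features."""
    feats = defaultdict(float)
    for d, h in enumerate(heads):
        if d > 0:
            feats[f"{sent[h]}->{sent[d]}"] += 1.0
            feats[f"dist={h - d}"] += 1.0
    return feats

def perceptron_train(data, decode, epochs=5):
    """data: list of (sent, gold_heads) pairs; decode(weights, sent)
    runs MST or Eisner's algorithm to return the best-scoring tree."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for sent, gold in data:
            pred = decode(weights, sent)  # argmax over trees
            if pred != gold:
                # standard structured-perceptron update:
                # w += f(x, gold) - f(x, pred)
                for k, v in features(sent, gold).items():
                    weights[k] += v
                for k, v in features(sent, pred).items():
                    weights[k] -= v
    return weights
```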
Parsing Algorithms
Here, when we say parsing algorithm (= derivation order), we often mean a mapping: given a tree, map it to the sequence of actions which creates this tree.
A tree T is equivalent to this sequence of actions d1, ..., dn. Therefore:
P(T) = P(d1, ..., dn) = P(d1) P(d2 | d1) ... P(dn | dn-1, ..., d1)
Ambiguous: sometimes "parsing algorithm" refers to the decoding algorithm used to find the most likely sequence.
You can use classifiers here and search for the most likely sequence.
Most algorithms are restricted to projective structures, but not all.
The algorithm shown here can handle only projective structures.
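The transcript does not preserve the specific transition system from the slides; below is a minimal arc-standard sketch (one common projective system, assumed here for illustration) that parses greedily given an action-choosing function:

```python
def arc_standard_parse(sent, choose_action):
    """Greedy arc-standard parsing: SHIFT moves the front of the queue
    onto the stack; LEFT-ARC / RIGHT-ARC attach the top two stack items.
    This transition system can only produce projective trees."""
    stack, queue = [0], list(range(1, len(sent)))  # index 0 is the root
    heads = [0] * len(sent)
    while queue or len(stack) > 1:
        legal = []
        if queue:
            legal.append("SHIFT")
        if len(stack) > 2:           # the root must never get a head
            legal.append("LEFT-ARC")
        if len(stack) > 1:
            legal.append("RIGHT-ARC")
        action = choose_action(stack, queue, legal)
        if action == "SHIFT":
            stack.append(queue.pop(0))
        elif action == "LEFT-ARC":   # second-from-top becomes dep of top
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        else:                        # RIGHT-ARC: top becomes dep of next
            dep = stack.pop()
            heads[dep] = stack[-1]
    return heads
```

In practice, choose_action is a trained classifier over features of the stack and queue (greedy inference), or the loop is widened into a beam.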
How to learn in this case? Your training examples are collections of parsing contexts, and you want to predict the correct action in each context.
How to define a feature representation of a parsing context? You can think of it in terms of:
the partial tree corresponding to it
the current contents of the queue (Q) and stack (S)
The most important features are the top of S and the front of Q (only between them can you potentially create links).
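A minimal sketch of such a feature function (the templates are illustrative, not the lecture's exact set), reading off the top of the stack, the front of the queue, and the partial tree:

```python
def context_features(sent, pos, stack, queue, heads):
    """Features of a parsing context, defined over the stack (S),
    the queue (Q), and the partial tree built so far (heads)."""
    feats = []
    if stack:
        s0 = stack[-1]                          # top of S
        feats += [f"s0.word={sent[s0]}", f"s0.pos={pos[s0]}"]
        # partial-tree feature: how many dependents s0 already has
        feats.append(f"s0.ndeps={sum(1 for h in heads if h == s0)}")
    if queue:
        q0 = queue[0]                           # front of Q
        feats += [f"q0.word={sent[q0]}", f"q0.pos={pos[q0]}"]
    if stack and queue:
        # the most important pair: new links are created between them
        feats.append(f"s0q0={sent[stack[-1]]}|{sent[queue[0]]}")
    return feats
```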
Inference: greedily, or with beam search
Results: Transition-Based vs. Graph-Based
CoNLL-2006 Shared Task, average over 12 languages (Labeled Attachment Score):
McDonald et al. (MST): 80.27
Nivre et al. (Transitions): 80.19
The results are essentially the same. There is a lot of research in both directions, e.g., latent variable models for transition-based parsing (Titov and Henderson, 07) – the best single-model system in CoNLL-2007 (third overall).
Non-Projective Parsing
Graph-based algorithms (McDonald)
Post-processing of projective algorithms (Hall and Novak, 05)
Transition-based algorithms which handle non-projectivity (Attardi, 06; Titov et al., 08; Nivre et al., 08)
Pseudo-projective parsing: removing non-projective (crossing) links and encoding them in labels (Nivre and Nilsson, 05)
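To make the pseudo-projective idea concrete, here is a rough sketch in the spirit of Nivre and Nilsson (2005), not their exact encoding scheme: each non-projective arc is lifted to the grandparent, and the skipped head's label is recorded in the new arc label so the move can be undone after parsing:

```python
def arc_is_projective(heads, h, d):
    """An arc h -> d is projective iff every word strictly between
    h and d is a descendant of h."""
    for k in range(min(h, d) + 1, max(h, d)):
        a = k
        while a not in (0, h):   # walk up toward the root
            a = heads[a]
        if a != h:
            return False
    return True

def pseudo_projectivize(heads, labels):
    """Lift non-projective arcs to the grandparent, encoding the
    skipped head's label so the lift can be reversed later."""
    heads, labels = list(heads), list(labels)
    lifted = True
    while lifted:
        lifted = False
        for d in range(1, len(heads)):
            h = heads[d]
            if h != 0 and not arc_is_projective(heads, h, d):
                labels[d] = f"{labels[d]}|{labels[h]}"  # encode in label
                heads[d] = heads[h]                     # reattach upward
                lifted = True
    return heads, labels
```

Each lift moves an arc strictly closer to the root, so the loop terminates with a projective tree whose labels carry enough information for approximate de-projectivization.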