Experiments with a Multilanguage Non-Projective Dependency Parser
Giuseppe Attardi
Dipartimento di Informatica
Università di Pisa
Aims and Motivation
– Efficient parser for use in demanding applications like QA, Opinion Mining
– Can tolerate a small drop in accuracy
– Customizable to the needs of the annotator for the Italian TreeBank
Statistical Parsers
– Probabilistic Generative Model of Language which includes parse structure (e.g. Collins 1997)
– X: set of sentences; Y: set of possible parse trees
– Learn function F: X → Y
– Choose the highest scoring tree as the most plausible:
  F(x) = argmax_{y ∈ GEN(x)} Φ(y) · W
– Involves just learning the weights W
Feature Vector
A set of functions h1 … hd define a feature vector
Φ(x) = ⟨h1(x), h2(x), …, hd(x)⟩
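A minimal sketch of this linear model in Python may help. The feature functions below are toy indicators over token sequences (standing in for parse trees), and `best` enumerates an explicit candidate list in place of GEN; all names here are illustrative, not from the original.

```python
# Sketch of the linear scoring model Phi(x) . W with toy feature functions.

def phi(x):
    # Feature vector Phi(x) = <h1(x), ..., hd(x)>: three toy indicator
    # features over a token sequence x (a stand-in for a parse tree).
    return [
        float(len(x)),                             # h1: number of tokens
        float(sum(1 for w in x if w.istitle())),   # h2: capitalized tokens
        float(sum(1 for w in x if w.isdigit())),   # h3: numeric tokens
    ]

def score(x, w):
    # The score of a candidate is the dot product Phi(x) . W
    return sum(hi * wi for hi, wi in zip(phi(x), w))

def best(candidates, w):
    # F(x) = argmax over an explicit candidate list (GEN(x) in the slides)
    return max(candidates, key=lambda y: score(y, w))
```

Learning then reduces to choosing the weight vector W so that the gold tree outscores the alternatives, which is the "involves just learning weights W" point above.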
Constituent ParsingConstituent Parsing
GENGEN: e.g. CFG: e.g. CFGhhii((xx) ) are based on aspects of the treeare based on aspects of the tree
e.g.e.g.
h(x) = # of times occurs in xA
B C
Dependency Parsing
– GEN generates all possible maximum spanning trees
– First order factorization:
  Φ(y) = ⟨h(0, 1), …, h(n-1, n)⟩
– Second order factorization
Traditional statistical parsers are trained directly on the task of selecting a parse tree for a sentence.
Instead, a Shift/Reduce parser is trained to learn the sequence of parse actions required to build the parse tree.
Grammar Not Required
– A traditional parser requires a grammar for generating candidate trees
– A Shift/Reduce parser needs no grammar
Parsing as Classification
– Parsing based on Shift/Reduce actions
– Learn from an annotated corpus which action to perform at each step
– Proposed by (Yamada and Matsumoto 2003) and (Nivre 2003)
– Uses only local information, but can exploit history
Let R = {r1, …, rm} be the set of permissible dependency types.
A dependency graph for a sequence of words W = w1 … wn is a labeled directed graph D = (W, A), where
(a) W is the set of nodes, i.e. word tokens in the input string,
(b) A is a set of labeled arcs (wi, r, wj), with wi, wj ∈ W and r ∈ R,
(c) for every wj ∈ W, there is at most one arc (wi, r, wj) ∈ A.
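Condition (c) is the single-head constraint, and it is easy to check mechanically. The following is a small sketch (function name and data layout are my own, not from the slides), representing arcs as (wi, r, wj) triples as in the definition above:

```python
# Check condition (c): at most one arc (wi, r, wj) in A for each wj,
# i.e. no token receives two incoming arcs.

def is_single_headed(words, arcs):
    """words: list of tokens; arcs: iterable of (wi, r, wj) triples."""
    incoming = {}
    for wi, r, wj in arcs:
        if wj in incoming:      # a second arc into wj violates (c)
            return False
        incoming[wj] = (wi, r)
    return True
```

A well-formed dependency graph passes this check; a graph giving some token two incoming arcs does not.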
Parser State
The parser state is a quadruple ⟨S, I, T, A⟩, where
– S is a stack of partially processed tokens
– I is a list of (remaining) input tokens
– T is a stack of temporary tokens
– A is the arc relation for the dependency graph
(w, r, h) ∈ A represents an arc w → h, tagged with dependency r.
Which Orientation for Arrows?
– Some authors draw a dependency link as an arrow from dependent to head (Yamada and Matsumoto)
– Some authors draw a dependency link as an arrow from head to dependent (Nivre, McDonald)
– This causes confusion, since actions are termed Left/Right according to the direction of the arrow
Parser ActionsParser Actions
ShiftShiftSS, , nn||II, , TT, , AAnn||SS, , II, , TT, , AA
RightRightss||SS, , nn||II, , TT, , AA
SS, , nn||II, , TT, , AA{({(ss, , rr, , nn)})}
LeftLeftss||SS, , nn||II, , TT, , AA
SS, , ss||II, , TT, , AA{({(nn, , rr, , ss)})}
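The three transitions can be sketched directly in Python (a sketch only: state components are plain lists with the head at index 0, and the dependency label r is passed in explicitly rather than predicted by a classifier):

```python
# The three basic transitions on a state (S, I, T, A).
# S[0] is the top of the stack, I[0] the next input token.

def shift(S, I, T, A):
    # <S, n|I, T, A>  ->  <n|S, I, T, A>
    n = I[0]
    return [n] + S, I[1:], T, A

def right(S, I, T, A, r):
    # <s|S, n|I, T, A>  ->  <S, n|I, T, A u {(s, r, n)}>
    s, n = S[0], I[0]
    return S[1:], I, T, A | {(s, r, n)}

def left(S, I, T, A, r):
    # <s|S, n|I, T, A>  ->  <S, s|I, T, A u {(n, r, s)}>
    s, n = S[0], I[0]
    return S[1:], [s] + I[1:], T, A | {(n, r, s)}
```

Note how Right pops s after attaching it under n, while Left removes n (which has found its head s) and pushes s back onto the input, mirroring the rules above.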
Parser Algorithm
The parsing algorithm is fully deterministic:

Input Sentence: (w1, p1), (w2, p2), …, (wn, pn)
S = <>
I = <(w1, p1), (w2, p2), …, (wn, pn)>
T = <>
A = { }
while I ≠ <> do begin
  x = getContext(S, I, T, A);
  y = estimateAction(model, x);
  performAction(y, S, I, T, A);
end
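The loop above can be sketched as a self-contained Python function. In the real parser `estimateAction` is a trained classifier; here `next_action` is a caller-supplied stand-in (my own naming), and only the Shift/Right/Left actions are handled, to keep the control flow visible:

```python
# Deterministic parsing loop: repeatedly ask a policy for the next action
# and apply it, until the input list I is exhausted.

def parse(tokens, next_action):
    """next_action(S, I, T, A) returns ('shift',), ('right', r) or ('left', r)."""
    S, I, T, A = [], list(tokens), [], set()
    while I:
        act = next_action(S, I, T, A)        # classifier in the real parser
        if act[0] == 'shift':
            S, I = [I[0]] + S, I[1:]
        elif act[0] == 'right':              # arc (s, r, n); pop s
            A.add((S[0], act[1], I[0]))
            S = S[1:]
        else:                                # 'left': arc (n, r, s); s back to I
            A.add((I[0], act[1], S[0]))
            S, I = S[1:], [S[0]] + I[1:]
    return A
```

With a toy policy that shifts when the stack is empty and otherwise attaches the stack top rightward, `parse(["a", "b", "c"], policy)` produces the chain arcs (a, dep, b) and (b, dep, c).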
Learning Phase
Learning Features

Feature  Value
W        word
L        lemma
P        part of speech (POS) tag
M        morphology: e.g. singular/plural
W<       word of the leftmost child node
L<       lemma of the leftmost child node
P<       POS tag of the leftmost child node, if present
M<       whether the leftmost child node is singular/plural
W>       word of the rightmost child node
L>       lemma of the rightmost child node
P>       POS tag of the rightmost child node, if present
M>       whether the rightmost child node is singular/plural
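A hypothetical extraction routine following this table might look as below; the dict layout and "name=value" string encoding are my own assumptions about how such features could be fed to a classifier, not the deck's actual implementation:

```python
# Hypothetical feature extraction for one token and its leftmost/rightmost
# children, producing "name=value" strings for a classifier.

def token_features(tok, leftmost=None, rightmost=None):
    """tok and children are dicts with 'word', 'lemma', 'pos', 'morph' keys."""
    feats = {"W": tok["word"], "L": tok["lemma"],
             "P": tok["pos"], "M": tok["morph"]}
    if leftmost:                       # W<, L<, P<, M< from the table
        feats.update({"W<": leftmost["word"], "L<": leftmost["lemma"],
                      "P<": leftmost["pos"], "M<": leftmost["morph"]})
    if rightmost:                      # W>, L>, P>, M> from the table
        feats.update({"W>": rightmost["word"], "L>": rightmost["lemma"],
                      "P>": rightmost["pos"], "M>": rightmost["morph"]})
    return ["%s=%s" % (k, v) for k, v in sorted(feats.items())]
```

The child features are emitted only when the corresponding child exists, matching the "if present" qualifications in the table.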
Extract ⟨s1|s2|S, n|I, T, A⟩ → ⟨n|s1|S, I, s2|T, A⟩
Insert  ⟨S, I, s1|T, A⟩ → ⟨s1|S, I, T, A⟩
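These two extra transitions can be sketched in the same style as the basic ones (a sketch only; lists stand in for the stacks, with the head at index 0):

```python
# The two transitions used for hard non-projective cases: Extract moves
# the second stack element aside onto T; Insert puts it back on S later.

def extract(S, I, T, A):
    # <s1|s2|S, n|I, T, A>  ->  <n|s1|S, I, s2|T, A>
    s1, s2, rest = S[0], S[1], S[2:]
    n = I[0]
    return [n, s1] + rest, I[1:], [s2] + T, A

def insert(S, I, T, A):
    # <S, I, s1|T, A>  ->  <s1|S, I, T, A>
    s1 = T[0]
    return [s1] + S, I, T[1:], A
```

Extract temporarily parks s2 on the stack T so the tokens around it can be connected; a later Insert returns it to the main stack, which is how the Dutch word-order example below is handled.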
Example
Right2 (nejen → ale) and Left3 (fax → Většinu)
Většinu těchto přístrojů lze také používat nejen jako fax, ale
Example
Většinu těchto přístrojů lze také používat nejen fax ale
jako ,
Examples
zou gemaakt moeten worden in
zou moeten worden gemaakt in
Extract followed by Insert
Effectiveness for Non-Projectivity
– Training data for Czech contains 28081 non-projective relations
– 26346 (93%) can be handled by Left2/Right2
– 1683 (6%) by Left3/Right3
– 52 (0.2%) require Extract/Insert
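The three counts partition the 28081 non-projective relations exactly, which a few lines of arithmetic confirm (the variable names are mine):

```python
# Coverage figures from the slide: each action class's share of the
# 28081 non-projective relations in the Czech training data.

total = 28081
left2_right2 = 26346   # handled by Left2/Right2
left3_right3 = 1683    # handled by Left3/Right3
extract_insert = 52    # require Extract/Insert

share = lambda n: round(100.0 * n / total, 1)
print(share(left2_right2), share(left3_right3), share(extract_insert))
# prints: 93.8 6.0 0.2
```

So the cheap degree-2 actions already cover almost 94% of the non-projective cases, and Extract/Insert is needed only for a tiny residue.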
Experiments
– 3 classifiers: one to decide between Shift/Reduce, one to decide which Reduce action, and a third to choose the dependency in case of a Left/Right action
– 2 classifiers: one to decide which action to perform and a second to choose the dependency
CoNLL-X Shared Task
– To assign labeled dependency structures for a range of languages by means of a fully automatic dependency parser
– Input: tokenized and tagged sentences
– Tags: token, lemma, POS, morpho features, ref. to head, dependency label
– For each token, the parser must output its head and the corresponding dependency relation
Running Maltparser 0.4 on the same Xeon 2.8 GHz machine:
– Training on swedish/talbanken: 390 min
– Test on CoNLL swedish: 13 min
Italian Treebank
Official Announcement:
– CNR ILC has agreed to provide the SI-TAL collection for use at CoNLL
Working on completing the annotation and converting it to CoNLL format
Semi-automated process: heuristics + manual fixup
DgAnnotator
A GUI tool for:
– Annotating texts with dependency relations
– Visualizing and comparing trees
– Generating corpora in XML or CoNLL format
– Exporting DG trees to PNG
Demo
Available at: http://
References
– G. Attardi. 2006. Experiments with a Multilanguage Non-projective Dependency Parser. In Proc. of CoNLL-X.
– H. Yamada, Y. Matsumoto. 2003. Statistical Dependency Analysis with Support Vector Machines. In Proc. of IWPT-2003.
– J. Nivre. 2003. An Efficient Algorithm for Projective Dependency Parsing. In Proc. of IWPT-2003, pages 149–160.