Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing. David Mareček and Zdeněk Žabokrtský, Institute of Formal and Applied Linguistics, Charles University in Prague. September 15, 2011, Hissar, Bulgaria.
Transcript
Page 1: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

David Mareček and Zdeněk Žabokrtský

Institute of Formal and Applied Linguistics, Charles University in Prague

September 15, 2011, Hissar, Bulgaria

Page 2: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Motivations for unsupervised parsing

We want to parse texts for which we do not have any manually annotated treebanks: texts from different domains, different languages.

We want to learn sentence structures from the corpus only. What if the structures produced by linguists are not suitable for NLP? Annotations are expensive.

It’s a challenge: can we beat the supervised techniques in some application?

Page 3: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Outline

Parser description: priors, models, sampling

Sampling constraints: treeness, root fertility, noun-root dependency repression

Evaluation: on the Czech treebank, and on all 19 CoNLL treebanks from the 2006 and 2007 shared tasks

Conclusions

Page 4: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Basic features of our approach

Learning is based on Gibbs sampling.

We approximate the probability of a tree by the product of the probabilities of its individual edges.

We use only POS tags for predicting a dependency relation, but we plan to use lexicalization and unsupervised POS tagging in the future.

We introduce treeness as a hard constraint in the sampling procedure.

The approach allows non-projective edges.
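The edge-factored approximation above can be sketched in a few lines of Python; the `score` function here is a hypothetical stand-in for the edge model described on the next slides:

```python
import math

def tree_log_prob(heads, score):
    """Approximate a tree's probability by the product of its edge
    probabilities, computed in log space for numerical stability.
    heads[i] is the index of node i's parent (-1 for the ROOT)."""
    return sum(math.log(score(child, head)) for child, head in enumerate(heads))
```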

Page 5: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Models

We use two simple models in our experiments:

the parent POS tag conditioned on the child POS tag

the edge length (signed distance between the two words) conditioned on the child POS tag
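A minimal sketch of the two models, assuming Dirichlet-smoothed relative frequencies over the sampler's current counts; the count structures and hyperparameter values here are illustrative assumptions, not the authors' exact setup:

```python
from collections import defaultdict

ALPHA1 = ALPHA2 = 1.0  # Dirichlet hyperparameters (values assumed; set experimentally)

def edge_prob(parent_tag, distance, child_tag, counts, n_tags, n_dists):
    """P(parent_tag | child_tag) * P(signed distance | child_tag),
    each smoothed with a symmetric Dirichlet prior.
    counts = ((parent,child) tag counts, (distance,child) counts, child totals)."""
    tag_c, dist_c, child_c = counts
    p_tag = (tag_c[(parent_tag, child_tag)] + ALPHA1) / (child_c[child_tag] + ALPHA1 * n_tags)
    p_dist = (dist_c[(distance, child_tag)] + ALPHA2) / (child_c[child_tag] + ALPHA2 * n_dists)
    return p_tag * p_dist
```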

Page 6: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Gibbs sampling

We sample each dependency edge independently; 50 iterations.

The rich get richer (self-reinforcing behavior): counts are taken from the history.

Exchangeability: we can treat each edge as if it were the last one in the corpus; the numerators and denominators in the product are exchangeable.

The Dirichlet hyperparameters α1 and α2 were set experimentally.

Page 7: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Basic sampling

For each node, sample its parent with respect to the probability distribution. The sampling order of the nodes is random.

Problem: it may create cycles and discontinuous graphs.
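One pass of this basic sampler might look like the following sketch, without the treeness check introduced on the next slide; `score` again stands in for the edge model:

```python
import random

def gibbs_pass(n, heads, score):
    """One Gibbs iteration over a sentence of n words: visit the nodes in
    random order and resample each node's head from the edge distribution.
    Head -1 denotes the technical ROOT. Cycles are not prevented here."""
    order = list(range(n))
    random.shuffle(order)
    for child in order:
        candidates = [h for h in range(-1, n) if h != child]
        weights = [score(child, h) for h in candidates]
        heads[child] = random.choices(candidates, weights=weights, k=1)[0]
    return heads
```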

[Figure: the example sentence "ROOT Její dcera byla včera v zoologické zahradě." ("Her daughter was at the zoo yesterday.") with sampled candidate edges and their probabilities.]

Page 8: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Treeness constraint

In case a cycle is created:

choose one edge in the cycle (by sampling) and delete it

take the formed subtree and attach it to one of the remaining nodes (by sampling)
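The cycle-breaking step can be sketched as follows; the exact sampling distributions for choosing the deleted edge and the new attachment are assumptions, since the slide only says "by sampling":

```python
import random

def find_cycle(heads):
    """Return the set of nodes forming a cycle in the head assignment,
    or None. Following heads from any node either reaches ROOT (-1)
    or repeats a node; the first repeated node lies on the cycle."""
    for start in range(len(heads)):
        seen, node = set(), start
        while node != -1:
            if node in seen:
                cycle, cur = {node}, heads[node]
                while cur != node:
                    cycle.add(cur)
                    cur = heads[cur]
                return cycle
            seen.add(node)
            node = heads[node]
    return None

def break_cycle(heads, cycle, score):
    """Delete one edge inside the cycle (sampled uniformly here) and
    reattach the freed subtree to a node outside the cycle, sampled
    according to the edge model `score`."""
    child = random.choice(sorted(cycle))
    outside = [-1] + [h for h in range(len(heads)) if h not in cycle]
    weights = [score(child, h) for h in outside]
    heads[child] = random.choices(outside, weights=weights, k=1)[0]
    return heads
```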

[Figure: the same example sentence with one cycle edge deleted and the freed subtree reattached; candidate edge probabilities shown.]

Page 9: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Root fertility constraint

Individual phrases tend to be attached to the technical root. A sentence usually has only one word (the main verb) that dominates the others. We constrain the root fertility to be one. If the root has more than one child, we do the resampling:

sample one child that will stay under the root

resample the parents of the other children

[Figure: the example sentence with several phrases attached to the technical root before resampling; candidate edge probabilities shown.]

Page 10: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Noun-ROOT dependency repression

Nouns (especially subjects) often substitute for verbs in the governing positions. The majority of grammars are verbocentric. Nouns can be easily recognized as the most frequent coarse-grained tag category in the corpus.

We add the following model: [formula shown on the slide]

This model is useless when unsupervised POS tagging is used.
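The transcript does not preserve the model's formula; one plausible form, purely as an illustration, multiplies the edge score by a small penalty whenever a noun would hang directly under the ROOT. The noun tag name and the penalty value below are assumptions, not the authors' actual model:

```python
def noun_root_factor(child_tag, head, noun_tag="N", penalty=0.01):
    """Illustrative repression factor applied to an edge score:
    small when a noun is attached to the technical ROOT (head == -1),
    neutral (1.0) otherwise."""
    return penalty if head == -1 and child_tag == noun_tag else 1.0
```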

Page 11: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Evaluation measures

Evaluation of an unsupervised parser on gold data is problematic: many linguistic decisions must be made before annotating a corpus. How to deal with coordination structures, auxiliary verbs, prepositions, subordinating conjunctions?

We use the three following measures:

UAS (unlabeled attachment score) – the standard metric for evaluating dependency parsers

UUAS (undirected unlabeled attachment score) – edge direction is disregarded (it is not a mistake if governor and dependent are switched)

NED (neutral edge direction, Schwartz et al., 2011) – treats not only a node's gold parent and child as the correct answer, but also its gold grandparent

UAS < UUAS < NED
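The three measures can be sketched directly from their definitions, with heads indexed from 0 and the ROOT as -1:

```python
def uas(gold, pred):
    """Unlabeled attachment score: fraction of nodes with the correct head."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def uuas(gold, pred):
    """Undirected UAS: also correct when governor and dependent are
    swapped, i.e. the predicted head is a gold child of the node."""
    ok = sum(gold[i] == p or (p >= 0 and gold[p] == i) for i, p in enumerate(pred))
    return ok / len(gold)

def ned(gold, pred):
    """Neutral edge direction (Schwartz et al., 2011): the node's gold
    parent, a gold child, or the gold grandparent all count as correct."""
    ok = 0
    for i, p in enumerate(pred):
        grand = gold[gold[i]] if gold[i] >= 0 else None
        ok += gold[i] == p or (p >= 0 and gold[p] == i) or p == grand
    return ok / len(gold)
```

On any parse, UAS ≤ UUAS ≤ NED, since each measure accepts a superset of the previous one's answers.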

Page 12: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Evaluation on Czech

Czech dependency treebank from the CoNLL 2007 shared task; punctuation removed; sentences of at most 15 words.

Configuration UAS UUAS NED

Random baseline 12.0 19.9 27.5

LeftChain baseline 30.2 53.6 67.2

RightChain baseline 25.5 52.0 60.6

Base 36.7 50.1 55.1

Base+Treeness 36.2 46.6 50.0

Base+Treeness+RootFert 41.2 58.6 70.8

Base+Treeness+RootFert+NounRootRepression 49.8 62.6 73.0

Page 13: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Error analysis for Czech

Many errors are caused by reversed dependencies: preposition – noun, subordinating conjunction – verb.

Page 14: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Evaluation on 19 CoNLL languages

We have taken the dependency treebanks from the CoNLL shared tasks 2006 and 2007.

POS tags from the fifth column were used. The parsing was run on the concatenated training and development sets. Punctuation was removed. Evaluation was done on the development sets only.

We compare our results with the state-of-the-art system, which is based on DMV (Spitkovsky et al., 2011).

Page 15: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Evaluation on 19 CoNLL languages

Page 16: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Conclusions

We introduced a new approach to unsupervised dependency parsing.

Even though only a couple of experiments have been done so far, and only POS tags with no lexicalization are used, the results seem to be competitive with the state-of-the-art unsupervised parsers (DMV).

We have a better UAS for 12 languages out of 19. If we do not use noun-root dependency repression, which is useful only with supervised POS tags, we have better scores for 7 languages out of 19.

Page 17: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Future work

We would like to add:

Word fertility model – to model the number of children of each node

Lexicalization – the word forms themselves should be useful

Unsupervised POS tagging – some recent experiments show that using word classes instead of supervised POS tags can improve the parsing accuracy

Page 18: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing

Thank you for your attention.