Jan 18, 2016

Transcript
Page 1: Persian Part Of Speech Tagging

1

Persian Part Of Speech Tagging

Mostafa Keikha

Database Research Group (DBRG)

ECE Department, University of Tehran

Page 2: Persian Part Of Speech Tagging

2

Decision Trees

Decision Tree (DT): a tree where the root and each internal node is labeled with a question. The arcs represent the possible answers to the associated question. Each leaf node represents a prediction of a solution to the problem. Decision trees are a popular technique for classification; the leaf node indicates the class to which the corresponding tuple belongs.

Page 3: Persian Part Of Speech Tagging

3

Decision Tree Example

Page 4: Persian Part Of Speech Tagging

4

Decision Trees

A decision tree model is a computational model consisting of three parts: the tree itself, an algorithm to create the tree, and an algorithm that applies the tree to data.

Creation of the tree is the most difficult part. Processing is basically a search similar to that in a binary search tree (although a DT need not be binary).
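The search-like application of a tree can be sketched in Python; the tree, its questions, and the tag labels below are hypothetical examples, not taken from the slides:

```python
# Minimal sketch of applying a decision tree: walk from the root,
# answering each node's question, until a leaf (class label) is reached.
# The tree below is invented for illustration.

tree = {
    "question": "ends_with_ha",           # question asked at this node
    "branches": {                          # one arc per possible answer
        True:  "N",                        # leaf: predicted tag
        False: {
            "question": "follows_verb",
            "branches": {True: "ADV", False: "N"},
        },
    },
}

def classify(node, features):
    """Search down the tree, like a (not necessarily binary) BST lookup."""
    while isinstance(node, dict):          # internal node: ask its question
        answer = features[node["question"]]
        node = node["branches"][answer]    # follow the matching arc
    return node                            # leaf = predicted class

print(classify(tree, {"ends_with_ha": False, "follows_verb": True}))  # ADV
```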

Page 5: Persian Part Of Speech Tagging

5

Decision Tree Algorithm

Page 6: Persian Part Of Speech Tagging

6

Using DT in POS Tagging

Compute ambiguity classes:
- Each term may have different tags.
- The ambiguity class of a term is the set of all its possible tags.
- Compute the number of occurrences of each tag in each ambiguity class.

Ambiguity Class | # of occurrences
a b c d         | 10 20 25 40
b c d           | 40 39 50
b d             | 60 55
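The two steps above can be sketched on a toy tagged corpus; the words and single-letter tags below are invented for illustration (the slides' real data is a Persian corpus):

```python
from collections import Counter, defaultdict

# Toy tagged corpus of (word, tag) pairs, invented for illustration.
corpus = [("mi", "b"), ("mi", "d"), ("mi", "b"), ("ke", "a"),
          ("ke", "c"), ("ke", "b"), ("ke", "d"), ("ra", "b")]

# Ambiguity class of a term = the set of all tags it was seen with.
tags_per_word = defaultdict(set)
for word, tag in corpus:
    tags_per_word[word].add(tag)

# Count occurrences of each tag within each ambiguity class.
class_counts = defaultdict(Counter)
for word, tag in corpus:
    amb_class = tuple(sorted(tags_per_word[word]))
    class_counts[amb_class][tag] += 1

print(dict(class_counts[("b", "d")]))   # {'b': 2, 'd': 1}
```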

Page 7: Persian Part Of Speech Tagging

7

Using DT in POS Tagging

Create a decision tree on the ambiguity classes: at each level, delete the tag with the minimum number of occurrences.

a b c d (10 20 25 40) → b c d (40 39 50) → b d (60 55) → b
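The level-by-level deletion can be sketched as follows; the per-level counts are copied from the slide's example (after each deletion, the counts are re-tallied over the reduced class):

```python
# At each level of the tree, delete the tag with the minimum
# occurrence count. Counts per level are the slide's example values,
# recomputed after each deletion.
levels = [
    {"a": 10, "b": 20, "c": 25, "d": 40},   # a b c d
    {"b": 40, "c": 39, "d": 50},            # b c d
    {"b": 60, "d": 55},                     # b d
]

path = []                                    # tags deleted, in order
for counts in levels:
    weakest = min(counts, key=counts.get)    # tag with minimum occurrence
    path.append(weakest)

survivor = (set(levels[-1]) - set(path)).pop()
print(path, survivor)   # ['a', 'c', 'd'] b
```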

Page 8: Persian Part Of Speech Tagging

8

Using DT in POS Tagging

Advantages:
- Easy to understand
- Easy to implement

Disadvantage:
- Context independent

Page 9: Persian Part Of Speech Tagging

9

Using DT in POS Tagging

Known Tokens Results

Run     | % of tokens | Tokens   | Correct  | Accuracy
1       | 97.97       | 393923   | 363764   | 92.34%
2       | 98.06       | 355630   | 328965   | 92.50%
3       | 97.96       | 397528   | 367789   | 92.51%
4       | 97.92       | 410561   | 381578   | 92.94%
5       | 97.97       | 403079   | 372305   | 92.36%
Average | 97.976      | 392144.2 | 362880.2 | 92.474%

Page 10: Persian Part Of Speech Tagging

11

POS tagging using HMMs

Let W be a sequence of words W = w1 , w2 , … , wn

Let T be the corresponding tag sequence T = t1 , t2 , … , tn

Task: find the T that maximizes P(T | W):

T' = argmax_T P(T | W)

Page 11: Persian Part Of Speech Tagging

12

POS tagging using HMMs

By Bayes' rule,

P(T | W) = P(W | T) * P(T) / P(W)

Since P(W) does not depend on T, it can be dropped from the argmax:

T' = argmax_T P(W | T) * P(T)

Transition Probability,

P ( T ) = P ( t1 ) * P ( t2 | t1 ) * P ( t3 | t1 t2 ) …… * P ( tn | t1 … tn-1 )

Applying Tri-gram approximation,

P ( T ) = P ( t1 ) * P ( t2 | t1 ) * P ( t3 | t1 t2 ) …… * P ( tn | tn-2 tn-1 )

Introducing a dummy tag, $, to represent the beginning of a sentence,

P ( T ) = P ( t1 | $ ) * P ( t2 | $ t1 ) * P ( t3 | t1 t2 ) …… * P ( tn | tn-2 tn-1 )
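The trigram factorization with $ padding can be sketched directly; the transition probabilities below are toy values invented for illustration (a real model estimates them from a corpus):

```python
import math

# Toy trigram transition model P(t3 | t1, t2), invented for illustration.
trigram = {
    ("$", "$", "N"): 0.5,
    ("$", "N", "V"): 0.4,
    ("N", "V", "N"): 0.3,
}

def tag_sequence_prob(tags, trigram):
    """P(T) = prod_i P(t_i | t_{i-2}, t_{i-1}), with '$' padding the start."""
    padded = ["$", "$"] + list(tags)
    logp = 0.0                                   # sum logs to avoid underflow
    for i in range(2, len(padded)):
        logp += math.log(trigram[(padded[i-2], padded[i-1], padded[i])])
    return math.exp(logp)

p = tag_sequence_prob(["N", "V", "N"], trigram)
print(round(p, 3))   # 0.5 * 0.4 * 0.3 = 0.06
```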

Page 12: Persian Part Of Speech Tagging

13

POS tagging using HMMs

Smoothing Transition Probabilities

Sparse data problem

Linear interpolation method

P'(ti | ti-2 , ti-1) = λ1 P(ti) + λ2 P(ti | ti-1) + λ3 P(ti | ti-2 , ti-1)

such that the λs sum to 1
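The interpolation formula can be sketched directly; the λ values below are illustrative only (the next slide concerns how they are estimated):

```python
# Linear interpolation of unigram, bigram, and trigram tag probabilities.
# The lambdas must sum to 1; the values here are invented for illustration.
LAMBDAS = (0.1, 0.3, 0.6)   # λ1, λ2, λ3

def smoothed_transition(p_uni, p_bi, p_tri, lambdas=LAMBDAS):
    """P'(ti | ti-2, ti-1) = λ1*P(ti) + λ2*P(ti | ti-1) + λ3*P(ti | ti-2, ti-1)."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9, "lambdas must sum to 1"
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

print(round(smoothed_transition(0.2, 0.5, 0.0), 2))   # 0.1*0.2 + 0.3*0.5 = 0.17
```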

Page 13: Persian Part Of Speech Tagging

14

POS tagging using HMMs

Calculation of λs
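The slide's λ-estimation algorithm is not preserved in this transcript. A common method is deleted interpolation (used, e.g., in Brants' TnT tagger); the sketch below follows that approach as an assumption, not necessarily the slides' exact method:

```python
from collections import Counter

def deleted_interpolation(tags):
    """Estimate (λ1, λ2, λ3) from a tag sequence by deleted interpolation."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for i, t in enumerate(tags):
        uni[t] += 1
        if i >= 1:
            bi[(tags[i-1], t)] += 1
        if i >= 2:
            tri[(tags[i-2], tags[i-1], t)] += 1
    n = len(tags)
    lambdas = [0.0, 0.0, 0.0]
    for (t1, t2, t3), c in tri.items():
        # Delete the current trigram once and compare the maximum-likelihood
        # estimates of each order; credit the best-scoring order.
        cases = [
            (uni[t3] - 1) / (n - 1) if n > 1 else 0.0,                   # unigram
            (bi[(t2, t3)] - 1) / (uni[t2] - 1) if uni[t2] > 1 else 0.0,  # bigram
            (c - 1) / (bi[(t1, t2)] - 1) if bi[(t1, t2)] > 1 else 0.0,   # trigram
        ]
        lambdas[cases.index(max(cases))] += c
    total = sum(lambdas)
    return tuple(l / total for l in lambdas)   # normalize so the λs sum to 1

lams = deleted_interpolation(["N", "V", "N", "V", "N", "ADJ", "N"])
print(lams, sum(lams))
```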

Page 14: Persian Part Of Speech Tagging

15

POS tagging using HMMs

Emission Probability,

P(W | T ) ≈ P(w1 | t1) * P(w2 | t2) * . . . * P(wn | tn)

Context Dependency

To make the emission probability more dependent on the context, it is calculated as:

P(W | T ) ≈ P(w1 | $ t1) * P(w2 | t1 t2) ...* P(wn | tn-1 tn)

Page 15: Persian Part Of Speech Tagging

16

POS tagging using HMMs

A smoothing technique is applied:

P'(wi | ti-1 ti) = θ1 P(wi | ti) + θ2 P(wi | ti-1 ti), where the θs sum to 1.

The θs are different for different words.
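As a sketch of this emission smoothing (the θ values below are invented for illustration):

```python
# Smoothed emission probability:
# P'(wi | ti-1, ti) = θ1*P(wi | ti) + θ2*P(wi | ti-1, ti), with θ1 + θ2 = 1.
# The θs are chosen per word; the values here are made up.
def smoothed_emission(p_word_given_tag, p_word_given_tagpair, thetas):
    t1, t2 = thetas
    assert abs(t1 + t2 - 1.0) < 1e-9, "thetas must sum to 1"
    return t1 * p_word_given_tag + t2 * p_word_given_tagpair

# A rarer word might lean more on the context-free estimate:
print(round(smoothed_emission(0.01, 0.04, thetas=(0.7, 0.3)), 4))  # 0.019
```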

Page 16: Persian Part Of Speech Tagging

17

POS tagging using HMMs


Page 17: Persian Part Of Speech Tagging

18

POS tagging using HMMs

Page 18: Persian Part Of Speech Tagging

19

POS tagging using HMMs

Page 19: Persian Part Of Speech Tagging

20

POS tagging using HMMs

Lexicon generation probability

Page 20: Persian Part Of Speech Tagging

21

POS tagging using HMMs

Page 21: Persian Part Of Speech Tagging

22

POS tagging using HMMs

P(N V ART N | flies like a flower) = 4.37 × 10^-6
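The worked figures from these slides are not preserved in the transcript; HMM taggers typically recover the argmax tag sequence with the Viterbi algorithm. A minimal bigram sketch, with all probabilities invented for illustration, that selects the sequence N V ART N:

```python
import math

# Toy bigram HMM. All probabilities are invented for illustration;
# a real tagger estimates them from a tagged corpus.
tags = ["N", "V", "ART"]
start = {"N": 0.5, "V": 0.2, "ART": 0.3}
trans = {("N", "V"): 0.4, ("N", "N"): 0.3, ("N", "ART"): 0.3,
         ("V", "ART"): 0.6, ("V", "N"): 0.3, ("V", "V"): 0.1,
         ("ART", "N"): 0.8, ("ART", "V"): 0.1, ("ART", "ART"): 0.1}
emit = {("flies", "N"): 0.02, ("flies", "V"): 0.01,
        ("like", "V"): 0.05, ("like", "N"): 0.001,
        ("a", "ART"): 0.3, ("flower", "N"): 0.01}

def viterbi(words):
    """Return the most probable tag sequence for `words` (log-space)."""
    # best[t] = (log-prob of best path ending in tag t, that path)
    best = {t: (math.log(start[t] * emit.get((words[0], t), 1e-12)), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            p, prev = max(
                (best[s][0] + math.log(trans.get((s, t), 1e-12)), s)
                for s in tags)                       # best predecessor tag
            new[t] = (p + math.log(emit.get((w, t), 1e-12)),
                      best[prev][1] + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

print(viterbi(["flies", "like", "a", "flower"]))   # ['N', 'V', 'ART', 'N']
```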

Page 22: Persian Part Of Speech Tagging

23

POS tagging using HMMs

Known Tokens Results

Run     | % of tokens | Tokens   | Correct | Accuracy
1       | 98.07       | 394290   | 382211  | 96.94%
2       | 98.16       | 345913   | 345913  | 97.18%
3       | 98.04       | 397849   | 343894  | 96.96%
4       | 98.02       | 410970   | 398487  | 96.96%
5       | 98.07       | 403460   | 391475  | 97.03%
Average | 98.072      | 390496.4 | 372396  | 97.01%

Page 23: Persian Part Of Speech Tagging

24

Unknown Tokens Results

Run     | % of tokens | Tokens | Correct | Accuracy
1       | 1.93        | 7760   | 5829    | 75.12%
2       | 1.84        | 6689   | 5357    | 80.09%
3       | 1.96        | 7956   | 6153    | 77.34%
4       | 1.98        | 8283   | 6435    | 77.69%
5       | 1.93        | 7945   | 6246    | 78.62%
Average | 1.928       | 7726.6 | 6004    | 77.77%

Page 24: Persian Part Of Speech Tagging

25

Overall Results

Run     | Tokens   | Correct  | Accuracy
1       | 402050   | 388040   | 96.52%
2       | 362658   | 351270   | 96.86%
3       | 405805   | 391890   | 96.57%
4       | 419253   | 404922   | 96.58%
5       | 411405   | 397721   | 96.67%
Average | 400234.2 | 386768.6 | 96.64%