Constraint based Dependency Telugu Parser
Guided by - Dr. Rajeev Sangal, Dr. Dipti Misra, Samar Hussain
Team members - Phani Chaitanya, Ravi Kiran
Apr 02, 2015
Overview
• Motivation
• A word about the language
• Overview of constraint based parser
• Analysis of special cases
  – Genitives
  – Copula
  – “ani” construction
  – Conjuncts
• Future work
Motivation
– We set out to build a question answering system in Telugu, mainly for the medical and tourism domains, to help native Telugu speakers as a preliminary diagnosis tool and a travel guide. Building such a system required a parser.
A word about the language
• Telugu is a South Asian language
• Features
  – Morphologically rich
  – Free word order
  – Agglutinative
• Challenges
  – No Treebank
  – No parser
  – No wordnet
Overview of constraint based parser
Telugu : rAmudu iMtiki vaccAka paMdu ni wiMtAdu
Gloss : Rama home after_coming apple eats
English : Ram eats an apple after coming home
Overview of constraint based parser
1 (( NP
1.1 rAmudu NN <af=rAma,n,,,,0,,adj_vAdu,>
))
2 (( NP
2.1 iMtiki NN <af=illu,n,,s,,0,,ki,>
))
3 (( VG
3.1 vaccAka VRB <af=vaccu,v,,,any,0,,ina_Aka,>
))
4 (( NP
4.1 paMdu NN <af=paMdu,n,,s,,0,,0,>|<af=paMdu,n,,s,,0,,obl,>
4.2 ni PREP <af=ni,n,,s,,0,,0,>
))
5 (( VG
5.1 wiMtAdu VFM <af=winu,v,,,3_p,0,,wA,>
5.2 . SYM
))
))
Overview of constraint based parser
1 (( NP Source
1.1 rAmudu NN <af=rAma,n,,,,0,,adj_vAdu,>
))
2 (( NP Source
2.1 iMtiki NN <af=illu,n,,s,,0,,ki,>
))
3 (( VG Demand
3.1 vaccAka VRB <af=vaccu,v,,,any,0,,ina_Aka,>
))
4 (( NP Source
4.1 paMdu NN <af=paMdu,n,,s,,0,,0,>
4.2 ni PREP <af=ni,n,,s,,0,,0,>
))
5 (( VG Demand
5.1 wiMtAdu VFM <af=winu,v,,,3_p,0,,wA,>
5.2 . SYM
))
))
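The Demand/Source marking above can be sketched in a few lines of Python. This is only an illustration, not the parser's actual code: the names `parse_ssf`, `label_groups` and `ROLE` are hypothetical, and the rule that every VG chunk is a demand group and every NP chunk a source group is an assumption read off this one example.

```python
import re

# Chunked input for the example sentence, one SSF row per line.
SSF = """\
1 (( NP
1.1 rAmudu NN <af=rAma,n,,,,0,,adj_vAdu,>
))
2 (( NP
2.1 iMtiki NN <af=illu,n,,s,,0,,ki,>
))
3 (( VG
3.1 vaccAka VRB <af=vaccu,v,,,any,0,,ina_Aka,>
))
4 (( NP
4.1 paMdu NN <af=paMdu,n,,s,,0,,0,>|<af=paMdu,n,,s,,0,,obl,>
4.2 ni PREP <af=ni,n,,s,,0,,0,>
))
5 (( VG
5.1 wiMtAdu VFM <af=winu,v,,,3_p,0,,wA,>
5.2 . SYM
))
"""

CHUNK_OPEN = re.compile(r"^(\d+)\s+\(\(\s+(\S+)")            # e.g. "3 (( VG"
TOKEN = re.compile(r"^(\d+\.\d+)\s+(\S+)\s+(\S+)\s*(.*)$")   # e.g. "3.1 vaccAka VRB <af=...>"

# Assumption for this sketch: chunk tag alone decides the role.
ROLE = {"VG": "Demand", "NP": "Source"}


def parse_ssf(text):
    """Return a list of chunks, each with its tag and its tokens."""
    chunks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        opened = CHUNK_OPEN.match(line)
        if opened:
            current = {"id": int(opened.group(1)), "tag": opened.group(2), "tokens": []}
            chunks.append(current)
            continue
        if line == "))":
            current = None
            continue
        token = TOKEN.match(line)
        if token and current is not None:
            # Alternative feature structures are separated by "|".
            feats = [f for f in token.group(4).split("|") if f]
            current["tokens"].append(
                {"form": token.group(2), "pos": token.group(3), "feats": feats}
            )
    return chunks


def label_groups(chunks):
    """Pair the first word of each chunk with its Demand/Source role."""
    return [(c["tokens"][0]["form"], ROLE[c["tag"]]) for c in chunks]


GROUPS = label_groups(parse_ssf(SSF))
```

For the example sentence, `GROUPS` pairs rAmudu, iMtiki and paMdu with Source, and vaccAka and wiMtAdu with Demand, matching the slide.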
Overview of constraint based parser
Frame for winu (eat; basic form, so no transformation is required)
-------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
-------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k2        | m         | ni       | n       | l    | c
-------------------------------------------------------------------

Frame for vaccu (come)
-------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
-------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k2        | m         | ki       | n       | l    | c
-------------------------------------------------------------------

Transformation chart [ina_Aka (after + -ing)]
----------------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln | op
----------------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c    | remove
vmod      | m         | -        | v       | r    | p    | insert
----------------------------------------------------------------------------

[Figure: demand trees. winu[wA] (eat) takes rAmudu (Ram) and paMdu (fruit) on arcs k1 and k2; vaccu[ina_Aka] (after coming), attached by vmod, takes iMtiki (house) and rAmudu on arcs k2 and k1.]
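Applying a transformation chart to a basic frame can be sketched as follows. This is a minimal illustration under stated assumptions: frame rows are plain tuples in the column order of the tables above, the function name `apply_transformation` is hypothetical, and a "remove" row is matched on its arc label only.

```python
# A frame row: (arc_label, necessity, vibhakti, lextype, posn, reln)
VACCU_FRAME = [
    ("k1", "m", "0", "n", "l", "c"),
    ("k2", "m", "ki", "n", "l", "c"),
]

WINU_FRAME = [
    ("k1", "m", "0", "n", "l", "c"),
    ("k2", "m", "ni", "n", "l", "c"),
]

# Transformation chart for the TAM ina_Aka: each entry is (frame row, operation).
INA_AKA_CHART = [
    (("k1", "m", "0", "n", "l", "c"), "remove"),
    (("vmod", "m", "-", "v", "r", "p"), "insert"),
]


def apply_transformation(frame, chart):
    """Return a new frame with the chart's remove/insert operations applied."""
    result = list(frame)
    for row, op in chart:
        if op == "remove":
            # Assumption: removal is keyed on the arc label alone.
            result = [r for r in result if r[0] != row[0]]
        elif op == "insert":
            result.append(row)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


VACCAKA_FRAME = apply_transformation(VACCU_FRAME, INA_AKA_CHART)
# winu appears in its basic form, so its chart is empty and the frame is unchanged.
WINTADU_FRAME = apply_transformation(WINU_FRAME, [])
```

The transformed frame for vaccAka keeps the k2 row and gains the vmod row, which is what the next slide lists.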
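Overview of constraint based parser
Frame for vaccAka (after transformation)
-------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
-------------------------------------------------------------------
k2        | m         | ki       | n       | l    | c
vmod      | m         | -        | v       | r    | p
-------------------------------------------------------------------

Frame for winu
-------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
-------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k2        | m         | ni       | n       | l    | c
-------------------------------------------------------------------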
rAmudu iMtiki vaccAka paMduni wiMtAdu
[Figure: constraint graph over the sentence, with candidate arcs labelled X1:k1, X2:k2, X3:k2 and X4:vmod.]
Overview of constraint based parser
C1 : For each mandatory karaka in the karaka chart of each demand group, there should be exactly one outgoing edge from that demand group labeled by the karaka.
C2 : For each optional or desirable karaka in the karaka chart of each demand group, there should be at most one outgoing edge from that demand group labeled by the karaka.
C3 : There should be exactly one incoming arc into each source group.
Equations formed by applying the above constraints are :
C1 : X1 = 1, X2 = 1, X3 = 1, X4 = 1
C2 : No optional field found
C3 : X1 = 1, X2 = 1, X3 = 1, X4 = 1
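The constraints above can be checked mechanically. The sketch below generates candidate arcs by matching each frame row against the words of the sentence and then keeps every 0/1 assignment that satisfies C1, C2 and C3. Several things here are assumptions made for illustration: the vibhakti values assigned to each word, the tuple layout, the function names, and the brute-force enumeration, which merely stands in for whatever solver the actual system uses.

```python
from itertools import product

# position -> (form, lextype, vibhakti); "0" means no case marker, "-" not applicable.
WORDS = {
    1: ("rAmudu", "n", "0"),
    2: ("iMtiki", "n", "ki"),
    3: ("vaccAka", "v", "-"),
    4: ("paMduni", "n", "ni"),
    5: ("wiMtAdu", "v", "-"),
}

# Demand position -> frame rows (arc_label, necessity, vibhakti, lextype, posn, reln),
# taken from the frames after transformation.
FRAMES = {
    3: [("k2", "m", "ki", "n", "l", "c"), ("vmod", "m", "-", "v", "r", "p")],
    5: [("k1", "m", "0", "n", "l", "c"), ("k2", "m", "ni", "n", "l", "c")],
}

# Source groups: every group that needs exactly one incoming arc (C3).
SOURCES = [1, 2, 3, 4]


def candidate_edges(words, frames):
    """Match each frame row against every other word; return (child, label, head, demand)."""
    edges = []
    for demand, rows in frames.items():
        for label, _necessity, vibhakti, lextype, posn, reln in rows:
            for other, (_form, other_type, other_vibhakti) in words.items():
                if other == demand:
                    continue
                if other_type != lextype or other_vibhakti != vibhakti:
                    continue
                if (posn == "l") != (other < demand):
                    continue  # word is on the wrong side of the demand group
                # reln "c": the matched word is the child; "p": it is the parent.
                child, head = (other, demand) if reln == "c" else (demand, other)
                edges.append((child, label, head, demand))
    return edges


def satisfies(chosen, frames, sources):
    for demand, rows in frames.items():
        for label, necessity, *_rest in rows:
            count = sum(1 for _c, lab, _h, d in chosen if d == demand and lab == label)
            if necessity == "m" and count != 1:   # C1: exactly one
                return False
            if necessity != "m" and count > 1:    # C2: at most one
                return False
    # C3: exactly one incoming arc per source group
    return all(sum(1 for c, _lab, _h, _d in chosen if c == s) == 1 for s in sources)


def solve(words, frames, sources):
    edges = candidate_edges(words, frames)
    parses = []
    for bits in product((0, 1), repeat=len(edges)):
        chosen = [e for e, bit in zip(edges, bits) if bit]
        if satisfies(chosen, frames, sources):
            parses.append(sorted((c, lab, h) for c, lab, h, _d in chosen))
    return parses


PARSES = solve(WORDS, FRAMES, SOURCES)
```

For this sentence the matching step yields exactly four candidate arcs, and the only assignment that satisfies all three constraints selects all of them, in line with the equations above.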
Overview of constraint based parser
1 (( NP <af=rAma,n,,,,0,,adj_vAdu,/drel=k1:5/name=1>
1.1 rAmudu NN <af=rAma,n,,,,0,,adj_vAdu,>
))
2 (( NP <af=illu,n,,s,,0,,ki,/drel=k2:3/name=2>
2.1 iMtiki NN <af=illu,n,,s,,0,,ki,>
))
3 (( VG <af=vaccu,v,,,any,0,,ina_Aka,/drel=vmod:5/name=3>
3.1 vaccAka VRB <af=vaccu,v,,,any,0,,ina_Aka,>
))
4 (( NP <af=paMdu,n,,s,,0,,0,/drel=k2:5/name=4>
4.1 paMdu NN <af=paMdu,n,,s,,0,,0,>|<af=paMdu,n,,s,,0,,obl,>
4.2 ni PREP <af=ni,n,,s,,0,,0,>
))
5 (( VG <af=winu,v,,,3_p,0,,wA,/name=5>
5.1 wiMtAdu VFM <af=winu,v,,,3_p,0,,wA,>
5.2 . SYM
))
))
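The dependency arcs can be read back out of the annotated chunks shown above. The following is a small sketch, not part of the original system: the function name `read_dependencies` is hypothetical, and it assumes the chunk attributes are separated by "/" and written as key=value, as on this slide.

```python
import re

# Chunk-opening rows from the parser output for the example sentence.
OUTPUT = [
    "1 (( NP <af=rAma,n,,,,0,,adj_vAdu,/drel=k1:5/name=1>",
    "2 (( NP <af=illu,n,,s,,0,,ki,/drel=k2:3/name=2>",
    "3 (( VG <af=vaccu,v,,,any,0,,ina_Aka,/drel=vmod:5/name=3>",
    "4 (( NP <af=paMdu,n,,s,,0,,0,/drel=k2:5/name=4>",
    "5 (( VG <af=winu,v,,,3_p,0,,wA,/name=5>",
]

CHUNK_FS = re.compile(r"\(\(\s+(\S+)\s+<([^>]*)>")


def read_dependencies(lines):
    """Map each chunk name to (label, head name), or None for the root."""
    deps = {}
    for line in lines:
        match = CHUNK_FS.search(line)
        if not match:
            continue
        attrs = {}
        for part in match.group(2).split("/"):
            key, _, value = part.partition("=")
            attrs[key.strip()] = value.strip()  # tolerate "drel = k2:3"
        if "drel" in attrs:
            label, _, head = attrs["drel"].partition(":")
            deps[attrs["name"]] = (label, head)
        else:
            deps[attrs["name"]] = None  # no drel: this chunk is the root
    return deps


DEPS = read_dependencies(OUTPUT)
```

Here `DEPS` maps chunks 1 to 4 to their labelled heads and chunk 5 (wiMtAdu) to `None`, marking it as the root of the tree.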
Analysis of special cases
• Genitives
• Copula
• “ani” construction
• Conjuncts
Genitives
• The genitive is the case that marks a noun as the possessor of another noun (e.g. his, her, its).
• Case 1 : the genitive marker exists
  – Telugu : rAmudi yoVkka puswakaM
  – Gloss : ram 's book
  – When the marker is present, the analysis is straightforward: the noun preceding “yoVkka” holds an r6 relation with the noun following “yoVkka”.
• Case 2 : the genitive marker is dropped
  – Telugu : rAmudi puswakaM
  – Gloss : ram book
  – Here the suffix “udi” in “rAmudi” signals the genitive.
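The two cases above can be sketched as a simple rule over a list of words. This is an illustration only: the function name `genitive_arcs` is hypothetical, the suffix test is a plain string check on the WX form, and the sketch assumes without checking that the following word is a noun.

```python
GENITIVE_MARKER = "yoVkka"
OBLIQUE_SUFFIX = "udi"   # as in "rAmudi"


def genitive_arcs(words):
    """Return (possessor, "r6", possessed) arcs found by the two rules."""
    arcs = []
    for i, word in enumerate(words[:-1]):
        following = words[i + 1]
        if following == GENITIVE_MARKER:
            # Case 1: explicit marker between possessor and possessed noun.
            if i + 2 < len(words):
                arcs.append((word, "r6", words[i + 2]))
        elif word != GENITIVE_MARKER and word.endswith(OBLIQUE_SUFFIX):
            # Case 2: marker dropped, the suffix alone signals the genitive.
            arcs.append((word, "r6", following))
    return arcs


WITH_MARKER = genitive_arcs(["rAmudi", "yoVkka", "puswakaM"])
WITHOUT_MARKER = genitive_arcs(["rAmudi", "puswakaM"])
```

Both calls yield a single r6 arc from rAmudi to puswakaM. A word with neither the marker nor the suffix produces no arc under these rules.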
Genitive contd..
• Exceptions where the genitive marker can be dropped
  – Telugu : raGu puswakaM rAmudiki icCAdu
  – Gloss : Raghu book to_Ram gave
  – English (sense 1) : Raghu gave the book to Ram.
  – English (sense 2) : Raghu’s book is given to Ram.
• For nouns such as Raghu and Sita, which lack the masculine suffix seen in “rAmudu”, Telugu has no marker for the genitive, so the form alone cannot separate the two senses.
• So we output all possible parses for this case. The parses include
  – Parse 1 : icCAdu takes raGu (k1), puswakam (k2) and rAmudiki (k4)
  – Parse 2 : icCAdu takes puswakam (k2) and rAmudiki (k4); puswakam takes raGu (r6)
Copula
• Examples : is, are, were, etc.
• The copula is generally dropped in Telugu. For example:
  – Telugu : rAmudu maMci bAludu
  – Gloss : Ram good boy
  – English : Ram is a good boy.
• We handle these cases by introducing a “NULL_VG”.

Frame for NULL_VG
-------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
-------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k1s       | m         | 0        | n       | l    | c
-------------------------------------------------------------------
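Inserting the NULL_VG can be sketched as a pre-processing step on the chunk list. The details here are assumptions for illustration: chunks are (tag, words) pairs, the function name `add_null_vg` is hypothetical, and the NULL_VG is placed at the end of the sentence because both rows of its frame look to the left (posn l).

```python
def add_null_vg(chunks):
    """Append an empty NULL_VG chunk when the sentence has no verb group."""
    if any(tag == "VG" for tag, _words in chunks):
        return list(chunks)
    return list(chunks) + [("NULL_VG", [])]


COPULA_SENTENCE = [("NP", ["rAmudu"]), ("NP", ["maMci", "bAludu"])]
WITH_NULL_VG = add_null_vg(COPULA_SENTENCE)
```

After this step the NULL_VG acts as the demand group, so rAmudu and bAludu can be attached to it through the k1 and k1s rows of its frame like the arguments of any other verb.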
‘ani’ construction
• ‘ani’ in Telugu is sometimes similar to “that” in English.
• There are three different ways of using “ani” :

Used as complementizer :
• Telugu : rAmudu paMdu wiMtAdu ani mohan ceVppAdu .
• Gloss : Ram fruit will_eat that Mohan said .
• English : Mohan said that Ram will eat a fruit.

Used as verb :
• Telugu : mohan rAmudu paMdu wiMtAdu ani vellipoyAdu .
• English : Mohan left saying that Ram eats an apple.

Used to state a reason :
• Telugu : mohan rAmudu paMdu winnAdani vellipoyAdu .
• Gloss : Mohan Ram fruit had_eaten went .
• English : Mohan went because Ram had eaten the fruit.
“ani” construction Contd …

So we created a demand frame for “ani”.

Frame for ani
-------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
-------------------------------------------------------------------
ccof      | m         | -        | v_fin   | l    | c
ccof      | m         | -        | v_fin   | r    | p
-------------------------------------------------------------------
Conjuncts
• In Telugu, conjuncts occur as suffixes (the TAM of the verb), as DheergAs, and as lexical items such as “inkA”, “anduke”, “mariyu”, “kAni”, “aiwe” and “anwe”.
• Suffixes :
Here, applying the corresponding transformation chart of the verb is enough to handle the case.
  – Telugu : nenu iMtiki velwe nixrapowAnu .
  – Gloss : I home if_go will_sleep .
  – English : I will sleep if I go home.
Contd …
• Lexical items :
Here each lexical entry has its own frame, which handles the corresponding construction.
In case of “mariyu” :

Frame 1 (joins verbs) :
-------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
-------------------------------------------------------------------
ccof      | m         | -        | v       | l    | c
ccof      | m         | -        | v       | r    | c
-------------------------------------------------------------------

Frame 2 (joins nouns) :
-------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
-------------------------------------------------------------------
ccof      | m         | -        | n       | l    | c
ccof      | m         | -        | n       | r    | c
-------------------------------------------------------------------
Contd …
• DheergAs :
Often the vowel at the end of a lexical item is elongated, and this elongation carries the conjunct information implicitly, without an explicit lexical entry such as “mariyu”.
  – Telugu : rAmudU siwA iMtiki vellAru .
  – Gloss : Ram (implicit conj) Sita home went .
  – English : Ram and Sita went home.
In such cases a NULL_CCP is introduced, which serves as an explicit conjunct lexical entry; its frames are similar to the “mariyu” frames on the previous slide.
Future work !!
• A thorough analysis of relative clauses.
• Analysis and handling of NULL verbs in complex constructions.
• Their implementation.
• Verb and TAM classification.
THANKS !!
Any Queries ??