Constraint Based Hindi Parser

LTRC, IIIT Hyderabad


Introduction

A broad-coverage parser is crucial for Indian language to Indian language (IL-IL) machine translation (MT) systems, information extraction (IE), co-reference resolution, etc.


Why Dependency?

Phrase structures intrinsically presume a word order. Context-free grammar (CFG) is not well suited to free-word-order languages (Shieber, 1985) and is particularly ill suited to Indian languages.

Dependency structures give flexibility and provide common structures; with appropriate labels, they come closer to semantics.


Computational Paninian Grammar (CPG)

Based on Panini’s grammar (500 BC), which was inspired by an inflectionally rich language (Sanskrit). It provides a dependency-based analysis.


Computational Paninian Grammar (The Basic Framework)

Treats a sentence as a set of modifier-modified relations. A sentence has a primary modified element, or root (generally a verb). The framework lets us identify these relations.

The relations between noun constituents and the verb are called ‘karaka’. Karakas are syntactico-semantic in nature, and syntactic cues help us identify them.


karta and karma karakas: The boy opened the lock

k1: karta; k2: karma

karta and karma usually correspond to agent and theme, but not always.

karakas are direct participants in the activity denoted by the verb.

Dependency tree: open → boy (k1); open → lock (k2)


Basic karaka relations

karta (agent/doer/force): relation label k1
karma (object/patient): relation label k2
karana (instrument): relation label k3
sampradaan (beneficiary): relation label k4
apaadaan (source): relation label k5
adhikarana (location in place/time/other): relation labels k7p/k7t/k7

For the complete list of dependency relations, see Begum et al. (2008).


Basic karaka relations

raama phala khaataa hai ‘Ram eats fruit’


Basic karaka relations

raama chaaku se saiv kaatataa hai ‘Ram cuts the apple with a knife’


Basic karaka relations

raama ne mohana ko pustaka dii ‘Ram gave a book to Mohan’
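To make the labels concrete, here is a small sketch that writes the three example sentences as labeled modifier-modified arcs and recovers the verb as the root. The `KARAKA_LABELS` and `EXAMPLES` names, the `Arc` tuple, and the arc-by-arc analyses (for instance, chaaku se as k3) are our own illustration built from the label list above, not output of the LTRC parser.

```python
from typing import NamedTuple

# Basic karaka labels from the list above (Begum et al., 2008 give the full set).
KARAKA_LABELS = {
    "k1": "karta (agent/doer/force)",
    "k2": "karma (object/patient)",
    "k3": "karana (instrument)",
    "k4": "sampradaan (beneficiary)",
    "k5": "apaadaan (source)",
    "k7p": "adhikarana (location in place)",
    "k7t": "adhikarana (location in time)",
    "k7": "adhikarana (location, other)",
}


class Arc(NamedTuple):
    """A modifier-modified relation: head (modified) -> dependent (modifier)."""
    head: str
    label: str
    dependent: str


# Hypothetical hand analyses of the three example sentences; the verb is the root.
EXAMPLES = {
    "raama phala khaataa hai": [
        Arc("khaataa hai", "k1", "raama"),
        Arc("khaataa hai", "k2", "phala"),
    ],
    "raama chaaku se saiv kaatataa hai": [
        Arc("kaatataa hai", "k1", "raama"),
        Arc("kaatataa hai", "k3", "chaaku se"),
        Arc("kaatataa hai", "k2", "saiv"),
    ],
    "raama ne mohana ko pustaka dii": [
        Arc("dii", "k1", "raama ne"),
        Arc("dii", "k4", "mohana ko"),
        Arc("dii", "k2", "pustaka"),
    ],
}


def root_of(arcs):
    """The root is the only head that never appears as a dependent."""
    heads = {a.head for a in arcs}
    dependents = {a.dependent for a in arcs}
    (root,) = heads - dependents  # raises if there is not exactly one root
    return root


for sentence, arcs in EXAMPLES.items():
    print(sentence, "-> root:", root_of(arcs))
    for arc in arcs:
        print(f"  {arc.dependent} --{arc.label}--> {arc.head}  [{KARAKA_LABELS[arc.label]}]")
```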


Why Paninian Labels?

Other choices for labels could be:

Grammatical relations (subject, object, etc.), which are identified through behavioral tests (Mohanan, 1994).

Thematic roles (agent, patient, etc.), which have no concrete cues.

Both are difficult to extract automatically. Karakas, in contrast, can be computationally exploited: they are syntactically grounded yet semantically loaded, and so give a level of interface between syntax and semantics.


Levels of Language Analysis

Morphological analysis (morph info)
Analysis in local context (POS tagging)
Sentence analysis (chunking, parsing)
Semantic analysis (word sense disambiguation, etc.)
Discourse processing (anaphora resolution, information structure, etc.)


Example

rAma ne mohana ko puswaka xI |


Example – Parsed Output

xI ‘give’ → puswaka ‘book’ (k2), mohana (k4), rAma (k1)


Parser

The parser follows a two-stage strategy, with appropriate constraints formed for each stage.

Stage I (intra-clausal relations): dependency relations such as k1, k2, k3, etc. are marked for each verb.

Stage II (inter-clausal and conjunct relations): conjuncts, relative clauses, kriya mula, etc. are handled.


Demand Frame for Verb

A demand frame or karaka frame for a verb indicates the demands the verb makes

It depends on the verb and its tense, aspect and modality (TAM) label.

A mapping is specified between karaka relations and vibhaktis (postpositions, suffixes).


Karaka Frame

It specifies which karakas are mandatory or optional for the verb and which vibhaktis (postpositions) each of them takes.

Each verb belongs to a specific verb class, and each class has a basic karaka frame.

Each TAM specifies a transformation rule that is applied to this basic frame.


Example

rAma mohana ko puswaka xewA hE |

Parsed dependency tree: xewA hE ‘gives’ → puswaka ‘book’ (k2), mohana (k4), rAma (k1)


Transformations

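Transformations are based on the TAM of the verb, and the appropriate transformation is applied. For example:

rAma ne mohana ko KilOnA xiyA | ‘Ram gave a toy to Mohan’ (karta marked with ne)

rAma ko mohana ko KilOnA xenA padZA | ‘Ram had to give a toy to Mohan’ (karta marked with ko)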


Example

rAma ne mohana ko puswaka xI |


Karaka Frame – xe (give)


Transformation Rule – yA (TAM)


Karaka Frame

rAma ne mohana ko KilOnA xiyA |

yA TAM

arc-label  necessity  vibhakti  lextype  src-pos  arc-dir
k1         m          ne        n        l        c
k2         m          0|ko      n        l        c
k3         d          se        n        l        c
k4         d          ko        n        l        c

Transformed frame for xe after applying the yA transformation (the vibhakti of k1 changes from 0 to ne).
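A minimal sketch of a karaka frame and a TAM transformation rule as data. It assumes the basic frame for xe is the table above with k1 unmarked (vibhakti 0), which is what the "0 to ne" note implies; the basic frame itself is not legible in this transcript. The `Demand` class, the `YA_RULE` encoding and the `apply_transformation` helper are hypothetical, and the lextype, src-pos and arc-dir values are copied from the table without interpretation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Demand:
    label: str       # arc-label, e.g. "k1"
    necessity: str   # "m" or "d", as in the table
    vibhakti: str    # allowed postpositions; "0" = none, "|" separates alternatives
    lextype: str = "n"   # copied from the table
    src_pos: str = "l"   # copied from the table
    arc_dir: str = "c"   # copied from the table

    def allows(self, postposition: str) -> bool:
        return postposition in self.vibhakti.split("|")


# Assumed basic frame for xe 'give' (k1 unmarked), reconstructed from the table.
XE_BASIC = [
    Demand("k1", "m", "0"),
    Demand("k2", "m", "0|ko"),
    Demand("k3", "d", "se"),
    Demand("k4", "d", "ko"),
]

# Hypothetical encoding of the yA TAM rule: label -> (old vibhakti, new vibhakti).
YA_RULE = {"k1": ("0", "ne")}


def apply_transformation(frame, rule):
    """Return a new frame with the TAM rule's vibhakti substitutions applied."""
    out = []
    for d in frame:
        if d.label in rule and d.vibhakti == rule[d.label][0]:
            d = replace(d, vibhakti=rule[d.label][1])
        out.append(d)
    return out


xe_ya = apply_transformation(XE_BASIC, YA_RULE)
for d in xe_ya:
    print(d.label, d.necessity, d.vibhakti, d.lextype, d.src_pos, d.arc_dir)
```

Printed row by row, the transformed frame matches the table above. Keeping the basic frame immutable means each TAM rule produces a fresh frame, so the same verb can be transformed differently for different clauses.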


Parsed Output

xI ‘give’ → puswaka ‘book’ (k2), mohana (k4), rAma (k1)


Other frames

Adjectives


Steps in Parsing

SENTENCE → Morph, POS tagging, chunking → Identify demand groups → Load frames & transform → Find candidates → Apply constraints & solve → Final parse


Example:

rAma ne mohana ko KilOnA xiyA |


Identify the demand group, load and transform the demand frame (DF)

Demand group: xiyA (the only verb)

Transformed frame: uses the ‘yA’ TAM information

arc-label  necessity  vibhakti  lextype  src-pos  arc-dir
k1         m          ne        n        l        c
k2         m          0|ko      n        l        c
k3         d          se        n        l        c
k4         d          ko        n        l        c


Candidates

rAma ne mohana ko KilOnA xiyA _ROOT_ |

Candidate arcs:
xiyA → rAma (k1)
xiyA → mohana (k2)
xiyA → mohana (k4)
xiyA → KilOnA (k2)
_ROOT_ → xiyA (main)
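Candidate arcs can be found by matching each noun chunk's postposition against the vibhakti column of the transformed frame. The sketch below assumes the chunks are already given as (head, postposition) pairs that all lie to the left of the verb (src-pos l), and it ignores lextype; the chunk format and the `find_candidates` name are hypothetical.

```python
# Transformed frame for xe + yA: (arc-label, necessity, allowed vibhaktis).
FRAME_XE_YA = [
    ("k1", "m", {"ne"}),
    ("k2", "m", {"0", "ko"}),
    ("k3", "d", {"se"}),
    ("k4", "d", {"ko"}),
]

# rAma ne mohana ko KilOnA xiyA |  as noun chunks (head, postposition); "0" = none.
CHUNKS = [("rAma", "ne"), ("mohana", "ko"), ("KilOnA", "0")]
VERB = "xiyA"


def find_candidates(verb, chunks, frame):
    """Return (head, label, dependent) for every chunk whose vibhakti fits a demand."""
    candidates = []
    for noun, postposition in chunks:
        for label, _necessity, vibhaktis in frame:
            if postposition in vibhaktis:
                candidates.append((verb, label, noun))
    # The verb itself is a candidate child of the artificial _ROOT_ node.
    candidates.append(("_ROOT_", "main", verb))
    return candidates


for head, label, dep in find_candidates(VERB, CHUNKS, FRAME_XE_YA):
    print(f"{head} -{label}-> {dep}")
```

Because "ko" appears under both k2 and k4, mohana gets two candidate arcs, which is exactly the ambiguity the constraints must resolve; no chunk carries "se", so the optional k3 gets no candidate.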


Constraints

C1: For each of the mandatory demands in a demand frame for each demand group, there should be exactly one outgoing edge labeled by the demand from the demand group.

C2: For each of the optional demands in a demand frame for each demand group, there should be at most one outgoing edge labeled by the demand from the demand group.

C3: There should be exactly one incoming arc into each source group.


Constraints

A parse of a sentence is obtained by satisfying all the above constraints.

Ambiguous sentences have multiple parses; ill-formed sentences have no parse.
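For a sentence this small, the constraints can be checked directly by trying every subset of candidate arcs. This exhaustive search is only a stand-in that makes C1, C2 and C3 concrete (the parser itself uses integer programming); the candidate list is the one for rAma ne mohana ko KilOnA xiyA, and the names `is_parse` and `all_parses` are hypothetical.

```python
from itertools import combinations

CANDIDATES = [
    ("xiyA", "k1", "rAma"),
    ("xiyA", "k2", "mohana"),
    ("xiyA", "k4", "mohana"),
    ("xiyA", "k2", "KilOnA"),
    ("_ROOT_", "main", "xiyA"),
]
MANDATORY = {"xiyA": {"k1", "k2"}, "_ROOT_": {"main"}}
OPTIONAL = {"xiyA": {"k3", "k4"}}
SOURCES = {"rAma", "mohana", "KilOnA", "xiyA"}


def is_parse(arcs):
    def out_count(group, label):
        return sum(1 for h, k, _ in arcs if h == group and k == label)

    # C1: exactly one outgoing arc for each mandatory demand.
    if any(out_count(g, k) != 1 for g, ks in MANDATORY.items() for k in ks):
        return False
    # C2: at most one outgoing arc for each optional demand.
    if any(out_count(g, k) > 1 for g, ks in OPTIONAL.items() for k in ks):
        return False
    # C3: exactly one incoming arc into each source group.
    return all(sum(1 for _, _, d in arcs if d == s) == 1 for s in SOURCES)


def all_parses(candidates):
    return [
        set(subset)
        for r in range(len(candidates) + 1)
        for subset in combinations(candidates, r)
        if is_parse(subset)
    ]


parses = all_parses(CANDIDATES)
print(len(parses), "parse(s)")
for parse in parses:
    print(sorted(parse))
```

Since KilOnA can only be k2, C1 forbids mohana from also being k2, so mohana must take k4: the single parse is Parse I below.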


Parse - I

rAma ne mohana ko KilOnA xiyA _ROOT_ |

Selected arcs: xiyA → rAma (k1), xiyA → mohana (k4), xiyA → KilOnA (k2), _ROOT_ → xiyA (main)


Parse - I

_ROOT_ → xiyA (main)
xiyA → KilOnA (k2), mohana (k4), rAma (k1)


Integer Programming Constraints

$x_{ikj}$ represents a possible arc from demand group $i$ to source group $j$ with karaka label $k$.

It takes the value 1 if the solution contains that arc and 0 otherwise; it cannot take any other value.

The constraint rules are formulated as constraint equations.


Constraint Equations

C1: For each demand group $i$ and each of its mandatory demands $k$, the following equality must hold:

$M_{ik}: \sum_{j} x_{ikj} = 1$

C2: For each demand group $i$ and each of its optional or desirable demands $k$, the following inequality must hold:

$O_{ik}: \sum_{j} x_{ikj} \le 1$

C3: For each source group $j$, the following equality must hold:

$S_j: \sum_{i} \sum_{k} x_{ikj} = 1$


Multiple Frames

If a verb has more than one karaka frame, the integer programming package is called once for each frame.

If the sentence has more than one demand group (e.g., multiple verbs) with multiple demand frames, the package is called once for each combination of such frames.
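The "one call per combination" strategy is a loop over the Cartesian product of each demand group's frame list. A minimal sketch, assuming a hypothetical `solve` callback that stands in for the integer programming package and made-up frame names:

```python
from itertools import product


def parse_all_frame_combinations(frames_per_group, solve):
    """Call `solve` once per combination of frames (one frame chosen per demand group).

    frames_per_group maps each demand group to its list of candidate frames;
    solve(assignment) returns a list of parses (possibly empty) for that choice.
    """
    groups = list(frames_per_group)
    results = []
    for choice in product(*(frames_per_group[g] for g in groups)):
        assignment = dict(zip(groups, choice))
        for parse in solve(assignment):
            results.append((assignment, parse))
    return results


# Hypothetical example: two verbs with two and three frames respectively.
frames = {"verb1": ["A", "B"], "verb2": ["X", "Y", "Z"]}
calls = []


def fake_solve(assignment):
    calls.append(assignment)
    # Pretend only the combination (A, Y) is solvable.
    return ["parse"] if assignment == {"verb1": "A", "verb2": "Y"} else []


print(parse_all_frame_combinations(frames, fake_solve))
print(len(calls), "solver calls")  # 2 * 3 = 6
```

The number of solver calls is the product of the frame counts, so it grows quickly with the number of ambiguous demand groups.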


Other frames

Common karaka frame: attached to each karaka frame; preference is given to the main frame if there are clashes.

Fallback karaka frame: used when the required karaka frame is missing, giving graceful degradation.
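One way to read these two rules, as a sketch only: the common frame is merged into whichever frame is used, with the main frame's entries winning on clashes, and the fallback frame replaces a missing verb frame. All frame contents, the `lookup_frame` name, and the choice to detect clashes by arc label are assumptions, not details from the slides.

```python
# Frames map arc-label -> (necessity, allowed vibhaktis). All contents are hypothetical.
COMMON_FRAME = {
    "k7t": ("d", {"0", "meM"}),
    "k7p": ("d", {"meM", "para"}),
    "k3": ("d", {"se", "xvArA"}),
}
FALLBACK_FRAME = {
    "k1": ("d", {"0", "ne", "ko"}),
    "k2": ("d", {"0", "ko"}),
}
VERB_FRAMES = {
    "xe": {
        "k1": ("m", {"0"}),
        "k2": ("m", {"0", "ko"}),
        "k3": ("d", {"se"}),
        "k4": ("d", {"ko"}),
    },
}


def lookup_frame(verb_root):
    main = VERB_FRAMES.get(verb_root)
    if main is None:
        # Graceful degradation: no frame for this verb, so use the fallback frame.
        main = FALLBACK_FRAME
    # Attach the common frame; on a clash (same arc-label) the main frame's entry wins.
    return {**COMMON_FRAME, **main}


print(sorted(lookup_frame("xe")))
print(sorted(lookup_frame("unknown_verb")))
```

Here k3 appears in both the common frame and the frame for xe, and the merged result keeps the verb's own k3 entry.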


Stage I: Types being handled

Simple verbs

Non-finite verbs: wA_huA, wA_hI, nA, kara, 0_rahe, etc.

Copula

Genitive


Example (Complex Sentence)

rAma ne phala khaakara mohana ko KilOnA xiyA
Ram ERG fruit having-eaten Mohan DAT toy gave

‘Having eaten the fruit, Ram gave the toy to Mohan’


Candidates

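rAma ne phala khaakara mohana ko KilOnA xiyA _ROOT_ |

X1: xiyA → rAma (k1)
X2: xiyA → KilOnA (k2)
X3: xiyA → mohana (k2)
X4: xiyA → phala (k2)
X5: xiyA → mohana (k4)
X6: khaakara → phala (k2)
X7: xiyA → khaakara (vmod)
X8: _ROOT_ → xiyA (main)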


Constraint Equations

Verb ‘xe’
Mandatory demands (C1): k1: $x_1 = 1$; k2: $x_2 + x_3 + x_4 = 1$
Optional demands (C2): k4: $x_5 \le 1$

Verb ‘khaa’
Mandatory demands (C1): k2: $x_6 = 1$; vmod: $x_7 = 1$

_ROOT_
Mandatory demands (C1): main: $x_8 = 1$


Constraint Equations (contd.): Incoming Arcs into Source Groups (C3)

rAma: $x_1 = 1$
phala: $x_4 + x_6 = 1$
khaa: $x_7 = 1$
mohana: $x_3 + x_5 = 1$
KilOnA: $x_2 = 1$
xe: $x_8 = 1$
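The C1, C2 and C3 equations for the complex sentence can be checked mechanically. The sketch below encodes each equation as a list of variable indices plus an operator and searches all 2^8 binary assignments. Exhaustive search is our stand-in for the integer programming package, practical here only because there are eight variables; the variable meanings follow the X1 to X8 candidate arcs.

```python
from itertools import product

# x1 xe->rAma k1, x2 xe->KilOnA k2, x3 xe->mohana k2, x4 xe->phala k2,
# x5 xe->mohana k4, x6 khaa->phala k2, x7 xe->khaa vmod, x8 _ROOT_->xe main.
# x7 is listed among khaa's mandatory demands, as in the equations above.
EQUATIONS = [
    # C1 (mandatory demands): sum == 1
    ([1], "=="), ([2, 3, 4], "=="), ([6], "=="), ([7], "=="), ([8], "=="),
    # C2 (optional demands): sum <= 1
    ([5], "<="),
    # C3 (incoming arcs into each source group): sum == 1
    ([1], "=="), ([4, 6], "=="), ([7], "=="), ([3, 5], "=="), ([2], "=="), ([8], "=="),
]
N_VARS = 8


def satisfies(x):
    """x[i] is the 0/1 value of variable x_{i+1}."""
    for variables, op in EQUATIONS:
        total = sum(x[v - 1] for v in variables)
        if (op == "==" and total != 1) or (op == "<=" and total > 1):
            return False
    return True


solutions = [x for x in product((0, 1), repeat=N_VARS) if satisfies(x)]
for x in solutions:
    print([f"x{i + 1}" for i, value in enumerate(x) if value == 1])
```

The single solution selects x1, x2, x5, x6, x7 and x8: rAma (k1), KilOnA (k2) and mohana (k4) attach to xiyA, khaakara attaches to xiyA as vmod, and phala is the k2 of khaakara, matching the solution graph.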


Solution Graph

_ROOT_ → xiyA (main)
xiyA → KilOnA (k2), mohana (k4), rAma (k1), khaakara (vmod)
khaakara → phala (k2)


References

Akshar Bharati and Rajeev Sangal. 1993. Parsing free word order languages in Paninian Framework. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-93). Association for Computational Linguistics, New Jersey, USA.

Akshar Bharati, Rajeev Sangal and T. Papi Reddy. 2002. A Constraint Based Parser Using Integer Programming. In Proceedings of ICON-2002: International Conference on Natural Language Processing.

Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti Misra Sharma, Lakshmi Bai and Rajeev Sangal. 2008. Dependency Annotation Scheme for Indian Languages. In Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP). Hyderabad, India.

S. M. Shieber. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8:334–343.

Tara Mohanan. 1994. Arguments in Hindi. CSLI Publications.
