07/31/12 Pronominal Anaphora Resolution
Final Year Project on Pronominal Anaphora Resolution in Nepali Language
by
Dev Bahadur Poudel (03314)
Bivod Aale Magar (03307)
Nepal Engineering College
Changunarayan, Bhaktapur
Contents
- Brief Introduction and Background
- Approach to the Algorithm
- Implementation in Nepali Discourse
- Overview of our system
- Scope of our system
- Conclusion
What is Anaphora?
Reference to an entity that has been previously introduced in the discourse.
What is Anaphora Resolution?
Process of determining the antecedent of an anaphor.
Anaphora resolution in Nepali
राम स्कूल जान्छ । ऊ घर फर्कन्छ । (Ram goes to school. He returns home.)
Antecedent: राम (Ram)    Anaphor: ऊ (he)
ऊ = राम
Can a machine resolve anaphora?
Humans can easily determine which referent an anaphor refers to.
Can we build a system that resolves anaphors to their antecedents?
Corpus
A collection of linguistic data, either written texts or transcriptions of recorded speech, that can be used as a starting point for linguistic description or as a means of verifying hypotheses about a language.
Unicode
An industry standard that allows computers to represent and manipulate text consistently.
Consists of about 100,000 characters, a set of code charts for visual reference, an encoding methodology and a set of standard character encodings.
Unlike ASCII, which uses 7 bits per character, Unicode defines more than one million code points (U+0000 to U+10FFFF), stored using encodings such as UTF-8, UTF-16 and UTF-32.
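Nepali is written in Devanagari, which occupies the Unicode block U+0900 to U+097F. As a small illustration, the Java sketch below prints the code points of the name राम (Ram) used in the examples and checks that each lies in that block. The class and method names are hypothetical, and the string is written with Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.util.ArrayList;
import java.util.List;

class DevanagariCodePoints {
    // Returns every Unicode code point of the text, formatted as U+XXXX.
    static List<String> codePoints(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    // True if the code point lies in the Devanagari block (U+0900 to U+097F).
    static boolean isDevanagari(int cp) {
        return cp >= 0x0900 && cp <= 0x097F;
    }

    public static void main(String[] args) {
        String ram = "\u0930\u093E\u092E"; // the name Ram in Devanagari
        System.out.println(codePoints(ram)); // [U+0930, U+093E, U+092E]
        System.out.println(ram.codePoints().allMatch(DevanagariCodePoints::isDevanagari)); // true
    }
}
```

Each Devanagari letter and vowel sign here is a single code point, so the three-character name maps to three values in the block.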
Approach to the Algorithm
Non-probabilistic
– Lappin and Leass Algorithm (1994)
– Hobbs' Tree Search Algorithm (1978)
Probabilistic
– Centering Algorithm
– Mitkov's weak-knowledge algorithm
Approach to the Algorithm
Lappin and Leass Algorithm (1994): an algorithm based on the salience factors assigned to nouns and pronouns.
Salience factors in Lappin and Leass's Algorithm
Factor | Weight
Sentence recency | 100
Subject emphasis | 80
Existential emphasis | 70
Accusative (direct object) emphasis | 50
Indirect object and oblique complement emphasis | 40
Non-adverbial emphasis | 50
Head noun emphasis | 80
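The weights in this table can be encoded directly. The Java sketch below (the enum and method names are hypothetical, not taken from the authors' system) sums the weights of the factors a single mention satisfies; the usage line reproduces the total of 310 that the worked example computes for हरि (Hari).

```java
import java.util.EnumSet;
import java.util.Set;

// Salience factors and weights of Lappin and Leass (1994), as tabulated above.
enum SalienceFactor {
    SENTENCE_RECENCY(100),
    SUBJECT(80),
    EXISTENTIAL(70),
    ACCUSATIVE(50),                  // direct object
    INDIRECT_OBJECT_OR_OBLIQUE(40),
    NON_ADVERBIAL(50),
    HEAD_NOUN(80);

    final int weight;

    SalienceFactor(int weight) {
        this.weight = weight;
    }

    // Total salience of one mention: the sum of the weights of the factors it satisfies.
    static int total(Set<SalienceFactor> factors) {
        int sum = 0;
        for (SalienceFactor f : factors) {
            sum += f.weight;
        }
        return sum;
    }
}

class SalienceDemo {
    public static void main(String[] args) {
        // Hari in sentence 2: recency + subject + non-adverbial + head noun
        int hari = SalienceFactor.total(EnumSet.of(
                SalienceFactor.SENTENCE_RECENCY, SalienceFactor.SUBJECT,
                SalienceFactor.NON_ADVERBIAL, SalienceFactor.HEAD_NOUN));
        System.out.println(hari); // 310
    }
}
```

Swapping SUBJECT for ACCUSATIVE gives 280, the value the example assigns to the direct object त्यो.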
Implementation
Can be implemented in different languages, such as Java or PHP.
Our system uses Java.
Block Diagram of the system
Input → Tokenizer and Tagger → Salience Factor Assigner → Output
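The block diagram does not say how the tokenizer splits text. One plausible sketch, assuming that sentences end with the danda (।, U+0964) and that tokens are separated by whitespace, is shown below in Java. It is not the authors' tokenizer, the class name is hypothetical, and the tagger stage is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NepaliTokenizer {
    // Splits a paragraph into sentences at the danda (U+0964), then each
    // sentence into whitespace-separated tokens. Empty pieces are skipped.
    static List<List<String>> tokenize(String paragraph) {
        List<List<String>> sentences = new ArrayList<>();
        for (String sentence : paragraph.split("\u0964")) {
            String trimmed = sentence.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(Arrays.asList(trimmed.split("\\s+")));
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // "Ram goes to school. He returns home." written with Unicode escapes
        String text = "\u0930\u093E\u092E \u0938\u094D\u0915\u0942\u0932 \u091C\u093E\u0928\u094D\u091B \u0964 "
                + "\u090A \u0918\u0930 \u092B\u0930\u094D\u0915\u0928\u094D\u091B \u0964";
        System.out.println(tokenize(text).size()); // 2
    }
}
```

Splitting on the danda rather than on "." matters because Nepali text uses the danda as its full stop.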
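Flowchart
1. START: input a paragraph.
2. Take a sentence and tokenize it.
3. Take a token and check it in the corpus; if it is not found, log an error.
4. Classify the token as a noun or a pronoun.
5. Classify it as subject or object.
6. Assign its salience value (steps 3 to 6 repeat for each token).
7. Calculate the total weights.
8. Determine the correct referents.
9. Next sentence? If yes, halve the salience values and return to step 2; if no, display the results.
10. END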
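The main loop of the flowchart can be sketched in Java as follows. This is a simplified reading under assumptions that are ours, not the authors': the hand-annotated corpus is a map from word form to a hypothetical Entry holding the word class and a role weight; every noun also receives the recency, non-adverbial and head-noun weights; a pronoun is resolved as soon as it is read, to the noun with the highest current salience; and the agreement and syntactic filters of a full Lappin and Leass implementation are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DiscourseLoop {
    enum Kind { NOUN, PRONOUN, OTHER }

    // Hypothetical corpus entry: word class and role weight
    // (subject 80, direct object 50, indirect object/oblique 40, otherwise 0).
    static final class Entry {
        final Kind kind;
        final int roleWeight;
        Entry(Kind kind, int roleWeight) {
            this.kind = kind;
            this.roleWeight = roleWeight;
        }
    }

    // Recency 100 + non-adverbial 50 + head noun 80, plus the role weight.
    static double weight(Entry e) {
        return 100 + 50 + 80 + e.roleWeight;
    }

    // Returns resolutions as "pronoun=antecedent"; tokens missing from the corpus go to errorLog.
    static List<String> resolve(List<List<String>> sentences,
                                Map<String, Entry> corpus,
                                List<String> errorLog) {
        Map<String, Double> model = new LinkedHashMap<>(); // discourse model: noun -> salience
        List<String> resolutions = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            for (String token : sentences.get(i)) {
                Entry e = corpus.get(token);
                if (e == null) {
                    errorLog.add(token);                           // "Log Error" branch
                } else if (e.kind == Kind.NOUN) {
                    model.merge(token, weight(e), Double::sum);
                } else if (e.kind == Kind.PRONOUN) {
                    String best = null;
                    for (Map.Entry<String, Double> c : model.entrySet()) {
                        if (best == null || c.getValue() > model.get(best)) {
                            best = c.getKey();
                        }
                    }
                    if (best != null) {
                        resolutions.add(token + "=" + best);
                        model.merge(best, weight(e), Double::sum); // pronoun adds to its antecedent
                    }
                }
            }
            if (i + 1 < sentences.size()) {
                model.replaceAll((noun, value) -> value / 2);    // halve before the next sentence
            }
        }
        return resolutions;
    }
}
```

With a corpus marking राम as a subject noun and ऊ as a subject pronoun, this resolves ऊ to राम in the two-sentence example from the introduction. Because the filters are omitted, it is not expected to reproduce every resolution in the worked example.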
User Interface
An Example in Nepali
1. राम घडी किन्न चाहन्छ । (Ram wants to buy a watch.)
2. हरिले त्यो पसलमा देख्यो । (Hari saw it in the shop.)
3. उसले उसलाई देखायो । (He showed it to him.)
1. राम घडी किन्न चाहन्छ । (Ram wants to buy a watch.)
Decrease the salience values by a factor of 2 when reading the next sentence.
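2. हरिले त्यो पसलमा देख्यो । (Hari saw it in the shop.)
हरि (Hari) gets 310 (recency 100 + subject 80 + non-adverbial 50 + head noun 80).
त्यो (it) gets 280 (recency 100 + direct object 50 + non-adverbial 50 + head noun 80) and is resolved to घडी (watch) due to the high salience value of घडी.
पसल (shop) gets 230 (recency 100 + non-adverbial 50 + head noun 80).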
Updated Discourse Model
Divide the previous salience values by two.
3. उसले उसलाई देखायो । (He showed it to him.)
Updated Discourse Model
उसले (he) is resolved to हरि (Hari) because of its high salience. The pronoun adds recency 100 + subject 80 + non-adverbial 50 + head noun 80 = 310.
उसलाई (to him) cannot refer to हरि because of syntactic constraints: a non-reflexive pronoun cannot corefer with another argument of the same verb. So उसलाई is resolved to राम (Ram), adding recency 100 + indirect object 40 + non-adverbial 50 + head noun 80 = 270.
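The syntactic constraint used for उसलाई amounts to removing a candidate before taking the maximum salience. A minimal Java sketch follows; the class name is hypothetical, and the scores in main are placeholders rather than values from the slides.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class ConstrainedChoice {
    // Highest-salience candidate that is not excluded, or null if none remain.
    static String choose(Map<String, Integer> salience, Set<String> excluded) {
        String best = null;
        for (Map.Entry<String, Integer> c : salience.entrySet()) {
            if (excluded.contains(c.getKey())) {
                continue; // e.g. the referent already taken by the subject of the same verb
            }
            if (best == null || c.getValue() > salience.get(best)) {
                best = c.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> salience = new LinkedHashMap<>();
        salience.put("Hari", 300); // placeholder score
        salience.put("Ram", 200);  // placeholder score
        String subject = choose(salience, Collections.emptySet());        // Hari
        String object = choose(salience, Collections.singleton(subject)); // Ram
        System.out.println(subject + " " + object); // Hari Ram
    }
}
```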
Result
Paragraph type | Samples used | Antecedents | Anaphors | Correctly resolved | Incorrectly resolved | Zero anaphors | Efficiency (correct / anaphors)
2-sentence | 15 | 37 | 22 | 15 | 7 | 0 | 68%
3-sentence | 15 | 50 | 37 | 28 | 9 | 0 | 75%
4-sentence | 10 | 35 | 35 | 22 | 11 | 2 | 62.8%
5-sentence | 10 | 43 | 41 | 25 | 14 | 2 | 60.9%
> 5-sentence | 5 | 28 | 31 | 17 | 11 | 3 | 54%
Total | 55 | 193 | 166 | 107 | 52 | 7 | 64%
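The Efficiency column equals correctly resolved anaphors divided by total anaphors; the listed percentages match these ratios with the decimals truncated (for example 28 / 37 = 75.7%, listed as 75%). A small Java check of that arithmetic, with a hypothetical class name:

```java
class Efficiency {
    // Percentage of anaphors that were resolved correctly.
    static double efficiency(int correct, int totalAnaphors) {
        return 100.0 * correct / totalAnaphors;
    }

    public static void main(String[] args) {
        // {correctly resolved, total anaphors} for each row of the table, ending with the total
        int[][] rows = { {15, 22}, {28, 37}, {22, 35}, {25, 41}, {17, 31}, {107, 166} };
        for (int[] r : rows) {
            System.out.printf("%d/%d = %.2f%%%n", r[0], r[1], efficiency(r[0], r[1]));
        }
    }
}
```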
Scope of the Project
- Natural language processing
- Question answering
- Text summarization
- Information extraction
- Interaction with query interfaces and dialogue interpretation
- Natural language generation
Limitations
The lack of a Nepali tagger and parser prevents the system from handling a large corpus, so we had to use a hand-annotated corpus.
The sentences are limited to the words defined in our corpus.
The system handles only third-person pronouns, not reflexive ones.
Further Work
- Morphological analysis can be added.
- The system can be extended to work on a larger number of sentences.
- This project can be used in collaboration with other NLP projects for the Nepali language in further research.
- Statistical methods can be applied to obtain higher efficiency.
Conclusion
- Research to see how a basic approach like Lappin and Leass's performs for the Nepali language.
- Applies to non-reflexive third-person pronouns.
- Anaphora resolution is an emerging concept for the Nepali language.
- Understanding discourse is challenging for computer intelligence.
- Without a tagger and parser, our system is highly dictionary-dependent.
- Our work can aid future research on the Nepali language.
Thank You.