200801229 final presentation

May 25, 2015

Presentation on Textual commitment system for natural language understanding
Transcript
Page 1: 200801229 final presentation

Introduction Structural Features Our Approach Results and Analysis Acknowledgement References

Towards Building a Text Commitment System

Gaurav Arora (200801229)

Supervisor: Prof. Prasenjit Majumder

Dhirubhai Ambani Institute of Information and Communication Technology

Page 2: 200801229 final presentation

Outline

1 Introduction
  Problem Definition
  Natural Language Understanding
  Literature Survey and Usage
  Approach Overview

2 Structural Features

3 Our Approach
  Generating Model for Simple Sentences
  Extracting Similar POS Patterns and Sentence Generation

4 Results and Analysis

Page 3: 200801229 final presentation

Problem Definition

Textual Commitment

Publicly Held Beliefs
A Textual Commitment system simplifies a complex sentence into a set of simple sentences, each expressing a publicly held belief conveyed by the complex sentence.

Textual Commitment Origin
Textual Commitment was proposed by LCC (Language Computer Corporation) as a core component module for Natural Language Understanding.

Page 4: 200801229 final presentation

Problem Definition

Textual Commitment Example

Example Complex Sentence
Text: "The Extra Girl" (1923) is a story of a small-town girl, Sue Graham (played by Mabel Normand), who comes to Hollywood to be in the pictures. This Mabel Normand vehicle, produced by Mack Sennett, followed earlier films about the film industry and also paved the way for later films about Hollywood, such as King Vidor's "Show People" (1928).

Simplified Sentences
T1. "The Extra Girl" is a story of a small-town girl.
T2. "The Extra Girl" is a story of Sue Graham.
T3. Sue Graham is a small-town girl.
T4. Sue Graham [was] played by Mabel Normand.
T5. Sue Graham comes to Hollywood to be in the pictures.
T6. A Mabel Normand vehicle was produced by Mack Sennett.

Page 5: 200801229 final presentation

Natural Language understanding

Language Understanding Components

By machine reading or understanding of text, we mean the formation of a coherent set of beliefs based on a textual corpus and a background theory.
Textual Entailment systems determine whether one sentence is entailed by another.

Language Understanding Features
Noisy
Limited scope
Corpus-wide statistics
Minimal reasoning
Bottom up
General
Very fast!

Page 6: 200801229 final presentation

Literature Survey and Usage

Question Answering to QA4MRE

Question Answering (QA) systems have an upper bound of 60% accuracy in system performance.
Current QA systems place little emphasis on understanding and analyzing text.
To tackle the 60% upper bound, QA4MRE focuses on understanding a single document, with emphasis on components like Textual Commitment and Textual Entailment.

Page 7: 200801229 final presentation

Literature Survey and Usage

Textual Entailment

The PASCAL Recognising Textual Entailment (RTE) Challenge has been a reputed evaluation campaign for research in Textual Entailment for the past 7 years.
Researchers use logic provers to detect entailment and to overcome the need for background knowledge, with a performance upper bound of 71%.
LCC's proposed Textual Commitment obtained a 9% improvement over this upper bound.

Page 8: 200801229 final presentation

Literature Survey and Usage

Textual Entailment classes

Figure: Textual Entailment Classes

Page 9: 200801229 final presentation

Literature Survey and Usage

Textual Commitment Approach

LCC Heuristic Approach
LCC's TC system uses a series of extraction heuristics to enumerate a subset of the discourse commitments that are inferable from either the text or the hypothesis.

Statistical Approach for Textual Commitment
Since these heuristics are not publicly available, we decided to build a Textual Commitment system using statistical features of language.

Page 10: 200801229 final presentation

Approach Overview

Statistical approach for Textual Commitment

Learning grammatical structural rules of simple sentences (POS tags).
Converting complex sentences into structural elements.
Finding similar rules for generating simple sentences.
Generating simple sentences in natural language based on the rules.

Example: Part-of-Speech Tagging
They-PRP were-VBD easy-JJ as-IN they-PRP levelled-VBD

Feature
The key statistical feature of language used for Textual Commitment generation is part-of-speech tagging.
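The tagging step above can be sketched as follows. This is a minimal lexicon-based stand-in whose lexicon covers only the example sentence (a hypothetical illustration); a real system would use a trained tagger built from a corpus such as the Penn Treebank.

```python
# Minimal lexicon-based POS tagger sketch; the lexicon is a hypothetical
# stand-in covering only the example sentence from the slide.
LEXICON = {"They": "PRP", "were": "VBD", "easy": "JJ",
           "as": "IN", "they": "PRP", "levelled": "VBD"}

def pos_tag(sentence):
    """Tag each whitespace token, defaulting unknown words to NN."""
    return [(tok, LEXICON.get(tok, "NN")) for tok in sentence.split()]

tagged = pos_tag("They were easy as they levelled")
# [('They', 'PRP'), ('were', 'VBD'), ('easy', 'JJ'),
#  ('as', 'IN'), ('they', 'PRP'), ('levelled', 'VBD')]
```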

Page 11: 200801229 final presentation

Simple Sentence Distribution

Figure: Distribution of sentences in English

Page 12: 200801229 final presentation

Comparison of POS Tags

Figure: Distribution of POS tags in simple sentences

Page 13: 200801229 final presentation

Generating Model for simple sentences

Module 1 Block diagram

Page 14: 200801229 final presentation

Generating Model for simple sentences

Basic Components

Tri-gram language model generation on POS tags.
Artificial generation of POS patterns.
Ranking of artificially generated patterns based on the created language model.

Example: Ranked POS-tag patterns
-53.7293  DT NN VBD VBN
-54.0778  PRP VBP RB VBN
-54.2327  NNP NN NNP NNP
-54.7982  PRP VBP RB JJ
-55.3234  NNP NNP NN NNP
Total generated rules: 9,606,406
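A minimal sketch of this ranking step. The deck uses SRILM to train the model; here a tiny toy corpus of POS sequences and add-one smoothing stand in for it, so the corpus and candidate patterns are illustrative assumptions.

```python
import math
from collections import Counter

# Toy corpus of POS-tag sequences from simple sentences (illustrative).
corpus = [
    ["DT", "NN", "VBD", "VBN"],
    ["PRP", "VBP", "RB", "VBN"],
    ["DT", "NN", "VBD", "JJ"],
]
TAGS = {t for sent in corpus for t in sent}

# Count padded trigrams and their bigram histories.
tri, bi = Counter(), Counter()
for sent in corpus:
    padded = ["<s>", "<s>"] + sent + ["</s>"]
    for i in range(len(padded) - 2):
        tri[tuple(padded[i:i + 3])] += 1
        bi[tuple(padded[i:i + 2])] += 1

def log_score(pattern):
    """Add-one-smoothed trigram log10 probability of a POS pattern."""
    padded = ["<s>", "<s>"] + pattern + ["</s>"]
    score = 0.0
    for i in range(len(padded) - 2):
        history, tag = tuple(padded[i:i + 2]), padded[i + 2]
        score += math.log10((tri[history + (tag,)] + 1)
                            / (bi[history] + len(TAGS) + 1))
    return score

# Rank candidate patterns by model score, best first.
candidates = [["NNP", "NN", "NNP", "NNP"], ["DT", "NN", "VBD", "VBN"]]
ranked = sorted(candidates, key=log_score, reverse=True)
```

A pattern seen in the corpus (DT NN VBD VBN) outranks one built from unseen trigrams, which is exactly the property the ranking relies on.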

Page 15: 200801229 final presentation

Generating Model for simple sentences

Distribution of POSTAG in Simple Sentence Tokens

Figure: Distribution of POS tags in simple-sentence tokens

Page 16: 200801229 final presentation

Generating Model for simple sentences

Distribution of Simple Sentence based on LM score

Total rules: 9,606,406
Number of rules categorized by score:
Rules > -100: 679,545
Rules > -90: 170,662
Rules > -80: 27,280
Rules > -70: 2,328
Rules > -65: 474
Rules > -60: 76
Rules > -70, word lengths 5 and 6: 1,594
Rules > -83, word lengths 7 and 8: 3,110
Total rules considered for matching: 4,704
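The final selection step keeps only patterns whose language-model score clears a length-dependent threshold (above -70 for lengths 5-6, above -83 for lengths 7-8, as on this slide). A sketch, where the scores and patterns are illustrative placeholders rather than real model output:

```python
# Illustrative (score, pattern) pairs standing in for the 9.6M generated rules.
scored_rules = [
    (-53.7293, ["DT", "NN", "VBD", "VBN", "RB"]),
    (-75.0, ["PRP", "VBP", "RB", "VBN", "IN"]),
    (-80.5, ["NNP", "NN", "NNP", "NNP", "IN", "DT", "NN"]),
]

def keep(score, pattern):
    """Apply the slide's length-dependent language-model thresholds."""
    n = len(pattern)
    if n in (5, 6):
        return score > -70
    if n in (7, 8):
        return score > -83
    return False

kept = [pattern for score, pattern in scored_rules if keep(score, pattern)]
```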

Page 17: 200801229 final presentation

Extracting Similar POS Patterns and sentence generation

Module 2 and 3 Block Diagram

Page 18: 200801229 final presentation

Extracting Similar POS Patterns and sentence generation

Extracting Similar POS Patterns - Basic Components

Extraction of POS tags and chunks from complex sentences.
Chunks are noun phrases: words that occur together and must also occur together in the simple sentences.
POS rules from Module 1 are treated as virtual documents.
Search for rules/documents similar to the chunks and POS tags of the complex sentence.
Xapian is used for the search; phrase search ensures that chunk tags occur together in the similar rules.
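A simplified pure-Python stand-in for this retrieval step. The real system indexes each rule as a virtual document in Xapian; here phrase containment enforces the chunk constraint and POS-tag overlap approximates the similarity ranking, so both the scoring scheme and the example data are assumptions.

```python
from collections import Counter

def contains_phrase(rule, phrase):
    """True if the chunk's tag sequence occurs contiguously in the rule."""
    return any(rule[i:i + len(phrase)] == phrase
               for i in range(len(rule) - len(phrase) + 1))

def rank_rules(rules, sentence_tags, chunks):
    """Keep rules containing every chunk as a phrase; rank by tag overlap."""
    sent_counts = Counter(sentence_tags)
    results = []
    for rule in rules:
        if not all(contains_phrase(rule, chunk) for chunk in chunks):
            continue  # phrase constraint: chunk tags must stay together
        overlap = sum(min(n, sent_counts[tag])
                      for tag, n in Counter(rule).items())
        results.append((overlap / len(rule), rule))
    return sorted(results, reverse=True)

rules = [["NNP", "NNP", "VBD", "VBN", "IN", "DT", "NN"],
         ["NNP", "VBD", "RB", "JJ"]]
sentence_tags = ["NNP", "NNP", "VBD", "VBN", "IN", "DT", "NN", "NNP"]
chunks = [["VBD", "VBN"]]  # e.g. "was named" must stay together
ranked = rank_rules(rules, sentence_tags, chunks)
```

The second rule is discarded because it lacks the contiguous VBD VBN chunk, mirroring how the Xapian phrase query filters candidates before similarity ranking.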

Page 19: 200801229 final presentation

Extracting Similar POS Patterns and sentence generation

Extracting Similar POS Patterns - Module I/O

Example
Sentence: A Revenue Cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan's White House hostess.

Example
Frequency of POS tags and chunks in the complex sentence:
POS tags: WP=1, VBN=1, IN=3, NNP=8, DT=2, VBD=2, ...
Chunks: DT NN NN=1, VBD VBN=1, NNP NNP NNP=1, ...

Example
Extracted patterns from Xapian:
91%  NNP NNP VBD VBN IN DT NN RB
86%  NNP VBD RB VBN IN DT NN RB

Page 20: 200801229 final presentation

Extracting Similar POS Patterns and sentence generation

Simple Sentence Generation - Basic Components

Replacement of all chunks in the similar POS-tag rules with their chunk values.
Additional rules with different chunk values are added if a chunk maps to more than one value.
After chunk replacement, the remaining POS tags are filled with values.
This module generates many noisy sentences.
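The replacement steps above can be sketched as follows. Chunks are substituted first, remaining single tags are filled from a word map, and a chunk with several values would yield one sentence per combination; the maps below are illustrative assumptions, not the system's actual data structures.

```python
from itertools import product

def generate(pattern, chunk_map, word_map):
    """Fill a POS pattern: chunks first, then the remaining single tags."""
    slots, i = [], 0
    while i < len(pattern):
        for tag_seq, values in chunk_map.items():
            if tuple(pattern[i:i + len(tag_seq)]) == tag_seq:
                slots.append(values)  # all known values for this chunk
                i += len(tag_seq)
                break
        else:
            # No chunk starts here: fill the single tag from the word map,
            # falling back to the bare tag if no value is known.
            slots.append(word_map.get(pattern[i], [pattern[i]]))
            i += 1
    # One generated sentence per combination of slot values.
    return [" ".join(choice) for choice in product(*slots)]

chunk_map = {("DT", "NN"): ["the ship"],
             ("VBD", "VBN"): ["was named"],
             ("NNP", "NNP"): ["Harriet Lane"]}
word_map = {"IN": ["for"]}
sentences = generate(["DT", "NN", "VBD", "VBN", "IN", "NNP", "NNP"],
                     chunk_map, word_map)
```

With multiple values per chunk (e.g. NNP NNP mapping to both "Harriet Lane" and "White House"), the cartesian product over slots is what produces the noisy over-generation noted above.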

Page 21: 200801229 final presentation

Extracting Similar POS Patterns and sentence generation

Simple Sentence Generation - Module I/O

Chunk-value mapping
NNP NNP-1 = White House, NNP NNP-0 = Harriet Lane, VBD VBN-0 = was named, DT NN-0 = the ship, RB IN-0 = niece of

Example
Input: A Revenue Cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan's White House hostess.
Simple sentences:
Harriet Lane President James Buchanan niece
Harriet Lane served for hostess
Harriet Lane was for the ship
Buchanan's White House the ship hostess
Harriet Lane was the ship

Page 22: 200801229 final presentation

Recall System

Recall of the system is important, since its output feeds into Textual Entailment and other Natural Language Understanding modules.
The recall of our statistical Textual Commitment system is 0.23.
System recall was calculated on 5 complex sentences.
The recall value shows positive signs for a sophisticated statistical Textual Commitment system.
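Recall here can be computed as the fraction of gold simple sentences that the system reproduces. The sets below are hypothetical examples for illustration; the deck's reported value of 0.23 comes from its own 5-sentence evaluation.

```python
def recall(generated, gold):
    """Fraction of gold simple sentences the system reproduced."""
    return len(set(generated) & set(gold)) / len(set(gold))

# Hypothetical gold commitments and system output for one complex sentence.
gold = {"Sue Graham is a small-town girl.",
        "Sue Graham was played by Mabel Normand.",
        "The Extra Girl is a story of a small-town girl.",
        "Sue Graham comes to Hollywood."}
generated = {"Sue Graham is a small-town girl.",
             "Sue Graham the pictures Hollywood"}  # one hit, one noisy output
print(recall(generated, gold))  # 0.25
```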

Page 23: 200801229 final presentation

Analysis

Requires an additional module to rank good sentences and remove noisy sentences.
A more sophisticated Natural Language Generation module.
Generating simple-sentence patterns from complex sentences rather than artificially generating rules.
Finding a more suitable model: a combination of bi-gram and tri-gram models, or a bigram model.

Page 24: 200801229 final presentation

Acknowledgement

I would like to express my sincere thanks to Prof. Prasenjit Majumder for providing the opportunity to work under his esteemed guidance, for helping throughout the project, and for providing valuable critical suggestions on my work. Additionally, I would like to thank the SRILM and Xapian teams for helping me work with their open-source software.

Page 25: 200801229 final presentation

References

Hickl: A discourse commitment-based framework for recognizing textual entailment.
Anselmo Peñas et al.: Overview of QA4MRE at CLEF 2011: Question Answering for Machine Reading Evaluation. Working Notes of CLEF (2011).
L. Bentivogli (FBK-irst) et al.: The Sixth PASCAL Recognizing Textual Entailment Challenge (2010).
Olly Betts: Xapian, version 1.2.9.
Asher Stern and Ido Dagan: A Confidence Model for Syntactically-Motivated Entailment Proofs. In Proceedings of RANLP 2011.
Katrin Kirchhoff et al.: Factored Language Models Tutorial (2008).
The IEEE website (2002). [Online]. Available: http://www.ieee.org/
SRILM Language Modelling Toolkit (2010).