Identifying Comparative Sentences in Text Documents Nitin Jindal and Bing Liu University of Illinois SIGIR 2006

Dec 30, 2015


Roland Paul
Transcript
Page 1: Identifying Comparative Sentences in Text Documents Nitin Jindal and Bing Liu University of Illinois SIGIR 2006.

Identifying Comparative Sentences in Text Documents

Nitin Jindal and Bing Liu

University of Illinois

SIGIR 2006

Page 2

Introduction

• Comparisons are one of the most convincing ways of evaluation.

• Much of such info is available on the Web (customer reviews), forum discussions, and blogs.

• Useful for product manufacturers and potential customers (to make purchasing decisions).

Page 3

Comparisons vs. Opinions

• Comparisons can be either objective or subjective.

• Comparative sentences have different language constructs from typical opinion sentences.

• Comparative sentences may contain some indicators.

Car X is much better than Car Y

Car X is two feet longer than Car Y

Page 4

Related Work

• Linguistics: based on grammars (syntax and semantics) and logic (gradability), which is more for human consumption than for automatic identification.

• Opinion tasks: opinion extraction and classification problems, which are quite different from comparison identification.

Page 5

Comparatives (Linguistic)

• Comparatives are used to express explicit orderings between objects with respect to the degree or amount to which they possess some gradable property.

John is taller than he was

=>

John is tall to degree d

Page 6

Comparatives (Linguistic)

• Two broad types:

– Metalinguistic Comparatives: compare properties of one entity.

Ronaldo is angrier than upset.

– Propositional Comparatives: compare between two propositions. Three subcategories:

Page 7

Comparatives (Propositional)

• Nominal Comparatives: (two sets of entities)

Paul ate more grapes than bananas.

• Adjectival Comparatives: (than, as good as)

Ford is cheaper than Volvo.

• Adverbial Comparatives: (occur after a verb phrase)

Tom ate more quickly than Jane.

Page 8

Superlatives

• Adjectival Superlatives:

John is the tallest person.

• Adverbial Superlatives:

Jill did her homework most frequently.

• Equality: conjunctions like and, or, …

John and Sue both like sushi.

Page 9

POS involved

• NN: Noun

• NNP: Proper noun

• VBZ: Verb, present tense, 3rd person singular

• JJ: Adjective

• RB: Adverb

• JJR: Adjective, comparative

• JJS: Adjective, superlative

• RBR: Adverb, comparative

• RBS: Adverb, superlative

Page 10

Limitations of linguistic classification.

• Non-comparatives with comparative words: many non-comparatives contain comparative words.

In the context of speed, faster means better.

John has to try his best to win this game.

• Limited coverage: many comparatives contain no comparative words.

In market capital, Intel is way ahead of Amd.

Nokia Samsung, both cell phones perform badly on heat dissipation index.

The M7500 earned a World bench score of 85, whereas Asus A3V posted a mark of 89.

Page 11

Enhancements

• First limitation: machine learning methods to distinguish comparatives and non-comparatives.

• Second limitation:

– User preferences:

I prefer Intel to Amd = Intel is better than Amd

– Implicit comparatives:

Camera X has 2 MP, whereas camera Y has 5 MP.

Page 12

Types of Comparatives

• Non-Equal Gradable: greater or less than type, including user preferences.

• Equative (Gradable): equal to type

• Superlative (Gradable): greater or less than all others type

• Non-Gradable:

– A is similar to B; A has feature F1 while B has F2; A has feature F but B doesn’t

Page 13

Tasks

• Identifying comparative sentences from a given text data set.

• Extracting comparative relations from sentences. (Mining comparative sentences and relations, AAAI 2006)

Page 14

Class Sequential Rules with Multiple Minimum Supports

• In a class sequential rule, the sequential pattern is on the left and the class is on the right.

• Select patterns: keywords – POS (JJR, RBR, JJS, RBS) + Words (favor, prefer, win, beat, but, …) + Phrases (number one, up against)

• Using keywords alone gives precision P = 32% and recall R = 94%.
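The keyword test described on this slide can be sketched as a simple filter over POS-tagged text. This is a minimal illustration, not the authors' implementation: the word and phrase sets contain only the examples named on the slide (the full keyword list is not given here), and the helper names `parse_tagged` and `has_keyword` are hypothetical.

```python
# POS tags that mark comparative and superlative forms (from the slide).
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}
# Only the example words and phrases named on the slide; the real list is longer.
KEYWORDS = {"favor", "prefer", "win", "beat", "but"}
PHRASES = ("number one", "up against")


def parse_tagged(sentence):
    """Split 'word/TAG word/TAG ...' into lowercase (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word.lower(), tag))
    return pairs


def has_keyword(sentence):
    """Return True if the tagged sentence contains any comparative indicator."""
    pairs = parse_tagged(sentence)
    words = [word for word, _ in pairs]
    if any(tag in COMPARATIVE_TAGS for _, tag in pairs):
        return True
    if any(word in KEYWORDS for word in words):
        return True
    # Pad with spaces so phrases only match on whole-word boundaries.
    text = " " + " ".join(words) + " "
    return any(" " + phrase + " " in text for phrase in PHRASES)
```

A filter like this flags any sentence with a JJR tag, including non-comparatives such as "faster means better", which is in line with the high recall and low precision reported for keywords alone.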

Page 15

Support and Confidence

• Using the minimum support of 20% and minimum confidence of 40%, one of the discovered CSRs is:

Page 16

Building the Sequence DB

this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD

{NN}{VBZ}{RB}{moreJJR}{NN}{IN}{NN} -> comparative
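The example sequence is consistent with keeping the POS tags of the words within three positions of the keyword, and keeping the keyword itself joined to its tag. The sketch below reproduces that under two assumptions not stated on the slide: a window radius of 3 (inferred from the example), and that only the first word tagged JJR, RBR, JJS, or RBS is treated as the keyword. The function name `build_sequence` is hypothetical.

```python
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}


def build_sequence(tagged_sentence, radius=3):
    """Return the tag sequence around the first keyword, or None if there is none.

    radius=3 is an assumption inferred from the slide's example.
    """
    pairs = []
    for token in tagged_sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))

    # Locate the first token carrying a comparative or superlative tag.
    for keyword_index, (_, tag) in enumerate(pairs):
        if tag in COMPARATIVE_TAGS:
            break
    else:
        return None

    start = max(0, keyword_index - radius)
    window = pairs[start:keyword_index + radius + 1]
    sequence = []
    for index, (word, tag) in enumerate(window, start=start):
        # The keyword keeps its word form; every other token is reduced to its tag.
        sequence.append(word + tag if index == keyword_index else tag)
    return sequence


sentence = ("this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN "
            "at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD")
print("".join("{%s}" % item for item in build_sequence(sentence)))
```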

• Sequences that exceed the 60% confidence threshold become rules. Minimum support = 10%.

• 13 manual rules with conjunctions such as whereas/IN, but/CC, however/RB, while/IN, though/IN, although/IN, etc.
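The support and confidence thresholds can be illustrated with the usual definitions for a class sequential rule X -> y: support is the fraction of all sequences that contain X and carry class y, and confidence is the fraction of sequences containing X that carry class y. The slides do not spell these definitions out, so treat them as an assumption. The toy database and function names below are made up for illustration, and the multiple-minimum-supports scheme is not modelled.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support_and_confidence(pattern, label, database):
    """Return (support, confidence) of the rule pattern -> label.

    database is a list of (sequence, class_label) pairs.
    """
    covered = [cls for seq, cls in database if is_subsequence(pattern, seq)]
    if not covered:
        return 0.0, 0.0
    hits = sum(1 for cls in covered if cls == label)
    return hits / len(database), hits / len(covered)


# Hypothetical four-sentence database, for illustration only.
toy_db = [
    (["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"], "comparative"),
    (["NN", "VBZ", "betterJJR", "IN", "NN"], "comparative"),
    (["NN", "VBZ", "JJ", "NN"], "non-comparative"),
    (["PRP", "VBZ", "bestJJS", "NN"], "non-comparative"),
]

print(support_and_confidence(["NN", "VBZ", "IN", "NN"], "comparative", toy_db))
print(support_and_confidence(["VBZ", "NN"], "comparative", toy_db))
```

With a 60% confidence threshold, the first pattern would be kept as a rule on this toy data and the second would be dropped.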

Page 17

Classification Learning

• Machine learning methods:

Feature Set = {X | X is the sequential pattern in CSR X → y} ∪ {Z | Z is the pattern in a manual rule Z → y}
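One plausible way to turn this feature set into classifier input is a binary vector with one entry per pattern, set to 1 when the sentence's sequence contains that pattern. The slide does not say how features are encoded, so the binary encoding, the example patterns, and the function names here are assumptions.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def feature_vector(sequence, patterns):
    """Encode a sequence as 0/1 flags, one per pattern in the feature set."""
    return [1 if is_subsequence(pattern, sequence) else 0 for pattern in patterns]


# Hypothetical feature set: two mined CSR patterns and one manual-rule pattern.
patterns = [
    ("NN", "VBZ", "moreJJR"),
    ("bestJJS", "NN"),
    ("whereasIN",),
]

sequence = ["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"]
print(feature_vector(sequence, patterns))
```

The resulting vectors, paired with the comparative or non-comparative label, can then be passed to any standard classifier.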

Page 18

Data Preparation

• Consumer reviews on products such as digital cameras, DVD players, MP3 players and cellular phones.

• Forum discussions on topics such as Intel vs. AMD, Coke vs. Pepsi, and Microsoft vs. Google.

• News articles on topics such as automobiles, iPods, and soccer vs. football.

Page 19

Number of Sentences in Data Sets

Page 20

Experimental Results (1)

Page 21

Experimental Results (2)

• Reviews: low recall, high precision -> sentences are short, so patterns are hard to find.

• Articles and forums: high recall, low precision -> sentences are long, so patterns are found too easily, or too many patterns are found.
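For reference, precision (P) and recall (R) follow the standard definitions: precision is the share of sentences labelled comparative that really are comparative, and recall is the share of true comparative sentences that were found. The sketch below computes both from raw counts; the counts in the usage line are invented, not taken from the experiments.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts; 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Made-up counts showing the "high precision, low recall" pattern.
print(precision_recall(true_positives=8, false_positives=2, false_negatives=8))
```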

Page 22

Conclusion and Future Work

• Identifying comparative sentences.

• Analyzing different types of comparative sentences.

• Studying how to automatically classify subjective and objective comparisons.