“Alexandru Ioan Cuza” of Iași Faculty of computer Science SEMANTICA ȘI PRAGMATICA LIMBAJULUI NATURAL Daniela GÎFU Iași 09 Oct. 2014



“Alexandru Ioan Cuza” University of Iași
Faculty of Computer Science

SEMANTICS AND PRAGMATICS OF NATURAL LANGUAGE

Daniela GÎFU

Iași, 09 Oct. 2014

SENTIMENT ANALYSIS – AN OVERVIEW

Lecture no. 2

IMPACT OF TOPIC

Sentiment Analysis (SA) - one of the most active topics in NLP.

SA - makes it possible to monitor, identify and understand, in real time, consumers' feelings and attitudes towards brands or topics in cyberspace, and to act accordingly.

SA - very popular in social media.

- Target: academia and industry.

PURPOSE AND MOTIVATION

- to compile a complete state-of-the-art (SOTA) survey of SA, with a focus on social media posts.

- to enhance the results of context-based SA.

- to clarify the descriptive behavior of the receiver, who is affected by the multitude of information on forums.

- to improve the performance of SA classifiers based on two approaches (machine learning & lexicon).

CONTENT

1. Introduction
2. A general view on the subject
3. SA levels
   3.1. SA at document level
   3.2. SA at clause/sentence level
   3.3. Feature-based SA
   3.4. Comparative sentiment analysis
   3.5. Sentiment lexicon acquisition
   3.6. Conclusions
4. Applications
   4.1. Business and government
   4.2. Review sites
   4.3. Other domains: politics and sociology
   4.4. Conclusions
5. Conclusions and discussions

2. A general view on the subject

SA - a module for extracting opinions, sentiments and subjectivity from text;

SA – terminology:

- subjectivity [Lyons 1981; Langacker 1985];
- evidentiality [Chafe and Nichols 1986];
- analysis of stance [Biber and Finegan 1988; Conrad and Biber 2000];
- affect [Batson, Shaw, and Oleson 1992];
- point of view [Wiebe 1994; Scheibman 2002];
- evaluation [Hunston and Thompson 2001];
- appraisal [Martin and White 2005];
- opinion mining [Pang and Lee 2008];
- politeness [Gîfu and Topor 2014].

3. Sentiment classification techniques

Fig. 1 Sentiment classification techniques

3. SA levels - document

a) supervised approach

Fig. 2 Supervised learning for three classes: positive, negative, neutral
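The supervised, three-class setting can be illustrated with a small sketch. It is not the classifier used in the course: it assumes a multinomial Naive Bayes model with add-one smoothing, whitespace tokenization, and a toy training set (`TRAIN`) invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect document counts per class and word counts per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_docs, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-probability for the text."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # class prior
        class_total = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training are ignored
            # add-one (Laplace) smoothing avoids zero probabilities
            score += math.log(
                (word_counts[label][token] + 1) / (class_total + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this sketch only.
TRAIN = [
    ("great phone love it", "positive"),
    ("excellent battery great screen", "positive"),
    ("terrible phone hate it", "negative"),
    ("awful battery terrible screen", "negative"),
    ("the phone has a screen", "neutral"),
    ("it ships with a battery", "neutral"),
]
MODEL = train_naive_bayes(TRAIN)
```

With this toy model, `classify("great excellent", MODEL)` selects the positive class and `classify("the phone has a battery", MODEL)` the neutral one; a real system would need far more data and better features.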

3. SA levels - document

Fig. 2 Python NLTK Demos for Natural Language Text Processing

a) supervised approach

http://text-processing.com/demo/

3. SA levels - document

b) unsupervised approach

Based on determining the semantic orientation (SO) of specific words/phrases.

1. Sentiment lexicon (words/expressions) – [Taboada et al., 2011]

2. Set of predefined POS patterns – [Turney, 2002]
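A lexicon-based semantic orientation score can be sketched as follows. This is a simplified illustration, not the method of the cited papers: the lexicon entries and their values are hypothetical, and the SO of a text is assumed to be the plain average of the values of the lexicon words it contains.

```python
# Hypothetical lexicon: word -> semantic orientation value.
LEXICON = {"good": 1.0, "excellent": 2.0, "bad": -1.0, "awful": -2.0}


def semantic_orientation(text, lexicon=LEXICON):
    """Average the SO values of the lexicon words found in the text."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def polarity_label(text):
    """Map the SO score to a coarse label without any training data."""
    so = semantic_orientation(text)
    if so > 0:
        return "positive"
    if so < 0:
        return "negative"
    return "neutral"
```

No labeled training data is needed here, which is what makes the approach unsupervised; the quality of the result depends entirely on the coverage of the lexicon.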

3. SA levels – clause/sentence

More complex – identifying whether a sentence is opinionated and establishing the nature of the opinion;

- using supervised methods;

1. classifying clauses into two classes [Yu and Hatzivassiloglou, 2003]

2. an approach based on minimum cuts [Pang and Lee, 2004]

The problem: How can we classify questions, sarcasm, metaphor, humor, etc.?

3. SA levels – features

- several entities in each analyzed text, or several attributes for each entity;

- extraction of the attributes of an object;

Becali a ajutat mult săracii 1/, [dar] nimeni nu a ştiut exact 2/ [cum] a făcut atâţia bani 3/. (En. - Becali helped the poor a lot 1/, [but] nobody knew exactly 2/ [how] he made so much money 3/.)

- extract and store all NPs;

- keep only NPs with a frequency above an experimentally determined threshold [Hu and Liu, 2004]
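The two steps above (store all NPs, keep the frequent ones) can be sketched as a simple frequency filter. The sketch assumes the NPs have already been extracted by a chunker; the function name, the sample reviews and the threshold value are hypothetical, and support is counted here as the share of reviews that mention an NP.

```python
from collections import Counter


def frequent_noun_phrases(nps_per_review, min_support):
    """Keep the NPs that occur in at least `min_support` of the reviews."""
    counts = Counter()
    for phrases in nps_per_review:
        # count each NP at most once per review
        counts.update({phrase.lower() for phrase in phrases})
    total = len(nps_per_review)
    return {
        phrase: count
        for phrase, count in counts.items()
        if count / total >= min_support
    }


# NPs assumed to come from an upstream NP chunker (invented sample).
REVIEWS = [
    ["battery", "screen"],
    ["battery", "price"],
    ["Battery", "screen", "strap"],
    ["battery"],
]
```

With `min_support=0.5`, only "battery" and "screen" survive as candidate product features; in practice the threshold would be tuned on data, as the slide notes.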

3. SA levels – comparative

- when a user doesn’t offer a direct opinion about a product, but compares it with another one [Jindal and Liu, 2006]

Dacia Logan arată mult mai bine decât Dacia Solenza. (En. - Dacia Logan looks much better than Dacia Solenza.)

- comparative adverbs: mai mult, mai puţin (En. - more, less)
- superlative adjectives and adverbs: mai, cel puţin (En. - more, at least)
- additional clauses: decât, împotriva (En. - rather than, against).

These keywords cover 98% of the comparative opinions.
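Keyword spotting of this kind can be sketched in a few lines. The sketch only flags candidate comparative sentences using the keywords listed on the slide ("mai" also covers "mai mult" and "mai puţin"); the function name is hypothetical, and deciding whether a candidate really expresses a comparison would need a further classification step.

```python
import re

# Keywords taken from the slide, stored as token tuples.
KEYWORDS = {("mai",), ("decât",), ("împotriva",), ("cel", "puţin")}


def is_comparative_candidate(sentence):
    """Flag a sentence that contains at least one comparative keyword."""
    tokens = re.findall(r"\w+", sentence.lower())
    unigrams = {(token,) for token in tokens}
    bigrams = set(zip(tokens, tokens[1:]))
    return bool(KEYWORDS & (unigrams | bigrams))
```

The slide's example sentence about Dacia Logan is flagged because it contains both "mai" and "decât".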

3. SA levels – sentiment lexicon

a) manual approaches: WordNet [Fellbaum, 1998], EuroWordNet [Vossen, 1998], BalkaNet [Tufiş et al., 2004]

Our work: AnaDiP-2010, inspired by LIWC-2007 [Pennebaker et al., 2001]: 9 emotional classes.

<classes>
<class name="emotional" id="1"/>
<class name="positive" id="2" parent="1"/>
<class name="negative" id="3" parent="1"/>
<class name="anxiety" id="4" parent="3"/>
<class name="anger" id="5" parent="3"/>
<class name="sadness" id="6" parent="3"/>
<class name="spectacular" id="7" parent="2"/>
<class name="firmness" id="8" parent="2"/>
<class name="moderation" id="9" parent="2"/>
</classes>
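The `parent` attribute turns the nine classes into a small tree. The following sketch, which is only an assumption about how such a file could be read and not the AnaDiP implementation, loads the hierarchy with Python's standard `xml.etree.ElementTree` and walks from a class up to the root; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

CLASSES_XML = """<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>"""


def load_classes(xml_text):
    """Map each class id to a (name, parent id) pair; the root has parent None."""
    root = ET.fromstring(xml_text)
    return {c.get("id"): (c.get("name"), c.get("parent")) for c in root.findall("class")}


def ancestry(class_id, classes):
    """List the class name followed by the names of all its ancestors."""
    path = []
    while class_id is not None:
        name, parent = classes[class_id]
        path.append(name)
        class_id = parent
    return path


CLASSES = load_classes(CLASSES_XML)
```

For example, `ancestry("6", CLASSES)` yields sadness, negative, emotional, which mirrors the class list "1,3,6" attached to a lexicon word.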

3. SA levels – sentiment lexicon

Our software performs part-of-speech (POS) tagging and lemmatization of words.

For example:

<lexic name="Politic" lang="ro">
<word lemma="clevetitor" classes="1,3,6"/>
<word lemma="genial" classes="1,2,7"/>
…
</lexic>
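Once a text has been lemmatized, a lexicon in this format can be used to count how often each emotional class occurs. The sketch below is an illustration under assumptions, not the authors' software: it contains only the two entries shown on the slide, takes the class names from the class list on the previous slide, and expects the lemmas to be supplied by an upstream lemmatizer.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Class ids and names as listed on the previous slide.
CLASS_NAMES = {
    1: "emotional", 2: "positive", 3: "negative",
    4: "anxiety", 5: "anger", 6: "sadness",
    7: "spectacular", 8: "firmness", 9: "moderation",
}

# Only the two entries shown on the slide; the real lexicon is larger.
LEXICON_XML = """<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
</lexic>"""


def load_lexicon(xml_text):
    """Map each lemma to the list of class ids it belongs to."""
    root = ET.fromstring(xml_text)
    return {
        word.get("lemma"): [int(i) for i in word.get("classes").split(",")]
        for word in root.findall("word")
    }


def count_classes(lemmas, lexicon):
    """Count class occurrences over a sequence of lemmas."""
    counts = Counter()
    for lemma in lemmas:
        for class_id in lexicon.get(lemma, []):
            counts[CLASS_NAMES[class_id]] += 1
    return counts


LEXICON = load_lexicon(LEXICON_XML)
```

Lemmas that are not in the lexicon (for instance "casă") simply contribute nothing to the counts.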

3. SA levels – sentiment lexicon

b) corpus-based approaches – a set of words/phrases extracted from a relatively small corpus is extended by using a large corpus of documents from a single domain.

- a classic work [Hatzivassiloglou and McKeown, 1997] uses a set of linguistic connectors: şi, sau, nici, fie (En. - and, or, neither/nor, either).

Examples:

bărbat puternic şi armonios / bărbat puternic şi armonios (En. - strong and harmonious man)

femeie senzuală sau inteligentă? / femeie sărmană sau înstărită? (En. - sensual or intelligent woman? / poor or wealthy woman?)

băiatul nu e nici prost, nici deștept... / băiatul nu e nici prost, nici urât... (En. - the boy is neither stupid nor smart... / the boy is neither stupid nor ugly...)
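The idea of extending a seed set through connectors can be sketched as label propagation over adjective pairs. This is a strong simplification of the cited work: it assumes that adjectives joined by "şi" (and) share an orientation and that adjectives joined by "dar" (but, a connector not listed on the slide) have opposite orientations, it ignores the other connectors, and it takes already extracted (adjective, connector, adjective) triples as input. The sample links are invented.

```python
from collections import deque

SAME_ORIENTATION = {"şi"}       # assumption: "and" links like with like
OPPOSITE_ORIENTATION = {"dar"}  # assumption: "but" links opposites


def propagate(seeds, links):
    """Spread seed polarities (+1 / -1) along connector links."""
    graph = {}
    for left, connector, right in links:
        if connector in SAME_ORIENTATION:
            sign = 1
        elif connector in OPPOSITE_ORIENTATION:
            sign = -1
        else:
            continue  # connectors without a clear constraint are skipped
        graph.setdefault(left, []).append((right, sign))
        graph.setdefault(right, []).append((left, sign))
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, sign in graph.get(word, []):
            if neighbour not in polarity:
                polarity[neighbour] = polarity[word] * sign
                queue.append(neighbour)
    return polarity


# Invented links built from adjectives that appear in the examples.
LINKS = [
    ("puternic", "şi", "armonios"),
    ("puternic", "dar", "prost"),
    ("senzuală", "sau", "inteligentă"),
]
```

Starting from the single seed `{"puternic": 1}`, the sketch labels "armonios" as positive and "prost" as negative, and leaves the "sau" pair unlabeled.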

4. Applications – business and government

“Why aren’t consumers buying our laptop?”, when the price is good and the weight clearly matches consumers’ wishes. [Lee, 2004]

Two kinds of answers:

- subjective reasons about intangible qualities (e.g. the physical keyboard is tacky), or

- misperceptions (which matter even though they are wrong)

Solution: by tracking consumers’ opinions, one could predict trends in sales, etc. [Mishne & Glance, 2006].

4. Applications – business and government

Solution based on a dictionary + the semantic role of negations and pragmatic connectors:

- classification of emotionally charged words into two classes: positive and negative (plus a neutral class);

- more classes, associating each word with a value in the range -5 to +5;

- [Gîfu and Cristea, 2012a] a scale in the interval -3 to +3;

- [Gîfu and Scutelnicu, 2013] a scale of values from -1 to +1.
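A dictionary-based score with a simple treatment of negation, together with a conversion between the scales mentioned above, can be sketched as follows. The lexicon values, the choice of "nu" (not) as the only negation, and the rule that a negation flips just the next word are all assumptions made for illustration; the cited systems may handle negation and connectors differently.

```python
# Hypothetical lexicon on the -5 .. +5 scale.
LEXICON = {"bun": 3, "odios": -5}
NEGATIONS = {"nu"}  # assumption: only the Romanian "nu" (not) is handled


def score_tokens(tokens, lexicon=LEXICON, negations=NEGATIONS):
    """Sum word values; a negation flips the sign of the next word."""
    total = 0
    previous = None
    for token in tokens:
        value = lexicon.get(token, 0)
        if previous in negations:
            value = -value
        total += value
        previous = token
    return total


def rescale(value, old_max, new_max):
    """Map a value linearly from [-old_max, old_max] to [-new_max, new_max]."""
    return value * new_max / old_max
```

For instance, `score_tokens(["nu", "bun"])` gives -3, and `rescale(-5, 5, 3)` maps the most negative value of the -5 to +5 scale onto -3.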

4. Process phases: POS-tagger & NER & Anaphora Resolution

<DOCUMENT>
<P ID="1">
<S ID="1">
<W EXTRA="NotInDict" ID="11.1" LEMMA="" MSD="Vmip3s" Mood="indicative" Number="singular" POS="VERB" Person="third" Tense="present" Type="predicative" offset="0"></W>
<NP HEADID="11.2" ID="0" ref="0">
<W Case="direct" Gender="masculine" ID="11.2" LEMMA="nimic" MSD="Pz3msr" Number="singular" POS="PRONOUN" Person="third" Type="negative" offset="1">Nimic</W>
<W ID="11.3" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="7">mai</W>
<W Case="direct" Definiteness="no" Gender="masculine" ID="11.4" LEMMA="odios" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="11">odios</W>
<W ID="11.5" LEMMA="," MSD="COMMA" POS="COMMA" offset="16">,</W>
<W ID="11.6" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="18">mai</W>
<W ID="11.7" LEMMA="oribil" MSD="Rg" POS="ADVERB" offset="22">oribil</W>
<W Case="direct" Definiteness="no" EXTRA="NotInDict" Gender="masculine" ID="11.8" LEMMA="decât" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="29">decât</W>
</NP>
<NP HEADID="11.9" ID="1" ref="1">
<W Case="direct" Definiteness="yes" Gender="masculine" ID="11.9" LEMMA="pantof" MSD="Ncmpry" Number="plural" POS="NOUN" Type="common" offset="35">pantofii</W>
<NP HEADID="11.10" ID="2" ref="2">
<W Case="direct" Definiteness="no" Gender="masculine" ID="11.10" LEMMA="sport" MSD="Ncmsrn" Number="singular" POS="NOUN" Type="common" offset="44">sport</W>
<W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50">cu</W>
<NP HEADID="11.12" ID="3" ref="3">
<W Case="direct" Definiteness="yes" Gender="feminine" ID="11.12" LEMMA="platformă" MSD="Ncfsry" Number="singular" POS="NOUN" Type="common" offset="53">platformă</W>
</NP>
</NP>
</NP>
</S>
</P>
</DOCUMENT>
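Annotated output of this kind can be read back with Python's standard `xml.etree.ElementTree`. The sketch below uses an abridged, well-formed fragment of the annotation above (several attributes are dropped for brevity) and shows how tokens with their lemma and POS, and the head lemma of each NP, might be recovered; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged fragment of the annotation shown on the slide.
SAMPLE = """<DOCUMENT><P ID="1"><S ID="1">
<NP HEADID="11.9" ID="1" ref="1">
<W ID="11.9" LEMMA="pantof" MSD="Ncmpry" POS="NOUN" offset="35">pantofii</W>
<NP HEADID="11.10" ID="2" ref="2">
<W ID="11.10" LEMMA="sport" MSD="Ncmsrn" POS="NOUN" offset="44">sport</W>
<W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50">cu</W>
</NP></NP>
</S></P></DOCUMENT>"""


def read_tokens(xml_text):
    """Return (surface form, lemma, POS) for every W element, in order."""
    root = ET.fromstring(xml_text)
    return [(w.text or "", w.get("LEMMA"), w.get("POS")) for w in root.iter("W")]


def np_head_lemmas(xml_text):
    """Return the lemma of the head word of every NP, in document order."""
    root = ET.fromstring(xml_text)
    words = {w.get("ID"): w for w in root.iter("W")}
    return [words[phrase.get("HEADID")].get("LEMMA") for phrase in root.iter("NP")]
```

Lemma and POS pairs obtained this way are the kind of input that lexicon lookup and rule matching would consume.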

4. Process phases: POS-tagger & NER & Anaphora Resolution

Fig. 3 The interface of the EAT system

4. Applications – business and government

- 46 rules for values.

<rule>
<word attribute="LEMMA" value="cel"/>
<word attribute="LEMMA" value="mai"/>
<word attribute="POS" value="ADJECTIVE"/>
</rule>

Ex: cel mai bun (En. - the best)

<rule>
<word attribute="LEMMA" value="cel"/>
<word attribute="LEMMA" value="mai"/>
<word attribute="LEMMA" value="bun"/>
</rule>
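A rule of this form is a sequence of attribute/value constraints, one per consecutive token. The sketch below shows one plausible way to match such a rule against annotated tokens; it is not the matcher of the system described here, the function names are hypothetical, and the POS tags given to "el", "fi" and "cel" in the sample are illustrative guesses.

```python
def matches_at(tokens, start, rule):
    """Check whether the rule matches the tokens beginning at `start`."""
    if start + len(rule) > len(tokens):
        return False
    return all(
        tokens[start + i].get(attribute) == value
        for i, (attribute, value) in enumerate(rule)
    )


def find_matches(tokens, rule):
    """Return every start index at which the rule matches."""
    return [i for i in range(len(tokens)) if matches_at(tokens, i, rule)]


# The first rule from the slide, as (attribute, value) pairs.
SUPERLATIVE_RULE = [("LEMMA", "cel"), ("LEMMA", "mai"), ("POS", "ADJECTIVE")]

# "el este cel mai bun" (he is the best), with illustrative annotations.
TOKENS = [
    {"LEMMA": "el", "POS": "PRONOUN"},
    {"LEMMA": "fi", "POS": "VERB"},
    {"LEMMA": "cel", "POS": "ARTICLE"},
    {"LEMMA": "mai", "POS": "ADVERB"},
    {"LEMMA": "bun", "POS": "ADJECTIVE"},
]
```

Here `find_matches(TOKENS, SUPERLATIVE_RULE)` returns `[2]`, the position where "cel mai bun" starts; a matched rule could then adjust the value of the adjective it covers.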

4. Applications – review sites

- to assess the reviews and ratings about your company or yourself;

- to summarize reviews.

Our work: the consumer’s behaviour, civic identity [Gîfu et al., 2013]

6 profiles: the-decent, the-porn-aggressive, the-incitator, the-affected, the-author-attacker and supporter.

- we established a number of features (lexical, syntactic, semantic): style, emotional classes, etc.

4. Applications – politics/sociology

Two dimensions in politics:

1. to know what voters are thinking about the political candidates [Efron, 2004; Goldberg et al., 2007; Laver et al., 2003; Mullen and Malouf, 2008];

2. to clarify the politicians’ positions, in order to enhance the quality of the information that voters have access to [Bansal et al., 2008; Gîfu, 2013b]

In sociology:

- how ideas and innovations are propagated [Rosen, 1974]

Ex: the polls on different issues

CONCLUSIONS AND DISCUSSIONS

SA - a complex task;

SA - an emerging discipline with promising academic and, most importantly, industrial applications;

....the sentiment classification problem - more challenging

Future work...

- to develop an independent sentiment classifier using machine learning methods;

- to compare the results of machine-learning sentiment classification with those of traditional topic-based categorization;

- to analyse the sentiment lexicon of old Romanian in terms of diachronic semantics.

Thank you for your attention!

?