Text Analytics Teradata & Sabanci University April, 2015
Text analytics scope at a glance

TEXT DATA

#1 Topic Categorization

#2 Sentiment Analysis

#3 Relation between words

#4 Association Rules between events

Main Project Diagram

#1 Topic Categorization

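Topic Categorization

2 Main Categories
◦ Complaint (Şikayet)
◦ Information/Operation (Bilgi/İşlem)

19 Subcategories
◦ Product/Service (Ürün/Servis)
◦ Campaign/Package (Kampanya/Paket)
◦ .........
◦ Billing and Top-up (Fatura ve Yükleme)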

Topic Categorization

Dimension Reduction
◦ Data
  ◦ High in volume
  ◦ Noisy
◦ For each category, significant words were found (using the Aster tf_idf function)
◦ Project the data into the space of these words:
  ◦ Decreases the noise
  ◦ Thus, differentiates the classes better

[Figure: SAMPLE]
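The dimension-reduction idea can be sketched in plain Python. This is a minimal illustration, not the Aster tf_idf function: each category is treated as one document, and the function names, the toy English messages, and the `top_k` cut-off are all assumptions made for the example.

```python
import math
from collections import Counter

def significant_words(messages_by_category, top_k=3):
    """Return the top_k highest tf-idf words per category.

    Each category is treated as one 'document' (an assumption of this sketch).
    """
    term_counts = {
        cat: Counter(word for msg in msgs for word in msg.lower().split())
        for cat, msgs in messages_by_category.items()
    }
    n_categories = len(term_counts)
    # Document frequency: in how many categories does each word occur?
    doc_freq = Counter(word for counts in term_counts.values() for word in counts)

    selected = {}
    for cat, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_categories / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        selected[cat] = ranked[:top_k]
    return selected

def project(message, vocabulary):
    """Keep only the words of a message that belong to the reduced vocabulary."""
    return [word for word in message.lower().split() if word in vocabulary]

# Hypothetical toy data; the project itself used Turkish call center messages.
toy = {
    "complaint": ["internet connection is slow", "my internet is not working"],
    "information": ["what is my invoice amount", "when is my invoice due"],
}
words = significant_words(toy, top_k=2)
vocabulary = {w for ws in words.values() for w in ws}
reduced = project("my internet invoice is slow", vocabulary)
```

Words that occur in every category (here "is" and "my") get a tf-idf of zero, so projecting onto the selected vocabulary drops them and keeps only the words that help separate the classes.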

Main Category Classification

We applied Naive Bayes Text Classification

[Chart: 2-class classification, correct vs. incorrect predictions]
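For readers unfamiliar with Naive Bayes text classification, the following is a minimal multinomial sketch with add-one smoothing. It is not the Aster implementation used in the project; the class name, the whitespace tokenization, and the toy English messages are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log P(class) + sum of log P(word | class)
            score = math.log(n_label / n_docs)
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy training set (English stand-ins for the Turkish messages)
train_texts = [
    "internet is not working",
    "connection keeps dropping",
    "what is my invoice amount",
    "how do I change my package",
]
train_labels = ["complaint", "complaint", "information", "information"]
model = NaiveBayesText().fit(train_texts, train_labels)
prediction = model.predict("my connection is not working")
```

The same class works unchanged for the 19-class subcategory task, since nothing in it depends on the number of labels.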

Subcategory Classification

We applied Naive Bayes Text Classification

[Chart: 19-class classification, correct vs. incorrect predictions]

2 Main Categories 19 Subcategories

Topic Categorization

#2 Sentiment Analysis

Sentiment Classification Model

• The whole dataset is randomly split into train and test sets
• The model is evaluated without bias on the unseen test set
• Aster's Random Forest function can be applied to our Turkish sentiment features

Turkish Model

Does this call have NEGATIVE sentiment?

• Classify the call center messages into two:
– Negative
– Neutral (and Positive)

• Train and test sets are different for each evaluation:
– The accuracy is not dependent on a specific test set
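Evaluating on a different random split each time can be sketched as a repeated hold-out loop. This is only an illustration under stated assumptions: the project used Aster's Random Forest function, whereas the `majority_baseline` below is a placeholder model, and the function names, the 30% test fraction, and the synthetic data are hypothetical.

```python
import random

def repeated_holdout(examples, train_and_score, n_rounds=5, test_fraction=0.3, seed=0):
    """Average accuracy over several random train/test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_rounds):
        shuffled = examples[:]
        rng.shuffle(shuffled)  # a fresh random split in every round
        n_test = max(1, int(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        accuracies.append(train_and_score(train, test))
    return sum(accuracies) / len(accuracies), accuracies

def majority_baseline(train, test):
    """Placeholder model: always predict the most frequent training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    correct = sum(1 for _, label in test if label == majority)
    return correct / len(test)

# Hypothetical labelled messages: (text, label)
data = [("msg %d" % i, "negative" if i % 3 == 0 else "neutral") for i in range(30)]
mean_accuracy, per_round = repeated_holdout(data, majority_baseline)
```

Reporting the mean over rounds, rather than a single split, is what makes the accuracy figure independent of any one test set.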

[Chart: Classification. Correct: > 81; misclassified: < 19]

Regression Model

• The same dataset is used as for the sentiment classification

• The output is different: real values instead of classes

• These values measure how negative the customers are

Negative Sentiment Breakdown

[Chart: values 2%, 93%, 5%; legend: high, medium, low]

• Threshold values were chosen using the 68–95–99.7 rule in statistics (also known as the three-sigma rule of thumb).

• Our thresholds are set at mean ± 2 × standard deviation

• This convention is used in the social sciences and business
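The thresholding step can be sketched as follows. The function names and sample scores are hypothetical, and two further assumptions are made that the slides do not state: a higher score means more negative sentiment, and the population standard deviation is used.

```python
import statistics

def sentiment_thresholds(scores, k=2.0):
    """Return (low_cut, high_cut) at mean -/+ k standard deviations."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population standard deviation (assumed)
    return mean - k * sd, mean + k * sd

def bucket(score, low_cut, high_cut):
    """Map a regression score to a negativity level."""
    if score > high_cut:
        return "high"
    if score < low_cut:
        return "low"
    return "medium"

# Hypothetical regression outputs; higher is assumed to mean more negative.
scores = [0.1, 0.2, 0.2, 0.3, 0.3, 0.3, 0.4, 0.4, 0.5, 2.0]
low_cut, high_cut = sentiment_thresholds(scores)
levels = [bucket(s, low_cut, high_cut) for s in scores]
```

For roughly normal scores, about 95% of values fall within two standard deviations of the mean, so most calls land in the medium bucket and only the tails are flagged.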

#3 Relation Extraction

Relations between Words

• Relation scores are calculated between words

• The score is based on confidence values:
– Score(a,b) = confidence(a,b) * confidence(b,a)

• Words were filtered:
– We selected the words with the highest tf*idf values

• Words are clustered in terms of their relations

• Clusters are represented in a graph

• The Cfilter function (in Aster) was applied for this process
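The relation score can be illustrated with a small co-occurrence sketch. It is not the Aster Cfilter function: here confidence(a,b) is assumed to mean the fraction of messages containing a that also contain b, and the function name and toy messages are hypothetical.

```python
from collections import Counter
from itertools import combinations

def relation_scores(documents):
    """Score(a, b) = confidence(a, b) * confidence(b, a) for co-occurring words."""
    word_docs = Counter()  # number of documents containing each word
    pair_docs = Counter()  # number of documents containing both words
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        word_docs.update(words)
        pair_docs.update(combinations(words, 2))
    scores = {}
    for (a, b), both in pair_docs.items():
        conf_ab = both / word_docs[a]  # share of a-documents that contain b
        conf_ba = both / word_docs[b]  # share of b-documents that contain a
        scores[(a, b)] = conf_ab * conf_ba
    return scores

# Hypothetical toy messages; pairs are keyed in alphabetical order.
docs = [
    "internet connection slow",
    "internet connection dropped",
    "internet package insufficient",
    "invoice amount wrong",
]
scores = relation_scores(docs)
```

Multiplying the two confidences makes the score symmetric, so a pair scores highly only when each word reliably predicts the other.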

Complaining Customers Graph

Selected Cluster

A Closer Look at the Cluster

The cluster seems to indicate customers facing internet-related problems

• Connection problem, or
• Internet package is not sufficient?

Complaint Word Cloud (Aster Lens)

Information & Operation Word Cloud

Bigram Cloud (ngram function)

#4 Association Rules between Events

Association Rule Mining

Topic A with a Sentiment score

Topic B with a Sentiment score

Topic C

Confidence(A, B --> C)
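The confidence of a rule such as A, B --> C can be sketched as below. The encoding of an event as a "topic:sentiment level" string, the topic names, and the customer histories are all hypothetical; the slides do not specify how events were represented.

```python
def rule_confidence(transactions, antecedent, consequent):
    """confidence(X -> Y) = count(X and Y together) / count(X)."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0  # rule never fires; confidence is reported as 0 by convention here
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)

# Hypothetical per-customer event sets: "topic:sentiment level" or a bare topic
customers = [
    {"invoice:high", "internet:medium", "cancellation"},
    {"invoice:high", "internet:medium"},
    {"invoice:high", "internet:medium", "cancellation"},
    {"invoice:low", "internet:medium", "cancellation"},
    {"invoice:high"},
]
confidence = rule_confidence(
    customers, {"invoice:high", "internet:medium"}, {"cancellation"}
)
```

In this toy data three customers match both antecedent events and two of them also have the consequent, so the rule's confidence is 2/3.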

QUESTIONS