TEXT ANALYTICS: SENTIMENT ANALYSIS ON UNSTRUCTURED DATA

Text_Analytics_use_case

Jan 22, 2018

Neha Sharma
Transcript
Page 1: Text_Analytics_use_case

TEXT ANALYTICS

SENTIMENT ANALYSIS ON UNSTRUCTURED DATA

Page 2: Text_Analytics_use_case

OVERVIEW

Growth of unstructured data

Introduction to text analytics

Deep dive into document classification: sentiment analysis

Use case: Twitter sentiment analysis

Discussion of business applications

Page 3: Text_Analytics_use_case

UNSTRUCTURED DATA

Page 4: Text_Analytics_use_case

WHAT IS TEXT ANALYTICS

Text analytics is a broad umbrella term describing a range of technologies for analyzing and processing unstructured text data.

Page 5: Text_Analytics_use_case

MAJOR AREAS OF TEXT ANALYTICS

Page 6: Text_Analytics_use_case

PRACTICE AREA FOR TEXT ANALYTICS

Page 7: Text_Analytics_use_case

SENTIMENT ANALYSIS

Sentiment analysis is a field of study that analyzes people's opinions, emotions, attitudes, and sentiments expressed in written language.

People review and comment on all sorts of topics. This allows attitudes and feelings to be tracked on social media.

Products, brands, and people can be tracked to determine whether they are viewed positively or negatively.

Page 8: Text_Analytics_use_case

SENTIMENT ANALYSIS: HILLARY CLINTON LEAKED EMAILS

• During her time as Secretary of State, Clinton chose to use a private account for sending official emails.

• July 2015: Clinton's staff handed over 30,000 emails to the State Department.

• The emails were released by the State Department as PDFs.

• Kaggle hosts a cleaned version of the released documents in the form of an SQLite database.

• Twitter sentiment analysis is performed to extract public emotions toward Hillary Clinton in regard to her email controversy.
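
A minimal sketch of reading email text from an SQLite database such as the one mentioned above. It is written in Python with the standard sqlite3 module rather than R. The table name Emails and the column name ExtractedBodyText are assumptions for illustration and should be checked against the actual file; the demo below builds a small in-memory database with made-up rows instead of opening the real one.

```python
import sqlite3


def load_email_bodies(conn, table="Emails", column="ExtractedBodyText", limit=5):
    """Return up to `limit` non-empty text values from one column of a table."""
    # Table and column names cannot be bound as SQL parameters,
    # so they are validated before being placed in the query string.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    query = (
        f"SELECT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL AND {column} != '' "
        "ORDER BY rowid LIMIT ?"
    )
    return [row[0] for row in conn.execute(query, (limit,))]


def demo_connection():
    """Build an in-memory database with hypothetical rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Emails (Id INTEGER PRIMARY KEY, ExtractedBodyText TEXT)"
    )
    rows = [("Thanks, will call tomorrow.",), ("",), (None,), ("Please print.",)]
    conn.executemany("INSERT INTO Emails (ExtractedBodyText) VALUES (?)", rows)
    return conn


if __name__ == "__main__":
    for body in load_email_bodies(demo_connection()):
        print(body)
```

For a real file, the connection would come from sqlite3.connect with the path to the downloaded database in place of demo_connection().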

Page 9: Text_Analytics_use_case

ACQUIRING THE DATA

R makes it easy to pull Twitter data through the API using the twitteR package:

library(twitteR)

Page 10: Text_Analytics_use_case

DATA PROCESSING

Load Text → Transform Text → Text Corpus

Page 11: Text_Analytics_use_case

DATA PROCESSING

Data Processing Step: Brief Description

Explore corpus through exploratory data analysis: Understand the types of variables, their functions, permissible values, and so on. Some formats, including HTML and XML, contain tags and other data structures that provide more metadata.

Convert text to lowercase: This avoids distinguishing between words simply on case.

Remove numbers (if required): Numbers may or may not be relevant to the analysis.

Remove punctuation: Punctuation can provide grammatical context that supports understanding, but it is often ignored for initial analyses.

Remove English stop words: Stop words are common words found in a language, such as "for", "of", and "are".

Remove own stop words (if required): Along with English stop words, we can remove our own stop words, either instead or in addition.

Strip whitespace: Eliminate extra whitespace, that is, any space other than the single spaces that occur between words within a sentence.

Stemming: Stemming uses an algorithm that removes common word endings for English words, such as "es", "ed", and "'s". For example, "computer" and "computers" become "comput".

Lemmatization: Transform words to their dictionary base form, e.g., "produce" and "produced" become "produce".

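The processing steps listed on this slide can be sketched as a small pipeline. This is an illustrative Python version using only the standard library, not the deck's R code: the stop-word set is a tiny hypothetical sample, and naive_stem is a crude suffix stripper standing in for a real stemming algorithm such as Porter's. Lemmatization needs a dictionary and is left out.

```python
import re
import string

# Tiny hypothetical sample; real stop-word lists are much longer.
ENGLISH_STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def naive_stem(word):
    """Strip one common English ending; a crude stand-in for a real stemmer."""
    for suffix in ("ers", "es", "ed", "er", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, extra_stop_words=(), remove_numbers=True, stem=False):
    """Apply the cleaning steps in order and return a list of tokens."""
    text = text.lower()                                # convert to lowercase
    if remove_numbers:
        text = re.sub(r"\d+", " ", text)               # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    stop_words = ENGLISH_STOP_WORDS | set(extra_stop_words)
    # split() with no arguments also collapses extra whitespace.
    tokens = [t for t in text.split() if t not in stop_words]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


if __name__ == "__main__":
    print(preprocess("The 30,000 EMAILS were   released, for review!", stem=True))
```

Passing extra_stop_words corresponds to the "own stop words" step, for example search terms that appear in every tweet.
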
Page 12: Text_Analytics_use_case

DATA VISUALIZATION - WORD CLOUD

A word cloud is a valuable tool used to present a visual image of the magnitude of individual terms used throughout the overall postings.

‘Word Cloud1’ depicts the most common words from tweets mentioning the ‘Clinton email scandals’. ‘Word Cloud2’ shows the most common words from the tweets fetched as a result of a search on “Clinton” + “Palin”.
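
A word cloud sizes each term by how often it occurs, so the underlying computation is a term-frequency count. The sketch below shows that counting step in Python with collections.Counter; the sample tweets are made up for illustration, and drawing the cloud itself is left to a plotting library.

```python
from collections import Counter


def top_terms(tweets, stop_words=frozenset(), n=10):
    """Count how often each term occurs across tweets and return the top n."""
    counts = Counter()
    for tweet in tweets:
        for token in tweet.lower().split():
            # Trim punctuation and Twitter markers from both ends of the token.
            token = token.strip(".,!?:;\"'()#@")
            if token and token not in stop_words:
                counts[token] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample tweets, not real data.
    sample = ["Clinton email server", "email scandal", "Clinton email"]
    for term, freq in top_terms(sample, n=3):
        print(term, freq)
```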

Page 13: Text_Analytics_use_case

TWITTER SENTIMENT ANALYSIS

The ‘syuzhet’ package (get_nrc_sentiment function) is used to apply sentiment analysis to each email, scoring the text on the attributes anger, anticipation, disgust, fear, joy, sadness, surprise, and trust.

People's sentiments in relation to her email scandal were analysed and were mostly negative, as people believed that she had misused her position and wasn't capable of handling technology.
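
The idea behind lexicon-based emotion scoring can be sketched in a few lines: each word is looked up in a word-to-emotion lexicon and the matches are tallied per emotion. This Python example is only an illustration of that idea, not the syuzhet implementation, and TOY_LEXICON is a hypothetical six-word lexicon invented here, not the NRC lexicon.

```python
EMOTIONS = (
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
)

# Hypothetical toy lexicon for illustration only.
TOY_LEXICON = {
    "scandal": {"anger", "disgust"},
    "misused": {"anger"},
    "afraid": {"fear"},
    "hope": {"anticipation", "joy", "trust"},
    "honest": {"trust"},
    "shocked": {"surprise"},
}


def emotion_scores(text, lexicon=TOY_LEXICON):
    """Count, per emotion, how many words in the text the lexicon links to it."""
    scores = dict.fromkeys(EMOTIONS, 0)
    for token in text.lower().split():
        token = token.strip(".,!?:;\"'")
        for emotion in lexicon.get(token, ()):
            scores[emotion] += 1
    return scores


if __name__ == "__main__":
    print(emotion_scores("Shocked by the scandal, she misused her position!"))
```

Summing these per-text scores over all tweets gives the kind of aggregate emotion profile described on this slide.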

Page 14: Text_Analytics_use_case

BUSINESS APPLICATIONS

Awareness of customer preferences

Are your product features the best?

Page 15: Text_Analytics_use_case

BUSINESS APPLICATIONS

Are your competitors better than you?

Is there an influential customer?

Page 16: Text_Analytics_use_case