Text_Analytics_use_case

Post on 22-Jan-2018

TEXT ANALYTICS

SENTIMENT ANALYSIS ON UNSTRUCTURED DATA

OVERVIEW

Growth of Unstructured data

Introduction to Textual analytics

Deep dive into document classification: sentiment analysis

Use case: Twitter sentiment analysis

Discussion on Business Applications

UNSTRUCTURED DATA

WHAT IS TEXT ANALYTICS

Text analytics is a broad umbrella term describing a range of technologies for analyzing and processing unstructured text data

MAJOR AREAS OF TEXT ANALYTICS

PRACTICE AREA FOR TEXT ANALYTICS

SENTIMENT ANALYSIS

Sentiment analysis is a field of study that analyzes people's opinions, emotions, attitudes, and sentiments expressed in written language

People review and comment on all sorts of topics, which makes it possible to track attitudes and feelings on social media

Products, brands, and people can be tracked to determine whether they are viewed positively or negatively

SENTIMENT ANALYSIS: HILLARY CLINTON LEAKED EMAILS

• During her time as Secretary of State, Clinton chose to use a private account for sending official emails

• July 2015: Clinton's staff handed over 30,000 emails to the State Department

• The emails were released by the State Department as PDFs

• Kaggle hosts a cleaned version of the released documents in the form of a SQLite database

• Twitter sentiment analysis is performed to extract public sentiment toward Hillary Clinton regarding her email controversy

ACQUIRING THE DATA

R makes it easy to pull Twitter data through the API with the twitteR package, loaded with library(twitteR)

DATA PROCESSING

Load Text → Transform Text → Text Corpus

Data Processing Step: Brief Description

Explore the corpus through exploratory data analysis: Understand the types of variables, their functions, permissible values, and so on. Some formats, including HTML and XML, contain tags and other data structures that provide more metadata.

Convert text to lowercase: This avoids distinguishing between words simply by case.

Remove numbers (if required): Numbers may or may not be relevant to the analysis.

Remove punctuation: Punctuation can provide grammatical context that supports understanding, but it is often ignored in initial analyses.

Remove English stop words: Stop words are common words found in a language, such as "for", "of", and "are".

Remove own stop words (if required): Along with English stop words, we can remove our own stop words, either instead or in addition.

Strip whitespace: Eliminate extra whitespace, that is, any space beyond the single spaces that occur between words.

Stemming: Stemming uses an algorithm that removes common English word endings such as "es", "ed", and "'s". For example, "computer" and "computers" become "comput".

Lemmatization: Transform words to their dictionary base form. For example, "produce" and "produced" become "produce".

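The cleaning steps listed for this stage can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the code behind the deck (which works in R). The stop-word list, the function names preprocess and naive_stem, and the sample sentence are all assumptions made for the example, and the toy suffix stripper is a stand-in for a real stemming algorithm, so its output differs from a proper stemmer (it would not reduce "computers" to "comput").

```python
import re
import string

# Assumption: a tiny illustrative stop-word list, not a full English list.
ENGLISH_STOP_WORDS = {"for", "of", "are", "the", "a", "an", "is", "to", "and", "in"}


def naive_stem(word):
    """Toy suffix stripper; a stand-in for a real stemming algorithm."""
    for suffix in ("es", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, own_stop_words=frozenset(), remove_numbers=True, stem=False):
    """Apply the cleaning steps in order and return a list of tokens."""
    text = text.lower()                       # convert to lowercase
    if remove_numbers:
        text = re.sub(r"\d+", "", text)       # remove numbers (optional)
    # Remove punctuation by deleting every punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()                     # split() also strips extra whitespace
    stop_words = ENGLISH_STOP_WORDS | set(own_stop_words)
    tokens = [t for t in tokens if t not in stop_words]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


if __name__ == "__main__":
    print(preprocess("Clinton's 30,000 emails ARE   released!!", stem=True))
```

Passing own_stop_words lets a project-specific list be removed alongside the English one, and remove_numbers=False keeps digits when they matter to the analysis.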
DATA VISUALIZATION - WORD CLOUD

A word cloud is a useful tool for visualizing how frequently individual terms are used across the collected postings

‘Word Cloud 1’ depicts the most common words from tweets mentioning the ‘Clinton email scandals’. ‘Word Cloud 2’ shows the most common words from the tweets fetched by a search on “Clinton” + “Palin”.
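A word cloud is driven by term counts: the more often a term appears, the larger it is drawn. The counting step can be sketched as follows, in Python with the standard library only. The function name term_frequencies and the sample tweets are hypothetical, and drawing the cloud itself is left to a plotting package.

```python
from collections import Counter


def term_frequencies(documents, top_n=10):
    """Count how often each term occurs across all documents."""
    counts = Counter()
    for doc in documents:
        # Simple whitespace tokenization; real text would be cleaned first.
        counts.update(doc.lower().split())
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical sample tweets, for illustration only.
    tweets = ["email scandal", "email server email"]
    for term, count in term_frequencies(tweets, top_n=3):
        print(term, count)
```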

TWITTER SENTIMENT ANALYSIS

The ‘syuzhet’ package (get_nrc_sentiment function) is used to apply sentiment analysis to each email. It scores text on the attributes anger, anticipation, disgust, fear, joy, sadness, surprise, and trust
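The idea behind lexicon-based emotion scoring can be illustrated with a small sketch: each word is looked up in an emotion lexicon, and the count for every emotion attached to that word goes up by one. The Python code below is an illustration under stated assumptions, not the syuzhet implementation. The lexicon TOY_LEXICON is hand-made for the example and is not the NRC lexicon, and the function name emotion_scores is hypothetical.

```python
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Assumption: a tiny hand-made lexicon for illustration; NOT the NRC lexicon.
TOY_LEXICON = {
    "scandal": {"anger", "disgust"},
    "leak": {"fear", "surprise"},
    "honest": {"joy", "trust"},
    "lie": {"anger", "disgust", "sadness"},
}


def emotion_scores(text, lexicon=TOY_LEXICON):
    """Count lexicon hits per emotion for one piece of text."""
    scores = dict.fromkeys(EMOTIONS, 0)
    for token in text.lower().split():
        word = token.strip(".,!?")  # drop punctuation around the word
        for emotion in lexicon.get(word, ()):
            scores[emotion] += 1
    return scores


if __name__ == "__main__":
    print(emotion_scores("Another scandal, another leak!"))
```

Summing these per-text scores over a whole collection gives an overall emotion profile that can be plotted as a bar chart.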

People's sentiments about her email scandal were analysed and found to be mostly negative, as people believed that she had misused her position and wasn't capable of handling technology

BUSINESS APPLICATIONS

Awareness of customer preferences

Are your product features the best?

Are your competitors better than you?

Is there an influential customer?