TEXT ANALYTICS: SENTIMENT ANALYSIS ON UNSTRUCTURED DATA
OVERVIEW
Growth of unstructured data
Introduction to text analytics
Deep dive into document classification: sentiment analysis
Use case: Twitter sentiment analysis
Discussion of business applications
UNSTRUCTURED DATA
WHAT IS TEXT ANALYTICS
Text analytics is a broad umbrella term describing a range of technologies for analyzing and processing unstructured text data
MAJOR AREAS OF TEXT ANALYTICS
PRACTICE AREA FOR TEXT ANALYTICS
SENTIMENT ANALYSIS
Sentiment analysis is a field of study that analyzes people's opinions, emotions, attitudes, and sentiments expressed in written language
People review and comment on all sorts of topics. This makes it possible to track attitudes and feelings on social media
Products, brands, and people can be tracked to determine whether they are viewed positively or negatively
SENTIMENT ANALYSIS: HILLARY CLINTON LEAKED EMAILS
• During her time as Secretary of State, Clinton chose to use a private account for sending official emails
• July 2015: Clinton's staff handed over 30,000 emails to the State Department
• The State Department released the emails as PDFs
• Kaggle hosts a cleaned version of the released documents in the form of a SQLite database
• Twitter sentiment analysis is performed to extract public sentiment toward Hillary Clinton regarding her email controversy
ACQUIRING THE DATA
R makes it easy to pull Twitter data through the API, for example with the twitteR package:
library(twitteR)
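A minimal sketch of how tweets could be pulled with twitteR. The environment variable names, the helper names fetch_tweets and build_query, and the tweet count are assumptions for illustration, not part of the original analysis. Joining terms with "+" follows the search mentioned later for "Clinton" + "Palin".

```r
# Hypothetical helper: joins search terms with "+" for a Twitter query.
build_query <- function(terms) paste(terms, collapse = "+")

# Hypothetical helper: authenticates and returns tweet text as a character vector.
# Credentials are read from environment variables whose names are assumptions.
fetch_tweets <- function(query, n = 100) {
  if (!requireNamespace("twitteR", quietly = TRUE)) {
    stop("the twitteR package is not installed")
  }
  twitteR::setup_twitter_oauth(
    consumer_key    = Sys.getenv("TWITTER_CONSUMER_KEY"),
    consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET"),
    access_token    = Sys.getenv("TWITTER_ACCESS_TOKEN"),
    access_secret   = Sys.getenv("TWITTER_ACCESS_SECRET")
  )
  tweets <- twitteR::searchTwitter(query, n = n, lang = "en")
  twitteR::twListToDF(tweets)$text
}

query <- build_query(c("Clinton", "Palin"))
# tweet_text <- fetch_tweets(query, n = 500)  # requires valid API credentials
```

The call to fetch_tweets is left commented out because it only works with valid API credentials and network access.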
DATA PROCESSING
Load Text → Transform Text → Text Corpus
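The transform step can be sketched in base R as a small cleaning function covering lowercasing and the removal of numbers, punctuation, stop words, and extra whitespace. The function name clean_text and the short stop-word list are assumptions for illustration. Packages such as tm provide comparable transformations with a full English stop-word list. Stemming and lemmatization are left out because they need a dedicated library.

```r
# Hypothetical helper: applies basic cleaning steps to a character vector.
# The default stop-word list is a tiny illustrative sample, not a full list.
clean_text <- function(x,
                       stop_words = c("for", "of", "are", "the", "a", "is", "to", "in")) {
  x <- tolower(x)                                  # convert to lowercase
  x <- gsub("[[:digit:]]+", " ", x)                # remove numbers
  x <- gsub("[[:punct:]]+", " ", x)                # remove punctuation
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, perl = TRUE)          # remove stop words
  x <- gsub("\\s+", " ", x)                        # collapse repeated whitespace
  trimws(x)                                        # drop leading/trailing spaces
}

cleaned <- clean_text("The 30,000 emails ARE hosted   on Kaggle!")
print(cleaned)
```

The order matters: lowercasing first lets the stop-word match ignore case, and whitespace is stripped last because the earlier removals leave gaps behind.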
Data Processing Step | Brief Description
Explore corpus through exploratory data analysis | Understand the types of variables, their functions, permissible values, and so on. Some formats, including HTML and XML, contain tags and other data structures that provide more metadata.
Convert text to lowercase | Avoids distinguishing between words based only on case.
Remove numbers (if required) | Numbers may or may not be relevant to the analysis.
Remove punctuation | Punctuation can provide grammatical context that supports understanding, but it is often ignored in initial analyses.
Remove English stop words | Stop words are common words found in a language, such as "for", "of", and "are".
Remove own stop words (if required) | In addition to, or instead of, the English stop words, we can remove our own stop words.
Strip whitespace | Eliminate extra whitespace, i.e., any space other than the single spaces between words.
Stemming | Stemming uses an algorithm that removes common word endings from English words, such as "es", "ed", and "'s". For example, "computer" and "computers" both become "comput".
Lemmatization | Transforms words to their dictionary base form, e.g., "produce" and "produced" both become "produce".
DATA VISUALIZATION - WORD CLOUD
A word cloud is a useful tool for presenting a visual image of the magnitude (frequency) of individual terms used across all postings
‘Word Cloud 1’ depicts the most common words from tweets mentioning the ‘Clinton email scandals’. ‘Word Cloud 2’ shows the most common words from the tweets fetched by a search on “Clinton” + “Palin”.
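A word cloud is drawn from term frequencies, so the underlying computation can be sketched as a simple word count. The function name term_frequencies and the three sample strings are made up for illustration. The plotting call assumes the wordcloud package is installed and is skipped otherwise.

```r
# Hypothetical helper: counts how often each word occurs across documents.
term_frequencies <- function(docs) {
  tokens <- unlist(strsplit(tolower(docs), "[^a-z]+"))  # split on non-letters
  tokens <- tokens[nzchar(tokens)]                      # drop empty tokens
  sort(table(tokens), decreasing = TRUE)                # most frequent first
}

# Invented sample texts, not real tweets.
freq <- term_frequencies(c("email scandal", "email server", "private email server"))
print(freq)

# Draw the cloud only in an interactive session with the package available.
if (interactive() && requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(freq), freq = as.integer(freq), min.freq = 1)
}
```

Larger counts translate into larger words in the cloud, which is why cleaning the text first (stop words in particular) has a big effect on what the picture shows.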
TWITTER SENTIMENT ANALYSIS
The get_nrc_sentiment function from the ‘syuzhet’ package is used to apply sentiment analysis to each email. It scores text on the attributes anger, anticipation, disgust, fear, joy, sadness, surprise, and trust
People's sentiments regarding her email scandal were analysed and found to be mostly negative, as people believed that she had misused her position and wasn't capable of handling technology
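Lexicon-based emotion scoring of this kind can be sketched as a word lookup followed by a count per emotion. The toy lexicon below is invented for illustration and is not the NRC lexicon that syuzhet uses. The function name score_emotions and the sample sentence are also assumptions. The final call to get_nrc_sentiment runs only if syuzhet is installed.

```r
# Toy lexicon, invented for illustration; NOT the NRC lexicon.
# Each word maps to exactly one emotion here to keep the lookup simple.
toy_lexicon <- data.frame(
  word    = c("scandal", "leak", "trust", "honest", "afraid", "angry"),
  emotion = c("anger", "fear", "trust", "trust", "fear", "anger"),
  stringsAsFactors = FALSE
)

emotions <- c("anger", "anticipation", "disgust", "fear",
              "joy", "sadness", "surprise", "trust")

# Hypothetical helper: counts lexicon hits per emotion for one text.
score_emotions <- function(text, lexicon = toy_lexicon) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  hits <- lexicon$emotion[match(tokens, lexicon$word)]  # NA for unknown words
  counts <- table(factor(hits, levels = emotions))      # NAs are dropped
  setNames(as.integer(counts), emotions)
}

sample_text <- "Angry about the scandal, afraid of another leak"
scores <- score_emotions(sample_text)
print(scores)

# With syuzhet installed, the package function can be applied to the same text.
if (requireNamespace("syuzhet", quietly = TRUE)) {
  print(syuzhet::get_nrc_sentiment(sample_text))
}
```

Summing such scores over all texts gives an overall emotion profile, which is one way to arrive at a "mostly negative" reading.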
BUSINESS APPLICATIONS
Awareness of customer preferences
Are your product features the best?
Are your competitors better than you?
Is there an influential customer?