Jose Chinchilla
MCITP/MCSE: Database Administrator, SQL Server
MCITP/MCSE: Business Intelligence SQL Server

Current Positions:
President, Agile Bay, Inc.
President, Tampa Bay Business Intelligence User Group
Regional Mentor, PASS LATAM

Blog: http://www.sqljoe.com
Twitter: @sqljoe
LinkedIn: http://www.linkedin.com/in/josechinchilla
Email: [email protected]

Customers & Partners

Twitter Sentiment Analysis with Hadoop - TBTLA | … 5 Client Use Cases 1) Modern Data Warehouse • Enterprise Data Warehouse Hadoop integration • Long term data staging and archiving

Jun 17, 2018


dokhanh
Transcript


Agenda

1. Top 5 Client Use Cases of Big Data
2. Overview of Big Data Ecosystem
3. Overview of Sentiment Analysis
4. Demo: Sentiment Analysis using Twitter


Top 5 Client Use Cases of Big Data


Top 5 Client Use Cases

1) Modern Data Warehouse
• Enterprise Data Warehouse Hadoop integration
• Long term data staging and archiving

2) Sentiment Analysis
• Opinion Mining
• Twitter, Facebook, Google+, Yelp, UrbanSpoon, TripAdvisor

3) Market Basket Analysis
• Product Affinity Analysis
• Recommendation Engine

4) Clickstream Analysis
• Website visitor behavior
• Click patterns

5) Risk Analysis
• Consumer behavior
• Fraud detection


Big Data Ecosystem


Big Data Ecosystem 

Commercial Distributions
• Microsoft
• Hortonworks
• Cloudera
• Amazon Web Services
• Greenplum
• Talend
• MapR
• Intel
• IBM

Apache Hadoop
• Open-source
• http://hadoop.apache.org


Big Data Ecosystem 

The Zoo

• Oozie
• Flume
• Sqoop
• Hive
• Pig
• Falcon
• Mahout
• Impala
• Cheetah
• Giraph
• Stinger
• Phoenix



Sentiment Analysis Overview


“Everything we hear is an opinion, not a fact. Everything we see is a perspective, not the truth.”

―Marcus Aurelius


What is Sentiment Analysis?

The #TMNT movie was great! Highly recommend.

Watching the #TMNT movie with @SQLJoe

That #TMNT movie was a waste of time. #fail

or


Sentiment Analysis 101

• Feelings
• Emotions
• Attitudes
• Opinions
• Judgments
• Orientation
• Polarity


Sentiment Analysis 101

• What are my customers saying about my products and services?
• Are customers talking positively or negatively about my products and services?
• What other brands are people talking about positively or negatively?
• Who is influencing the public opinion or perception about my products and services?


Sentiment Analysis 101

• Opinion mining
• Emotion analysis
• Opinion extraction
• Sentiment detection
• Sentiment categorization
• Sentiment classification
• Sentiment polarity
• Judgment analysis
• Subjectivity analysis


Sentiment Analysis 101: Process


Sentiment Analysis 101

What about?
• LOL
• OMG
• #FAIL
• #AWESOMESAUCE
• :(
• :)

image courtesy of Twittonary.com


Sentiment Analysis: Hadoop and Twitter data #TMNT


Twitter Demo: Steps

1) You will tweet using #tmnt
2) Extract tweets containing the #tmnt hashtag via a Flume job
3) Stage tweets in text files in the TMNT folder in HDFS (1 file every 90 seconds or every 1,000 tweets)
4) Load tweets into HCatalog (Cloudera JSON SerDe)
5) Break down tweets into sentences
6) Break down tweets into words
7) Look up each word in a lexical dictionary to get its polarity value (1, 0, -1)
8) Add the polarity values of all words to get the overall tweet polarity

Positive > 0, Neutral = 0, Negative < 0
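Steps 7 and 8 and the labeling rule can be sketched in a few lines of Python. This is only an illustration, not the Hive queries used in the demo: the function names and the mini-lexicon are hypothetical, and the sample word lists are hand-tokenized versions of the example tweets shown earlier in the deck.

```python
def tweet_polarity(words, lexicon):
    """Steps 7-8: look up each word and add up the polarity values.

    Words missing from the lexicon are treated as neutral (0); that
    default is an assumption of this sketch.
    """
    return sum(lexicon.get(word.lower(), 0) for word in words)


def classify(total):
    """Labeling rule from the slide: > 0 positive, 0 neutral, < 0 negative."""
    if total > 0:
        return "Positive"
    if total < 0:
        return "Negative"
    return "Neutral"


# Hypothetical mini-lexicon, for illustration only.
LEXICON = {"great": 1, "highly": 1, "waste": -1, "fail": -1}

# Hand-tokenized sample tweets (hashtags and mentions left out).
samples = [
    ["The", "movie", "was", "great", "Highly", "recommend"],
    ["Watching", "the", "movie", "with", "SQLJoe"],
    ["That", "movie", "was", "a", "waste", "of", "time", "fail"],
]

labels = [classify(tweet_polarity(words, LEXICON)) for words in samples]
print(labels)  # ['Positive', 'Neutral', 'Negative']
```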


Twitter Demo: Example

The movie was great! Highly recommend #TMNT

TweetID  LineNum  Text
100      1        The movie was great!
100      2        Highly recommend!

TweetID  LineNum  WordNum  Word
100      1        1        The
100      1        2        movie
100      1        3        was
100      1        4        great
100      2        1        Highly
100      2        2        recommend
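The breakdown into sentence and word rows can be reproduced with a small Python sketch. The function name `break_down` is hypothetical, and two choices are assumptions inferred from the example rather than stated on the slide: hashtags are dropped before splitting, and sentences end at ".", "!" or "?".

```python
import re


def break_down(tweet_id, text):
    """Return (TweetID, LineNum, WordNum, Word) rows for one tweet."""
    # Assumption: hashtags are removed first, as "#TMNT" is absent from the rows.
    cleaned = re.sub(r"#\w+", "", text)
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s.strip() for s in re.split(r"[.!?]+", cleaned) if s.strip()]
    rows = []
    for line_num, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[A-Za-z']+", sentence)
        for word_num, word in enumerate(words, start=1):
            rows.append((tweet_id, line_num, word_num, word))
    return rows


rows = break_down(100, "The movie was great! Highly recommend #TMNT")
for row in rows:
    print(*row)
# 100 1 1 The
# 100 1 2 movie
# 100 1 3 was
# 100 1 4 great
# 100 2 1 Highly
# 100 2 2 recommend
```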


Twitter Demo: Example

TweetID  LineNum  WordNum  Word
100      1        1        The
100      1        2        movie
100      1        3        was
100      1        4        great
100      2        1        Highly
100      2        2        recommend

Dictionary

Word       Polarity  Value
good       positive  +1
awesome    positive  +1
best       positive  +1
great      positive  +1
nice       positive  +1
highly     positive  +1
excellent  positive  +1
awful      negative  -1
bad        negative  -1
worse      negative  -1
recommend  neutral   0
the        neutral   0
…


Twitter Demo: Example

The movie was great!
0 + 0 + 0 + 1 = +1

Highly recommend!
1 + 0 = +1

___________________________________________________________

= +2 (Positive)
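The arithmetic above can be checked with a short Python sketch that uses the dictionary entries listed on the previous slide. The name `sentence_score` is hypothetical. Matching words case-insensitively and scoring words that are not listed (such as "movie" and "was") as 0 are assumptions that agree with the numbers shown.

```python
# Entries copied from the dictionary slide (word -> polarity value).
DICTIONARY = {
    "good": 1, "awesome": 1, "best": 1, "great": 1, "nice": 1,
    "highly": 1, "excellent": 1,
    "awful": -1, "bad": -1, "worse": -1,
    "recommend": 0, "the": 0,
}


def sentence_score(sentence):
    """Sum the polarity values of the words in one sentence."""
    words = sentence.strip("!.?").split()
    # Assumption: unlisted words are neutral and lookup ignores case.
    return sum(DICTIONARY.get(word.lower(), 0) for word in words)


sentences = ["The movie was great!", "Highly recommend!"]
scores = [sentence_score(s) for s in sentences]
total = sum(scores)

print(scores)  # [1, 1]
print(total)   # 2
```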


DEMO: Sentiment Analysis using Flume and Hive


Resources


Resources

• Analyzing Twitter Data Using CDH
  https://github.com/cloudera/cdh-twitter-example

• Anatomy of a tweet
  https://media.twitter.com/best-practice/anatomy-of-a-tweet

• Hortonworks Sandbox & Tutorials
  http://hortonworks.com/sandbox/
  http://hortonworks.com/tutorials/

• Flume Twitter API Configuration
  http://www.crgt.com/streaming-twitter-into-the-hortonworks-data-platform-1-2/

• WordNet Lexical database (Princeton University)
  http://wordnet.princeton.edu/


Analyzing Twitter data

Twitter REST API v1.1

• API Resources
  https://dev.twitter.com/docs/api/1.1

• Field documentation
  https://dev.twitter.com/docs/platform-objects/tweets

• Search (https://api.twitter.com/1.1/search/tweets.json)

• GET search/tweets
  https://dev.twitter.com/docs/api/1.1/get/search/tweets

• Using the Twitter Search API
  https://dev.twitter.com/docs/using-search
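As an illustration of how a hashtag query maps onto the Search endpoint listed above, the Python sketch below builds the request URL. It only constructs the URL and does not call the API: v1.1 requests need to be authenticated, which is left out here. The helper name `build_search_url` and the default `count` value are assumptions of this sketch, while `q` and `count` are parameters of GET search/tweets.

```python
from urllib.parse import urlencode

# Endpoint taken from the slide.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(query, count=100):
    """Return the GET search/tweets URL for a query such as '#tmnt'."""
    # urlencode percent-encodes the '#' so it is not read as a URL fragment.
    return SEARCH_URL + "?" + urlencode({"q": query, "count": count})


url = build_search_url("#tmnt")
print(url)
# https://api.twitter.com/1.1/search/tweets.json?q=%23tmnt&count=100
```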


Analyzing Twitter data

Twitter Streaming APIs

• Public stream
  https://dev.twitter.com/docs/streaming-apis/streams/public

• User stream
  https://dev.twitter.com/docs/streaming-apis/streams/user

• Site stream
  https://dev.twitter.com/docs/streaming-apis/streams/site


Analyzing Twitter data

Twitter Streaming APIs

• Public stream
  https://dev.twitter.com/docs/streaming-apis/streams/public

• Endpoints
  • POST statuses/filter
  • GET statuses/sample
  • GET statuses/firehose
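The public stream delivers tweets as JSON objects, one per line. A minimal Python sketch of picking out the tweets that carry a given hashtag might look like the following. It works on already-received lines and opens no connection. The function name `matching_tweets` and the sample lines are hypothetical, and the sample objects contain only the `id_str`, `text` and `entities.hashtags` fields from the tweet field documentation.

```python
import json


def matching_tweets(lines, hashtag):
    """Yield (id_str, text) for each streamed tweet tagged with `hashtag`."""
    wanted = hashtag.lstrip("#").lower()
    for line in lines:
        line = line.strip()
        if not line:
            # Blank lines are skipped (streams may send them as keep-alives).
            continue
        tweet = json.loads(line)
        hashtags = tweet.get("entities", {}).get("hashtags", [])
        if wanted in (tag["text"].lower() for tag in hashtags):
            yield tweet["id_str"], tweet["text"]


# Hypothetical sample lines, trimmed to the fields used above.
sample_lines = [
    '{"id_str": "100", "text": "The movie was great! Highly recommend #TMNT",'
    ' "entities": {"hashtags": [{"text": "TMNT"}]}}',
    "",
    '{"id_str": "101", "text": "Nothing to see here",'
    ' "entities": {"hashtags": []}}',
]

result = list(matching_tweets(sample_lines, "#tmnt"))
print(result)
# [('100', 'The movie was great! Highly recommend #TMNT')]
```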


Q & A ?


Thank You!