COS326 Big Data Analytics Lecture19

Feb 19, 2018

Adeyemi Odeneye
  • 7/23/2019 COS326 Big Data Analytics Lecture19

    1/19

    COS 326 Database Systems

    Lecture 19

    Big Data and Big Data Analytics (2)

    Notes

    14 October 2015

    Admin matters: next 3 weeks

    Week  Date    Day   Topic
    10    13 Oct  Tues  L18: Big Data Analytics
                        Presentation for topic 15
          14 Oct  Wed   L19: Big Data Analytics
                        Presentation for topic 16
          16 Oct  Fri   No prac
    11    20 Oct  Tues  L20: Guest lecture: SAP
          21 Oct  Wed   L21: Data analytics: Data mining
                        Presentation for topic 17
          23 Oct  Fri   No prac
                        project day for Computer Science
    12    27 Oct  Tue   L22: Class test 3: data analytics
                        Presentation for topic 18
          28 Oct  Wed   L23: Data analytics: Data mining
                        Presentation for topics 19, 20

    Outline

    Last lecture:

    1. Technologies supporting big data storage & analytics

    MapReduce computation framework

    NoSQL big database management systems (BDMSes)

    NewSQL big database management systems (BDMSes)

    2. What types of analytics for big data?

    This lecture:

    Case study: analysis of microblog data (Twitter)

    sentiment analysis of microblogs

    Pearson Education Limited 1995, 2005

    RECAP: on Big Data

    Sources of Big Data:

    Web-generated structured & unstructured data, e.g.

    e-commerce purchasing histories

    social media: Facebook, Twitter, LinkedIn, YouTube etc.

    Some processing activities for big data:

    (1) descriptive analytics

    (2) predictive analytics

    e.g. sentiment analysis for microblogs (e.g. Twitter)

    Case study: Analytics for Twitter

    Twitter: http://www.twitter.com

    1. Why do people tweet?

    Notable users of Twitter:

    Pope Francis: 78.4 million followers

    Barack Obama: 640 thousand followers

    2. Format of a tweet: max 140 characters, with possible inclusion of

    emoticons, e.g. smiley :-) or sad face :-(, to express sentiment

    4. Value of tweets to businesses: used by market researchers in business organisations to find out

    (a) what are customers saying about our products & services?

    (b) what are customers saying about our competitors' products & services?
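Item 2 above gives the tweet format: at most 140 characters, optionally with emoticons that signal sentiment. The following is a minimal sketch of checking both properties. The function names and the two-entry emoticon table are assumptions made for illustration; they are not part of any Twitter library.

```python
MAX_TWEET_LENGTH = 140  # character limit stated on the slide

# Hypothetical lookup table: only the two emoticons named on the slide.
EMOTICON_SENTIMENT = {":-)": "positive", ":-(": "negative"}


def is_valid_tweet(text: str) -> bool:
    """Return True if the text fits within the 140-character limit."""
    return len(text) <= MAX_TWEET_LENGTH


def emoticon_sentiment(text: str) -> str:
    """Guess sentiment from emoticons alone; 'neutral' if none or mixed."""
    found = {label for emo, label in EMOTICON_SENTIMENT.items() if emo in text}
    if len(found) == 1:
        return found.pop()
    return "neutral"


if __name__ == "__main__":
    samples = [
        "Love the new phone :-)",
        "Delivery was late again :-(",
        "Store opens at 9",
    ]
    for tweet in samples:
        print(is_valid_tweet(tweet), emoticon_sentiment(tweet), "|", tweet)
```

Emoticons alone are a weak signal, which is why the later slides turn to a trained classification model.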

    Twitter statistics

    Twitter was launched in 2006.

    Twitter statistics (source: Twitter, April 2010):

    106 million registered users

    180 million unique visitors every month

    300,000 new users signing up every day

    600 million queries received daily via Twitter's search engine

    3 billion requests per day via the Twitter API

    37% of active users used mobile phones to send requests

    approx. 200 million tweets per day (big data)

    More recently:

    the number of regular Twitter users has been estimated at more than 200 million.
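To see why these volumes count as big data, it helps to convert the daily figures above into per-second rates. The sketch below does that arithmetic. The storage bound assumes one byte per character and every tweet at the maximum length, which is an assumption for illustration and not a figure from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Daily figures quoted on the slide.
tweets_per_day = 200_000_000
search_queries_per_day = 600_000_000
api_requests_per_day = 3_000_000_000
max_chars_per_tweet = 140


def per_second(daily_count: int) -> float:
    """Convert a daily count into an average per-second rate."""
    return daily_count / SECONDS_PER_DAY


tweets_per_second = per_second(tweets_per_day)
queries_per_second = per_second(search_queries_per_day)
api_requests_per_second = per_second(api_requests_per_day)

# Upper bound on raw tweet text per day (assumes 1 byte per character
# and every tweet at the 140-character maximum).
max_text_bytes_per_day = tweets_per_day * max_chars_per_tweet

if __name__ == "__main__":
    print(f"tweets/second:       {tweets_per_second:,.0f}")
    print(f"queries/second:      {queries_per_second:,.0f}")
    print(f"API requests/second: {api_requests_per_second:,.0f}")
    print(f"max text per day:    {max_text_bytes_per_day / 1e9:.0f} GB")
```

On these figures the average load is roughly 2,300 tweets per second, and the raw text alone is bounded by about 28 GB per day before any metadata is counted.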

    Twitter adoption in SA

    Adoption in South Africa:

    Businesses, governments and non-government organisations have a Twitter & Facebook presence.

    Adoption statistics for 2014

    (source: Fuseware and World Wide Worx, 2014)

    9.4 million active users of Facebook

    5.5 million users of Twitter in South Africa

    93% of major South African brands use Facebook and 79% use Twitter.

    Twitter analytics

    Two approaches to analysis:

    (1) Online analytics:

    (i) Subscribe to a service for social media data analytics

    (ii) use service to obtain analysis reports & Twitter data

    (2) Offline analytics:

    (i) register with Twitter

    (ii) use Twitter APIs to obtain data & store it in a DB, e.g. a NoSQL DB

    (iii) conduct analysis on the data

    Online analytics: Twitter data (1)

    (i) Subscribe to a service for social media data analytics

    (ii) use service to obtain analysis reports

    Service name (and purpose)    URL & examples of services provided / report types
    --------------------------    --------------------------------------------------
    Sentiment140                  URL: http://www.sentiment140.com
    (sentiment analysis)          Performs sentiment analysis on the tweets returned
                                  for a query supplied by the user (for free).

    Twitonomy                     URL: http://www.twitonomy.com
    (get overall view of          Analyse a Twitter account. Provides the following for free:
    Twitter account)              1. number of tweets per day, mentions, retweets,
                                     favorited tweets (for a given period)
                                  2. charts showing tweet frequencies by day of the week
                                     and time of day
                                  3. platforms most tweeted from
                                     (e.g. Twitter for iPhone, Twitter web client)
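The frequency charts listed in the table (tweets by day of the week and by time of day) boil down to counting timestamps. The sketch below shows one way to compute those counts. The sample timestamps and the function name are hypothetical; this is not how Twitonomy itself is implemented.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical tweet timestamps in ISO 8601 format.
SAMPLE_TIMESTAMPS = [
    "2015-10-12T08:15:00",  # Monday
    "2015-10-12T08:45:00",  # Monday
    "2015-10-13T17:05:00",  # Tuesday
    "2015-10-14T08:30:00",  # Wednesday
    "2015-10-14T21:10:00",  # Wednesday
]


def tweet_frequencies(timestamps):
    """Count tweets per day of the week and per hour of the day."""
    by_day = Counter()
    by_hour = Counter()
    for stamp in timestamps:
        moment = datetime.fromisoformat(stamp)
        by_day[DAY_NAMES[moment.weekday()]] += 1
        by_hour[moment.hour] += 1
    return by_day, by_hour


if __name__ == "__main__":
    by_day, by_hour = tweet_frequencies(SAMPLE_TIMESTAMPS)
    for day in DAY_NAMES:
        print(f"{day:<10} {'#' * by_day[day]}")
    for hour in sorted(by_hour):
        print(f"{hour:02d}:00      {'#' * by_hour[hour]}")
```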

    4.2 Analysis of social network data: Twitter (2)

    Online tools for analysis of Twitter data

    Sentiment140: http://www.sentiment140.com

    Performs sentiment analysis on the tweets returned for a query supplied by the user (for free).

    [Screenshot: example Sentiment140 query, including the list of available languages]

    4.2 Analysis of Twitter data (3)

    Twitonomy URL: http://www.twitonomy.com

    Analyse a Twitter account. Provides the following for free:

    1. number of tweets per day, mentions, retweets, favourited tweets

    ORSSA 2015 presentation, 15 September 2015

    4.2 Analysis of social network data: Twitter (4)

    Twitonomy: Analyse a Twitter account. Provides the following for free:

    2. Charts showing tweet frequencies by day of the week and time of day

    4.2 Analysis of social network data: Twitter (5)

    Twitonomy: Analyse a Twitter account. Provides the following for free:

    3. platforms most tweeted from (e.g. Twitter for iPhone, Twitter web client)

    Tweets can be downloaded in MS Excel format for further analysis.

    Offline analysis of Twitter data

    Twitter: http://www.twitter.com

    (2) Offline analytics:

    (i) register with Twitter

    (ii) use Twitter APIs to obtain data & store it in a DB, e.g. a NoSQL DB

    (iii) conduct analysis on the data

    Examples of analysis:

    a. descriptives

    b. sentiment analysis

    c. graph mining, e.g. for community discovery
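Steps (ii) and (iii) above can be sketched end to end on a toy scale. In the sketch below, an in-memory SQLite table holding one JSON document per tweet stands in for a NoSQL document store, and connected components of the @mention graph stand in for community discovery. Both are simplifying assumptions: real community detection methods are more involved, and the sample tweets and field names are hypothetical.

```python
import json
import sqlite3

# Hypothetical tweets; field names are illustrative assumptions.
SAMPLE_TWEETS = [
    {"id": 1, "user": "alice", "text": "@bob great service today"},
    {"id": 2, "user": "bob", "text": "@carol thanks for the tip"},
    {"id": 3, "user": "dave", "text": "@erin the queue was too long"},
    {"id": 4, "user": "frank", "text": "no mentions here"},
]


def store_tweets(conn, tweets):
    """Step (ii): keep each tweet as one JSON document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, doc TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO tweets (id, doc) VALUES (?, ?)",
        [(t["id"], json.dumps(t)) for t in tweets],
    )
    conn.commit()


def load_tweets(conn):
    """Read the stored documents back as dictionaries, ordered by id."""
    rows = conn.execute("SELECT doc FROM tweets ORDER BY id")
    return [json.loads(doc) for (doc,) in rows]


def mention_communities(tweets):
    """Step (iii c): group users linked by @mentions (connected components)."""
    parent = {}

    def find(user):
        parent.setdefault(user, user)
        while parent[user] != user:
            parent[user] = parent[parent[user]]  # path halving
            user = parent[user]
        return user

    for tweet in tweets:
        author_root = find(tweet["user"].lower())  # also registers the author
        for word in tweet["text"].split():
            mentioned = word[1:].strip(".,!?:;").lower()
            if word.startswith("@") and mentioned:
                mentioned_root = find(mentioned)
                if author_root != mentioned_root:
                    parent[author_root] = mentioned_root
                    author_root = mentioned_root

    groups = {}
    for user in list(parent):
        groups.setdefault(find(user), set()).add(user)
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: members[0])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_tweets(conn, SAMPLE_TWEETS)
    for community in mention_communities(load_tweets(conn)):
        print(community)
```

Storing the raw JSON document rather than a fixed set of columns mirrors the schema-free style of a document store, so new tweet fields do not require a schema change.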

    Twitter: Facilities available for developers

    https://dev.twitter.com/overview/documentation

    Twitter APIs:

    (1) REST APIs

    provide programmatic access to read & write Twitter data

    responses are available in JSON

    identify Twitter applications & users using OAuth

    (2) Streaming APIs

    continuously deliver new responses to REST API queries over a long-lived HTTP connection

    receive updates on the latest tweets matching a search query

    OAuth: applications send secure, authorised requests to the Twitter APIs

    an application must be registered before it can access the Twitter APIs
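The streaming model described above can be illustrated without a network connection. In the sketch below, an in-memory text stream stands in for the long-lived HTTP response, and each line is assumed to carry one JSON-encoded tweet. That wire format, the field names and the function name are assumptions for illustration, and the OAuth handshake is omitted entirely.

```python
import io
import json

# Stand-in for the body of a long-lived HTTP response: one JSON document
# per line, with a blank keep-alive line in between (assumed format).
SAMPLE_STREAM_TEXT = (
    '{"id": 1, "text": "Loving the new #bigdata course"}\n'
    "\n"
    '{"id": 2, "text": "Traffic is terrible today"}\n'
    '{"id": 3, "text": "BigData analytics lecture at 10"}\n'
)


def matching_tweets(stream, keyword):
    """Yield tweets whose text contains the keyword, one line at a time."""
    needle = keyword.lower()
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip keep-alive blank lines
        tweet = json.loads(line)
        if needle in tweet.get("text", "").lower():
            yield tweet


if __name__ == "__main__":
    stream = io.StringIO(SAMPLE_STREAM_TEXT)
    for tweet in matching_tweets(stream, "bigdata"):
        print(tweet["id"], tweet["text"])
```

Because `matching_tweets` is a generator, each tweet is handled as soon as its line arrives, which is the point of a streaming connection compared with repeatedly polling a REST endpoint.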

    Sentiment Analysis for microblogs

    Sentiment Analysis (defined):

    Given a tweet on a topic of interest (e.g. to a market researcher), determine if the sentiment (opinion) of the tweet is:

    positive,

    negative, or

    neutral.

    The effect of one tweet may be small, but the effect of many is significant.

    Analysis methods:

    Use text mining methods to create a predictive (classification) model to classify tweets as having positive (+ve), negative (-ve) or neutral sentiment.

    Traditionally, text mining has been used for document classification.
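The slide does not name a specific classification method. As one possible illustration, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing, a common baseline for text classification. The training tweets are hypothetical and far too few for real use.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labelled training tweets.
TRAINING = [
    ("love this phone great battery", "positive"),
    ("great service very happy", "positive"),
    ("terrible service very slow", "negative"),
    ("hate the slow delivery", "negative"),
    ("store opens at nine", "neutral"),
    ("new phone released today", "neutral"),
]


def tokenize(text):
    """Split a tweet into lower-case words."""
    return text.lower().split()


def train(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAINING)
    for tweet in ["great battery", "slow delivery", "store opens today"]:
        print(classify(tweet, model), "|", tweet)
```

Working in log-probabilities avoids numeric underflow when many small word probabilities are multiplied together.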

    Sentiment Analysis for microblogs

    Using a predictive model to classify tweets

    [Diagram: tweets to be classified are fed into a predictive (classification) model, which outputs three groups: +ve sentiment tweets, neutral sentiment tweets and -ve sentiment tweets]

    Essay presentation

    Topic 16

    References

    1. IBM Global Business Services (2012) Analytics: the real-world use of big data - how innovative enterprises extract value from uncertain data, IBM Institute for Business Value.

    2. Moniruzzaman, A.B.M. & Hossain, S.A. (2013) NoSQL database: new era of databases for big data analytics - classification, characteristics and comparison. International Journal of Database Theory and Application, vol. 6, no. 4.

    3. Wakade, S., Shekar, C., Liszka, K.J. & Chan, C.-C. (2012) Text mining for sentiment analysis of Twitter data. International Conference on Information and Knowledge Engineering (IKE'12), pp. 109-114.