
Towards Social Imagematics: sentiment analysis in social ...chbrown.github.io/kdd-2013-usb/workshops/MDMKDD/doc/mdmkdd2013_submission_3.pdfon sentiment analysis using textual features.

Mar 11, 2020


Towards Social Imagematics: sentiment analysis in social multimedia

Quanzeng You
Department of Computer Science
University of Rochester
Rochester, NY 14627
[email protected]

Jiebo Luo
Department of Computer Science
University of Rochester
Rochester, NY 14627
[email protected]

ABSTRACT

Online social networks have attracted attention from both academia and the general public. In particular, the rich multimedia information accumulated in recent years provides an easy and convenient way for people to communicate more actively. This offers an opportunity to study people’s behaviors and activities based on that multimedia content, which can be considered social imagematics. One emerging area is driven by the fact that these massive multimedia data contain people’s daily sentiments and opinions. However, existing sentiment analysis typically pays attention only to the textual information and disregards the visual content, which may be more informative in expressing people’s sentiments and opinions. In this paper, we attempt to analyze the online sentiment changes of social media users, specifically Twitter users, using both textual and visual content. An empirical study of real Twitter data sets indicates that the sentiments expressed in textual content and visual content are correlated. The preliminary results in this paper give insight into the important role of visual content in online social media.

Categories and Subject Descriptors
H.2.8 [Database management]: Database Applications; H.3.1 [Information Storage and Retrieval]: Content Analysis and Retrieval; I.5.4 [Pattern Recognition]: Applications

General Terms
Algorithms, Experimentation, Application

Keywords
Sentiment Analysis, Twitter, Social Multimedia, Social Correlation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MDMKDD ’13, August 11, 2013, Chicago, Illinois, U.S.A.
Copyright 2013 ACM 978-1-4503-2333-8 ...$15.00.

1. INTRODUCTION

Twitter is one of the most influential social networks in the world, and research on many Twitter-related topics has been published in a range of conference venues. The large amount of user content generated daily has attracted many researchers to analyze potentially interesting patterns in social media, including the prediction of political elections, sentiment analysis, information diffusion, and topic trends. Initially, Twitter allowed messages of at most 140 characters of text. This changed in 2011, when Twitter allowed users to post images in their tweets. In this paper, we refer to tweets that contain images as image tweets. The impact of supporting image tweets is tremendous. This paper focuses on one particular impact of image tweets, namely their impact on sentiment analysis.

Multimedia content, such as images, is more likely than text to express and convey people’s subtle feelings. With the popularity of smartphones and convenient social media apps, more and more people post image tweets to attract attention from other Twitter users. Figure 1 shows examples of image tweets, in which a large picture conveys additional information about the tweet.

Figure 1: Examples of image tweets. The left image shows a picture of Justin Bieber, and the right image shows the ejection of Noah during the NBA playoffs.

One of the most interesting aspects of Twitter is that people’s sentiment on Twitter seems to be related to real social life. For instance, the authors of [5] found that the sentiment changes of Twitter users are closely related to the overall economic situation in the U.S. and to the stock market. However, most research on sentiment changes considers only the text of tweets, and little attention has been paid to the analysis of image tweets. The work in this paper is an attempt to analyze the sentiment conveyed by the multimedia content of tweets. We intend to investigate social multimedia analysis, which we refer to as social imagematics. We conduct an empirical study of the sentiments expressed in people’s tweets, with particular attention to the sentiments in image tweets.

The paper is organized as follows. Section 2 presents and discusses related work on sentiment analysis. Section 3 describes the approaches and features used for sentiment analysis. Section 4 covers the experimental setup, including data preparation, and the experimental results. Section 5 discusses future work. We conclude the paper in Section 6.

2. RELATED WORK

There are many existing works on sentiment analysis of social media platforms, and Twitter sentiment analysis in particular is one of the most popular research topics. This section summarizes some of this related work. Most existing methods differ in the features they use and in the aspects of the problem they emphasize. Guerra et al. [1] proposed a method to measure the bias of social media users toward a topic, and then employed transfer learning to learn the textual features. In this way, they build a more accurate classification model by using the user biases as a new feature. However, identifying a user’s bias on a particular topic may itself be challenging. In [17], the authors used label propagation to exploit noisy labels, propagating these labels over the network. Their results indicate an improvement in accuracy over existing approaches. In [4], the authors used Twitter as a platform to study the language characteristics of mothers during the postpartum period. Their results indicate that social media can be used to discover and understand the health and wellness of women following childbirth. In [16], a method for sentiment analysis of streaming data is proposed. The heart of the solution is a training augmentation procedure that automatically incorporates new relevant messages into the training data. In [8], the authors used the social relations extracted from tweets, applied the graph Laplacian to form a sparse formulation, and proposed an optimization algorithm to solve the resulting problem. Although noisy labels and network structures are also considered, all of these methods use only textual features for sentiment analysis. In contrast, our approach uses image features, which are another main content feature of tweets.

Other work has mined different aspects of social networks. Kosinski et al. [9] analyzed Likes on Facebook and found that people in social media are more likely to share common interests with their friends and with particular communities. Based on their model, they are able to predict the behavior of users according to their online social activities. Rao et al. [13] used Bayesian models based on topic models for latent attribute detection. Goel et al. [7] used social media to study the browsing behavior of online users. Wong et al. [19] used online social network data to quantify political leaning from the information extracted from tweets and retweets. De Choudhury et al. [3] analyzed sentiment, or mood, in social media. They used valence and activation to represent moods, and their work provided validation of a conceptualization of human mood.

For social media networks, the network structure itself can also be used to analyze how sentiment propagates between nodes across the network. In [11], the authors used hyperlinks to analyze the sentiment flow in hyperlink networks. Their results indicate that a node is significantly influenced by its immediate neighbors, and that the structure of the information propagation graph illustrates the impact of different sentiment flow patterns. Similarly, users connected in social networks are more likely to hold similar opinions. To analyze sentiment at the user level, the authors of [18] employ this kind of network relationship to analyze the sentiment of a group of users about a particular topic. In [8], both user-content and user-user relations are exploited for sentiment analysis. More specifically, the authors proposed a semi-supervised learning framework that uses the network relations and formalized the problem as an optimization problem. An empirical study of the proposed framework on two existing Twitter data sets illustrates the improved performance of the proposed algorithm.

3. APPROACHES

As discussed in Section 2, there are many existing works on sentiment analysis using textual features. In this paper, we employ existing algorithms to analyze the sentiment of the text of tweets. For the sentiment analysis of visual content, we build classifiers using low-level and mid-level features, respectively.

3.1 Textual sentiment analysis

There is much related work on sentiment analysis of Twitter [8, 18, 17, 19]. There are also many online services that provide an easily accessible API for evaluating the sentiment of tweets. Many of these tools (footnote 1) come directly from academic research. Since we are more concerned with image tweets and the sentiment of images, in this paper we directly use an existing online service for the sentiment analysis of the collected tweets.

In particular, we use Sentiment140 (footnote 2) [6]. Sentiment140 is a semi-supervised machine learning approach that exploits emoticons as noisy labels for training data. It also provides a convenient API for the sentiment analysis of tweets. Typically, one sends a file of tweets to the server using an HTTP request, and the server returns the sentiment for each line contained in that file. Each returned value is one of three values (0, 2 and 4), where 0 represents negative sentiment, 2 represents neutral sentiment and 4 represents positive sentiment. In this way, we are able to classify the tweets into different sentiment categories.
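As an illustration of the last step, the following is a minimal sketch of how the three polarity values could be turned into sentiment categories. It assumes the polarity values have already been obtained from the service; the function names `label_tweets` and `group_by_label` are hypothetical and are not part of the Sentiment140 API.

```python
# Polarity codes as described in the text: 0 = negative, 2 = neutral, 4 = positive.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def label_tweets(tweets, polarities):
    """Pair each tweet with its sentiment label (hypothetical helper)."""
    if len(tweets) != len(polarities):
        raise ValueError("tweets and polarities must have the same length")
    labeled = []
    for text, code in zip(tweets, polarities):
        if code not in POLARITY_LABELS:
            raise ValueError(f"unexpected polarity code: {code!r}")
        labeled.append((text, POLARITY_LABELS[code]))
    return labeled


def group_by_label(labeled):
    """Group labeled tweets into a dict of {label: [tweet, ...]}."""
    groups = {label: [] for label in POLARITY_LABELS.values()}
    for text, label in labeled:
        groups[label].append(text)
    return groups
```

Rejecting unknown codes, rather than silently mapping them to neutral, is a design choice of this sketch so that a malformed response is noticed early.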

Footnote 1: http://matei.org/ithink/2012/02/08/a-list-of-twitter-sentiment-analysis-tools/
Footnote 2: http://www.sentiment140.com/

3.2 Sentiment changes with the number of images

Twitter users generally prefer different types of tweets. Some users like to post many image tweets, while many others prefer to post traditional text tweets. To analyze the sentiments of users with different preferences for image tweets, we conduct an experiment on the relation between the proportion of image tweets and the proportion of positive tweets. We use the textual sentiment analysis of Section 3.1 to analyze the sentiments of different users. The number of positive tweets divided by the sum of positive and negative tweets is then used to represent the proportion of positive sentiment.

Figure 3: Framework of image sentiment classification using low-level and middle-level features respectively. [Diagram: images from the image sentiment bank and the input images go through feature extraction into low-level and mid-level features; each feature level trains its own classifier, and each classifier produces a sentiment result.]

Figure 2: Relationship of proportion of image tweets and the proportion of positive tweets. [Plot: x-axis, proportion of image tweets (0 to 1); y-axis, proportion of positive tweets (0.5 to 1).]

We randomly picked about 300 users and downloaded their tweets using the user timeline API. Figure 2 shows that users who post many image tweets are more likely to have positive sentiments. On the other hand, among users with a smaller proportion of image tweets, the proportion of positive sentiment varies significantly.
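The two per-user quantities compared in this subsection can be sketched as follows. This is an illustrative sketch only: the input representation (a list of sentiment labels and a list of boolean image flags per user) and the function names are assumptions, not the authors' implementation.

```python
def positive_proportion(labels):
    """Positive tweets divided by positive plus negative tweets.

    Neutral tweets are ignored, following the definition in the text.
    Returns None when the user has no positive or negative tweets.
    """
    positive = sum(1 for label in labels if label == "positive")
    negative = sum(1 for label in labels if label == "negative")
    if positive + negative == 0:
        return None
    return positive / (positive + negative)


def image_proportion(has_image_flags):
    """Fraction of a user's tweets that contain an image."""
    flags = list(has_image_flags)
    if not flags:
        return None
    return sum(1 for flag in flags if flag) / len(flags)


def user_point(tweets):
    """Map a user's tweets, given as (label, has_image) pairs, to one
    (image proportion, positive proportion) point of a scatter plot."""
    labels = [label for label, _ in tweets]
    flags = [has_image for _, has_image in tweets]
    return image_proportion(flags), positive_proportion(labels)
```

Each user then contributes one point, for example `user_point([("positive", True), ("negative", False)])` gives `(0.5, 0.5)`.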

3.3 Visual sentiment classification

Image sentiment analysis is quite challenging. In [15], the authors used textual sentiment analysis to produce rough labels for the corresponding images. RGB histogram and SIFT features were then employed to train a classifier and classify the test images. Their results indicate that positive and negative sentiments seem to be associated with different, interesting image patterns.

In our implementation, we use the image sentiment corpora from the visual sentiment ontology (footnote 3), with kind permission from the authors. Using this data set, we train classifiers at two levels. The first classifier uses only low-level features, which include HOG [2], GIST [12], SSIM [14] and GEO-COLOR-HIST [10]. Different features have different advantages for different tasks [20]. HOG is good for object and human recognition. GIST is designed for scene recognition. SSIM provides a measure of invariant scene layout. The geometric color histogram offers a robust histogram feature that is invariant to scene layout. These low-level features can be easily extracted from the given images.

Footnote 3: http://visual-sentiment-ontology.appspot.com/

Figure 3 shows the framework employed for image sentiment classification. The main components of this framework are the low-level and middle-level image features, and accordingly there are two classifiers. We choose liblinear (footnote 4) as the classifier at both levels because of its scalability in large-scale learning. The first classifier is based on the low-level features discussed above. From these low-level features we also learn middle-level features, which are more interpretable than low-level features. We use the middle-level features listed in Table 1. For each middle-level feature, we train a classifier that determines whether or not a given image matches the corresponding middle-level description. By combining the outputs for all the middle-level features, we construct a middle-level feature description of each image. A second-level classifier is then trained on these middle-level features and used to classify the test images into sentiment categories.
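The prediction path of the two-level scheme can be sketched as below. This is only an illustration under strong assumptions: the paper trains its classifiers with liblinear, whereas here every classifier is a plain linear decision function whose weights are made-up toy values, and the attribute names are just two entries from Table 1. It shows how per-attribute decisions on low-level features are stacked into a middle-level description that a second classifier consumes.

```python
def linear_score(weights, bias, features):
    """Decision value of a linear classifier: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mid_level_features(low_level, attribute_models):
    """Run one binary classifier per middle-level attribute on the
    low-level feature vector and stack the 0/1 decisions."""
    return [
        1.0 if linear_score(weights, bias, low_level) > 0 else 0.0
        for weights, bias in attribute_models
    ]


def classify_sentiment(low_level, attribute_models, sentiment_model):
    """Second-level classifier applied to the middle-level description."""
    mid = mid_level_features(low_level, attribute_models)
    weights, bias = sentiment_model
    return "positive" if linear_score(weights, bias, mid) > 0 else "negative"


# Toy, hand-picked weights (assumptions for illustration, not trained values).
ATTRIBUTE_MODELS = [
    ([-1.0, 2.0, 0.0], -0.5),  # hypothetical "soothing" detector
    ([3.0, -1.0, 0.0], 0.0),   # hypothetical "scary" detector
]
SENTIMENT_MODEL = ([1.0, -1.0], 0.0)  # soothing votes positive, scary negative

example = [0.2, 0.9, 0.1]  # stand-in for a low-level feature vector
prediction = classify_sentiment(example, ATTRIBUTE_MODELS, SENTIMENT_MODEL)
```

In a real pipeline the low-level vector would be the concatenated image descriptors and both weight sets would come from training on the labeled image corpus.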

We then download all the images contained in image tweets, using the URL in the metadata of each image tweet. Low-level and middle-level features are extracted from these images using the same procedure as for the training images. In this way, we are able to classify the sentiment of image tweets according to the visual features of the images they contain.

Table 1: Summary of the middle level features used in this study.

dirt/soil                  matte     man-made             rugged scene
natural light              dirty     open area            cluttered space
direct sun/sunny           rusty     semi-enclosed area   scary
electric/indoor lighting   warm      enclosed area        soothing
aged/worn                  cold      far-away horizon     stressful
glossy                     natural   no horizon

4. EXPERIMENTS

We collect tweets using the online Twitter API (footnote 5). Twitter provides different categories of API; we mainly use the streaming API and the timeline API. To select relatively active users, we first use the streaming API to download over 19 million tweets, which we store in CouchDB (footnote 6), a document database. By analyzing these 19 million tweets, we identify the activity levels of different users and, using empirically chosen thresholds, identify over 8000 relatively active users.

We then use the timeline API to download the tweets of these 8000 users, collecting over 20 million tweets in total. From these 8000 active Twitter users, we further select about 300 users who are relatively active in posting both text and image tweets. Using the URLs contained in their image tweets, we download all the images these users posted, obtaining over 90,000 images for about 300 Twitter users.

Figure 4: Statistics of retweets number for all tweets and image tweets only. (a) Distribution of number of retweets: log(number of tweets) against log(retweet number), data points with fit y = −1.308 log(x) + 14.488. (b) Distribution of number of image retweets: log(number of tweets) against log(retweet number), data points with fit y = −1.288 log(x) + 12.729. (c) Cumulative probability distribution of retweet number between 1 and 100, for all tweets and for image tweets. (d) Cumulative probability distribution of retweet number between 1 and 1000, for all tweets and for image tweets.

Footnote 4: http://www.csie.ntu.edu.tw/~cjlin/liblinear/
Footnote 5: https://dev.twitter.com/
Footnote 6: http://couchdb.apache.org/

In the downloaded 25 million tweets, we analyze the proportion of image tweets. Of these 25 million tweets, about 6 million are image tweets (5,988,058 / 25,580,000 ≈ 0.23), so roughly 1 in 4 tweets on Twitter contains an image. Figure 4 shows the distribution of the number of retweets. As with many other user activities, the distribution follows a power law with a long tail. Figure 4(a) and Figure 4(b) show that the number of retweets of image tweets follows a similar distribution, with a slightly different slope of the line fitted to the log-log plot. If we look further at the cumulative probability distribution of the number of retweets for all tweets and for image tweets only, we can conclude from Figure 4(c) and Figure 4(d) that tweets receiving only a small number of retweets make up a larger share of all tweets than of image tweets. This suggests that image tweets are more likely to attract users’ attention and are more easily diffused in the social network.
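A fitted line on a log-log plot, such as those in Figure 4(a) and 4(b), can be obtained by ordinary least squares on the logarithms of both axes. The sketch below illustrates this under assumptions: it uses natural logarithms (the base used in the paper is not stated), and the function names are hypothetical.

```python
import math
from collections import Counter


def retweet_histogram(retweet_counts):
    """Map each positive retweet count to the number of tweets having it.

    Tweets with zero retweets are skipped because log(0) is undefined.
    """
    return Counter(count for count in retweet_counts if count > 0)


def fit_loglog_line(histogram):
    """Least-squares fit of log(y) = slope * log(x) + intercept, where x is a
    retweet count and y is the number of tweets with that count."""
    xs = [math.log(x) for x in histogram]
    ys = [math.log(histogram[x]) for x in histogram]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct retweet counts")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For a power law y = C x^(-a), the fitted slope estimates -a and the intercept estimates log(C), so comparing the slopes for all tweets and for image tweets compares how quickly the two tails decay.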


4.1 Correlation of sentiment between image tweets and text tweets

Table 2: Correlation coefficients of textual sentiment and visual sentiment (NA means not available).

user id       low-level features   mid-level features
110914277     0.132197             0.137298
1135866961    0.059131             0.108657
183352499     0.038009             0.095219
320657019     0.105444             0.084368
341587111     NA                   NA
606333611     0.618337             0.322811
745235832     NA                   NA
910880371     0.199853             0.023016
924674300     0.297284             0.317496
98005782      0.015088             0.166366

To illustrate the correlation between text and image tweets, we randomly select 10 users from the 300 users. (An overall statistical analysis is possible, but due to space and time limits it is left as future work.) We employ the methods discussed in Section 3. The sentiment analysis results using text and image features are shown in Figure 5 and Figure 6. In both figures, the red line represents the sentiment changes of each user according to the sentiment analysis of text tweets, while the blue line represents the sentiment changes of each user according to the sentiment analysis of image tweets. The blue lines in the left column give the sentiment analysis using low-level image features, while the blue lines in the right column give the sentiment analysis using middle-level image features. In Figure 5, we average the long-term sentiment of each user by day, so each point represents a user’s average sentiment score for one day. Similarly, in Figure 6, the sentiment is averaged by hour.
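The per-day and per-hour averaging described above can be sketched as follows. The numeric sentiment score is an assumption of this sketch (for example +1 for positive, 0 for neutral and -1 for negative); the paper does not state the exact scoring, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# strftime patterns used as bucket keys; both sort chronologically as strings.
PERIOD_FORMATS = {"day": "%Y-%m-%d", "hour": "%Y-%m-%d %H"}


def average_by_period(records, period="day"):
    """Average sentiment scores per day or per hour.

    records: iterable of (datetime, score) pairs for one user.
    Returns a dict {period key: mean score}, ordered by time.
    """
    fmt = PERIOD_FORMATS[period]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, score in records:
        key = timestamp.strftime(fmt)
        sums[key] += score
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


records = [
    (datetime(2013, 5, 1, 9, 0), 1.0),
    (datetime(2013, 5, 1, 9, 30), -1.0),
    (datetime(2013, 5, 1, 15, 0), 1.0),
    (datetime(2013, 5, 2, 8, 0), 0.0),
]
daily = average_by_period(records, "day")
hourly = average_by_period(records, "hour")
```

Running the same function once on a user's text tweets and once on the same user's image tweets yields the two series that are plotted against each other.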

Table 2 shows the correlation coefficients between the sentiment of the selected users predicted from text features and from image features. Even though there is noise in the prediction of users’ sentiment, the results indicate that there is still a positive correlation between the sentiment expressed in text tweets and in image tweets. In particular, for user 606333611 the sentiments are highly correlated. There may be two reasons for this. First, this user is relatively more active, as reflected by the dates on the x-axis of the figure: since Twitter only allows us to download up to 3200 of a user’s most recent statuses, this user posted many tweets in a relatively short period. Second, no negative sentiment is predicted from this user’s text tweets. At the same time, some users have only positive sentiment (no negative or neutral sentiment), so the correlation is unavailable for them. Overall, sentiment classification using middle-level features seems to be more correlated with the sentiment of text tweets.
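The coefficients in Table 2, including the NA entries, can be illustrated with a Pearson correlation that reports "not available" when one series is constant. This is a sketch under assumptions: the paper does not name the correlation measure, so Pearson's r is assumed here, and the helper names and the dict-of-averages input format are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns None (the NA case) when there are fewer than two points or
    when either sequence is constant, since the coefficient is undefined.
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def correlate_periods(text_avg, image_avg, last_n=None):
    """Correlate two {period key: mean score} dicts over shared periods.

    Keys are assumed to sort chronologically. If last_n is given, only the
    most recent last_n shared periods are used.
    """
    shared = sorted(set(text_avg) & set(image_avg))
    if last_n is not None:
        shared = shared[max(0, len(shared) - last_n):]
    return pearson([text_avg[k] for k in shared], [image_avg[k] for k in shared])
```

A user whose text sentiment is positive in every period has zero variance in one series, which is exactly the situation in which this sketch returns `None`.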

4.2 Correlation of sentiment in a shorter period

The above results are averaged by day, which may not reflect people’s sentiment fluctuations within a particular day. In this section, we average the short-term sentiment of a user by hour. The results are shown in Figure 6. They indicate that different users have different patterns of sentiment change. Some users are more likely to show emotional fluctuation in both text and image tweets. For some users, sentiment changes are reflected mainly by text tweets, while other users are more likely to post images to express their sentiment changes. There is a correlation between the sentiment changes for the 10 randomly selected users. Table 3 shows the correlation coefficients for the 40 most recent periods.

Table 3: Correlation coefficients of textual sentiment and visual sentiment for the 40 most recent periods.

user id       low-level features   mid-level features
110914277     0.17615              0.132065
1135866961    0.172788             0.172788
183352499     0.075004             -0.197358
320657019     0.226449             0.212064
341587111     0.150699             0.221518
606333611     0.398337             0.065079
745235832     0.089547             0.006048
910880371     -0.071518            -0.244712
924674300     0.245525             0.252585
98005782      -0.127538            -0.027864

Unlike the results averaged by day, in this case some of the correlation coefficients are negative. However, for most of the users the correlation coefficients are positive. The results using low-level visual features and middle-level visual features are not always consistent. On the one hand, this indicates the difficulty of image sentiment analysis. On the other hand, it also illustrates the different patterns with which online users express their sentiment.

5. DISCUSSION AND FUTURE WORK

The results in this paper are preliminary. Even though the sentiment results contain some noise, in particular because image sentiment classification is quite challenging, we can still obtain some insight into the impact of image tweets on users’ sentiment changes. Some users are more likely to express their sentiment using image tweets, while others are still more likely to express their sentiment using text tweets. This reveals the challenges in predicting the sentiment of online social network users. The results in this paper offer some inspiration for using multimedia information in social media analysis.

Much more work is needed to give a more reliable and more accurate analysis of the impact of image tweets. First, time constraints need to be taken into consideration, since people’s sentiments are likely to fluctuate with time. For a user who is too active or too silent, it is hard even for human beings to determine the sentiment of that user, so we need to focus on users who tweet regularly. Second, the replies to each tweet can reveal more about the sentiment and influence of that tweet. However, due to restrictions on the API, it is hard to get all the replies to a particular tweet. In future work, we will try to collect these replies and use them as additional information to better predict the sentiment of a particular tweet. Third, since different users prefer different ways of expressing their sentiment, using both the textual and visual features of a particular user is likely to give more accurate and more robust sentiment classification results. Lastly, people are likely to be influenced by their friends, and this is also true in social networks, so the network structure can also be employed to classify the sentiment of a particular user. More interestingly, the textual sentiment can serve as auxiliary information for predicting the sentiment of online images.

6. CONCLUSIONS

Sentiment analysis is quite challenging for social multimedia, and the short length of tweets makes the task harder still. The results in this paper indicate that both textual and visual features are informative in determining a user’s sentiment. We find a correlation between the sentiment expressed by text tweets and by image tweets. At the same time, different users show different behavior patterns in online social networks. To develop more effective sentiment analysis algorithms for online social network users, we should take both factors into consideration. Even though the results do indicate some relation between image tweets and text tweets, we need more features and more robust data to obtain more robust and more interpretable results about the influence of multimedia content in the social network. Sentiment analysis of images is still not mature, which also means that there is a great opportunity in this area to discover more effective methods and more interesting findings.

7. REFERENCES
[1] P. H. Calais Guerra, A. Veloso, W. Meira Jr, and V. Almeida. From bias to opinion: a transfer-learning approach to real-time sentiment analysis. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 150–158. ACM, 2011.

[2] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.

[3] M. De Choudhury, S. Counts, and M. Gamon. Not all moods are created equal! exploring human emotional states in social media. In Sixth International AAAI Conference on Weblogs and Social Media, 2012.

[4] M. De Choudhury, S. Counts, and E. Horvitz. Major life changes and behavioral markers in social media: case of childbirth. In Proceedings of the 2013 conference on Computer supported cooperative work, pages 1431–1442. ACM, 2013.

[5] E. Gilbert and K. Karahalios. Widespread worry and the stock market. In Proceedings of the international conference on weblogs and social media, volume 2, pages 229–247, 2010.

[6] A. Go, R. Bhayani, and L. Huang. Twitter sentiment classification using distant supervision. Technical Report, Stanford, pages 1–12, 2009.

[7] S. Goel, J. M. Hofman, and M. I. Sirer. Who does what on the web: A large-scale study of browsing behavior. In Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (June 2012), 2012.

[8] X. Hu, L. Tang, J. Tang, and H. Liu. Exploiting social relations for sentiment analysis in microblogging. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 537–546. ACM, 2013.

[9] M. Kosinski, D. Stillwell, and T. Graepel. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013.

[10] J.-F. Lalonde, D. Hoiem, A. A. Efros, C. Rother, J. Winn, and A. Criminisi. Photo clip art. In ACM Transactions on Graphics (TOG), volume 26, page 3. ACM, 2007.

[11] M. Miller, C. Sathi, D. Wiesenthal, J. Leskovec, and C. Potts. Sentiment flow through hyperlink networks. Proc. AAAI CWSM, 2011.

[12] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International journal of computer vision, 42(3):145–175, 2001.

[13] D. Rao, M. Paul, C. Fink, D. Yarowsky, T. Oates, and G. Coppersmith. Hierarchical bayesian models for latent attribute detection in social media. Proc. ICWSM, pages 598–601, 2011.

[14] E. Shechtman and M. Irani. Matching local self-similarities across images and videos. In Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on, pages 1–8. IEEE, 2007.

[15] S. Siersdorfer, E. Minack, F. Deng, and J. Hare. Analyzing and predicting sentiment of images on the social web. In Proceedings of the international conference on Multimedia, pages 715–718. ACM, 2010.

[16] I. S. Silva, J. Gomide, A. Veloso, W. Meira Jr, and R. Ferreira. Effective sentiment stream analysis with self-augmenting training and demand-driven projection. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pages 475–484. ACM, 2011.

[17] M. Speriosu, N. Sudan, S. Upadhyay, and J. Baldridge. Twitter polarity classification with label propagation over lexical links and the follower graph. In Proceedings of the First Workshop on Unsupervised Learning in NLP, pages 53–63. Association for Computational Linguistics, 2011.

[18] C. Tan, L. Lee, J. Tang, L. Jiang, M. Zhou, and P. Li. User-level sentiment analysis incorporating social networks. arXiv preprint arXiv:1109.6018, 2011.

[19] F. M. F. Wong, C. W. Tan, S. Sen, and M. Chiang. Quantifying political leaning from tweets and retweets. 2013.

[20] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pages 3485–3492. IEEE, 2010.


[Figure 5 plots: twenty time-series panels, (a) to (t). Vertical axis: Sentiment Score. Horizontal axis: date (MM/DD/YY). Each panel is titled with a Twitter user ID, two panels per user: (a), (b) 110914277; (c), (d) 1135866961; (e), (f) 183352499; (g), (h) 320657019; (i), (j) 341587111; (k), (l) 606333611; (m), (n) 745235832; (o), (p) 910880371; (q), (r) 924674300; (s), (t) 98005782.]

Figure 5: Long-term sentiment changes of tweets and image tweets using low-level and mid-level features. The red line represents the sentiment of each user using the textual features and the blue line represents the sentiment of each user using the visual features from the image tweets.


[Figure 6 plots: twenty time-series panels, (a) to (t). Vertical axis: Sentiment Score. Horizontal axis: date and hour (MM/DD/YY HH:59). Each panel is titled with a Twitter user ID, two panels per user: (a), (b) 110914277; (c), (d) 1135866961; (e), (f) 183352499; (g), (h) 320657019; (i), (j) 341587111; (k), (l) 606333611; (m), (n) 745235832; (o), (p) 910880371; (q), (r) 924674300; (s), (t) 98005782.]

Figure 6: Short-term sentiment of the most recent 40 periods. We treat each hour in which the user posted tweets as one short period.
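The period construction described in the Figure 6 caption can be sketched as follows: tweets are grouped by the clock hour in which they were posted, hours with no tweets are skipped, and only the most recent 40 non-empty hours are kept. The function name `short_term_sentiment`, the `(timestamp, score)` input format, and the use of the mean as the per-period score are assumptions for illustration; the caption does not say how scores within one hour are combined.

```python
from collections import defaultdict
from datetime import datetime


def short_term_sentiment(tweets, max_periods=40):
    """Group (timestamp, score) pairs into hourly periods.

    Returns (hour_start, mean_score) pairs, oldest first, for the
    most recent `max_periods` hours that contain at least one tweet.
    """
    if max_periods <= 0:
        raise ValueError("max_periods must be positive")
    buckets = defaultdict(list)
    for timestamp, score in tweets:
        # Truncate the timestamp to the start of its clock hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(score)
    # Only hours that actually contain tweets appear in `buckets`.
    recent_hours = sorted(buckets)[-max_periods:]
    # Averaging within an hour is an assumption, not the paper's stated rule.
    return [(h, sum(buckets[h]) / len(buckets[h])) for h in recent_hours]


# Hypothetical tweets with made-up sentiment scores.
example = [
    (datetime(2013, 5, 15, 14, 5), 1.0),
    (datetime(2013, 5, 15, 14, 50), 0.0),
    (datetime(2013, 5, 15, 20, 10), -0.5),
]
periods = short_term_sentiment(example)
for hour, score in periods:
    print(hour.strftime("%m/%d/%y %H:%M"), score)
```

Because empty hours are dropped, consecutive points on the horizontal axis can be minutes or weeks apart, which matches the irregular tick spacing in the Figure 6 panels.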