Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science, pages 199–211, Online, November 20, 2020. © 2020 Association for Computational Linguistics

https://doi.org/10.18653/v1/P17

Emoji and Self-Identity in Twitter Bios

Jinhang Li,* Giorgos Longinos,* Steven R. Wilson and Walid Magdy
School of Informatics
The University of Edinburgh
Edinburgh, United Kingdom
{j.li-183,g.longinos}@sms.ed.ac.uk [email protected], [email protected]

* Authors contributed equally.

Abstract

Emoji are widely used to express emotions and concepts on social media, and prior work has shown that users’ choice of emoji reflects the way that they wish to present themselves to the world. Emoji usage is typically studied in the context of posts made by users, and this view has provided important insights into phenomena such as emotional expression and self-representation. In addition to making posts, however, social media platforms like Twitter allow users to provide a short bio, which is an opportunity to briefly describe their account as a whole. In this work, we focus on the use of emoji in these bio statements. We explore the ways in which users include emoji in these self-descriptions, finding different patterns than those observed around emoji usage in tweets. We examine the relationships between emoji used in bios and the content of users’ tweets, showing that the topics and even the average sentiment of tweets vary for users with different emoji in their bios. Lastly, we confirm that homophily effects exist with respect to the types of emoji that are included in bios of users and their followers.

1 Introduction

With the rise of social media usage and online text-based communication, emoji, a simple but powerfully expressive set of visual characters (Danesi, 2016), have become a hugely popular means to express emotions, moods, and feelings over computer-mediated communication (Kelly and Watts, 2015). In the era of big data, with more and more people engaging with social media, researchers have begun to study the ways in which social media users include emoji in their posts, finding that emoji usage is associated with things like personality (Li et al., 2018), culture (Guntuku et al., 2019), and socio-geographical differences (Barbieri et al., 2016).

Prior work has typically focused on how people use emoji within the posts that they make online (Ljubešić and Fišer, 2016; Robertson et al., 2018), or the way that they can be used as reactions to other content (Tian et al., 2017). However, emoji are also commonly used within users’ self-created profiles. In this work, we specifically examine the inclusion of emoji in Twitter bios, which are short (160 characters maximum) texts describing a Twitter account. These bios are featured prominently on a user’s profile page, and given their limited length, users often use this space to succinctly express the essential information about their accounts. Therefore, we expect that the choice of emoji used in these bios will have a strong connection to a user’s online self-identity, or the way that they seek to portray themselves to others on a social media platform.

The goal of this paper is to give an overview of how emoji are used in Twitter bios from a computational linguistics perspective; that is, we treat emoji as a special category of tokens and make use of natural language processing methods to understand the major trends in the ways that people use emoji in their bios and what this says about both the things they tweet about and their follower network. Our results provide insights into the variety of ways in which people choose to present themselves online in their Twitter bios that may be overlooked when only considering non-emoji word tokens or only considering the ways that people use emoji in the content of tweets. More specifically, we ask, and subsequently describe the work done to answer, the following research questions:

RQ1. How are emoji used in Twitter bios? As a first step, we seek to characterize the ways in which users use emoji in their bios. We look at the types of emoji that are most commonly used in Twitter bios,
and the positions within the bios at which emoji appear. We compare our findings to trends in the usage of emoji in tweets by the same set of users and note the differences.

RQ2. What is the relationship between the emoji in a user’s bio and the content that the user posts? Next, we explore the correlations that exist between the choice of emoji to be included in a user’s bio and the content that that user tweets about. We consider this from the perspectives of word-level patterns, topic usage, and overall tweet sentiment.

RQ3. Do users and their followers use emoji in their bios in a similar way? Last, we investigate the homophily of emoji usage within bios by studying the follower networks of our core set of users. We look at the similarities in both the absence or presence of emoji in users’ bios as well as particular choices of emoji used.

2 Background

2.1 Online Self-Identity

Self-identity, or self-concept, is a collection of firm and noticeable beliefs about oneself (Sparks and Shepherd, 1992). From a general perspective, self-identity gives the answers to the question “Who am I?”. Many components together make up self-identity. Self-categorization theory asserts that self-identity consists of at least two types of self-categorization: personal identity (what makes me unique?) and social identity (which groups do I belong to?) (Guimond et al., 2006).

As social attributes are inherent, people reveal their self-identity when they communicate with others or interact with the outside world (Fisher et al., 2014). Expressing themselves is also a way for people to establish connections and bonds with the world. Therefore, social media provides a natural opportunity to study self-identity. Previous studies have shown that specific personality characteristics can be measured by analyzing linguistic behavior on social media using natural language processing techniques (Plank and Hovy, 2015). Other work analyzed the words, phrases, and topics collected from Facebook messages, and linked these to personality traits and demographics of users (Schwartz et al., 2013). Twitter bios have been shown to be particularly useful in discovering other aspects of self-identity such as political and religious affiliations (Rogers and Jones, 2019).

2.2 Self-representation in Emoji

While many studies related to online self-identity are based on the analysis of textual features, others have turned to emoji as important signals of users’ identities. In one study, researchers looked at Twitter names and bios, uncovering stark differences in the emoji use of groups supporting and opposed to white nationalism (Hagen et al., 2019). Graells-Garrido et al. (2020) found that in two South American countries, different colour variations of heart emoji indicated users’ opinions about abortion: tweets containing the green heart emoji were more likely to convey support of women’s rights, while the blue heart emoji was more associated with stronger restrictions on abortion. In another study, researchers explored differences in emoji usage across cultures, finding that users from western countries tend to use more emoji than users from eastern countries (Guntuku et al., 2019). Although there were specific emoji that were found to be culturally specific (e.g. the cooked rice emoji), it was suggested that many common emoji have similar meanings across cultures.

It has been shown that usage of some emoji is also correlated with aspects of identity such as personality traits (Völkel et al., 2019), and the use of skin-tone modifiers in emoji has been linked to greater feelings of self-representation online, with no evidence that the skin tones in emoji correlated with the expression of racist views online (Robertson et al., 2018, 2020). Other work found gender stereotypes in the use of male and female emoji modifiers: male modifiers were more frequently used in emoji related to business and technology, while female modifiers were used more often in emoji related to love and makeup (Barbieri and Camacho-Collados, 2018).

3 Data

For our study, we sampled users from Twitter who tweeted between April and July 2020. Using the Twitter streaming API, we began collecting tweets and storing all user-level information available for each tweet, including the bio. In order to filter out fake or less well-established accounts, we removed all accounts that had fewer than 100 followers, and to remove celebrity or other widely popular accounts, we filtered out those with more than 1000 followers. From the remaining set of users, we randomly sampled 20,000 users who have at least one emoji in their bios, and collected their most recent 200 tweets, as available, labeling this dataset “emojiBio”. We also collected 200 tweets each for a set of 2,000 users who did not use any emoji in their bio as a control group, which we label the “nonEmojiBio” dataset. Finally, the “Followers” dataset contains the user-level information and recent tweets of the followers of the users of both the emojiBio and nonEmojiBio datasets. Details about the size of the datasets are presented in Table 1.

Dataset        Users        Tweets         Retweets
EmojiBio       20,000       2,998,219      1,568,661
NonEmojiBio    2,000        491,646        247,800
Followers      7,105,521    425,704,661    169,935,436

Table 1: Number of users, tweets, and retweets (subset of tweets) in our datasets.

Bios           Tweets
Appearances    Appearances
1311           115999
1152           65410
965            48411
720            40892
559            34368
547            33766
543            33173
490            23037
467            17040
326            16929

Table 2: The most frequently used emoji in the bios and tweets of the emojiBio dataset.
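The follower-count filter and random sampling described in this section can be sketched as follows. This is only an illustration under stated assumptions: the dictionary keys `followers_count` and `bio`, the helper `crude_has_emoji`, and the fixed random seed are hypothetical and are not taken from the collection pipeline used in this study.

```python
import random

MIN_FOLLOWERS = 100    # accounts with fewer followers are removed
MAX_FOLLOWERS = 1000   # accounts with more followers are removed


def crude_has_emoji(bio):
    """Hypothetical stand-in for a real emoji detector (rough code point check).

    Flags, ZWJ sequences, and many symbols are not covered by these ranges.
    """
    return any(0x1F300 <= ord(ch) <= 0x1FAFF or 0x2600 <= ord(ch) <= 0x27BF
               for ch in bio)


def sample_users(users, want_emoji, sample_size, seed=0):
    """Filter users by follower count and bio type, then sample at random."""
    eligible = [
        u for u in users
        if MIN_FOLLOWERS <= u["followers_count"] <= MAX_FOLLOWERS
        and crude_has_emoji(u["bio"]) == want_emoji
    ]
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(eligible, min(sample_size, len(eligible)))


# Toy records with hypothetical field names.
users = [
    {"name": "a", "followers_count": 50, "bio": "cats \U0001F431"},     # too few followers
    {"name": "b", "followers_count": 250, "bio": "football \u26BD"},
    {"name": "c", "followers_count": 5000, "bio": "music \U0001F3B5"},  # too many followers
    {"name": "d", "followers_count": 400, "bio": "no emoji here"},
]
emoji_bio = sample_users(users, want_emoji=True, sample_size=20000)
non_emoji_bio = sample_users(users, want_emoji=False, sample_size=2000)
```

With the toy records above, only user "b" qualifies for the emoji group and only user "d" for the control group; the other two fall outside the 100 to 1000 follower band.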

As our dataset contains text written in many languages, we first used the pre-trained fastText language identification model (Joulin et al., 2016a,b) to detect the language that each tweet or bio was written in. The most common languages in our datasets were English, Japanese, Spanish, and Portuguese, followed by others. After identifying the languages, we tokenized the English-language texts using the NLTK (Loper and Bird, 2002) TweetTokenizer¹ and the texts detected as being written in other languages using the Polyglot multilingual tokenizer.²

4 Emoji Usage in Bios

First, we sought to characterize the use of emoji in users’ bios, so we turn to just the emojiBio dataset.

¹ https://www.nltk.org/api/nltk.tokenize.html
² https://polyglot.readthedocs.io/

Group Name           Num. Emojis    In Bios    User ratio    Examples
People & Body        2485           745        20.0%
Symbols              301            229        15.4%
Objects              299            219        15.9%
Flags                275            215        16.5%
Travel & Places      264            206        14.9%
Smileys & Emotion    162            151        44.3%
Animals & Nature     147            132        18.9%
Food & Drink         131            117        5.5%
Activities           95             82         15.2%

Table 3: Emoji groups present in Unicode Emoji v13.0, number of unique emoji in the group, number of unique emoji used at least once in a bio in the emojiBio dataset, the percentage of users who use at least one emoji from the corresponding group in their bio, and examples of emoji from the group.

We contrast the most commonly used emoji³ in bios and in tweets in Table 2, finding that facial expression emoji are more frequently used in tweets, while different variations of heart emoji are more frequently used in bios. Another emoji that is regularly used in bios is the rainbow emoji. The sparkles emoji and the female sign emoji (not in the top 10) are frequently used in both bios and tweets. We also checked the average position of emoji within users’ bios and tweets, and found that in both cases, most emoji appear at the end of the text. These emoji at the end commonly signify the overall meaning or sentiment of the text. However, we noticed that the emoji in bios are, on average, used closer to the middle of the text than emoji that are used in tweets. There is also a nontrivial number of emoji used at the start of texts, which happens more often in bios than in tweets. Additionally, we found that it is more common for users to use a single emoji as the entire content of a bio than as the entire content of a tweet (more details in Appendix B).
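One way to quantify where emoji sit within a text is a relative position between 0.0 (start) and 1.0 (end). The sketch below is an illustration only: the regular expression `EMOJI_RE` is a rough assumption that covers flags and two broad symbol ranges but not ZWJ sequences or modifiers, and the normalisation used here is one possible choice rather than the measure used in this study.

```python
import re

# Rough emoji pattern (an assumption): flags as pairs of regional indicators,
# plus two broad symbol ranges. ZWJ sequences and modifiers are not handled.
EMOJI_RE = re.compile(
    "[\U0001F1E6-\U0001F1FF]{2}"
    "|[\U0001F300-\U0001FAFF]"
    "|[\u2600-\u27BF]"
)


def emoji_positions(text):
    """Relative position of each emoji: 0.0 = start of text, 1.0 = end."""
    positions = []
    for match in EMOJI_RE.finditer(text):
        room = len(text) - len(match.group())  # characters other than this emoji
        positions.append(match.start() / room if room > 0 else 0.0)
    return positions


def mean_position(texts):
    """Average relative position over all emoji found in a collection of texts."""
    positions = [p for text in texts for p in emoji_positions(text)]
    return sum(positions) / len(positions) if positions else None


def is_single_emoji(text):
    """True when the whole text (ignoring surrounding spaces) is one emoji."""
    return EMOJI_RE.fullmatch(text.strip()) is not None
```

Comparing `mean_position` over bios and over tweets would give the kind of contrast described above, and `is_single_emoji` identifies texts whose entire content is one emoji.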

Unicode Emoji 13.0 contains a total of 4,159 emoji in nine groups according to categories. We carried out analysis on emoji based on their predefined groups, and the results are shown in Table 3. We found that the number of unique emoji in a category is directly correlated with the number of unique emoji from that group that appear in users’ bios. However, after calculating the proportion of the users who use at least one emoji from each group, we noticed that the Smileys & Emotion group is used by the largest share of users, 44.3% of the 20,000 users, followed by the People & Body group with 20.0% of users including at least one emoji from that group. In contrast, the Food & Drink group is used by the fewest users, accounting for only 5.5% of all users. This suggests that users choose to represent themselves with more facial expressions, people-centric emoji, and emotions, which are connected to aspects of self-identity. We also found that many users use their bios to present their interests to others; for example, some users use these types of emoji to express their love for certain singers or sports clubs.

³ In their Unicode representations, some emoji with the same visual pattern are represented by different code points for historical reasons; code points can be divided into fully-qualified, minimally-qualified, or unqualified (https://www.unicode.org/reports/tr51/). In this paper, we only present the qualified version of a given emoji pattern when reporting results.

Table 4: Ranking by mutual information score of emoji in bios, grouped by the top 20 emoji.
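The “user ratio” column of Table 3 counts each user at most once per group, regardless of how many emoji from that group appear in the bio. A minimal sketch of that computation is given below; the five-entry `EMOJI_GROUP` table is a hypothetical stand-in for a full mapping derived from the Unicode emoji data files, and the function name is illustrative.

```python
from collections import Counter

# Hypothetical, tiny emoji-to-group table for illustration only; a full table
# would be derived from the Unicode emoji data files.
EMOJI_GROUP = {
    "\U0001F600": "Smileys & Emotion",  # grinning face
    "\u2764": "Smileys & Emotion",      # red heart
    "\U0001F436": "Animals & Nature",   # dog face
    "\U0001F355": "Food & Drink",       # pizza
    "\u26BD": "Activities",             # soccer ball
}


def group_user_ratio(bios):
    """Percentage of bios containing at least one emoji from each group."""
    users_per_group = Counter()
    for bio in bios:
        # A set, so a user counts once per group however many emoji they use.
        users_per_group.update({EMOJI_GROUP[ch] for ch in bio if ch in EMOJI_GROUP})
    return {group: 100.0 * n / len(bios) for group, n in users_per_group.items()}


bios = [
    "\U0001F600 hello",
    "\U0001F436\U0001F436 dog person",  # two dog faces, still one user
    "\U0001F600 \u26BD",
    "no emoji",
]
ratios = group_user_ratio(bios)
```

Because a bio can contain emoji from several groups, the percentages across groups need not sum to 100, which is also the case in Table 3.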

Next, we examine the relationships between sets of emoji that users include in their bios. We selected the top 20 emoji used in bios and computed the mutual information between the presence of these emoji in a user’s bio and the presence of any other emoji. The emoji with the highest mutual information scores are presented in Table 4.⁴ We found that high-frequency emoji also had high mutual information scores for many other emoji, such as heart emoji of various colors. This indicates that these high-frequency emoji are not used indiscriminately, but in particular ways, and have patterns in the ways that they co-occur with other emoji. Another finding is that emoji which are similar to the original emoji have high scores. This finding suggests that similar or the same types of emoji are more likely to be used together. For example, in row 10, four types of ball emoji (basketball, baseball, tennis, and American football) appear in the ten emoji that provide the most mutual information for the soccer ball emoji. People who like football may also enjoy other ball sports, and using these ball emoji in the bios at the same time indicates that they are ball sports enthusiasts (either as players or spectators). Another example is that in the 14th row, there are eight national flag emoji out of the ten emoji that have the highest mutual information with the American flag emoji. People may use multiple flags in their bios to imply their residences and national origin. Finally, we noticed that users tend to use emoji together that fit a specific context. For example, for the ring emoji in row 18, the most relevant emoji are kiss, person with veil, man in tuxedo, and pregnant woman. People may use these emoji in their bios to express their relationship status, potentially indicating whether they are engaged, married, or expecting a child.

⁴ The first emoji represents a red heart, and the fifteenth emoji represents a heart suit. They are two emoji patterns with entirely different meanings and also subtle differences in shape and color.

Table 5: Ranking by mutual information score of tokens in bios, grouped by the top 20 emoji, with translations of non-English tokens in parentheses.
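The mutual information between the presence of two tokens in a bio can be sketched as below. This is an illustrative implementation of the standard definition for two binary variables, measured in bits; the representation of each bio as a set of token names and the function names are assumptions made for the example, not details reported in this paper.

```python
import math
from collections import Counter


def mutual_information(bios, a, b):
    """Mutual information (in bits) between 'a is in the bio' and 'b is in the bio'."""
    n = len(bios)
    # Joint counts over the four presence/absence combinations that occur.
    joint = Counter((a in bio, b in bio) for bio in bios)
    p_a = {x: sum(c for (xa, _), c in joint.items() if xa == x) / n
           for x in (True, False)}
    p_b = {y: sum(c for (_, yb), c in joint.items() if yb == y) / n
           for y in (True, False)}
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / (p_a[x] * p_b[y]))
    return mi


def rank_by_mi(bios, target, candidates):
    """Candidates sorted by mutual information with the target, highest first."""
    scored = [(c, mutual_information(bios, target, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy bios, each reduced to a set of (hypothetical) token names.
bios = [
    {"soccer_ball", "basketball", "sparkles"},
    {"soccer_ball", "basketball"},
    {"rainbow", "sparkles"},
    {"rainbow"},
]
ranking = rank_by_mi(bios, "soccer_ball", ["sparkles", "basketball"])
```

In the toy data, the basketball token always appears together with the soccer ball token and so ranks first, while the sparkles token appears independently of it and scores zero. Mutual information is also high when two tokens systematically avoid each other, so the direction of an association has to be checked separately.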

We also calculated the mutual information score of non-emoji tokens and the top 20 emoji, as shown in Table 5. Our dataset is multilingual, so the tokens obtained are also multilingual. We removed some tokens that do not capture any specific content information, such as some honorifics in Japanese. We found that the usage of emoji is related to words with similar meanings as the emoji, consistent with our previous findings that emoji with similar meanings had high mutual information. An example of this in the word-level results is in row 10 of Table 5: the tokens most related to the soccer ball emoji are words in different languages with similar meanings related to soccer and player. This finding further confirms that people prefer to use relevant emoji in a specific context. There are many other examples with similar trends, such as the rainbow flag emoji in row 8 and the American flag emoji in row 14. Further, we observed that the heart emoji used in bios are more related to showing love for celebrities or sports clubs; for example, “flamengo” (row 1) is a sports club (shorthand name for Clube de Regatas do Flamengo), and “bts” (row 4) is a Korean male singing group.

                              EmojiBio            NonEmojiBio
                              Bios     Tweets     Bios     Tweets
Average Number of Emoji       3.05     0.73       0        0.39
Average Number of Hashtags    0.23     0.06       0.19     0.08
Average Number of Words       8.51     6.75       9.49     7.74

Table 6: The average number of emoji, words (excluding stopwords), and hashtags in the bios and tweets of the emojiBio and nonEmojiBio datasets.

5 The Relationship between Emoji in Bios and Tweeted Content

Next, we explore the relationship between emoji usage in bios and tweeted content. We start by comparing the overall trends in Twitter usage between the sets of users with and without emoji in their bios in order to investigate whether there are notable differences in the volume of emoji, hashtags, and words (excluding emoji, hashtags, and stopwords) used by each group (Table 6).

In terms of the quantity of words and hashtags, there are no significant differences between the emojiBio and nonEmojiBio datasets. In the emojiBio dataset, we noticed that there is increased usage of emoji in bios compared to tweets (3.05 emoji in bios compared to 0.73 in tweets). The fact that the character limit for tweets is more flexible than the limit for bios makes this result even more striking. In the nonEmojiBio dataset, the average number of emoji that appear in tweets drops to 0.39, which is roughly half the rate of emoji usage in tweets found in the emojiBio group. In terms of hashtags, there is again an increased usage in bios, which is similar between the two datasets. In terms of words, users who do not have emoji in their bios tend to use slightly more words in their bios and tweets. Specifically, the users in the nonEmojiBio group used roughly 1 more word, on average, than their emojiBio counterparts, in both tweets and bios.

In addition to differences in the number of words, hashtags, and emoji used, we expect that aspects of a user’s identity that are revealed through emoji in their bios will be reflected in measurable ways in the content that they choose to tweet about. We perform a case study in which we select two particularly interesting emoji that were common in users’ bios, and compare the content of the tweets from users who had these emoji in their bios using both topic modeling and sentiment analysis.

The emoji that we focus on for this case study are the rainbow emoji and the American flag emoji. These emoji are both used with similar frequencies, but are rarely used together and represent distinct groups of users, which we seek to understand through the lens of the Twitter content that they generate. In our emojiBio dataset, the numbers of users using these emoji in bios are close, at 324 (rainbow) and 302 (American flag), while only two of the users use both emoji at the same time in their bios, so these two emoji can distinguish users well. These emoji also belong to different emoji subgroups within Unicode Emoji 13.0: the rainbow belongs to the sky & weather subgroup under the Travel & Places group, and the American flag belongs to the country-flag subgroup under the Flags group.

Among the 324 users who use the rainbow emoji, 155 use English in their bios, 46 Japanese, 33 Portuguese, and 31 Spanish. For comparison, among the 302 users who use the American flag, 245 use English as the language in their bios, 15 Spanish, 12 Japanese, and 9 Portuguese. The tweets involved are also multilingual, but are mostly written in English. For the analyses in this section, we first translated all non-English tweets into English using the Google Translate API.⁵ Considering that the topic modeling and sentiment analysis methods that we use mostly rely on bag-of-words representations of the text, issues with the grammatical accuracy of translated tweets will not have as large an impact. After the translation, we have two sets of tweets corresponding to the two groups of users who used the rainbow emoji and the American flag emoji. The numbers of tweets for the two groups are 61,239 and 58,376, respectively.

We performed topic modeling using Latent Dirichlet Allocation (Blei et al., 2003) on the tweets of users who used the rainbow emoji and the American flag emoji in their bios. We used the coherence score provided by the gensim Python library⁶ to select the number of topics. We train a separate topic model for each group of users, and select four topics for each model. In Figure 1, we visualize the process of inferring topics by zooming in on the most relevant tokens for each of the topics within the set of tweets written by each group of users. The weights between topics are unequal, decreasing from top to bottom as presented in the figure. The topics of tweets from users who use the rainbow emoji in their bios include words related to concepts like life, community, entertainment, and society. We notice some topics that contain more pleasant words, some related to gender identity, others to life and pets. The fourth topic appears to be related to issues of police brutality. However, on the whole, the tweets posted by users who use the American flag emoji in their bios are more heavy and serious. They are more concerned with topics related to police, president, and current affairs. Because the massive surge in the #blacklivesmatter movement, caused by the death of George Floyd in the United States, broke out at the end of May 2020, and we downloaded user tweets during this time, there is a clear topic for this current affair. Other current affairs discussed include Antifa and COVID, but these were part of the same topic. Comparing the two sets of topics, we found that the different emoji included by the users in their bios are related to distinct topics, which may also reflect the self-identities of the users who used these emoji. The rainbow emoji often represents gay pride, as well as happiness and peace in general, so the corresponding tweets also mostly reflect the love of these users for life and others. In contrast, users who use the American flag are more concerned about national politics and current affairs within the United States.

⁵ https://cloud.google.com/translate
⁶ https://radimrehurek.com/gensim/

Figure 1: The most relevant tokens for topics inferred from the tweets from users who use the ‘rainbow’ emoji and the ‘United States flag’ emoji in their bios.

Figure 2: Sentiment analysis of the tweets from users who use the rainbow emoji and the American flag emoji in their bios, shown separately.

We also conducted a sentiment analysis on these two sets of tweets, using the VADER sentiment analysis tool (Hutto and Gilbert, 2014), giving the results presented in Figure 2. According to the figure, for the two datasets, the distribution of sentiment is fairly consistent overall, with more positive content than negative. While the amount of neutral sentiment in the two datasets is almost the same, the users with the rainbow emoji in their bios tweeted more positive content overall, compared to the users with the US flag emoji in their bios. Close to 40% of the tweets from users who use the rainbow emoji in their bios are positive, and less than 25% are negative. In contrast, less than 35% of tweets sent by users using the American flag in their bios are positive, and close to 30% are negative.
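The shares of positive, neutral, and negative tweets can be derived from per-tweet compound scores as sketched below. The sketch assumes that the compound scores have already been computed, and it assumes the cut-offs of 0.05 and -0.05 that are conventionally recommended for VADER compound scores; the thresholds actually used for Figure 2 are not stated in this paper.

```python
from collections import Counter

# Assumed cut-offs: the conventional VADER thresholds on the compound score.
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def sentiment_label(compound):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= POSITIVE_THRESHOLD:
        return "positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


def sentiment_shares(compound_scores):
    """Percentage of tweets falling into each sentiment label."""
    counts = Counter(sentiment_label(score) for score in compound_scores)
    total = len(compound_scores)
    return {label: 100.0 * counts[label] / total
            for label in ("positive", "neutral", "negative")}


# Hypothetical compound scores for four tweets from one group of users.
shares = sentiment_shares([0.6, 0.0, -0.4, 0.05])
```

Running the same function on the tweets of each group separately gives the per-group percentages that are compared in the text.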

These sentiment analysis results are mostly consistent with the results of the topic modeling. The tweets sent by users who use the rainbow emoji are happier and lighter than those sent by users who use the American flag emoji in their bios. This case study suggests that the use of different emoji in bios can reflect aspects of both users’ national identity and their personality. More specifically, this analysis shows that groups using some emoji in their bios generate more positive content than groups using other emoji.

6 Homophily Effects in Emoji Usage in Bios

For our final set of analyses, we explored the extent to which users and their followers use emoji in their bios in similar ways. At a very basic level, regarding the absence or presence of emoji in the followers’ bios, there was a considerable difference between the emojiBio and nonEmojiBio datasets. The followers of users that have emoji in their bios (emojiBio) have emoji in their bios as well 32.47% of the time. For the followers of users that do not have emoji in their bios (nonEmojiBio), this average percentage drops to 23.23%.

7420    6529    26199    2028
7361    4054    5422     1943
5509    3817    2582     1375
4902    2128    2421     1331
4897    1865    2262     1150
3335    1816    1933     913
2941    1523    1929     828
2880    1495    1861     791
2042    1363    1837     782
2042    1319    1568     693

Table 7: Top 10 emoji used by followers of users with particular emoji in bios and their counts. Bold indicates the count for the same emoji that was used by the reference user. We observe that it is very common for a user and their followers to use the same kinds of emoji in their bios.
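The presence-or-absence comparison in this section can be sketched as an average, over users, of the percentage of each user’s followers whose bio contains an emoji. The sketch assumes that the reported figure is a mean of per-user percentages; the helper `crude_has_emoji` and the shape of the input dictionary are hypothetical.

```python
def crude_has_emoji(bio):
    """Hypothetical stand-in for a real emoji detector (rough code point check)."""
    return any(0x1F300 <= ord(ch) <= 0x1FAFF or 0x2600 <= ord(ch) <= 0x27BF
               for ch in bio)


def follower_emoji_rate(follower_bios):
    """Percentage of one user's followers with at least one emoji in their bio."""
    if not follower_bios:
        return None  # users without collected followers are skipped
    with_emoji = sum(1 for bio in follower_bios if crude_has_emoji(bio))
    return 100.0 * with_emoji / len(follower_bios)


def mean_follower_emoji_rate(followers_by_user):
    """Average of the per-user percentages (assumed aggregation)."""
    rates = [follower_emoji_rate(bios) for bios in followers_by_user.values()]
    rates = [rate for rate in rates if rate is not None]
    return sum(rates) / len(rates) if rates else None


# Toy input: user id -> list of follower bios.
followers_by_user = {
    "u1": ["runner \U0001F3C3", "plain text"],
    "u2": ["a", "b", "c", "dog \U0001F436"],
    "u3": [],
}
average_rate = mean_follower_emoji_rate(followers_by_user)
```

Computing this value once for the followers of emojiBio users and once for the followers of nonEmojiBio users yields the two percentages that are contrasted above.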

Next, we selected three representative emoji from the set of most frequently used emoji in the emojiBio dataset, namely, the green heart emoji, the soccer ball emoji, and the American flag emoji. Also, to eliminate bias caused by only considering high-frequency emoji, we selected the low-frequency dog face emoji, used by a total of just 157 users in our emojiBio dataset. In Table 7, we list the ten most frequently used emoji in the bios of the followers (from our Followers dataset) of the users who use these four specific emoji, and mark in bold text the emoji that are the same as those of the users.
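The per-emoji follower counts can be sketched with a simple counter over the bios of followers of users whose own bio contains a given reference emoji. This is an illustration under stated assumptions: the four-emoji `TRACKED` vocabulary and the record fields `bio` and `follower_bios` are hypothetical, and each follower is counted once per emoji, which may differ from the counting used for Table 7.

```python
from collections import Counter

# Toy vocabulary (an assumption): green heart, soccer ball, dog face, sparkles.
TRACKED = {"\U0001F49A", "\u26BD", "\U0001F436", "\u2728"}


def emoji_in(bio):
    """Set of tracked emoji that occur in a bio."""
    return {ch for ch in bio if ch in TRACKED}


def top_follower_emoji(users, reference, k=10):
    """Most common emoji in the bios of followers of users who use `reference`."""
    counts = Counter()
    for user in users:
        if reference in emoji_in(user["bio"]):
            for follower_bio in user["follower_bios"]:
                counts.update(emoji_in(follower_bio))  # once per follower
    return counts.most_common(k)


DOG = "\U0001F436"
users = [
    {"bio": "vet " + DOG, "follower_bios": [DOG + " walks", DOG + "\u2728", "plain"]},
    {"bio": DOG + " mum", "follower_bios": [DOG, "\u26BD"]},
    {"bio": "\u26BD fan", "follower_bios": ["\u26BD\u26BD"]},
]
top = top_follower_emoji(users, DOG)
```

In the toy data, the dog face emoji is itself the most common emoji among followers of dog face users, which mirrors the kind of pattern that a bold entry near the top of a column would indicate.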

The green heart ‘💚’ and the American flag ‘🇺🇸’ are the emoji used most frequently by followers of users who also include these emoji. The soccer ball emoji ‘⚽’ ranks third, and the dog face emoji ‘🐶’ ranks fifth, each behind only a few high-frequency emoji. This indicates a strong homophily relationship: users and their followers tend to use the same emoji in their bios. This shared usage also suggests that emoji in bios can reflect the users’ self-identity in terms of group belonging, or their social identity. As an illustration, users using the dog face emoji in their bios may want to signal that they are dog lovers, and they may also choose to make online connections with others who are similar, leading to many other dog lovers in their networks.

Figure 3: The distribution of the percentage of common and similar emoji appearances in the followers’ bios. The lines in the graph represent the average percentage of common and similar emoji appearances.

Common Emoji Appearances    Similar Emoji Appearances
Percentage (%)              Percentage (%)
14.75                       34.77
13.72                       34.31
13.33                       33.91
10.46                       33.81
10.4                        30.32

Table 8: The emoji with the highest percentage of common (exact match) and similar appearances between the users’ and the followers’ bios.

We also take a closer look at the high-frequency emoji used by followers of users who use the American flag ‘🇺🇸’. Prior work on emoji and American political movements on Twitter (Hagen et al., 2019) pointed out that the water (“blue”) wave emoji ‘🌊’ is related to the US Democratic party and is frequently associated with the hashtag #resist to express anti-white-nationalist sentiments. We also observe the use of the red heart ‘❤️’ and blue heart ‘💙’ emoji, two colors that are often associated with the US Republican and Democratic parties, respectively. These followers may be expressing their political opinions: they use the American flag emoji along with other, more specific emoji to express their particular views. Lastly, we notice several emoji related to religion in this column, indicating expressions of religious as well as political affiliations.

In addition to the focused study on these four emoji, we also examined whether the emoji used in the bios of Twitter users are either the same as, or generally similar to, those used by their followers in the entire dataset. To assess similarity, we trained our own emoji embeddings with a skip-gram model (Mikolov et al., 2013) using the tweets and bios of the emojiBio dataset, and subsequently created a similarity lexicon of emoji based on the cosine similarity between the vectors, considering one emoji to be similar to another if it was within the top ten nearest neighbors in the learned embedding space. We found that the average percentages for common (i.e., exact matches) and similar emoji appearances between the users of the emojiBio dataset and their followers are 3.45% and 13.30%, respectively. The respective distributions of the percentages for common and similar emoji appearances are presented in Figure 3. We used permutation tests to confirm that the difference between these two values was statistically significant, and therefore conclude that the followers of a given user have a considerably high probability of using the same or similar emoji in their bios as the users they follow. Table 8 shows the five emoji for which followers most often used the same, or similar, emoji as the users that they follow.
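The similarity lexicon and the two match percentages can be illustrated with a minimal sketch. This is not the implementation used in the study: it assumes that the emoji vectors are already available as a dictionary (for instance, exported from a trained skip-gram model), and the names `similarity_lexicon` and `match_rates` are hypothetical. It also assumes one plausible reading of a "common" or "similar" appearance, namely that a follower bio counts once if it shares at least one exact or neighboring emoji with the user's bio.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_lexicon(embeddings, k=10):
    """Map each emoji to the set of its k nearest neighbors by cosine similarity."""
    lexicon = {}
    for emoji, vec in embeddings.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != emoji
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        lexicon[emoji] = {other for _, other in scored[:k]}
    return lexicon


def match_rates(user_emoji, follower_bios, lexicon):
    """Return (common, similar) shares of follower bios.

    user_emoji: set of emoji in the user's bio.
    follower_bios: list of sets, one set of emoji per follower bio.
    Assumption: a bio is "common" if it shares an exact emoji with the user,
    and "similar" if it contains a top-k neighbor of any of the user's emoji.
    """
    if not follower_bios:
        return 0.0, 0.0
    similar_to_user = set()
    for emoji in user_emoji:
        similar_to_user |= lexicon.get(emoji, set())
    common = sum(1 for bio in follower_bios if bio & user_emoji)
    similar = sum(1 for bio in follower_bios if bio & similar_to_user)
    total = len(follower_bios)
    return common / total, similar / total
```

Averaging the two returned shares over all users would give dataset-level figures comparable in spirit to those reported above, under the stated assumptions.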

7 Discussion

We now give answers to our original research questions based on our results:

RQ1. How are emoji used in Twitter bios? Our results showed that emoji are used in unique ways within users’ bios on Twitter, even compared to the ways in which they are used in tweets. In general, emoji are positioned earlier in bios than in tweets, and a higher percentage of bios than of tweets start with an emoji. Also, it is more common for an emoji to be the only content of a bio than the only content of a tweet.

Moreover, facial expression emoji are the dominant type of emoji in tweets, while different variations of heart emoji are dominant in bios. Specifically, the most popular emoji in bios are from the Smileys & Emotion group, while the least frequently used emoji are from the Food & Drink group. Furthermore, we noted that the most frequently used emoji in bios have a high mutual information with other emoji that are similar to them, or from the same category (e.g., hearts, balls, flags), or related to the same concept (e.g., relationship status). In their bios, people tend to use emoji to show their support for musical groups or sports teams (or sports in general), as well as things like countries that they come from or are currently living in.

RQ2. What is the relationship between the emoji in a user’s bio and the content that the user posts? Compared to users who do not have any emoji in their bios, users with emoji in their bios use about twice as many emoji in their tweets, on average. They also use fewer words in both their tweets and bios. In our case study, topic models built from the tweets of the users that use the rainbow ‘🌈’ and the American flag ‘🇺🇸’ emoji in their bios showed that users who have the rainbow emoji ‘🌈’ in their bios tweet about life, community, entertainment, and society, whereas users who have the American flag emoji ‘🇺🇸’ in their bios tweet about police, president, and current affairs. Also, it was shown that tweets of users that have the rainbow emoji ‘🌈’ in their bios convey a more positive sentiment on average compared to users that use the American flag ‘🇺🇸’ in their bios. This is just one example to showcase the fact that the types of emoji that people choose to include in their bios reflect larger views, opinions, and sentiments that are expressed in the content of their tweets.

RQ3. Do users and their followers use emoji in their bios in a similar way?

The usage of emoji in bios also led us to some conclusions related to homophily effects on Twitter. First, our results indicate that followers of users who have emoji in their bios are more likely to have emoji in their bios as well. We also found that users tend to use the same, or similar, emoji in their bios as the users they follow. For example, followers of users with the green heart emoji in their bios also had other colored hearts in their bios, with the green heart being the one most commonly used by the followers. These findings suggest that there are indeed similarities within user networks in the ways in which emoji are used in Twitter bios.

8 Conclusion

We have presented an overview of the ways in which Twitter users include emoji in their bios, and what kinds of things we can learn about those users from the particular emoji that they use. Using a range of approaches, we have shown that emoji are an important component to consider when examining the ways in which users present themselves to others in online settings like Twitter. The emoji that users choose to include reveal important aspects of their self-identities, such as the teams and musicians that they support, the activities they enjoy, and their national and political identities, and show their similarities with their followers in these same aspects. At the same time, we have only scratched the surface of the types of in-depth analyses that could be performed by considering specific sets of emoji and examining how these relate to the identities of the users who include them in their bios. This work can provide an important complementary view to other work on online self-identity that mainly focuses on plain text content.

References

Francesco Barbieri and Jose Camacho-Collados. 2018. How gender and skin tone modifiers affect emoji semantics in twitter. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 101–106.

Francesco Barbieri, German Kruszewski, Francesco Ronzano, and Horacio Saggion. 2016. How cosmopolitan are emojis? exploring emojis usage and meaning over different languages with distributional semantics. In Proceedings of the 24th ACM international conference on Multimedia, pages 531–535.

David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research, 3(Jan):993–1022.

Marcel Danesi. 2016. The semiotics of emoji: The rise of visual language in the age of the internet. Bloomsbury Publishing.

Michael Fisher, Martin Abbott, and Kalle Lyytinen. 2014. The concept of self-identity. In The Power of Customer Misbehavior, pages 61–67. Springer.

Eduardo Graells-Garrido, Ricardo Baeza-Yates, and Mounia Lalmas. 2020. Every colour you are: Stance prediction and turnaround in controversial issues. arXiv preprint arXiv:2005.10019.

Serge Guimond, Armand Chatard, Delphine Martinot, Richard J Crisp, and Sandrine Redersdorff. 2006. Social comparison, self-stereotyping, and gender differences in self-construals. Journal of personality and social psychology, 90(2):221.

Sharath Chandra Guntuku, Mingyang Li, Louis Tay, and Lyle H Ungar. 2019. Studying cultural differences in emoji usage across the east and the west. In Proceedings of the International AAAI Conference on Web and Social Media, volume 13, pages 226–235.

Loni Hagen, Mary Falling, Oleksandr Lisnichenko, AbdelRahim A Elmadany, Pankti Mehta, Muhammad Abdul-Mageed, Justin Costakis, and Thomas E Keller. 2019. Emoji use in twitter white nationalism communication. In Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing, pages 201–205.

Clayton J Hutto and Eric Gilbert. 2014. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth international AAAI conference on weblogs and social media.

Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Herve Jegou, and Tomas Mikolov. 2016a. Fasttext.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651.

Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2016b. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.

Ryan Kelly and Leon Watts. 2015. Characterising the inventive appropriation of emoji as relationally meaningful in mediated close personal relationships. Experiences of technology appropriation: unanticipated users, usage, circumstances, and design, 20.

Weijian Li, Yuxiao Chen, Tianran Hu, and Jiebo Luo. 2018. Mining the relationship between emoji usage patterns and personality. arXiv preprint arXiv:1804.05143.

Nikola Ljubešić and Darja Fišer. 2016. A global analysis of emoji usage. In Proceedings of the 10th Web as Corpus Workshop, pages 82–89.

Edward Loper and Steven Bird. 2002. Nltk: The natural language toolkit. In Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, pages 63–70.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Barbara Plank and Dirk Hovy. 2015. Personality traits on twitter—or—how to get 1,500 personality tests in a week. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 92–98.

Alexander Robertson, Walid Magdy, and Sharon Goldwater. 2018. Self-representation on twitter using emoji skin color modifiers. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM).

Alexander Robertson, Walid Magdy, and Sharon Goldwater. 2020. Emoji skin tone modifiers: Analyzing variation in usage on social media. ACM Transactions on Social Computing, 3(2):1–25.

Nick Rogers and Jason J Jones. 2019. Using twitter bios to measure changes in social identity: Are americans defining themselves more politically over time?

H Andrew Schwartz, Johannes C Eichstaedt, Margaret L Kern, Lukasz Dziurzynski, Stephanie M Ramones, Megha Agrawal, Achal Shah, Michal Kosinski, David Stillwell, Martin EP Seligman, et al. 2013. Personality, gender, and age in the language of social media: The open-vocabulary approach. PloS one, 8(9):e73791.

Paul Sparks and Richard Shepherd. 1992. Self-identity and the theory of planned behavior: Assesing the role of identification with “green consumerism”. Social psychology quarterly, pages 388–399.

Ye Tian, Thiago Galery, Giulio Dulcinati, Emilia Molimpakis, and Chao Sun. 2017. Facebook sentiment: Reactions and emojis. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pages 11–16.

Sarah Theres Volkel, Daniel Buschek, Jelena Pranjic, and Heinrich Hussmann. 2019. Understanding emoji interpretation through user personality and message context. In Proceedings of the 21st International Conference on Human-Computer Interaction with Mobile Devices and Services, pages 1–12.

Appendix

A Differences across languages

A language identification analysis was conducted on the combined data of the emojiBio and nonEmojiBio datasets to identify the most frequently used languages in the tweets and bios. The analysis was conducted using the fastText language identification tool (Joulin et al., 2016a,b), and the language distribution is presented in Figure 4.

[Bar chart. x-axis: English, Japanese, Spanish, Portuguese; y-axis: Percentage (%); series: Bios, Tweets. Bar values: 44.7, 12, 10.1, 8; 46.8, 15.5, 15.1, 13.5.]

Figure 4: Language distribution in the tweets and bios of the emojiBio and nonEmojiBio datasets combined.

B Positioning Analysis

The distribution of positional values for bios and tweets is presented in Figure 5. The results of the positioning analysis indicate that emoji appear earlier in bios than in tweets. For each emoji, its positional value was calculated by computing its distance from the first character of the text and dividing it by the overall length of the text. Therefore, emoji that were used at the beginning of the text had a positional value of 0, whereas emoji that were used at the end of the text had a positional value of 1.
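The positional value can be sketched as follows. This is an illustrative reading of the description above rather than the code used in the study: to make an emoji in the final position score exactly 1, the sketch divides the character offset by the length of the text minus one, which is an assumption. The function names are hypothetical, and positions are counted in Unicode code points, so multi-code-point emoji (such as flags) would need grapheme segmentation in practice.

```python
def positional_value(text, index):
    """Relative position of the character at `index`: 0 at the start, 1 at the end.

    Assumption: the offset is divided by len(text) - 1 so that the last
    character maps to exactly 1.
    """
    length = len(text)
    if not 0 <= index < length:
        raise IndexError("index is outside the text")
    if length == 1:
        return 0.0
    return index / (length - 1)


def mean_positional_value(text, emoji_chars):
    """Average positional value of all characters of `text` found in `emoji_chars`.

    Returns None when the text contains none of the given emoji.
    """
    values = [
        positional_value(text, i)
        for i, ch in enumerate(text)
        if ch in emoji_chars
    ]
    if not values:
        return None
    return sum(values) / len(values)
```

Averaging these values separately over bios and over tweets would yield the two means that are marked by the vertical lines in Figure 5.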

Figure 5: The distribution of the positional values of emoji in the tweets and bios of the emojiBio dataset. The vertical lines in the graphs represent the mean positional value for tweets (blue) and bios (orange).

C Group Analysis

In the group analysis, we divided the bios into four groups according to the language used, and we calculated the mutual information score for all emoji that appeared. Table 9 shows the 25 emoji with the highest scores in each group. We observed that in all language groups, there were multiple national flag emoji amongst the results. In most cases, those flags belong to countries where the respective language is spoken as a first or second language by a considerable portion of the population.

While emoji grouping is analyzed in Section 4, it is also important to consider that Unicode Emoji also provides standards for subgroups of emoji. Specifically, each emoji belongs to a group and also to a subgroup within that group, which makes the classification more specific. For example, the grinning face emoji ‘😀’ belongs to the face-smiling subgroup under the Smileys & Emotion group. Each group contains a different number of subgroups, and overall there are 98 subgroups. We counted the total number of times the emoji from each subgroup appeared in users’ bios and sorted the subgroups in descending order. Table 10 shows the ten most popular subgroups.

The results suggest that the most frequently used subgroup is emotion, and the face-smiling subgroup, which belongs to the same group (Smileys & Emotion), is also among the ten most popular, showing that people commonly use emoji to express their sentiments in bios. The second most frequently used subgroup is country-flag, which implies that users regularly use emoji in their bios to reveal their nationality or the countries where they have lived. Animal-mammal and plant-flower are also frequently used. These emoji are used to express users’ love for animals or plants, but also for decoration, to make bios more attractive. Another interesting finding is that the zodiac subgroup ranks seventh. This finding shows that people like to use symbolic emoji to tell others about their zodiac sign, which they consider a part of their self-identity.

Table 9: Mutual information score rank of emoji in bios, grouped by language.

Subgroup         Num
emotion          8448
country-flag     3870
sky & weather    2361
animal-mammal    1700
event            1407
plant-flower     1224
zodiac           1110
clothing         1092
game             971
face-smiling     844

Table 10: The distribution of emoji in bios, based on predefined subgroups.
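The subgroup counts in Table 10 amount to a simple tally, sketched below under stated assumptions. The function name `subgroup_counts` and the `subgroup_of` mapping are hypothetical; in practice such a mapping could be built from the group and subgroup annotations that Unicode publishes for emoji, but how the study constructed it is not specified here.

```python
from collections import Counter


def subgroup_counts(bios, subgroup_of):
    """Count emoji occurrences per subgroup.

    bios: iterable of sequences of emoji, one sequence per bio (repeats allowed).
    subgroup_of: hypothetical dict mapping an emoji to its subgroup name.
    Characters missing from the mapping are ignored.
    """
    counts = Counter()
    for bio in bios:
        for emoji in bio:
            subgroup = subgroup_of.get(emoji)
            if subgroup is not None:
                counts[subgroup] += 1
    return counts
```

Calling `most_common(10)` on the returned counter gives the subgroups in descending order of frequency, which is the ordering used in Table 10.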

To confirm that the emoji could be accurately grouped in clusters, we conducted a statistical analysis based on the mutual information scores for the 20 most frequently used emoji in bios. Specifically, we divided the 20 most frequently used emoji into groups and subgroups, and we plotted two heat maps that illustrate the categorization of emoji, as shown in Figure 6. When calculating the mutual information score between a group and a specific emoji, we did not consider that emoji as part of the group, to ensure normalization. The results show that each emoji achieved a higher mutual information score with the group or subgroup to which it belongs. This suggests that emoji in bios are more commonly used with other emoji from the same group or subgroup.

Figure 6: The average mutual information score between the 20 most frequently used emoji in bios and each group and subgroup, respectively.
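To make the group-level score concrete, the following minimal sketch computes the mutual information between two binary indicators over a collection of bios: whether a bio contains a given emoji, and whether it contains any other emoji from a given group. The exact formulation used in the study is not given here, so this definition, the base-2 logarithm, and the names `mutual_information` and `emoji_group_mi` are assumptions made for illustration. The emoji itself is excluded from the group, as described above.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information (in bits) between two discrete variables.

    pairs: list of (x, y) observations.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def emoji_group_mi(bios, emoji, group):
    """MI between 'bio contains emoji' and 'bio contains another group member'.

    bios: list of sets of emoji, one set per bio.
    group: iterable of emoji; `emoji` itself is removed before scoring so that
    an emoji is never compared with itself.
    """
    others = set(group) - {emoji}
    pairs = [(emoji in bio, bool(bio & others)) for bio in bios]
    return mutual_information(pairs)
```

Averaging `emoji_group_mi` over the emoji of interest for each group or subgroup would produce a matrix of the kind visualized in the heat maps of Figure 6. Without the exclusion step, an emoji would always co-occur with its own group, which would inflate the score.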

D Topic Modeling

We conducted supplementary experiments on the topic modeling analysis of Section 5. Specifically, we used the LDAvis tool to visualize the results, and the topic distributions are shown in Figure 7. The topic distributions visualize the weight of each topic and the connection between different topics. More precisely, the circles represent the topics, and the distance between the circle centers indicates how closely the topics are connected. More prevalent topics are represented by larger circles.

E Frequency Analysis

Figure 7: Topic distributions for the tweets of the users who use the ‘🌈’ and ‘🇺🇸’ emoji in their bios.

The results of the frequency analysis showed that the popularity of words and hashtags varies greatly between bios and tweets. Table 11 presents the most frequently appearing English words and hashtags in the bios and tweets of the emojiBio dataset. While the frequency analysis was conducted on multilingual data, we present only the most frequently appearing English words for consistency, since there are many different English translations for words in other languages.

Bios
Word         Appearances    Hashtag              Appearances
love         645            #bts                 33
fan          473            #resist              28
life         375            #maga                22
account      371            #exo                 21
insta        273            #bernie              16
instagram    218            #blacklivesmatter    15
flamengo     216            #jimin               14
god          212            #wwgwga              11
dm           212            #blm                 11
follow       198            #mufc                10

Tweets
Word         Appearances    Hashtag              Appearances
like         22843          #peing               868
love         16025          #nintendoswitch      802
get          14019          #blacklivesmatter    790
people       13541          #newprofilepic       739
know         11671          #acnh                656
good         10867          #covid               563
time         9881           #animalcrossing      561
go           9791           #otgalafinal         458
lol          9504           #sanditon            349
got          9241           #psshare             345

Table 11: The most frequently appearing English words and hashtags in the bios and tweets of the emojiBio dataset.

Bios
Word         Appearances    Hashtag                     Appearances
fan          53             #phish                      4
love         48             #maga                       4
account      36             #taehyung                   3
life         28             #resistance                 3
twitter      22             #kag                        3
like         22             #ynwa                       2
good         20             #trump                      2
world        19             #research                   2
god          19             #mufc                       2
people       17             #bernie                     2

Tweets
Word         Appearances    Hashtag                     Appearances
like         3826           #chismesfarandulachilena    288
people       2500           #meigen                     182
get          2315           #shindanmaker               160
love         2202           #covid                      151
good         2036           #blacklivesmatter           135
know         1921           #survivor                   127
time         1656           #nintendoswitch             121
think        1633           #digitalmarketing           104
go           1625           #lockdownhouseparty         102
see          1526           #bitcoin                    92

Table 12: The most frequently appearing English words and hashtags in the bios and tweets of the nonEmojiBio dataset.
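Counts of the kind shown in Tables 11 and 12 can be produced with a short tally over the texts. The sketch below is only illustrative: the tokenization by lowercase ASCII letter runs, the hashtag pattern, and the optional stopword set are assumptions, and the function name `count_words_and_hashtags` is hypothetical. The preprocessing actually applied in the study (for example, the stopword list and language filtering) is not specified here.

```python
import re
from collections import Counter

# Assumed patterns: a hashtag is '#' followed by word characters,
# and a word is a run of lowercase ASCII letters.
HASHTAG_RE = re.compile(r"#\w+")
WORD_RE = re.compile(r"[a-z]+")


def count_words_and_hashtags(texts, stopwords=frozenset()):
    """Return (word_counts, hashtag_counts) over an iterable of texts.

    Texts are lowercased, hashtags are counted separately and removed
    before word counting, and words in `stopwords` are skipped.
    """
    words = Counter()
    hashtags = Counter()
    for text in texts:
        lowered = text.lower()
        hashtags.update(HASHTAG_RE.findall(lowered))
        without_tags = HASHTAG_RE.sub(" ", lowered)
        words.update(
            word for word in WORD_RE.findall(without_tags)
            if word not in stopwords
        )
    return words, hashtags
```

Applying this function separately to the bios and to the tweets of a dataset, and then calling `most_common(10)` on each counter, gives rankings in the same layout as the tables above.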

In terms of words, the more frequently used words in bios are nouns, in contrast with tweets, where verbs appear more frequently. The most frequently used words in bios are mostly related to the social media activity of the user (account, insta, instagram, dm, follow) and their religious or spiritual beliefs (love, life, god). In contrast, in tweets we see verbs related to the expression of positive sentiment (like, love) or to carrying out an activity (get, go, got).

The hashtags that appear most frequently in bios are related to music artists or bands (#bts, #exo, #jimin), presenting the user’s music preferences, to political beliefs or election candidates (#resist, #maga, #bernie), and to the anti-violence protest group “Black Lives Matter” (#blacklivesmatter, #blm). The hashtags related to “Black Lives Matter” are also commonly found in tweets, together with hashtags related to gaming consoles and video games (#nintendoswitch, #animalcrossing, #acnh, #psshare), TV series (#sanditon), and music competitions (#otgalafinal). Users also use hashtags to tweet about Peing, an “anonymous Q&A box” service on Twitter (#peing), and to notify others about an update of their profile picture (#newprofilepic). Additionally, users frequently use a hashtag related to the COVID-19 pandemic (#covid) in their tweets.

Overall, the results of the word and hashtag frequency analysis for each element (bios and tweets) of the nonEmojiBio dataset do not differ significantly from those of the emojiBio dataset. Also, despite the decreased usage of emoji in tweets by these users, the distribution of emoji frequencies is very similar to that of the emojiBio dataset, since facial expression emoji are dominant again. The complete results of the frequency analysis of the nonEmojiBio dataset are presented in Table 12, but they should be interpreted with caution since the nonEmojiBio dataset is considerably smaller.