Detecting Fake Engagement on Instagram

Posted on 22-Jan-2018

Indira Sen

linkedin/in/indira-sen-8a6068140

@drealcharbar fb.com/indira.sen.31

Dr. Ponnurangam Kumaraguru (Chair)

Thesis Committee

- Dr. Anwitaman Datta, NTU Singapore

- Mr. Nitendra Rajput, InfoEdge

- Dr. Ponnurangam Kumaraguru, IIIT Delhi (Chair)

Likes on Instagram

[Example posts: one with 3,363 likes, one with 1,008 likes]

Why is Engagement Important on Instagram?

Why Fake Likes?

- ‘Influencers’ are compensated based on engagement: likes and comments

- This creates an incentive to artificially inflate engagement metrics by purchasing likes from like markets or joining like-back networks

- Inflated like counts fool potential brands and advertisers into hiring ‘unworthy’ influencers

Motivation

- Influencer marketing is a $1B industry

- Fake influencers landed deals over $500

Core Thesis Question

- How do we automatically detect fraudulent likes on Instagram?

Organic likes

- Likers who engage with the content

- Genuine reach

Inorganic likes

- Likers bought from marketplaces

- Artificial reach

- Understanding the properties of genuine liking behaviour, B = {b1, b2, …, bn}

- Reducing the effect of likes that do not match B

Thesis Outline

- Research Aim

- Data Collection

- Analysis of Fake Likes

- Machine Learning Classifier to Detect Fake Likes

- Estimating Reach of Users

- Conclusion

What is a Like Instance?

- Given a poster S whose post p has been liked by liker L, we define a like instance as the tuple (L, p, S)
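A like instance maps naturally onto a small immutable record. The sketch below is a minimal illustration that assumes users and posts are identified by strings. The class and field names are hypothetical and do not come from the thesis.

```python
from typing import NamedTuple


class LikeInstance(NamedTuple):
    """A like instance (L, p, S): liker L liked post p, published by poster S."""

    liker: str   # L: account that gave the like
    post: str    # p: the liked post
    poster: str  # S: account that published p


# Hypothetical IDs: liker "alice" liked post "p42", posted by "bob".
like = LikeInstance(liker="alice", post="p42", poster="bob")
print(like)  # LikeInstance(liker='alice', post='p42', poster='bob')
```

The record is a tuple, so it is hashable. Like instances can therefore be stored in sets or used as dictionary keys, for example to attach a predicted fake probability to each one.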

Research Aim

- Find the features of liker L, post p and poster S that determine the probability of liker L genuinely liking that particular post p

- Identify the true reach of a poster by determining the fake likes received on their posted content

Possible Reasons for Genuine Liking

- Homepage: followees’ posts

- Explore: Instagram’s recommendations

- Likes of followees

[Screenshot of the Explore tab with the labels ‘Based on photos you liked’, ‘Based on people you follow’ and ‘Similar to accounts you interact with’]

- Poster is a followee

- Poster is a followee of a followee

- Topical interests in common

How to get Fake Likes

- Marketplaces

- Like-back collusion networks

- Link farming hashtags

- Bots

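Architecture Diagram

Features extracted for each like:

1) Liker metadata and last 18 posts

2) Poster metadata and last 18 posts

3) Post metadata

[Diagram: features from fake likes and other likes form the training data for a machine learning model; features from random unknown likes are then passed to the model, which labels each like as fake or not fake, with outputs marked α and 1 - α]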

Data Collection: Fake Likes

[Diagram: Fake Likes 1 are likes given by honeypot victims (fake likers purchased for a honeypot); Fake Likes 2 are likes on videos with views = 0; Other Likes come from Instagram featured users, snowball-sampled to 1M users, from which a random sample of 500 users who are not honeypot victims is drawn]

- Honeypots trap fake likers bought through a service

- If a user falls for a honeypot, we monitor their liking behaviour
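The two sources of fake labels and the sampled non-victims can be merged into one labeled set. This is a minimal sketch, not the thesis's pipeline. It assumes each like is a dict with the hypothetical keys `liker`, `post`, `is_video` and `video_views`. It also assumes that the honeypot victims and the 500 sampled non-victims are already available as sets of user IDs.

```python
from typing import Dict, Iterable, List, Set, Tuple

Like = Dict[str, object]  # hypothetical record: liker, post, is_video, video_views


def label_likes(
    likes: Iterable[Like],
    honeypot_victims: Set[str],
    other_likers: Set[str],
) -> List[Tuple[Like, int]]:
    """Return (like, label) pairs with 1 = fake and 0 = other.

    Likes that match none of the three sources are skipped. Fake rules are
    checked first, so a zero-view video like from a sampled user counts as fake.
    """
    labeled = []
    for like in likes:
        liker = like["liker"]
        zero_view_video = bool(like.get("is_video")) and like.get("video_views") == 0
        if liker in honeypot_victims:
            labeled.append((like, 1))  # Fake Likes 1: given by a honeypot victim
        elif zero_view_video:
            labeled.append((like, 1))  # Fake Likes 2: like on a video with 0 views
        elif liker in other_likers:
            labeled.append((like, 0))  # Other Likes: from a sampled non-victim
    return labeled
```

The "other" label only means that a like is not known to be fake. It is not a guarantee that the like is genuine.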

Data Collection: Other Likes

- Randomly sample 500 users who are not honeypot victims from the 1M snowball-sampled users

Likes    #Likes    #Posts    #Likers    #Posters
Fake     10,417     8,408        500       7,715
Other    11,810    11,644        500       7,631

Understanding Fake Likes

- Formulate hypotheses indicative of fake liking behaviour

- Validate each hypothesis with a two-sample Kolmogorov-Smirnov (KS) test

- Network effects: the liker is a follower of the poster, or a follower of a follower of the poster
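A two-sample KS test checks whether a feature is distributed differently over fake likes and over other likes. The version below uses only the standard library. Its p-value relies on the asymptotic Kolmogorov approximation from Numerical Recipes, so it is rough for small samples. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
import bisect
import math
from typing import Sequence, Tuple


def ks_2samp(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov test.

    Returns (D, p): D is the largest gap between the two empirical CDFs,
    p is the asymptotic p-value.
    """
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The ECDF gap only changes at observed values, so check each of them.
    for v in xs + ys:
        fa = bisect.bisect_right(xs, v) / n
        fb = bisect.bisect_right(ys, v) / m
        d = max(d, abs(fa - fb))
    # Effective sample size and the Numerical Recipes lambda correction.
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # Q_KS(lambda) is essentially 1 here, and the series converges slowly.
        return d, 1.0
    # Q_KS(lambda) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lambda^2)
    p = sum(2 * (-1) ** (j - 1) * math.exp(-2 * (j * lam) ** 2) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

For example, `ks_2samp(fake_values, other_values)` on a single feature returns D together with an approximate p-value. A small p-value suggests that the feature separates the two groups.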

Liker is Follower of Poster

- Green edges: liker relationship

- Red edges: liker-follower relationship

- Other likes have a higher proportion of likers who also follow the poster

[Figure panels: Other Likes, Fake Likes]

Network Effects

- 90% of fake like instances have a followee-like proportion of only 0.25

[Figure annotations: 90% and 56%]
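The two network effects can be computed from a follow graph for each like instance (liker L, poster S). The same graph also gives the proportion of a post's likers who follow the poster. This is a minimal sketch that assumes the graph is held in memory as a dict from each user to the set of accounts that user follows. All names are hypothetical.

```python
from typing import Dict, Iterable, Set

Follows = Dict[str, Set[str]]  # user -> set of accounts that user follows


def network_features(liker: str, poster: str, follows: Follows) -> Dict[str, bool]:
    """Network features for one like instance (liker L, poster S)."""
    followees = follows.get(liker, set())
    # 1-hop: L follows S directly.
    is_follower = poster in followees
    # 2-hop: L follows some account that itself follows S.
    is_follower_of_follower = any(poster in follows.get(f, set()) for f in followees)
    return {"is_follower": is_follower, "is_follower_of_follower": is_follower_of_follower}


def follower_liker_fraction(likers: Iterable[str], poster: str, follows: Follows) -> float:
    """Fraction of a post's likers who follow the poster (0.0 if no likers)."""
    likers = list(likers)
    if not likers:
        return 0.0
    return sum(poster in follows.get(u, set()) for u in likers) / len(likers)
```

The 2-hop check scans only the liker's own followees. This keeps the cost proportional to that user's followee count rather than to the size of the graph.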

Interest Overlap

- Hypothesis: a user is more likely to like a post if she shares topical interests with it

- Affinity: the lower the affinity, the higher the overlap

Extracting Topics

- Topics are extracted from the bio, post text and post image

- Wikification for text and DenseCap for images

[Figure: image topics and post caption topics]

- Affinity is non-commutative: the affinity of A to B need not equal the affinity of B to A

Affinity

- Affinity outperforms Jaccard distance in discerning fake likes from other likes

- Post image topics are strong indicators of genuine liking

- Unlike traditional distance metrics, our metric captures semantic relationships between entities

- 90% of other likes have an average affinity of 0.5

- 90% of fake likes have an average affinity of 0.74

[Figure annotations: 0.74 and 0.5]
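The slides do not define the affinity metric, so the sketch below does not reproduce it. It only illustrates two properties the slides mention. Jaccard distance, the baseline, is symmetric. A directional measure can be non-commutative, as affinity is. The directional measure here is a hypothetical `coverage_distance`: the share of the post's topics missing from the liker's interests. As with affinity, lower values mean higher overlap. Both functions only match identical topic strings and capture none of the semantic relatedness that the thesis's metric is said to capture.

```python
from typing import Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; symmetric in A and B (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def coverage_distance(post_topics: Set[str], liker_topics: Set[str]) -> float:
    """Hypothetical directional distance: share of post topics the liker lacks.

    Not the thesis's affinity metric; shown only to illustrate non-commutativity.
    """
    if not post_topics:
        return 0.0
    return 1.0 - len(post_topics & liker_topics) / len(post_topics)


post = {"beach", "travel"}
liker = {"travel", "food", "fashion", "pets"}
print(jaccard_distance(post, liker), jaccard_distance(liker, post))
print(coverage_distance(post, liker), coverage_distance(liker, post))
```

Swapping the arguments leaves the Jaccard distance unchanged. The two coverage values differ: 0.5 when half of the post's topics are covered, and 0.75 when only a quarter of the liker's topics appear in the post.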

Other Features

- Celebrities tend to get more likes (engagement)

- Genuine likers keep coming back: repeated likers

- Link farming hashtags: #like4like, #l4l, #like2follow

- Topical hashtags

- Posting activity of liker (Badri et al., CIKM’16) and poster

- Profile picture of liker: egghead profiles (cheap to create)
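Several of these features can be computed with simple string and set operations. This is a minimal sketch with hypothetical parameter names. It treats an egghead profile as one without a profile picture, which is an assumption. It also uses only the three link farming hashtags listed on the slide, whereas a real list would be longer.

```python
import re
from typing import Dict, Iterable, List

# Link farming hashtags named on the slide; a real list would be longer.
LINK_FARMING_TAGS = {"like4like", "l4l", "like2follow"}
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(caption: str) -> List[str]:
    """Lower-cased hashtags in a caption, without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(caption)]


def simple_features(caption: str, has_profile_picture: bool,
                    liker: str, past_likers: Iterable[str]) -> Dict[str, float]:
    """Hypothetical per-like features inspired by the list above."""
    tags = hashtags(caption)
    return {
        "n_hashtags": float(len(tags)),
        "n_link_farming_tags": float(sum(tag in LINK_FARMING_TAGS for tag in tags)),
        # Assumed egghead definition: the liker has no profile picture.
        "liker_is_egghead": float(not has_profile_picture),
        # Repeated liker: the liker also liked one of the poster's earlier posts.
        "is_repeated_liker": float(liker in set(past_likers)),
    }
```

For example, the caption `"Sunset! #Like4Like #travel #l4l"` has three hashtags, two of which are link farming tags. Lower-casing the tags before the lookup means that `#Like4Like` is matched as well.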

Automatic Detection of Fake Likes

- Train a set of ML classifiers using the features described

- Ratio of fake likes to other likes: 1:2

- An SVM with an RBF kernel gives the best performance
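Below is a sketch of two of the ingredients named above: a training set drawn at a 1:2 ratio of fake to other likes, and the RBF kernel that the SVM uses. The function names, the default `gamma=0.1` and the label encoding (1 = fake, 0 = other) are assumptions. In practice the sampled data would be passed to an off-the-shelf SVM such as scikit-learn's `SVC(kernel="rbf")`, which computes the kernel internally.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.1) -> float:
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sample_one_to_two(fake: List[Vector], other: List[Vector],
                      seed: int = 0) -> List[Tuple[Vector, int]]:
    """Training set with fake:other = 1:2 (label 1 = fake, 0 = other).

    Keeps as many fake examples as the other class allows, then draws twice
    as many other examples at random.
    """
    rng = random.Random(seed)
    n_fake = min(len(fake), len(other) // 2)
    data = [(x, 1) for x in rng.sample(fake, n_fake)]
    data += [(x, 0) for x in rng.sample(other, 2 * n_fake)]
    rng.shuffle(data)
    return data
```

Fixing the seed makes the sampled subset reproducible. This helps when the same 1:2 split is reused to compare several classifiers.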

Classification Model

- Performance

- We manually examined 100 false negatives and found that 70 of them had high topical overlap with the post

- Small liker interest sets are a limitation of the affinity metric

Class    Precision    Recall    F1-score
0        0.93         0.96      0.945
1        0.895        0.825     0.86
total    0.92         0.925     0.92
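The per-class F1 values in the table are the harmonic mean of precision and recall, and the code below recomputes them. The slides do not give the class supports behind the "total" row. `weighted_average` therefore only illustrates one assumption about that row: that it is a support-weighted mean, like the weighted average reported by scikit-learn's `classification_report`.

```python
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_average(values: Sequence[float], supports: Sequence[int]) -> float:
    """Support-weighted mean of per-class values."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


print(round(f1_score(0.93, 0.96), 3))    # class 0 -> 0.945
print(round(f1_score(0.895, 0.825), 2))  # class 1 -> 0.86
```

Both recomputed values match the table, so the reported F1-scores are consistent with the reported precision and recall.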

In the Wild Experiment

- Random sample of 134,669 like instances

- Posts categorized into food, fashion, outdoors, merchandise, people, gadgets, pets and captioned

- We find 8,557 fake likes

- Manual analysis of 100 of these finds 78 to be fake
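The counts above give a flag rate and a rough precision estimate. The extrapolation in the last step is not from the thesis. It assumes that the 100 manually checked likes are representative of all flagged likes, and with only 100 checked, the 78% estimate carries considerable uncertainty.

```python
sampled = 134_669             # like instances sampled in the wild
flagged = 8_557               # likes found to be fake
checked, confirmed = 100, 78  # manual check of flagged likes

flag_rate = flagged / sampled
precision_est = confirmed / checked
# Rough extrapolation, assuming the checked likes are representative.
est_true_fake = precision_est * flagged

print(f"flag rate: {flag_rate:.2%}")                                   # 6.35%
print(f"estimated precision: {precision_est:.0%}")                     # 78%
print(f"estimated truly fake among flagged: {est_true_fake:.0f}")      # 6674
```

Roughly one in sixteen sampled likes is flagged, and about 6,700 of the 8,557 flagged likes would be fake if the manual check generalizes.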

Reach Estimation

- Enable advertisers to make better decisions

- Reduce the effect of fake likes a poster may have received

- Measure deviation in reach
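The slides do not give the reach formula. One plausible sketch assumes that the classifier returns a fake probability α for each like; the architecture diagram labels its outputs α and 1 - α. Under that assumption, each like is counted with weight 1 - α, and deviation is the share of the observed like count that this weighting removes. Both definitions are illustrative assumptions, not the thesis's own.

```python
from typing import Sequence


def projected_reach(fake_probs: Sequence[float]) -> float:
    """Assumed projected reach: expected genuine likes, sum of (1 - alpha)."""
    return sum(1.0 - a for a in fake_probs)


def reach_deviation(fake_probs: Sequence[float]) -> float:
    """Assumed deviation: share of observed likes not explained by projected reach."""
    observed = len(fake_probs)
    if observed == 0:
        return 0.0
    return (observed - projected_reach(fake_probs)) / observed
```

Under these definitions, the deviation reduces to the mean fake probability over a poster's likes. For example, likes with α = 0.9, 0.1, 0.2 and 0.0 give a projected reach of 2.8 out of 4 likes and a deviation of 0.3.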

Who receives fake likes?

- Users posting about merchandise, outdoors (including travel posts) and people (posts containing faces) have the highest deviation from the projected reach

- Most posters do not have a high deviation, while some users have a very high deviation
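Comparing each category's median deviation with its maximum is one simple way to show the skew described above: most posters have low deviation, but a few have very high deviation. This sketch assumes per-poster deviation values already tagged with one of the post categories. The (category, deviation) input format is hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def deviation_by_category(rows: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Map each post category to (median, max) deviation across posters."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for category, deviation in rows:
        groups[category].append(deviation)
    return {c: (median(v), max(v)) for c, v in groups.items()}
```

A low median paired with a high maximum marks a category in which a few posters account for most of the deviation.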

Do Popular Users have more Fake Likes?

- No: users with lower follower counts, who may be trying to gain a following, have higher deviation

‘Micro Influencers’ have higher deviation

Conclusion

- Automated method to detect fake like instances

- Performs well at identifying unseen fake likes on Instagram

- Finds the true reach of a user

- Helps advertisers and brands identify users with genuine, meaningful reach

Challenges, Limitations and Future Work

- Limited availability of labeled data; labels approximated using honeypots

- Data collection constraints; integrate network features

- Improve the affinity metric; improve precision (dynamic features)

- Fine-grained topical recommendations for brands and advertisers

Acknowledgement

- Anupama Aggarwal, PhD Scholar, IIIT Delhi

- Committee members

- Srishti Gupta, Divyansh Agarwal, Neha Jawalkar, Sonu Gupta, Kushagra Bhargava

- Siddharth Singh, Shiven Mian

- Members of Precog

- Family and friends

Thanks! Any questions?

You can find me at:

indira15021@iiitd.ac.in

pk@iiitd.ac.in
