Twitter Content-based Spam Filtering - CISIS 2013

Post on 18-Dec-2014

DESCRIPTION

Presentation of the paper "Twitter Content-based Spam Filtering" at the CISIS 2013 international conference.

Transcript

Igor Santos, Igor Miñambres-Marcos, Carlos Laorden, Patxi Galán-García, Aitor Santamaría-Ibirika, Pablo G. Bringas

Detecting spammer accounts

Content-based analysis

[Figure residue: classes spam (TweetSpike) and ham (Legitimate); items t1-t3 and m1-m11 labeled legitimate or spam; testing; probability]

Dynamic Markov Chain (DMC)

Prediction by Partial Match (PPM)
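The slides name DMC and PPM as the compression models but do not show how a compressor becomes a classifier. The sketch below illustrates the general idea only: a message is assigned to the class whose training text lets it be compressed most cheaply. It is not the authors' WEKA library. Python's standard `zlib` (DEFLATE) is swapped in for DMC/PPM because neither ships with the standard library, and the class name `CompressionClassifier`, its methods, and the toy messages are all hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


class CompressionClassifier:
    """Toy compression-based text classifier (zlib stand-in for DMC/PPM)."""

    def __init__(self):
        # One growing training corpus per class label.
        self.corpora = {}

    def train(self, label: str, message: str) -> None:
        self.corpora[label] = self.corpora.get(label, "") + message + "\n"

    def extra_bytes(self, label: str, message: str) -> int:
        # How many additional bytes the message costs when compressed
        # together with this class's corpus; fewer bytes = better fit.
        corpus = self.corpora[label]
        return compressed_size(corpus + message) - compressed_size(corpus)

    def classify(self, message: str) -> str:
        return min(self.corpora, key=lambda label: self.extra_bytes(label, message))


# Hypothetical toy training data, not taken from the paper's dataset.
clf = CompressionClassifier()
for msg in [
    "Get free followers now, click here to win an iphone",
    "Win free followers today, click this link now",
    "Click here for free money and more followers",
]:
    clf.train("spam", msg)
for msg in [
    "Heading to the conference in Salamanca tomorrow morning",
    "Great talk about machine learning at the university today",
    "Lunch with the research team was really nice",
]:
    clf.train("ham", msg)

spam_guess = clf.classify("Get free followers now, click here to win an iphone")
ham_guess = clf.classify("Great talk about machine learning at the university today")
print(spam_guess, ham_guess)
```

With so little training text the byte differences are only meaningful for messages that closely resemble one corpus; a real filter would need far more data and a statistical compressor such as the ones named on the slide.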

Classifier                              Acc. (%)  Sp    Sr    F-Measure  AUC
Random Forest N=50                      96.42     0.98  0.94  0.96       0.99
DMC without Adaptation                  95.99     0.96  0.95  0.96       0.99
Random Forest N=10                      95.96     0.97  0.94  0.95       0.99
PPM without Adaptation                  94.80     0.97  0.91  0.94       0.99
Naive Bayes Multinomial Word Frequency  94.94     0.95  0.93  0.94       0.98
Bayes K2                                94.12     0.99  0.88  0.93       0.98
DMC with Adaptation                     93.11     0.94  0.90  0.92       0.98
C4.5                                    95.79     0.98  0.92  0.95       0.97
KNN K=3                                 93.71     0.97  0.89  0.93       0.97
SVM PVK                                 95.81     0.97  0.93  0.95       0.96
PPM with Adaptation                     76.50     0.78  0.69  0.72       0.86
Naive Bayes                             72.72     0.64  0.89  0.75       0.76
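The slide does not expand the column abbreviations. Assuming Sp and Sr stand for spam precision and spam recall (an assumption, not stated in the transcript), the F-Measure column should be roughly their harmonic mean. The sketch below recomputes it for three rows copied from the results above; the helper name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Sp, Sr, reported F-Measure) copied from the results table.
# Reading Sp/Sr as spam precision/recall is an assumption.
rows = {
    "Random Forest N=50": (0.98, 0.94, 0.96),
    "DMC without Adaptation": (0.96, 0.95, 0.96),
    "Bayes K2": (0.99, 0.88, 0.93),
}

recomputed = {name: f_measure(sp, sr) for name, (sp, sr, _) in rows.items()}
for name, (sp, sr, reported) in rows.items():
    print(f"{name}: recomputed {recomputed[name]:.3f}, reported {reported}")
```

Because Sp and Sr are already rounded to two decimals, a recomputed value can differ from the reported one by about 0.01.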

A new, public dataset of Twitter spam for evaluation

Adaptation of content-based spam filtering to Twitter

A new compression-based text filtering library for the ML tool WEKA

Enhance this approach using social network features

Add semantic capabilities by studying linguistic relationships

