Flipping 419 Cybercrime Scams: Targeting the Weak and the Vulnerable Gibson Mba* Jeremiah Onaolapo # Gianluca Stringhini # Lorenzo Cavallaro* *Royal Holloway, University of London # University College London WWW2017 CyberSafety Workshop // Perth, Australia // April 4, 2017
Flipping 419 Cybercrime Scams: Targeting the Weak and the Vulnerable
Gibson Mba*, Jeremiah Onaolapo#, Gianluca Stringhini#, Lorenzo Cavallaro*
*Royal Holloway, University of London
#University College London
WWW2017 CyberSafety Workshop // Perth, Australia // April 4, 2017
● “419” is derived from the section of Nigeria’s Criminal Code that covers such scams
● These scams have been around for some time
● Most previous work focuses on cybercrime targeting the US, EU, and Asia¹,²,³
● What about cybercrime targeting Africa?
○ Little attention
○ Hence our study
¹ R. Anderson et al., Measuring the Cost of Cybercrime, WEIS, 2012.
² N. Christin et al., Dissecting one click frauds, CCS, 2010.
³ B. Stone-Gross et al., Your botnet is my botnet: Analysis of a botnet takeover, CCS, 2009.
Contributions
● Highlight a unique form of scam targeting vulnerable Nigerian students,
secondary school leavers, and unemployed persons, among others
● Provide insight into common themes around which fraudsters build their
scam schemes
○ We rely on Machine Learning (ML) techniques to achieve this
○ Themes -- Academic, Employment, Spirituality, Dating, Other
Our roadmap
● Dataset description
● Ground truth extraction
● Automatic data classification
● Validation
● Clustering
Data sources
Our goal -- Collect and analyze data to understand scam schemes
● Topix.com’s Nigeria forum
● 2005 -- posts on news and current affairs
● 2012 onwards -- scam posts show up and grow
● Sheds light on 419 scams perpetrated against Nigerians
● Hosts posts promoting different types of scam services
○ Also contact information (mostly phone numbers) to reach the fraudsters
Pipeline: Posts → TF-IDF² → Features → SVM¹ → {is_scam, not_scam}
¹ T. Joachims, Text categorization with Support Vector Machines: Learning with many relevant features, ECML, 1998.
² G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, IPM Journal, 1988.
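The TF-IDF step of the pipeline above can be illustrated with a minimal sketch. It assumes one common weighting variant (term frequency times log of inverse document frequency) and whitespace tokenization; the slides do not say which variant or tokenizer was used, and the toy posts are hypothetical, not taken from the dataset.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Other TF-IDF variants exist and would give different numbers.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy posts for illustration only.
posts = [
    "call now for exam answers",
    "call now for job offer",
    "election news for today",
]
vectors = tfidf(posts)
print(vectors[0])
```

A term that appears in every post ("for") gets weight zero, while a term unique to one post ("exam") gets the highest weight. Vectors of this kind are what a classifier such as an SVM would take as input.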
Results of automatic data classification
● Applied the SVM model to the entire dataset
○ 711,861 posts minus the 1,035 used for training (710,826 posts classified)
● 679,222 (95.55% of the posts) -- YES (in other words, is_scam)
● 31,604 (4.45% of the posts) -- NO (in other words, not_scam)
● Conclusion -- The forum is a crime hub used by scammers to advertise schemes to deceive and exploit their victims
Validation
5-fold validation

Metric        SVM
Accuracy      95.17%
Precision     96.54%
Recall        95.80%
Specificity   94.14%
F1            96.16%
Error         4.83%
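For reference, the metrics in the validation table follow the standard definitions for a binary classifier. The sketch below computes them from confusion counts. The counts are hypothetical, since the slides report only the final percentages, and the function name is illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "error": (fp + fn) / total,
    }


# Hypothetical counts for illustration only (not the study's data).
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```

Accuracy and error are complements by definition, which matches the table (95.17% + 4.83% = 100%).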
Automatic data classification
Our goal -- Identify the theme of each scam post
Obstacle -- Too many posts (679,222)
Solution -- Rely on supervised ML techniques
● We manually checked 655 scam posts (training set) and identified five themes
● Traceable to dwindling academic performance
○ As reported by examination bodies
● Unemployment is also an issue
○ 23.9% as of 2011, according to the Nigerian Bureau of Statistics
● Themes are very important
○ Key contribution
Confusion matrix (5-fold validation)

              Academic  Employment  Spirituality  Other  Dating  Total  Correct predictions
Academic           413           7             1      9       1    431  95.82%
Employment           2         136             0      2       0    140  97.14%
Spirituality         0           0            16      1       0     17  94.12%
Other                1           5             1     23       5     35  65.71%
Dating               1           0             0      0      31     32  96.88%
Total              417         148            18     35      37    655
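The "Correct predictions" column can be reproduced from the matrix itself: each percentage is the diagonal entry divided by its row total. A short sketch using the counts above follows. The slide does not say whether rows hold the manual labels or the predicted ones, so the code only reproduces the row-wise arithmetic. The overall diagonal share it prints is derived here and is not reported on the slide.

```python
labels = ["Academic", "Employment", "Spirituality", "Other", "Dating"]
matrix = [
    [413, 7, 1, 9, 1],
    [2, 136, 0, 2, 0],
    [0, 0, 16, 1, 0],
    [1, 5, 1, 23, 5],
    [1, 0, 0, 0, 31],
]

# Share of each row that falls on the diagonal, as a percentage.
per_class = {
    label: 100 * row[i] / sum(row)
    for i, (label, row) in enumerate(zip(labels, matrix))
}

correct = sum(matrix[i][i] for i in range(len(labels)))
total = sum(sum(row) for row in matrix)

for label, pct in per_class.items():
    print(f"{label}: {pct:.2f}%")
print(f"Diagonal share overall: {correct}/{total} = {100 * correct / total:.2f}%")
```

The "Other" theme is the weakest row, with most of its errors split between Employment and Dating.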
Clustering
Goal -- Identify clusters of entities in the dataset, for instance,
groups of related scammers
Why? Clusters could indicate coordination among fraudsters or the existence of criminal gangs
● We selected 16,194 posts from 679,222 crime posts
○ Random sampling without replacement
○ Confidence level 99%
○ Error rate 1%
● Density-Based Spatial Clustering of Applications with Noise (DBSCAN)¹
¹ M. Ester et al., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, KDD, 1996.
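The slides do not state how the sample size was derived. One plausible reading is the standard formula for estimating a proportion with a finite population correction. The sketch below assumes z = 2.576 for the 99% confidence level and the most conservative proportion p = 0.5; both are assumptions, as is the function name.

```python
import math


def sample_size(population, z, margin, p=0.5):
    """Sample size for a proportion, with finite population correction."""
    # Size needed for an infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it for a finite population of the given size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# 679,222 scam posts, 99% confidence (assumed z = 2.576), 1% error.
n = sample_size(population=679_222, z=2.576, margin=0.01)
print(n)
```

Under these assumptions the result is 16,194, which matches the number of posts selected for clustering.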
Visualization of clusters
● DBSCAN computed 197 clusters from our data
● We fed some clusters through Gephi¹
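For readers unfamiliar with DBSCAN, here is a minimal pure-Python sketch of the algorithm. It uses toy 2-D points and Euclidean distance, which are assumptions for illustration: the slides do not state which features, distance measure, or parameter values were used on the posts.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster id per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Unlike k-means, DBSCAN does not need the number of clusters in advance and leaves isolated points unassigned, which suits exploratory questions such as how many groups of related posts exist.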
Case study -- same topic, multiple phone numbers
● Indicates coordination of activities among scammers
● Could also be because some fraudsters copy the post topics of other scammers
¹ M. Bastian et al., Gephi: An Open Source Software for Exploring and Manipulating Networks, ICWSM, 2009.
Case study -- a cluster of clusters
● An elaborate scamming scheme
● Emphasis on cluster
○ Single phone number node
○ Multiple topics
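Both case-study patterns (one topic linked to several phone numbers, and one phone number linked to several topics) can be surfaced with a simple two-way grouping before any graph tool is involved. The sketch below uses hypothetical (topic, phone) pairs with placeholder phone identifiers. How topics and phone numbers were actually extracted from posts is not described in the slides.

```python
from collections import defaultdict

# Hypothetical (topic, phone) pairs; the phone values are placeholders.
records = [
    ("exam answers", "PHONE-A"),
    ("exam answers", "PHONE-B"),
    ("job offer", "PHONE-C"),
    ("spiritual help", "PHONE-C"),
    ("dating", "PHONE-C"),
]

topics_by_phone = defaultdict(set)
phones_by_topic = defaultdict(set)
for topic, phone in records:
    topics_by_phone[phone].add(topic)
    phones_by_topic[topic].add(phone)

# Same topic, multiple phone numbers.
shared_topics = {t: sorted(p) for t, p in phones_by_topic.items() if len(p) > 1}
# Single phone number, multiple topics.
hub_phones = {p: sorted(t) for p, t in topics_by_phone.items() if len(t) > 1}

print(shared_topics)
print(hub_phones)
```

The same pairs can be exported as an edge list for a graph visualization tool, with topics and phone numbers as the two node types.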
SIM card registration: A countermeasure?
● 2011 -- SIM registration policy by Nigerian Communications Commission (NCC)
○ Registration involves recording some personal data and biometric information about subscribers
○ Key objective was to assist law enforcement agencies during criminal investigations
● Did SIM registration help to reduce cybercrime on the forum?
○ Posts containing phone numbers actually increased after SIM registration was introduced
○ Overall number of posts on the forum also increased (i.e., posting activity increased)
● Did SIM registration encourage the growth of criminal activity on the forum?
○ No. The absence of a cybercrime law until 2014 and weak investigation and prosecution capabilities on the part of law enforcement agencies are more likely reasons
○ Telecommunication firms were also not totally compliant with the SIM registration policy
Takeaways
● Despite the massive coverage of 419 scams, some types are still understudied
● We highlight a unique form of scam targeting specific Nigerian demographics
● Law enforcement agencies may find the cluster analysis approach useful
○ To identify and take down key nodes in sophisticated scam schemes
● The SIM card registration policy is not sufficient to tackle online scams involving phone numbers
● Future work could involve studying whether certain demographics are more susceptible to the types of scams we highlighted