Example-Dependent Cost-Sensitive Credit Card Fraud Detection
March 21st, 2014
Alejandro Correa Bahnsen
with Djamila Aouada (SnT) and Björn Ottersten (SnT)
Jun 08, 2015
Introduction
[Figure: Europe fraud evolution, Internet transactions (millions of euros), 2007 to 2012E; vertical axis from €500 to €800]
Introduction
[Figure: US fraud evolution, online revenue lost due to fraud (billions of dollars), 2001 to 2012; vertical axis from $0 to $5.0]
Introduction
• Increasing fraud levels around the world
• Different technologies and legal requirements make fraud harder to control
• Lack of collaboration between academia and practitioners, leading to solutions that fail to incorporate practical issues of credit card fraud detection:
  • Financial comparison measures
  • Huge class imbalance
  • Response time measured in milliseconds
Agenda
• Introduction
• Database
• Evaluation
• Bayes Minimum Risk
• Experiments
• Probability Calibration
• Other applications
• Conclusions & Future Work
Simplified transaction flow
[Figure: simplified transaction flow through the network, with a "Fraud??" check]
Data
• Large European card processing company
• Jan2012 – Jun2013 card present transactions
• 1,638,772 transactions
• 3,444 frauds
• 0.21% fraud rate
• 205,542 EUR lost due to fraud on test dataset
[Figure: monthly timeline from Jan12 to Jun13, divided into Train and Test periods]
Data
• Raw attributes

TRXID  Client ID  Date         Amount  Location  Type      Merchant Group  Fraud
1      1          2/1/12 6:00  580     Ger       Internet  Airlines        No
2      1          2/1/12 6:15  120     Eng       Present   Car Rent        No
3      2          2/1/12 8:20  12      Bel       Present   Hotel           Yes
4      1          3/1/12 4:15  60      Esp       ATM       ATM             No
5      2          3/1/12 9:18  8       Fra       Present   Retail          No
6      1          3/1/12 9:55  1210    Ita       Internet  Airlines        Yes

• Other attributes: age, country of residence, postal code, type of card
Data
• Derived attributes

Trx ID  Client ID  Date         Amount  Location  Type      Merchant Group  Fraud  No. of Trx (same client, last 6 hours)  Sum (same client, last 7 days)
1       1          2/1/12 6:00  580     Ger       Internet  Airlines        No     0                                       0
2       1          2/1/12 6:15  120     Eng       Present   Car Renting     No     1                                       580
3       2          2/1/12 8:20  12      Bel       Present   Hotel           Yes    0                                       0
4       1          3/1/12 4:15  60      Esp       ATM       ATM             No     0                                       700
5       2          3/1/12 9:18  8       Fra       Present   Retail          No     0                                       12
6       1          3/1/12 9:55  1210    Ita       Internet  Airlines        Yes    1                                       760

– Combination of the following criteria:

By           Group              Last      Function
Client       None               hour      Count
Credit Card  Transaction Type   day       Sum(Amount)
             Merchant           week      Avg(Amount)
             Merchant Category  month
             Merchant Country   3 months
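The two derived columns in the example table can be reproduced with a short sketch. It assumes the dates are day/month/year (so 2/1/12 is 2 January 2012) and that only strictly earlier transactions of the same client are counted; the function name `derive` and the tuple layout are illustrative, not taken from the slides.

```python
from datetime import datetime, timedelta

# Raw transactions from the example table: (trx_id, client_id, timestamp, amount).
# Dates are read as day/month/year, which is an assumption.
transactions = [
    (1, 1, datetime(2012, 1, 2, 6, 0), 580),
    (2, 1, datetime(2012, 1, 2, 6, 15), 120),
    (3, 2, datetime(2012, 1, 2, 8, 20), 12),
    (4, 1, datetime(2012, 1, 3, 4, 15), 60),
    (5, 2, datetime(2012, 1, 3, 9, 18), 8),
    (6, 1, datetime(2012, 1, 3, 9, 55), 1210),
]


def derive(transactions, count_window, sum_window):
    """Return (trx_id, count in count_window, amount sum in sum_window) per transaction."""
    rows = []
    for trx_id, client, ts, _amount in transactions:
        # Earlier transactions of the same client only.
        previous = [(t, a) for _, c, t, a in transactions if c == client and t < ts]
        n_recent = sum(1 for t, _ in previous if ts - t <= count_window)
        total = sum(a for t, a in previous if ts - t <= sum_window)
        rows.append((trx_id, n_recent, total))
    return rows


derived = derive(transactions, timedelta(hours=6), timedelta(days=7))
for row in derived:
    print(row)
```

Other combinations from the criteria table (a different grouping key, window, or aggregate) would follow the same pattern with a different filter or reduction.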
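Data
Date of transaction
04/03/2012 - 03:14
07/03/2012 - 00:47
07/03/2012 - 02:57
08/03/2012 - 02:08
14/03/2012 - 22:15
25/03/2012 - 05:03
26/03/2012 - 21:51
28/03/2012 - 03:41

[Figure: transaction times plotted on a 24-hour clock (6h, 12h, 18h, 24h)]

$$\text{Arithmetic Mean} = \frac{1}{n} \sum t$$
$$\text{Periodic Mean} = \tan_2^{-1}\left(\sum \sin(t), \sum \cos(t)\right)$$
$$\text{Periodic Std} = \sqrt{\ln\left(\frac{1}{\left(\frac{1}{n}\sum \sin t\right)^2 + \left(\frac{1}{n}\sum \cos t\right)^2}\right)}$$
$$t \sim \mathrm{vonmises}\left(k \approx \frac{1}{\mathit{std}}\right)$$
$$P(-z_t < t < z_t) = 0.95$$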
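To see why a periodic mean is used instead of the arithmetic mean, the eight transaction times listed on this slide can be run through a minimal sketch. It assumes each time of day is mapped to an angle on a 24-hour circle and that the periodic standard deviation is the square root of the logarithm term; the helper names are illustrative.

```python
import math

# Times of day of the eight transactions listed on the slide.
times = ["03:14", "00:47", "02:57", "02:08", "22:15", "05:03", "21:51", "03:41"]


def to_hours(hhmm):
    """Convert an 'HH:MM' string to fractional hours."""
    h, m = hhmm.split(":")
    return int(h) + int(m) / 60


def periodic_mean_std(hours):
    """Periodic mean and standard deviation, both expressed in hours."""
    angles = [2 * math.pi * h / 24 for h in hours]
    n = len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / n
    mean_cos = sum(math.cos(a) for a in angles) / n
    # atan2 of the mean sine and cosine, wrapped into [0, 2*pi).
    mean_angle = math.atan2(mean_sin, mean_cos) % (2 * math.pi)
    # max(0, ...) guards against tiny negative values from rounding.
    std_angle = math.sqrt(max(0.0, math.log(1 / (mean_sin ** 2 + mean_cos ** 2))))
    to_h = 24 / (2 * math.pi)
    return mean_angle * to_h, std_angle * to_h


hours = [to_hours(t) for t in times]
arithmetic_mean = sum(hours) / len(hours)
periodic_mean, periodic_std = periodic_mean_std(hours)
print(round(arithmetic_mean, 2), round(periodic_mean, 2), round(periodic_std, 2))
```

The arithmetic mean lands in the morning, far from any actual transaction, while the periodic mean stays in the small hours where the transactions cluster.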
Data
Date of transaction: the eight dates listed on the previous slide, followed by
02/04/2012 - 02:02
03/04/2012 - 12:10

[Figure: the transaction times plotted on a 24-hour clock (6h, 12h, 18h, 24h)]

New features:
• Inside CI(0.95) last 30 days
• Inside CI(0.95) last 7 days
• Inside CI(0.5) last 30 days
• Inside CI(0.5) last 7 days
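One way such an "inside CI" flag could be computed is sketched below. It swaps in normal quantiles (z = 1.96 for 0.95, z = 0.674 for 0.5) for the von Mises interval named in the deck, which is a simplification, and it does not model the 30-day and 7-day windows; the function names and the history list are illustrative.

```python
import math

# Past transaction times of one client, in fractional hours (from the slide).
history = [3 + 14 / 60, 47 / 60, 2 + 57 / 60, 2 + 8 / 60,
           22 + 15 / 60, 5 + 3 / 60, 21 + 51 / 60, 3 + 41 / 60]


def to_angle(hour):
    """Map an hour of the day to an angle on the 24-hour circle."""
    return 2 * math.pi * hour / 24


def periodic_stats(hours):
    """Periodic mean angle and periodic standard deviation, in radians."""
    n = len(hours)
    mean_sin = sum(math.sin(to_angle(h)) for h in hours) / n
    mean_cos = sum(math.cos(to_angle(h)) for h in hours) / n
    mean_angle = math.atan2(mean_sin, mean_cos)
    std = math.sqrt(max(0.0, math.log(1 / (mean_sin ** 2 + mean_cos ** 2))))
    return mean_angle, std


def inside_interval(hour, mean_angle, std, z):
    """True if the hour lies within z standard deviations of the periodic mean."""
    diff = to_angle(hour) - mean_angle
    # Wrap the difference so that 23:00 and 01:00 count as two hours apart.
    diff = math.atan2(math.sin(diff), math.cos(diff))
    return abs(diff) <= z * std


mean_angle, std = periodic_stats(history)
new_times = {"02:02": 2 + 2 / 60, "12:10": 12 + 10 / 60}
flags_95 = {k: inside_interval(h, mean_angle, std, 1.96) for k, h in new_times.items()}
flags_50 = {k: inside_interval(h, mean_angle, std, 0.674) for k, h in new_times.items()}
print(flags_95, flags_50)
```

Under these assumptions the 02:02 transaction falls inside both intervals and the 12:10 transaction falls outside both.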
Evaluation
• Confusion matrix

                          True class ($y_i$)
Predicted class ($p_i$)   Fraud ($y_i=1$)   Legitimate ($y_i=0$)
Fraud ($p_i=1$)           TP                FP
Legitimate ($p_i=0$)      FN                TN

• $\text{Misclassification} = 1 - \frac{TP + TN}{TP + TN + FP + FN}$
• $\text{Recall} = \frac{TP}{TP + FN}$
• $\text{Precision} = \frac{TP}{TP + FP}$
• $\text{F-Score} = 2 \, \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
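The four measures can be computed directly from the confusion-matrix counts. The sketch below uses made-up counts purely for illustration; the function name and the zero-division fallbacks are assumptions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Misclassification, recall, precision and F-Score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    misclassification = 1 - (tp + tn) / total
    # Fall back to 0.0 when a denominator is zero (an illustrative convention).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "misclassification": misclassification,
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
    }


# Hypothetical counts, not taken from the experiments in this deck.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=940)
print(metrics)
```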
Evaluation - Financial measure
• Motivation:
  • Equal misclassification results
  • Frauds carry different costs

                                Prediction (Fraud?)
TRXID  Amount  Fraud            Algorithm 1  Algorithm 2  Algorithm 3
1      580     No               No           No           No
2      120     No               No           No           No
3      12      Yes              No           Yes          No
4      60      No               No           No           No
5      8       No               No           Yes          Yes
6      1210    Yes              No           No           Yes
Misclassification               2 / 6        2 / 6        2 / 6
Cost                            1222         1212         14
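The misclassification and cost figures of this example can be checked with a small sketch. It assumes an administrative cost of 1 per flagged transaction, which is the value that reproduces the costs 1222, 1212 and 14; missed frauds cost the transaction amount, as in the cost matrix of the financial measure. The prediction vectors are keyed by what they flag rather than by algorithm number.

```python
ADMIN_COST = 1  # assumption: the value that reproduces the costs on the slide

amounts = [580, 120, 12, 60, 8, 1210]
is_fraud = [0, 0, 1, 0, 0, 1]

# The three prediction vectors shown on the slide (1 = predicted fraud).
predictions = {
    "flags nothing": [0, 0, 0, 0, 0, 0],
    "flags trx 3 and 5": [0, 0, 1, 0, 1, 0],
    "flags trx 5 and 6": [0, 0, 0, 0, 1, 1],
}


def misclassified(y, c):
    """Number of transactions whose prediction differs from the true class."""
    return sum(1 for yi, ci in zip(y, c) if yi != ci)


def total_cost(y, c, amt, ca):
    """Admin cost for every flagged transaction, full amount for every missed fraud."""
    cost = 0
    for yi, ci, a in zip(y, c, amt):
        cost += yi * (ci * ca + (1 - ci) * a) + (1 - yi) * ci * ca
    return cost


errors = {name: misclassified(is_fraud, c) for name, c in predictions.items()}
costs = {name: total_cost(is_fraud, c, amounts, ADMIN_COST) for name, c in predictions.items()}
print(errors)
print(costs)
```

All three vectors make two errors out of six, yet their costs differ by two orders of magnitude, which is the point of the slide.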
Evaluation - Financial measure
• Cost matrix

                          True class ($y_i$)
Predicted class ($c_i$)   Fraud ($y_i=1$)        Legitimate ($y_i=0$)
Fraud ($c_i=1$)           $C_{TP_i} = C_a$       $C_{FP_i} = C_a$
Legitimate ($c_i=0$)      $C_{FN_i} = Amt_i$     $C_{TN_i} = 0$

where:
  $C_a$    Administrative costs
  $Amt_i$  Amount of transaction i

• Evaluation measure

$$Cost = \sum_{i=1}^{m} \Big[ y_i \big( c_i C_a + (1 - c_i) Amt_i \big) + (1 - y_i) c_i C_a \Big]$$
Bayes Minimum Risk
• Decision model based on quantifying tradeoffs between various decisions using probabilities and the costs that accompany such decisions
• Risk of classification
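The risk expressions themselves did not survive on this slide. As a hedged reconstruction, the standard Bayes risk of each decision, written with the deck's fraud cost matrix, would look as follows. The symbols $\hat{p}_i$ (estimated fraud probability of transaction i) and $x_i$ (its feature vector) are assumptions introduced here.

```latex
% Risk of predicting fraud: the administrative cost is paid either way.
R(c_i = 1 \mid x_i) = C_{TP_i}\,\hat{p}_i + C_{FP_i}\,(1 - \hat{p}_i) = C_a

% Risk of predicting legitimate: the amount is lost if the transaction is a fraud.
R(c_i = 0 \mid x_i) = C_{FN_i}\,\hat{p}_i + C_{TN_i}\,(1 - \hat{p}_i) = Amt_i\,\hat{p}_i

% Flagging is the lower-risk decision when
R(c_i = 1 \mid x_i) \le R(c_i = 0 \mid x_i)
\iff \hat{p}_i \ge \frac{C_a}{Amt_i}
```

Under these assumptions the decision rule reduces to comparing the fraud probability with the ratio of administrative cost to transaction amount.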
Bayes Minimum Risk
• If […] then […]
• Example-dependent threshold

$$t_{BMR_i} = \frac{C_a}{Amt_i}$$
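A minimal sketch of how the example-dependent threshold could be applied: a transaction is flagged when its estimated fraud probability reaches the ratio of administrative cost to amount. The function names, the administrative cost of 1 and the sample probabilities are illustrative assumptions, as is the choice never to flag a transaction with a non-positive amount.

```python
def bmr_threshold(admin_cost, amount):
    """Example-dependent threshold: administrative cost divided by amount."""
    return admin_cost / amount


def bmr_predict(fraud_prob, amount, admin_cost):
    """Return 1 (flag as fraud) if the probability reaches the threshold, else 0."""
    if amount <= 0:
        # Assumption: nothing can be lost, so never flag.
        return 0
    return 1 if fraud_prob >= bmr_threshold(admin_cost, amount) else 0


ADMIN_COST = 1.0  # hypothetical value

# (estimated fraud probability, transaction amount); hypothetical examples.
examples = [(0.02, 1210.0), (0.02, 12.0), (0.30, 8.0)]
decisions = [bmr_predict(p, amt, ADMIN_COST) for p, amt in examples]
print(decisions)
```

The first two examples share the same probability but get different decisions, because a large amount lowers the threshold while a small one raises it.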
Experiments
• Estimation of the fraud probabilities using
  • Decision Trees
  • Logistic Regression
  • Random Forest
• Datasets

Database    Transactions  Frauds  Losses (EUR)
Total       1,638,772     0.21%   860,448
Train       815,368       0.21%   416,369
Validation  412,137       0.22%   238,537
Test        411,267       0.21%   205,542
Experiments
[Figure: cost in Euros (0 to 250,000) and F1-Score (0.00 to 0.25) for No Model, RF, DT and LR]
Experiments
[Figure: cost in Euros (0 to 250,000) and F1-Score (0.00 to 0.25) for No Model and for RF, DT and LR, each with and without BMR]
• Cost is reduced when using BMR
• The F1-Score is also reduced
Probability Calibration
• When using the output of a binary classifier as a basis for decision making, there is a need for a probability that not only separates well between positive and negative examples, but that also assesses the real probability of the event
Probability Calibration
• Reliability Diagram
[Figure: reliability diagram, P(pf) against P(pf|x), both from 0 to 1, for base, LR, RF and DT]
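The points of a reliability diagram are usually obtained by binning the predicted probabilities and comparing each bin's average prediction with the observed share of positives. The sketch below is one common way to do this, not necessarily the procedure behind the figure; the function name, the equal-width bins and the sample data are assumptions.

```python
def reliability_curve(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed positive rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            frac_pos = sum(y for y, _ in members) / len(members)
            curve.append((mean_pred, frac_pos))
    return curve


# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.10, 0.15, 0.12, 0.80, 0.85, 0.90]
curve = reliability_curve(y_true, y_prob, n_bins=2)
print(curve)
```

A well-calibrated model produces points close to the diagonal, where the average prediction in a bin matches the observed positive rate.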
Probability Calibration
• ROC Convex Hull calibration
[Figure: ROC Curve]

Class (y)  Prob (p)
0          0.0
1          0.1
0          0.2
0          0.3
1          0.4
0          0.5
1          0.6
1          0.7
0          0.8
1          0.9
1          1.0
Probability Calibration
• ROC Convex Hull calibration
[Figure: ROC Convex Hull Curve]

Class (y)  Prob (p)  Cal Prob
0          0.0       0
1          0.1       0.333
0          0.2       0.333
0          0.3       0.333
1          0.4       0.5
0          0.5       0.5
1          0.6       0.666
1          0.7       0.666
0          0.8       0.666
1          0.9       1
1          1.0       1
The calibrated probabilities are obtained by first grouping the probabilities according to the points on the ROCCH curve, and then setting the calibrated probability of each group T to slope(T).
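The calibrated values in the example can be reproduced with the pool-adjacent-violators (PAV) procedure, which is generally understood to produce the same grouping as the ROC convex hull. The sketch below is a swapped-in technique, not necessarily the authors' implementation; it assumes the labels are already sorted by increasing score and that no scores are tied.

```python
def pav_calibrate(labels_sorted_by_score):
    """Pool adjacent violators: one calibrated probability per input label."""
    # Each block holds [sum of labels, number of labels].
    blocks = []
    for y in labels_sorted_by_score:
        blocks.append([y, 1])
        # Merge while the previous block's mean exceeds the last block's mean
        # (compared by cross-multiplication to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    calibrated = []
    for s, n in blocks:
        calibrated.extend([s / n] * n)
    return calibrated


# Classes from the slide, ordered by probability 0.0, 0.1, ..., 1.0.
classes = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
cal_prob = pav_calibrate(classes)
print([round(p, 3) for p in cal_prob])
```

Each pooled block corresponds to one segment of the hull, and the block mean is the fraction of positives in that segment. The value shown as 0.666 on the slide is 2/3.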
Probability Calibration
• Reliability Diagram
[Figure: two reliability diagrams, P(pf) against P(pf|x), both from 0 to 1; the first shows base, RF, DT and LR, the second shows base, Cal RF, Cal DT and Cal LR]
• Extra 1.5% decrease in cost by using calibrated probabilities
Experiments
[Figure: cost in Euros (0 to 250,000) and F1-Score (0.00 to 0.25) for No Model and for RF, DT and LR, each plain, with BMR and with CAL BMR]
Other Applications
• Direct Marketing: Banking LTD offers
http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
• Credit Scoring: 2011 Kaggle competition Give Me Some Credit
http://www.kaggle.com/c/GiveMeSomeCredit/
• Credit Scoring: 2009 Pacific-Asia Knowledge Discovery and Data Mining conference (PAKDD) competition
http://sede.neurotech.com.br:443/PAKDD2009/
Other Applications
• Direct Marketing: Banking LTD offers

Cost Matrix               True class ($y_i$)
Predicted class ($c_i$)   Accept ($y_i=1$)       Decline ($y_i=0$)
Accept ($c_i=1$)          $C_{TP_i} = C_a$       $C_{FP_i} = C_a$
Decline ($c_i=0$)         $C_{FN_i} = Int_i$     $C_{TN_i} = 0$

where $Int_i$ is the expected interest gain of customer i

• Datasets

Database    Examples  Acceptance  Int
Total       47,562    12.56%      394,211
Train       19,119    12.64%      156,676
Validation  11,809    12.78%      97,498
Test        11,815    12.23%      97,594
Other Applications
• Direct Marketing: Banking LTD offers
[Figure: cost in Euros (0 to 16,000) and F1-Score (0.00 to 0.40) for No Model and for RF, DT and LR, each plain, with BMR and with CAL BMR; data labels shown: 95,5 and 94]
• Extra 13.4% decrease in cost by using calibrated probabilities
Other Applications
• Credit Scoring

Cost Matrix               True class ($y_i$)
Predicted class ($c_i$)   Accept ($y_i=1$)               Decline ($y_i=0$)
Accept ($c_i=1$)          $C_{TP_i} = 0$                 $C_{FP_i} = r_i + C^a_{FP}$
Decline ($c_i=0$)         $C_{FN_i} = Cl_i \cdot lgd$    $C_{TN_i} = 0$

where $lgd$ is the loss given default, $Cl_i$ is the credit line of client i, $r_i$ is the expected profit of client i, and $C^a_{FP}$ is the expected cost of lending the money to an alternative borrower.

• Datasets

Kaggle Credit Dataset
Database    Examples  Default
Total       112,915   6.74%
Train       45,358    6.83%
Validation  33,850    6.67%
Test        33,707    6.71%

PAKDD Credit Dataset
Database    Examples  Default
Total       38,969    19.88%
Train       15,614    19.98%
Validation  11,711    20.02%
Test        11,644    19.63%
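With a cost matrix in which all four entries may differ, the Bayes minimum risk decision can be sketched by comparing the two risks directly instead of using a closed-form threshold. The function names and every numeric value below (credit line, loss given default, profit, alternative-borrower cost, probabilities) are hypothetical.

```python
def bayes_min_risk(prob_pos, c_tp, c_fp, c_fn, c_tn):
    """Return 1 if predicting class 1 has the lower (or equal) expected cost, else 0."""
    risk_pos = c_tp * prob_pos + c_fp * (1.0 - prob_pos)
    risk_neg = c_fn * prob_pos + c_tn * (1.0 - prob_pos)
    return 1 if risk_pos <= risk_neg else 0


def credit_costs(credit_line, lgd, profit, alt_cost):
    """Example-dependent costs following the credit scoring cost matrix."""
    return {
        "c_tp": 0.0,
        "c_fp": profit + alt_cost,   # r_i plus the alternative-borrower cost
        "c_fn": credit_line * lgd,   # Cl_i times the loss given default
        "c_tn": 0.0,
    }


# Hypothetical client: credit line 10,000, lgd 0.75, profit 500, alternative cost 200.
costs = credit_costs(credit_line=10_000.0, lgd=0.75, profit=500.0, alt_cost=200.0)
low_risk_decision = bayes_min_risk(0.05, **costs)
high_risk_decision = bayes_min_risk(0.20, **costs)
print(low_risk_decision, high_risk_decision)
```

For this client the expected false-negative loss overtakes the expected false-positive cost somewhere between the two probabilities, so the decision flips from 0 to 1.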
Other Applications
• Credit Scoring
[Figure, Kaggle Credit Dataset: cost in millions of Euros (0 to 25.00) and F1-Score (0.00 to 0.40) for RF, BMR and CAL BMR]
[Figure, PAKDD Credit Dataset: cost in millions of Euros (0 to 1.00) and F1-Score (0.00 to 0.40) for RF, BMR and CAL BMR]
• Extra 0.9% decrease in cost by using calibrated probabilities
Conclusion
• Selecting models based on traditional statistics does not give the best results in terms of cost
• Models should be evaluated taking into account real financial costs of the application
• Algorithms should be developed to incorporate those real financial costs
• Calibration of probabilities yields a further decrease in cost
Future work
• Example-Dependent Cost-Sensitive Decision Trees
• Example-Dependent Cost-Sensitive Calibration Method
• Applications:
  • Corporate credit risk
  • Involuntary & voluntary churn in the TV subscription market
Contact information
Alejandro Correa Bahnsen
University of Luxembourg
Luxembourg
http://www.linkedin.com/in/albahnsen
http://www.slideshare.net/albahnsen