Introduction Problem formulation Learning strategy Experiments Conclusion
Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information
Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi, and Gianluca Bontempi
15/07/2015
IEEE IJCNN 2015 conference
INTRODUCTION
Fraud Detection is notably a challenging problem because of
- concept drift (i.e. customers' habits evolve)
- class unbalance (i.e. genuine transactions far outnumber frauds)
- uncertain class labels (i.e. some frauds are not reported, or reported with a large delay, and few transactions can be timely investigated)
INTRODUCTION II
Fraud-detection systems (FDSs) differ from standard classification tasks:
- only a small set of supervised samples is provided by human investigators (they can check only a few alerts).
- the labels of the majority of transactions are available only several days later (after customers have reported unauthorized transactions).
PROBLEM FORMULATION
We formalise FD as a classification problem:
- At day t, the classifier K_{t-1} (trained up to day t-1) associates to each feature vector x ∈ R^n a score P_{K_{t-1}}(+|x).
- The k transactions with the largest P_{K_{t-1}}(+|x) define the alerts A_t reported to the investigators.
- Investigators provide feedbacks F_t about the alerts in A_t, defining a set of k supervised couples (x, y):

F_t = {(x, y), x ∈ A_t}    (1)

F_t are the only immediate supervised samples.
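The alert mechanism above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function names, the toy scores, and the labels are our own assumptions:

```python
import numpy as np

def select_alerts(scores, k):
    """Return indices of the k transactions with the largest fraud score P(+|x)."""
    order = np.argsort(scores)[::-1]  # sort descending by score
    return order[:k]

def collect_feedbacks(alert_idx, true_labels):
    """Investigators check the alerted transactions, yielding k supervised couples F_t."""
    return [(int(i), int(true_labels[i])) for i in alert_idx]

# Toy example: 6 transactions scored by a hypothetical classifier K_{t-1}
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.05, 0.6])
labels = np.array([0, 1, 0, 0, 0, 1])  # 1 = fraud (+), 0 = genuine

alerts = select_alerts(scores, k=3)           # the 3 riskiest transactions
feedbacks = collect_feedbacks(alerts, labels) # F_t: the only immediate supervision
```

Note how F_t is not an i.i.d. sample of the day's transactions: it contains only the k transactions the classifier itself ranked as riskiest, a point the learning strategy below builds on.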
PROBLEM FORMULATION II
- At day t, delayed supervised couples D_{t-δ} are transactions that have not been checked by investigators, but whose labels are assumed to be correct after δ days have elapsed.

Figure: The supervised samples available at day t include: i) feedbacks of the first δ days and ii) delayed couples occurred before the δth day.
- F_t is a small set of risky transactions according to the FDS.
- D_{t-δ} contains all the transactions occurred in a day (≈ 99% genuine transactions).
[Figure: timeline over three consecutive days showing, for δ = 7, the feedback sets F_t, F_{t-1}, ..., F_{t-6} and the delayed sets D_{t-7}, D_{t-8}, D_{t-9}; the legend distinguishes fraudulent and genuine transactions and fraudulent and genuine feedbacks.]
Figure: Every day we have a new set of feedbacks (F_t, F_{t-1}, ..., F_{t-(δ-1)}) from the first δ days and a new set of delayed transactions occurred on the δth day (D_{t-δ}). In this figure we assume δ = 7.
ACCURACY MEASURE FOR A FDS
The goal of a FDS is to return accurate alerts, thus the highest precision in A_t. This precision can be measured by the quantity

p_k(t) = #{(x, y) ∈ F_t s.t. y = +} / k    (2)

where p_k(t) is the proportion of frauds among the top k transactions with the highest likelihood of fraud [1].
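Equation (2) is simply the fraction of checked alerts that turn out to be frauds. A small sketch of the measure (the function name and the toy feedbacks are ours, not from the paper):

```python
def precision_at_k(feedbacks, k):
    """p_k(t), Eq. (2): proportion of frauds among the k feedbacks F_t.
    `feedbacks` is a list of (x, y) couples, with y == '+' marking a fraud."""
    assert len(feedbacks) == k
    n_frauds = sum(1 for _, y in feedbacks if y == '+')
    return n_frauds / k

# Toy feedbacks for a budget of k = 4 alerts: two confirmed frauds
F_t = [('x1', '+'), ('x2', '-'), ('x3', '+'), ('x4', '-')]
p = precision_at_k(F_t, k=4)  # 0.5
```

Since investigators check exactly the k alerted transactions, p_k(t) can be computed from F_t alone, without waiting for the delayed labels.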
LEARNING STRATEGY
Learning from feedbacks F_t is a different problem than learning from the delayed samples in D_{t-δ}:
- F_t provides recent, up-to-date information, while D_{t-δ} might already be obsolete once it arrives.
- The percentage of frauds in F_t and D_{t-δ} is different.
- Supervised couples in F_t are not independently drawn, but are instead selected by K_{t-1}.
- A classifier trained on F_t learns how to label the transactions that are most likely to be fraudulent.

Feedbacks and delayed transactions have to be treated separately.
CONCEPT DRIFT ADAPTATION
Two conventional solutions for CD adaptation are the sliding window W_t and the ensemble E_t [6, 5]. To learn separately from feedbacks and delayed transactions, we propose to aggregate the feedback classifier F_t with W^D_t and E^D_t (a sliding window and an ensemble trained on delayed samples only).
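One simple way to realise the aggregation of a feedback classifier and a delayed-sample classifier is to average their posterior fraud probabilities before ranking alerts. The sketch below is our illustrative reading of the slides: the random forests echo the base learners cited in [2, 3], but the synthetic data, the unweighted average, and all names are our assumptions, not necessarily the exact scheme of the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical data: feedbacks are few and fraud-rich, delayed samples are
# many and ~99% genuine (class 1 = fraud, class 0 = genuine).
X_fb, y_fb = rng.normal(size=(100, 5)), rng.integers(0, 2, 100)
X_del = rng.normal(size=(2000, 5))
y_del = (rng.random(2000) < 0.01).astype(int)

clf_fb = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_fb, y_fb)
clf_del = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_del, y_del)

def aggregated_score(X):
    """Average the posterior P(+|x) of the feedback and delayed-sample classifiers."""
    col_fb = list(clf_fb.classes_).index(1)
    col_del = list(clf_del.classes_).index(1)
    p_fb = clf_fb.predict_proba(X)[:, col_fb]
    p_del = clf_del.predict_proba(X)[:, col_del]
    return (p_fb + p_del) / 2

scores = aggregated_score(rng.normal(size=(10, 5)))  # rank these to build A_t
```

Keeping the two models separate, rather than pooling F_t and D_{t-δ} into one training set, prevents the abundant delayed samples from drowning out the recent feedbacks.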
In the 2013 dataset there is an average of 160k transactions and about 304 frauds per day, while in the 2014 dataset there is a daily average of 173k transactions and 380 frauds.
EXPERIMENTS
Settings:
- We assume that after δ = 7 days all the transaction labels are provided (delayed supervised information).
- A budget of k = 100 alerts can be checked by the investigators (F_t is trained on a window of 700 feedbacks).
- A window of α = 16 days is used to train W^D_t (16 models in E^D_t).

Each experiment is repeated 10 times and the performance is assessed using p_k.
In both the 2013 and 2014 datasets, the aggregations A^W_t and A^E_t outperform the other FDSs in terms of p_k.
Table: Average p_k in all the batches for the sliding window.
Figure: Average p_k per day (the higher the better) for classifiers on datasets with artificial concept drift, smoothed using a moving average of 15 days. The vertical bar denotes the date of the concept drift. Panels: (e) sliding window strategies on dataset CD1; (f) sliding window strategies on dataset CD2; (g) sliding window strategies on dataset CD3; (h) ensemble strategies on dataset CD3.
CONCLUDING REMARKS
We notice that:
- F_t outperforms classifiers on delayed samples (trained on obsolete couples).
- F_t outperforms classifiers trained on the entire supervised dataset (dominated by delayed samples).
- Aggregation gives larger influence to feedbacks.
CONCLUSION
- We formalise a real-world FDS framework that meets realistic working conditions.
- In a real-world scenario, there is a strong alert-feedback interaction that has to be explicitly considered.
- Feedbacks and delayed samples should be handled separately when training a FDS.
- Aggregating two distinct classifiers is an effective strategy that enables prompter adaptation in concept-drifting environments.
FUTURE WORK
Future work will focus on:
- Adaptive aggregation of F_t and the classifier trained on delayed samples.
- Studying the sample selection bias in F_t introduced by the alert-feedback interaction.
BIBLIOGRAPHY
[1] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland. Data mining for credit card fraud: A comparative study. Decision Support Systems, 50(3):602–613, 2011.
[2] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[3] C. Chen, A. Liaw, and L. Breiman. Using random forest to learn imbalanced data. University of California, Berkeley, 2004.
[4] M. Friedman. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200):675–701, 1937.
[5] J. Gao, B. Ding, W. Fan, J. Han, and P. S. Yu. Classifying data streams with skewed class distributions and concept drifts. Internet Computing, 12(6):37–49, 2008.
[6] D. K. Tasoulis, N. M. Adams, and D. J. Hand. Unsupervised clustering in streaming data. In ICDM Workshops, pages 638–642, 2006.