
Gianluca Bontempi
[intermediate/advanced] Big Data Analytics in Fraud Detection and Churn Prevention: from Prediction to Causal Inference
Summary
Designing machine learning algorithms for real big data raises a number of research challenges that deserve specific attention, and not only from an applied perspective. This lecture will use two real business analytics cases to illustrate recent research contributions of my group.
Credit card fraud detection: the design of efficient fraud detection algorithms is key to reducing the billions of dollars lost each year to fraudulent credit card transactions. Fraud detection systems increasingly rely on advanced machine learning techniques to assist fraud investigators. Designing such algorithms is, however, particularly challenging because of the non-stationary distribution of the data, the highly imbalanced class distributions and the continuous streams of transactions. At the same time, public data are scarce for confidentiality reasons, which leaves many questions unanswered about the best strategy for dealing with these issues. In this talk we will discuss a number of lessons learned during our long-standing collaboration with the R&D team of Worldline. In particular, we will focus on best practices for the assessment of credit card fraud detection models and we will discuss the impact of class imbalance and non-stationarity on the resulting accuracy. More recent research directions, including big data infrastructure, active learning and transfer learning, will be sketched as well.
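As a small illustration of why assessment needs care when classes are highly imbalanced, the sketch below scores a trivial detector that labels every transaction as genuine. The fraud rate and data set size are hypothetical numbers chosen for the example, not figures from the Worldline collaboration.

```python
def metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, with 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Precision and recall are set to 0.0 when their denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, precision, recall


# Hypothetical data set: 100,000 transactions, 200 of them fraudulent (0.2%).
N_TRANSACTIONS = 100_000
N_FRAUDS = 200
y_true = [1] * N_FRAUDS + [0] * (N_TRANSACTIONS - N_FRAUDS)

# Trivial detector: never raises an alert.
y_pred_always_genuine = [0] * N_TRANSACTIONS

if __name__ == "__main__":
    acc, prec, rec = metrics(y_true, y_pred_always_genuine)
    print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

The trivial detector reaches a very high accuracy while catching no fraud at all, which is why measures focused on the fraud class are usually preferred over plain accuracy in this setting.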
Churn detection: this is an important issue for telecommunication companies operating in a highly competitive market, where attracting new customers is much more expensive than retaining existing ones. Retention campaigns can be used to prevent customer churn, but their effectiveness depends on the availability of accurate prediction models. Churn prediction shares a number of issues with fraud detection, notably the large amount of data, non-linearity, class imbalance and low separability between churners and non-churners. However, the design of retention campaigns raises research issues that go beyond prediction and concern causal inference, notably uplift and counterfactuals. The uplift measures the causal effect of some action, or treatment, on the outcome of an individual. Counterfactual reasoning is crucial in retention campaign design, since customers can be stratified according to four counterfactual behaviours: (i) Sure thing: the customer does not churn regardless of the action. (ii) Persuadable: the customer churns only if not contacted. (iii) Do-not-disturb: the customer churns only if contacted. (iv) Lost cause: the customer churns regardless of the action. The last part of the lecture will present recently published results on bounds on the probability of counterfactuals and their assessment on a large real-world customer data set provided by Orange Belgium.
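The four counterfactual behaviours and the notion of uplift can be made concrete with a small simulation. The sketch below assumes a randomized campaign (each customer is contacted with probability 0.5), hypothetical strata proportions, and a sign convention in which uplift is the reduction in churn probability caused by the contact. The bounds computed at the end are the classical Fréchet bounds on the share of persuadable customers; they are shown only as an illustration and are not necessarily the bounds derived in the results presented in the lecture.

```python
import random

# Potential outcomes per stratum: (churn if not contacted, churn if contacted).
STRATA = {
    "sure_thing": (0, 0),
    "persuadable": (1, 0),
    "do_not_disturb": (0, 1),
    "lost_cause": (1, 1),
}

# Hypothetical strata proportions, chosen only for illustration.
WEIGHTS = {
    "sure_thing": 0.70,
    "persuadable": 0.10,
    "do_not_disturb": 0.05,
    "lost_cause": 0.15,
}


def simulate(n, seed=0):
    """Simulate a randomized campaign; return a list of (contacted, churned)."""
    rng = random.Random(seed)
    names = list(STRATA)
    weights = [WEIGHTS[name] for name in names]
    rows = []
    for _ in range(n):
        stratum = rng.choices(names, weights=weights)[0]
        contacted = rng.random() < 0.5
        churn_if_not_contacted, churn_if_contacted = STRATA[stratum]
        # Only one of the two potential outcomes is ever observed.
        churned = churn_if_contacted if contacted else churn_if_not_contacted
        rows.append((contacted, churned))
    return rows


def churn_rates(rows):
    """Return (churn rate among non-contacted, churn rate among contacted)."""
    contacted = [y for c, y in rows if c]
    control = [y for c, y in rows if not c]
    return sum(control) / len(control), sum(contacted) / len(contacted)


def uplift(p0, p1):
    """Average reduction in churn probability caused by the contact."""
    return p0 - p1


def persuadable_bounds(p0, p1):
    """Frechet bounds on P(churn if not contacted AND no churn if contacted)."""
    return max(0.0, p0 - p1), min(p0, 1.0 - p1)


if __name__ == "__main__":
    p0, p1 = churn_rates(simulate(200_000))
    low, high = persuadable_bounds(p0, p1)
    print(f"uplift={uplift(p0, p1):.3f} persuadable in [{low:.3f}, {high:.3f}]")
```

In this simulation the true share of persuadable customers is known by construction, yet the observed data only determine an interval for it: the stratum of an individual customer is never observed directly, because each customer is either contacted or not.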
Syllabus
Slot 1:
- Introduction to fraud detection systems
- Introduction to churn detection
- Formalisation of the detection tasks in terms of machine learning: unsupervised vs supervised classification
- From prediction to causal inference in big data
Slot 2: Research challenges in fraud detection:
- Class imbalance
- Nonstationarity
- Transfer learning
- Scalable computing
- Reproducibility of results
Slot 3: Research challenges in churn detection:
- Uplift modeling
- Counterfactuals
- Theoretical results
References
Dal Pozzolo, Andrea; Caelen, Olivier; Johnson, Reid A.; Bontempi, Gianluca. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.
Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Aël; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41, 10, 4915-4928, 2014.
Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29, 8, 3784-3797, IEEE, 2018.
Dal Pozzolo, Andrea. Adaptive machine learning for credit card fraud detection. ULB MLG PhD thesis (supervised by G. Bontempi).
Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark. Information Fusion, 41, 182-194, 2018.
Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. International Journal of Data Science and Analytics, 5, 4, 285-300, 2018.
Lebichot, Bertrand; Le Borgne, Yann-Aël; He, Liyun; Oblé, Frederic; Bontempi, Gianluca. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection. INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, 78-88, 2019.
Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Oblé, Frederic; Bontempi, Gianluca. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection. Information Sciences, 2019.
Le Borgne, Yann-Aël; Bontempi, Gianluca. Reproducible machine learning for credit card fraud detection – practical handbook, https://fraud-detection-handbook.github.io/.
Verhelst, Théo; Caelen, Olivier; Dewitte, Jean-Christophe; Lebichot, Bertrand; Bontempi, Gianluca. Understanding telecom customer churn with machine learning: from prediction to causal inference. Artificial Intelligence and Machine Learning: 31st Benelux AI Conference, BNAIC 2019, and 28th Belgian-Dutch Machine Learning Conference, BENELEARN 2019, Brussels, Belgium, Springer International Publishing.
Verhelst, Théo; Shrestha, Jeevan; Mercier, Denis; Dewitte, Jean-Christophe; Bontempi, Gianluca. Predicting Reach To Find Persuadable Customers: Improving Uplift Models for Churn Prevention. Discovery Science: 24th International Conference, DS 2021, Halifax, NS, Canada, October 11–13, 2021, 44-54, 2021, Springer International Publishing.
Verhelst, Théo; Mercier, Denis; Shrestha, Jeevan; Bontempi, Gianluca. Partial counterfactual identification and uplift modeling: theoretical results and real-world assessment. arXiv preprint arXiv:2211.07264, 2022. To appear in Machine Learning Journal.
Bontempi, Gianluca. Statistical foundations of machine learning: the book. https://leanpub.com/statisticalfoundationsofmachinelearning
Pre-requisites
Basic knowledge of machine learning and classification.
Short bio
Gianluca Bontempi is Full Professor in the Computer Science Department at the Université Libre de Bruxelles (ULB), Brussels, Belgium, and co-head of the ULB Machine Learning Group (mlg.ulb.ac.be). He was Director of (IB)2, the ULB/VUB Interuniversity Institute of Bioinformatics in Brussels (ibsquare.be), from 2013 to 2017. His main research interests are big data mining, machine learning, bioinformatics, causal inference, predictive modeling and their application to complex tasks in engineering (time series forecasting, fraud detection) and life science (network inference, gene signature extraction). He was a Marie Curie research fellow, received awards in two international data analysis competitions, and has taken part in many research projects in collaboration with universities and private companies all over Europe. He is the author of more than 250 scientific publications and his h-index is 64. He is an associate editor of the International Journal of Forecasting and an IEEE Senior Member. He was the Belgian (French Community) national contact point of the CLAIRE network and co-leader of the CLAIRE COVID19 Task Force. He is also co-author of several open-source software packages for bioinformatics, data mining and prediction.