BigDat 2023 Summer
7th International School on Big Data
Las Palmas de Gran Canaria, Spain · July 17 - 21, 2023

Gianluca Bontempi

Université Libre de Bruxelles

[intermediate/advanced] Big Data Analytics in Fraud Detection and Churn Prevention: from Prediction to Causal Inference

Summary

Designing machine learning algorithms for real big data raises a number of research challenges that deserve specific attention, and not only from an applied perspective. This lecture will focus on two real business analytics cases to illustrate a number of recent research contributions of my group.

Credit-card fraud detection: the design of efficient fraud detection algorithms is key to reducing the billions of dollars lost every year to fraudulent credit card transactions. A growing number of these algorithms rely on advanced machine learning techniques to assist fraud investigators. Designing them is, however, particularly challenging because of the non-stationary distribution of the data, the highly imbalanced class distribution and the continuous stream of transactions. At the same time, public data are scarce for confidentiality reasons, which leaves many questions about the best strategy for handling these issues unanswered. In this talk we will discuss a number of lessons learned during our long-standing collaboration with the R&D team of Worldline. In particular, we will focus on best practices for the assessment of credit card fraud detection models and discuss the impact of class imbalance and non-stationarity on the resulting accuracy. More recent research directions, including big data infrastructure, active learning and transfer learning, will be sketched as well.
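
To illustrate why model assessment needs care when classes are highly imbalanced, here is a minimal Python sketch on synthetic data. The 0.5% fraud rate, the score distributions and the function names are assumptions made for illustration, not figures or code from the lecture. It compares plain accuracy with the precision among the k highest-scored transactions, one commonly used alternative when investigators can only check a limited number of alerts.

```python
import random


def accuracy(labels, predictions):
    """Fraction of transactions whose predicted class matches the true class."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


def precision_at_k(labels, scores, k):
    """Fraction of true frauds among the k transactions with the highest score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Synthetic data (assumption): about 0.5% of 10,000 transactions are frauds.
rng = random.Random(0)
labels = [1 if rng.random() < 0.005 else 0 for _ in range(10_000)]
# Hypothetical classifier scores: frauds tend to score higher than genuine ones.
scores = [rng.gauss(0.7 if y == 1 else 0.3, 0.15) for y in labels]

# A useless model that never raises an alert still looks very accurate.
never_alert = [0] * len(labels)
trivial_accuracy = accuracy(labels, never_alert)

# Precision among the 100 highest-scored transactions is closer to what
# an investigation team would actually experience.
top100_precision = precision_at_k(labels, scores, 100)

print(f"accuracy of never-alert model: {trivial_accuracy:.4f}")
print(f"precision@100 of scored model: {top100_precision:.2f}")
```

The never-alert model detects no fraud at all, yet its accuracy is close to 1 simply because genuine transactions dominate; a ranking-based metric does not reward that behaviour.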

Churn detection: this is an important issue for telecommunication companies operating in a highly competitive market, where attracting new customers is much more expensive than retaining existing ones. Retention campaigns can be used to prevent customer churn, but their effectiveness depends on the availability of accurate prediction models. Churn prediction shares a number of issues with fraud detection, notably the large amount of data, non-linearity, class imbalance and low separability between churners and non-churners. However, the design of retention campaigns raises research issues that go beyond prediction and concern causal inference, notably uplift and counterfactuals. The uplift measures the causal effect of some action, or treatment, on the outcome of an individual. Counterfactual reasoning is crucial in retention campaign design, since customers can be stratified according to four counterfactual behaviours:

  • Sure thing: the customer does not churn regardless of the action.
  • Persuadable: the customer churns only if not contacted.
  • Do-not-disturb: the customer churns only if contacted.
  • Lost cause: the customer churns regardless of the action.

The last part of the lecture will present recently published results on bounds on the probability of counterfactuals and their assessment on a large real-world customer data set provided by Orange Belgium.
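
The four counterfactual behaviours and the notion of uplift can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the sign convention (uplift as the reduction in churn rate due to contact) and the campaign figures are assumptions, and the uplift estimate is only meaningful if customers were assigned to the contacted group at random. Note that the type of an individual customer is never directly observable, because only one of the two potential outcomes is seen for each customer.

```python
def counterfactual_type(churns_if_not_contacted, churns_if_contacted):
    """Map a pair of potential outcomes to one of the four customer types."""
    types = {
        (False, False): "sure thing",
        (True, False): "persuadable",
        (False, True): "do-not-disturb",
        (True, True): "lost cause",
    }
    return types[(bool(churns_if_not_contacted), bool(churns_if_contacted))]


def estimate_uplift(records):
    """Estimate the average uplift from (contacted, churned) pairs.

    Assumes a randomized campaign. Returns the churn rate of non-contacted
    customers minus the churn rate of contacted customers, so a positive
    value means the contact reduces churn on average.
    """
    treated = [churned for contacted, churned in records if contacted]
    control = [churned for contacted, churned in records if not contacted]
    if not treated or not control:
        raise ValueError("need both contacted and non-contacted customers")
    return sum(control) / len(control) - sum(treated) / len(treated)


# Hypothetical campaign: 100 contacted customers (10 churn)
# and 100 non-contacted customers (20 churn).
records = (
    [(True, 1)] * 10 + [(True, 0)] * 90
    + [(False, 1)] * 20 + [(False, 0)] * 80
)
uplift = estimate_uplift(records)
print(f"estimated average uplift: {uplift:.2f}")
print(counterfactual_type(churns_if_not_contacted=True, churns_if_contacted=False))
```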

Syllabus

Slot 1:

  • Introduction to fraud detection systems
  • Introduction to churn detection
  • Formalisation of the detection tasks in terms of machine learning: unsupervised vs supervised classification
  • From prediction to causal inference in big data

Slot 2: Research challenges in fraud detection:

  • The unbalancedness issue
  • Nonstationarity
  • Transfer learning
  • Scalable computing
  • Reproducibility of results
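
As one concrete example of the unbalancedness issue, undersampling the majority class before training shifts the class prior and therefore biases the predicted probabilities. The sketch below applies the standard correction that follows from Bayes' rule, assuming all frauds are kept and each genuine transaction is kept at random with probability `beta`. The function name and the numbers are hypothetical, and this is not necessarily the exact procedure covered in the lecture.

```python
def correct_undersampled_probability(p_s, beta):
    """Map a probability learned on undersampled data back to the original data.

    p_s  : fraud probability predicted by a model trained after undersampling
    beta : probability with which each genuine transaction was kept
           (all frauds are assumed to be kept)

    Undersampling multiplies the odds of fraud by 1 / beta, so the original
    odds are beta * p_s / (1 - p_s), which gives the expression below.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    if not 0.0 <= p_s <= 1.0:
        raise ValueError("p_s must be in [0, 1]")
    return beta * p_s / (beta * p_s - p_s + 1.0)


# Hypothetical example: only 1% of genuine transactions were kept for training.
# A score of 0.5 on the balanced sample corresponds to a much lower
# probability of fraud on the original stream.
corrected = correct_undersampled_probability(0.5, beta=0.01)
print(f"corrected probability: {corrected:.4f}")
```

With `beta = 1` (no undersampling) the probability is returned unchanged, which is a quick sanity check on the formula.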

Slot 3: Research challenges in churn detection:

  • Uplift modeling
  • Counterfactuals
  • Theoretical results
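
To give a flavour of what a bound on a counterfactual probability looks like, the sketch below computes the classical Fréchet bounds on the proportion of persuadable customers, that is, customers who churn if not contacted and do not churn if contacted. It assumes the two churn rates come from a randomized campaign; the function name and the figures are hypothetical. These are the generic bounds implied by the two marginal churn rates alone; any tighter results presented in the lecture rely on additional information or assumptions that are not reproduced here.

```python
def persuadable_bounds(p_churn_control, p_churn_treated):
    """Fréchet bounds on P(churn if not contacted AND no churn if contacted).

    p_churn_control : churn probability without contact
    p_churn_treated : churn probability with contact
    Both are assumed to be estimated from a randomized campaign.
    """
    for p in (p_churn_control, p_churn_treated):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in [0, 1]")
    # The joint probability of two events is at least the sum of their
    # probabilities minus one, and at most the smaller of the two.
    lower = max(0.0, p_churn_control - p_churn_treated)
    upper = min(p_churn_control, 1.0 - p_churn_treated)
    return lower, upper


# Hypothetical rates: 20% churn without contact, 10% churn with contact.
lower, upper = persuadable_bounds(0.20, 0.10)
print(f"share of persuadable customers is between {lower:.2f} and {upper:.2f}")
```

The lower bound coincides with the average uplift when the latter is positive, which shows that the uplift alone leaves the share of persuadable customers only partially identified.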

References

Dal Pozzolo, Andrea; Caelen, Olivier; Johnson, Reid A.; Bontempi, Gianluca. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.

Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Aël; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41, 10, 4915-4928, 2014.

Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29, 8, 3784-3797, IEEE, 2018.

Dal Pozzolo, Andrea. Adaptive machine learning for credit card fraud detection. ULB MLG PhD thesis (supervised by G. Bontempi).

Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark. Information Fusion, 41, 182-194, 2018.

Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. International Journal of Data Science and Analytics, 5, 4, 285-300, 2018.

Lebichot, Bertrand; Le Borgne, Yann-Aël; He, Liyun; Oblé, Frederic; Bontempi, Gianluca. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection. INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, 78-88, 2019.

Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Oblé, Frederic; Bontempi, Gianluca. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection. Information Sciences, 2019.

Le Borgne, Yann-Aël; Bontempi, Gianluca. Reproducible machine learning for credit card fraud detection – practical handbook. https://fraud-detection-handbook.github.io/

Verhelst, Théo; Caelen, Olivier; Dewitte, Jean-Christophe; Lebichot, Bertrand; Bontempi, Gianluca. Understanding telecom customer churn with machine learning: from prediction to causal inference. Artificial Intelligence and Machine Learning: 31st Benelux AI Conference, BNAIC 2019, and 28th Belgian-Dutch Machine Learning Conference, BENELEARN 2019, Brussels, Belgium, Springer International Publishing.

Verhelst, Théo; Shrestha, Jeevan; Mercier, Denis; Dewitte, Jean-Christophe; Bontempi, Gianluca. Predicting Reach To Find Persuadable Customers: Improving Uplift Models for Churn Prevention. Discovery Science: 24th International Conference, DS 2021, Halifax, NS, Canada, October 11–13, 2021, 44-54, 2021, Springer International Publishing.

Verhelst, Théo; Mercier, Denis; Shrestha, Jeevan; Bontempi, Gianluca. Partial counterfactual identification and uplift modeling: theoretical results and real-world assessment. arXiv preprint arXiv:2211.07264, 2022. To appear in Machine Learning Journal.

Gianluca Bontempi. Statistical foundations of machine learning: the book. https://leanpub.com/statisticalfoundationsofmachinelearning

Pre-requisites

Basic knowledge of machine learning and classification.

Short bio

Gianluca Bontempi is Full Professor in the Computer Science Department at the Université Libre de Bruxelles (ULB), Brussels, Belgium, and co-head of the ULB Machine Learning Group (mlg.ulb.ac.be). He was Director of (IB)2, the ULB/VUB Interuniversity Institute of Bioinformatics in Brussels (ibsquare.be), from 2013 to 2017. His main research interests are big data mining, machine learning, bioinformatics, causal inference, predictive modeling and their application to complex tasks in engineering (time series forecasting, fraud detection) and life science (network inference, gene signature extraction). He was a Marie Curie fellow, received awards in two international data analysis competitions and has taken part in many research projects in collaboration with universities and private companies all over Europe. He is the author of more than 250 scientific publications and has an h-index of 64. He is an associate editor of the International Journal of Forecasting and an IEEE Senior Member. He was the Belgian (French Community) national contact point of the CLAIRE network and co-leader of the CLAIRE COVID19 Task Force. He is also co-author of several open-source software packages for bioinformatics, data mining and prediction.

Other Courses

Sander Klous
Paolo Addesso
Marcelo Bertalmío
Altan Çakır
Ian Fisk
Ravi Kumar
Wladek Minor
José M.F. Moura
Panos Pardalos
Ramesh Sharda
Steven Skiena
Mayte Suarez-Farinas
Ana Trisovic
Sebastián Ventura

BigDat 2023 Winter

CO-ORGANIZERS

Universidad de Las Palmas de Gran Canaria

Universitat Rovira i Virgili, Tarragona

Institute for Research Development, Training and Advice – IRDTA, Brussels/London

© IRDTA 2023. All Rights Reserved.