Fraud Detection
Combatting fraud with advanced detection technology
Credit card fraud, and in particular card-not-present fraud, is an ever-growing problem in today’s financial market. There has been a rapid increase in the rate of fraudulent activities in recent years, causing substantial financial loss to many organizations, companies, and government agencies.
This type of fraud usually occurs online or over the phone and can be very hard to detect. Recently, researchers and financial agencies have begun applying machine learning to help detect and prevent fraud.
What is Fraud Detection?
Fraud detection is a common special case of anomaly detection in which we identify certain banking transactions as fraudulent. Depending on how certain we are that a transaction is fraudulent, we can either issue an alert for further analysis or block the transaction immediately.
To help make a decision, fraud analysts look at features such as where the transaction took place compared with previous transactions, and the amounts involved in one or more subsequent transactions.
As you might have guessed by now, these and other more advanced features can also serve as historical data to train a machine learning model.
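The alert-or-block decision can be sketched as a simple mapping from a fraud score to an action. This is only an illustration: the `Action` enum, the `route_transaction` function, and both threshold values are hypothetical and would need to be tuned on real validation data.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    ALERT = "alert"
    BLOCK = "block"


# Hypothetical thresholds, chosen only for illustration.
ALERT_THRESHOLD = 0.50
BLOCK_THRESHOLD = 0.95


def route_transaction(fraud_score: float,
                      alert_threshold: float = ALERT_THRESHOLD,
                      block_threshold: float = BLOCK_THRESHOLD) -> Action:
    """Map a fraud score in [0, 1] to allow, alert, or block."""
    if not 0.0 <= fraud_score <= 1.0:
        raise ValueError("fraud_score must be between 0 and 1")
    if fraud_score >= block_threshold:
        return Action.BLOCK   # high certainty: stop the transaction
    if fraud_score >= alert_threshold:
        return Action.ALERT   # moderate certainty: send to an analyst
    return Action.ALLOW       # low score: let the transaction through
```

Keeping the two thresholds separate lets the blocking rule stay strict while the alerting rule casts a wider net.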
Fraud detection with Machine Learning
Supervised learning is one of the easiest and fastest ways to detect fraud using machine learning. With supervised learning we can train a classification algorithm to look at a transaction and classify it as either fraudulent or legitimate.
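As a minimal sketch of what training a classifier means, the snippet below fits a logistic regression with plain gradient descent. Logistic regression is used here only because it is compact; the text does not prescribe a specific algorithm. The two features (amount relative to the customer's average, and a foreign-transaction flag) and the toy data are invented for illustration.

```python
import math

# Hypothetical toy data: [amount / customer's average amount, is_foreign]
TOY_FEATURES = [
    [0.8, 0.0], [1.0, 0.0], [1.2, 0.0], [0.9, 0.0], [1.1, 1.0], [0.7, 0.0],
    [5.0, 1.0], [6.5, 1.0], [4.0, 0.0], [7.0, 1.0],
]
TOY_LABELS = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # 1 = fraudulent, 0 = legitimate


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Estimated probability that a transaction is fraudulent."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def train_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


weights, bias = train_logistic_regression(TOY_FEATURES, TOY_LABELS)
score = predict_proba(weights, bias, [6.0, 1.0])  # a new, unseen transaction
```

In practice a tested library implementation and far more data would be used; the point is only that labeled examples go in and a scoring function comes out.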
These are the steps we take to develop a fraud detection model with machine learning:
Large dataset: We start with a large dataset of labeled transactions. By labeled we mean that each one has previously been classified as either legitimate or fraudulent. Fraud represents a small minority of all transactions, which results in an imbalanced dataset. Because of this, it’s important to have as much data as possible, so that fraud is adequately represented and the classification models don’t overfit the data.
Data Cleaning: more often than not, the data we receive is not ready to be consumed by a model. During this step we identify problems with the data, and potentially with the data collection process itself. We note these problems and escalate them. This allows the organization to improve its data collection process and, consequently, its data quality, which ultimately results in better models and predictions.
Feature Selection: during this process we identify which features can or should be used to detect fraud. It’s common to find useless features, or features that represent the same information in different ways. These features are removed as they add complexity to the whole system and can negatively affect the performance of the model.
Feature Engineering: some of the most relevant features for fraud detection are not part of the transaction itself, but can be derived from past transactions. Examples of such features are “Average transaction size for this person” or “Distance between this last withdrawal and the previous one, divided by the time between them”. These features are often suggested by people who know the domain well.
Feature Preprocessing: depending on the models, different features might have to be preprocessed differently so the model performance is maximized. In some cases, if this is not done, the model might not work at all.
Model Selection: during this stage different models are tested. Often, the choice of model depends not only on classification performance but also on the ability of the transaction software to integrate with external software. For example, IBM doesn’t allow for the use of neural networks in their SaferPayments software. However, they allow for decision trees and forests.
Model deployment: The model is deployed, either as a micro-service or as part of a larger software package. As fraud detection often needs to happen close to real-time, computational performance is a priority.
Continuous evaluation: this process starts during model selection and continues as the model goes into production. Because fraud detection is an adversarial problem, it’s very important to continuously monitor the model’s performance.
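The two engineered features quoted in the Feature Engineering step could be computed roughly as follows. This is a sketch under assumptions: the `Transaction` fields, the function names, and the use of great-circle (haversine) distance are illustrative choices, not a description of any particular production system.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Transaction:
    # Hypothetical schema for illustration.
    customer_id: str
    amount: float
    timestamp: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def average_amount(history):
    """Average transaction size over a customer's past transactions."""
    if not history:
        return 0.0
    return sum(t.amount for t in history) / len(history)


def travel_speed_kmh(previous, current):
    """Distance between two transactions divided by the time between them.

    Returns None when the time gap is zero or negative, because the
    speed is undefined in that case.
    """
    hours = (current.timestamp - previous.timestamp).total_seconds() / 3600
    if hours <= 0:
        return None
    distance = haversine_km(previous.lat, previous.lon,
                            current.lat, current.lon)
    return distance / hours
```

An implausibly high speed, such as two withdrawals hundreds of kilometres apart within minutes, is the kind of signal this feature is meant to expose.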
In cases where the transactions are not labeled, however, we can approach the problem with an unsupervised learning technique. Fraud is treated as an anomaly, and typically the same models that can be used to identify anomalies can be used to identify fraudulent transactions.
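To make the unlabeled case concrete, here is a deliberately simple statistical stand-in for an anomaly detector: a robust z-score based on the median and the median absolute deviation (MAD). It is not one of the models a production system would likely use, and the function names and the 3.5 cutoff are assumptions for illustration.

```python
import statistics


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD.

    The 0.6745 factor makes the score comparable to a standard z-score
    when the data are roughly normally distributed.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half of the values equal the median; this simple
        # score cannot rank them, so nothing is flagged.
        return [0.0 for _ in values]
    return [0.6745 * (v - median) / mad for v in values]


def flag_anomalies(values, threshold=3.5):
    """Return True for every value whose robust z-score exceeds the threshold."""
    return [abs(z) > threshold for z in robust_z_scores(values)]


# Hypothetical transaction amounts for one customer; no labels are needed.
amounts = [20, 25, 22, 19, 24, 21, 23, 950]
flags = flag_anomalies(amounts)
```

Here only the last amount is flagged. Real detectors work on many features at once, but the principle is the same: learn what is typical, then flag what deviates from it.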
Our experience
One of our clients in the banking industry needed to improve on manually defined business rules that were being used to detect card-not-present fraud, i.e., transactions using stolen or cloned debit or credit cards. This type of fraud usually occurs online or over the phone and can be very hard to detect.
It is a notoriously hard problem to model, for the following reasons:
It is an adversarial environment, as the fraudsters try to find new ways to behave undetected;
The dataset is highly imbalanced, so a model tends to develop a strong bias toward the majority class and to misclassify fraudulent transactions as genuine;
The development environment is often tightly regulated, which slows down development considerably;
The machine learning engineer does not have access to up-to-date data;
Some fraudulent transactions are never labeled as such because they go undetected. This mislabeling can reduce model performance, as positive cases are treated as negative during training.
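The class-imbalance point in the list above can be illustrated with a few lines of arithmetic. The sketch uses an invented split of 990 legitimate and 10 fraudulent transactions to show why accuracy alone is misleading, and computes the common "balanced" class-weight heuristic, n_samples / (n_classes * class_count), as one possible mitigation. The function names are illustrative.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives (fraud) that were caught."""
    actual = sum(1 for t in y_true if t == positive)
    if actual == 0:
        return 0.0
    caught = sum(1 for t, p in zip(y_true, y_pred)
                 if t == positive and p == positive)
    return caught / actual


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Hypothetical dataset: 1% fraud.
labels = [0] * 990 + [1] * 10

# A useless "model" that calls every transaction legitimate.
always_legit = [0] * len(labels)

naive_accuracy = accuracy(labels, always_legit)  # looks excellent
naive_recall = recall(labels, always_legit)      # catches no fraud at all
weights = balanced_class_weights(labels)         # fraud weighted far higher
```

With these numbers the always-legitimate model scores 99% accuracy while catching no fraud, and the heuristic weights each fraudulent example about a hundred times more heavily than a legitimate one.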
We built two machine learning models: a block model and an alert model. Each of these was the result of optimizing a particular metric.
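The text does not say which metric each model was optimized for. Purely as an assumption for illustration, the sketch below supposes that blocking favors precision (few legitimate transactions blocked) and alerting favors recall (few frauds missed). For brevity it also picks two thresholds on a single set of scores rather than training two separate models. All names, scores, and targets are hypothetical.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def lowest_threshold_with_precision(scores, labels, min_precision):
    """Scan thresholds upward; return the first that meets the precision target.

    Precision is not guaranteed to rise monotonically with the threshold,
    so this is a simple heuristic rather than an exhaustive search.
    """
    for threshold in sorted(set(scores)):
        precision, _ = precision_recall_at(scores, labels, threshold)
        if precision >= min_precision:
            return threshold
    return None


def highest_threshold_with_recall(scores, labels, min_recall):
    """Scan thresholds downward; return the first that meets the recall target."""
    for threshold in sorted(set(scores), reverse=True):
        _, recall = precision_recall_at(scores, labels, threshold)
        if recall >= min_recall:
            return threshold
    return None


# Hypothetical validation scores and labels (1 = fraud).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.85, 0.90, 0.97]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

# Assumed targets: block only when every flagged case is fraud,
# alert widely enough that no fraud is missed.
block_threshold = lowest_threshold_with_precision(scores, labels, 1.0)
alert_threshold = highest_threshold_with_recall(scores, labels, 1.0)
```

On this toy data the block threshold comes out at 0.70 and the alert threshold at 0.35, so scores between the two would be routed to an analyst rather than blocked outright.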
These models helped our client achieve superhuman performance and automate over 80% of the manual analysis, freeing human fraud analysts to focus on the most delicate cases and on those where the models were less certain. You can read more about this use case here.
Want to learn more?
Manuel Levi
Data Strategist, Data Scientist, Co-founder
Manuel Levi has helped various companies in banking and finance implement fraud detection using machine learning and AI.
With his extensive knowledge of machine learning, deep understanding of business processes, and experience in the banking industry, Manuel can expertly identify the unique needs of each client and provide tailored AI solutions to effectively combat fraud.