In our last article we looked at the numerical context behind retail AML transaction monitoring (TM) platforms and why they often produce a lot of noise. In this article we are going to start a journey to understand how AI, or more specifically Machine Learning, can help to reduce the noise and what some of the consequences are for how we think about detecting financial crime risk.
First, let's start on familiar ground and begin with a very simple, rules-based, traditional detection platform. On a periodic basis it considers three pieces of data about each customer's recent behaviour:
Has the customer made cash deposits?
Has the customer made transfers to high risk jurisdictions?
Has the customer made round-dollar transfers?
Based on our expert knowledge we program our simple detection machine with two rules:
Raise an alert if the customer has made cash deposits.
Raise an alert if the customer has made transfers to high risk jurisdictions AND has made round-dollar transfers.
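As a rough sketch, the two rules could be written as a simple Python function (the parameter names here are illustrative, not taken from any real platform):

```python
def raise_alert(cash_deposits: bool,
                high_risk_transfers: bool,
                round_dollar: bool) -> bool:
    """Traditional rules-based detection: alert if either expert rule fires."""
    # Rule 1: any cash deposits trigger an alert.
    if cash_deposits:
        return True
    # Rule 2: transfers to high risk jurisdictions AND round-dollar
    # transfers together trigger an alert.
    return high_risk_transfers and round_dollar
```

So `raise_alert(False, True, True)` returns `True` (rule 2 fires), while `raise_alert(False, True, False)` returns `False` (neither rule fires).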
Our traditional rules-based system looks like this:
Now let's imagine we have worked some cases and we have the following table of results. We have labelled each of these cases with a 'Useful Alert?' tag to denote whether it was worthy of investigating in more depth:
Now we could take these examples and teach a machine to predict whether we would find an alert useful, based on the presence or absence of cash deposits, transfers to high risk jurisdictions and round dollar amounts.
Based on the above example a simple classifier would learn a decision tree like this:
From the example above the machine has learnt the two rules we started with. It predicts that when cash deposits are present we will wish to see an alert, and that if both transfers to high risk jurisdictions and round dollar amounts are present we will also want to review an alert. The machine doesn't need to be told what the rules should be by an expert; it works them out from the training examples it's been shown.
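As a sketch of this learning step, we can train scikit-learn's DecisionTreeClassifier on an illustrative table. The rows below are placeholders labelled according to the two expert rules, not the actual cases from the table above:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: cash deposits, high-risk transfers, round-dollar transfers.
# Labels follow the two expert rules:
# alert on cash, or on high-risk AND round-dollar together.
X = [
    [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
]
y = [0, 0, 0, 1, 1, 1, 1, 1]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# The tree reproduces both rules without being told them.
print(export_text(clf, feature_names=["cash", "high_risk", "round_dollar"]))
print(clf.predict([[1, 0, 0], [0, 1, 1], [0, 1, 0]]))  # → [1 1 0]
```

With every combination of inputs present and deterministic labels, the tree fits the training table exactly, so its predictions match the two rules case for case.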
In the above example we had a rich table covering almost every possible combination of inputs. But consider what would happen if we had fewer cases to train our machine. For example, if we only had the first four cases:
If we retrain our machine on these four cases, it learns that the only behaviour that matters is transfers to high risk jurisdictions. The presence or absence of these transfers correlates perfectly with the 'Useful Alert?' tags, so cash deposits and round dollar amounts appear superfluous! The machine would learn:
Future customers with only cash deposits would not be alerted! This illustrates the importance of having sufficient data, covering the full range of possible scenarios, to correctly train our machine to predict what we would like to see. In this simple example it's trivial to visualise what the machine has learnt and decide whether we are happy with it; in a more realistic scenario this isn't so easy. It is also worth noting that we can't easily overrule the system and introduce an additional rule (like a cash deposit alert) if we don't like what the machine has learned. To get it to perform differently we need to retrain it with better examples.
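We can see this effect by retraining on just four rows. The rows below are illustrative (the original table is shown as an image above); they are constructed, as in the scenario described, so that the 'Useful Alert?' label happens to line up perfectly with high-risk transfers alone:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Four illustrative training cases. Columns: cash deposits,
# high-risk transfers, round-dollar transfers. By coincidence,
# the label correlates perfectly with high-risk transfers alone.
X_small = [
    [0, 0, 0],  # no useful alert
    [0, 0, 1],  # no useful alert
    [1, 1, 0],  # useful alert (cash rule)
    [0, 1, 1],  # useful alert (high-risk + round-dollar rule)
]
y_small = [0, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X_small, y_small)

# The tree needs only a single split on high-risk transfers.
print(export_text(clf, feature_names=["cash", "high_risk", "round_dollar"]))

# A customer with only cash deposits is no longer alerted:
print(clf.predict([[1, 0, 0]]))  # → [0]
```

Because a single split on high-risk transfers already separates the four labels perfectly, CART stops there, and the cash-deposit rule is never learnt.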
Now you might be thinking: so what? The machine has simply learnt rules we could have programmed it with in the first place. How does that reduce the noise? In this example we have used very simple binary yes/no inputs (or features, in machine learning speak) to teach our machine what to recommend when certain behaviours are present. In the next article we'll see how we can make our machine far more discerning by training it on more complex inputs, so that it only produces alerts in more targeted situations.
[This article is based on a scikit-learn DecisionTreeClassifier using the CART algorithm]