In Part 2 of this series we considered how to teach a machine to identify potentially suspicious cash deposits for businesses with differing levels of expected turnover. Our machine learning algorithm calculated a set of acceptable cash deposit thresholds, based on a set of tiered sample cases, and then applied those new rules to previously unseen customers. This was an example of supervised learning - we provided the answer we wanted in response to certain input conditions and taught our machine to reproduce our judgement in similar cases.
Supervised learning is extremely powerful when we have a rich set of worked examples that cover the various scenarios that are likely to arise. However, this approach simply replicates what we already know. If new behaviour arises that the machine has not seen before then it may not identify it as suspicious even if the clues were there. Unsupervised learning takes a different approach, seeking to identify these clues from within the data itself, without being taught what to look for. This method of detection relies on identifying anomalies in the data that make a particular case stand out as worthy of further examination.
Let's consider an example based on the transactional behaviour of customers. The diagram below shows the average transaction value versus the number of monthly transactions for a sample of 200 customers. The customers are drawn from two populations: in purple, a higher net worth population who on average make fewer, larger-value transactions; and, in yellow, a lower net worth population who make more frequent, smaller-value payments. The former population has a greater variation in its spending pattern than the latter.
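For readers who want to experiment, a minimal sketch of how a sample like this might be simulated is shown below. The distribution parameters and the 60/140 population split are illustrative assumptions, not the values behind the actual diagram.

```python
import numpy as np

rng = np.random.default_rng(42)

# Higher net worth population: fewer, larger-value transactions with more spread (illustrative parameters).
high_count = rng.poisson(lam=8, size=60)               # transactions per month
high_value = rng.normal(loc=900, scale=300, size=60)   # average transaction value

# Lower net worth population: more frequent, smaller payments with a tighter spread.
low_count = rng.poisson(lam=45, size=140)
low_value = rng.normal(loc=120, scale=40, size=140)

# 200 customers, two features each: [transactions per month, average transaction value]
X = np.column_stack([
    np.concatenate([high_count, low_count]),
    np.concatenate([high_value, low_value]),
]).astype(float)

# Segment labels, used later when we model each population separately.
segment_label = np.array(["high"] * 60 + ["low"] * 140)
```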
Taking an unsupervised approach we have no predefined rules or SME-set thresholds, nor do we have any known suspicious cases to work from. We can, however, consider what is normal behaviour for the overall population and what is an atypical transaction pattern.
By eye we can see some outlier customers with unusual numbers of higher value transactions this month - almost none in one case on the far left of the diagram and over 60 in three other cases on the far right.
There are several machine learning algorithms that apply the same approach, mathematically determining how isolated each data point is from the rest of the population. These techniques can also detect groupings or clusters in the data that are likely to be associated with similar types of customers. Applying these techniques to the data above we get the following picture:
The red dots represent customers whose monthly transaction behaviour is flagged as unusual because it occupies a relatively empty space on the chart; that is, it has fewer close neighbours with the same transaction characteristics. The algorithm also determines that the dense concentration of data points in the lower right hand corner is likely to belong to customers of a similar type. It estimates a decision boundary (the red demarcation line) to assign each data point to one population or the other. We can see that four higher net worth customers (denoted in purple) are grouped inside this perimeter, and close examination shows that one of the pair of red dots on the lower right falls outside it despite belonging to the lower net worth group.
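The note at the end of this article mentions that the clustering here uses a scikit-learn Gaussian Mixture Model fitted with Expectation-Maximisation. A minimal sketch of that approach, applied to the simulated data above, might look as follows; the two-component mixture, the covariance setting and the 5% cut-off are assumptions for illustration rather than the exact configuration used.

```python
from sklearn.mixture import GaussianMixture
import numpy as np

# Fit a two-component Gaussian mixture to the [count, value] data via Expectation-Maximisation.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

log_likelihood = gmm.score_samples(X)   # how well each customer is explained by the mixture
cluster = gmm.predict(X)                # which population the model assigns each customer to

# Flag the customers the model finds hardest to explain - here, the lowest 5% of likelihoods.
threshold = np.percentile(log_likelihood, 5)
is_anomalous = log_likelihood < threshold
```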
Overall, however, our machine has done a reasonable job of picking out the edge cases, identifying customers whose behaviour is an outlier compared to the rest of the population. If we wanted to, we could tune our machine to lower our suspicion threshold, picking off more customers from around the edges as potentially suspicious:
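In terms of the sketch above, lowering the suspicion threshold is simply a matter of loosening the likelihood cut-off; the 10% figure below is an arbitrary illustration.

```python
# A looser cut-off flags more of the customers around the edges as potentially suspicious.
looser_threshold = np.percentile(log_likelihood, 10)
more_suspects = log_likelihood < looser_threshold
```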
As we can see, this approach will flag more of our higher net worth population, whose smaller community and more diverse spending patterns mean they appear more unusual across the community as a whole. If, however, we segment our populations and instruct our machine to treat these populations separately we get a more nuanced picture:
As each population is now considered in isolation, more lower net worth individuals are identified as having unusual transaction behaviour relative to their population. At a lower suspicion threshold this disparity is even more stark:
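Continuing the earlier sketch, segmentation might look like the following: each population gets its own model and its own threshold, so customers are judged only against their peers. The single-component model per segment and the 5% cut-off are again illustrative assumptions.

```python
from sklearn.mixture import GaussianMixture
import numpy as np

flagged = np.zeros(len(X), dtype=bool)
for segment in np.unique(segment_label):   # segment_label defined in the earlier sketch
    mask = segment_label == segment
    # Model each population on its own, then score its members against that model only.
    seg_model = GaussianMixture(n_components=1, random_state=0).fit(X[mask])
    seg_scores = seg_model.score_samples(X[mask])
    flagged[mask] = seg_scores < np.percentile(seg_scores, 5)
```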
As we have no 'ground truth' data to validate our conclusions we must rely on manual investigation to inform where to set our unusual behaviour threshold.
In this simple example it's been straightforward to appreciate why the algorithm has determined a customer to be an outlier. However, as we consider additional attributes for each customer we are looking for isolated customers in a multi-dimensional space - not so easy to visualise! To ensure our detection space isn't too sparsely populated, and genuine outliers truly stand out from the crowd, this approach requires a sufficiently large dataset.
We can use this anomaly-detecting technique to identify behaviour that is unusual with respect to peer customers, or to detect behaviour that is atypical over a given time period, perhaps month on month for that specific customer. The ability for a machine to learn appropriate detection thresholds on a per-customer basis presents the opportunity for greater precision and less risk of overlooking a specific anomaly.
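The same idea can be sketched for a single customer over time: fit a simple model to that customer's own monthly history and ask how surprising the latest month looks. The function below is hypothetical and assumes monthly_history is an array of [transaction count, average value] rows, one per month.

```python
from sklearn.mixture import GaussianMixture

def latest_month_score(monthly_history):
    """Return the log-likelihood of the latest month under a model of this customer's past behaviour."""
    model = GaussianMixture(n_components=1, random_state=0).fit(monthly_history[:-1])
    return model.score_samples(monthly_history[-1:])[0]   # a low score suggests an atypical month
```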
Taken together, unsupervised techniques, supervised learning and traditional SME-led rules provide a powerful suite of detection capabilities, with the predictive power to forecast how an investigator might adjudicate the level of suspicion associated with a given customer.
[The clustering in this article was produced by a scikit-learn Gaussian Mixture Model fitted using Expectation-Maximisation]