In this second article on financial crime capability, we will look at some of the numbers behind retail AML transaction monitoring (TM) platforms and why they often produce a lot of noise.
Let's imagine you are a retail bank with 1 million customers, and let's say that 1 in every 1,000 of those customers is behaving suspiciously. You've worked really hard on tuning your TM system and it 'gets it right' 99 times out of 100.
Let me be specific about what I mean by 'gets it right'. I'll use 'good' and 'bad' as shorthand for 'not suspicious' and 'suspicious' - we almost never have the full story. When a customer is 'good', the system marks them as 'good' 99 times out of 100, and only once incorrectly marks them as 'bad'. Similarly, of every 100 'bad' customers, it correctly marks 99 as 'bad' and only 1 incorrectly as 'good'. A data scientist would say the True Positive Rate is 0.99 and the False Positive Rate is 0.01.
So in our example population of 1,000,000 customers we have:
999,000 - Good,
1,000 - Bad.
Now let's apply our finely tuned TM system.
Of the 1,000 Bad it marks:
990 as Bad (True Positives)
10 as Good (False Negatives)
Of the 999,000 Good it marks:
989,010 as Good (True Negatives)
9,990 as Bad (False Positives)
So if we consider the alerted population, marked as Bad, we have:
Total alerted population of 990 + 9,990 = 10,980
Of which only 9% (990) are actually Bad (True Positives).
So even a system that 'gets it right' 99% of the time (compare that to typical human error rates) still produces incorrect alerts 91% of the time!
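The arithmetic above can be sketched in a few lines of Python (the variable names are illustrative, not from any particular TM product):

```python
# Worked example: 1,000,000 customers, 1-in-1,000 prevalence,
# and a system that 'gets it right' 99 times out of 100.
population = 1_000_000
prevalence = 1 / 1_000   # 1 in every 1,000 customers is 'bad'
tpr = 0.99               # true positive rate: 'bad' marked as 'bad'
fpr = 0.01               # false positive rate: 'good' marked as 'bad'

bad = int(population * prevalence)   # 1,000 'bad' customers
good = population - bad              # 999,000 'good' customers

true_positives = tpr * bad           # 990 'bad' correctly alerted
false_positives = fpr * good         # 9,990 'good' incorrectly alerted

alerted = true_positives + false_positives
precision = true_positives / alerted  # share of alerts that are truly 'bad'

print(f"Alerted population: {alerted:,.0f}")  # 10,980
print(f"Alerts that are actually Bad: {precision:.0%}")  # 9%
```

Running this reproduces the numbers in the example: 10,980 alerts, of which only about 9% point at genuinely suspicious customers.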
What really matters is how the machine's error rate compares with the prevalence of suspicious actors in the overall population. (The same effect appears in medical tests for very rare conditions, where it is known as the base rate fallacy.)
If we build a machine that 'gets it right' 99.9% of the time, then only 50% of our alerts will be incorrect. Conversely, if good KYC/CDD and financial crime procedures drive your suspicious population down to 1 in 10,000, then your alerts will be incorrect 99% of the time. The cleaner your book, the harder it is to find the bad apples.
I hope this helps to explain why efficient customer monitoring, whether with traditional rules or modern AI-based techniques, remains a difficult challenge.
Stay tuned for future articles on AI and how we can do better.