AUC is a common metric for evaluating binary classifiers. Two main reasons are:
- It is insensitive to class imbalance, which is common in problems such as fraud detection and conversion prediction.
- It does not require the predictions to be thresholded (i.e. assigned to one class or the other); it operates directly on the classification scores.
An AUC of 0.5 corresponds to random performance, while an AUC of 1 is perfect classification. Personally, I transform this to \(\text{GINI} = 2 \cdot \text{AUC} - 1\), mainly because 0 for random and 1 for perfect feels more intuitive to me.
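As a quick sanity check of that rescaling, here is a minimal Python sketch. The function name `gini_from_auc` is just an illustrative choice, not something from a library.

```python
def gini_from_auc(auc: float) -> float:
    """Rescale AUC so that 0 means random and 1 means perfect."""
    return 2.0 * auc - 1.0


print(gini_from_auc(0.5))   # random performance -> 0.0
print(gini_from_auc(1.0))   # perfect classification -> 1.0
print(gini_from_auc(0.75))  # somewhere in between -> 0.5
```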
First let us look at a confusion matrix:
Imagine we have a set of scores from a classification model (these could be probabilities, etc.) along with their true labels. The numbers in the confusion matrix assume we have picked a threshold \(t\): scores \(\geq t\) are assigned the positive label and scores \(< t\) are assigned the negative label.
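To make the thresholding concrete, here is a small sketch that counts the four confusion matrix cells for a given \(t\). The scores and labels are made up for illustration, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(scores, labels, t):
    """Return (TP, FP, TN, FN) when scores >= t get the positive label."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= t
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Made-up example: four scores with their true labels (1 = positive).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

print(confusion_counts(scores, labels, t=0.5))  # (1, 0, 2, 1)
```

Moving `t` up or down changes which examples fall on each side of the threshold, and therefore all four counts.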
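If we have \(M\) unique scores from the classification model, there are \(M + 1\) possible choices of \(t\). For each choice we can compute the True Positive Rate, \(\text{TPR} = \text{TP} / (\text{TP} + \text{FN})\), and the False Positive Rate, \(\text{FPR} = \text{FP} / (\text{FP} + \text{TN})\). If we plot the TPR against the FPR over all choices of \(t\), we get the Receiver Operating Characteristic (ROC) curve.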
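The sweep over thresholds can be sketched in a few lines of Python. This is a deliberately simple version that assumes labels are 0/1 and that both classes are present; the data is made up, and `roc_points` is a hypothetical helper, not a library function.

```python
def roc_points(scores, labels):
    """Return a list of (FPR, TPR) points, one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # The M unique scores, plus one threshold above the maximum score,
    # give the M + 1 thresholds (highest threshold first).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up example: M = 4 unique scores, so 5 points on the curve.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

for fpr, tpr in roc_points(scores, labels):
    print(fpr, tpr)
```

The curve starts at (0, 0), where nothing is labelled positive, and ends at (1, 1), where everything is.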
Figure 1 shows an example ROC curve. The blue line is our performance; the grey dotted line is an example of what we would see from random classification.
The Area Under the ROC Curve (AUC) for this example is the area under the blue line. The area under the grey line is 0.5, which is why an AUC of 0.5 corresponds to random performance and an AUC of 1 to perfect classification.
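One straightforward way to get the area is the trapezoid rule over the (FPR, TPR) points. The sketch below assumes the points are already sorted by increasing FPR; the example points are made up and `auc_trapezoid` is a hypothetical helper name.

```python
def auc_trapezoid(points):
    """Area under a curve given as (FPR, TPR) points sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Width of the step times the average height of its two ends.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# A made-up ROC curve, and the diagonal we expect from random scores.
example_curve = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
random_diagonal = [(0.0, 0.0), (1.0, 1.0)]

print(auc_trapezoid(example_curve))    # 0.75
print(auc_trapezoid(random_diagonal))  # 0.5
```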
In the top-left of Figure 2 we have a hypothetical distribution of the scores output by a classifier. The two “clumps” of scores are the negative (blue) and positive (red) examples. You can imagine that as we move the threshold line from left to right, we change the values of TP and FP, and therefore the TPR and FPR.
How much the score distributions of the negatives and positives overlap determines our AUC. Figure 3 shows some different score distributions; the red distribution is the positive class and the blue is the negative class.
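In the top example the two distributions do not overlap, so at every threshold either the TPR is 1 or the FPR is 0, and a threshold placed between the two distributions gives a TPR of 1 with an FPR of 0. In this case we would get an AUC of 1.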
In the middle example, wherever we set the threshold, our TPR will be the same as our FPR. Hence we will see an AUC of 0.5.
In the bottom example, we have some overlap of the score distributions, so our AUC will be somewhere between 0.5 and 1.
The shape of the score distributions and the amount of overlap will affect the shape of the ROC curve and therefore the AUC we end up with.
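The three overlap cases can be mimicked with tiny hand-picked score lists. Note that this sketch swaps in a different but equivalent view of AUC from the one used above: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The function name `pairwise_auc` and all the numbers are assumptions made for illustration.

```python
def pairwise_auc(neg_scores, pos_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# No overlap: every positive scores above every negative.
print(pairwise_auc([0.1, 0.2, 0.3], [0.7, 0.8, 0.9]))  # 1.0

# Complete overlap: both classes have identical scores.
print(pairwise_auc([0.2, 0.5, 0.8], [0.2, 0.5, 0.8]))  # 0.5

# Partial overlap: lands somewhere between 0.5 and 1.
print(pairwise_auc([0.1, 0.3, 0.5], [0.4, 0.6, 0.8]))
```

The less the two lists interleave, the closer the result gets to 1.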
The R pROC package has functions for computing the ROC curve and the AUC, and for plotting.