Area under the curve (AUC or c-statistic) is not paramount.
Shape often matters more.
What is the issue?
It boils down to the clinical use of a particular diagnostic. This is not represented by the area under the curve (AUC), or c-statistic. It is determine by the shape of the curve.
There are other nuances as well.
To begin with, the AUC is only a rough guide of what’s good, and for the most part predominantly useful in comparing curves of the same contour. The bigger the area, for one of a group of curves with the same profile, the better. However, curve shape matters. A lot.
As can be seen in the figure below, a steep shape at the bottom near the y-axis (high-specificity or low false positive fraction) is best for rule in. The patient has the disease. Likewise, a shallow, asymptotic shape at the top, with very high sensitivity, is best for rule out as not having the disease. So, curves that are big at the top, or bottom, are generally more clinically useful than those that are big in the middle. Many (most?) curves are big in the middle.* In contrast, the skewed shapes have a more straightforward clinical value. This can be so, even if the AUC is lower for the skewed curve than a more well rounded shape.
What are the implications?
For one thing, the idea that the optimal cut-off for an ROC curve is at the 45 slope inflection is incongruous. The cut-off should be set clinically, based on the treatment algorithm, the risks of false negatives, and the costs and risks of false positives. In fact, for a curve such as the one below (real, disguised data), two cut offs would be appropriate, and the diagnosis of patients between them indeterminate.
What does this mean practically?
Lately, ROC analysis is being used a lot for biomarker discovery, multi-marker indexes (FDA termed IVD MIAs), and the like. Differences in areas under the curves (help) drive selection of the markers. Often, lists of biomarkers or algorithms are merely ranked by AUCs (c-statistics) as a screen—high good, low bad. Of course, if too many biomarkers are being assayed against too few patient samples, this traditional use of ROCs, breaks down even more (so called false discovery, a topic for another day).
It goes without saying, that before this kind of analysis comparing biomarker performance, one should understand the clinical issues and fit for the diagnostic. With this understanding, trade offs between false positives and negatives, sensitivity versus specificity, or importance of positive or negative predictive value can be reasoned. That shows where on the curve one should be, and what contour can work best. The clinically valuable biomarker, and its ROC, may appear quite middling in the normally used stats, but be the most useful.
So it’s time to slog it out. Look at and compare the shapes. For those that appear useful and similar in profile, the AUC is the right tool, and a great tool, for selecting the best biomarker(s). Use it then.
© 2013 Winton Gibbons