The curious case of the test set AUROC

Submitted by Paula Smith on Thu, 04/04/2024 - 15:14

The area under the receiver operating characteristic curve (AUROC) is a staple of machine learning for reporting model performance and assessing model generalisability. However, CMIH researchers demonstrate that reporting the AUROC alone for a test set not only masks domain shift between validation and test data but also obscures model instability and gives optimistic performance estimates. The researchers highlight the utility of the test AUROC for understanding model concordance and propose several complementary scores, which disentangle the effects of domain shift and model instability.
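As an illustration of the concordance reading of the AUROC mentioned above, the following minimal Python sketch computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy validation and test data are hypothetical and chosen only for illustration. This is not the researchers' code, and it does not implement the complementary scores proposed in the paper.

```python
from itertools import product


def auroc(labels, scores):
    """AUROC as concordance: P(positive score > negative score), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score with every negative score.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: the same model scored on a validation and a test set.
val_labels, val_scores = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
test_labels, test_scores = [0, 0, 1, 1], [0.2, 0.6, 0.5, 0.55]

val_auroc = auroc(val_labels, val_scores)
test_auroc = auroc(test_labels, test_scores)
print(f"validation AUROC = {val_auroc}, test AUROC = {test_auroc}")
```

A single test AUROC summarises ranking quality on that one dataset. On its own it does not say whether a gap from the validation value comes from a shift in the data or from instability in the model, which is the ambiguity the researchers address.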

Read more about this work in Nature Machine Intelligence.

Roberts, M., Hazan, A., Dittmer, S. et al. The curious case of the test set AUROC. Nat Mach Intell (2024). https://doi.org/10.1038/s42256-024-00817-7
