Accuracy Is Not Enough

Rethinking Model Selection for Clinical Machine Learning

Authors

  • Moumita Kamal, Tennessee Tech University
  • Douglas A. Talbert, Tennessee Tech University
  • Nolan Patterson, Tennessee Tech University
  • Nicholas Atkins, Tennessee Tech University
  • Celia Hough

DOI:

https://doi.org/10.32473/flairs.39.1.141830

Keywords:

LIME, faithfulness, XGBoost, random forest, explainability, explainable AI

Abstract

Clinical machine learning models often require more than high accuracy to gain clinician trust and adoption; they also require understandable and stable reasoning. Selecting among competing models on performance metrics alone may therefore be insufficient. In this work, we introduce the Multidimensional Evaluation of Diagnostic Algorithms and Learning (MEDAL) framework, which supports the incorporation of explanatory analysis into the model selection process. We adapt metrics originally designed for assessing model compression faithfulness, specifically cosine similarity, correlation, and top-k permutation tests, to evaluate the explanatory stability and similarity of candidate models. Applying this framework to a large-scale trauma triage dataset, we evaluate XGBoost and Random Forest architectures. Our results demonstrate that while both architectures exhibit high internal stability under training data perturbations, they rely on different underlying logic to achieve comparable accuracy. This explanatory divergence highlights a critical blind spot in standard evaluation: distinct models may yield identical predictions for different reasons. We propose a two-step selection paradigm that first filters models by predictive performance and then differentiates them by logical alignment with clinical guidelines, ensuring that deployed models are not only accurate but also explanatorily dependable.
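
The three similarity measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of absolute attribution magnitude to rank features, and the particular null model (shuffling one attribution vector) are all assumptions made here for illustration. The inputs are assumed to be per-feature attribution vectors, for example averaged LIME weights from two candidate models.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two attribution vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def pearson_correlation(a, b):
    """Pearson correlation between two attribution vectors."""
    return float(np.corrcoef(a, b)[0, 1])


def top_k_overlap(a, b, k):
    """Fraction of shared features among the k largest |attributions|."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    top_a = set(np.argsort(-np.abs(a))[:k].tolist())
    top_b = set(np.argsort(-np.abs(b))[:k].tolist())
    return len(top_a & top_b) / k


def top_k_permutation_test(a, b, k, n_permutations=2000, seed=0):
    """Assumed null model: shuffle b's feature assignment and recompute overlap.

    Returns the observed overlap and a one-sided p-value for the overlap
    being at least this large by chance.
    """
    rng = np.random.default_rng(seed)
    b = np.asarray(b, dtype=float)
    observed = top_k_overlap(a, b, k)
    null = np.array(
        [top_k_overlap(a, rng.permutation(b), k) for _ in range(n_permutations)]
    )
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, float(p_value)


# Hypothetical attribution vectors for two models over ten features.
model_a = np.array([0.9, -0.7, 0.5, 0.3, -0.2, 0.1, 0.05, 0.04, 0.03, 0.01])
model_b = 2.0 * model_a  # same ranking, different scale

print(cosine_similarity(model_a, model_b))
print(pearson_correlation(model_a, model_b))
print(top_k_permutation_test(model_a, model_b, k=3))
```

Under these assumptions, comparing a model against retrained copies of itself would give a stability reading, while comparing two different architectures would give a similarity reading.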

Published

06-05-2026

How to Cite

Kamal, M., Talbert, D. A., Patterson, N., Atkins, N., & Hough, C. (2026). Accuracy Is Not Enough: Rethinking Model Selection for Clinical Machine Learning. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141830

Section

Special Track: AI in Healthcare Informatics