Choosing the Right Metrics: A Study of Performance Measurement for Binary Classification in Imbalanced and Big Data

Authors

  • Mary Anne Walauskis, Florida Atlantic University Department of Electrical Engineering and Computer Science, Boca Raton, USA
  • Taghi M. Khoshgoftaar, Florida Atlantic University Department of Electrical Engineering and Computer Science, Boca Raton, USA

DOI:

https://doi.org/10.32473/flairs.38.1.139140

Abstract

There is no general consensus on which performance metrics give the most reliable and informative results. Existing studies that compare metrics typically focus on classifier performance, and clarify neither the specific relationships among metrics nor their reliability in different settings, such as highly imbalanced datasets. This study examines the underlying relationships among 17 commonly used performance metrics and their suitability for datasets of varying sizes and class distribution levels. We analyzed 23 publicly available datasets from diverse domains, ranging in size from 309 to over five million instances and in class distribution level from 0.17% to 44.87%, using two gradient boosting algorithms, LightGBM and XGBoost, and one unsupervised anomaly detection algorithm, Isolation Forest. Factor analysis grouped the metrics into distinct latent factors, providing a framework that researchers can use to select appropriate metrics, and to avoid redundant or misleading ones, based on dataset characteristics.

Author Biography

Taghi M. Khoshgoftaar, Florida Atlantic University Department of Electrical Engineering and Computer Science, Boca Raton, USA

Dr. Taghi M. Khoshgoftaar is Motorola Endowed Chair Professor in the Department of Computer and Electrical Engineering and Computer Science at Florida Atlantic University and Director of the NSF Big Data Training and Research Laboratory. His research interests are in big data analytics, data mining and machine learning, health informatics and bioinformatics, social network mining, security analytics, fraud detection, and software engineering. He has published more than 900 refereed journal and conference papers in these areas. He is the conference chair of the IEEE International Conference on Machine Learning and Applications (ICMLA 2025) and Co-Editor-in-Chief of the Journal of Big Data. He has served on the organizing and technical program committees of various international conferences, symposia, and workshops. He has also served as North American Editor of the Software Quality Journal and was on the editorial boards of the journals Multimedia Tools and Applications, Knowledge and Information Systems, Empirical Software Engineering, Software Engineering and Knowledge Engineering, and Social Network Analysis and Mining.

Published

14-05-2025

How to Cite

Walauskis, M. A., & Khoshgoftaar, T. M. (2025). Choosing the Right Metrics: A Study of Performance Measurement for Binary Classification in Imbalanced and Big Data. The International FLAIRS Conference Proceedings, 38(1). https://doi.org/10.32473/flairs.38.1.139140

Section

Invited Talk Papers/Abstracts