Multimodal Chest Pathology Classification with Language and Image Transformers

Authors

M. Kekulandara, C. Abeysiriwardana, A. Hendawi

DOI:

https://doi.org/10.32473/flairs.39.1.141584

Keywords:

Multimodal Learning, Multi-label Classification, Medical Image Analysis, Radiology Report Analysis, Chest Pathology Classification, Clinical Decision Support, Healthcare AI

Abstract

This paper presents a multimodal, multi-label framework for automated chest pathology classification that integrates radiology reports, chest X-ray images, and patient demographic data. Using the CheXpert Plus dataset, the approach combines domain-specific language models (BioBERT and ClinicalBERT), a vision transformer (ViT-base), and demographic embeddings within a unified architecture. Two fusion strategies, a multi-layer perceptron (MLP) and a convolutional neural network (CNN), are evaluated for how well they integrate these heterogeneous representations. Experimental results across 14 configurations show that multimodal learning improves performance over single-modality approaches, particularly for clinically ambiguous pathologies such as Lung Opacity and Pleural Effusion. Predictions for visually distinct conditions (e.g., Pneumonia and Fracture) are driven largely by image features, whereas textual and demographic information supplies complementary context that improves robustness. The study provides a systematic empirical evaluation of multimodal fusion strategies in a clinically realistic setting and highlights both the benefits and the limitations of integrating diverse medical data sources.
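The abstract names an MLP as one of the two fusion strategies but does not give layer sizes or wiring. The following is a minimal sketch, assuming simple late fusion: one pooled embedding per modality (report text, image, demographics) is concatenated and passed through a two-layer MLP with an independent sigmoid output per pathology, trained with per-label binary cross-entropy. The class `MLPFusionHead`, the helper `multilabel_bce`, the toy embedding sizes, and the 14-output label set are hypothetical illustrations, not the authors' code. Pure Python is used so the example runs without a deep-learning library.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(v: list[float]) -> list[float]:
    return [max(0.0, x) for x in v]


def linear(W: list[list[float]], b: list[float], x: list[float]) -> list[float]:
    # Dense layer: y = W x + b, with W stored as (out x in).
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_layer(n_in: int, n_out: int, rng: random.Random):
    # Uniform init scaled by 1/sqrt(fan_in); biases start at zero.
    scale = 1.0 / math.sqrt(n_in)
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class MLPFusionHead:
    """Hypothetical late-fusion head (not the paper's code): concatenate
    per-modality embeddings, then Linear -> ReLU -> Linear -> sigmoid."""

    def __init__(self, dims: list[int], hidden: int, n_labels: int, seed: int = 0):
        rng = random.Random(seed)
        self.dims = dims
        self.l1 = init_layer(sum(dims), hidden, rng)
        self.l2 = init_layer(hidden, n_labels, rng)

    def forward(self, embeddings: list[list[float]]) -> list[float]:
        # Expect one embedding per modality, in the same order as `dims`.
        if len(embeddings) != len(self.dims):
            raise ValueError("expected one embedding per modality")
        for e, d in zip(embeddings, self.dims):
            if len(e) != d:
                raise ValueError(f"embedding size {len(e)} != expected {d}")
        fused = [x for e in embeddings for x in e]  # concatenation fusion
        h = relu(linear(*self.l1, fused))
        logits = linear(*self.l2, h)
        # Independent sigmoid per label: several pathologies can co-occur.
        return [sigmoid(z) for z in logits]


def multilabel_bce(probs: list[float], targets: list[int], eps: float = 1e-7) -> float:
    # Mean binary cross-entropy over labels, each treated as its own binary task.
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


if __name__ == "__main__":
    rng = random.Random(42)
    dims = [8, 8, 4]  # toy sizes for text, image, demographic embeddings (assumed)
    head = MLPFusionHead(dims, hidden=16, n_labels=14)  # 14 outputs assumed
    embs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for d in dims]
    probs = head.forward(embs)
    targets = [0] * 14
    targets[3] = 1  # arbitrary single positive label for the demo
    predicted = [i for i, p in enumerate(probs) if p >= 0.5]  # fixed 0.5 threshold (assumed)
    print(len(probs), round(multilabel_bce(probs, targets), 4), predicted)
```

Concatenation keeps each modality's embedding intact and leaves it to the first MLP layer to learn cross-modal interactions; the CNN fusion variant the abstract also evaluates is not sketched here. In a real pipeline the inputs would presumably be larger pooled encoder outputs (BERT-base-sized models such as BioBERT and ViT-base both use 768-dimensional hidden states), and per-label decision thresholds would often be tuned on validation data rather than fixed at 0.5.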

Published

06-05-2026

How to Cite

Kekulandara, M., Abeysiriwardana, C., & Hendawi, A. (2026). Multimodal Chest Pathology Classification with Language and Image Transformers. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141584

Issue

Vol. 39 No. 1 (2026)

Section

Special Track: AI in Healthcare Informatics