Multimodal Chest Pathology Classification with Language and Image Transformers
DOI:
https://doi.org/10.32473/flairs.39.1.141584
Keywords:
Multimodal Learning, Multi-label Classification, Medical Image Analysis, Radiology Report Analysis, Chest Pathology Classification, Clinical Decision Support, Healthcare AI
Abstract
This paper presents a multimodal, multi-label framework for automated chest pathology classification that integrates radiology reports, chest X-ray images, and patient demographic data. Using the CheXpert Plus dataset, the approach combines domain-specific language models (BioBERT and ClinicalBERT), a vision transformer (ViT-base), and demographic embeddings within a unified learning framework. Two fusion strategies, a multi-layer perceptron (MLP) and a convolutional neural network (CNN), are evaluated to assess how effectively each integrates heterogeneous representations. Experimental results across 14 configurations show that multimodal learning improves performance over single-modality approaches, particularly for clinically ambiguous pathologies such as Lung Opacity and Pleural Effusion. While predictions for visually distinct conditions (e.g., Pneumonia and Fracture) are driven largely by image features, textual and demographic information provides complementary context that enhances robustness. The study provides a systematic empirical evaluation of multimodal fusion strategies in a clinically realistic setting, highlighting the benefits and limitations of integrating diverse medical data sources.
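To make the MLP fusion idea in the abstract concrete, the following is a minimal, standard-library-only sketch of concatenation-based fusion followed by a multi-label sigmoid head. It is not the authors' implementation. The embedding sizes, the hidden width, the random weights, the 0.5 threshold, and the choice of 14 output labels are all assumptions made for illustration, and the random vectors stand in for features that would come from the language model, the vision transformer, and the demographic encoder.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a real model would learn these."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def mlp_fusion(text_emb, image_emb, demo_emb, hidden, output):
    """Concatenate the three modality embeddings, then apply a two-layer MLP.

    Each output gets its own sigmoid, so several pathologies can be
    predicted at once (multi-label rather than multi-class).
    """
    fused = text_emb + image_emb + demo_emb  # list concatenation
    h = relu(linear(fused, *hidden))
    logits = linear(h, *output)
    return [sigmoid(z) for z in logits]


# Hypothetical sizes, chosen small so the sketch runs instantly.
TEXT_DIM, IMAGE_DIM, DEMO_DIM = 8, 8, 3
HIDDEN_DIM, NUM_LABELS = 16, 14  # NUM_LABELS is an assumption

rng = random.Random(0)
hidden = init_layer(TEXT_DIM + IMAGE_DIM + DEMO_DIM, HIDDEN_DIM, rng)
output = init_layer(HIDDEN_DIM, NUM_LABELS, rng)

# Stand-ins for encoder outputs (report text, X-ray image, demographics).
text_emb = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]
image_emb = [rng.gauss(0, 1) for _ in range(IMAGE_DIM)]
demo_emb = [rng.gauss(0, 1) for _ in range(DEMO_DIM)]

probs = mlp_fusion(text_emb, image_emb, demo_emb, hidden, output)
predicted = [i for i, p in enumerate(probs) if p >= 0.5]  # assumed threshold
print(len(probs), predicted)
```

Because each label has an independent sigmoid, the predicted set may contain zero, one, or several label indices. A single-modality baseline in this sketch would pass only one embedding to a correspondingly smaller first layer.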
License
Copyright (c) 2026 Madhukara Kekulandara, Chamudi Abeysiriwardana, Abdeltawab Hendawi

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.