Bridging the Knowledge Gap: Improving BERT models for answering MCQs by using Ontology-generated synthetic MCQA Dataset
DOI:
https://doi.org/10.32473/flairs.37.1.135608

Keywords:
Deep Learning, Ontology, NLP, Transformers, BERT, MCQA, Question Answering, Question generation

Abstract
BERT-based models possess impressive language understanding capabilities but often lack domain-specific knowledge, limiting their performance on specialised tasks such as medical multiple-choice question answering (MCQA). In this paper, we study how biomedical ontologies, rich repositories of medical knowledge, can be harnessed to enhance BERT-based models for the medical MCQA task. Our contributions include OntoMCQA-Gen, a system that leverages different biomedical ontologies to construct BioOntoMCQA, a large synthetic MCQA dataset. OntoMCQA-Gen exploits subclass-class relationships, definitions of concepts, and synonym relationships from the ontologies to create this dataset of MCQs automatically. We then use this synthetic dataset to fine-tune various BERT-based models to answer medical MCQs. We evaluated these fine-tuned BERT models on the challenging MedMCQA and MedQA datasets, which contain questions from admission examinations for medical degrees in India and the USA, respectively. Our evaluation study on these datasets shows that fine-tuning the BERT-based models on BioOntoMCQA results in significantly improved accuracy scores. BioBERT and PubMedBERT, pretrained on large medical corpora, also show significant improvements with our technique of fine-tuning on ontology-generated synthetic data. This finding highlights the effectiveness of incorporating biomedical ontologies to enhance BERT-based models in the medical domain. Moreover, our results underscore the importance of using ontology-generated data along with model adaptation for specialised domains, contributing to a novel advancement in natural language processing.
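The abstract describes generating MCQs from subclass-class relations and concept definitions. A minimal sketch of how such generation could work is shown below, under the assumption that a concept's definition becomes the question stem, the concept itself is the correct answer, and sibling subclasses of the same parent serve as distractors. The toy ontology, function name, and question template here are all illustrative assumptions, not the paper's actual OntoMCQA-Gen pipeline.

```python
# Hypothetical sketch of ontology-driven MCQ generation.
# The ontology content below is a toy example, not from the paper.
import random

# parent class -> {subclass concept: definition}
TOY_ONTOLOGY = {
    "Anemia": {
        "Iron deficiency anemia": "Anemia caused by insufficient iron for hemoglobin synthesis.",
        "Aplastic anemia": "Anemia due to failure of the bone marrow to produce blood cells.",
        "Hemolytic anemia": "Anemia resulting from premature destruction of red blood cells.",
        "Pernicious anemia": "Anemia caused by impaired vitamin B12 absorption.",
    }
}

def generate_mcq(ontology, parent, concept, n_options=4, rng=random):
    """Build one MCQ: the stem comes from the concept's definition,
    and distractors are sampled from sibling subclasses of the same parent."""
    subclasses = ontology[parent]
    definition = subclasses[concept]
    siblings = [c for c in subclasses if c != concept]
    distractors = rng.sample(siblings, n_options - 1)
    options = distractors + [concept]
    rng.shuffle(options)
    return {
        "question": f"Which condition is defined as: '{definition}'",
        "options": options,
        "answer": concept,
    }

mcq = generate_mcq(TOY_ONTOLOGY, "Anemia", "Iron deficiency anemia")
print(mcq["question"])
for i, opt in enumerate(mcq["options"]):
    print(f"  {chr(65 + i)}. {opt}")
```

Iterating such a generator over every concept with a definition and enough siblings would yield a large synthetic dataset; synonym relations could similarly supply alternative surface forms for stems or answers.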
Copyright (c) 2024 Sahil Sahil, P Sreenivasa Kumar
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.