Bridging the Knowledge Gap: Improving BERT models for answering MCQs by using Ontology-generated synthetic MCQA Dataset


  • Sahil Sahil Indian Institute of Technology Madras
  • P Sreenivasa Kumar Indian Institute of Technology Madras



Deep Learning, Ontology, NLP, Transformers, BERT, MCQA, Question Answering, Question generation


BERT-based models possess impressive language understanding capabilities but often lack domain-specific knowledge, limiting their performance on specialised tasks such as medical multiple-choice question answering (MCQA). In this paper, we study how biomedical ontologies, rich repositories of medical knowledge, can be harnessed to enhance BERT-based models for the medical MCQA task. Our contributions include OntoMCQA-Gen, a system which leverages different biomedical ontologies to construct BioOntoMCQA, a large synthetic MCQA dataset. OntoMCQA-Gen exploits subclass-class relationships, definitions of concepts, and synonym relationships from the ontologies to create this dataset of MCQs automatically. We then use this synthetic dataset to fine-tune various BERT-based models to answer medical MCQs. We evaluated these fine-tuned BERT models on the challenging MedMCQA and MedQA datasets, which contain questions from admission examinations for medical degrees in India and the USA, respectively. Our evaluation on these datasets shows that fine-tuning BERT-based models on BioOntoMCQA results in significantly improved accuracy scores. BioBERT and PubMedBERT, which are pretrained on large medical corpora, also show significant improvements when fine-tuned on the ontology-generated synthetic data. This finding highlights the effectiveness of incorporating biomedical ontologies to enhance BERT-based models in the medical domain. Moreover, our results underscore the importance of using ontology-generated data along with model adaptation for specialised domains, contributing to a novel advancement in natural language processing.
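The abstract does not give the internals of OntoMCQA-Gen, so the following is only a minimal sketch of how a subclass-class relationship could be turned into a multiple-choice question. The toy ontology, the function names (`ancestors`, `make_subclass_mcq`), the question template, and the distractor strategy (sampling concepts that are not ancestors of the target) are all assumptions made for illustration, not the authors' method.

```python
import random

# Hypothetical toy ontology for illustration only: child concept -> direct parent class.
SUBCLASS_OF = {
    "viral pneumonia": "pneumonia",
    "bacterial pneumonia": "pneumonia",
    "asthma": "obstructive lung disease",
    "emphysema": "obstructive lung disease",
    "pneumonia": "lung disease",
    "obstructive lung disease": "lung disease",
}


def ancestors(concept, subclass_of):
    """Return every superclass of `concept`, nearest first."""
    found = []
    while concept in subclass_of:
        concept = subclass_of[concept]
        found.append(concept)
    return found


def make_subclass_mcq(concept, subclass_of, n_options=4, seed=0):
    """Build one MCQ asking for the direct parent class of `concept`.

    Assumed distractor strategy: sample other concepts from the ontology,
    excluding the concept itself and all of its ancestors, so that no
    distractor is also a valid superclass.
    """
    rng = random.Random(seed)
    answer = subclass_of[concept]
    excluded = set(ancestors(concept, subclass_of)) | {concept}
    all_concepts = set(subclass_of) | set(subclass_of.values())
    pool = sorted(all_concepts - excluded)
    distractors = rng.sample(pool, n_options - 1)
    options = distractors + [answer]
    rng.shuffle(options)
    return {
        "question": f"Which of the following is the direct parent class of '{concept}'?",
        "options": options,
        "answer_index": options.index(answer),
    }


if __name__ == "__main__":
    mcq = make_subclass_mcq("viral pneumonia", SUBCLASS_OF)
    print(mcq["question"])
    for i, option in enumerate(mcq["options"]):
        print(f"  {chr(ord('A') + i)}. {option}")
```

Questions built from definitions or synonyms could follow the same pattern with a different template, for example by showing a definition and asking which concept it describes.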




How to Cite

Sahil, & Kumar, P. S. (2024). Bridging the Knowledge Gap: Improving BERT models for answering MCQs by using Ontology-generated synthetic MCQA Dataset. The International FLAIRS Conference Proceedings, 37(1).