Explainable Neural Text Classifiers

Authors

  • Diana Inkpen

DOI:

https://doi.org/10.32473/flairs.39.1.141996

Abstract

Advances in Large Language Models (LLMs) allow us to develop highly accurate neural text classifiers. One of their major disadvantages is their lack of explainability, a consequence of their black-box nature. I am investigating neural text classifiers that are explainable, in order to open their black-box architecture, at least partially. Explainability can be provided at the level of the classification model or at the level of the decision made for each new test instance. The explanations need to account for what was learned from the training data (unless there is no training or only minimal training) and also for what is encoded in the pre-trained model (LLM) used as the basis for the classifier. To explain the individual decision for each test instance, one step is to calculate feature importance with methods such as LIME, SHAP, or Integrated Gradients. More useful full-text explanations can be generated via customized prompting, or via joint learning of classes and explanations during training. I present generated explanations for several tasks: sentiment analysis, emotion detection from text, and legal text entailment. The generated explanations are evaluated with automatic measures, as well as by human judges, to determine whether they find the explanations relevant and useful.
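
As an illustration of the feature-importance step named in the abstract, the following is a minimal sketch of Integrated Gradients applied to a toy bag-of-words sentiment classifier. It is not the method or the models used in the talk: the logistic-regression classifier, the weights in WEIGHTS, the BIAS value, the all-zero baseline, and the helper names (predict, integrated_gradients, bag_of_words) are all assumptions made for the example. A neural classifier would replace the closed-form gradient with automatic differentiation, but the path integral from the baseline to the input is the same idea.

```python
import math

# Hypothetical toy weights for a bag-of-words sentiment classifier.
# These values are invented for illustration; they do not come from the talk.
WEIGHTS = {"great": 2.0, "boring": -1.5, "plot": 0.1, "not": -0.8}
BIAS = -0.2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens."""
    counts = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0.0) + 1.0
    return counts


def predict(counts):
    """Probability of the positive class under the toy logistic model."""
    z = BIAS + sum(WEIGHTS.get(tok, 0.0) * c for tok, c in counts.items())
    return sigmoid(z)


def integrated_gradients(counts, steps=200):
    """Attribute predict(counts) to each token, relative to an all-zero baseline.

    The path integral is approximated with a midpoint Riemann sum.
    """
    attributions = {tok: 0.0 for tok in counts}
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline (zeros) to the input.
        scaled = {tok: alpha * c for tok, c in counts.items()}
        p = predict(scaled)
        for tok, c in counts.items():
            # Gradient of the sigmoid output with respect to this token count.
            grad = p * (1.0 - p) * WEIGHTS.get(tok, 0.0)
            # (input - baseline) * gradient, averaged over the path.
            attributions[tok] += grad * c / steps
    return attributions


if __name__ == "__main__":
    counts = bag_of_words("great plot not boring")
    for tok, score in integrated_gradients(counts).items():
        print(f"{tok:>8s} {score:+.4f}")
```

A useful sanity check on any Integrated Gradients implementation is the completeness property: the attributions should sum to the difference between the model output on the input and on the baseline, here predict(counts) - predict({}).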

Published

06-05-2026

How to Cite

Inkpen, D. (2026). Explainable Neural Text Classifiers. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141996

Section

Main Conference Invited Talks