Explainable Neural Text Classifiers
DOI: https://doi.org/10.32473/flairs.39.1.141996
Abstract
Advances in Large Language Models (LLMs) allow us to develop highly accurate neural text classifiers. One of their major disadvantages is their lack of explainability, due to their black-box nature. I am investigating explainable neural text classifiers, in order to open their black-box architecture, at least partially. Explainability can be provided at the level of the classification model or at the level of the decision made for each new test instance. The explanations need to account for what was learnt from the training data (unless there is no training or only minimal training) and also for what is encoded in the pre-trained model (LLM) that was used as the basis for the classifier. To explain the individual decision for each test instance, one step is to calculate feature importance with methods such as LIME, SHAP, or Integrated Gradients. More useful full-text explanations can be generated via customized prompting, or via joint learning of classes and explanations during training. I present generated explanations for several tasks: sentiment analysis, emotion detection from text, and legal text entailment. The generated explanations are evaluated both with automatic measures and with human judges, to see whether the judges find the explanations relevant and useful.
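The abstract names Integrated Gradients as one way to compute feature importance but does not say how it is applied in this work. As a minimal, hypothetical illustration only, the sketch below attributes the prediction of a toy logistic-regression classifier to its bag-of-words features. The vocabulary, weights, bias, and all-zeros baseline are invented for the example; a real neural text classifier would instead take gradients with respect to token embeddings using an autodiff framework. The path integral is approximated with a midpoint Riemann sum, and the attributions should sum approximately to the difference between the prediction for the input and for the baseline (the "completeness" property).

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: List[float], w: List[float], b: float) -> float:
    """Positive-class probability of a toy (hypothetical) logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gradient(x: List[float], w: List[float], b: float) -> List[float]:
    """Analytic gradient of predict() with respect to each input feature."""
    p = predict(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x: List[float], baseline: List[float],
                         w: List[float], b: float, steps: int = 200) -> List[float]:
    """Midpoint Riemann-sum approximation of Integrated Gradients:
    IG_i = (x_i - x'_i) * average of dF/dx_i along the straight path from x' to x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w, b)):
            total[i] += g
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


# Hypothetical sentiment example: one feature per vocabulary word (1.0 = word present).
vocab = ["great", "boring", "plot"]
weights = [2.0, -1.5, 0.1]
bias = 0.0
x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]  # "empty document" baseline, an assumption of this sketch

attributions = integrated_gradients(x, baseline, weights, bias)
for word, a in zip(vocab, attributions):
    print(f"{word}: {a:+.4f}")
```

In this toy setting "great" receives a positive attribution and "boring" a negative one; the choice of baseline is a design decision that strongly affects Integrated Gradients scores in practice.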
License
Copyright (c) 2026 Diana Inkpen

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.