The Uncertainty of Counterfactuals in Deep Learning
DOI: https://doi.org/10.32473/flairs.v34i1.128795

Abstract
Counterfactuals have become a useful tool for explainable Artificial Intelligence (XAI). A counterfactual perturbs a data instance just enough to yield an alternate classification from a machine learning model. Several algorithms have been designed to generate counterfactuals using deep neural networks; however, despite their growing use in mission-critical fields, there has been no investigation to date into the epistemic uncertainty of generated counterfactuals, which could lead to risk-prone explanations being deployed in those fields. In this work, we use several data sets to compare the epistemic uncertainty of original instances with that of counterfactuals generated from them. As part of our analysis, we also measure the extent to which counterfactuals can be considered anomalies in those data sets. We find that counterfactual uncertainty is higher in three of the four data sets tested. Moreover, our experiments suggest a possible connection between the reconstruction error of a deep autoencoder and the difference in epistemic uncertainty between a deep neural network's training data and counterfactuals generated from that training data.
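To make the comparison concrete, the sketch below shows one common way such an analysis could be set up; it is not the paper's implementation. It assumes Monte Carlo dropout as the epistemic-uncertainty estimator (the abstract does not specify the estimator used) and per-instance autoencoder reconstruction error as the anomaly score; `classifier`, `autoencoder`, `x_orig`, and `x_cf` are hypothetical placeholders.

```python
# Hedged sketch: epistemic uncertainty via MC dropout and anomaly-ness via
# autoencoder reconstruction error, compared between originals and counterfactuals.
import torch
import torch.nn as nn


def mc_dropout_uncertainty(model: nn.Module, x: torch.Tensor, n_samples: int = 50) -> torch.Tensor:
    """Predictive-entropy estimate of epistemic uncertainty via MC dropout."""
    model.train()  # keep dropout layers active at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )  # shape: (n_samples, batch, classes)
    mean_probs = probs.mean(dim=0)
    entropy = -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=-1)
    return entropy  # one uncertainty score per instance


def reconstruction_error(autoencoder: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-instance mean squared reconstruction error from a trained autoencoder."""
    autoencoder.eval()
    with torch.no_grad():
        return ((autoencoder(x) - x) ** 2).mean(dim=-1)


# Usage sketch (placeholders: a trained dropout classifier, a trained
# autoencoder, original instances x_orig, and their counterfactuals x_cf):
# u_orig = mc_dropout_uncertainty(classifier, x_orig)
# u_cf = mc_dropout_uncertainty(classifier, x_cf)
# print("mean uncertainty (orig vs. cf):", u_orig.mean().item(), u_cf.mean().item())
# print("mean reconstruction error (cf):", reconstruction_error(autoencoder, x_cf).mean().item())
```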