KNOW How to Make Up Your Mind! Adversarially Detecting and Remedying Inconsistencies in Natural Language Explanations
Myeongjun Jang, Bodhisattwa Prasad Majumder, Julian McAuley, Thomas Lukasiewicz and Oana-Maria Camburu
Abstract
While recent works have considerably improved the quality of the natural language explanations (NLEs) that a model generates to justify its predictions, there has been very limited research on detecting and remedying inconsistencies among generated NLEs. In this work, we leverage external knowledge bases to significantly improve upon an existing adversarial framework for detecting inconsistent NLEs. We apply our framework to high-performing NLE models and show that models with higher NLE quality do not necessarily generate fewer inconsistencies. Moreover, we propose an off-the-shelf remedy that alleviates NLE inconsistency by injecting external background knowledge into the model. Our remedy decreases the inconsistencies of previous high-performing NLE models as detected by our framework.