A Robust Method to Protect Text Classification Models against Adversarial Attacks

Authors

  • Bala Mallikarjunarao Garlapati, Tata Consultancy Services
  • Ajeet Kumar Singh
  • Srinivasa Rao Chalamala

DOI:

https://doi.org/10.32473/flairs.v35i.130706

Abstract

Text classification is one of the main tasks in natural language processing. Recently, adversarial attacks have been shown to have a substantial negative impact on neural network-based text classification models. Few defenses exist to strengthen model predictions against such attacks; the most popular among them are adversarial training and spelling correction. While adversarial training adds synonym variants to the training data, spelling correction methods defend against character-level perturbations of words. The diversity and sparseness of the perturbations introduced by different attack methods challenge both approaches. This paper proposes an approach that corrects adversarial samples for text classification tasks by combining grammar correction and spelling correction: we use Gramformer for grammar correction and TextBlob for spelling correction. The approach is generic and can be applied to any text classification model without retraining. We evaluated it against two state-of-the-art attacks, DeepWordBug and TextBugger, on three open-source datasets: IMDB, CoLA, and AGNews. The experimental results show that our approach effectively counters adversarial attacks on text classification models while maintaining classification performance on original clean data.
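
As a rough illustration of the pipeline described above, the sketch below chains TextBlob spelling correction with Gramformer grammar correction before handing the text to an unmodified classifier. It assumes the publicly available textblob and gramformer Python packages; the function names, the ordering (spelling first, then grammar), and the classifier call are illustrative assumptions, not the authors' implementation.

    from textblob import TextBlob
    from gramformer import Gramformer

    # models=1 selects Gramformer's corrector model (assumed default public API)
    gf = Gramformer(models=1, use_gpu=False)

    def correct_adversarial_text(text: str) -> str:
        # Step 1: spelling correction repairs character-level perturbations
        spelled = str(TextBlob(text).correct())
        # Step 2: grammar correction smooths remaining word-level noise
        candidates = gf.correct(spelled, max_candidates=1)
        # Gramformer returns a collection of candidates; fall back to the
        # spell-corrected text if none is produced
        return next(iter(candidates), spelled)

    # The corrected text is then fed to the classifier without retraining, e.g.:
    # label = classifier.predict(correct_adversarial_text(adversarial_input))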

Published

04-05-2022

How to Cite

Garlapati, B. M., Singh, A. K., & Chalamala, S. R. (2022). A Robust Method to Protect Text Classification Models against Adversarial Attacks. The International FLAIRS Conference Proceedings, 35. https://doi.org/10.32473/flairs.v35i.130706

Issue

Vol. 35 (2022)

Section

Special Track: Security, Privacy, Trust and Ethics in AI