Density-Aware Differentially Private Textual Perturbations Using Truncated Gumbel Noise
Deep Neural Networks, despite their success in diverse domains, are provably sensitive to small perturbations which cause the models to return erroneous predictions to minor transformations. Recently, it was proposed that this effect can be addressed in the text domain by optimizing for the worst case loss function over all possible word substitutions within the training examples. However, this approach is prone to weighing semantically unlikely word replacements higher, resulting in accuracy loss. In this paper, we study robustness to adversarial perturbations by using differentially private randomized substitutions while training the model. This approach has two immediate advantages: (1) by ensuring that the word replacement likelihood is weighted by its proximity to the original word in a metric space, we circumvent optimizing for worst case guarantees thereby achieve performance gains; and (2) the calibrated randomness results in training a privacy preserving model, while also guaranteeing robustness against adversarial attacks on the model outputs. Our approach uses a novel density-based differentially private mechanism based on truncated Gumbel noise. This ensures training on substitutions of words in dense and sparse regions of a metric space while maintaining semantic similarity for model robustness. Our experiments on two datasets suggest an improvement of up to 10% on the accuracy metrics.