Less is more: culling the training set to improve robustness of deep neural networks

Yongshuai Liu, Jiyu Chen, and Hao Chen

Deep neural networks are vulnerable to adversarial examples. Prior defenses attempted to make deep networks more robust by either changing the network architecture or augmenting the training set with adversarial examples, but both approaches have inherent limitations. Motivated by recent research showing that outliers in the training set have a strong negative influence on the trained model, we studied the relationship between model robustness and the quality of the training set. We propose two methods for detecting outliers, based on canonical examples and on training errors, respectively. After removing the outliers, we trained the classifier on the remaining examples to obtain a sanitized model. We evaluated the sanitized model on MNIST and SVHN and found that it forced the attacker to generate adversarial examples with much higher distortion. More importantly, we examined the Kullback-Leibler divergence from the output of the original model to that of the sanitized model and found that this divergence is much higher for adversarial examples than for normal examples. Based on this difference, we could detect adversarial examples with accuracy between 94.67% and 99.89%. Our results show that improving the quality of the training set is a promising direction for increasing model robustness.
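To make the detection idea concrete, the following is a minimal Python sketch of flagging an input when the two models' output distributions diverge. It assumes each model yields a softmax probability vector for the same input. The function names, the direction D_KL(p_sanitized || p_original) (one reading of "from the output of the original model to that of the sanitized model"), the threshold, and the example vectors are all illustrative assumptions, not the authors' implementation or data.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Return D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    Terms with p_i == 0 contribute 0; eps guards against log(0) when q_i == 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / max(qi, eps))
    return total


def is_adversarial(p_original: Sequence[float],
                   p_sanitized: Sequence[float],
                   threshold: float) -> bool:
    """Flag an input whose two softmax outputs diverge by more than `threshold`.

    Assumption: the divergence is D_KL(p_sanitized || p_original), and
    `threshold` is a hypothetical value that would be tuned on held-out data.
    """
    return kl_divergence(p_sanitized, p_original) > threshold


# Made-up probability vectors for illustration only.
# Both models agree on a normal input: divergence is about 0.002.
print(is_adversarial([0.9, 0.05, 0.05], [0.88, 0.06, 0.06], threshold=0.5))  # False
# The models disagree sharply on a perturbed input: divergence is about 2.19.
print(is_adversarial([0.05, 0.9, 0.05], [0.85, 0.1, 0.05], threshold=0.5))   # True
```

A single scalar threshold keeps the detector cheap, since it needs only one extra forward pass through the sanitized model per input; in practice the threshold trades false positives on normal inputs against missed adversarial ones.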