ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation


Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi
MIT CSAIL


Abstract


Deep neural networks are vulnerable to adversarial attacks. The literature is rich with algorithms that can easily craft successful adversarial examples. In contrast, the performance of defense techniques still lags behind. This paper proposes ME-Net, a defense method that leverages matrix estimation (ME). In ME-Net, images are preprocessed in two steps: first, pixels are randomly dropped from the image; then, the image is reconstructed using ME. We show that this process destroys the adversarial structure of the noise while reinforcing the global structure of the original image. Since humans typically rely on such global structures when classifying images, the process makes the network more compatible with human perception. We conduct comprehensive experiments on prevailing benchmarks such as MNIST, CIFAR-10, SVHN, and Tiny-ImageNet. Comparing ME-Net with state-of-the-art defense mechanisms shows that ME-Net consistently outperforms prior techniques, improving robustness against both black-box and white-box attacks.
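The two preprocessing steps (random pixel dropping, then ME reconstruction) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the ME step here is a simple USVT-style truncated SVD with a fixed rank, and ME-Net's actual estimator and settings may differ. The function names and the parameters `p` (fraction of pixels kept) and `rank` are hypothetical choices for this sketch.

```python
import numpy as np


def random_mask(shape, p, rng):
    """Observation mask: each pixel is kept independently with probability p."""
    return rng.random(shape) < p


def me_reconstruct(channel, mask, rank):
    """Low-rank estimate of one H x W channel from its observed pixels.

    Dropped pixels are set to 0 and kept ones are divided by the observed
    fraction, which makes the filled matrix an approximately unbiased
    estimate of the channel. A rank-`rank` truncated SVD then keeps only the
    dominant (global) structure. This is a USVT-style estimator with a fixed
    rank in place of a singular-value threshold (an assumption of the sketch).
    """
    p_hat = max(mask.mean(), 1e-8)
    filled = np.where(mask, channel, 0.0) / p_hat
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    s[rank:] = 0.0  # discard all but the top-`rank` singular values
    estimate = (u * s) @ vt
    return np.clip(estimate, 0.0, 1.0)  # pixel values assumed to lie in [0, 1]


def me_preprocess(image, p=0.5, rank=8, seed=0):
    """Drop pixels at random, then reconstruct; image is H x W or H x W x C."""
    rng = np.random.default_rng(seed)
    mask = random_mask(image.shape[:2], p, rng)  # one mask shared by all channels
    if image.ndim == 2:
        return me_reconstruct(image, mask, rank)
    return np.stack(
        [me_reconstruct(image[..., c], mask, rank) for c in range(image.shape[-1])],
        axis=-1,
    )


if __name__ == "__main__":
    # Toy check on a rank-1 "image": reconstruction from about 60% of the pixels.
    img = np.outer(np.linspace(0, 1, 32), np.linspace(0.2, 0.8, 32))
    out = me_preprocess(img, p=0.6, rank=1)
    print("mean abs error:", np.abs(out - img).mean())
```

Reconstructing each color channel separately with a shared mask is another assumption of this sketch; a larger `rank` preserves more detail, while a smaller one keeps only the coarse global structure.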


Paper


ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation
Yuzhe Yang, Guo Zhang, Zhi Xu, and Dina Katabi
International Conference on Machine Learning (ICML 2019)






Representative Results


Figure 1. Visualization of how ME affects the input images.


Figure 2. The approximate rank of different datasets (images are approximately low-rank).
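One common way to measure the approximate rank of an image is the number of singular values needed to retain a fixed fraction of its spectral energy. The sketch below uses that definition with a 90% threshold; both the definition and the threshold are assumptions and not necessarily the exact protocol behind Figure 2. The function names are hypothetical.

```python
import numpy as np


def approximate_rank(matrix, energy=0.9):
    """Smallest k such that the top-k singular values carry at least
    `energy` of the total spectral energy sum(sigma_i^2).
    Assumes a nonzero matrix."""
    s = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(s))  # guard against round-off when energy is close to 1


def mean_approximate_rank(images, energy=0.9):
    """Average approximate rank over a stack of H x W (single-channel) images."""
    return float(np.mean([approximate_rank(img, energy) for img in images]))
```

For color datasets, one option is to apply this per channel and average; a result much smaller than the image side length is what "approximately low-rank" means here.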


Figure 3. (a) & (b): Quantitative results of the distance within and among classes; (c) & (d): Qualitative results of the class separation.
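The within-class and among-class distances can be computed, for example, as mean pairwise Euclidean distances between samples that share a label and between samples that do not. This is a sketch under assumptions: the metric and the feature space (raw pixels, ME-reconstructed images, or network embeddings) are not specified here, and `class_distances` is a hypothetical helper.

```python
import numpy as np


def class_distances(features, labels):
    """Mean pairwise L2 distance within classes and among (between) classes.

    features: array of shape (N, ...), flattened per sample.
    labels:   integer array of shape (N,).
    """
    x = features.reshape(len(features), -1).astype(float)
    labels = np.asarray(labels)
    sq = np.sum(x**2, axis=1)
    # Pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * x @ x.T, 0.0))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(x), dtype=bool)  # exclude each sample's distance to itself
    within = d[same & off_diag].mean()
    among = d[~same].mean()
    return within, among
```

A larger gap between the among-class and within-class values indicates better class separation; the O(N^2) distance matrix limits this direct version to modest sample sizes.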


Figure 4. Adversarial robustness under PGD-based BPDA white-box attacks, with up to 1000 attack steps.
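BPDA (Backward Pass Differentiable Approximation) attacks a non-differentiable preprocessing step g by running g in the forward pass and treating it as the identity in the backward pass, which is reasonable when g(x) is roughly x, as for a reconstruction step. A minimal L∞ PGD loop with that approximation is sketched below. `grad_loss` (the gradient of the classifier loss with respect to its input) and `preprocess` are hypothetical callables supplied by the caller, and the defaults ε = 8/255 and step size 2/255 are common CIFAR-10 choices assumed here, not settings read off Figure 4.

```python
import numpy as np


def bpda_pgd(x, grad_loss, preprocess, eps=8 / 255, step_size=2 / 255, steps=10, seed=0):
    """L-infinity PGD with the BPDA identity approximation.

    The gradient is evaluated at preprocess(x_adv) but applied directly to
    x_adv, i.e. d preprocess / dx is approximated by the identity.
    """
    rng = np.random.default_rng(seed)
    # Random start inside the eps-ball, kept in the valid pixel range.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        g = grad_loss(preprocess(x_adv))          # forward through g, backward as identity
        x_adv = x_adv + step_size * np.sign(g)    # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv
```

Increasing `steps` (Figure 4 goes up to 1000) only strengthens the attack if the identity approximation gives useful gradients, which is why robustness under many BPDA steps is a meaningful test for a preprocessing defense.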


