03/05/2021

Provably robust classification of adversarial examples with detection

Fatemeh Sheikholeslami, Ali Lotfi, Zico Kolter

Keywords: Adversarial robustness, robust deep learning

Abstract: Adversarial attacks against deep networks can be defended against either by building robust classifiers or, by creating classifiers that can \emph{detect} the presence of adversarial perturbations. Although it may intuitively seem easier to simply detect attacks rather than build a robust classifier, this has not bourne out in practice even empirically, as most detection methods have subsequently been broken by adaptive attacks, thus necessitating \emph{verifiable} performance for detection mechanisms. In this paper, we propose a new method for jointly training a provably robust classifier and detector. Specifically, we show that by introducing an additional "abstain/detection" into a classifier, we can modify existing certified defense mechanisms to allow the classifier to either robustly classify \emph{or} detect adversarial attacks. We extend the common interval bound propagation (IBP) method for certified robustness under $\ell_\infty$ perturbations to account for our new robust objective, and show that the method outperforms traditional IBP used in isolation, especially for large perturbation sizes. Specifically, tests on MNIST and CIFAR-10 datasets exhibit promising results, for example with provable robust error less than $63.63\%$ and $67.92\%$, for $55.6\%$ and $66.37\%$ natural error, for $\epsilon=8/255$ and $16/255$ on the CIFAR-10 dataset, respectively.

 0
 1
 1
 0
This is an embedded video. Talk and the respective paper are published at ICLR 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers