Description
Adversarial attacks are maliciously crafted perturbations, whether perceptible or not, applied
to a benign image to force a classifier to misclassify it. Most existing defenses in the literature
address either imperceptible attacks or visible, localized ones, but rarely both.
We instead propose a universal defense: a framework for hardening models against both kinds of
attacks. We define our solution in terms of a generalized specification, making it flexible and
extensible. We then demonstrate how state-of-the-art reconstruction-based approaches can
be reformulated as a realization of our framework. We also propose our own implementation,
test it on an image classification task against various adversarial attacks, and identify
areas for improvement.
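To make the idea of a reconstruction-based realization of the framework concrete, here is a minimal sketch in Python/PyTorch. It is not the implementation described above: the class `ReconstructionDefense` and the toy reconstructor and classifier are hypothetical stand-ins, assuming only that the defense purifies the input with some reconstruction model before passing it to a frozen classifier.

```python
import torch
import torch.nn as nn


class ReconstructionDefense(nn.Module):
    """Hypothetical wrapper: purify the (possibly attacked) input with a
    reconstruction model before handing it to the classifier."""

    def __init__(self, reconstructor: nn.Module, classifier: nn.Module):
        super().__init__()
        self.reconstructor = reconstructor  # e.g. a denoising/inpainting network
        self.classifier = classifier        # the model being hardened

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1) Reconstruct the image, removing imperceptible noise or localized patches.
        x_hat = self.reconstructor(x)
        # 2) Classify the purified image as usual.
        return self.classifier(x_hat)


if __name__ == "__main__":
    # Toy stand-ins for the reconstruction and classification models.
    reconstructor = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1))
    classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    defense = ReconstructionDefense(reconstructor, classifier)
    logits = defense(torch.rand(1, 3, 32, 32))  # (1, 10) class scores
    print(logits.shape)
```

Any purification-style defense can then be swapped in by replacing the `reconstructor` module, which is the flexibility the framework is meant to capture.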