Extending FGSM to Other Norms
The Fast Gradient Sign Method, with perturbations limited by the or the norm.
- FGSM original definition
- FGSM as a maximum allowable attack problem
- FGSM with other norms
FGSM original definition
The Fast Gradient Sign Method (FGSM) by Goodfellow et al. (NIPS 2014) is designed to attack deep neural networks. The idea is to maximize certain loss function subject to an upper bound on the perturbation, for instance: .
Formally, we define FGSM as follows. Given a loss function , the FGSM creates an attack with:
Whereas the gradient is taken with respect to the input not the parameter . Therefore, should be interpreted as the gradient of with respect to evaluated at . It is the gradient of the loss function. And also, because of the bound on the perturbation magnitude, the perturbation direction is the sign of the gradient.
FGSM as a maximum allowable attack problem
Given a loss function , if there is an optimization that will result the FGSM attack, then we can generalize FGSM to a broader class of attacks. To this end, we notice that for general (possibly nonlinear) loss functions, FGSM can be interpreted by applying a first order of approximation:
where . Therefore, finding to maximize is approximately equivalent to finding which maximizes . Hence FGSM is the solution to:
which can be simplifed to:
where we flipped the maximization to minimization by putting a negative sign to the gradient. Here, we simplify this optimization problem's definition to:
Let and , for any and that , we have the following Inequality: .
We consider the Holder's inequality (the negative side), and so, we can show that:
and because we have:
Thus, the lower bound of is attained when . Therefore, the solution to the original FGSM optimization problem is:
And hence the perturbed data is .
FGSM with other norms
Considering FGSM as a maximum allowable attack problem, we can easily generalize the attack to other norms. Consider the norm. In this case, the Holder's inequality equation becomes:
and thus, is minimized when . As a result, the perturbed data becomes: