The CW Attack Algorithm


Brief introduction

The CW (Carlini-Wagner) attack is a classic adversarial attack that uses two separate losses:

  • An adversarial loss that makes the generated image actually adversarial, i.e., capable of fooling image classifiers.
  • An image distance loss that keeps the adversarial example close to the original image, so that the perturbation is not too obvious to the naked eye.

This paradigm allows the CW attack and its variants to be combined with many other image quality metrics, such as PSNR or SSIM.

Original CW attack algorithm

When adversarial examples were first discovered in 2013, the optimization problem to craft adversarial examples was formulated as:

$$
\begin{aligned}
\text{minimize}: &\ \mathcal{D}(x, x+\delta) \\
\text{such that}: &\ \mathcal{C}(x+\delta) = t && \text{(Constraint 1)} \\
&\ x+\delta \in [0,1]^n && \text{(Constraint 2)}
\end{aligned}
$$

Where:

  • $x$ is the input image, $\delta$ is the perturbation, $n$ is the dimension of the image, and $t$ is the target class.
  • The function $\mathcal{D}$ serves as the distance metric between the adversarial image and the real image, and the function $\mathcal{C}$ is the classifier.

The traditional, well-known way to solve such an optimization problem is to define an objective function and perform gradient descent on it, which eventually guides us to an optimal point. However, the formulation above is difficult to solve because the constraint $\mathcal{C}(x+\delta)=t$ is highly non-linear (the classifier is not a straightforward linear function).

In CW, we express Constraint 1 in a different form: we define an objective function $f$ such that $\mathcal{C}(x+\delta)=t$ is satisfied if and only if $f(x+\delta) \leq 0$.

Constructing the function $f$

Conceptually, the objective function tells us how close we are to being classified as $t$. One simple, but not very good, choice for $f$ is:

$$
f(x+\delta) = 1 - \mathcal{C}(x+\delta)_t
$$

Where $\mathcal{C}(x+\delta)_t$ is the probability of $x+\delta$ being classified as $t$. If that probability is low, the value of $f$ is close to 1, whereas when the model confidently classifies the input as $t$, $f$ approaches 0. This illustrates how the objective function works, but it is not a choice we would use in a real implementation.
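As a quick illustration, this naive objective could be sketched in PyTorch roughly as follows (the function name `naive_objective` and the assumption that `model` returns raw logits are mine, not part of the original work):

```python
import torch

def naive_objective(model, x_adv, target):
    """Naive objective: 1 minus the softmax probability of the target class.

    Assumes `model` returns raw logits of shape (batch, num_classes);
    the value shrinks towards 0 as the model grows more confident that
    `x_adv` belongs to class `target`.
    """
    probs = torch.softmax(model(x_adv), dim=1)
    return 1.0 - probs[torch.arange(x_adv.size(0)), target]
```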

In the original paper, seven different objective functions are assessed, and the best among them is given by:

$$
f(x') = \max\left(\max\{Z(x')_i : i \neq t\} - Z(x')_t,\ -k\right)
$$

Where:

  • $Z(x')$ is the vector of logits (the model's raw, unnormalized scores for each class, before the softmax) when the input is the adversarial example $x'$.
  • $\max\{Z(x')_i : i \neq t\}$ is the largest logit among all classes other than the target, i.e., the class the model currently considers most likely apart from $t$, while $Z(x')_t$ is the logit of the target class (representing how confident the model is that the adversarial example is the target).
  • So $\max\{Z(x')_i : i \neq t\} - Z(x')_t$ is the difference between what the model thinks the current image most probably is and what we want it to think.

The above term is the difference of two logit values, so when we introduce the term $-k$ and take a max, we set a lower limit on the value of the loss. Hence, by controlling the parameter $k$ we can specify how confidently we want our adversarial example to be classified as the target.
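A minimal PyTorch sketch of this objective might look like the following (the function name and the shape conventions are my own assumptions, not taken from the original implementation):

```python
import torch

def cw_objective(logits, target, k=0.0):
    """CW objective f(x') = max(max{Z(x')_i : i != t} - Z(x')_t, -k).

    `logits` is the raw, pre-softmax output Z(x') of shape (batch, num_classes),
    `target` holds the target class indices, and `k` controls how confidently
    the adversarial example must be classified as the target.
    """
    one_hot = torch.nn.functional.one_hot(target, logits.size(1)).bool()
    target_logit = logits[one_hot]                                  # Z(x')_t
    best_other = logits.masked_fill(one_hot, float("-inf")).max(dim=1).values
    return torch.clamp(best_other - target_logit, min=-k)
```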

Formulating the loss function

We then reformulate the original optimization problem by moving the difficult constraint into the minimization objective.

$$
\begin{aligned}
\text{minimize}: &\ \mathcal{D}(x, x+\delta) + c \cdot f(x+\delta) \\
\text{such that}: &\ x+\delta \in [0,1]^n && \text{(Constraint 2)}
\end{aligned}
$$

Here we introduce a constant $c$ to formulate our final loss function, and by doing so we are left with only one of the prior two constraints. The constant $c$ is best found by binary search, most often with a lower bound of $1\times 10^{-4}$ and an upper bound of $+\infty$.

The best constant

In my own experiments, I have found that the best constant often lies between 1 and 2.
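To make the search for $c$ concrete, here is a rough sketch of how such a binary search could be wired up; `run_attack` is a hypothetical routine that runs the optimization for a fixed $c$ and reports whether the attack succeeded, and the finite upper bound stands in for $+\infty$:

```python
def binary_search_c(run_attack, x, target, c_lo=1e-4, c_hi=1e10, steps=9):
    """Binary search for the smallest constant c that still yields a
    successful adversarial example.

    `run_attack(x, target, c)` is assumed to return (x_adv, success).
    """
    best_adv, best_c = None, None
    for _ in range(steps):
        c = (c_lo + c_hi) / 2
        x_adv, success = run_attack(x, target, c)
        if success:
            best_adv, best_c = x_adv, c
            c_hi = c          # attack worked: try a smaller c
        else:
            c_lo = c          # attack failed: need a larger c
    return best_adv, best_c
```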

Solving the last constraint

After formulating our final loss function, we are presented with this final constraint:

$$
x+\delta \in [0,1]^n
$$

A constraint of this form is known as a "box constraint", meaning that both an upper bound and a lower bound are imposed. To handle it, we apply a method called "change of variable", in which we optimize over $w$ instead of the original variable $\delta$, where $w$ is given by:

$$
\begin{aligned}
\delta &= \frac{1}{2}\left(\tanh(w) + 1\right) - x \\
\text{or,}\quad x+\delta &= \frac{1}{2}\left(\tanh(w) + 1\right)
\end{aligned}
$$

Where $\tanh$ is the hyperbolic tangent function; as $\tanh(w)$ varies from $-1$ to $1$, $x+\delta$ varies from $0$ to $1$.
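In code, this change of variable could be sketched as follows (the helper names and the small epsilon used to keep `atanh` finite are my own additions):

```python
import torch

def to_w(x, eps=1e-6):
    """Map an image x in [0, 1]^n to the unconstrained variable w,
    using the inverse of x = 0.5 * (tanh(w) + 1)."""
    x = x.clamp(eps, 1 - eps)          # keep atanh away from +/- infinity
    return torch.atanh(2 * x - 1)

def to_image(w):
    """Map w back to image space; the result always lies in [0, 1]^n,
    so the box constraint is satisfied by construction."""
    return 0.5 * (torch.tanh(w) + 1)
```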

Therefore, our final optimization problem is:

$$
\begin{aligned}
\text{minimize}: &\ \mathcal{D}\!\left(\frac{1}{2}(\tanh(w)+1),\ x\right) + c \cdot f\!\left(\frac{1}{2}(\tanh(w)+1)\right) \\
\text{such that}: &\ \tanh(w) \in [-1,1] \\
\text{where}: &\ f(x') = \max\left(\max\{Z(x')_i : i \neq t\} - Z(x')_t,\ -k\right)
\end{aligned}
$$

The CW attack is the solution to the optimization problem above (optimized over $w$) using the Adam optimizer. To avoid gradient descent getting stuck in poor local minima, the solver runs gradient descent from multiple starting points.
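Putting the pieces together, a minimal single-run CW $L_2$ attack loop might look like the sketch below, reusing the hypothetical `cw_objective`, `to_w`, and `to_image` helpers from above; a full implementation would additionally binary-search $c$ and restart from multiple starting points:

```python
import torch

def cw_attack(model, x, target, c=1.0, k=0.0, steps=1000, lr=0.01):
    """Minimal CW L2 attack on a batch of images, optimized over w with Adam."""
    w = to_w(x).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    best_adv = x.clone()
    best_dist = torch.full((x.size(0),), float("inf"), device=x.device)

    for _ in range(steps):
        x_adv = to_image(w)
        logits = model(x_adv)
        dist = (x_adv - x).flatten(1).pow(2).sum(dim=1)   # D: squared L2 distance
        f_val = cw_objective(logits, target, k)           # adversarial loss term
        loss = (dist + c * f_val).sum()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Keep the least-distorted example that is classified as the target.
        with torch.no_grad():
            success = logits.argmax(dim=1) == target
            improved = success & (dist < best_dist)
            best_dist = torch.where(improved, dist, best_dist)
            best_adv[improved] = x_adv[improved]

    return best_adv
```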

