Dealing with Adversarial Inputs for Image Classification

Recently, I came across a paper, Denoised Smoothing: A Provable Defense for Pretrained Classifiers, that proposes a method for dealing with adversarial examples. The method creates multiple noisy copies of an input image, filters them with a custom-trained denoiser, and feeds them to a trained classification model. The class label assigned to the input image is the majority label among the predictions for the original input and its copies. The figure below, taken from the referenced paper, captures the essence of this approach.

Motivated by this work, I decided to experiment with a simpler approach based on the smoothing idea to see how well it copes with adversarial inputs to trained models such as ResNet. Before I delve into details, let me go over the basic terminology related to adversarial inputs.

Adversarial inputs are inputs that have been purposely manipulated to make a trained model produce an incorrect prediction. The manipulation of the input is known as an adversarial attack, and such attacks fall into two categories. Black box attacks use adversarial inputs generated without any information about the trained classification model. White box attacks, on the other hand, use adversarial inputs created with access to the parameters and gradients of the trained network. There is another distinction worth making. As an attacker, you might simply want the trained model to give incorrect predictions without caring what those predictions are; such attacks are known as untargeted attacks. Alternatively, you might want to perturb the input so that the model predicts a specific target class; such attacks are called targeted attacks.

Let us take a look at an example of an adversarial attack. We will use the pretrained ResNet50 model and a picture of a convertible to demonstrate the attack. First, let us load an example image for illustration.

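from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

# Convert the PIL image to a float tensor of shape (C, H, W) with values in [0, 1].
pil2torch = transforms.ToTensor()
img = Image.open("Convertible.jpg")
img_t = pil2torch(img)

# matplotlib expects (H, W, C), so move the channel axis last.
plt.imshow(img_t.numpy().transpose(1, 2, 0))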

Before feeding the convertible image, we will normalize it and size it to 224×224 pixels as required by the trained model.

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

Next, we load the pretrained ResNet50 model. The model outputs a 1000-dimensional vector of class scores, one for each of the 1000 ImageNet classes. The index of the highest score in this vector gives the class label, and the score itself serves as an unnormalized confidence value. By normalizing the scores via the softmax function, we can also generate class probabilities, if desired.

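import json
import torch
from torchvision import models

# Load the pretrained ResNet50 (newer torchvision versions prefer the `weights=` argument).
model = models.resnet50(pretrained=True)
model.eval()

# Map ImageNet class indices to human-readable labels.
with open("imagenet_class_index.json") as f:
    imagenet_classes = {int(i): x[1] for i, x in json.load(f).items()}

def show_class_and_confidence(img_n):
    with torch.no_grad():  # inference only, no gradients needed
        confidences = model(img_n.unsqueeze(0))  # add a batch dimension
    class_idx = torch.argmax(confidences, dim=1).item()
    print(imagenet_classes[class_idx], ' | ', confidences[0, class_idx].item())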
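The softmax normalization mentioned above can be sketched as follows. This is a minimal NumPy illustration on made-up scores for three classes, not on the ResNet50 output; the helper name `softmax` and the toy values are assumptions for the example. With the model above, the same normalization should be available as `torch.softmax(confidences, dim=1)`.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=np.float64)
    shifted = scores - scores.max()  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical scores for three classes (not taken from the model above).
toy_scores = [2.0, 1.0, 0.1]
probs = softmax(toy_scores)
print(probs, probs.sum())
```

Softmax preserves the ranking of the scores, so the predicted class is the same whether we take the argmax of the raw scores or of the probabilities.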

Now, let’s input the convertible image to the trained model and look at the prediction.

import torch
show_class_and_confidence(transform(img_t))
convertible  |  19.31500816345215

Wow! ResNet50 correctly predicts that the input image is a convertible. Now, let’s corrupt the image by adding noise. We will use the function torch.randn_like(img_t), which returns a tensor of the same size as img_t. The entries in this tensor are drawn from a normal distribution with zero mean and unit variance. The multiplier 0.125 in the expression below controls the amount of noise being added. Furthermore, we clamp the resulting noisy image tensor to lie between 0 and 1.

img_n = torch.clamp((img_t +0.125*(torch.randn_like(img_t))),min=0.0, max=1.0)

With the noise added, the convertible image looks as shown below; it is not much different from the clean image because the noise level is low.

When fed the noisy image, ResNet50 still recognizes it correctly but with slightly lower confidence. This should not be surprising as the amount of noise is small.

convertible  |  19.089618682861328

Let’s increase the amount of noise by changing the noise multiplier to 0.25. The image is still recognized as a convertible, but the confidence has gone down further.

convertible  |  15.005626678466797

With the noise multiplier increased to 0.375, the model incorrectly recognizes the image as a golfcart.

golfcart  |  11.107773780822754

When this step is repeated, the model is consistently wrong in its predictions, although the noisy image at this level of noise, shown below, is still clearly recognizable to us as a convertible.

As this example demonstrates, it is easy to make trained deep learning models give incorrect predictions by adding noise to input images. So the question is: how do we defend predictive models against such attacks?

## Smoothing to the Rescue

It is common practice in image processing to apply averaging or smoothing to suppress noise in images. The simple averaging filter of size 3×3 for a single-channel image consists of a 3×3 kernel whose elements are all equal to 1/9. A more commonly preferred smoothing filter uses a Gaussian kernel whose elements are determined by the 2-dimensional Gaussian function, expressed by the following formula, where σ is the standard deviation and (x, y) are the offsets from the kernel center:

$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$

Gaussian filters of several mask sizes are popular. A 5×5 kernel mask approximating the 2-D Gaussian with a sigma of 1 is shown below. The different numbers in the mask represent the weights that are assigned to the underlying pixel values while doing the averaging. The division by 273, the sum of all kernel elements, is done to normalize the result.
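Since the mask figure is not reproduced here, the sketch below writes out the widely quoted 5×5 integer approximation of a Gaussian with sigma of 1 whose entries sum to 273. I am assuming this is the mask the text refers to; the variable names are only illustrative.

```python
import numpy as np

# Commonly quoted integer approximation of a 5x5 Gaussian with sigma = 1
# (assumed to be the mask described in the text).
kernel_int = np.array([
    [1,  4,  7,  4, 1],
    [4, 16, 26, 16, 4],
    [7, 26, 41, 26, 7],
    [4, 16, 26, 16, 4],
    [1,  4,  7,  4, 1],
])

kernel_sum = kernel_int.sum()     # normalizing constant
kernel = kernel_int / kernel_sum  # weights now sum to 1

print(kernel_sum)  # 273
```

Dividing by the sum keeps the overall brightness of the image unchanged: a flat region with constant pixel value stays at that value after filtering.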

We should remember that smoothing with a Gaussian kernel is a convolution operation, and that it can be split into two 1-dimensional convolutions, one along rows and one along columns, because the 2-D Gaussian is separable: it factors into the product of a 1-D Gaussian in x and a 1-D Gaussian in y.
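The separability property can be checked numerically. The following is a minimal sketch that samples a Gaussian on a 5-point grid (the helper names `gaussian_1d` and `gaussian_2d` are my own, not from any library) and confirms that the outer product of the 1-D kernel with itself reproduces the 2-D kernel.

```python
import numpy as np

def gaussian_1d(size=5, sigma=1.0):
    """Sampled 1-D Gaussian, normalized so the weights sum to 1."""
    x = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def gaussian_2d(size=5, sigma=1.0):
    """Sampled 2-D Gaussian, normalized so the weights sum to 1."""
    x = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(x, x)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

g1 = gaussian_1d()
g2 = gaussian_2d()

# The 2-D kernel equals the outer product of the 1-D kernel with itself.
separable = np.allclose(np.outer(g1, g1), g2)
print(separable)  # True
```

In practice this means a k×k Gaussian blur costs about 2k multiplications per pixel instead of k², which is why libraries typically implement it as two 1-D passes.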

One way to perform Gaussian smoothing or blurring in PyTorch is to use the Kornia library, which was developed for computer vision tasks. The library offers many operators that work directly on tensors and can be inserted into neural networks, covering image transformations as well as low-level image processing such as filtering and edge detection.

So let us apply the 5×5 Gaussian filtering to the noisy convertible image shown above and display the result.

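import kornia

# 5x5 Gaussian kernel with sigma = 1 along both axes.
gauss = kornia.filters.GaussianBlur2d((5, 5), (1., 1.))
# The filter expects a batch dimension: (B, C, H, W).
img_s = gauss(img_n.unsqueeze(0))
plt.imshow(img_s.squeeze(0).numpy().transpose(1, 2, 0))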

Comparing the filtered image with the noisy convertible image, we note that the image has been slightly blurred. Let us check whether the blurring would impact the output of our trained ResNet50 model. So we input the blurred noisy image and look at the prediction.

show_class_and_confidence(transform(img_s.squeeze(0)))
convertible  |  12.393182754516602

Lo and behold, the model yields a correct prediction. Thus, it looks possible that Gaussian smoothing might help in defending against black box adversarial attacks. However, before getting too excited, let us check what the prediction will be if we blur the original convertible image and feed that to the pretrained ResNet50 model.

img_s = gauss(img_t.unsqueeze(0))
show_class_and_confidence(transform(img_s.squeeze(0)))
convertible  |  16.06728744506836

So the blurred original image is still correctly recognized, with a confidence value (16.07) somewhat lower than that of the unblurred convertible image (19.32). Thus, it appears that smoothing will not lead to misrecognition if the input was not manipulated, but it will help recognition when the input has been manipulated.

## Strategy for Using Smoothing

Having seen through the example above that smoothing can help in dealing with adversarial attacks, the question is how to make use of it. For this, I suggest a simple approach that does not need any additional training and performs well, as shown by an experiment that I will describe later. The suggested scheme consists of the following steps:

• Feed the input image to the pretrained model and note the model prediction.
• Perform Gaussian smoothing of the input image and feed the resulting single blurred image to the pretrained model and note the model prediction.
• Perform Gaussian smoothing of the single blurred image to get a double blurred image. Feed the double blurred image to the model and note the prediction. Remember that doing another round of smoothing on an already blurred image is equivalent to a single smoothing with a larger kernel. For example, blurring twice with a 5×5 mask is equivalent to blurring once with the 9×9 mask obtained by convolving the 5×5 mask with itself; for Gaussian kernels the variances add, so two passes with sigma of 1 correspond to one pass with sigma of about 1.41.
• Check whether any class label is in the majority among the three predictions generated for the input image. Take that majority label as the final prediction of the model. Reject the input image if there is no majority label.
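How two rounds of smoothing combine into one can be checked with a small 1-D calculation (the 2-D case follows from separability). This is a sketch using a sampled Gaussian rather than Kornia's exact kernel, and the helper `gaussian_1d` is my own; it also ignores image borders, where padding makes the equivalence only approximate.

```python
import numpy as np

def gaussian_1d(size, sigma):
    """Sampled 1-D Gaussian, normalized so the weights sum to 1."""
    x = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

g5 = gaussian_1d(5, 1.0)

# Blurring twice is the same as blurring once with the kernel convolved with itself.
double = np.convolve(g5, g5)  # "full" convolution: length 5 + 5 - 1 = 9

# Variance (spread) of each kernel around its center.
x5 = np.arange(g5.size) - (g5.size - 1) / 2.0
x9 = np.arange(double.size) - (double.size - 1) / 2.0
var_single = np.sum(g5 * x5 ** 2)
var_double = np.sum(double * x9 ** 2)

print(double.size)              # 9
print(var_double / var_single)  # ratio of the two variances
```

The combined kernel has nine taps and twice the variance of the single kernel, so the double blurred image is noticeably smoother than the single blurred one.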

The above strategy for recognition using smoothing is illustrated by the figure given below. The reasoning behind the suggested strategy is straightforward. If the input image has not been manipulated, then the predictions on the input and its single blurred version are likely to be identical, yielding a majority; the prediction on the double blurred version will not matter in this case. On the other hand, the predictions on the single and double blurred versions are likely to be identical if the input image has been moderately to heavily manipulated. If the input image is corrupted with a much higher noise level, then all three predictions may differ and cause the input to be rejected. Of course, for some noisy images the majority prediction after smoothing will itself be incorrect, giving an incorrect output.
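The majority rule itself takes only a few lines. Below is a minimal sketch operating on three already computed labels; the function name `majority_label` and the use of `None` to signal a rejected input are my own choices for illustration.

```python
from collections import Counter

def majority_label(pred_ns, pred_ss, pred_ds):
    """Return the label predicted at least twice, or None to reject the input.

    pred_ns, pred_ss, pred_ds: labels predicted for the unsmoothed,
    single blurred, and double blurred versions of the image.
    """
    label, count = Counter([pred_ns, pred_ss, pred_ds]).most_common(1)[0]
    return label if count >= 2 else None

# Clean input: original and single blurred versions agree.
print(majority_label("convertible", "convertible", "golfcart"))  # convertible
# Noisy input: the two blurred versions agree and outvote the original.
print(majority_label("golfcart", "convertible", "convertible"))  # convertible
# Heavily corrupted input: no agreement, so the input is rejected.
print(majority_label("golfcart", "convertible", "beach_wagon"))  # None
```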

## Results from a Small Experiment

Now, let us look at how the smoothing strategy performed in an experiment. I ran the experiment on 1,000 images downloaded from https://github.com/EliSchwartz/imagenet-sample-images, a repository that contains one sample image per category. It turned out that 25 of these images are gray-level images, so I used only the remaining 975 images. I made four runs of the experiment. In the first run, the images were not perturbed. In the remaining three runs, I added noise at three levels: low (σ = 0.125), medium (σ = 0.25), and high (σ = 0.375). I recorded the classification accuracy on the unsmoothed images, on their single and double blurred versions, and under the majority rule used to determine the final prediction.

The results using the 5×5 Gaussian kernel with sigma of one are shown in the table below. Repeating the experiment with a 7×7 mask resulted in similar numbers. Acc_NS stands for the accuracy of the ResNet50 model with no smoothing of the input images; this is the baseline accuracy for each row in the table. Acc_SS is the accuracy of the model on single blurred input images. Similarly, Acc_DS is the accuracy of the model on double blurred images. Acc_Maj is the accuracy obtained when we take the majority label over the three versions of each input image: no smoothing, single smoothing, and double smoothing. The Reject Rate is the percentage of images for which no predicted class label was in the majority.
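For concreteness, here is a sketch of how the five columns could be computed from per-image predictions. The function `evaluate`, the record layout, and the toy labels are all hypothetical, and I am assuming that a rejected image counts as not correct for Acc_Maj, i.e. every accuracy is taken over all images; a different convention would change the Acc_Maj numbers.

```python
from collections import Counter

def evaluate(records):
    """records: list of (true_label, pred_ns, pred_ss, pred_ds) tuples.

    Returns percentages for Acc_NS, Acc_SS, Acc_DS, Acc_Maj and Reject Rate.
    Assumption: rejected images count as incorrect for Acc_Maj.
    """
    n = len(records)
    correct = {"Acc_NS": 0, "Acc_SS": 0, "Acc_DS": 0, "Acc_Maj": 0}
    rejected = 0
    for true, ns, ss, ds in records:
        correct["Acc_NS"] += int(ns == true)
        correct["Acc_SS"] += int(ss == true)
        correct["Acc_DS"] += int(ds == true)
        label, count = Counter([ns, ss, ds]).most_common(1)[0]
        if count < 2:
            rejected += 1                 # no majority: reject the image
        elif label == true:
            correct["Acc_Maj"] += 1
    results = {name: 100.0 * c / n for name, c in correct.items()}
    results["Reject Rate"] = 100.0 * rejected / n
    return results

# Four made-up images with made-up labels, only to exercise the function.
toy_records = [
    ("a", "a", "a", "a"),  # everything correct
    ("b", "c", "b", "b"),  # smoothing fixes the prediction
    ("c", "a", "b", "d"),  # three different labels: rejected
    ("d", "d", "e", "e"),  # smoothing produces a wrong majority
]
toy_results = evaluate(toy_records)
print(toy_results)
```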

Looking at the results, we see that the ResNet50 baseline accuracy on this sample of ImageNet images is 88.2%, although the accuracy of the ResNet50 model from the PyTorch repository on the complete ImageNet set is about 75%. The higher accuracy in the current results can be attributed to the particular sample of images used in the experiment; however, the absolute accuracy should not matter as we are interested in the impact of noise and smoothing. Going across the No Noise row of the table, we see that single smoothing, or the majority rule with single and double smoothing, gives performance close to the baseline, although Acc_DS, if used alone to make the final prediction, is about 7% lower. Looking at the remaining three rows of the table, we see that smoothing does provide a significant improvement in accuracy. Predictions made with single or double smoothing on their own already recover a reasonable part of the accuracy lost to black box attacks. Making predictions using the majority rule offers even better accuracy; however, this improvement comes at the cost of rejecting a number of inputs when no predicted label is in the majority.

Although a direct comparison with the results reported in Denoised Smoothing: A Provable Defense for Pretrained Classifiers is not possible because of the much smaller dataset used in my experiment, one potential point for a rough comparison is the performance under the medium level of noise. In the scheme shown here, the accuracy on noisy images with σ = 0.25 and without any smoothing is 34.25%, which rises to between 50% and 56% with smoothing and the majority rule. The corresponding numbers for the customized denoising scheme are 69% and 48%, meaning that its performance with noise and customized smoothing goes down.

Summarizing what I have seen while playing with the simple smoothing scheme, the following observations can be made:

• Smoothing does help in defending against adversarial black box attacks.
• Combining no smoothing, single smoothing, and double smoothing with a majority rule significantly improves accuracy in the presence of adversarial attacks, but the improvement comes with a reject trade-off.
• Use single smoothing to improve performance where a reject option is not viable.