Abstract:
Class Activation Mappings (CAMs) are a popular family of methods for creating visual explanations of the reasons behind a network's prediction. These techniques create explanations by weighting and visualising the output of the final convolutional layer. Recent CAM techniques have sought to improve these explanations by introducing methods that aim to produce weights that more accurately represent how the network informs its prediction. However, none of these methods address the low spatial resolution of the final convolutional layer, leading to coarse explanations. In this paper, we propose Jitter-CAM, a method for producing and combining multiple CAM explanations that allows us to create explanations with a higher spatial resolution than previous comparable methods. We use ImageNet and a number of well-known architectures to show that our technique produces explanations that are both more accurate and better at localising the target object. Anonymous code for Jitter-CAM is available at https://github.com/HartleyTW/Jitter-CAM.