Abstract:
We introduce a framework that uses a key-value memory structure to improve class activation map (CAM) based visual explanations of deep networks. We identify conditions inherent in several datasets, such as imbalanced data and multi-object co-occurrence, that degrade the explanation quality of existing CAM-based methods, and we address them with the proposed framework. The proposed Bias-reducing memory module learns spatial feature representations of different classes from trained networks and stores distinct semantic information in separate memory slots, without requiring any modification to the existing networks. Furthermore, we propose a novel visual explanation method, accompanied by a memory slot searching algorithm, that retrieves semantically relevant spatial feature representations from the memory module and produces visual explanations of network decisions. We evaluate the framework on datasets that exhibit these challenging conditions, including several medical image datasets and multi-label classification datasets, and compare it qualitatively and quantitatively with existing CAM-based methods to demonstrate its strengths.
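
To make the two ingredients named above concrete, the following is a minimal sketch of a standard CAM computation combined with a generic soft key-value memory read. It is not the proposed module or its slot searching algorithm: the function names, array shapes, dot-product similarity, and softmax read are all assumptions chosen for illustration only.

```python
import numpy as np


def class_activation_map(features, class_weights):
    """Standard CAM: sum the feature maps, each weighted by the class weight.

    features: (C, H, W) activations from the last convolutional layer.
    class_weights: (C,) weights of the target class in the final linear layer.
    """
    cam = np.tensordot(class_weights, features, axes=([0], [0]))  # (H, W)
    # Display choice (assumption): keep positive evidence and rescale to [0, 1].
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def read_memory(query, keys, values):
    """Generic soft key-value read (hypothetical stand-in for slot retrieval).

    query: (D,) descriptor of the input; keys: (S, D), one key per slot;
    values: (S, C, H, W), one stored spatial feature representation per slot.
    """
    scores = keys @ query                # (S,) dot-product similarity
    scores = scores - scores.max()       # stabilise the softmax
    attn = np.exp(scores) / np.exp(scores).sum()
    retrieved = np.tensordot(attn, values, axes=([0], [0]))  # (C, H, W)
    return retrieved, attn


# Toy usage with made-up sizes: 4 slots, 8-dim keys, 16 channels, 7x7 maps.
rng = np.random.default_rng(0)
keys = np.eye(4, 8)                      # orthonormal keys for a clear example
values = rng.normal(size=(4, 16, 7, 7))
query = 5.0 * keys[2]                    # a query aligned with slot 2
retrieved, attn = read_memory(query, keys, values)
cam = class_activation_map(retrieved, rng.normal(size=16))
```

In this toy setup the read weights concentrate on the slot whose key matches the query, and the map is computed from the retrieved representation rather than from the network's own activations. That is the general idea of explaining a decision from stored, class-specific features; how slots are actually written and searched is left to the method itself.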