Abstract:
Leveraging distant contextual information and the self-similarity of natural images is important for deep learning-based models to produce high-quality completions of images with large missing regions. Most deep generative adversarial network (GAN)-based image completion methods attempt this by enlarging the receptive fields of their convolutions and integrating an attention module. However, existing attention mechanisms apply the same attention softness to all types of features, which can be suboptimal: a single softness may concentrate attention on a limited set of spatial locations in feature space. To address this limitation, we design a new two-stage image completion model and propose an attention mechanism called Adaptive multi-Temperature Mask-guided Attention (ATMA). ATMA performs non-local processing and controls the softness of attention through multiple self-adaptive temperatures. The proposed model infers a coarse inpainting result with a gated convolutional neural network in the first stage, and refines the appearance consistency between generated and known regions via ATMA in the second stage. Experiments on benchmark datasets, including CelebA-HQ and Paris StreetView, demonstrate superior performance compared with state-of-the-art methods.
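The core mechanism behind controlling attention softness, a temperature applied inside the softmax, can be illustrated with a minimal sketch. This is not the ATMA formulation itself (the mask guidance and per-feature adaptive temperatures are omitted); the function and values below are purely illustrative:

```python
import numpy as np

def temperature_softmax(scores, t):
    # Lower temperature -> sharper (harder) attention over positions;
    # higher temperature -> softer, more uniform attention.
    z = scores / t
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative attention scores over three spatial locations.
scores = np.array([2.0, 1.0, 0.5])
soft = temperature_softmax(scores, t=2.0)   # flatter distribution
hard = temperature_softmax(scores, t=0.25)  # mass concentrated on the top score
```

With a single fixed temperature, every feature channel receives the same trade-off between the peaked and flat regimes above; making the temperature adaptive lets the model choose the softness per feature type.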