Abstract:
We present SA-MobileNet, a self-attention MobileNet for the fine-grained image tilt correction problem. SA-MobileNet integrates self-attention modules into the inverted bottleneck blocks of the MobileNetV3 model, which lets it model both channel-wise and spatial attention over the image features and, at the same time, introduces a novel self-attention architecture for low-resource devices. We treat image tilt correction as a multi-label problem in which we predict multiple angles for a tilted input image within a narrow interval of 1 or 2 degrees, depending on the dataset used. Combining this formulation with the proposed architecture, we report state-of-the-art results for detecting the image tilt angle on mobile devices, compared with the MobileNetV3 model. SA-MobileNet is more accurate than MobileNetV3 on the SUN397, NYU-V1, and ADE20k datasets by 6.42, 10.51, and 9.09 percentage points, respectively. Furthermore, the proposed architecture is approximately 4 ms faster than the MobileNetV3 model on a Snapdragon 750 octa-core processor, despite a slight increase in the number of parameters.
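As a rough illustration of the multi-label formulation, the sketch below builds a multi-hot target that marks every angle bin lying within a narrow interval around the true tilt, and decodes a prediction by averaging the active bins. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names, the angle range of -30 to 30 degrees, the 0.5-degree bin step, the decision threshold, and the averaging decoder are all hypothetical choices made for the example.

```python
# Hypothetical sketch of a multi-label tilt target; all defaults are assumptions.

def multi_hot_tilt_label(angle_deg, min_deg=-30.0, max_deg=30.0,
                         step_deg=0.5, width_deg=1.0):
    """Return a multi-hot list with 1 for each bin center within
    width_deg / 2 of the true tilt angle, and 0 elsewhere."""
    n_bins = int(round((max_deg - min_deg) / step_deg)) + 1
    half_width = width_deg / 2.0
    label = []
    for i in range(n_bins):
        center = min_deg + i * step_deg
        # Small tolerance so bins exactly on the interval edge are included.
        label.append(1 if abs(center - angle_deg) <= half_width + 1e-9 else 0)
    return label


def decode_tilt(scores, min_deg=-30.0, step_deg=0.5, threshold=0.5):
    """Average the centers of all bins whose score reaches the threshold.
    Returns None when no bin is active."""
    active = [min_deg + i * step_deg
              for i, score in enumerate(scores) if score >= threshold]
    return sum(active) / len(active) if active else None


if __name__ == "__main__":
    target = multi_hot_tilt_label(3.0)   # 1-degree interval around 3 degrees
    print(sum(target))                   # number of active bins
    print(decode_tilt(target))           # angle recovered from the target
```

With a 1-degree interval and 0.5-degree bins, a tilt of 3 degrees activates the bins centered at 2.5, 3.0, and 3.5 degrees; widening `width_deg` to 2.0 would correspond to the 2-degree setting.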