Abstract:
Recently, many notable convolutional neural networks have achieved strong performance with compact and efficient structures. To pursue further performance improvements, previous methods either introduce additional computation or design complex modules. In this paper, we propose an elegant weight-sharing based ensemble network with embedded knowledge distillation (EKD-FWSNet) to enhance the generalization ability of baseline models without increasing computation or adding complex modules. Specifically, we first design an auxiliary branch alongside the baseline model, and then place branch points and shortcut connections between the two branches to construct diverse forward paths. In this way, we form a weight-sharing ensemble network that produces multiple output predictions. Furthermore, we integrate the information from the diverse posterior probabilities and intermediate feature maps, and transfer it to the baseline model through a knowledge distillation strategy. Extensive image classification experiments on the CIFAR-10/100 and tiny-ImageNet datasets demonstrate that the proposed EKD-FWSNet helps numerous baseline models improve accuracy by a large margin (sometimes more than 4%). We also conduct extended experiments on remote sensing datasets (AID, NWPU-RESISC45, UC-Merced) and achieve state-of-the-art results.
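
As a rough, hypothetical illustration of the idea summarized above (not the authors' exact EKD-FWSNet architecture), the minimal PyTorch sketch below builds a two-branch network that shares an early stage, averages the branch predictions into an ensemble, and distills the ensemble back into the baseline output with a temperature-scaled KL term; all module and loss names are assumptions for illustration only.

```python
# Hypothetical sketch: baseline + auxiliary branch sharing an early stage,
# with the ensemble prediction distilled into the baseline output.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        # Stage shared by both branches (weight sharing before the branch point).
        self.stage1 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        # Baseline and auxiliary second stages.
        self.stage2_main = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage2_aux = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc_main = nn.Linear(64, num_classes)
        self.fc_aux = nn.Linear(64, num_classes)

    def forward(self, x):
        f = self.stage1(self.stem(x))  # shared features up to the branch point
        logits_main = self.fc_main(self.pool(self.stage2_main(f)).flatten(1))
        logits_aux = self.fc_aux(self.pool(self.stage2_aux(f)).flatten(1))
        return logits_main, logits_aux

def ekd_loss(logits_main, logits_aux, targets, T=4.0, alpha=0.5):
    # Hard-label loss on every forward path, plus a distillation term that
    # transfers the soft ensemble prediction to the baseline output.
    ce = F.cross_entropy(logits_main, targets) + F.cross_entropy(logits_aux, targets)
    ensemble = (logits_main + logits_aux) / 2.0
    kd = F.kl_div(F.log_softmax(logits_main / T, dim=1),
                  F.softmax(ensemble.detach() / T, dim=1),
                  reduction="batchmean") * (T * T)
    return ce + alpha * kd
```

At test time only the baseline path (`stage2_main`, `fc_main`) would be kept, which is why such a scheme adds no inference cost to the baseline model.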