14/06/2020

Solving Mixed-Modal Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval

Kaiyue Pang, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song

Keywords: fine-grained sbir, sbir, sketch research, jigsaw puzzle learning, self-supervised, representation pre-training, imagenet pre-training

Abstract: ImageNet pre-training has long been considered crucial by the fine-grained sketch-based image retrieval (FG-SBIR) community due to the lack of large sketch-photo paired datasets for FG-SBIR training. In this paper, we propose a self-supervised alternative for representation pre-training. Specifically, we consider the jigsaw puzzle game of recomposing images from shuffled parts. We identify two key facets of jigsaw task design that are required for effective FG-SBIR pre-training. The first is formulating the puzzle in a mixed-modality fashion. Second, we show that framing the optimisation as permutation matrix inference via Sinkhorn iterations is more effective than the common classifier formulation of jigsaw self-supervision. Experiments show that this self-supervised pre-training strategy significantly outperforms the standard ImageNet-based pipeline across all four product-level FG-SBIR benchmarks. Interestingly, it also leads to improved cross-category generalisation across both the pre-train/fine-tune and fine-tune/testing stages.
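The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`make_mixed_modal_puzzle`, `sinkhorn`), the rule of drawing each patch at random from the photo or the sketch, the iteration count and the temperature are all assumptions made for illustration. The Sinkhorn part is the standard alternating row and column normalisation that turns a score matrix into an approximately doubly stochastic matrix, i.e. a soft permutation.

```python
import math
import random


def make_mixed_modal_puzzle(photo_patches, sketch_patches, rng):
    """Build a shuffled puzzle whose patches mix both modalities.

    Assumption for illustration: each grid position takes its patch from
    either the photo or the sketch with equal probability.
    """
    n = len(photo_patches)
    mixed = [rng.choice((photo_patches[i], sketch_patches[i])) for i in range(n)]
    perm = list(range(n))
    rng.shuffle(perm)
    # shuffled[k] shows the patch that belongs at grid position perm[k].
    shuffled = [mixed[p] for p in perm]
    return shuffled, perm


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Alternately normalise rows and columns of exp(scores / temperature).

    Works in log space for numerical stability and returns an
    approximately doubly stochastic matrix.
    """
    n = len(scores)
    log_p = [[s / temperature for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalisation: every row sums to 1.
        for i in range(n):
            lse = _logsumexp(log_p[i])
            log_p[i] = [v - lse for v in log_p[i]]
        # Column normalisation: every column sums to 1.
        for j in range(n):
            lse = _logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= lse
    return [[math.exp(v) for v in row] for row in log_p]


if __name__ == "__main__":
    rng = random.Random(0)
    photo = [f"photo_{i}" for i in range(9)]    # placeholder patches
    sketch = [f"sketch_{i}" for i in range(9)]  # placeholder patches
    shuffled, perm = make_mixed_modal_puzzle(photo, sketch, rng)
    print(shuffled, perm)

    # Placeholder scores; in practice a network would predict these.
    scores = [[rng.uniform(-1.0, 1.0) for _ in range(9)] for _ in range(9)]
    soft_perm = sinkhorn(scores)
    print([round(sum(row), 4) for row in soft_perm])
```

In a training setup along these lines, the soft permutation would be compared against the ground-truth permutation matrix built from `perm`, rather than treating each permutation as a separate class as a classifier formulation does.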

The talk and the respective paper were published at the CVPR 2020 virtual conference.
