Abstract:
We introduce an information-theoretic loss function, RankMI, and an associated training algorithm for deep representation learning for image retrieval. Our proposed framework alternates between updating a network that estimates the divergence between the distance distributions of matching and non-matching pairs of learned embeddings, and updating the embedding network to maximize this estimate using sampled negatives. In addition, under this information-theoretic lens we draw connections between RankMI and commonly used ranking losses, e.g., the triplet loss. We extensively evaluate RankMI on several standard image retrieval datasets, namely CUB-200-2011, CARS-196, and Stanford Online Products. On all datasets, our method achieves results that are competitive with, or significantly improve upon, previously reported results.
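To make the alternating scheme concrete, below is a minimal PyTorch-style sketch of one training step. It assumes a Donsker-Varadhan-style lower bound as the divergence estimator over scalar pairwise distances; the names embed_net and stats_net, the architectures, the optimizers, and the exact form of the bound are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    # Hypothetical networks; sizes and layers are placeholders.
    embed_net = nn.Sequential(nn.Linear(512, 128))  # embedding network
    stats_net = nn.Sequential(                       # divergence estimator on distances
        nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))

    opt_stats = torch.optim.Adam(stats_net.parameters(), lr=1e-4)
    opt_embed = torch.optim.Adam(embed_net.parameters(), lr=1e-4)

    def pair_distances(x, y):
        # Euclidean distance between matched rows of two embedding batches.
        return (x - y).pow(2).sum(dim=1, keepdim=True).sqrt()

    def dv_estimate(d_pos, d_neg):
        # Donsker-Varadhan lower bound on the divergence between the
        # distance distributions of matching (d_pos) and non-matching
        # (d_neg) pairs; an assumed stand-in for the paper's estimator.
        return stats_net(d_pos).mean() - torch.log(torch.exp(stats_net(d_neg)).mean())

    def training_step(anchors, positives, negatives):
        z_a, z_p, z_n = embed_net(anchors), embed_net(positives), embed_net(negatives)

        # Step 1: update the estimator network on detached embeddings
        # to tighten the divergence estimate.
        est = dv_estimate(pair_distances(z_a.detach(), z_p.detach()),
                          pair_distances(z_a.detach(), z_n.detach()))
        opt_stats.zero_grad()
        (-est).backward()
        opt_stats.step()

        # Step 2: update the embedding network to maximize the
        # re-evaluated estimate, using the sampled negatives.
        est = dv_estimate(pair_distances(z_a, z_p), pair_distances(z_a, z_n))
        opt_embed.zero_grad()
        (-est).backward()
        opt_embed.step()
        return est.item()

    # Usage with random stand-in features:
    a, p, n = torch.randn(32, 512), torch.randn(32, 512), torch.randn(32, 512)
    print(training_step(a, p, n))

The two gradient steps mirror the alternation described above: the estimator is first fitted with embeddings held fixed (detached), and the embedding network is then updated through the estimator to push matching and non-matching distance distributions apart.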