Large Scale Long-Tailed Product Recognition System at Alibaba

Abstract: A practical large-scale product recognition system at Alibaba suffers from long-tailed, imbalanced training data in the e-commerce setting. In addition to product images at Alibaba, plenty of related side information (e.g., titles and tags) reveals rich semantic information about the images. Prior works mainly address the long-tail problem from the visual perspective only and give little consideration to leveraging the side information. In this paper, we present a novel side-information-based large-scale visual recognition co-training (SICoT) system that deals with the long-tail problem by leveraging image-related side information. In the proposed co-training system, we first introduce a bilinear word attention module, which aims to construct a semantic embedding from the noisy side information. A visual feature and semantic embedding co-training scheme is then designed to transfer knowledge from classes with abundant training data (head classes) to classes with few training data (tail classes) in an end-to-end fashion. Extensive experiments on four challenging large-scale datasets, whose numbers of classes range from one thousand to one million, demonstrate the scalable effectiveness of the proposed SICoT system in alleviating the long-tail problem.

02/02/2021

Train a One-Million-Way Instance Classifier for Unsupervised Visual Representation Learning

Yu Liu, Lianghua Huang, Pan Pan, Bin Wang, Yinghui Xu, Rong Jin

long-tailed visual recognition, large-scale visual recognition, bilateral-branch network, cumulative learning, re-balancing, representation learning

4:56

16/11/2020

object detection, long-tail, lvis, weight norm, classifier imbalance, balanced group softmax, bags, instance segmentation

4:57

14/06/2020

domain adaptive object detection, image-level categorical regularization, categorical consistency regularization, domain adaptive faster r-cnn

1:00

05/01/2021

visual saliency, salient object detection, rgb-d, depth information, joint learning, dense connections, multi-modal features, feature fusion, deep learning, encoder-decoder

1:01

14/06/2020

Jingjing Li, Wei Ji, Qi Bi, Cheng Yan, Miao Zhang, Yongri Piao, Huchuan Lu, Li Cheng

weakly supervised segmentation, semi-supervised segmentation, pseudo-label generation, class activation maps, objectness, saliency

3:02

02/02/2021

vision and language, video captioning, seq2seq learning, object relational graph, teacher-recommended learning, gcn, visual relational reasoning, external language model, knowledge distillation, long-tailed problem

1:05

02/02/2021

MANGO: A Mask Attention Guided One-Stage Scene Text Spotter

Liang Qiao, Ying Chen, Zhanzhan Cheng, Yunlu Xu, Yi Niu, Shiliang Pu, Fei Wu

vision and language, video understanding, action recognition, action retrieval, instructional videos, weakly-supervised videos, action and behaviour, attributes, attention, adverbs

1:01

06/12/2021