Abstract:
Conventional methods for object detection typically rely on large, well-annotated datasets, which are scarce due to the high cost of labeling. In this paper, we propose to label only large, easy-to-spot objects. We argue that these contain more pixels and therefore usually carry more information about the underlying object class than small ones; at the same time, they are easier to spot and hence cheaper to label. Unfortunately, standard supervised learning algorithms do not learn to detect small objects if only large ones are labeled: they erroneously treat unlabeled objects as negative examples, and their accuracy consequently deteriorates. To address this, we propose PCIS, a novel combination of pseudo-labels, output consistency across scales, and an anchor scale-dependent ignore strategy. In experiments on CityPersons, EuroCityPersons, and MS COCO, we show that our approach outperforms existing pseudo-label generation methods as well as an oracle that ensures that anchors overlapping missing annotations are ignored during training. We demonstrate that with our method it is possible to approach the performance of a fully labeled dataset using only a subset of the labels, and also to train detectors on extremely sparsely labeled images, e.g. when only 1 out of 200 objects is annotated.