Abstract:
Observing a video stream and predicting target events of interest before they occur is an important but challenging task due to the stochastic nature of visual events. This task requires a classifier that can separate the precursory signals that lead to the target events from those that do not. However, a naïve approach to training this classifier would require seeing many examples of the target events before a model with high precision can be obtained. In this paper, we propose a method for early prediction of visual events based on an ensemble of exemplar predictors. Each exemplar predictor is associated with an instance of the target event and is trained to separate that event from negative samples. The exemplar predictors can be calibrated and integrated to create a stronger predictor. Experiments on several datasets show that the proposed exemplar-based framework outperforms other methods, yielding higher precision with fewer training samples. Our code and datasets can be found at https://github.com/cvlab-stonybrook/EnEx.
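The sketch below illustrates the exemplar-ensemble idea described above, not the authors' implementation: one binary classifier is trained per positive exemplar against a shared pool of negatives, each classifier's scores are calibrated on held-out data, and the calibrated probabilities are averaged at test time. The choice of a linear SVM per exemplar, logistic (Platt-style) calibration, mean fusion, and all names and shapes are assumptions for illustration; feature extraction from video is assumed to happen elsewhere.

```python
# Illustrative sketch of an exemplar ensemble (not the paper's code).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression


class ExemplarEnsemble:
    def __init__(self):
        self.members = []  # list of (classifier, calibrator) pairs

    def fit(self, positives, negatives, calib_pos, calib_neg):
        """positives: (P, d) array, one feature vector per exemplar event;
        negatives: (N, d) array of negative samples;
        calib_pos / calib_neg: held-out samples used for calibration."""
        for exemplar in positives:
            # One classifier per exemplar: this exemplar vs. all negatives.
            X = np.vstack([exemplar[None, :], negatives])
            y = np.concatenate([[1], np.zeros(len(negatives))])
            clf = LinearSVC(C=1.0).fit(X, y)

            # Platt-style calibration: map raw scores to probabilities
            # using the held-out positives and negatives.
            Xc = np.vstack([calib_pos, calib_neg])
            yc = np.concatenate([np.ones(len(calib_pos)),
                                 np.zeros(len(calib_neg))])
            scores = clf.decision_function(Xc).reshape(-1, 1)
            calibrator = LogisticRegression().fit(scores, yc)

            self.members.append((clf, calibrator))

    def predict_proba(self, X):
        """Average calibrated probabilities over all exemplar predictors."""
        probs = [cal.predict_proba(
                     clf.decision_function(X).reshape(-1, 1))[:, 1]
                 for clf, cal in self.members]
        return np.mean(probs, axis=0)
```

Averaging calibrated probabilities is one simple way to integrate the exemplar predictors; the paper's actual calibration and fusion scheme may differ.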