Abstract:
Research on continual learning has led to a variety of approaches to
mitigating catastrophic forgetting in feed-forward classification networks.
Until now, surprisingly little attention has been devoted to continual learning
of recurrent models applied to problems like image captioning. In this paper
we take a systematic look at continual learning of LSTM-based models for image
captioning. We propose an attention-based approach that explicitly
accommodates the transient nature of vocabularies in continual image
captioning tasks -- i.e., that task vocabularies are not disjoint. We call our
method Recurrent Attention to Transient Tasks (RATT), and also show how to
adapt continual learning approaches based on weight regularization and
knowledge distillation to recurrent continual learning problems. We apply our
approaches to the incremental image captioning problem on two new continual
learning benchmarks that we define using the MS-COCO and Flickr30k datasets. Our
results demonstrate that RATT is able to sequentially learn five captioning
tasks while incurring no forgetting of previously learned ones.