Abstract:
Grid cells enable the brain to model the physical space of the world and to navigate effectively via path integration, updating self-position using information about self-movement. Recent proposals suggest that the brain might use similar mechanisms to understand the structure of objects in diverse sensory modalities, including vision. In machine vision, recognizing an object from a sequence of sensory samples of an image, such as those obtained by saccades, is a challenging problem when the sequence does not follow a consistent, fixed pattern, yet this is something humans do naturally and effortlessly. We explore how grid cell-based path integration in a cortical network can support reliable recognition of objects given an arbitrary sequence of inputs. Our network (GridCellNet) uses grid cell computations to integrate visual information and make predictions based on movements. We use local Hebbian plasticity rules to learn rapidly from a handful of examples (few-shot learning), and consider the task of recognizing MNIST digits given a sequence of image feature patches. Extending beyond the current literature, we show that GridCellNet can reliably perform classification, generalizing both to unseen examples and to completely novel sequence trajectories. Furthermore, because grid cells provide an internal reference frame derived from sensory inputs and internal motor signals alone, this classification process represents an important step towards enabling translation invariance in sequential classifiers. In addition, we demonstrate that GridCellNet can predict unseen regions of the image, that inference can succeed after sampling only a fraction of the input space, and that a natural benefit of the proposed architecture is robustness in the context of continual learning. We propose that agents with active sensors can use grid cell representations not only for navigation, but also for robust and efficient visual understanding.
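To make the sense/move/predict loop summarized above concrete, the following is a minimal, purely illustrative Python sketch, not the authors' GridCellNet implementation. All names (location_code, shift_code, learn_object, classify), the module periods, the exact-match feature comparison, and the anchoring of location hypotheses from the first sensed patch are assumptions made for illustration; the actual network uses grid cell dynamics, sparse representations, and Hebbian plasticity rather than dictionary lookups.

```python
# Illustrative sketch only: associate grid-cell-like location codes with
# sensed features (one-shot, Hebbian-style) and classify an object from an
# arbitrary trajectory by path-integrating movements and pruning hypotheses.
import numpy as np

MODULE_PERIODS = [3, 4, 5]   # assumed spatial periods for three grid modules


def location_code(position, periods=MODULE_PERIODS):
    """Encode an (x, y) sensor position as a tuple of per-module 2D phases."""
    return tuple((int(position[0]) % p, int(position[1]) % p) for p in periods)


def shift_code(code, movement, periods=MODULE_PERIODS):
    """Path integration: update each module's phase from self-movement alone."""
    return tuple(((x + int(movement[0])) % p, (y + int(movement[1])) % p)
                 for (x, y), p in zip(code, periods))


def learn_object(name, patches, positions, memories):
    """One-shot, Hebbian-style association of location codes with features."""
    memories[name] = {location_code(pos): np.asarray(patch)
                      for patch, pos in zip(patches, positions)}


def classify(patches, movements, memories):
    """Recognize an object from an arbitrary saccade-like trajectory.

    The first patch anchors a set of location hypotheses per object (so no
    absolute position is needed); each movement is then path-integrated and
    hypotheses whose predicted feature mismatches the sensed one are dropped.
    """
    recognized = set()
    for name, mem in memories.items():
        hyps = {code for code, feat in mem.items()
                if np.array_equal(feat, np.asarray(patches[0]))}
        for patch, move in zip(patches[1:], movements):
            hyps = {shift_code(h, move) for h in hyps}
            hyps = {h for h in hyps
                    if h in mem and np.array_equal(mem[h], np.asarray(patch))}
            if not hyps:
                break
        if hyps:
            recognized.add(name)
    return recognized


# Example usage with toy binary "features": learn one object from a fixed
# scan, then recognize it from a novel traversal order of the same locations.
rng = np.random.default_rng(0)
memories = {}
positions = [(0, 0), (2, 0), (2, 3), (0, 3)]
patches = [rng.integers(0, 2, 4) for _ in positions]
learn_object("digit_7", patches, positions, memories)

order = [2, 0, 3, 1]
test_patches = [patches[i] for i in order]
test_moves = [np.subtract(positions[order[i + 1]], positions[order[i]])
              for i in range(len(order) - 1)]
print(classify(test_patches, test_moves, memories))   # -> {'digit_7'}
```

This toy version assumes test fixations revisit stored locations and that features match exactly; it is meant only to convey how path-integrated location codes let classification succeed for trajectories never seen during learning.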