Abstract:
Traffic forecasting across an entire road network is challenging because of non-linear temporal dynamics and complex spatial correlations. Multi-step forecasting is harder still, because prediction errors accumulate over the forecast horizon. Existing forecasting models extract both spatial and temporal features of all locations on the road network, but they often overlook the interaction between the two types of features, which leads to sub-optimal performance. In this work, we address this problem by proposing InterNet, which applies multi-head attention to the extracted spatio-temporal features so that the spatial features of each location interact with the temporal features of all locations, and vice versa. The features of all locations are extracted by a graph convolutional layer and a bidirectional LSTM layer before being fed into the multi-head attention layer. The three layers are integrated into a single model, which enables end-to-end learning. Experimental results show that InterNet outperforms state-of-the-art models in prediction accuracy, which demonstrates the potential of such interactions.
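
As a rough illustration of the interaction described above, the following NumPy sketch lets graph-convolved spatial features of each location attend to the temporal features of all locations through multi-head attention. It is a minimal sketch under stated assumptions, not the InterNet implementation: the function names, dimensions, random weights, and the four-node chain graph are all hypothetical, and the bidirectional LSTM is replaced by a random placeholder array standing in for its output.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(adj, x, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ x @ weight, 0.0)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v, num_heads):
    """Multi-head attention: each location's query attends to the context
    features of all locations. Returns (mixed features, attention weights)."""
    n = query_feats.shape[0]
    d_head = w_q.shape[1] // num_heads

    def split(m):
        # (n, heads * d_head) -> (heads, n, d_head)
        return m.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split(query_feats @ w_q)
    k = split(context_feats @ w_k)
    v = split(context_feats @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, n, n)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                         # (heads, n, d_head)
    return out.transpose(1, 0, 2).reshape(n, num_heads * d_head), attn


# Hypothetical toy setup: 4 locations connected in a chain.
rng = np.random.default_rng(0)
num_nodes, in_dim, hidden, heads = 4, 3, 8, 2
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(num_nodes, in_dim))

# Spatial features from a graph convolution over the road graph.
spatial = graph_conv(adj, x, rng.normal(size=(in_dim, hidden)))
# Placeholder for temporal features (assumption: stands in for BiLSTM output).
temporal = rng.normal(size=(num_nodes, hidden))

w_q, w_k, w_v = (rng.normal(size=(hidden, hidden)) for _ in range(3))
# Spatial features of one location interact with temporal features of all locations.
mixed, attn = cross_attention(spatial, temporal, w_q, w_k, w_v, heads)
```

Swapping the first two arguments of `cross_attention` gives the reverse direction, in which the temporal features of one location attend to the spatial features of all locations. In a trained model the projection matrices would be learned jointly with the feature extractors rather than sampled at random.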