Abstract:
Movie-Map, an interactive first-person map that engages the user in a simulated walking experience, is composed of short 360° video segments that are separated at traffic intersections and seamlessly connected according to the viewer's direction of travel. However, in a wide urban-scale area where many roads intersect, manually segmenting the videos at intersections requires significant human effort. Automatic identification of intersections from 360° videos is therefore an important problem for scaling up Movie-Map. In this paper, we propose a novel method that identifies intersections from individual frames of 360° videos. Rather than formulating intersection identification as a standard binary classification task that takes a 360° image as input, we identify an intersection based on the number of possible directions of travel (PDoT), which a neural network detects in perspective images projected in eight directions from a single omnidirectional image; this allows the method to handle various types of intersections. We construct a large-scale 360° Image Intersection Identification (iii360) dataset for training and evaluation, with 360° videos collected from various areas such as a school campus, downtown, a suburb, and Chinatown, and demonstrate that our PDoT-based method performs significantly better than a naive algorithm based on binary classification. Source code and part of the dataset will be shared with the community when the paper is published.
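As a rough illustration of the PDoT idea, the sketch below projects an equirectangular 360° frame into eight perspective views and counts how many of them are flagged as a possible direction of travel. It is a minimal sketch under stated assumptions, not the method's implementation: the function names, the 90° field of view, the nearest-neighbour sampling, the `has_pdot` callable (a hypothetical stand-in for the neural network), and the threshold of three directions are all assumptions introduced here for illustration.

```python
import numpy as np


def perspective_view(equi, yaw_deg, fov_deg=90.0, size=64):
    """Sample a pinhole view from an equirectangular image (nearest neighbour).

    The field of view and output size are illustrative assumptions.
    """
    h, w = equi.shape[:2]
    # Focal length in pixels for the requested horizontal field of view.
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2.0)
    # Pixel grid centred on the optical axis (x right, y down, z forward).
    coords = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    z = np.full_like(x, f)
    # Rotate the viewing rays about the vertical axis by the yaw angle.
    yaw = np.radians(yaw_deg)
    xr = x * np.cos(yaw) + z * np.sin(yaw)
    zr = -x * np.sin(yaw) + z * np.cos(yaw)
    # Convert rays to longitude/latitude, then to source pixel indices.
    lon = np.arctan2(xr, zr)               # in [-pi, pi]
    lat = np.arctan2(y, np.hypot(xr, zr))  # in [-pi/2, pi/2], positive = down
    col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return equi[row, col]


def count_pdot(equi, has_pdot, n_views=8):
    """Count the views in which the (hypothetical) detector reports a PDoT."""
    yaws = [k * 360.0 / n_views for k in range(n_views)]
    return sum(bool(has_pdot(perspective_view(equi, yaw))) for yaw in yaws)


def is_intersection(equi, has_pdot, min_directions=3):
    """Assumed decision rule: a plain road offers two directions of travel
    (forward and backward), so three or more suggests an intersection."""
    return count_pdot(equi, has_pdot) >= min_directions
```

In practice `has_pdot` would wrap a trained detector applied to each perspective view; any callable that maps a view to a truthy or falsy value can be passed in for experimentation. Counting directions rather than classifying the whole frame is what lets one rule cover T-junctions, four-way crossings, and other layouts.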