Abstract:
Turn-entry timing is an important requirement for conversation, and one that spoken dialogue systems largely fail at. In this paper, we introduce a computational framework based on work from Psycholinguistics, which is aimed at achieving proper turn-taking timing for situated agents. The approach involves incremental processing and lexical prediction of the turn in progress, which allows a situated dialogue system to start its turn and initiate actions earlier than would otherwise be possible. We evaluate the framework by integrating it within a cognitive robotic architecture and testing performance on a corpus of task-oriented human-robot directives. We demonstrate that: 1) the system is superior to a non-incremental system in terms of faster responses, reduced gap between turns, and the ability to perform actions early, 2) the system can time its turn to come in immediately at a transition point or earlier to produce several types of overlap, and 3) the system is robust to various forms of disfluency in the input. Overall, this domain-independent framework can be integrated into various dialogue systems to improve responsiveness, and is a step toward more natural, human-like turn-taking behavior.