Abstract:
This paper describes the Critical Role Dungeons and Dragons Dataset (CRD3) and related analyses. Critical Role is an unscripted, live-streamed show where a fixed group of people play Dungeons and Dragons, an open-ended role-playing game. The dataset is collected from 159 Critical Role episodes transcribed to text dialogues, consisting of 398,682 turns. It also includes corresponding abstractive summaries collected from the Fandom wiki. The dataset is linguistically unique in that the narratives are generated entirely through player collaboration and spoken interaction. For each dialogue, there are a large number of turns, multiple abstractive summaries with varying levels of detail, and semantic ties to the previous dialogues. In addition, we provide a data augmentation method that produces 34,243 summary-dialogue chunk pairs to support current neural ML approaches, and we provide an abstractive summarization benchmark and evaluation.