Abstract:
Document retrieval (DR) is a crucial task in NLP. Recently, pre-trained BERT-like language models have achieved remarkable success, obtaining state-of-the-art results in DR. In this paper, we propose a new BERT-based ranking model for the DR task, named TABLE. In the pre-training stage of TABLE, we present a domain-adaptive strategy. More importantly, in the fine-tuning stage, we develop a two-phase task-adaptive process, namely type-adaptive pointwise fine-tuning and listwise fine-tuning. In the type-adaptive pointwise fine-tuning phase, the model learns different matching patterns for different query types. In the listwise fine-tuning phase, the model matches the candidate documents against a given query in a listwise fashion. This task-adaptive process makes the model more robust. In addition, we introduce a simple but effective exact matching feature in fine-tuning, which captures matches of out-of-vocabulary (OOV) words between a query and a document. To the best of our knowledge, we are the first to propose a listwise ranking model with BERT. The model exploits rich matching features between queries and documents and thereby substantially improves performance in DR. Notably, TABLE shows excellent performance on the MS MARCO leaderboard.
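To make the distinction between the two fine-tuning phases and the exact matching feature concrete, the following is a minimal sketch, not the TABLE implementation. The abstract does not specify the loss functions or the form of the feature, so everything here is an assumption: the pointwise phase is illustrated with binary cross-entropy on a single query-document score, the listwise phase with a ListNet-style softmax cross-entropy over all candidates of one query, and the exact matching feature with a 0/1 flag per document token. The function names (`tokenize`, `exact_match_flags`, `pointwise_loss`, `listwise_loss`) are hypothetical.

```python
import math
import re


def tokenize(text):
    # Hypothetical word-level tokenizer (assumption); the paper's
    # tokenization is not described in the abstract.
    return re.findall(r"\w+", text.lower())


def exact_match_flags(query, document):
    """One 0/1 flag per document token: 1 if the token also occurs in the query.

    Because the comparison is on surface strings, it still fires for terms
    that a fixed model vocabulary would treat as out-of-vocabulary.
    """
    query_terms = set(tokenize(query))
    return [1 if tok in query_terms else 0 for tok in tokenize(document)]


def pointwise_loss(score, label):
    """Binary cross-entropy on a single (query, document) relevance score.

    Uses the numerically stable logits form:
    max(x, 0) - x * y + log(1 + exp(-|x|)).
    """
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))


def listwise_loss(scores, labels):
    """Softmax cross-entropy over all candidate scores of one query.

    Assumed ListNet-style objective: the labels are normalised into a target
    distribution and compared with the softmax of the scores.
    """
    total = sum(labels)
    if total <= 0:
        raise ValueError("at least one candidate must be labelled relevant")
    m = max(scores)  # subtract the maximum for a stable log-sum-exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -sum((y / total) * (s - log_z) for s, y in zip(scores, labels))


# Usage: the relevant document is the first candidate in both lists.
flags = exact_match_flags("BERT4Rec ranking", "the bert4rec model improves ranking")
good = listwise_loss([2.0, 0.0, -1.0], [1, 0, 0])   # relevant doc scored highest
bad = listwise_loss([-1.0, 0.0, 2.0], [1, 0, 0])    # relevant doc scored lowest
```

In this sketch the pointwise loss sees each query-document pair in isolation, whereas the listwise loss depends on the scores of all candidates for the query at once, so raising the score of an irrelevant candidate increases the loss even when the relevant candidate's score is unchanged.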