19/10/2020

What rankers can be statistically distinguished in multileaved comparisons?

Makoto P. Kato, Akiomi Nishida, Tomohiro Manabe, Sumio Fujita, Takehiro Yamamoto

Keywords: multileaving, online evaluation, interleaving

Abstract: This paper presents findings from an empirical study of multileaved comparisons, an efficient online evaluation methodology, in a commercial Web service. The most important difference from previous studies is the number of rankers involved in the online evaluation: we compared 30 rankers over roughly 90 days by multileaved comparisons. This relatively large number of rankers allowed us to address several questions that previous work could not answer with only a few rankers: How much ranking difference is required for rankers to be statistically distinguished? How many impressions are necessary to find statistically significant differences between correlated rankers? How large a difference in offline evaluation predicts a significant difference in a multileaved comparison? We answer these questions using the results of the multileaved comparisons and generalize some of the findings through simulation-based experiments.
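The paper does not specify in the abstract which multileaving algorithm was used, but team-draft multileaving is a common choice for comparing many rankers online. The sketch below is a minimal, hypothetical illustration of that idea: several rankings are merged into one result list, each slot is tagged with the ranker that contributed it, and clicks are credited back to those rankers. All names and the click handling are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

def team_draft_multileave(rankings, length):
    """Merge several rankings into one multileaved list (team-draft style).

    rankings: dict mapping ranker name -> ranked list of document ids.
    Returns the multileaved list and the ranker credited for each slot.
    """
    multileaved, teams, seen = [], [], set()
    while len(multileaved) < length:
        added = False
        # Each round, rankers take turns in a fresh random order.
        for ranker in random.sample(list(rankings), len(rankings)):
            if len(multileaved) >= length:
                break
            # Pick the ranker's highest-ranked document not yet shown.
            doc = next((d for d in rankings[ranker] if d not in seen), None)
            if doc is None:
                continue
            multileaved.append(doc)
            teams.append(ranker)
            seen.add(doc)
            added = True
        if not added:
            break  # all rankers exhausted

    return multileaved, teams

def credit_clicks(teams, clicked_positions):
    """Assign one credit to the ranker owning each clicked position."""
    credits = defaultdict(int)
    for pos in clicked_positions:
        credits[teams[pos]] += 1
    return credits

# Example: three rankers, a 5-slot result page, clicks on positions 0 and 2.
rankings = {
    "A": ["d1", "d2", "d3", "d4"],
    "B": ["d2", "d5", "d1", "d6"],
    "C": ["d7", "d1", "d8", "d2"],
}
docs, teams = team_draft_multileave(rankings, length=5)
print(docs, teams, dict(credit_clicks(teams, [0, 2])))
```

Aggregating such per-impression credits over many impressions is what lets the study test whether two rankers are statistically distinguishable.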

The video of this talk cannot be embedded. You can watch it here:
https://dl.acm.org/doi/10.1145/3340531.3412143#sec-supp
The talk and the respective paper were published at the CIKM 2020 virtual conference.
