Are we evaluating rigorously? Benchmarking recommendation for reproducible evaluation and fair comparison

22/09/2020

Zhu Sun, Di Yu, Hui Fang, Jie Yang, Xinghua Qu, Jie Zhang, Cong Geng

Keywords: Benchmarks, Recommender Systems, Reproducible Evaluation

Abstract: With a tremendous number of recommendation algorithms proposed every year, one critical issue has attracted considerable attention: there are no effective benchmarks for evaluation, which leads to two major concerns, namely unreproducible evaluation and unfair comparison. This paper aims to conduct rigorous (i.e., reproducible and fair) evaluation for implicit-feedback-based top-N recommendation algorithms. We first systematically review 85 recommendation papers published at eight top-tier conferences (e.g., RecSys, SIGIR) to summarize important evaluation factors, such as data splitting and parameter tuning strategies. Through a holistic empirical study, the impacts of different factors on recommendation performance are then analyzed in depth. Following that, we create benchmarks with standardized procedures and provide the performance of seven well-tuned state-of-the-art algorithms across six metrics on six widely used datasets as a reference for later study. Additionally, we release a user-friendly Python toolkit, which differs from existing ones in addressing the broad scope of rigorous evaluation for recommendation. Overall, our work sheds light on the issues in recommendation evaluation and lays the foundation for further investigation. Our code and datasets are available on GitHub (https://github.com/AmazingDD/daisyRec).
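To make two of the evaluation factors named in the abstract concrete, here is a minimal sketch. It is not code from daisyRec. It shows a seeded random-by-ratio split of implicit-feedback interactions and two common top-N metrics, hit ratio and NDCG. All function names, the 80/20 ratio, the seed value, and the toy data are assumptions made for illustration.

```python
import math
import random


def split_by_ratio(interactions, test_ratio=0.2, seed=2020):
    """Randomly split (user, item) pairs; a fixed seed makes the split repeatable."""
    rng = random.Random(seed)
    shuffled = sorted(interactions)  # sort first so input order cannot change the split
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def hit_ratio_at_n(ranked_items, relevant_items, n):
    """1.0 if any relevant item appears in the top-n list, else 0.0."""
    return 1.0 if any(item in relevant_items for item in ranked_items[:n]) else 0.0


def ndcg_at_n(ranked_items, relevant_items, n):
    """Binary-relevance NDCG: discounted gain divided by the ideal discounted gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:n])
        if item in relevant_items
    )
    ideal_hits = min(len(relevant_items), n)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy data: 5 users x 4 items, every pair observed.
interactions = [(u, i) for u in range(5) for i in range(4)]
train, test = split_by_ratio(interactions)
```

Reporting the split strategy, ratio, and seed alongside the metric definitions is one way a comparison can be repeated by others; changing any of them can change the reported numbers.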

The talk and the corresponding paper were published at the RecSys 2020 virtual conference.

Similar Papers

29/06/2020

Do explicit review strategies improve code review performance?

Pavlína Wurzel Gonçalves, Enrico Fregnan, Tobias Baum, Kurt Schneider, Alberto Bacchelli

5:33

15/11/2020

Designing Types for R, Empirically

Alexi Turcotte, Aviral Goel, Filip Křikava, Jan Vitek

Keywords: R, dynamic languages, type declarations

16:04

26/04/2020

Measuring the Reliability of Reinforcement Learning Algorithms

Stephanie C.Y. Chan, Samuel Fishman, Anoop Korattikara, John Canny, Sergio Guadarrama

Keywords: reinforcement learning, metrics, statistics, reliability

5:32

06/12/2021

Searching Parameterized AP Loss for Object Detection

Tao Chenxin, Zizhang Li, Xizhou Zhu, Gao Huang, Yong Liu, Jifeng Dai

Keywords: machine learning, vision

6:13

29/06/2020

An empirical study on regular expression bugs

Peipei Wang, Chris Brown, Jamie A. Jennings, Kathryn T. Stolee

Keywords: pull requests, Regular expression bug characteristics, bug fixes

12:22

25/07/2020

Automated embedding size search in deep recommender systems

Haochen Liu, Xiangyu Zhao, Chong Wang, Xiaobing Liu, Jiliang Tang

Keywords: embedding, recommender system, AutoML

16:19

15/11/2020

Taming Type Annotations in Gradual Typing

John Peter Campora, Sheng Chen

Keywords: variational types, gradual typing, cast errors

14:33

16/11/2020

Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding

Jiaxin Huang, Yu Meng, Fang Guo, Heng Ji, Jiawei Han

Keywords: extracting aspects, classifying reviews, aspect-based analysis, aspect classification

11:23

02/02/2021

Instance Mining with Class Feature Banks for Weakly Supervised Object Detection

Yufei Yin, Jiajun Deng, Wengang Zhou, Houqiang Li

14:57

04/07/2020

Unsupervised Opinion Summarization as Copycat-Review Generation

Arthur Bražinskas, Mirella Lapata, Ivan Titov

Keywords: Unsupervised Summarization, Copycat-Review Generation, Opinion summarization, automatically summaries

10:55

19/08/2021

AMEIR: Automatic Behavior Modeling, Interaction Exploration and MLP Investigation in the Recommender System

Pengyu Zhao, Kecheng Xiao, Yuanxing Zhang, Kaigui Bian, Wei Yan

Keywords: Knowledge Representation and Reasoning, Preference Modelling and Preference-Based Reasoning, Recommender Systems

15:05

15/11/2020

Finding Bugs in Database Systems via Query Partitioning

Manuel Rigger, Zhendong Su

Keywords: database testing, three-valued logic, DBMS testing, test oracle

15:01

03/05/2021

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study

Marcin Andrychowicz, Anton Raichuk, Piotr Stanczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Hussenot Hussenot-Desenonges, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

Keywords: continuous control, Reinforcement learning

15:34

06/12/2021

Automatic Unsupervised Outlier Model Selection

Yue Zhao, Ryan Rossi, Leman Akoglu

Keywords: machine learning, self-supervised learning, meta learning, clustering

15:08

18/07/2021

Context-Aware Online Collective Inference for Templated Graphical Models

Charles Dickens, Connor Pryor, Eriq Augustine, Alexander Miller, Lise Getoor

Keywords: Probabilistic Methods, Graphical Models

5:19

26/08/2020

Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Dipakbhai Thakkar, Blake Woodworth

11:53

19/04/2021

Hidden biases in unreliable news detection datasets

Xiang Zhou, Heba Elfardy, Christos Christodoulopoulos, Thomas Butler, Mohit Bansal

10:57

23/06/2021

RbSyn: Type- and Effect-Guided Program Synthesis

Sankha Narayan Guria, Jeffrey S. Foster, David Van Horn

Keywords: program synthesis, type and effect systems, Ruby

12:40

02/02/2021

Train a One-Million-Way Instance Classifier for Unsupervised Visual Representation Learning

Yu Liu, Lianghua Huang, Pan Pan, Bin Wang, Yinghui Xu, Rong Jin

15:15

25/07/2020

A general knowledge distillation framework for counterfactual recommendation via uniform data

Dugang Liu, Pengxiang Cheng, Zhenhua Dong, Xiuqiang He, Weike Pan, Zhong Ming

Keywords: counterfactual learning, uniform data, recommender systems, knowledge distillation

14:06

06/12/2021

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Keywords: machine learning, vision

11:49

29/06/2020

SoftMon: A tool to compare similar open-source software from a performance perspective

Shubhankar Suman Singh, Smruti R. Sarangi

Keywords: Performance debugging, Software comparison, NLP based matching

15:58

13/04/2021

PClean: Bayesian data cleaning at scale with domain-specific probabilistic programming

Alexander Lew, Monica Agrawal, David Sontag, Vikash Mansinghka

3:08

18/07/2021

When All We Need is a Piece of the Pie: A Generic Framework for Optimizing Two-way Partial AUC

Zhiyong Yang, Qianqian Xu, Shilong Bao, Yuan He, Xiaochun Cao, Qingming Huang

Keywords: Algorithms, Supervised Learning

15:48

05/01/2021

ChartOCR: Data Extraction From Charts Images via a Deep Hybrid Framework

Junyu Luo, Zekun Li, Jinpeng Wang, Chin-Yew Lin

4:58

03/05/2021

Benchmarks for Deep Off-Policy Evaluation

Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Sherry Yang, Michael Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Paine

Keywords: reinforcement learning, benchmarks, off-policy evaluation

10:05

02/02/2021

A User-Adaptive Layer Selection Framework for Very Deep Sequential Recommender Models

Lei Chen, Fajie Yuan, Jiaxi Yang, Xiang Ao, Chengming Li, Min Yang

18:18

06/12/2021

Few-Shot Segmentation via Cycle-Consistent Transformer

Gengwei Zhang, Guoliang Kang, Yi Yang, Yunchao Wei

Keywords: transformers, vision, few shot learning

11:58

19/01/2020

Towards Verified Stochastic Variational Inference for Probabilistic Programs

Wonyeol Lee, Hangyeol Yu, Xavier Rival, Hongseok Yang

Keywords: semantics, correctness, Probabilistic programming, static analysis

20:50

02/02/2021

Re-TACRED: Addressing Shortcomings of the TACRED Dataset

George Stoica, Emmanouil Antonios Platanios, Barnabas Poczos

16:45

06/12/2021

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang (Shane) Gu

Keywords: reinforcement learning and planning

10:00

15/06/2020

PracExtractor: Extracting Configuration Good Practices from Manuals to Detect Server Misconfigurations

Chengcheng Xiang, Haochen Huang, Andrew Yoo, Yuanyuan Zhou, Shankar Pasupathy

20:21

29/06/2020

Capture the feature flag: Detecting feature flags in open-source

Jens Meinicke, Juan Hoyos, Bogdan Vasilescu, Christian Kästner

7:47

07/08/2020

An Exploratory Study of Hardware Reverse Engineering — Technical and Cognitive Processes

Steffen Becker, Carina Wiesen, Nils Albartus, Nikol Rummel, Christof Paar

5:30

03/05/2021

Discovering a set of policies for the worst case reward

Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan ODonoghue, Iurii Kemaev, Satinder Singh

10:33

02/02/2021

Structure-Consistent Weakly Supervised Salient Object Detection with Local Saliency Coherence

Siyue Yu, Bingfeng Zhang, Jimin Xiao, Eng Gee Lim

14:09

04/07/2020

A Re-evaluation of Knowledge Graph Completion Methods

Zhiqing Sun, Shikhar Vashishth, Soumya Sanyal, Partha Talukdar, Yiming Yang

Keywords: large-scale graphs, data mining, machine learning, natural processing

6:58

15/11/2020

Precise Inference of Expressive Units of Measurement Types

Tongtong Xiang, Jeff Y. Luo, Werner Dietl

Keywords: Scientific computing, Pluggable type system, Dimensional analysis, Units of measurements, Type inference

13:39

04/07/2020

Word-level Textual Adversarial Attacking as Combinatorial Optimization

Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, Maosong Sun

Keywords: Textual attacking, Word-level attacking, combinatorial problem

9:34

06/12/2021

TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

Yu Li, Min Li, Qiuxia Lai, Yannan Liu, Qiang Xu

Keywords: deep learning, machine learning, vision, graph learning, semi-supervised learning

13:21