Abstract:
E-commerce applications rely heavily on session-based recommendation algorithms to improve the shopping experience of their customers. Recent progress in session-based recommendation algorithms shows great promise. However, translating that promise into real-world outcomes is challenging for several reasons, most notably the large number and varying characteristics of the available models. In this paper, we discuss the approach and lessons learned from identifying and deploying a successful session-based recommendation algorithm for a leading e-commerce application in the home-improvement domain. To this end, we first evaluate fourteen session-based recommendation algorithms in an offline setting, using eight popular evaluation metrics on three datasets. The results indicate that offline evaluation alone does not provide enough insight to make an informed decision, since no method wins clearly on all metrics. Additionally, we observe that standard offline evaluation metrics fall short for this application: they reward an algorithm only when it predicts the exact item that the user clicked next or eventually purchased. In practice, however, there are near-identical products which, although assigned different identifiers, should be considered equally good recommendations. To overcome these limitations, we perform an additional round of evaluation, in which human experts provide both objective and subjective feedback on the recommendations of the five algorithms that performed best in the offline evaluation. We find that the experts' opinions often differ from the offline evaluation results. Analysis of the feedback confirms that the performance of all models is significantly higher when near-identical product recommendations are counted as relevant. Finally, we run an A/B test with one of the models that performed best in the human evaluation phase. The treatment model increased the conversion rate by 15.6%.