Abstract:
Research on online multi-armed bandit (MAB) problems has been growing rapidly in recent years, as novel problem settings and algorithms motivated by various practical applications are studied on top of the classic bandit problem. However, identifying the best bandit algorithm for a given application from a large pool of candidates is not only time-consuming but also reliant on human expertise, which hinders the practicality of MAB. To alleviate this problem, this paper outlines an intelligent system called AutoBandit, equipped with many out-of-the-box MAB algorithms, that automatically and adaptively selects the best algorithm and suitable hyper-parameters online. This helps a growing application continuously maximize cumulative rewards over its whole life-cycle. With a flexible architecture and user-friendly web-based interfaces, it is convenient for users to integrate and monitor online bandits in a business system. At the time of publication, AutoBandit has been deployed for various industrial applications.
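The core idea sketched in the abstract, adaptively choosing among candidate bandit algorithms online, can be viewed as a meta-bandit over candidates. The Python sketch below is illustrative only: the class names, the epsilon-greedy meta-strategy, and the candidate set (UCB1 and epsilon-greedy) are assumptions for exposition, not AutoBandit's actual design.

```python
# Hypothetical sketch: a meta-level bandit routes each round to one of several
# candidate MAB algorithms, so the better-performing candidate is favored online.
import math
import random


class UCB1:
    """Classic UCB1 candidate algorithm over n_arms Bernoulli arms."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select(self):
        for a, c in enumerate(self.counts):
            if c == 0:
                return a  # play each arm once first
        t = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a] + math.sqrt(2 * math.log(t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


class EpsilonGreedy:
    """Epsilon-greedy candidate algorithm."""
    def __init__(self, n_arms, eps=0.1):
        self.eps, self.counts, self.values = eps, [0] * n_arms, [0.0] * n_arms

    def select(self):
        if random.random() < self.eps:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


class MetaSelector:
    """Meta-bandit (epsilon-greedy here, as an assumption) over candidate algorithms."""
    def __init__(self, candidates, eps=0.1):
        self.candidates, self.eps = candidates, eps
        self.counts = [0] * len(candidates)
        self.values = [0.0] * len(candidates)

    def play_round(self, reward_fn):
        # pick a candidate algorithm, let it pick an arm, then update both levels
        if random.random() < self.eps:
            i = random.randrange(len(self.candidates))
        else:
            i = max(range(len(self.candidates)), key=lambda j: self.values[j])
        algo = self.candidates[i]
        arm = algo.select()
        reward = reward_fn(arm)
        algo.update(arm, reward)
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]
        return arm, reward


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arm means
    selector = MetaSelector([UCB1(len(means)), EpsilonGreedy(len(means))])
    total = sum(selector.play_round(lambda a: float(random.random() < means[a]))[1]
                for _ in range(10_000))
    print(f"cumulative reward: {total:.0f}")
```

In this toy setup the meta-layer gradually concentrates plays on whichever candidate yields higher rewards, mirroring (at a very high level) the online algorithm-selection behavior the abstract describes.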