11/08/2020

Zero downtime release: Disruption-free load balancing of a multi-billion user website

Usama Naseer, Luca Niccolini, Udip Pant, Alan Frindell, Ranjeeth Dasineni, Theophilus A. Benson

Keywords: Reliable networks, Update releases, Load-balancing

Abstract: Modern network infrastructure has evolved into a complex organism to satisfy the performance and availability requirements for the billions of users. Frequent releases such as code upgrades, bug fixes and security updates have become a norm. Millions of globally distributed infrastructure components including servers and load-balancers are restarted frequently from multiple times per-day to per-week. However, every release brings possibilities of disruptions as it can result in reduced cluster capacity, disturb intricate interaction of the components operating at large scales and disrupt the end-users by terminating their connections. The challenge is further complicated by the scale and heterogeneity of supported services and protocols.In this paper, we leverage different components of the end-to-end networking infrastructure to prevent or mask any disruptions in face of releases. Zero Downtime Release is a collection of mechanisms used at Facebook to shield the end-users from any disruptions, preserve the cluster capacity and robustness of the infrastructure when updates are released globally. Our evaluation shows that these mechanisms prevent any significant cluster capacity degradation when a considerable number of productions servers and proxies are restarted and minimizes the disruption for different services (notably TCP, HTTP and publish/subscribe).

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at SIGCOMM 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers