Abstract:
The network representation used for conveying an application's objective in cloud environments, which we refer to as the Application Network Interface (ANI), has steadily evolved from packet to flow and flowlet, and on to more complex abstractions such as coflow. In this paper, we argue that state-of-the-art ANIs still fail to capture important application needs. Using distributed deep learning as a representative application, we show that the application performance achievable with current ANIs is up to 25% lower than optimal. We analyze these ANIs to identify the missing pieces and put forward CadentFlow, an ANI with per-flow metrics and an optimization objective, to capture application requirements effectively. We discuss the opportunity for real-world implementation of a more expressive ANI and its implications for the design of network controllers and scheduling algorithms.