Abstract:
Network traffic typically traverses a sequence of middleboxes forming a service function chain, or simply a chain. Tolerating failures along chains is essential to the availability and reliability of enterprise applications. Making a chain fault-tolerant is challenging because, when failures occur, the state of the faulty middleboxes must be recovered correctly and quickly while the chain continues to provide high throughput and low latency. In this paper, we introduce FTC, a system design and protocol for fault-tolerant service function chaining. FTC provides strong consistency while tolerating up to f middlebox failures for chains of length f + 1 or longer, without requiring dedicated replica nodes. In FTC, the state updates produced by packet processing at a middlebox are collected, piggybacked onto the packet, and sent along the chain to be replicated. Our evaluation shows that, compared with the state of the art [51], FTC improves throughput by 2-3.5× for chains of two to five middleboxes.
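The piggybacking idea can be illustrated with a minimal Python sketch under simplifying assumptions. It is not FTC's actual protocol. Each middlebox appends its state updates to a log carried by the packet. Each downstream middlebox stores a replica of the updates from its f predecessors. Delivering the last f middleboxes' updates back to the head of the chain is modelled here as a replicate-only pass, which is our own simplification. All names (`Packet`, `Middlebox`, `traverse`, `recover`) and the per-payload packet counter used as "state" are hypothetical. The sketch ignores concurrency, buffering, ordering, and failure detection.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: str
    # Piggybacked log of state updates: (origin_index, key, value).
    log: list = field(default_factory=list)


@dataclass
class Middlebox:
    index: int
    state: dict = field(default_factory=dict)
    # replicas[j] is this node's copy of middlebox j's state.
    replicas: dict = field(default_factory=dict)

    def process(self, pkt: Packet) -> list:
        # Hypothetical stateful processing: count packets per payload.
        count = self.state.get(pkt.payload, 0) + 1
        self.state[pkt.payload] = count
        return [(self.index, pkt.payload, count)]

    def replicate(self, pkt: Packet, n: int, f: int) -> None:
        # Apply piggybacked updates whose origin is one of this node's
        # f predecessors (distance measured modulo the chain length n).
        for origin, key, value in pkt.log:
            if origin != self.index and (self.index - origin) % n <= f:
                self.replicas.setdefault(origin, {})[key] = value


def traverse(chain: list, pkt: Packet, f: int) -> None:
    n = len(chain)
    if n < f + 1:
        raise ValueError("chain must have at least f + 1 middleboxes")
    for mb in chain:
        mb.replicate(pkt, n, f)          # replicate predecessors' updates
        pkt.log.extend(mb.process(pkt))  # piggyback this node's updates
    # Assumption: updates of the last f middleboxes wrap around to the
    # head of the chain, modelled as a replicate-only pass.
    for mb in chain[:f]:
        mb.replicate(pkt, n, f)
    pkt.log.clear()  # updates are now held by f successors each


def recover(chain: list, failed: int, f: int, down=frozenset()) -> dict:
    # Rebuild a failed middlebox's state from any live successor replica.
    n = len(chain)
    for d in range(1, f + 1):
        holder = (failed + d) % n
        if holder not in down and failed in chain[holder].replicas:
            return dict(chain[holder].replicas[failed])
    return {}
```

For example, with a chain of three middleboxes and f = 2, after packets `"a"`, `"b"`, `"a"` the state of the last middlebox can be rebuilt from either head node. This holds even when one of those nodes has also failed, because its updates were replicated at its two successors modulo the chain length.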