4. Data Center Routing Overview

This section provides an overview of three general types of data center protocol designs -- Layer 2 only, hybrid Layer 2/Layer 3 (L2/L3), and Layer 3 only.

4.1 L2-Only Designs

Originally, most data center designs used the Spanning Tree Protocol (STP), first defined in [IEEE8021D-1990], for loop-free topology creation, typically utilizing variants of the traditional DC topology described in Section 3.1. At the time, many DC switches either did not support Layer 3 routing protocols or supported them only with additional licensing fees, which played a part in the design choice. Many enhancements have since been made: the Rapid Spanning Tree Protocol (RSTP), introduced in the latest revision of the standard [IEEE8021D-2004], and the Multiple Spanning Tree Protocol (MST), specified in [IEEE8021Q], improve convergence, stability, and load balancing in larger topologies. Nevertheless, many of the fundamentals of the protocol limit its applicability in large-scale DCs. STP and its newer variants use an active/standby approach to path selection and are therefore hard to deploy in horizontally scaled topologies as described in Section 3.2.

Further, operators have experienced many large failures caused by improper cabling, misconfiguration, or flawed software on a single device. These failures regularly affected the entire spanning-tree domain and were very hard to troubleshoot due to the nature of the protocol. For these reasons, and because almost all DC traffic is now IP and therefore requires a Layer 3 routing protocol at the network edge for external connectivity, designs utilizing STP usually fail all of the requirements of large-scale DC operators.

Various enhancements to link-aggregation protocols such as [IEEE8023AD], generally known as Multi-Chassis Link-Aggregation (M-LAG), made it possible to use Layer 2 designs with active/active network paths while relying on STP as the backup for loop prevention. The major downsides of this approach are the inability to scale linearly past two devices in most implementations, the lack of standards-based implementations, and the added failure domain risk of synchronizing state between the devices.
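The active/standby behavior of STP noted above can be illustrated with a small sketch. It is not an implementation of STP: root election, port costs, and timers are omitted, and a plain breadth-first search from an arbitrarily chosen root stands in for tree construction. The topology size (4 spines, 4 leaves) and all function names are assumptions made for illustration only.

```python
from collections import deque


def build_two_tier(num_spines, num_leaves):
    """Return the links of a topology in which every leaf connects to every spine."""
    return {
        frozenset((f"spine{s}", f"leaf{l}"))
        for s in range(num_spines)
        for l in range(num_leaves)
    }


def spanning_tree_links(links, root):
    """Return the links kept by a breadth-first tree rooted at `root`.

    Simplified stand-in for STP: any loop-free tree keeps exactly
    (number of nodes - 1) links, whichever algorithm selects them.
    """
    neighbors = {}
    for link in links:
        a, b = tuple(link)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    visited = {root}
    tree = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(neighbors[node]):
            if peer not in visited:
                visited.add(peer)
                tree.add(frozenset((node, peer)))
                queue.append(peer)
    return tree


links = build_two_tier(num_spines=4, num_leaves=4)
tree = spanning_tree_links(links, root="spine0")

forwarding = len(tree)            # links left active by the tree
blocked = len(links) - forwarding  # links held in standby

print(f"total={len(links)} forwarding={forwarding} blocked={blocked}")
```

Under these assumptions, 7 of the 16 links forward traffic and 9 are blocked. Adding spines increases the number of blocked links but not the usable capacity, which is why a single spanning tree fits poorly with horizontally scaled topologies.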

It should be noted that building large, horizontally scalable, L2-only networks without STP has recently become possible through the introduction of the Transparent Interconnection of Lots of Links (TRILL) protocol in [RFC6325]. TRILL resolves many of the issues STP has for large-scale DC design. However, the limited number of implementations, and the frequent requirement for specific equipment that supports it, have limited its applicability and increased the cost of such designs.

Finally, neither the base TRILL specification nor the M-LAG approach totally eliminates the problem of the shared broadcast domain that is so detrimental to the operations of any Layer 2, Ethernet-based solution. Later TRILL extensions have been proposed to solve this problem, primarily based on the approaches outlined in [RFC7067], but this even further limits the number of available interoperable implementations that can be used to build a fabric. Therefore, TRILL-based designs have issues meeting REQ2, REQ3, and REQ4.

4.2 Hybrid L2/L3 Designs

Operators have sought to limit the impact of data-plane faults and build large-scale topologies through implementing routing protocols in either the Tier 1 or Tier 2 parts of the network and dividing the Layer 2 domain into numerous, smaller domains. This design has allowed data centers to scale up, but at the cost of complexity in managing multiple network protocols. For the following reasons, operators have retained Layer 2 in either the access (Tier 3) or both access and aggregation (Tier 3 and Tier 2) parts of the network:

Supporting legacy applications that may require direct Layer 2 adjacency or use non-IP protocols.
Seamless mobility for virtual machines that require the preservation of IP addresses when a virtual machine moves to a different Tier 3 switch.
Simplified IP addressing: fewer IP subnets are required for the data center.
Application load balancing may require direct Layer 2 reachability to perform certain functions such as Layer 2 Direct Server Return (DSR). See [L3DSR].
Continued CAPEX differences between L2- and L3-capable switches.

4.3 L3-Only Designs

Network designs that leverage IP routing down to Tier 3 of the network have gained popularity as well. The main benefit of these designs is improved network stability and scalability, as a result of confining L2 broadcast domains. Commonly, an Interior Gateway Protocol (IGP) such as Open Shortest Path First (OSPF) [RFC2328] is used as the primary routing protocol in such a design. As data centers grow in scale, and server count exceeds tens of thousands, such fully routed designs have become more attractive.

Choosing an L3-only design greatly simplifies the network, which helps meet REQ1 and REQ2. Such designs are widely adopted in networks where large Layer 2 adjacency and larger Layer 3 subnets are less critical than network scalability and stability. Application providers and network operators continue to develop new solutions to meet some of the requirements that had previously driven large Layer 2 domains by using various overlay or tunneling techniques.
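The addressing trade-off between one large Layer 2 domain and routing down to Tier 3 can be sketched with Python's standard ipaddress module. The address block, the per-switch prefix length, and the switch count below are hypothetical values chosen for illustration, and assigning one subnet per Tier 3 switch is an assumption about how an L3-only design might be addressed, not something this document prescribes.

```python
import ipaddress
from itertools import islice

# Hypothetical values, chosen only for illustration.
DC_BLOCK = ipaddress.ip_network("10.0.0.0/16")
TIER3_PREFIX_LEN = 26
NUM_TIER3_SWITCHES = 4


def per_switch_subnets(block, prefix_len, count):
    """Carve the first `count` subnets of length `prefix_len` out of `block`."""
    return list(islice(block.subnets(new_prefix=prefix_len), count))


def usable_hosts(network):
    """Addresses left after excluding the network and broadcast addresses."""
    return network.num_addresses - 2


# Large Layer 2 design: one subnet, hence one broadcast domain, for the block.
l2_domain = DC_BLOCK

# L3-only design (assumed): one small subnet per Tier 3 switch.
l3_subnets = per_switch_subnets(DC_BLOCK, TIER3_PREFIX_LEN, NUM_TIER3_SWITCHES)

print(f"L2 domain {l2_domain}: up to {usable_hosts(l2_domain)} hosts share broadcasts")
for subnet in l3_subnets:
    print(f"Tier 3 subnet {subnet}: up to {usable_hosts(subnet)} hosts share broadcasts")
```

With these values, a broadcast in the single Layer 2 domain can reach up to 65534 hosts, while each per-switch subnet confines it to at most 62. The cost is that a host moving between Tier 3 switches lands in a different subnet, which is the mobility requirement that overlay and tunneling techniques aim to address.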