4.1. L2-Only Designs

Most early data center designs used the Spanning Tree Protocol (STP), originally defined in [IEEE8021D-1990], for loop-free topology creation, typically utilizing variants of the traditional DC topology described in Section 3.1. At the time, many DC switches either did not support Layer 3 routing protocols or supported them only with additional licensing fees, which played a part in the design choice. Although many enhancements have been made through the introduction of the Rapid Spanning Tree Protocol (RSTP) in the latest revision of the standard [IEEE8021D-2004] and the Multiple Spanning Tree Protocol (MST) specified in [IEEE8021Q], which improve convergence, stability, and load-balancing in larger topologies, many of the fundamentals of the protocol limit its applicability in large-scale DCs. STP and its newer variants use an active/standby approach to path selection and are therefore hard to deploy in horizontally scaled topologies as described in Section 3.2. Further, operators have experienced many large failures caused by improper cabling, misconfiguration, or flawed software on a single device. These failures regularly affected the entire spanning-tree domain and were very hard to troubleshoot due to the nature of the protocol. For these reasons, and because almost all DC traffic is now IP and therefore requires a Layer 3 routing protocol at the network edge for external connectivity, designs utilizing STP usually fail all of the requirements of large-scale DC operators.

Various enhancements to link-aggregation protocols such as [IEEE8023AD], generally known as Multi-Chassis Link-Aggregation (M-LAG), made it possible to use Layer 2 designs with active-active network paths while relying on STP as the backup for loop prevention. The major downsides of this approach are the inability to scale linearly past two devices in most implementations, the lack of standards-based implementations, and the added failure domain risk of syncing state between the devices.
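As a rough illustration of the active/standby limitation of STP discussed above, the following Python sketch counts how many links of a small two-tier fabric remain forwarding once the topology is reduced to a single loop-free tree. It is a simplified model, not an implementation of STP: the tree is built with a plain breadth-first search from an arbitrarily chosen root, whereas real STP elects the root and selects ports using bridge IDs and path costs. The function names and the fabric dimensions (4 spines, 8 leaves) are assumptions made for this example.

```python
from collections import deque


def leaf_spine_links(spines, leaves):
    """Return every spine-to-leaf link of a fully meshed two-tier fabric."""
    return [(f"spine{s}", f"leaf{l}") for s in range(spines) for l in range(leaves)]


def spanning_tree(links, root):
    """Build a loop-free tree by breadth-first search from the root.

    Simplification: real STP picks the root and the forwarding ports from
    bridge IDs and path costs; BFS only reproduces the resulting property
    that exactly one path survives between any two switches.
    """
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    visited = {root}
    tree = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                tree.add(frozenset((node, neighbor)))
                queue.append(neighbor)
    return tree


links = leaf_spine_links(spines=4, leaves=8)
tree = spanning_tree(links, root="spine0")

forwarding = len(tree)             # links left in the forwarding state
blocked = len(links) - forwarding  # links blocked to prevent loops

print(f"total links: {len(links)}")
print(f"forwarding:  {forwarding}")
print(f"blocked:     {blocked}")
```

In this model, a tree over N switches always keeps exactly N - 1 links, so adding spines adds blocked links rather than usable capacity. This is the reason a single spanning tree fits poorly with horizontally scaled topologies.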

It should be noted that building large, horizontally scalable, L2-only networks without STP has recently become possible through the introduction of the Transparent Interconnection of Lots of Links (TRILL) protocol in [RFC6325]. TRILL resolves many of the issues STP has for large-scale DC design. However, the limited number of implementations, and the frequent requirement for specific equipment that supports it, have limited its applicability and increased the cost of such designs.

Finally, neither the base TRILL specification nor the M-LAG approach totally eliminates the problem of the shared broadcast domain that is so detrimental to the operations of any Layer 2, Ethernet-based solution. Later TRILL extensions have been proposed to solve this problem, primarily based on the approaches outlined in [RFC7067], but this limits even further the number of available interoperable implementations that can be used to build a fabric. Therefore, TRILL-based designs have issues meeting REQ2, REQ3, and REQ4.
