10. Data-Center Interconnections (DCIs)
For DCIs, the following two main scenarios are considered when connecting data centers running evpn-overlay (as described here) over an MPLS/IP core network:
- Scenario 1: DCI using GWs
- Scenario 2: DCI using ASBRs
The following two subsections describe the operations for each of these scenarios.
10.1. DCI Using GWs
This is the typical scenario for interconnecting data centers over a WAN. In this scenario, EVPN routes are terminated and processed in each GW. MAC/IP routes are always re-advertised from DC to WAN; from WAN to DC, however, they are not re-advertised if unknown MAC routes (and default IP routes) are used in the NVEs. Each GW maintains a MAC-VRF (and/or IP-VRF) for each EVI. The main advantage of this approach is that NVEs do not need to maintain MAC and IP addresses from any remote data centers when default IP routes and unknown MAC routes are used; that is, they only need to maintain routes that are local to their own DC. When default IP routes and unknown MAC routes are used, any packets from NVEs with unknown IP or MAC destinations are forwarded to the GWs, where all the VPN MAC and IP routes are maintained. This approach significantly reduces the size of the MAC-VRF and IP-VRF at NVEs. Furthermore, it results in a faster convergence time upon a link or NVE failure in a multihomed network or device redundancy scenario, because the failure-related BGP routes (such as mass withdrawal messages) do not need to be propagated all the way to the remote NVEs in the remote DCs. This approach is described in detail in Section 3.4 of [DCI-EVPN-OVERLAY].
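The lookup behavior described above can be illustrated with a minimal sketch. It is not taken from this document: the `MacVrf` class, the `UNKNOWN_MAC` marker, and all MAC addresses and device names are hypothetical, and the unknown MAC route is modeled simply as a fallback table entry.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical marker standing in for the unknown MAC route (not a wire encoding).
UNKNOWN_MAC = "unknown"


@dataclass
class MacVrf:
    """Toy MAC-VRF: maps a MAC address to a next hop (an NVE or GW name)."""
    routes: Dict[str, str] = field(default_factory=dict)

    def lookup(self, mac: str) -> str:
        # An exact match wins; otherwise fall back to the unknown MAC route, if present.
        if mac in self.routes:
            return self.routes[mac]
        if UNKNOWN_MAC in self.routes:
            return self.routes[UNKNOWN_MAC]
        raise KeyError(f"no route for {mac}")


# The NVE holds only MACs local to its own DC plus the unknown MAC route toward the GW.
nve = MacVrf({"00:00:00:00:00:01": "nve-2", UNKNOWN_MAC: "gw-1"})
# The GW holds MACs for both the local DC and remote DCs.
gw = MacVrf({"00:00:00:00:00:01": "nve-2", "00:00:00:00:00:99": "remote-gw"})

local_hop = nve.lookup("00:00:00:00:00:01")   # known locally
remote_hop = nve.lookup("00:00:00:00:00:99")  # unknown at the NVE, sent to the GW
gw_hop = gw.lookup("00:00:00:00:00:99")       # the GW knows the remote MAC
```

In this sketch the NVE table stays small regardless of how many MAC addresses exist in remote DCs, which is the table-size benefit the text describes.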
10.2. DCI Using ASBRs
This approach can be considered the opposite of the first approach. It favors simplification at DCI devices over NVEs such that larger MAC-VRF (and IP-VRF) tables need to be maintained on NVEs, whereas DCI devices don't need to maintain any MAC (and IP) forwarding tables. Furthermore, DCI devices do not need to terminate and process routes related to multihoming; rather, they relay these messages for the establishment of an end-to-end Label Switched Path (LSP). In other words, DCI devices in this approach operate similarly to ASBRs for inter-AS Option B (see Section 10 of [RFC4364]). This requires locally assigned VNIs to be used just like downstream-assigned MPLS VPN labels where, for all practical purposes, the VNIs function like 24-bit VPN labels. This approach is equally applicable to data centers (or Carrier Ethernet networks) with MPLS encapsulation.
In inter-AS Option B, when an ASBR receives an EVPN route from its DC over internal BGP (iBGP) and re-advertises it to other ASBRs, it re-advertises the EVPN route by rewriting the BGP next hop to itself, thus losing the identity of the PE that originated the advertisement. This rewrite of the BGP next hop adversely impacts the EVPN mass withdrawal route (Ethernet A-D per ES) and its procedure. However, it does not impact the EVPN Aliasing mechanism/procedure because when the Aliasing routes (Ethernet A-D per EVI) are advertised, the receiving PE first resolves a MAC address for a given EVI into its corresponding <ES, EVI>, and, subsequently, it resolves the <ES, EVI> into multiple paths (and their associated next hops) via which the <ES, EVI> is reachable. Since Aliasing and MAC routes are both advertised on a per-EVI basis and they use the same RD and RT (per EVI), the receiving PE can associate them together on a per-BGP-path basis (e.g., per originating PE). Thus, it can perform recursive route resolution, e.g., a MAC is reachable via an <ES, EVI> which, in turn, is reachable via a set of BGP paths; thus, the MAC is reachable via the set of BGP paths. Due to the per-EVI basis, the association of MAC routes and the corresponding Aliasing route is fixed and determined by the same RD and RT; there is no ambiguity when the BGP next hop for these routes is rewritten as these routes pass through ASBRs. That is, the receiving PE may receive multiple Aliasing routes for the same EVI from a single next hop (a single ASBR), and it can still create multiple paths toward that <ES, EVI>.
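The effect of the next-hop rewrite can be sketched as follows. This is an illustrative model only: the `EvpnRoute` type, the `readvertise` helper, and the RD strings are assumptions made for the example, and label or VNI handling at the ASBRs is deliberately omitted.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class EvpnRoute:
    route_type: str  # "AD_PER_ES", "AD_PER_EVI", or "MAC" (hypothetical tags)
    rd: str          # Route Distinguisher (illustrative string form)
    esi: str
    next_hop: str


def readvertise(route: EvpnRoute, asbr: str) -> EvpnRoute:
    """Model an Option B ASBR: only the BGP next hop is rewritten to the ASBR."""
    return replace(route, next_hop=asbr)


# Routes originated by PE1. The per-EVI RD is shared by the MAC and Aliasing
# routes; the per-ES route carries a different RD (values are made up).
pe1_routes: List[EvpnRoute] = [
    EvpnRoute("AD_PER_ES", "PE1:0", "ES1", "PE1"),
    EvpnRoute("AD_PER_EVI", "PE1:100", "ES1", "PE1"),
    EvpnRoute("MAC", "PE1:100", "ES1", "PE1"),
]

# The routes cross ASBR1 and then ASBR2 before reaching the remote PE.
at_remote_pe = [readvertise(readvertise(r, "ASBR1"), "ASBR2") for r in pe1_routes]

mac_route = next(r for r in at_remote_pe if r.route_type == "MAC")
# Route types that can still be tied to the MAC route by RD after the rewrite.
same_rd_types = [
    r.route_type
    for r in at_remote_pe
    if r.rd == mac_route.rd and r.route_type != "MAC"
]
next_hops = {r.next_hop for r in at_remote_pe}
```

After both rewrites, every route points at ASBR2, yet the MAC route can still be matched to its Aliasing route by RD. The Ethernet A-D per ES route shares neither the RD nor a distinguishing next hop, so it cannot be matched the same way.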
However, when the BGP next-hop address corresponding to the originating PE is rewritten, the association between the mass withdrawal route (Ethernet A-D per ES) and its corresponding MAC routes cannot be made based on their RDs and RTs because the RD for the mass withdrawal route is different from the one for the MAC routes. Therefore, the functionality needed at the ASBRs and the receiving PEs depends on whether the mass withdrawal route is originated and whether there is a need to handle route resolution ambiguity for this route. The following two subsections describe the functionality needed by the ASBRs and the receiving PEs depending on whether the NVEs reside in hypervisors or in ToR switches.
10.2.1. ASBR Functionality with Single-Homing NVEs
When NVEs reside in hypervisors as described in Section 7.1, there is no multihoming; thus, there is no need for the originating NVE to send Ethernet A-D per ES or Ethernet A-D per EVI routes. However, as noted in Section 7, in order to enable a single-homing ingress NVE to take advantage of fast convergence, Aliasing, and Backup Path when interacting with multihoming egress NVEs attached to a given ES, the single-homing NVE should be able to receive and process Ethernet A-D per ES and Ethernet A-D per EVI routes. The handling of these routes is described in the next section.
10.2.2. ASBR Functionality with Multihoming NVEs
When NVEs reside in ToR switches and operate in multihoming redundancy mode, there is a need, as described in Section 8, for the originating multihoming NVE to send Ethernet A-D per ES route(s) (used for mass withdrawal) and Ethernet A-D per EVI routes (used for Aliasing). As described above, the rewrite of the BGP next hop by ASBRs creates ambiguities when Ethernet A-D per ES routes are received by a remote NVE in a different AS because the receiving NVE cannot associate that route with the MAC/IP routes of that ES advertised by the same originating NVE. This ambiguity inhibits the function of mass withdrawal per ES by the receiving NVE in a different AS.
As an example, consider a scenario where a CE is multihomed to PE1 and PE2, where these PEs are connected via ASBR1 and then ASBR2 to the remote PE3. Furthermore, consider that PE1, but not PE2, receives M1 from the CE. Therefore, PE1 advertises Ethernet A-D per ES1, Ethernet A-D per EVI1, and M1, whereas PE2 only advertises Ethernet A-D per ES1 and Ethernet A-D per EVI1. ASBR1 receives all five of these advertisements and passes them to ASBR2 (with itself as the BGP next hop). ASBR2, in turn, passes them to the remote PE3, with itself as the BGP next hop. PE3 receives these five routes, all of which have the same BGP next hop (i.e., ASBR2). Furthermore, the two Ethernet A-D per ES routes received by PE3 have the same information, i.e., the same ESI and the same BGP next hop. Although both of these routes are maintained by the BGP process in PE3 (because they have different RDs and, thus, are treated as different BGP routes), information from only one of them is used in the L2 routing table (L2 RIB).
                 PE1
                /   \
              CE     ASBR1---ASBR2---PE3
                \   /
                 PE2
Figure 3: Inter-AS Option B
Now, when the AC between the PE2 and the CE fails and PE2 sends Network Layer Reachability Information (NLRI) withdrawal for Ethernet A-D per ES route, and this withdrawal gets propagated and received by the PE3, the BGP process in PE3 removes the corresponding BGP route; however, it doesn't remove the associated information (namely ESI and BGP next hop) from the L2 routing table (L2 RIB) because it still has the other Ethernet A-D per ES route (originated from PE1) with the same information. That is why the mass withdrawal mechanism does not work when doing DCI with inter-AS Option B. However, as described previously, the Aliasing function works and so does "mass withdrawal per EVI" (which is associated with withdrawing the EVPN route associated with Aliasing, i.e., Ethernet A-D per EVI route).
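The failure of per-ES mass withdrawal in this example can be reproduced with a small model. This is a sketch under stated assumptions: the `RemotePe` class and the RD strings are hypothetical, and the L2 RIB is modeled as the set of distinct (ESI, next hop) pairs derived from the BGP table.

```python
from typing import Dict, Set, Tuple


class RemotePe:
    """Toy model of PE3's handling of Ethernet A-D per ES routes."""

    def __init__(self) -> None:
        # BGP table keyed by RD: routes with different RDs are distinct BGP routes.
        self.bgp_ad_per_es: Dict[str, Tuple[str, str]] = {}

    def receive(self, rd: str, esi: str, next_hop: str) -> None:
        self.bgp_ad_per_es[rd] = (esi, next_hop)

    def withdraw(self, rd: str) -> None:
        self.bgp_ad_per_es.pop(rd, None)

    def l2_rib(self) -> Set[Tuple[str, str]]:
        # The L2 RIB only sees (ESI, next hop); identical pairs collapse into one entry.
        return set(self.bgp_ad_per_es.values())


pe3 = RemotePe()
# Both routes arrive with ASBR2 as the next hop (RD values are made up).
pe3.receive("PE1:0", "ES1", "ASBR2")
pe3.receive("PE2:0", "ES1", "ASBR2")
rib_before = pe3.l2_rib()

# PE2 loses its AC to the CE and withdraws its Ethernet A-D per ES route.
pe3.withdraw("PE2:0")
rib_after = pe3.l2_rib()
bgp_routes_left = len(pe3.bgp_ad_per_es)
```

The BGP table shrinks from two routes to one, but the L2 RIB content is unchanged, so nothing triggers a mass withdrawal for ES1 at PE3.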
In the above example, PE3 receives two Aliasing routes with the same BGP next hop (ASBR2) but different RDs. One of the Aliasing routes has the same RD as the advertised MAC route (M1). PE3 follows the route resolution procedure specified in [RFC7432] upon receiving the two Aliasing routes; that is, it resolves M1 to <ES, EVI1>, and, subsequently, it resolves <ES, EVI1> to a BGP path list with two paths along with the corresponding VNIs/MPLS labels (one associated with PE1 and the other associated with PE2). It should be noted that even though both paths are advertised by the same BGP next hop (ASBR2), the receiving PE3 can handle them properly. Therefore, M1 is reachable via two paths. This creates two end-to-end LSPs, from PE3 to PE1 and from PE3 to PE2, for M1 such that when PE3 wants to forward traffic destined to M1, it can load-balance between the two LSPs. Although route resolution for Aliasing routes with the same BGP next hop is not explicitly mentioned in [RFC7432], this is the expected operation; thus, it is elaborated here.
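The two-step resolution described above (MAC to <ES, EVI>, then <ES, EVI> to a path list) might be sketched as follows. The `Path` and `AliasingResolver` names, the RD strings, and the label values 5001 and 5002 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

EsEvi = Tuple[str, str]


@dataclass(frozen=True)
class Path:
    rd: str        # distinguishes the originating PE even behind a shared next hop
    next_hop: str  # BGP next hop as seen by the receiving PE
    label: int     # VNI or MPLS label carried in the Aliasing route (made-up values)


class AliasingResolver:
    """Toy recursive resolution: MAC -> <ES, EVI> -> list of paths."""

    def __init__(self) -> None:
        self.mac_to_es_evi: Dict[str, EsEvi] = {}
        self.path_lists: Dict[EsEvi, List[Path]] = {}

    def add_aliasing_route(self, es: str, evi: str, path: Path) -> None:
        self.path_lists.setdefault((es, evi), []).append(path)

    def add_mac_route(self, mac: str, es: str, evi: str) -> None:
        self.mac_to_es_evi[mac] = (es, evi)

    def resolve(self, mac: str) -> List[Path]:
        es_evi = self.mac_to_es_evi[mac]              # step 1: MAC -> <ES, EVI>
        return list(self.path_lists.get(es_evi, []))  # step 2: <ES, EVI> -> paths


pe3 = AliasingResolver()
# Two Aliasing routes, same next hop (ASBR2), different RDs and labels.
pe3.add_aliasing_route("ES1", "EVI1", Path("PE1:100", "ASBR2", 5001))
pe3.add_aliasing_route("ES1", "EVI1", Path("PE2:100", "ASBR2", 5002))
# M1 was advertised only by PE1, but it resolves through the shared <ES, EVI>.
pe3.add_mac_route("M1", "ES1", "EVI1")

m1_paths = pe3.resolve("M1")
```

Both paths share ASBR2 as the next hop, and the differing labels are what let the two paths map to distinct end-to-end LSPs in this model.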
When the AC between the PE2 and the CE fails and PE2 sends NLRI withdrawal for Ethernet A-D per EVI routes, and these withdrawals get propagated and received by the PE3, the PE3 removes the Aliasing route and updates the path list; that is, it removes the path corresponding to the PE2. Therefore, all the corresponding MAC routes for that <ES, EVI> that point to that path list will now have the updated path list with a single path associated with PE1. This action can be considered to be the mass withdrawal at the per-EVI level. The mass withdrawal at the per-EVI level has a longer convergence time than the mass withdrawal at the per-ES level; however, it is much faster than the convergence time when the withdrawal is done on a per-MAC basis.
If a PE becomes detached from a given ES, then, in addition to withdrawing its previously advertised Ethernet A-D per ES routes, it MUST also withdraw its previously advertised Ethernet A-D per EVI routes for that ES. For a remote PE that is separated from the withdrawing PE by one or more EVPN inter-AS Option B ASBRs, the withdrawal of the Ethernet A-D per ES routes is not actionable. However, a remote PE is able to correlate a previously advertised Ethernet A-D per EVI route with any MAC/IP Advertisement routes also advertised by the withdrawing PE for that <ES, EVI, BD>. Hence, when it receives the withdrawal of an Ethernet A-D per EVI route, it SHOULD remove the withdrawing PE as a next hop for all MAC addresses associated with that <ES, EVI, BD>.
In the previous example, when the AC between PE2 and the CE fails, PE2 will withdraw its Ethernet A-D per ES and per EVI routes. When PE3 receives the withdrawal of an Ethernet A-D per EVI route, it removes PE2 as a valid next hop for all MAC addresses associated with the corresponding <ES, EVI, BD>. Therefore, all the MAC next hops for that <ES, EVI, BD> will now have a single next hop, viz. the LSP to PE1.
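The per-EVI withdrawal in this example can be sketched with plain dictionaries. The table layout, the bridge-domain name `BD1`, the extra MAC `M2`, and the `withdraw_ad_per_evi` helper are hypothetical; paths are identified by the originating PE for readability.

```python
from typing import Dict, List, Tuple

EsEviBd = Tuple[str, str, str]

# Path list per <ES, EVI, BD>, shared by every MAC learned on that segment.
path_lists: Dict[EsEviBd, List[str]] = {("ES1", "EVI1", "BD1"): ["PE1", "PE2"]}

# MAC entries point at the shared path list rather than holding their own copy.
mac_table: Dict[str, EsEviBd] = {
    "M1": ("ES1", "EVI1", "BD1"),
    "M2": ("ES1", "EVI1", "BD1"),
}


def next_hops(mac: str) -> List[str]:
    """Resolve a MAC to its current list of paths."""
    return path_lists[mac_table[mac]]


def withdraw_ad_per_evi(key: EsEviBd, pe: str) -> None:
    """Remove the withdrawing PE from the path list of one <ES, EVI, BD>."""
    path_lists[key] = [p for p in path_lists.get(key, []) if p != pe]


before = {mac: list(next_hops(mac)) for mac in mac_table}

# PE2 withdraws its Ethernet A-D per EVI route after the AC failure.
withdraw_ad_per_evi(("ES1", "EVI1", "BD1"), "PE2")

after = {mac: list(next_hops(mac)) for mac in mac_table}
```

A single path-list update changes the next hops of every MAC on that <ES, EVI, BD>, so no per-MAC withdrawal is needed.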
In summary, it can be seen that Aliasing (and Backup Path) functionality should work as is for inter-AS Option B without requiring any additional functionality in ASBRs or PEs. However, the mass withdrawal functionality falls back from per-ES mode to per-EVI mode for inter-AS Option B. That is, PEs receiving a mass withdrawal route from the same AS take action on Ethernet A-D per ES route; whereas, PEs receiving mass withdrawal routes from different ASes take action on the Ethernet A-D per EVI route.