5.1. VXLAN/NVGRE Encapsulation
5.1. VXLAN/NVGRE Encapsulation
Both VXLAN and NVGRE are examples of technologies that provide a data plane encapsulation which is used to transport a packet over the common physical IP infrastructure between Network Virtualization Edges (NVEs) - e.g., VXLAN Tunnel End Points (VTEPs) in VXLAN network. Both of these technologies include the identifier of the specific NVO instance, VNI in VXLAN and VSID in NVGRE, in each packet. In the remainder of this document we use VNI as the representation for NVO instance with the understanding that VSID can equally be used if the encapsulation is NVGRE unless it is stated otherwise.
Note that a PE is equivalent to an NVE/VTEP.
VXLAN encapsulation is based on UDP, with an 8-byte header following the UDP header. VXLAN provides a 24-bit VNI, which typically provides a one-to-one mapping to the tenant VID, as described in [RFC7348]. In this scenario, the ingress VTEP does not include an inner VLAN tag on the encapsulated frame, and the egress VTEP discards the frames with an inner VLAN tag. This mode of operation in [RFC7348] maps to VLAN-Based Service in [RFC7432], where a tenant VID gets mapped to an EVI.
VXLAN also provides an option of including an inner VLAN tag in the encapsulated frame, if explicitly configured at the VTEP. This mode of operation can map to VLAN Bundle Service in [RFC7432] because all the tenant's tagged frames map to a single bridge table / MAC-VRF, and the inner VLAN tag is not used for lookup by the disposition PE when performing VXLAN decapsulation as described in Section 6 of [RFC7348].
[RFC7637] encapsulation is based on GRE encapsulation, and it mandates the inclusion of the optional GRE Key field, which carries the VSID. There is a one-to-one mapping between the VSID and the tenant VID, as described in [RFC7637]. The inclusion of an inner VLAN tag is prohibited. This mode of operation in [RFC7637] maps to VLAN Based Service in [RFC7432].
As described in the next section, there is no change to the encoding of EVPN routes to support VXLAN or NVGRE encapsulation, except for the use of the BGP Encapsulation Extended Community to indicate the encapsulation type (e.g., VXLAN or NVGRE). However, there is potential impact to the EVPN procedures depending on where the NVE is located (i.e., in hypervisor or ToR) and whether multihoming capabilities are required.
5.1.1. Virtual Identifiers Scope
Although VNIs are defined as 24-bit globally unique values, there are scenarios in which it is desirable to use a locally significant value for the VNI, especially in the context of a data-center interconnect.
5.1.1.1. Data-Center Interconnect with Gateway
In the case where NVEs in different data centers need to be interconnected, and the NVEs need to use VNIs as globally unique identifiers within a data center, then a Gateway (GW) needs to be employed at the edge of the data-center network (DCN). This is because the Gateway will provide the functionality of translating the VNI when crossing network boundaries, which may align with operator span-of-control boundaries. As an example, consider the network of Figure 1. Assume there are three network operators: one for each of the DC1, DC2, and WAN networks. The Gateways at the edge of the data centers are responsible for translating the VNIs between the values used in each of the DCNs and the values used in the WAN.
+--------------+
| |
+---------+ | WAN | +---------+
+----+ | +---+ +----+ +----+ +---+ | +----+
|NVE1|-| | | |WAN | |WAN | | | |-|NVE3|
+----+ |IP |GW |-|Edge| |Edge|--|GW | IP | +----+
+----+ |Fabric +---+ +----+ +----+ +---+ Fabric | +----+
|NVE2|-| | | | | |-|NVE4|
+----+ +---------+ +--------------+ +---------+ +----+
|<------ DC 1 ------> <------ DC2 ------>|
Figure 1: Data-Center Interconnect with Gateway
5.1.1.2. Data-Center Interconnect without Gateway
In the case where NVEs in different data centers need to be interconnected, and the NVEs need to use locally assigned VNIs (e.g., similar to MPLS labels), there may be no need to employ Gateways at the edge of the DCN. More specifically, the VNI value that is used by the transmitting NVE is allocated by the NVE that is receiving the traffic (in other words, this is similar to a "downstream-assigned" MPLS label). This allows the VNI space to be decoupled between different DCNs without the need for a dedicated Gateway at the edge of the data centers. This topic is covered in Section 10.2.
+--------------+
| |
+---------+ | WAN | +---------+
+----+ | | +----+ +----+ | | +----+
|NVE1|-| | |ASBR| |ASBR| | |-|NVE3|
+----+ |IP Fabric|---| | | |--|IP Fabric| +----+
+----+ | | +----+ +----+ | | +----+
|NVE2|-| | | | | |-|NVE4|
+----+ +---------+ +--------------+ +---------+ +----+
|<------ DC 1 -----> <---- DC2 ------>|
Figure 2: Data-Center Interconnect with ASBR
5.1.2. Virtual Identifiers to EVI Mapping
Just like in [RFC7432], where two options existed for mapping broadcast domains (represented by VLAN IDs) to an EVI, when the EVPN control plane is used in conjunction with VXLAN (or NVGRE encapsulation), there are also two options for mapping broadcast domains represented by VXLAN VNIs (or NVGRE VSIDs) to an EVI:
Option 1: A Single Broadcast Domain per EVI
In this option, a single Ethernet broadcast domain (e.g., subnet) represented by a VNI is mapped to a unique EVI. This corresponds to the VLAN-Based Service in [RFC7432], where a tenant-facing interface, logical interface (e.g., represented by a VID), or physical interface gets mapped to an EVI. As such, a BGP Route Distinguisher (RD) and Route Target (RT) are needed per VNI on every NVE. The advantage of this model is that it allows the BGP RT constraint mechanisms to be used in order to limit the propagation and import of routes to only the NVEs that are interested in a given VNI. The disadvantage of this model may be the provisioning overhead if the RD and RT are not derived automatically from the VNI.
In this option, the MAC-VRF table is identified by the RT in the control plane and by the VNI in the data plane. In this option, the specific MAC-VRF table corresponds to only a single bridge table.
Option 2: Multiple Broadcast Domains per EVI
In this option, multiple subnets, each represented by a unique VNI, are mapped to a single EVI. For example, if a tenant has multiple segments/subnets each represented by a VNI, then all the VNIs for that tenant are mapped to a single EVI; for example, the EVI in this case represents the tenant and not a subnet. This corresponds to the VLAN-aware bundle service in [RFC7432]. The advantage of this model is that it doesn't require the provisioning of an RD/RT per VNI. However, this is a moot point when compared to Option 1 where auto-derivation is used. The disadvantage of this model is that routes would be imported by NVEs that may not be interested in a given VNI.
In this option, the MAC-VRF table is identified by the RT in the control plane; a specific bridge table for that MAC-VRF is identified by the <RT, Ethernet Tag ID> in the control plane. In this option, the VNI in the data plane is sufficient to identify a specific bridge table.
5.1.2.1. Auto-Derivation of RT
In order to simplify configuration, when the option of a single VNI per EVI is used, the RT used for EVPN can be auto-derived. RD can be auto-generated as described in [RFC7432], and RT can be auto-derived as described next.
Since a Gateway PE as depicted in Figure 1 participates in both the DCN and WAN BGP sessions, it is important that, when RT values are auto-derived from VNIs, there be no conflict in RT spaces between DCNs and WANs, assuming that both are operating within the same Autonomous System (AS). Also, there can be scenarios where both VXLAN and NVGRE encapsulations may be needed within the same DCN, and their corresponding VNIs are administered independently, which means VNI spaces can overlap. In order to avoid conflict in RT spaces, the 6-byte RT values with 2-octet AS number for DCNs can be auto-derived as follow:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Global Administrator | Local Administrator |
+-----------------------------------------------+---------------+
| Local Administrator (Cont.) |
+-------------------------------+
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Global Administrator |A| TYPE| D-ID | Service ID |
+-----------------------------------------------+---------------+
| Service ID (Cont.) |
+-------------------------------+
The 6-octet RT field consists of two sub-fields:
-
Global Administrator sub-field: 2 octets. This sub-field contains an AS number assigned by IANA
<https://www.iana.org/assignments/as-numbers/>. -
Local Administrator sub-field: 4 octets
-
A: A single-bit field indicating if this RT is auto-derived
- 0: auto-derived
- 1: manually derived
-
Type: A 3-bit field that identifies the space in which the other 3 bytes are defined. The following spaces are defined:
- 0: VID (802.1Q VLAN ID)
- 1: VXLAN
- 2: NVGRE
- 3: I-SID
- 4: EVI
- 5: dual-VID (QinQ VLAN ID)
-
D-ID: A 4-bit field that identifies domain-id. The default value of domain-id is zero, indicating that only a single numbering space exist for a given technology. However, if more than one number space exists for a given technology (e.g., overlapping VXLAN spaces), then each of the number spaces need to be identified by its corresponding domain-id starting from 1.
-
Service ID: This 3-octet field is set to VNI, VSID, I-SID, or VID.
-
It should be noted that RT auto-derivation is applicable for 2-octet AS numbers. For 4-octet AS numbers, the RT needs to be manually configured because 3-octet VNI fields cannot be fit within the 2-octet local administrator field.
5.1.3. Constructing EVPN BGP Routes
In EVPN, an MPLS label, for instance, identifying the forwarding table is distributed by the egress PE via the EVPN control plane and is placed in the MPLS header of a given packet by the ingress PE. This label is used upon receipt of that packet by the egress PE for disposition of that packet. This is very similar to the use of the VNI by the egress NVE, with the difference being that an MPLS label has local significance while a VNI typically has global significance. Accordingly, and specifically to support the option of locally assigned VNIs, the MPLS Label1 field in the MAC/IP Advertisement route, the MPLS label field in the Ethernet A-D per EVI route, and the MPLS label field in the P-Multicast Service Interface (PMSI) Tunnel attribute of the Inclusive Multicast Ethernet Tag (IMET) route are used to carry the VNI. For the balance of this memo, the above MPLS label fields will be referred to as the VNI field. The VNI field is used for both local and global VNIs; for either case, the entire 24-bit field is used to encode the VNI value.
For the VLAN-Based Service (a single VNI per MAC-VRF), the Ethernet Tag field in the MAC/IP Advertisement, Ethernet A-D per EVI, and IMET route MUST be set to zero just as in the VLAN-Based Service in [RFC7432].
For the VLAN-Aware Bundle Service (multiple VNIs per MAC-VRF with each VNI associated with its own bridge table), the Ethernet Tag field in the MAC Advertisement, Ethernet A-D per EVI, and IMET route MUST identify a bridge table within a MAC-VRF; the set of Ethernet Tags for that EVI needs to be configured consistently on all PEs within that EVI. For locally assigned VNIs, the value advertised in the Ethernet Tag field MUST be set to a VID just as in the VLAN-aware bundle service in [RFC7432]. Such setting must be done consistently on all PE devices participating in that EVI within a given domain. For global VNIs, the value advertised in the Ethernet Tag field SHOULD be set to a VNI as long as it matches the existing semantics of the Ethernet Tag, i.e., it identifies a bridge table within a MAC-VRF and the set of VNIs are configured consistently on each PE in that EVI.
In order to indicate which type of data-plane encapsulation (i.e., VXLAN, NVGRE, MPLS, or MPLS in GRE) is to be used, the BGP Encapsulation Extended Community defined in [RFC5512] is included with all EVPN routes (i.e., MAC Advertisement, Ethernet A-D per EVI, Ethernet A-D per ESI, IMET, and Ethernet Segment) advertised by an egress PE. Five new values have been assigned by IANA to extend the list of encapsulation types defined in [RFC5512]; they are listed in Section 11.
The MPLS encapsulation tunnel type, listed in Section 11, is needed in order to distinguish between an advertising node that only supports non-MPLS encapsulations and one that supports MPLS and non-MPLS encapsulations. An advertising node that only supports MPLS encapsulation does not need to advertise any encapsulation tunnel types; i.e., if the BGP Encapsulation Extended Community is not present, then either MPLS encapsulation or a statically configured encapsulation is assumed.
The Next Hop field of the MP_REACH_NLRI attribute of the route MUST be set to the IPv4 or IPv6 address of the NVE. The remaining fields in each route are set as per [RFC7432].
Note that the procedure defined here -- to use the MPLS Label field to carry the VNI in the presence of a Tunnel Encapsulation Extended Community specifying the use of a VNI -- is aligned with the procedures described in Section 8.2.2.2 of [TUNNEL-ENCAP] ("When a Valid VNI has not been Signaled").