<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC2119 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC8174 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml">
<!ENTITY RFC7432 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7432.xml">
<!ENTITY RFC8365 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8365.xml">
<!ENTITY RFC8584 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8584.xml">
<!ENTITY RFC9135 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9135.xml">
<!ENTITY RFC9136 SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9136.xml">
<!ENTITY I-D.ietf-bess-rfc7432bis SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-bess-rfc7432bis.xml">
<!ENTITY I-D.ietf-bess-evpn-virtual-eth-segment SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-bess-evpn-virtual-eth-segment.xml">
<!ENTITY I-D.ietf-bess-evpn-unequal-lb SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-bess-evpn-unequal-lb.xml">
<!ENTITY I-D.ietf-bess-srv6-services SYSTEM "https://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-bess-srv6-services.xml">
]>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-sajassi-bess-evpn-ip-aliasing-04"
     ipr="trust200902" submissionType="IETF">

  <?rfc strict="yes"?>

  <?rfc compact="yes"?>

  <?rfc subcompact="no"?>

  <?rfc symrefs="yes"?>

  <?rfc sortrefs="no"?>

  <?rfc text-list-symbols="-o+*"?>

  <?rfc toc="yes"?>

  <front>
    <title abbrev="IP Aliasing Support for EVPN">EVPN Support for L3 Fast
    Convergence and Aliasing/Backup Path</title>

    <author fullname="A. Sajassi" initials="A." role="editor"
            surname="Sajassi">
      <organization>Cisco Systems</organization>

      <address>
        <email>sajassi@cisco.com</email>
      </address>
    </author>

    <author fullname="G. Badoni" initials="G." surname="Badoni">
      <organization>Cisco Systems</organization>

      <address>
        <email>gbadoni@cisco.com</email>
      </address>
    </author>

    <author fullname="P. Warade" initials="P." surname="Warade">
      <organization>Cisco Systems</organization>

      <address>
        <email>pwarade@cisco.com</email>
      </address>
    </author>

    <author fullname="S. Pasupula" initials="S." surname="Pasupula">
      <organization>Cisco Systems</organization>

      <address>
        <email>surpasup@cisco.com</email>
      </address>
    </author>

    <author fullname="L. Krattiger" initials="L." surname="Krattiger">
      <organization>Cisco Systems</organization>

      <address>
        <email>lkrattig@cisco.com</email>
      </address>
    </author>

    <author fullname="J. Drake" initials="J." role="editor" surname="Drake">
      <organization>Juniper</organization>

      <address>
        <email>jdrake@juniper.net</email>
      </address>
    </author>

    <author fullname="J. Rabadan" initials="J." role="editor"
            surname="Rabadan">
      <organization>Nokia</organization>

      <address>
        <postal>
          <street>520 Almanor Avenue</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94085</code>

          <country>USA</country>
        </postal>

        <email>jorge.rabadan@nokia.com</email>
      </address>
    </author>

    <date day="7" month="March" year="2022"/>

    <abstract>
      <t>This document proposes an EVPN extension that allows the EVPN
      multihoming functions of fast convergence and aliasing/backup path to
      be used in conjunction with inter-subnet forwarding.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="sect-1" title="Introduction">
      <t>This document proposes an EVPN extension that allows the EVPN
      multihoming functions of fast convergence and aliasing/backup path to
      be used in conjunction with inter-subnet forwarding. It reuses the
      existing EVPN routes that support these multihoming functions, namely
      the Ethernet A-D per ES and Ethernet A-D per EVI routes. In
      particular, three use cases benefit from these multihoming
      functions:</t>

      <t><list style="letters">
          <t>Inter-subnet forwarding for host routes in symmetric IRB <xref
          target="RFC9135"/>.</t>

          <t>Inter-subnet forwarding for prefix routes in the interface-less
          IP-VRF-to-IP-VRF model <xref target="RFC9136"/>.</t>

          <t>Inter-subnet forwarding for prefix routes when the ESI is used
          exclusively as an L3 construct <xref target="RFC9136"/>.</t>
        </list></t>

      <section anchor="sect-1.1"
               title="Ethernet Segments for Host Routes in Symmetric IRB">
        <t>Consider a pair of multi-homing PEs, PE1 and PE2, with a host H1
        attached to both of them, and a remote PE, PE3, with a host H3
        attached to it, as illustrated in <xref target="Figure1"/>.</t>

        <figure anchor="Figure1"
                title="Inter-subnet traffic between Multihoming PEs and Remote PE">
          <artwork><![CDATA[
                               +----------------+
                               |     EVPN       |
                            +------+            |
                            | PE1  | +--->      |
                     +------+      | RT-2       |
                     |      |      | IP1     +--+---+
              +---+  | ES1  +------+ ESI1    | PE3  |
         H1+--+CE1+--+         |             |      +-+H3
              +---+  |      +------+         |      |
                     |      | PE2  |         +--+---+
                     +------+      |            |
                            |      |            |
                            +------+            |
                               |                |
                               +----------------+


]]></artwork>
        </figure>

        <t>With Asymmetric IRB <xref target="RFC9135"/>, if H3 sends
        inter-subnet traffic to H1, routing happens at PE3. PE3 is attached
        to the destination IRB interface and triggers ARP/ND requests if it
        does not have an ARP/ND entry for H1. Once that entry exists, the
        routing lookup resolves the destination MAC address to H1's MAC
        address. H1's MAC address, in turn, points to an ECMP EVPN
        destination (PE1 and PE2), either because both PE1 and PE2 advertise
        a MAC/IP route for H1 or because of Ethernet Segment MAC aliasing, as
        detailed in <xref target="RFC7432"/>.</t>

        <t>With Symmetric IRB <xref target="RFC9135"/>, if H3 sends
        inter-subnet traffic to H1, the routing lookup happens in PE3's
        IP-VRF and does not yield the destination IRB interface; therefore,
        MAC aliasing is not possible. To achieve per-flow load balancing for
        H3's routed traffic to H1, an IP ECMP list (to PE1/PE2) needs to be
        associated with H1's host route in the IP-VRF route table. If, due
        to LAG hashing, H1 is locally learned by only one of the
        multi-homing PEs (PE1 or PE2), PE3 is not able to build an IP ECMP
        list for H1's host route.</t>

        <t>With the extension described in this document, PE3's IP-VRF
        becomes Ethernet-Segment-aware and builds an IP ECMP list for H1
        based on the ESI of ES1, which is advertised along with H1 in a
        MAC/IP Advertisement route, and on the availability of ES1 on PE1
        and PE2.</t>
      </section>

      <section anchor="sect-1.2"
               title="Inter-subnet Forwarding for Prefix Routes in the Interface-less IP-VRF-to-IP-VRF Model">
        <t>In the Interface-less IP-VRF-to-IP-VRF model described in <xref
        target="RFC9136"/>, there is no Overlay Index and hence no recursive
        resolution of the IP Prefix route to either a MAC/IP Advertisement
        route or an Ethernet A-D per ES/EVI route. As a result, the fast
        convergence and aliasing/backup path functions are not available in
        this model. <xref target="RFC9136"/> already describes the recursive
        resolution of an IP Prefix route to an Ethernet A-D per ES/EVI
        route, but only when the ESI is used as an Overlay Index.</t>

        <t>The scenario illustrated in <xref target="Figure2"/> will be used
        to explain the procedures.</t>

        <figure anchor="Figure2"
                title="Inter-subnet example with IP Prefix routes">
          <artwork><![CDATA[
                               +----------------+
                               |     EVPN       |
                            +------+            |
                            | PE1  | +--->      |
                     +------+      | RT-5       |
                     |      |      | IP1/32  +--+---+
              +---+  | ES1  +------+ ESI1    | PE3  |
         H1+--+CE1+--+         |             |      +-+H3
              +---+  |      +------+         |      |
                     |      | PE2  |         +--+---+
                     +------+      |            |
                            |      |            |
                            +------+            |
                               |                |
                               +----------------+

]]></artwork>
        </figure>

        <t>Consider that PE1 and PE2 are multi-homed to CE1 (in an
        All-Active Ethernet Segment, ES1) and that PE1, PE2, and PE3 are
        attached to an IP-VRF of the same tenant. Suppose H1's host route is
        learned (via ARP or ND snooping) on PE1 only, and PE1 advertises an
        EVPN IP Prefix route for it. If H3 sends inter-subnet traffic to H1,
        a routing lookup on PE3 would normally yield a single next hop,
        i.e., PE1.</t>

        <t>This document proposes the use of the ESI in the IP Prefix route
        and its recursive resolution to the A-D per ES/EVI routes advertised
        by PE1 and PE2, so that H1's host route in PE3 can be associated
        with an IP ECMP list (to PE1/PE2) for aliasing purposes.</t>
      </section>

      <section anchor="sect-1.3"
               title="Ethernet Segments for Prefix routes in IP-VRF-to-IP-VRF use-cases">
        <t>This document also enables fast convergence and aliasing/backup
        path to be used even when the ESI is used exclusively as an L3
        construct, in an Interface-less IP-VRF-to-IP-VRF scenario <xref
        target="RFC9136"/>. There are two use cases analyzed and supported by
        this document:</t>

        <t><list style="symbols">
            <t>IP Aliasing for EVPN IP Prefix routes</t>

            <t>Centralized Routing Model</t>
          </list></t>

        <section anchor="sect-1.3.1"
                 title="IP Aliasing for EVPN IP Prefix routes">
          <t>As an example, consider the scenario in <xref
          target="Figure3"/>, in which PE1 and PE2 are multi-homed to CE1.
          Unlike CE1 in <xref target="Figure2"/>, in this case the links
          between CE1 and PE1/PE2 are used exclusively for L3 protocols and
          L3 forwarding in different BDs, and a BGP session is established
          between CE1's loopback address and PE1's IRB address.</t>

          <figure anchor="Figure3" title="Layer-3 Multihoming PEs">
            <artwork><![CDATA[                                      
                                      +-----------------------+
                                      |        EVPN           |
                        PE1           |                       |
                       +-------------------+                  |
                       |       IRB1        |                  |
                       |  +---+   +------+ | ------->         |
              +-----------|BD1|---|IPVRF1| | RT-5             |
      eBGP    |        |  +---+   |      | | 50.0/24          | PE3
   +------------------------>10.1 +------+ | ESI1  +----------------+
   |          |        +-------------------+       | +------+       |
  +-----+10.2 |                       |   ^        | |IPVRF1| +---+ |
  | CE1 |-----+    ES1                |   |        | |      |-|BD3| |
  |     |-----+                       |   +--------| +------+ +---+ |
  +-----+20.2 |         PE2           |        +---|            |   |
  lo1         |        +--------------+----+   |   +------------|---+
  1.1.1.1     |        |       IRB2        |   |              | |
  Prefixes:   |        |  +---+   +------+ |   |              | H4
  50.0/24     +-----------|BD2|---|IPVRF1| |<--+              |
  60.0/24              |  +---+   |      | |                  |
                       |     20.1 +------+ |                  |
                       +-------------------+                  |
                                      |                       |
                                      +-----------------------+
 
  Note: 
    IP addresses expanded by adding 0s 
    E.g., 50.0 expands to 50.0.0.0                                     ]]></artwork>
          </figure>

          <t>In these use cases, the CE sometimes supports a single BGP
          session to one of the PEs (through which it advertises a number of
          IP Prefixes located behind it), and yet it is desirable that
          remote PEs can build an IP ECMP list or backup IP list including
          all the PEs multi-homed to the same CE. For example, in <xref
          target="Figure3"/>, CE1 has a single eBGP neighbor, i.e., PE1.
          Load balancing for traffic from CE1 to H4 can be accomplished by a
          default route with next hops PE1 and PE2; however, load balancing
          from H4 to any of the prefixes attached to CE1 would not be
          possible, since only PE1 would advertise EVPN IP Prefix routes for
          CE1's prefixes. This document provides a solution so that PE3
          considers PE2 as a next hop in the IP ECMP list for CE1's
          prefixes, even if PE2 did not advertise the IP Prefix routes for
          those prefixes in the first place.</t>
        </section>

        <section anchor="sect-1.3.2" title="Centralized Routing Model">
          <t><xref target="Centralized"/> illustrates a model in which
          multiple CEs establish an eBGP PE-CE session with a Centralized PE.
          </t>

          <figure anchor="Centralized" title="Centralized Routing Model">
            <artwork><![CDATA[                 +-------------------------------+
                 | PE1       EVPN                |
            +----------+                         |
            |  +------+|                         |
            |  |IP-VRF||                         |
     10.1 --------------------+                  |
   +---+    |+--+     ||      |eBGP              |
   |CE1|----||BD|-----+|      |PE-CE             |
   |   |-+  |+--+      |      |50.0/24           | PE3
   +---+ |  +----------+      |NH 10.1      +----------+
Prefixes:|       |            |             |+------+  |
50.0/24  |       |            |             ||IP-VRF|  |
60.0/24  |       | PE2        |   +--------->|    +--+ |
         |  +----------+      |   |         |+----|BD| |
         |  |  +------+|      |   |         |     +--+ |
         |  |  |IP-VRF||      |   |         +----------+
         |  |  |      ||      |   |              |  |
         |  |+--+     ||      |   |RT-5          |  |
         +--||BD|-----+|      |   |50.0/24       | H4
            |+--+      |      |   |ESI1          |
            +----------+      |   |NH PEC        |
                 |            |   |              |
                 |        30.1|   | PEC          |
                 |         +--V---|-+            |
                 |         |+------+|            |
                 +---------||IP-VRF||------------+
                           |+------+|
                           +--------+
  Note: 
    IP addresses expanded by adding 0s 
    E.g., 50.0 expands to 50.0.0.0                                     ]]></artwork>
          </figure>

          <t>The CEs in this case are usually VNFs (Virtual Network
          Functions) or CNFs (Containerized Network Functions), and
          provisioning the same network parameters on all of them
          significantly simplifies their operation. The configuration of the
          PEs is also simplified, since the PE-CE eBGP sessions to the CEs
          are only configured on a centralized PE. In the diagram, CE1 is
          one of these VNFs/CNFs, and it sets up a multi-hop eBGP session to
          the centralized PE, PEC. As an example, CE1 advertises prefix
          50.0.0.0/24 to PEC, with Next Hop 10.0.0.1, via the multi-hop eBGP
          session. PEC then exports the prefix in an RT-5 route with Next
          Hop PEC, following the Interface-less IP-VRF-to-IP-VRF model
          <xref target="RFC9136"/>. When H4 sends traffic to an IP address
          in subnet 50.0.0.0/24, the traffic is forwarded to PEC first, and
          PEC then forwards it to PE1 (or PE2). In other words, this model
          simplifies the configuration and operation of the CEs; however, it
          introduces an inefficiency, since traffic needs to go through the
          centralized PE (PEC) instead of going directly to the PE(s)
          attached to the destination CE. The IP Aliasing solution specified
          in this document overcomes this inefficiency and allows traffic
          from PE3 to be forwarded directly to PE1 or PE2, without going
          through PEC.</t>
        </section>
      </section>

      <section anchor="sect-1.4" title="Terminology and Conventions">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in BCP
        14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only
        when, they appear in all capitals, as shown here.</t>

        <t><list style="symbols">
            <t>IRB: Integrated Routing and Bridging</t>

            <t>IRB Interface: Integrated Routing and Bridging Interface. A
            virtual interface that connects the Bridge Table and the IP-VRF on
            an NVE.</t>

            <t>BD: Broadcast Domain. An EVI may consist of one BD
            (VLAN-based or VLAN Bundle services) or multiple BDs (VLAN-aware
            Bundle services).</t>

            <t>Bridge Table: An instantiation of a broadcast domain on a
            MAC-VRF.</t>

            <t>CE: Customer Edge device, e.g., a host, router, or switch.</t>

            <t>EVI: An EVPN instance spanning the Provider Edge (PE) devices
            participating in that EVPN.</t>

            <t>MAC-VRF: A Virtual Routing and Forwarding table for Media
            Access Control (MAC) addresses on a PE.</t>

            <t>Ethernet Segment (ES): When a customer site (device or network)
            is connected to one or more PEs via a set of Ethernet links, then
            that set of links is referred to as an 'Ethernet segment'.</t>

            <t>Ethernet Segment Identifier (ESI): A unique non-zero identifier
            that identifies an Ethernet segment is called an 'Ethernet Segment
            Identifier'.</t>

            <t>IP-VRF: A VPN Routing and Forwarding table for IP routes on an
            NVE/PE. The IP routes could be populated by any routing protocol,
            e.g., EVPN, IP-VPN, and BGP PE-CE IP address families. An IP-VRF
            is also an instantiation of a layer 3 VPN in an NVE/PE.</t>

            <t>EVPN IP route: An EVPN IP Prefix route or an EVPN MAC/IP
            Advertisement route.</t>

            <t>LACP: Link Aggregation Control Protocol.</t>

            <t>PE: Provider Edge device.</t>

            <t>Single-Active Redundancy Mode: When only a single PE, among all
            the PEs attached to an Ethernet segment, is allowed to forward
            traffic to/from that Ethernet segment for a given VLAN, then the
            Ethernet segment is defined to be operating in Single-Active
            redundancy mode.</t>

            <t>All-Active Redundancy Mode: When all PEs attached to an
            Ethernet segment are allowed to forward known unicast traffic
            to/from that Ethernet segment for a given VLAN, then the Ethernet
            segment is defined to be operating in All-Active redundancy
            mode.</t>

            <t>RT-2: EVPN MAC/IP Advertisement route, as specified in <xref
            target="RFC7432"/>.</t>

            <t>RT-4: EVPN Ethernet Segment route, as specified in <xref
            target="RFC7432"/>.</t>

            <t>RT-5: EVPN IP Prefix route, as specified in <xref
            target="RFC9136"/>.</t>
          </list></t>
      </section>
    </section>

    <section anchor="sect-2"
             title="Ethernet Segments for L3 Aliasing/Backup Path and Fast Convergence">
      <t>The first two use cases described in <xref target="sect-1"/> do not
      require any extensions to the Ethernet Segment definition. Both cases
      support Ethernet Segments as a set of Ethernet links, as specified in
      <xref target="RFC7432"/>, or virtual Ethernet Segments as a set of
      logical links, as specified in <xref
      target="I-D.ietf-bess-evpn-virtual-eth-segment"/>.</t>

      <t>The third use case in <xref target="sect-1"/> requires an extension
      to the way Ethernet Segments are defined and associated. In this case,
      the Ethernet Segment is a Layer-3 construct characterized as
      follows:</t>

      <t><list style="numbers">
          <t>The ES is defined as a set of Layer-3 links to the multi-homed CE
          and its state MUST be linked to the layer-3 reachability from each
          multi-homed PE to the CE's loopback address via a non-EVPN route in
          the PE's IP-VRF.</t>

          <t>The ESI SHOULD be of type 4 <xref target="RFC7432"/> and set to
          the router ID of the multi-homed CE.</t>

          <t>Both all-active and single-active multi-homing redundancy modes
          are supported; however, the redundancy mode only affects the
          procedures in <xref target="sect-3"/>.</t>

          <t>PEs attached to the same Layer-3 ES discover each other through
          the exchange of RT-4 routes (Ethernet Segment routes). DF Election
          procedures <xref target="RFC8584"/> MAY be used for single-active
          multi-homing mode.</t>

          <t>The routes advertised by the multi-homed CE and installed in
          the PE's IP-VRF table with the CE's loopback as the next hop MUST
          be re-advertised by the PE in EVPN IP Prefix routes with the ESI of
          the CE. The remaining fields of the EVPN IP Prefix routes are set
          as per the Interface-less model in <xref target="RFC9136"/>. Note
          that the BGP PE-CE routes advertised by the multi-homed CE are
          installed in the IP-VRF as usual, irrespective of whether their
          Next Hop is resolved to an EVPN or a non-EVPN route, and they are
          exported as RT-5 routes with the ESI.</t>
        </list></t>

      <t>In the example depicted in <xref target="Figure3"/>, ES1 is
      defined as the set of layer-3 links that connects PE1 and PE2 to CE1.
      Its ESI, e.g., ESI-1, is derived as a type 4 ESI from CE1's router ID.
      ES1 is operationally up in a PE as long as CE1's loopback route is
      installed in that PE's IP-VRF and learned via any routing protocol
      other than EVPN. For example, an active static route to 1.1.1.1 via
      next hop 10.0.0.2 makes ES1 operationally up in PE1, and the eBGP
      routes received from CE1 with next hop 1.1.1.1 are re-advertised as
      RT-5 routes with ESI-1.</t>
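      <t>As an illustration of point 2, the following Python sketch encodes
      a type 4 ESI from the CE's router ID, using the type 4 layout defined
      in <xref target="RFC7432"/>: a type octet of 0x04, the 4-octet CE
      router ID, a 4-octet local discriminator, and a final octet of 0x00.
      The function names are hypothetical, and the default local
      discriminator of 0 is an assumption, since this document does not
      specify its value.</t>

      <figure>
        <artwork><![CDATA[
```python
import ipaddress


def type4_esi(ce_router_id: str, local_disc: int = 0) -> bytes:
    """Encode a 10-octet type 4 (router ID) ESI.

    Layout: 0x04 | CE router ID (4) | local discriminator (4) | 0x00.
    local_disc=0 is an assumption made for this example.
    """
    rid = ipaddress.IPv4Address(ce_router_id).packed
    disc = local_disc.to_bytes(4, "big")
    return bytes([0x04]) + rid + disc + bytes([0x00])


def esi_to_str(esi: bytes) -> str:
    """Render an ESI as colon-separated hex octets."""
    return ":".join(f"{octet:02x}" for octet in esi)


# CE1's router ID in Figure 3 is 1.1.1.1 (lo1).
print(esi_to_str(type4_esi("1.1.1.1")))
# prints 04:01:01:01:01:00:00:00:00:00
```
]]></artwork>
      </figure>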

      <t>In the example illustrated in <xref target="Centralized"/>, ES1 is
      a set of layer-3 links connecting PE1, PE2, and PEC to CE1. ESI-1 is
      derived as a type 4 ESI from CE1's router ID, as in the previous
      example. CE1's loopback route (which is associated with ES1) is
      installed in PE1 and PE2 via a non-EVPN route; hence, ES1 is
      operationally up in PE1 and PE2. On PEC, however, CE1's loopback is
      installed via an EVPN IP Prefix route; therefore, as per point 1 in
      this section, ES1 is operationally down in PEC. As per point 5, this
      does not prevent PEC from exporting CE1's prefixes in RT-5 routes with
      ESI-1. However, since ES1 is operationally down in PEC, no IP A-D per
      EVI routes (<xref target="sect-3"/>) and no IP A-D per ES routes
      (<xref target="sect-4"/>) for ESI-1 are advertised by PEC, which
      prevents PEC from attracting traffic destined to CE1.</t>
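      <t>The per-PE outcome of the <xref target="Centralized"/> example can
      be summarized by the following Python sketch of points 1 and 5. It
      assumes a simplified model in which each PE only records which
      protocol installed CE1's loopback route and whether it received PE-CE
      BGP routes from CE1. The class, field, and protocol names are
      hypothetical, and the use of static routes on PE1 and PE2 is an
      assumption; this document only states that a non-EVPN route is
      used.</t>

      <figure>
        <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PeView:
    """Hypothetical per-PE view of CE1."""
    name: str
    loopback_via: Optional[str]  # protocol of CE1's loopback route
    has_ce_bgp_routes: bool      # PE-CE BGP routes received from CE1


def es_oper_up(pe: PeView) -> bool:
    # Point 1: the Layer-3 ES is up only if CE1's loopback is
    # installed via a non-EVPN route.
    return pe.loopback_via is not None and pe.loopback_via != "evpn"


def routes_for_esi1(pe: PeView) -> List[str]:
    routes = []
    # Point 5: CE prefixes are exported as RT-5 with the ESI,
    # whatever the state of the ES.
    if pe.has_ce_bgp_routes:
        routes.append("RT-5 with ESI-1")
    # IP A-D routes are only advertised while the ES is up.
    if es_oper_up(pe):
        routes += ["IP A-D per ES", "IP A-D per EVI"]
    return routes


for pe in (PeView("PE1", "static", False),
           PeView("PE2", "static", False),
           PeView("PEC", "evpn", True)):
    print(pe.name, routes_for_esi1(pe))
```
]]></artwork>
      </figure>

      <t>Because PEC advertises only the RT-5 route, a remote PE such as
      PE3 can resolve ESI-1 only to PE1 and PE2, which is what keeps PEC out
      of the forwarding path.</t>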
    </section>

    <section anchor="sect-3" title="IP Aliasing and Backup Path">
      <t>To address the use cases described in <xref target="sect-1"/>,
      this document proposes that:<list style="numbers">
          <t>A PE that is attached to a given ES will advertise a set of one
          or more Ethernet A-D per ES routes for that ES. Each is termed an
          &lsquo;IP A-D per ES&rsquo; route and is tagged with the route
          targets (RTs) for one or more of the IP-VRFs defined on it for that
          ES; the complete set of IP A-D per ES routes contains the RTs for
          all of the IP-VRFs defined on it for that ES. <vspace
          blankLines="1"/>A remote PE imports an IP A-D per ES route into the
          IP-VRFs corresponding to the RTs with which the route is tagged.
          When the complete set of IP A-D per ES routes has been processed, a
          remote PE will have imported an IP A-D per ES route into each of the
          IP-VRFs defined on it for that ES; this enables fast convergence for
          each of these IP-VRFs.</t>

          <t>For this ES, a PE advertises an Ethernet A-D per EVI route for
          each of the IP-VRFs defined on it. Each is termed an &lsquo;IP A-D
          per EVI&rsquo; route; it is tagged with the RT for a given IP-VRF
          and conveys a label that identifies that IP-VRF. <vspace
          blankLines="1"/>A remote PE imports an IP A-D per EVI route into the
          IP-VRF corresponding to the RT with which the route is tagged. The
          label contained in the route enables aliasing/backup path for the
          routes in that IP-VRF.</t>
        </list></t>
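      <t>The remote-PE import behavior in points 1 and 2 can be sketched in
      Python as follows. The sketch assumes a one-to-one mapping from import
      Route Targets to local IP-VRFs and models routes as plain records
      rather than BGP NLRI; all names are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

Key = Tuple[str, str]  # (local IP-VRF, ESI)


@dataclass(frozen=True)
class IpAdRoute:
    kind: str            # "per-es" or "per-evi"
    pe: str              # advertising PE
    esi: str
    rts: FrozenSet[str]  # Route Targets carried by the route
    label: int = 0       # IP-VRF label (per-EVI routes only)


@dataclass
class RemotePe:
    import_rts: Dict[str, str]  # RT -> local IP-VRF (assumed 1:1)
    per_es: Dict[Key, Set[str]] = field(
        default_factory=lambda: defaultdict(set))
    per_evi: Dict[Key, Dict[str, int]] = field(
        default_factory=lambda: defaultdict(dict))

    def import_route(self, r: IpAdRoute) -> None:
        """Import r into every IP-VRF matching one of its RTs."""
        for rt in r.rts:
            vrf = self.import_rts.get(rt)
            if vrf is None:
                continue  # no local IP-VRF imports this RT
            if r.kind == "per-es":
                # Point 1: enables fast convergence in this IP-VRF.
                self.per_es[(vrf, r.esi)].add(r.pe)
            else:
                # Point 2: the label enables aliasing/backup path.
                self.per_evi[(vrf, r.esi)][r.pe] = r.label


pe3 = RemotePe({"RT-A": "VRF-A", "RT-B": "VRF-B"})
# PE1 spreads its RTs over two IP A-D per ES routes.
for rts in ({"RT-A"}, {"RT-B"}):
    pe3.import_route(IpAdRoute("per-es", "PE1", "ESI1",
                               frozenset(rts)))
pe3.import_route(IpAdRoute("per-evi", "PE1", "ESI1",
                           frozenset({"RT-A"}), label=100))
print(dict(pe3.per_es), dict(pe3.per_evi))
```
]]></artwork>
      </figure>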

      <t>To address the third use-case described in <xref target="sect-1"/>,
      where the links between a CE and its multihomed PEs are used exclusively
      for L3 protocols and L3 forwarding, a PE uses the procedures described
      in 1) and 2), above.</t>

      <t>The processing of the IP A-D per ES and the IP A-D per EVI routes is
      as defined in <xref target="RFC7432"/> and <xref target="RFC8365"/>
      except that the fast convergence and aliasing/backup path functions
      apply to the routes contained in an IP-VRF. In particular, a remote PE
      that receives an EVPN MAC/IP Advertisement route or an IP Prefix route
      with a non-reserved ESI and the RT of a particular IP-VRF SHOULD
      consider it reachable by every PE that has advertised an IP A-D per ES
      and IP A-D per EVI route for that ESI and IP-VRF.</t>
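      <t>The reachability rule above can be sketched in Python as follows.
      The sketch assumes that the IP A-D per ES and IP A-D per EVI routes
      have already been imported into per-(IP-VRF, ESI) tables and
      represents ESIs as strings. As a simplification, it always keeps the
      PE that advertised the EVPN IP route itself, with the label from that
      route. The function and table names are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Dict, Set, Tuple

Key = Tuple[str, str]  # (IP-VRF, ESI)
# Reserved ESI values: 0 and MAX-ESI (all ones).
RESERVED_ESIS = {"00" * 10, "ff" * 10}


def alias_next_hops(vrf: str, esi: str, route_pe: str,
                    route_label: int,
                    per_es: Dict[Key, Set[str]],
                    per_evi: Dict[Key, Dict[str, int]]
                    ) -> Dict[str, int]:
    """Return {PE: label} for an EVPN IP route in IP-VRF vrf."""
    next_hops = {route_pe: route_label}
    if esi.lower() in RESERVED_ESIS:
        return next_hops  # reserved ESI: no aliasing
    key = (vrf, esi)
    evi_labels = per_evi.get(key, {})
    # A PE qualifies only if it advertised BOTH route types.
    for pe in per_es.get(key, set()):
        if pe in evi_labels:
            next_hops.setdefault(pe, evi_labels[pe])
    return next_hops


# Figure 1: H1's route is advertised by PE1 only, with ESI1.
per_es = {("VRF-1", "ESI1"): {"PE1", "PE2"}}
per_evi = {("VRF-1", "ESI1"): {"PE1": 100, "PE2": 200}}
print(alias_next_hops("VRF-1", "ESI1", "PE1", 100, per_es, per_evi))
# prints {'PE1': 100, 'PE2': 200}
```
]]></artwork>
      </figure>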

      <section anchor="sect-3.1"
               title="Constructing the IP A-D per EVI Route">
        <t>The construction of the IP A-D per EVI route is the same as that of
        the Ethernet A-D per EVI route, as described in <xref
        target="RFC7432"/>, with the following exceptions:</t>

        <t><list style="symbols">
            <t>The Route-Distinguisher is for the corresponding IP-VRF.</t>

            <t>The Ethernet Tag should be set to 0.</t>

            <t>The route SHOULD carry the Route Target of the corresponding
            IP-VRF.</t>

            <t>The route MUST carry the MPLS label, VNI (VXLAN Network
            Identifier <xref target="RFC8365"/>) or Segment Routing IPv6 SID
            (Segment Identifier <xref target="I-D.ietf-bess-srv6-services"/>)
            that identifies the corresponding IP-VRF.</t>

            <t>The route MUST carry the EVPN Router&rsquo;s MAC Extended
            Community if the encapsulation used between the PEs for
            inter-subnet forwarding is an Ethernet NVO tunnel <xref
            target="RFC9136"/>.</t>

            <t>The route SHOULD carry the EVPN Layer 2 Extended Community
            <xref target="I-D.ietf-bess-rfc7432bis"/>. For all-active
            multihoming, all PEs attached to the specified ES will advertise
            P=1. For backup path, the Primary PE will advertise P=1 and the
            Backup PE will advertise P=0, B=1.<list style="symbols">
                <t>The Primary PE SHOULD be a PE with a routing adjacency to
                the attached CE.</t>

                <t>The Primary PE MAY be determined by policy or MAY be
                elected by the DF Election procedures of <xref
                target="RFC8584"/>, as described in <xref
                target="sect-2"/>.</t>
              </list></t>
          </list></t>
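        <t>The field settings listed above, including the P and B flags of
        the EVPN Layer 2 Extended Community, can be summarized in the
        following Python sketch. It models the route as a plain record
        rather than an on-the-wire encoding; the field names, the string
        forms of the RD and Route Target, and the build function are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class IpAdPerEvi:
    rd: str                    # RD of the corresponding IP-VRF
    esi: str
    ethernet_tag: int          # set to 0
    vrf_id: Union[int, str]    # MPLS label, VNI, or SRv6 SID
    route_targets: List[str]   # RT of the corresponding IP-VRF
    router_mac: Optional[str]  # only for Ethernet NVO tunnels
    p_flag: bool               # EVPN Layer 2 Ext. Community, P
    b_flag: bool               # EVPN Layer 2 Ext. Community, B


def build_ip_ad_per_evi(vrf_rd: str, vrf_rt: str, esi: str,
                        vrf_id: Union[int, str], all_active: bool,
                        primary: bool, ethernet_nvo: bool = False,
                        pe_mac: Optional[str] = None) -> IpAdPerEvi:
    if ethernet_nvo and pe_mac is None:
        raise ValueError("Ethernet NVO tunnels need the PE's MAC")
    # All-active: every PE sets P=1.
    # Backup path: the Primary PE sets P=1; the Backup PE sets
    # P=0 and B=1.
    p_flag = all_active or primary
    b_flag = not all_active and not primary
    return IpAdPerEvi(rd=vrf_rd, esi=esi, ethernet_tag=0,
                      vrf_id=vrf_id, route_targets=[vrf_rt],
                      router_mac=pe_mac if ethernet_nvo else None,
                      p_flag=p_flag, b_flag=b_flag)


backup = build_ip_ad_per_evi("192.0.2.2:10", "target:64500:10",
                             "ESI1", 100, all_active=False,
                             primary=False)
print(backup.p_flag, backup.b_flag)  # prints False True
```
]]></artwork>
        </figure>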
      </section>
    </section>

    <section anchor="sect-4" title="Fast Convergence for Routed Traffic">
      <t>Host or Prefix reachability is learned via the BGP-EVPN control plane
      over the MPLS/NVO network. EVPN IP routes for a given ES are advertised
      by one or more of the PEs attached to that ES. When one of these PEs
      fails, a remote PE needs to quickly invalidate the EVPN IP routes
      received from it.</t>

      <t>To accomplish this, EVPN defines the fast convergence function
      specified in <xref target="RFC7432"/>. This document extends fast
      convergence to inter-subnet forwarding by having each PE advertise a
      set of one or more IP A-D per ES routes for each locally attached
      Ethernet segment (refer to <xref target="sect-4.1"/> for details on
      how these routes are constructed). A PE may need to advertise more
      than one IP A-D per ES route for a given ES because the ES may be
      associated with many IP-VRFs, and the Route Targets for all of these
      IP-VRFs may not fit into a single route. Advertising a set of IP A-D
      per ES routes for the ES allows each route to carry a subset of the
      complete set of Route Targets. Each IP A-D per ES route is
      differentiated from the other routes in the set by a different Route
      Distinguisher (RD).</t>
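      <t>One way to spread the Route Targets of an ES over a set of IP A-D
      per ES routes is sketched below in Python. The per-route budget
      max_rts is an assumption standing in for whatever space remains in a
      BGP UPDATE message, and the RD format (the PE's IP address followed
      by a sequence number) is an illustrative choice, not one mandated by
      this document.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import List, Tuple


def split_ip_ad_per_es(pe_ip: str, vrf_rts: List[str],
                       max_rts: int) -> List[Tuple[str, List[str]]]:
    """Return (RD, RTs) pairs, one per IP A-D per ES route."""
    if max_rts < 1:
        raise ValueError("max_rts must be at least 1")
    routes = []
    for start in range(0, len(vrf_rts), max_rts):
        # A distinct RD per route keeps the routes from replacing
        # each other (illustrative numbering: PE-IP:1, PE-IP:2, ...).
        rd = f"{pe_ip}:{start // max_rts + 1}"
        routes.append((rd, vrf_rts[start:start + max_rts]))
    return routes


rts = [f"target:64500:{n}" for n in range(1, 6)]
for rd, chunk in split_ip_ad_per_es("192.0.2.1", rts, max_rts=2):
    print(rd, chunk)
```
]]></artwork>
      </figure>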

      <t>Upon failure in connectivity to the attached ES, the PE withdraws the
      corresponding set of IP A-D per ES routes. This triggers all PEs that
      receive the withdrawal to update their next-hop adjacencies for all IP
      addresses associated with the Ethernet Segment in question, across
      IP-VRFs. If no other PE has advertised an IP A-D per ES route for the
      same Ethernet Segment, then the PE that received the withdrawal simply
      invalidates the IP entries for that segment. Otherwise, the PE updates
      its next-hop adjacencies accordingly.</t>
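      <t>The withdrawal handling above can be sketched in Python as follows.
      The sketch assumes that each IP entry in the remote PE's IP-VRFs
      points at a shared per-(IP-VRF, ESI) next-hop set rather than holding
      its own copy of the next hops. That indirection, which is an
      implementation assumption, is what lets a single withdrawal update
      every IP address behind the ES at once. The names are
      hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (IP-VRF, ESI)


def withdraw_ip_ad_per_es(pe: str, esi: str,
                          es_next_hops: Dict[Key, Set[str]],
                          ip_entries: Dict[Tuple[str, str], str]
                          ) -> List[Tuple[str, str]]:
    """Handle withdrawal of pe's IP A-D per ES routes for esi.

    es_next_hops: (IP-VRF, ESI) -> PEs still attached to the ES.
    ip_entries:   (IP-VRF, prefix) -> ESI the entry resolves to.
    Returns the (IP-VRF, prefix) entries that were invalidated.
    """
    emptied = set()
    # Update the shared next-hop sets across all IP-VRFs.
    for (vrf, entry_esi), pes in es_next_hops.items():
        if entry_esi == esi:
            pes.discard(pe)
            if not pes:
                emptied.add(vrf)
    # No other PE left for this ES in a VRF: invalidate entries.
    invalidated = [key for key, e in ip_entries.items()
                   if e == esi and key[0] in emptied]
    for key in invalidated:
        del ip_entries[key]
    return invalidated


es_next_hops = {("VRF-A", "ESI1"): {"PE1", "PE2"},
                ("VRF-B", "ESI1"): {"PE1"}}
ip_entries = {("VRF-A", "10.0.0.0/24"): "ESI1",
              ("VRF-B", "20.0.0.0/24"): "ESI1"}
print(withdraw_ip_ad_per_es("PE1", "ESI1", es_next_hops, ip_entries))
# prints [('VRF-B', '20.0.0.0/24')]
```
]]></artwork>
      </figure>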

      <t>Upon failure, the withdrawals of these routes should be processed
      with higher priority than EVPN IP route withdrawals. Similar priority
      processing is also needed on intermediate Route Reflectors.</t>
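      <t>A minimal way to give withdrawals of IP A-D per ES routes
      precedence over EVPN IP route withdrawals is a priority queue,
      sketched below in Python with the standard heapq module. The
      two-level priority scheme and the message-kind strings are
      assumptions for illustration; actual BGP implementations may schedule
      updates very differently.</t>

      <figure>
        <artwork><![CDATA[
```python
import heapq
from itertools import count
from typing import Any, List, Tuple

# Lower value = processed first (assumed two-level scheme).
PRIORITY = {"ip-ad-per-es-withdraw": 0, "evpn-ip-withdraw": 1}


class WithdrawQueue:
    """Drain IP A-D per ES withdrawals before EVPN IP routes."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, Any]] = []
        self._seq = count()  # FIFO tie-breaker within a priority

    def push(self, kind: str, payload: Any) -> None:
        entry = (PRIORITY[kind], next(self._seq), kind, payload)
        heapq.heappush(self._heap, entry)

    def pop(self) -> Tuple[str, Any]:
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self) -> int:
        return len(self._heap)


q = WithdrawQueue()
q.push("evpn-ip-withdraw", "10.0.0.1/32")
q.push("evpn-ip-withdraw", "10.0.0.2/32")
q.push("ip-ad-per-es-withdraw", "ESI1")
while q:
    print(*q.pop())
```
]]></artwork>
      </figure>

      <t>The sequence counter keeps arrival order among withdrawals of the
      same kind and ensures that payloads are never compared.</t>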

      <section anchor="sect-4.1"
               title="Constructing IP A-D per Ethernet Segment Route">
        <t>This section describes the procedures used to construct the IP
        A-D per ES route, which is used for fast convergence (as discussed
        in <xref target="sect-4"/>). The usage and construction of this
        route remain similar to those described in Section 8.2.1 of <xref
        target="RFC7432"/>, with the notable exceptions explained in the
        following subsection.</t>

        <section anchor="sect-4.1.1" title="IP A-D per ES Route Targets">
          <t>Each IP A-D per ES route MUST carry one or more Route Targets.
          The set of IP A-D per ES routes MUST carry the entire set of IP-VRF
          Route Targets for all the IP-VRFs defined on that ES.</t>
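
          <t>As an illustration of this requirement, the Python sketch below
          packs the union of the Route Targets of all IP-VRFs on an ES into
          as few IP A-D per ES routes as possible, giving each route its own
          RD. The per-route limit max_rts_per_route, the RD scheme (router ID
          followed by an index), and all names are hypothetical assumptions;
          in practice the limit is bounded by the BGP UPDATE message
          size.</t>

          <figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class IpAdPerEsRoute:
    rd: str                      # distinct RD per route in the set
    esi: str
    route_targets: list = field(default_factory=list)


def build_ip_ad_per_es_set(esi, router_id, ip_vrf_rts,
                           max_rts_per_route):
    """Spread the RTs of all IP-VRFs on an ES over as many IP A-D per
    ES routes as needed (hypothetical packing limit per route).

    ip_vrf_rts: mapping of IP-VRF name -> list of its RTs.
    """
    if max_rts_per_route < 1:
        raise ValueError("each route must carry at least one RT")
    # Union of the RTs of every IP-VRF, deduplicated, stable order.
    all_rts = list(dict.fromkeys(
        rt for rts in ip_vrf_rts.values() for rt in rts))
    routes = []
    for i in range(0, len(all_rts), max_rts_per_route):
        routes.append(IpAdPerEsRoute(
            rd=f"{router_id}:{len(routes) + 1}",  # hypothetical RD
            esi=esi,
            route_targets=all_rts[i:i + max_rts_per_route]))
    return routes
```
]]></artwork>
          </figure>

          <t>Every route in the resulting set carries at least one Route
          Target, and together the set carries the entire set of IP-VRF Route
          Targets for the ES.</t>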
        </section>
      </section>

      <section anchor="sect-4.2"
               title="Avoiding convergence issues by synchronizing IP prefixes">
        <t>Consider a pair of multi-homing PEs, PE1 and PE2, with a host H1
        attached to both of them, and a remote PE3 with a host H3 attached to
        it.</t>

        <t>If host H1 is learned on both PEs, an ECMP path list pointing to
        (PE1/PE2) is formed on PE3. Traffic from H3 to H1 is not impacted even
        if one of the PEs fails, because the path list is corrected upon
        receiving the withdrawal of the fast convergence route(s) (IP A-D per
        ES routes).</t>

        <t>If H1 is learned locally only on PE1, for example due to LAG
        hashing or because H1 has a routing protocol adjacency only to PE1,
        PE3 still has the ECMP path list (PE1/PE2) for H1 by using Aliasing as
        described in this document. Traffic from H3 can therefore reach H1 via
        either PE1 or PE2.</t>

        <t>PE2 should install local forwarding state for EVPN IP routes
        advertised by other PEs attached to the same ES (i.e., PE1), but
        should not advertise them as local routes. When traffic from H3
        reaches PE2, PE2 is then able to forward it to H1 without the
        convergence delay that triggering ARP/ND to H1 (or to the next hop
        used to reach H1) would otherwise cause. Synchronizing the EVPN IP
        routes across all PEs of the same Ethernet Segment is therefore
        important to avoid convergence issues.</t>
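
        <t>The sketch below shows, in Python, how PE2 might treat an EVPN IP
        route received from PE1. If the route's ESI is attached locally, the
        route is installed as forwarding state over the local ES link (using
        the advertised MAC address, so no ARP/ND is needed first), but it is
        never re-originated as a local route. The EvpnIpRoute type, the
        install_remote_ip_route function, the use of "0" for the zero ESI,
        and the tuple layout of forwarding entries are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvpnIpRoute:
    ip_vrf: str
    prefix: str
    esi: str                    # "0" denotes the reserved zero ESI
    origin_pe: str
    mac: Optional[str] = None   # from a MAC/IP route, if any


def install_remote_ip_route(route, local_esis, fib):
    """Install forwarding state for a route received from another PE.

    local_esis: ESIs attached to this PE.
    fib: dict (ip_vrf, prefix) -> forwarding entry (hypothetical).
    """
    key = (route.ip_vrf, route.prefix)
    if route.esi != "0" and route.esi in local_esis:
        # Synced from a peer on the same ES: forward out of the local
        # ES link with the advertised MAC; no ARP/ND needed first.
        fib[key] = ("local-es", route.esi, route.mac)
    else:
        # Ordinary remote route: tunnel to the originating PE.
        fib[key] = ("remote-pe", route.origin_pe, None)
    # In both cases the route is NOT re-originated by this PE; only
    # entries it learns locally are advertised as local routes.
    return fib[key]
```
]]></artwork>
        </figure>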
      </section>

      <section anchor="sect-4.3"
               title="Handling Silent Host MAC/IP route for IP Aliasing">
        <t>Consider the example of <xref target="Figure1"/> for IP aliasing.
        If PE1 fails, PE3 will receive the withdrawal of the fast convergence
        route(s) and update the ECMP list for H1 to be just PE2. When the EVPN
        IP route for H1 is also withdrawn, neither PE2 nor PE3 will have a
        route to H1, and traffic from H3 to H1 is blackholed until PE2 learns
        H1 and advertises an EVPN IP route for it.</t>

        <t>This blackholing can be much worse if H1 behaves like a silent
        host: the IP address of H1 will not be re-learned on PE2 until H1
        sends an ARP/ND message or some traffic triggers ARP/ND resolution for
        H1.</t>

        <t>PE2 can detect the failure of PE1, or of PE1's connectivity to the
        ES, in different ways:</t>

        <t><list style="letters">
            <t>When PE1 fails, next-hop tracking for PE1 in the underlay
            routing protocol can detect the failure.</t>

            <t>Upon the failure of its link to CE1, PE1 withdraws its IP A-D
            per ES route(s), and PE2 can use this withdrawal as a trigger to
            detect the failure.</t>
          </list></t>

        <t>Thus, to avoid blackholing, when PE2 detects either failure, it
        should trigger ARP/ND requests for all remote IP prefixes received
        from PE1 across all affected IP-VRFs. This prompts host H1 to reply to
        the ARP/ND requests solicited by PE2, allowing PE2 to refresh both the
        MAC and IP entries for H1 in its tables.</t>
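
        <t>A minimal sketch of the refresh step, assuming PE2 keeps the
        synced EVPN IP routes as (IP-VRF, address, originating PE) tuples and
        exposes a hypothetical send_probe hook into its ARP/ND machinery.
        Either detection trigger above would call on_peer_unreachable. IPv4
        addresses are probed with ARP and IPv6 addresses with ND Neighbor
        Solicitations.</t>

        <figure>
          <artwork><![CDATA[
```python
import ipaddress


def probe_kind(ip):
    # IPv4 is resolved with ARP, IPv6 with ND Neighbor Solicitation.
    return "ARP" if ipaddress.ip_address(ip).version == 4 else "ND-NS"


def refresh_targets(synced_routes, failed_pe):
    """Group, per IP-VRF, the addresses learned from failed_pe.

    synced_routes: iterable of (ip_vrf, ip_address, origin_pe) tuples
    for EVPN IP routes received from peers on local ESes.
    """
    targets = {}
    for vrf, ip, origin in synced_routes:
        if origin == failed_pe:
            targets.setdefault(vrf, []).append(ip)
    return targets


def on_peer_unreachable(synced_routes, failed_pe, send_probe):
    """Probe every address learned from failed_pe, across all
    IP-VRFs. send_probe(vrf, ip, kind) is a hypothetical hook."""
    sent = 0
    for vrf, ips in refresh_targets(synced_routes, failed_pe).items():
        for ip in ips:
            send_probe(vrf, ip, probe_kind(ip))
            sent += 1
    return sent
```
]]></artwork>
        </figure>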

        <t>Even in a core failure scenario on PE1, PE1 must withdraw all of
        its local Layer-2 connectivity, as PE1 should not receive Layer-2
        traffic. Consequently, when PE2 triggers ARP/ND, the replies from host
        H1 can only be received by PE2, so H1 is learned as a local route on
        PE2 and advertised by PE2.</t>

        <t>It is recommended that the EVPN IP routes received from PE1 be
        deleted in a staggered or delayed manner, so that the ARP/ND refresh
        can complete on PE2 before the deletion.</t>
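
        <t>One possible way to realize the delayed deletion is sketched below
        in Python. The hold time and per-entry stagger are hypothetical
        tunables, time is passed in explicitly instead of using real timers,
        and an entry that is re-learned locally through the ARP/ND refresh
        is skipped when its deadline expires.</t>

        <figure>
          <artwork><![CDATA[
```python
import heapq


class DelayedDeleter:
    """Defer removal of synced EVPN IP routes from a failed peer so an
    ARP/ND refresh can re-learn them locally first (sketch only)."""

    def __init__(self, hold_time=2.0, stagger=0.01):
        self.hold_time = hold_time   # hypothetical, in seconds
        self.stagger = stagger       # hypothetical spacing per entry
        self._pending = []           # heap of (deadline, key)
        self._relearned = set()

    def schedule(self, keys, now):
        for i, key in enumerate(keys):
            # Spread deadlines so deletions do not all fire at once.
            deadline = now + self.hold_time + i * self.stagger
            heapq.heappush(self._pending, (deadline, key))
            self._relearned.discard(key)

    def relearned(self, key):
        # The host answered the refresh and is now a local route.
        self._relearned.add(key)

    def expire(self, now, fib):
        """Delete every due entry not re-learned; return the keys."""
        deleted = []
        while self._pending and self._pending[0][0] <= now:
            _, key = heapq.heappop(self._pending)
            if key in self._relearned:
                self._relearned.discard(key)
                continue
            fib.pop(key, None)
            deleted.append(key)
        return deleted
```
]]></artwork>
        </figure>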
      </section>

      <section anchor="sect-4.4" title="MAC Aging">
        <t>In the same example as in <xref target="sect-4.3"/>, PE1 performs
        an ARP/ND refresh for H1 before the entry ages out. During this
        process, the H1 entry on PE1 can age out either genuinely or because
        the ARP/ND reply lands on PE2. PE1 must withdraw the local entry from
        BGP when the H1 entry ages out. PE1 deletes the entry from its local
        forwarding table only when there are no remote synced entries for
        H1.</t>
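
        <t>The aging rule above can be sketched as follows; the HostEntry
        flags and the withdraw callback are hypothetical names used only for
        illustration. The route is always withdrawn from BGP on age-out,
        but the forwarding entry survives while a remote synced entry for
        the host remains.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class HostEntry:
    local: bool = True           # learned locally on this PE
    advertised: bool = True      # EVPN IP route originated in BGP
    remote_synced: bool = False  # a peer on the ES advertises it too
    in_fib: bool = True


def on_age_out(entry, withdraw):
    """Handle age-out of a local host entry on PE1, whether genuine
    or because the refresh reply hashed to PE2.

    withdraw: hypothetical callback that withdraws the BGP route.
    """
    if entry.advertised:
        withdraw()                # always withdraw the local route
        entry.advertised = False
    entry.local = False
    # Keep forwarding state while a remote synced entry exists.
    if not entry.remote_synced:
        entry.in_fib = False
    return entry
```
]]></artwork>
        </figure>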
      </section>
    </section>

    <section anchor="sect-5"
             title="Determining Reachability to Unicast IP Addresses">
      <section title="Local Learning">
        <t>The procedures for local learning do not change from <xref
        target="RFC7432"/> or <xref target="RFC9136"/>.</t>
      </section>

      <section anchor="sect-5.2" title="Remote Learning">
        <t>The procedures for remote learning do not change from <xref
        target="RFC7432"/> or <xref target="RFC9136"/>.</t>
      </section>

      <section anchor="sect-5.3" title="Constructing the EVPN IP Routes">
        <t>The procedures for constructing MAC/IP Address or IP Prefix
        Advertisements do not change from <xref target="RFC7432"/> or <xref
        target="RFC9136"/>.</t>

        <section anchor="sect-5.3.1" title="Route Resolution">
          <t>If the ESI field is set to one of the reserved values, 0 or
          MAX-ESI, the EVPN IP route resolution MUST be based on the EVPN IP
          route alone.</t>

          <t>If the ESI field is set to a non-reserved ESI, the EVPN IP route
          resolution MUST happen only when both the EVPN IP route and the
          associated set of IP A-D per ES routes have been received. To
          illustrate this with an example, consider a pair of multi-homed PEs,
          PE1 and PE2, connected to an all-active Ethernet Segment, and a
          remote PE3. A given host with IP address H1 is learned by PE1 but
          not by PE2. Once PE3 has received the EVPN IP route from PE1 and a
          set of IP A-D per ES and IP A-D per EVI routes from both PE1 and
          PE2, PE3 can forward traffic destined to H1 to both PE1 and PE2.
          This state is referred to as (1) below.</t>

          <t>If, after (1), PE1 withdraws its IP A-D per ES route, PE3
          forwards the traffic to PE2 only.</t>

          <t>If, after (1), PE2 withdraws its IP A-D per ES route, PE3
          forwards the traffic to PE1 only.</t>

          <t>If, after (1), PE1 withdraws the EVPN IP route, PE3 performs a
          delayed deletion of H1, as described in <xref
          target="sect-4.3"/>.</t>

          <t>If, after (1), PE2 has also advertised the EVPN IP route and PE1
          then withdraws its own, PE3 continues forwarding to both PE1 and PE2
          as long as it has the IP A-D per ES and IP A-D per EVI routes from
          both.</t>
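
          <t>The resolution rules and the cases above can be summarized by
          the following Python sketch, which computes PE3's next-hop set for
          H1 from the sets of PEs currently advertising each route type. The
          function name, its inputs, and the string encoding of reserved ESIs
          are hypothetical, and the delayed deletion of <xref
          target="sect-4.3"/> is not modeled: when no PE advertises the EVPN
          IP route, the sketch simply returns an empty set.</t>

          <figure>
            <artwork><![CDATA[
```python
RESERVED_ESIS = {"0", "MAX-ESI"}


def resolve(ip_route_pes, esi, ad_per_es_pes, ad_per_evi_pes):
    """Next hops at a remote PE (e.g., PE3) for one EVPN IP route.

    ip_route_pes:   PEs currently advertising the EVPN IP route
    esi:            ESI carried in the EVPN IP route
    ad_per_es_pes:  PEs advertising IP A-D per ES routes for the ESI
    ad_per_evi_pes: PEs advertising IP A-D per EVI routes for the ESI
    """
    if not ip_route_pes:
        return set()          # delayed deletion is not modeled here
    if esi in RESERVED_ESIS:
        # Reserved ESI: resolve on the EVPN IP route alone.
        return set(ip_route_pes)
    # Non-reserved ESI: resolution needs the IP A-D per ES routes;
    # aliasing adds every PE that advertises both A-D route types.
    return set(ad_per_es_pes) & set(ad_per_evi_pes)
```
]]></artwork>
          </figure>

          <t>With both PE1 and PE2 advertising both A-D route types, the
          sketch yields {PE1, PE2} in state (1); removing PE1 or PE2 from the
          IP A-D per ES set reproduces the first two cases, and moving the
          EVPN IP route from PE1 to PE2 reproduces the last one.</t>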
        </section>
      </section>
    </section>

    <section anchor="sect-6" title="Forwarding Unicast Packets">
      <t>Refer to Section 5 of <xref target="RFC9135"/> and <xref
      target="RFC9136"/>.</t>
    </section>

    <section anchor="sect-7" title="Load Balancing of Unicast Packets">
      <t>The procedures for load balancing of unicast packets do not change
      from <xref target="RFC7432"/>.</t>
    </section>

    <section anchor="sect-8"
             title="IP Aliasing and Unequal ECMP for IP Prefix Routes">
      <t><xref target="I-D.ietf-bess-evpn-unequal-lb"/> specifies the use of
      the EVPN Link bandwidth extended community to achieve weighted load
      balancing to an ES or Virtual ES for unicast traffic. The procedures in
      <xref target="I-D.ietf-bess-evpn-unequal-lb"/> MAY be used along with
      the procedures described in this document for any of the three cases
      described in <xref target="sect-1"/>, with the following
      considerations:</t>

      <t><list style="symbols">
          <t>The ES weight is signaled by the multi-homed PEs in the IP A-D
          per ES routes.</t>

          <t>The remote ingress PE that learns an EVPN IP route to
          prefix/host P associated with an ES that uses weighted load
          balancing follows the procedures in <xref
          target="I-D.ietf-bess-evpn-unequal-lb"/> to influence the load
          balancing of traffic to P.</t>

          <t><xref target="I-D.ietf-bess-evpn-unequal-lb"/> also allows the
          use of the EVPN Link Bandwidth Extended Community along with RT-5s.
          If the ingress PE learns a prefix P both via an RT-5 route with a
          non-reserved ESI and a weight (for which the IP A-D per ES routes
          also signal a weight) and via an RT-5 route with a zero ESI that
          includes a weight, the ingress PE will consider all the PEs attached
          to the ES as a single PE when normalizing weights.<vspace blankLines="1"/>As
          an example, consider that PE1 and PE2 are attached to ES-1 and that
          PE1 advertises an RT-5 for prefix P with ESI-1 (and an EVPN Link
          Bandwidth of 1). Also consider that PE3 advertises an RT-5 for P
          with ESI=0 and an EVPN Link Bandwidth of 2. If PE1 and PE2 advertise
          an EVPN Link Bandwidth of 1 and 2, respectively, in the IP A-D per
          ES routes for ES-1, an ingress PE4 SHOULD assign a normalized weight
          of 1 to ES-1 and a normalized weight of 2 to PE3. When PE4 sprays
          the flows to P, it will send twice as many flows to PE3 as to ES-1.
          For the flows sent to ES-1, the individual PE EVPN Link Bandwidths
          advertised in the IP A-D per ES routes will be considered.</t>
        </list></t>
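
      <t>The arithmetic of this example can be checked with the following
      Python sketch, which computes the share of flows to P that each egress
      PE receives. Taking the ES weight from the non-reserved-ESI RT-5 and
      splitting the ES share in proportion to the IP A-D per ES weights
      follows the example above; the function name and the use of exact
      fractions instead of the normalization procedure of <xref
      target="I-D.ietf-bess-evpn-unequal-lb"/> are assumptions.</t>

      <figure>
        <artwork><![CDATA[
```python
from fractions import Fraction


def flow_shares(es_weight_from_rt5, es_pe_weights, other_pe_weights):
    """Fraction of the flows to prefix P sent to each egress PE.

    es_weight_from_rt5: weight of the ES, treated as a single PE,
        taken from the non-reserved-ESI RT-5 route.
    es_pe_weights: per-PE EVPN Link Bandwidth from IP A-D per ES.
    other_pe_weights: per-PE weights from zero-ESI RT-5 routes.
    """
    total = es_weight_from_rt5 + sum(other_pe_weights.values())
    es_share = Fraction(es_weight_from_rt5, total)
    es_total = sum(es_pe_weights.values())
    shares = {}
    for pe, w in es_pe_weights.items():
        # Flows sent to the ES are split by the IP A-D per ES weights.
        shares[pe] = es_share * Fraction(w, es_total)
    for pe, w in other_pe_weights.items():
        shares[pe] = Fraction(w, total)
    return shares
```
]]></artwork>
      </figure>

      <t>With the values of the example, PE3 receives 2/3 of the flows, and
      the remaining 1/3 sent to ES-1 is split 1:2 between PE1 and PE2 (1/9
      and 2/9 of all flows, respectively).</t>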
    </section>

    <section anchor="sect-9" title="Security Considerations">
      <t>The mechanisms in this document use the EVPN control plane as
      defined in <xref target="RFC7432"/>. Security considerations described
      in <xref target="RFC7432"/> are equally applicable. This document uses
      MPLS and IP-based tunnel technologies to support data plane transport.
      Security considerations described in <xref target="RFC7432"/> and in
      <xref target="RFC8365"/> are equally applicable.</t>
    </section>

    <section anchor="sect-10" title="IANA Considerations">
      <t>This document has no IANA actions.</t>
    </section>

    <section title="Contributors">
      <t/>
    </section>

    <section title="Acknowledgments">
      <t/>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      &RFC7432;

      &RFC8365;

      &RFC2119;

      &RFC8174;

      &RFC8584;

      &RFC9135;

      &RFC9136;

      &I-D.ietf-bess-rfc7432bis;
    </references>

    <references title="Informative References">
      &I-D.ietf-bess-evpn-virtual-eth-segment;

      &I-D.ietf-bess-evpn-unequal-lb;

      &I-D.ietf-bess-srv6-services;
    </references>
  </back>
</rfc>
