<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-filsfils-rtgwg-lightweight-host-routing-00"
     ipr="trust200902">
  <front>
    <title abbrev="Lightweight Host Routing using LLDP">Lightweight Host
    Routing using LLDP</title>

    <author fullname="Clarence Filsfils" initials="C" surname="Filsfils">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country>Belgium</country>
        </postal>

        <email>cf@cisco.com</email>
      </address>
    </author>

    <author fullname="Pablo Camarillo" initials="P" role="editor"
            surname="Camarillo">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country>Spain</country>
        </postal>

        <email>pcamaril@cisco.com</email>
      </address>
    </author>

    <author fullname="Daniel Bernier" initials="D" surname="Bernier">
      <organization>Bell Canada</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country>Canada</country>
        </postal>

        <email>daniel.bernier@bell.ca</email>
      </address>
    </author>

    <date year=""/>

    <area>Routing</area>

    <workgroup>Routing Area</workgroup>

    <keyword>LLDP</keyword>

    <abstract>
      <t>Link Layer Discovery Protocol (LLDP) is widely deployed today for
      discovery of information across network elements like routers, switches,
      and hosts. This document extends LLDP to allow hosts to advertise their
      IP prefixes to their attached routers, which can then propagate the
      reachability of these host prefixes into routing protocols to enable
      network-wide connectivity.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="INTRO" title="Introduction">
      <t>Link Layer Discovery Protocol (LLDP) <xref target="LLDP"/> is widely
      deployed today for discovery of information across network elements like
      routers, switches, and hosts. LLDP is supported by most switches and
      routers, and open source implementations are available for hosts. The
      protocol is often used to discover connections between networking
      elements, to build topology information, and for monitoring and
      troubleshooting.</t>

      <t>In a typical layer-3 data center (DC), servers (i.e., hosts) are
      connected to the leaf routers as shown in <xref target="L3DC"/> below.
      These layer-3 DCs, which typically run the BGP routing protocol, are
      described in <xref target="RFC7938"/>. These servers run applications
      either natively or within containers or virtual machines (VMs). These
      applications are allocated IP addresses from the IP prefixes assigned to
      the servers. Furthermore, these server IP prefixes need to be advertised
      into the DC network and beyond via routing protocols to provide
      reachability for the application. The server IP prefixes used by
      applications need not be in the same subnet as the layer-3 link that
      connects the server hosts to the leaf routers. This requires a mechanism
      for each leaf router to discover the server IP prefixes connected to
      it.</t>

      <figure align="center" anchor="L3DC"
              title="5-stage Clos Layer-3 DC Topology">
        <artwork align="left"><![CDATA[
                    Tier 2       Tier 1       Tier 2
                   +-----+      +-----+      +-----+
     +-------------| T2A |------| T1A |------| T2C |-------------+
     |       +-----|     |--++--|     |--++--|     |-----+       |
     |       |     +-----+  ||  +-----+  ||  +-----+     |       |
     |       |              ||           ||              |       |
     |       |     +-----+  ||  +-----+  ||  +-----+     |       |
     | +-----+-----| T2B |--++--| T1B |--++--| T2D |-----+-----+ |
     | |     | +---|     |------|     |------|     |---+ |     | |
     | |     | |   +-----+      +-----+      +-----+   | |     | |
     | |     | |                                       | |     | |
   +-----+ +-----+                                   +-----+ +-----+
   | T3  | | T3  |         <- Leaf Routers ->        | T3  | | T3  |
   |  A  | |  B  | Tier 3                     Tier 3 |  C  | |  D  |
   +-----+ +-----+                                   +-----+ +-----+
     | |     | |                                       | |     | |
     O O     O O             <- Servers ->             O O     O O

]]></artwork>
      </figure>

      <t>Typically, the advertisement of server IP prefixes is done by running
      a BGP stack on the server and establishing BGP sessions to BGP running
      on the leaf routers. This requires the provisioning of a BGP stack on
      the host and the configuration of a BGP session between the host and
      the leaf router. There is also the requirement and expectation that
      the host is
      able to announce and withdraw the IP prefixes used by applications
      running on it in a dynamic manner. The leaf routers themselves also run
      DC routing protocols (BGP being a popular choice) for the further
      advertisement of the IP prefixes within the DC network and beyond.</t>

      <t>The deployment and use of LLDP is quite common in the layer-3 DC
      networks as it aids in topology discovery and troubleshooting. Open
      source LLDP implementations are also widely deployed between the leaf
      routers and the hosts connected to them. This document introduces LLDP
      extensions to enable the hosts to advertise their prefixes that can be
      discovered by LLDP running on the leaf routers and used for routing of
      traffic to those prefixes towards the host. The routers can further
      advertise or withdraw these host IP prefixes discovered via LLDP into
      the DC routing protocols like BGP <xref target="RFC4271"/>, IS-IS <xref
      target="ISO10589"/> or OSPF <xref target="RFC2328"/> <xref
      target="RFC5340"/>. This avoids the provisioning and management of BGP
      on the hosts towards the leaf routers, thereby simplifying operations in
      certain deployments.</t>

      <t>As LLDP is not a routing protocol, the specification in this
      document is applicable to layer-3 DCs where the hosts are connected via
      layer-3 interfaces to the leaf routers and the requirement is simply to
      provide server IP prefix reachability. This solution works for both
      IPv4 and IPv6 prefixes and enables application/container/VM orchestration
      mechanisms on the hosts to advertise basic routing information related
      to these prefixes. These orchestration mechanisms typically interact
      with the LLDP daemon running locally in the user space on the host to
      announce and withdraw prefix reachability on demand. The details of
      these orchestration mechanisms are outside the scope of this
      document.</t>

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in BCP
        14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only
        when, they appear in all capitals, as shown here.</t>
      </section>
    </section>

    <section anchor="IPA" title="LLDP Extensions For Lightweight Host Routing">
      <t>The IPv4 and IPv6 Host Prefix TLVs are used by a host to advertise
      its own local IP Prefixes in the Link Layer Discovery Protocol data unit
      (LLDPDU) <xref target="LLDP"/>. The format of these TLVs is as
      follows:</t>

      <figure align="left" anchor="TLV" title="IPv4/IPv6 Host Prefix TLV">
        <artwork align="left"><![CDATA[ 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   TLV Type  |    TLV Length   |              OUI              ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ OUI continued |    sub-type   |       Prefix Block(s)         ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

]]></artwork>
      </figure>

      <t>Where:</t>

      <t><list style="symbols">
          <t>TLV Type: 7-bit field carrying the value 127, which indicates
          an organizationally specific (vendor-specific) TLV</t>

          <t>TLV Length: 9-bit field carrying the length, in octets, of the
          TLV content that follows the TLV Length field</t>

          <t>Organizationally Unique Identifier (OUI): 3-octet field carrying
          the hexadecimal value 0x00005E, which indicates IANA as the
          organization managing the underlying allocation space</t>

          <t>Sub-Type: 1-octet field that carries the IANA-allocated LLDP TLV
          sub-type: TBD1 for IPv4 and TBD2 for IPv6</t>

          <t>Prefix Block(s): one or more prefix blocks as specified
          below</t>
        </list></t>
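      <t>As an illustration only, the following Python sketch shows how the
      TLV header described above could be assembled. The function name and
      the sub-type value 0xF1 are hypothetical placeholders, since TBD1 and
      TBD2 have not been assigned, and the prefix blocks are assumed to be
      already encoded.</t>

      <figure align="left">
        <artwork align="left"><![CDATA[
```python
import struct

ORG_SPECIFIC_TLV_TYPE = 127  # TLV Type value from this section
IANA_OUI = bytes([0x00, 0x00, 0x5E])
# Hypothetical placeholder: TBD1/TBD2 are not assigned by IANA yet.
SUBTYPE_IPV4_PLACEHOLDER = 0xF1
MAX_TLV_LENGTH = 511  # largest value a 9-bit length field can hold


def build_host_prefix_tlv(subtype, blocks):
    """Wrap already-encoded prefix blocks in the TLV header."""
    # TLV Length counts the octets after the length field:
    # OUI (3) + sub-type (1) + prefix blocks.
    length = len(IANA_OUI) + 1 + len(blocks)
    if length > MAX_TLV_LENGTH:
        raise ValueError("prefix blocks do not fit in one TLV")
    # 7-bit type in the high bits, 9-bit length in the low bits.
    header = struct.pack("!H", (ORG_SPECIFIC_TLV_TYPE << 9) | length)
    return header + IANA_OUI + bytes([subtype]) + blocks


tlv = build_host_prefix_tlv(SUBTYPE_IPV4_PLACEHOLDER, bytes(6))
print(tlv.hex())
```
]]></artwork>
      </figure>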

      <figure align="left" anchor="PrefixBlock"
              title="IPv4/IPv6 Host Prefix TLV Prefix Block">
        <artwork align="left"><![CDATA[ 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Prefix Flags  | IGP Algorithm |    Metric     | Prefix Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Num Prefixes  | Prefix 1 (variable)  ... Prefix N (variable)  ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

]]></artwork>
      </figure>

      <t>Where:</t>

      <t><list style="symbols">
          <t>Prefix Flags: 1-octet field carrying the IPv4 or IPv6 prefix flags
          as described below: <figure align="center" anchor="IPV4PFXFLGS"
              title="IPv4 Prefix Flags">
              <artwork><![CDATA[ 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|A|  Reserved   |
+-+-+-+-+-+-+-+-+

]]></artwork>
            </figure><list style="symbols">
              <t>A-Flag: Anycast flag. If set, then the prefixes in the block
              are anycast.</t>

              <t>Reserved bits: Reserved for future use. They MUST be set to
              zero when originated and MUST be ignored when received.</t>
            </list> <figure align="center" anchor="IPV6PFXFLGS"
              title="IPv6 Prefix Flags">
              <artwork><![CDATA[ 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|A|  Reserved |L|
+-+-+-+-+-+-+-+-+

]]></artwork>
            </figure><list style="symbols">
              <t>A-Flag: Anycast flag. If set, then the prefixes in the block
              are anycast.</t>

              <t>Reserved bits: Reserved for future use. They MUST be set to
              zero when originated and MUST be ignored when received.</t>

              <t>L-Flag: Locator Flag. If set, then the prefixes in the block
              are SRv6 Locators <xref target="RFC8986"/>.</t>
            </list></t>

          <t>IGP Algorithm: 1-octet field carrying the algorithm associated
          with the prefixes in the block. The algorithm values are taken from
          the "IGP Algorithm Types" registry under the IANA "Interior Gateway
          Protocol (IGP) Parameters" registry group.</t>

          <t>Metric: 1-octet field carrying the metric value, in the range 0
          to 255, associated with the prefixes in the block.</t>

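          <t>Prefix Length: 1-octet field carrying the length in bits of each
          of the prefixes in the block. The valid values are 1-32 for IPv4 and
          1-128 for IPv6.</t>

          <t>Num Prefixes: 1-octet field, shown in the figure above, carrying
          the number of prefixes that follow in the block.</t>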

          <t>List of Prefixes: One or more prefixes, where each prefix is
          encoded up to the number of bits indicated by the Prefix Length,
          followed by the minimum number of trailing bits needed to make the
          end of each prefix field fall on an octet boundary. Any trailing
          bits MUST be set to 0. Thus, each prefix field contains the most
          significant octets of the prefix, i.e., 1 octet for prefix lengths 1
          to 8, 2 octets for prefix lengths 9 to 16, 3 octets for prefix
          lengths 17 to 24, 4 octets for prefix lengths 25 to 32, and so
          on.</t>
        </list></t>
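      <t>The prefix block layout above can be sketched in Python as follows.
      This is a minimal, non-normative illustration: the function and
      constant names are hypothetical, the field order is taken from the
      prefix block figure, and the limit of 255 prefixes per block is an
      assumption derived from the 1-octet Num Prefixes field. The sketch
      does not check that the L-Flag is used only with IPv6.</t>

      <figure align="left">
        <artwork align="left"><![CDATA[
```python
import ipaddress
import struct

A_FLAG = 0x80  # bit 0 (most significant bit): anycast
L_FLAG = 0x01  # bit 7 (least significant bit): SRv6 Locator, IPv6


def encode_prefix_block(prefixes, flags=0, algorithm=0, metric=0):
    """Encode prefixes sharing one prefix length as a prefix block."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # Assumption: a 1-octet Num Prefixes field allows at most 255.
    if not 1 <= len(nets) <= 255:
        raise ValueError("a block carries 1 to 255 prefixes")
    plen = nets[0].prefixlen
    same = all(n.prefixlen == plen and n.version == nets[0].version
               for n in nets)
    if plen == 0 or not same:
        raise ValueError("need one address family, one length >= 1")
    nbytes = (plen + 7) // 8  # octets needed to hold plen bits
    block = struct.pack("!BBBBB", flags, algorithm, metric, plen,
                        len(nets))
    for net in nets:
        # ip_network() rejects set host bits, so padding bits are 0.
        block += net.network_address.packed[:nbytes]
    return block


block = encode_prefix_block(["192.0.2.0/24", "198.51.100.0/24"],
                            flags=A_FLAG, metric=10)
print(block.hex())
```
]]></artwork>
      </figure>

      <t>In this sketch, a /24 prefix occupies 3 octets and a /9 prefix
      occupies 2 octets, matching the octet-boundary rule above.</t>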

      <t>To ensure efficient encoding of the LLDPDU, the following rules
      apply:<list style="symbols">
          <t>All prefixes that have the same properties (i.e., prefix length,
          flags, metric, and algorithm) MUST be encoded in a single prefix
          block unless doing so makes the block larger than the size that can
          be accommodated in a single IPv4/IPv6 Host Prefix TLV.</t>

          <t>More than one instance of the IPv4 or IPv6 Host Prefix TLV MUST
          NOT be used unless the prefix blocks to be advertised do not fit
          into a single TLV instance.</t>
        </list></t>
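      <t>One possible way to apply the first rule is to group prefixes by
      their shared properties before encoding, as in the following Python
      sketch. The function name and the tuple layout are hypothetical, and
      the address family is added to the grouping key on the assumption
      that IPv4 and IPv6 prefixes travel in separate TLVs. Splitting a
      group that is too large for a single TLV is not shown.</t>

      <figure align="left">
        <artwork align="left"><![CDATA[
```python
import ipaddress
from collections import defaultdict


def group_into_blocks(entries):
    """Group (prefix, flags, algorithm, metric) tuples into blocks.

    The key is (IP version, prefix length, flags, algorithm, metric),
    so each group can be encoded as a single prefix block.
    """
    groups = defaultdict(list)
    for prefix, flags, algorithm, metric in entries:
        net = ipaddress.ip_network(prefix)
        key = (net.version, net.prefixlen, flags, algorithm, metric)
        groups[key].append(net)
    return dict(groups)


entries = [
    ("192.0.2.0/24", 0x00, 0, 10),
    ("2001:db8:1::/48", 0x00, 0, 10),
    ("198.51.100.0/24", 0x00, 0, 10),
    ("203.0.113.0/24", 0x80, 0, 10),  # anycast: separate block
]
groups = group_into_blocks(entries)
for key, nets in groups.items():
    print(key, [str(n) for n in nets])
```
]]></artwork>
      </figure>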

      <t>If the same prefix is present in multiple TLV instances or prefix
      blocks, only the first occurrence of that prefix in the LLDPDU MUST be
      used and all subsequent occurrences MUST be ignored.</t>
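      <t>A receiver could implement this duplicate handling roughly as in
      the Python sketch below. The function name and the attribute
      dictionaries are hypothetical; the only behaviour taken from this
      document is that the first occurrence of a prefix is kept.</t>

      <figure align="left">
        <artwork align="left"><![CDATA[
```python
import ipaddress


def first_occurrences(advertised):
    """Return {prefix: attributes}, keeping first occurrences only.

    `advertised` lists (prefix, attributes) pairs in the order they
    appear in the LLDPDU, across all TLV instances and blocks.
    """
    selected = {}
    for prefix, attributes in advertised:
        key = ipaddress.ip_network(prefix)
        if key not in selected:  # later duplicates are ignored
            selected[key] = attributes
    return selected


received = [
    ("192.0.2.0/24", {"metric": 10}),
    ("198.51.100.0/24", {"metric": 10}),
    ("192.0.2.0/24", {"metric": 20}),  # duplicate, ignored
]
result = first_occurrences(received)
print({str(k): v for k, v in result.items()})
```
]]></artwork>
      </figure>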

      <t>LLDP has been extended to support multi-frame LLDPDUs <xref
      target="LLDP-MULTIFRAME"/>. The above rules also apply to multi-frame
      LLDPDUs.</t>
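      <t>For the receiving side, the following Python sketch decodes one
      prefix block as laid out in this section. It is illustrative only:
      the function name and return shape are hypothetical, the caller is
      assumed to know the address family from the TLV sub-type, and a
      prefix with non-zero trailing bits is rejected as one possible
      policy rather than a behaviour mandated here.</t>

      <figure align="left">
        <artwork align="left"><![CDATA[
```python
import ipaddress
import struct


def decode_prefix_block(data, version):
    """Decode one block; return (properties, prefixes, remainder)."""
    flags, algorithm, metric, plen, count = struct.unpack_from(
        "!BBBBB", data)
    if version == 4:
        net_cls, max_len = ipaddress.IPv4Network, 32
    else:
        net_cls, max_len = ipaddress.IPv6Network, 128
    if not 1 <= plen <= max_len:
        raise ValueError("invalid prefix length")
    nbytes = (plen + 7) // 8  # octets used by each encoded prefix
    end = 5 + count * nbytes
    if len(data) < end:
        raise ValueError("truncated prefix block")
    prefixes = []
    for off in range(5, end, nbytes):
        # Pad the significant octets to a full-length address.
        raw = data[off:off + nbytes].ljust(max_len // 8, b"\x00")
        # Raises ValueError if bits beyond plen are not zero.
        prefixes.append(net_cls((raw, plen)))
    props = {"flags": flags, "algorithm": algorithm,
             "metric": metric}
    return props, prefixes, data[end:]


sample = bytes.fromhex("80000a1802c00002c63364")
props, prefixes, rest = decode_prefix_block(sample, 4)
print(props, [str(p) for p in prefixes], rest)
```
]]></artwork>
      </figure>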
    </section>

    <section anchor="PROCEDURES" title="Procedures">
      <t>The LLDP extensions for lightweight host routing in this document
      enable the advertisement of the prefixes from the host towards its
      directly connected router. The host includes its local prefixes in the
      LLDPDU that it sends on its port(s) connected to the router. The router,
      on receiving these LLDPDUs, discovers the host prefixes and programs them
      in its forwarding table with the outgoing interface pointing towards the
      port over which the LLDPDU was received and with the nexthop as the IP
      address of the host on that port. This nexthop address MAY be the one
      that is received via the Management Address TLV of LLDP or discovered
      via a protocol like ARP or ND on that specific interface. The same
      prefix may be learnt via LLDP from the same or different hosts over
      different ports; these MAY be installed as an equal cost multipath
      (ECMP) route by the router.</t>
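      <t>The installation behaviour described above can be pictured with
      the toy Python sketch below. The class, port names, and nexthop
      addresses are hypothetical, and metrics are ignored for brevity, so
      every path learnt for a prefix is treated as equal cost.</t>

      <figure align="left">
        <artwork align="left"><![CDATA[
```python
from collections import defaultdict


class HostRouteTable:
    """Toy table mapping a prefix to its (port, nexthop) paths."""

    def __init__(self):
        self.routes = defaultdict(set)

    def learn(self, prefix, port, nexthop):
        # The same prefix learnt over several ports yields several
        # paths, which a router may install as one ECMP route.
        self.routes[prefix].add((port, nexthop))

    def paths(self, prefix):
        return sorted(self.routes.get(prefix, ()))


table = HostRouteTable()
table.learn("192.0.2.0/24", "eth1", "10.0.1.2")
table.learn("192.0.2.0/24", "eth2", "10.0.2.2")
print(table.paths("192.0.2.0/24"))
```
]]></artwork>
      </figure>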

      <t>Further, the router MAY advertise these host prefixes learnt via LLDP
      into other protocols like BGP, OSPF, or IS-IS via route redistribution.
      The details of the route redistribution mechanism for conveying
      information like metric and algorithm along with the prefix are local to
      the router implementation and outside the scope of this document.</t>

      <t>LLDPDUs are sent periodically by both the host and the router. The
      LLDPDUs sent by the host include the host IP prefixes that are active
      on the host. The host MAY also trigger an LLDPDU on demand when the set
      of active host IP prefixes changes (e.g., when a prefix is removed, or a
      new prefix is provisioned).</t>

      <t>The processing of the host IP prefix information on the receiving
      router side follows the LLDP specification <xref target="LLDP"/>. This
      includes performing a mark and sweep operation that compares the set
      of host IP prefixes previously learnt on a specific port against the set
      of IP prefixes received on the same port in a subsequent LLDPDU. Any
      newly learnt prefixes are installed in the forwarding table and made
      available for advertisement into other routing protocols via
      redistribution. Any prefixes that are no longer being received via
      LLDPDUs on that port are deleted from the forwarding table and withdrawn
      from the routing protocols into which they might have been previously
      redistributed.</t>
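      <t>The mark and sweep comparison can be expressed as two set
      differences, as in this Python sketch. The function name is
      hypothetical, and the sketch only computes what to install and what
      to withdraw; programming the forwarding table is left out.</t>

      <figure align="left">
        <artwork align="left"><![CDATA[
```python
def mark_and_sweep(previous, received):
    """Compare per-port prefix sets from consecutive LLDPDUs."""
    previous, received = set(previous), set(received)
    to_install = received - previous   # newly learnt prefixes
    to_withdraw = previous - received  # no longer advertised
    return to_install, to_withdraw


old = {"192.0.2.0/24", "198.51.100.0/24"}
new = {"198.51.100.0/24", "203.0.113.0/24"}
install, withdraw = mark_and_sweep(old, new)
print(sorted(install), sorted(withdraw))
```
]]></artwork>
      </figure>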

      <t>The semantics of the mandatory Time To Live TLV of LLDP <xref
      target="LLDP"/> also affect the IP host prefix information learnt via
      LLDP. This includes removing all the learnt IP prefixes if an LLDPDU is
      not received within the period specified in the previous LLDPDU.
      Additionally, an implementation SHOULD delete the learnt host IP
      prefixes as soon as the port over which they are learnt goes down.</t>
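      <t>The ageing behaviour above could be tracked per port along the
      lines of the following Python sketch. The class and method names are
      hypothetical, and time is passed in explicitly as a number of
      seconds so that the example stays deterministic.</t>

      <figure align="left">
        <artwork align="left"><![CDATA[
```python
class PortPrefixState:
    """Prefixes learnt on one port, aged using the LLDP TTL."""

    def __init__(self):
        self.prefixes = set()
        self.expires_at = None

    def on_lldpdu(self, prefixes, ttl_seconds, now):
        # Each LLDPDU replaces the set and restarts the TTL timer.
        self.prefixes = set(prefixes)
        self.expires_at = now + ttl_seconds

    def on_port_down(self):
        # Delete learnt prefixes as soon as the port goes down.
        self.prefixes.clear()
        self.expires_at = None

    def on_tick(self, now):
        # Remove everything if no refresh arrived within the TTL.
        if self.expires_at is not None and now >= self.expires_at:
            self.prefixes.clear()
            self.expires_at = None


state = PortPrefixState()
state.on_lldpdu({"192.0.2.0/24"}, ttl_seconds=120, now=0.0)
state.on_tick(now=60.0)
print(sorted(state.prefixes))
state.on_tick(now=120.0)
print(sorted(state.prefixes))
```
]]></artwork>
      </figure>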

      <t>There is no change to the LLDPDUs sent from the routers towards the
      host.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document requests IANA to allocate code points from the "Link
      Layer Discovery Protocol (LLDP) TLV Subtypes" registry of the "IANA OUI
      Ethernet Numbers" registry group.</t>

      <figure align="left" anchor="IANACP" title="LLDP Extensions Code Points">
        <artwork align="left"><![CDATA[
+-------+------------------------------------------+---------------+
| Code  |                                          |               |
| Point |         Description                      |   Reference   |
+-------+------------------------------------------+---------------+
|  TBD1 | IPv4 Host Prefix TLV                     | this document |
|  TBD2 | IPv6 Host Prefix TLV                     | this document |
+-------+------------------------------------------+---------------+

]]></artwork>
      </figure>

      <t>This document also requests the creation of two registries called
      "LLDP IPv4 Host Prefix Flags" and "LLDP IPv6 Host Prefix Flags" under
      the "IANA OUI Ethernet Numbers" registry group. The allocation policy
      for these registries is "Expert Review" according to <xref
      target="RFC8126"/> with the guidance for Designated Experts being the
      same as for the LLDP TLV Subtypes registry in <xref
      target="RFC9542"/>.</t>

      <t>The initial allocations are as follows:</t>

      <figure align="left" anchor="IANAV4FLGS"
              title="LLDP IPv4 Host Prefix Flags">
        <artwork align="left"><![CDATA[ Bit     Description                               Reference 
-----------------------------------------------------------------
   0     Anycast (A-Flag)                          This document
 1-7     Unassigned  

]]></artwork>
      </figure>

      <t/>

      <t><figure align="left" anchor="IANAV6FLGS"
          title="LLDP IPv6 Host Prefix Flags">
          <artwork align="left"><![CDATA[ Bit     Description                               Reference 
-----------------------------------------------------------------
   0     Anycast (A-Flag)                          This document
 1-6     Unassigned  
   7     SRv6 Locator (L-Flag)                     This document

]]></artwork>
        </figure></t>
    </section>

    <section anchor="Manageability" title="Manageability Considerations">
      <t>The extensions in this document MUST NOT be enabled by default.
      Implementations on both the host and router side MUST provide a per-port
      configuration option to enable this feature. The implementation on the
      router side SHOULD log the activity of prefix discovery for monitoring
      and troubleshooting purposes.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>The extensions in this document introduce additional information in
      LLDP. The IEEE 802.1AE <xref target="MACsec"/> standard can be used for
      encryption and/or authentication to provide privacy and integrity.
      MACsec utilizes the Galois/Counter Mode Advanced Encryption Standard
      (AES-GCM) for authenticated encryption and Galois Message Authentication
      Code (GMAC) if only authentication, but not encryption, is required.</t>

      <t>The MACsec Key Agreement (MKA) is included as part of the IEEE
      802.1X-2020 Port-Based Network Access Control Standard <xref
      target="MKA"/>. The purpose of MKA is to provide a method for
      discovering MACsec peers and negotiating the security keys needed to
      secure the link.</t>

      <t>A rogue host may inject arbitrary and invalid prefixes into its
      connected router that could result in diversion of traffic and
      disruption for applications and services. This feature is expected to be
      used in environments where the router and the hosts are secured and
      under a single administrative control (e.g., a DC).</t>
    </section>

    <section anchor="ACK" title="Acknowledgements">
      <t>The authors of this document would like to acknowledge the review and
      inputs provided by Ketan Talaulikar during the early stages of this
      work.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include='reference.RFC.8126'?>

      <?rfc include="reference.RFC.8174"?>

      <?rfc include="reference.RFC.9542"?>

      <?rfc include='reference.RFC.8986'?>

      <reference anchor="LLDP">
        <front>
          <title>IEEE Standard for Local and metropolitan area networks -
          Station and Media Access Control Connectivity Discovery</title>

          <author>
            <organization>IEEE</organization>
          </author>

          <date day="11" month="March" year="2016"/>

          <abstract>
            <t>Technical and editorial errors identified by IEEE 802.1 Working
            Groups maintenance activity are corrected by this corrigendum to
            IEEE 802.1AB-2009.</t>
          </abstract>
        </front>

        <seriesInfo name="IEEE" value="802.1AB-2016"/>

        <seriesInfo name="DOI" value="10.1109/IEEESTD.2016.7433915"/>
      </reference>

      <reference anchor="MACsec">
        <front>
          <title>IEEE Standard for Local and metropolitan area networks -
          Media Access Control (MAC) Security</title>

          <author>
            <organization>IEEE</organization>
          </author>

          <date day="27" month="September" year="2018"/>

          <abstract>
            <t>How all or part of a network can be secured transparently to
            peer protocol entities that use the MAC Service provided by IEEE
            802 LANs to communicate is specified in this standard. MAC
            security (MACsec) provides connectionless user data
            confidentiality, frame data integrity, and data origin
            authenticity.</t>
          </abstract>
        </front>

        <seriesInfo name="IEEE" value="Standard 802.1AE-2018"/>
      </reference>

      <reference anchor="LLDP-MULTIFRAME">
        <front>
          <title>IEEE Standard for Local and metropolitan area networks--
          Station and Media Access Control Connectivity Discovery Amendment 2:
          Support for Multiframe Protocol Data Units</title>

          <author>
            <organization>IEEE</organization>
          </author>

          <date day="19" month="April" year="2022"/>

          <abstract>
            <t>This amendment to the IEEE Std 802.1AB(TM)-2016 specifies
            protocols, procedures, and managed objects that support the
            transmission and reception of a set of Link Layer Discovery
            Protocol (LLDP) Type/Length/Values (TLVs) that exceed the space
            available in a single frame.</t>
          </abstract>
        </front>

        <seriesInfo name="IEEE" value="802.1ABdh-2021"/>

        <seriesInfo name="DOI" value="10.1109/IEEESTD.2022.9760302"/>
      </reference>

      <reference anchor="MKA">
        <front>
          <title>IEEE Standard for Local and metropolitan area networks - Port
          Based Network Access Control</title>

          <author>
            <organization>IEEE</organization>
          </author>

          <date day="30" month="January" year="2020"/>

          <abstract>
            <t>Port-based network access control allows a network
            administrator to restrict the use of IEEE 802 LAN service access
            points (ports) to secure communication between authenticated and
            authorized devices. This standard specifies a common architecture,
            functional elements, and protocols that support mutual
            authentication between the clients of ports attached to the same
            LAN and that secure communication between the ports, including the
            media access method independent protocols that are used to
            discover and establish the security associations used by IEEE
            802.1AE MAC Security.</t>
          </abstract>
        </front>

        <seriesInfo name="IEEE" value="Standard 802.1X-2020"/>
      </reference>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.2328'?>

      <?rfc include='reference.RFC.4271'?>

      <?rfc include='reference.RFC.5340'?>

      <?rfc include='reference.RFC.7938'?>

      <reference anchor="ISO10589">
        <front>
          <title>Intermediate System to Intermediate System intra-domain
          routeing information exchange protocol for use in conjunction with
          the protocol for providing the connectionless-mode network service
          (ISO 8473)</title>

          <author>
            <organization>International Organization for
            Standardization</organization>
          </author>

          <date month="November" year="2002"/>
        </front>

        <seriesInfo name="ISO/IEC" value="10589"/>
      </reference>
    </references>
  </back>
</rfc>
