<?xml version="1.0" encoding="iso-8859-1"?>
<!--
     vim: set softtabstop=2 shiftwidth=2 expandtab
     version=20150108
-->
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="no" ?>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC4360 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4360.xml">
<!ENTITY RFC7752 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7752.xml">
]>

<rfc category="info" docName="draft-drake-bess-datacenter-gateway-01" ipr="trust200902">
  <front>
    <title abbrev="SR DC Gateways">Gateway Auto-Discovery and Route Advertisement for Segment Routing Enabled Data Center Interconnection</title>

    <author fullname="John Drake" initials="J." surname="Drake">
      <organization>Juniper Networks</organization>
      <address>
        <email>jdrake@juniper.net</email>
      </address>
    </author>

    <author fullname="Adrian Farrel" initials="A." surname="Farrel">
      <organization>Juniper Networks</organization>
      <address>
        <email>adrian@olddog.co.uk</email>
      </address>
    </author>

    <author fullname="Eric Rosen" initials="E." surname="Rosen">
      <organization>Juniper Networks</organization>
      <address>
        <email>erosen@juniper.net</email>
      </address>
    </author>

    <date year="2016" />
    <area>Routing</area>
    <workgroup>BESS Working Group</workgroup>
    <keyword>DC</keyword>
    <keyword>SR</keyword>
    <keyword>GW</keyword>
    <keyword>BGP</keyword>

    <abstract>
       <t>Data centers have become critical components of the infrastructure used by network operators
          to provide services to their customers.  Data centers are attached to the Internet or a backbone
          network by gateway routers and one data center typically has more than one gateway for commercial,
          load balancing, and resiliency reasons.</t>

       <t>Segment routing is a popular protocol mechanism for operating within a data center, and also
          for steering traffic that flows between two data center sites. In order that one data center site
          may load balance the traffic it sends to another data center site, it needs to know the complete
          set of gateway routers at the remote data center, the points of connection from those gateways to
          the backbone network, and the connectivity across the backbone network.</t>

       <t>This document defines a mechanism using the BGP Tunnel Encapsulation attribute to allow each
          gateway router to advertise the routes to the prefixes in the data center site to which it provides
          access, and also to advertise on behalf of each other gateway to the same data center site.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.
      </t>
    </note>
</front>

<middle>

  <section anchor="introduction" title="Introduction">
     <t>Data centers (DCs) have become critical components of the infrastructure used by network operators
        to provide services to their customers.  DCs are attached to the Internet or a backbone network by
        gateway routers (GWs), and one DC typically has more than one GW for various reasons including commercial
        preferences, load balancing, and resiliency against connection or device failure.</t>

     <t>Segment routing (SR) <xref target="I-D.ietf-spring-segment-routing" /> is a popular protocol mechanism
        for operating within a DC, and also for steering traffic that flows between two DC sites.  In order for an
        ingress DC that uses SR to load balance the flows it sends to an egress DC, it needs to know the complete
        set of entry nodes (i.e., GWs) for that egress DC from the backbone network connecting the two DCs.  Note
        that it is assumed that the connected set of DCs and the backbone network connecting them are part of the
        same SR BGP Link State (LS) instance (<xref target="RFC7752" /> and
        <xref target="I-D.ietf-idr-bgpls-segment-routing-epe" />) so that traffic engineering using SR may be used
        for these flows.</t>

     <t>Suppose that there are two gateways, GW1 and GW2, as shown in <xref target="david_hockney" />, for a given
        egress DC, and that both advertise a route to prefix X, which is located within that DC, with each setting
        itself as next hop.  One might think that the GWs for X could be inferred from the routes&apos; next hop
        fields, but typically the two routes are not both distributed across the backbone: only the best route,
        as selected by BGP, is distributed.  This precludes load balancing flows across both GWs.</t>

     <figure anchor="david_hockney" title="Example Data Center Interconnection">
       <artwork align="center">
         <![CDATA[
           -----------------                    --------------------
          | Ingress         |                  | Egress    ------   |
          | DC Site         |                  | DC Site  |Prefix|  |
          |                 |                  |          |   X  |  |
          |                 |                  |           ------   |
          |       --        |                  |   ---         ---  |
          |      |GW|       |                  |  |GW1|       |GW2| |
           -------++---------                   ----+----------+-+--
                  | \                               |         /  |
                  |  \                              |        /   |
                  |  -+-------------        --------+-------+--  |
                  | ||PE|       ----|      |----   |PE|   |PE| | |
                  | | --       |ASBR+------+ASBR|   --     --  | |
                  | |           ----|      |----               | |
                  | |               |      |                   | |
                  | |           ----|      |----               | |
                  | | AS1      |ASBR+------+ASBR|          AS2 | |
                  | |           ----|      |----               | |
                  |  ---------------        -------------------  |
                --+----------------------------------------------+--
               | |PE|                                          |PE| |
               |  --                 AS3                        --  |
               |                                                    |
                ----------------------------------------------------
         ]]>
       </artwork>
     </figure>

     <t>The obvious solution to this problem is to use add-paths <xref target="I-D.ietf-idr-add-paths" />
        to ensure that all routes to X get advertised by BGP.  However, even if this is done,
        the identity of the GWs will be lost as soon as the routes get distributed through an Autonomous
        System Border Router (ASBR) that will set itself to be the next hop.  And if there are multiple
        Autonomous Systems (ASes) in the backbone, not only will the next hop change several times, but
        the add-paths technique will experience scaling issues.  This all means that this approach is
        limited to DC sites connected over a single AS.</t>

     <t>This document defines a solution that overcomes this limitation and works equally well with a
        backbone constructed from one or more ASes.  This solution uses the Tunnel Encapsulation attribute
        <xref target="I-D.ietf-idr-tunnel-encaps" /> as follows:
        <list>
           <t>We define a new tunnel type, "SR tunnel", and when the GWs to a given DC advertise a route to
              a prefix X within the DC, they will each include a Tunnel Encapsulation attribute with
              multiple tunnel instances each of type "SR tunnel", one for each GW and each containing a
              Remote Endpoint sub-TLV with that GW&apos;s address.</t>
        </list></t>

     <t>In other words, each route advertised by any GW identifies all of the GWs to the same DC (see
        <xref target="DCGWautodisco" /> for a discussion of how GWs discover each other). Therefore, even if
        only one of the routes is distributed to other ASes, it will not matter how many times the next hop
        changes, as the Tunnel Encapsulation attribute (and its Remote Endpoint sub-TLVs) will remain
        unchanged.</t>

     <t>To put this in the context of <xref target="david_hockney" />, GW1 and GW2 discover each other as
        gateways for the egress data center site.  Both GW1 and GW2 advertise themselves as having routes
        to prefix X.  Furthermore, GW1 includes a Tunnel Encapsulation attribute with a tunnel instance
        of type "SR tunnel" for itself and another for GW2.  Similarly, GW2 includes a Tunnel Encapsulation
        attribute with a tunnel instance for itself and another for GW1.  The gateway in the ingress data center site can now see the possible
        paths to the egress data center site and choose one or balance traffic flows as it sees fit.</t>
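
     <t>The following is a minimal Python sketch of the idea described above, not a BGP implementation.
        The class and function names, the string used for the tunnel type (the real code point is still
        to be allocated by IANA), and the documentation addresses are all assumptions made for
        illustration.  It shows that the list of gateways carried in the Tunnel Encapsulation attribute
        survives a change of next hop at an ASBR.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical placeholder: the real SR tunnel type is to be allocated by IANA.
SR_TUNNEL_TYPE = "SR tunnel"


@dataclass(frozen=True)
class TunnelInstance:
    tunnel_type: str
    remote_endpoint: str  # address carried in the Remote Endpoint sub-TLV


@dataclass
class Route:
    prefix: str
    next_hop: str
    tunnel_encap: List[TunnelInstance] = field(default_factory=list)


def advertise(prefix: str, advertiser: str, active_gws: List[str]) -> Route:
    """Build the route one GW advertises: one SR tunnel instance per GW."""
    instances = [TunnelInstance(SR_TUNNEL_TYPE, gw) for gw in sorted(active_gws)]
    return Route(prefix, advertiser, instances)


def rewrite_next_hop(route: Route, asbr: str) -> Route:
    """Model an ASBR setting next hop to self; the attribute is unchanged."""
    return Route(route.prefix, asbr, list(route.tunnel_encap))


gws = ["192.0.2.1", "192.0.2.2"]  # GW1 and GW2 (documentation addresses)
route_from_gw1 = advertise("203.0.113.0/24", "192.0.2.1", gws)
seen_remotely = rewrite_next_hop(route_from_gw1, "198.51.100.7")
remote_view = [t.remote_endpoint for t in seen_remotely.tunnel_encap]
print(remote_view)
```
         ]]>
       </artwork>
     </figure>

     <t>In this sketch the remote site still sees both gateway addresses even though only the route
        from GW1 was propagated and its next hop was rewritten.</t>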
  </section>

  <section anchor="DCGWautodisco" title="DC Gateway Auto-Discovery">
     <t>To allow a given DC&apos;s GWs to auto-discover each other and to coordinate their operations, the
        following procedures are implemented:

        <list style="symbols">
           <t>Each GW is configured with an identifier for the DC that is common across all GWs to the DC (i.e.,
              all GWs to all DC sites that are connected) and unique across all DCs that are connected.</t>

           <t>A route target (<xref target="RFC4360" />) is attached to each GW&apos;s auto-discovery route and
              has its value set to the DC identifier.</t>

           <t>Each GW constructs an import filtering rule to import any route that carries a route target with
              the same DC identifier that the GW itself uses.  This means that only these GWs will import those
              routes and that all GWs to the same DC will import each other&apos;s routes and will learn
              (auto-discover) the current set of active GWs for the DC.</t>
         </list>
     </t>
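
     <t>The import rule above can be sketched in Python as follows.  This is an illustration under
        stated assumptions: the route target is modelled as a plain string rather than as an extended
        community, and the type names, DC identifiers, and addresses are hypothetical.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class AutoDiscoveryRoute:
    loopback: str                  # NLRI: one of the advertising GW's loopbacks
    route_targets: FrozenSet[str]  # route targets carried by the route


def import_filter(local_dc_id: str, route: AutoDiscoveryRoute) -> bool:
    """Accept a route only if it carries a route target equal to our DC id."""
    return local_dc_id in route.route_targets


def discover_gateways(local_dc_id: str,
                      received: Iterable[AutoDiscoveryRoute]) -> Set[str]:
    """Return the set of GWs learned from imported auto-discovery routes."""
    return {r.loopback for r in received if import_filter(local_dc_id, r)}


# Hypothetical DC identifiers and documentation addresses.
received = [
    AutoDiscoveryRoute("192.0.2.1", frozenset({"dc-east"})),
    AutoDiscoveryRoute("192.0.2.2", frozenset({"dc-east"})),
    AutoDiscoveryRoute("198.51.100.9", frozenset({"dc-west"})),
]
east_gws = discover_gateways("dc-east", received)
print(sorted(east_gws))
```
         ]]>
       </artwork>
     </figure>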

     <t>The auto-discovery route each GW advertises consists of the following:
        <list style="symbols">
           <t>An IPv4 or IPv6 NLRI containing one of the GW&apos;s loopback addresses (that is, with AFI/SAFI that
              is one of 1/1, 2/1, 1/4, 2/4).</t>
           <t>A Tunnel Encapsulation attribute containing the GW&apos;s encapsulation information, which at a minimum
              consists of an SR tunnel TLV (type to be allocated by IANA) with a Remote Endpoint sub-TLV as
              specified in <xref target="I-D.ietf-idr-tunnel-encaps" />.</t>
         </list>
     </t>
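
     <t>As a rough illustration, the contents of an auto-discovery route might be assembled as in the
        Python sketch below.  The record layout, the use of a host prefix for the loopback, and the
        choice to set the Remote Endpoint to the same loopback address are assumptions of this sketch
        and are not requirements stated in this document.  AFI 1 and 2 denote IPv4 and IPv6, and SAFI 1
        and 4 denote unicast and labeled unicast.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
import ipaddress
from dataclasses import dataclass

# AFI/SAFI pairs permitted for the auto-discovery route.
ALLOWED_AFI_SAFI = {(1, 1), (2, 1), (1, 4), (2, 4)}


@dataclass(frozen=True)
class AutoDiscoveryRoute:
    afi: int
    safi: int
    nlri: str             # loopback expressed as a host prefix (assumption)
    tunnel_type: str      # placeholder; the code point is to be allocated
    remote_endpoint: str  # value of the Remote Endpoint sub-TLV


def make_auto_discovery_route(loopback: str,
                              labeled: bool = False) -> AutoDiscoveryRoute:
    """Derive AFI/SAFI from the loopback and build the route record."""
    addr = ipaddress.ip_address(loopback)
    afi = 1 if addr.version == 4 else 2
    safi = 4 if labeled else 1
    if (afi, safi) not in ALLOWED_AFI_SAFI:
        raise ValueError("unsupported AFI/SAFI")
    prefix_len = 32 if addr.version == 4 else 128
    return AutoDiscoveryRoute(afi, safi, f"{addr}/{prefix_len}",
                              "SR tunnel", str(addr))


r4 = make_auto_discovery_route("192.0.2.1")
r6 = make_auto_discovery_route("2001:db8::1", labeled=True)
print(r4)
print(r6)
```
         ]]>
       </artwork>
     </figure>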

     <t>To avoid the side effect of applying the Tunnel Encapsulation attribute to any packet that is addressed to the GW
        itself, the GW SHOULD use for the auto-discovery route a loopback address different from the one it uses for
        packets directed to the GW itself.</t>

     <t>As described in <xref target="introduction" />, each GW will include an SR tunnel instance in the Tunnel
        Encapsulation attribute for each GW that is active for the DC site (including itself), and will include this
        attribute in every route it advertises externally to the DC site.  As the current set of active GWs changes (due to the addition of a new
        GW or the failure/removal of an existing GW) each externally advertised route will be re-advertised with the
        set of SR tunnel instances reflecting the current set of active GWs.</t>

     <t>If a gateway becomes disconnected from the backbone network, or if the DC operator decides to terminate the
        gateway&apos;s activity, it withdraws the advertisements described above.  This means that remote gateways at
        other sites will stop seeing advertisements from this gateway.  It also means that other local gateways at
        this site will "unlearn" the removed gateway and stop including an SR tunnel instance for the
        removed gateway in their advertisements.</t>
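
     <t>The re-advertisement behaviour described in this section can be sketched as a small state
        machine.  The Python below is an illustration only: the class, its method names, and the
        representation of an advertisement as a mapping from prefix to a list of Remote Endpoint
        addresses are assumptions of the sketch.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
from typing import Dict, Iterable, List


class Gateway:
    """Tracks the active GW set and re-advertises routes when it changes."""

    def __init__(self, address: str, prefixes: Iterable[str]) -> None:
        self.address = address
        self.prefixes = list(prefixes)
        self.active_gws = {address}  # a GW always counts itself

    def _advertisements(self) -> Dict[str, List[str]]:
        # One SR tunnel instance (modelled as its endpoint) per active GW.
        endpoints = sorted(self.active_gws)
        return {prefix: list(endpoints) for prefix in self.prefixes}

    def peer_learned(self, gw: str) -> Dict[str, List[str]]:
        """A new auto-discovery route was imported; re-advertise if new."""
        if gw in self.active_gws:
            return {}
        self.active_gws.add(gw)
        return self._advertisements()

    def peer_withdrawn(self, gw: str) -> Dict[str, List[str]]:
        """A peer's auto-discovery route was withdrawn; re-advertise."""
        if gw == self.address or gw not in self.active_gws:
            return {}
        self.active_gws.discard(gw)
        return self._advertisements()


gw1 = Gateway("192.0.2.1", ["203.0.113.0/24"])
after_join = gw1.peer_learned("192.0.2.2")
after_repeat = gw1.peer_learned("192.0.2.2")
after_leave = gw1.peer_withdrawn("192.0.2.2")
print(after_join, after_repeat, after_leave)
```
         ]]>
       </artwork>
     </figure>

     <t>An empty result in this sketch means that the set of active GWs did not change, so no route
        needs to be re-advertised.</t>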
  </section>

  <section anchor="EPE" title="Relationship to BGP Link State and Egress Peer Engineering">
     <t>When a remote GW receives a route to a prefix X, it can use the SR tunnel instances within the contained
        Tunnel Encapsulation attribute to identify the GWs through which X can be reached.  It uses this
        information to compute SR TE paths across the backbone network, based on the information advertised to
        it in SR BGP Link State (BGP-LS) <xref target="I-D.gredler-idr-bgp-ls-segment-routing-ext" /> and correlated
        using the DC identity.  SR Egress Peer Engineering (EPE) <xref target="I-D.ietf-idr-bgpls-segment-routing-epe" />
        can be used to supplement the information advertised in BGP-LS.</t>
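
     <t>The Python sketch below illustrates only the first step, identifying the candidate GWs from the
        SR tunnel instances, followed by one possible way of spreading flows across them.  It does not
        compute SR TE paths from BGP-LS data.  The hash-based selection, the flow key layout, and all
        names and addresses are assumptions of this sketch and are not specified by this document.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
import hashlib
from typing import List, Tuple

# Hypothetical flow key: source, destination, source port, destination port.
FlowKey = Tuple[str, str, int, int]


def egress_gateways(tunnel_endpoints: List[str]) -> List[str]:
    """GWs through which the prefix is reachable, from the tunnel instances."""
    return sorted(set(tunnel_endpoints))


def pick_gateway(flow: FlowKey, gateways: List[str]) -> str:
    """Map a flow to one GW with a stable hash so a flow keeps its GW."""
    digest = hashlib.sha256(repr(flow).encode("ascii")).digest()
    return gateways[int.from_bytes(digest[:4], "big") % len(gateways)]


gws = egress_gateways(["192.0.2.2", "192.0.2.1", "192.0.2.2"])
flow = ("10.0.0.5", "203.0.113.10", 49152, 443)
choice = pick_gateway(flow, gws)
print(gws, choice)
```
         ]]>
       </artwork>
     </figure>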
  </section>

  <section anchor="advertising" title="Advertising a DC Route Externally">
     <t>When a packet destined for prefix X is sent on an SR TE path to a GW for the DC site containing X, it needs to
        carry the receiving GW&apos;s label for X such that this label rises to the top of the stack before the GW
        completes its processing of the packet.  To achieve this, we place a prefix-SID sub-TLV for X in each SR tunnel
        instance in the Tunnel Encapsulation attribute in the externally advertised route for X.</t>

     <t>Alternatively, if the GWs for a given DC are configured to allow remote GWs to perform SR TE through that DC for
        a prefix X, then each GW computes an SR TE path through that DC to X from each of the current active GWs and
        places each in an MPLS label stack sub-TLV <xref target="I-D.ietf-idr-tunnel-encaps" /> in the SR tunnel instance
        for that GW.</t>
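
     <t>The two options in this section can be sketched in Python as follows.  This is an illustration
        under assumptions: the sketch treats the prefix-SID and the MPLS label stack as alternatives
        within one SR tunnel instance, and the type names, SID values, and label values are
        hypothetical.</t>

     <figure>
       <artwork align="left">
         <![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SRTunnelInstance:
    remote_endpoint: str
    prefix_sid: Optional[int] = None               # prefix-SID sub-TLV
    label_stack: Optional[Tuple[int, ...]] = None  # MPLS label stack sub-TLV


def build_instances(
    prefix_sids: Dict[str, int],
    te_paths: Optional[Dict[str, Sequence[int]]] = None,
) -> List[SRTunnelInstance]:
    """One instance per GW: a label stack if a TE path is given, else a SID."""
    instances = []
    for gw in sorted(prefix_sids):
        if te_paths is not None and gw in te_paths:
            stack = tuple(te_paths[gw])
            instances.append(SRTunnelInstance(gw, label_stack=stack))
        else:
            instances.append(SRTunnelInstance(gw, prefix_sid=prefix_sids[gw]))
    return instances


# Hypothetical per-GW SIDs for prefix X and a TE path through the DC via GW2.
sids = {"192.0.2.1": 16010, "192.0.2.2": 16020}
plain = build_instances(sids)
with_te = build_instances(sids, {"192.0.2.2": [24001, 24002]})
print(plain)
print(with_te)
```
         ]]>
       </artwork>
     </figure>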
  </section>

  <section anchor="encaps" title="Encapsulation">
     <t>If the GWs for a given DC are configured to allow remote GWs to send them a packet in that DC&apos;s native
        encapsulation, then each GW will also include multiple instances of a tunnel TLV for that native encapsulation,
        one for each GW and each containing a Remote Endpoint sub-TLV with that GW&apos;s address, in externally
        advertised routes.  A remote GW may then encapsulate a packet according to the rules defined via the sub-TLVs
        included in each of the tunnel TLV instances.</t>
  </section>

  <section anchor="iana" title="IANA Considerations">
     <t>IANA maintains a registry called "BGP parameters" with a sub-registry called "BGP Tunnel Encapsulation Attribute Tunnel Types".
        The registration policy for this sub-registry is First Come First Served.</t>

     <t>IANA is requested to assign a codepoint from this sub-registry for "SR Tunnel".  The next available value may be
        used and reference should be made to this document.</t>

     <t>[[Note: This text is likely to be replaced with a specific code point value once FCFS allocation has been made.]]</t>
  </section>

  <section anchor="security" title="Security Considerations">
     <t>TBD</t>
  </section>

  <section anchor="manageability" title="Manageability Considerations">
     <t>TBD</t>
  </section>

  <section anchor="contrib" title="Contributors">
     <t>The following people contributed to discussions that led to the
        development of this document:</t>

     <figure>
       <artwork  align="left">
         <![CDATA[
           TBD
           name
           Email: email
         ]]>
       </artwork>
     </figure>
  </section>

  <section anchor="acks" title="Acknowledgements">
     <t>Thanks to Bruno Rijsman for review comments, and to Robert Raszuk for useful discussions.</t>
  </section>

</middle>

<back>
  <references title="Normative References">
    <?rfc include="reference.I-D.ietf-idr-bgpls-segment-routing-epe"?>
    <?rfc include="reference.I-D.ietf-idr-tunnel-encaps"?>
    &RFC2119;
    &RFC4360;
    &RFC7752;
  </references>

  <references title="Informative References">
    <?rfc include="reference.I-D.ietf-idr-add-paths"?>
    <?rfc include="reference.I-D.ietf-spring-segment-routing"?>
    <?rfc include="reference.I-D.gredler-idr-bgp-ls-segment-routing-ext"?>
  </references>

</back>
</rfc>
