<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="rfc7991bis.rnc"?>  <!-- Required for schema validation and schema-aware editing -->
<!-- <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> -->
<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->


<!DOCTYPE rfc [
  <!ENTITY rfc2119 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
  <!ENTITY rfc8174 SYSTEM "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml">
]>
<!-- If further character entities are required then they should be added to the DOCTYPE above.
     Use of an external entity file is not recommended. -->

<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  category="std"
  docName="draft-an-cats-usecase-ai-01"
  ipr="trust200902"
  obsoletes=""
  updates=""
  submissionType="IETF"
  xml:lang="en"
  version="3"
  consensus="true">


  <front>
    <title abbrev="AI-model">
    Use Cases of Computing-Aware AI Large Models
    </title>

    <author fullname="Qing An" initials="Q." surname="An">
      <organization>Alibaba Group</organization>
      <address>
        <postal>
          <street/>
          <city/>
          <country>China</country>
        </postal>
	      <email>anqing.aq@alibaba-inc.com</email>
      </address>
    </author>
    
    
    <date year="2023" />

    <area>RTG</area>
    <workgroup>CATS</workgroup>

    <keyword>AI large model</keyword>

    <abstract>
    <t>AI models, especially AI large models, have been developed rapidly and widely deployed to serve the needs of users and multiple industries. Because AI large models involve mega-scale data and parameters and consume large amounts of computing and network resources, distributed computing becomes a natural choice for deploying them.</t>

    <t>This document describes the key concepts and deployment scenarios of AI large models to demonstrate the necessity of jointly considering computing and network resources to meet the requirements of AI tasks.</t>
   </abstract>
   
   </front>

    <middle>
    <section anchor="sec-intro" title="Introduction">

    <t>An AI large model is a type of artificial intelligence model that is trained on massive amounts of data using deep learning techniques. These models are characterized by their large size, high complexity, and high computational requirements. AI large models have become increasingly important in various fields, such as natural language processing, computer vision, and speech recognition.</t>

    <t>There are usually two types of AI large models: foundation models and customized models. An AI foundation model can handle multiple tasks and domains and has wider applicability and flexibility, but may not perform as well as a customized model on domain-specific tasks. A customized model is trained for a specific industry or domain and is more focused on solving specific problems, but may not be applicable to other domains. Foundation models usually involve mega-scale parameters, while customized models involve large- or middle-scale parameters.
    </t>
    
    <t>Also, an AI large model involves two key phases: training and inference. Training refers to the process of developing an AI model by feeding it large amounts of data and optimizing it to learn and improve its performance; it places high demands on computing and memory resources. Inference, on the other hand, is the process of using the trained model to make predictions or decisions based on new input data; it focuses more on the balance between computing resources, latency, and power cost.
    </t>
    
    <t>There are mainly four types of AI tasks: </t>
    <ul>
        <li>Text: text-to-text (conversation), text classification (e.g., sentiment analysis)</li>
        <li>Vision: image classification (labeling images), object detection</li>
        <li>Audio: speech-to-text, text-to-speech</li>
        <li>Multimodal: text-to-image, image-to-text, text-to-video, image-to-image, image-to-video, etc.</li>
    </ul>
   <t>Vision, audio, and multimodal tasks often place high demands on both network and computing resources.
    </t>
    
    <t>There are two AI large model deployment cases that will benefit from dynamic selection of service instances and traffic steering.
    </t>
    
    <t><xref target="fig-cloud-edge"></xref> shows the cloud-edge co-inference AI model deployment. It can achieve low latency because AI inference is deployed close to the device, and it places low demands on device resources. However, when handling AI inference tasks, if the traffic load between device and edge is high or the edge computing resources are overloaded, traffic steering is needed to ensure QoS.</t>

    <figure anchor="fig-cloud-edge"
               title="Cloud-edge co-inference">
    <artwork><![CDATA[
                          Training + Inference
         +------------------------------------------------------+
         |                                                      |
         |                       Cloud                          |
         |                                                      |
         |                 +------------------+                 |
         |                 | Foundation Model |                 |
         |                 +------------------+                 |
         +--------------------------+---------------------------+
                                    |
                                    |     Training + Inference
       +----------------------------+-----------------------------+
       |  +--------------+  +--------------+   +--------------+   |
       |  |     Edge     |  |     Edge     |   |     Edge     |   |
       |  | +----------+ |  | +----------+ |   | +----------+ |   |
       |  | |Customized| |  | |Customized| |   | |Customized| |   |
       |  | |  Models  | |  | |  Models  | |   | |  Models  | |   |
       |  | +----------+ |  | +----------+ |   | +----------+ |   |
       |  +--------------+  +--------------+   +--------------+   |
       +----------+-----------------+---------------+-------------+
                  |                 |               |
                  |                 |               |
             +----+---+        +----+---+       +---+----+
             | Device |        | Device |   ... | Device |
             +--------+        +--------+       +--------+

]]></artwork>
          </figure>
          
    <t><xref target="fig-cloud-edge-device"></xref> shows the cloud-edge-device co-inference AI model deployment. It is a more flexible (though more complex) deployment. It can achieve low latency because AI inference is deployed locally or close to the device, and the device can keep working when the edge isn't available. Careful consideration is needed to ensure that the edge is used only when the trade-offs are right. As with cloud-edge co-inference, traffic steering is needed.</t>

    <figure anchor="fig-cloud-edge-device"
               title="Cloud-edge-device co-inference">
    <artwork><![CDATA[

                          Training + Inference
         +------------------------------------------------------+
         |                                                      |
         |                       Cloud                          |
         |                                                      |
         |                 +------------------+                 |
         |                 | Foundation Model |                 |
         |                 +------------------+                 |
         +--------------------------+---------------------------+
                                    |
                                    |     Training + Inference
       +----------------------------+-----------------------------+
       |  +--------------+  +--------------+   +--------------+   |
       |  |     Edge     |  |     Edge     |   |     Edge     |   |
       |  | +----------+ |  | +----------+ |   | +----------+ |   |
       |  | |Customized| |  | |Customized| |   | |Customized| |   |
       |  | |  Models  | |  | |  Models  | |   | |  Models  | |   |
       |  | +----------+ |  | +----------+ |   | +----------+ |   |
       |  +--------------+  +--------------+   +--------------+   |
       +----------+-----------------+---------------+-------------+
                  |                 |                  |
                  |                 |                  |
             +----+-----+      +----+-----+       +----+-----+
             |  Device  |      |  Device  |   ... |  Device  |
             | +------+ |      | +------+ |       | +------+ |
             | |Pruned| |      | |Pruned| |       | |Pruned| |
             | |Model | |      | |Model | |       | |Model | |
             | +------+ |      | +------+ |       | +------+ |              
             +----------+      +----------+       +----------+
               Inference         Inference          Inference
                   
]]></artwork>
          </figure>
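    <t>The trade-off above, where the edge is used only when it is reachable and fast enough and the pruned on-device model serves as the fallback, can be sketched as a simple placement check. This is an illustrative sketch only; the function, parameter names, and thresholds are assumptions made for this example and are not part of any specification.</t>
    <sourcecode type="python"><![CDATA[
```python
# Illustrative sketch only (not from any specification): decide where a
# device runs one inference request in cloud-edge-device co-inference.
def choose_inference_site(edge_reachable, edge_delay_ms,
                          device_load, delay_budget_ms=100.0):
    """Return "device" or "edge" for a single inference request."""
    # The edge is used only when the trade-offs are right: it must be
    # reachable and its path delay must fit the task's latency budget.
    if not edge_reachable or edge_delay_ms > delay_budget_ms:
        return "device"  # fall back to the pruned on-device model
    # Offload when the device is busy; otherwise keep inference local.
    return "edge" if device_load > 0.7 else "device"

print(choose_inference_site(True, 20.0, 0.9))    # -> edge
print(choose_inference_site(False, 0.0, 0.9))    # -> device
```
]]></sourcecode>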
          
    <t>Many AI tasks, such as vision, audio, and multimodal tasks, place high demands on network and computing resources. Also, it is common for the same customized model to be deployed in multiple edge sites to achieve load balancing and high reliability.</t>
    
    <t>An edge site's computing resources and network information should be considered together to make suitable traffic steering decisions. For example, if the available computing resources at the nearest edge site are low, the traffic of AI tasks should be steered to another edge site with more available resources. Also, if multiple AI tasks arrive at an edge site, such as a delay-sensitive task (live streaming with an AI-generated avatar) and a delay-tolerant task (text-to-image generation), the delay-tolerant task should be steered to another edge site if the nearest site's resources are limited.</t>
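    <t>The steering decision above, jointly weighing an edge site's free computing resources and its network delay, can be sketched as follows. This is an illustrative sketch only; the scoring weights, thresholds, and field names are assumptions made for this example and are not defined by CATS.</t>
    <sourcecode type="python"><![CDATA[
```python
# Illustrative sketch only: CATS does not define this algorithm or these
# field names; they are assumptions made for this example.
from dataclasses import dataclass

@dataclass
class EdgeSite:
    name: str
    available_compute: float  # normalized 0.0-1.0 share of free compute
    network_delay_ms: float   # measured path delay from the ingress

def select_edge(sites, delay_sensitive, delay_budget_ms=50.0):
    """Pick an edge site by jointly weighing compute and network state."""
    # Skip sites whose computing resources are nearly exhausted.
    candidates = [s for s in sites if s.available_compute > 0.1]
    if delay_sensitive:
        # Delay-sensitive tasks must stay within the latency budget.
        candidates = [s for s in candidates
                      if s.network_delay_ms <= delay_budget_ms]
    if not candidates:
        return None  # no suitable site; fall back (e.g., to the cloud)
    # Prefer more free compute; penalize longer network paths.
    return max(candidates,
               key=lambda s: s.available_compute - 0.01 * s.network_delay_ms)

sites = [EdgeSite("edge-1", 0.05, 10.0),   # nearest, but nearly overloaded
         EdgeSite("edge-2", 0.80, 30.0)]
print(select_edge(sites, delay_sensitive=True).name)   # -> edge-2
```
]]></sourcecode>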

    </section>

    <section anchor="sec-term" title="Terminology">
	    
    <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 <xref target="RFC2119"></xref> <xref target="RFC8174"></xref>
   when, and only when, they appear in all capitals, as shown here.</t>
    </section>



<section anchor = "sec-iana"
	 title = "IANA Considerations">

<t>This document makes no request of IANA.</t>

</section>

<section title="Security Considerations"  anchor="Security">
 <t>TBD</t>
</section>


</middle>
  
<back>
    <references title="Normative References">
 
     &rfc2119;  <!-- RFCs -->
     &rfc8174;   <!-- Ambiguity of uppercase vs lowercase -->
     </references>
  </back>
</rfc>
