Commit

cleanup wording and internal links
andrewrreed committed Feb 21, 2024
1 parent 1300c16 commit 68820aa
Showing 1 changed file with 30 additions and 10 deletions.
40 changes: 30 additions & 10 deletions notebooks/en/tgi_messages_api_demo.ipynb
@@ -8,7 +8,7 @@
"\n",
"_Authored by: [Andrew Reed](https://huggingface.co/andrewrreed)_\n",
"\n",
"This notebook demonstrates how you can easily transition from OpenAI models for Open LLMs without needing to refactor any existing code.\n",
"This notebook demonstrates how you can easily transition from OpenAI models to Open LLMs without needing to refactor any existing code.\n",
"\n",
"[Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) now offers a Messages API, making it directly compatible with the OpenAI Chat Completion API. This means that any existing scripts that use OpenAI models (via the OpenAI client library or third-party tools like LangChain or LlamaIndex) can be directly swapped out to use any open LLM running on a TGI endpoint!\n",
"\n",
@@ -20,9 +20,9 @@
"\n",
"In this notebook, we'll show you how to:\n",
"\n",
"- [Create Inference Endpoint to Deploy a Model with TGI](#create-an-inference-endpoint)\n",
"- [Query the Inference Endpoint with OpenAI Client Libraries](#using-inference-endpoints-with-openai-client-libraries)\n",
"- [Integrate the Endpoint with LangChain and LlamaIndex Workflows](#integrate-with-langchain-and-llamaindex)\n"
"1. [Create Inference Endpoint to Deploy a Model with TGI](#section_1)\n",
"2. [Query the Inference Endpoint with OpenAI Client Libraries](#section_2)\n",
"3. [Integrate the Endpoint with LangChain and LlamaIndex Workflows](#section_3)\n",
"\n",
"**Let's dive in!**\n"
]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Setup\n",
+ "\n",
+ "First we need to install dependencies and set an HF API key.\n"
+ ]
},
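To make the new Setup cell concrete, here is a minimal sketch of what such a cell might contain; the exact package list is an assumption inferred from the libraries used later in the notebook:

```python
# A minimal setup sketch (package list assumed from the libraries used below)
%pip install --quiet huggingface_hub openai langchain langchain-openai llama-index

import os
from getpass import getpass

# Prompt for the Hugging Face API key rather than hardcoding it
os.environ["HF_TOKEN"] = getpass("Enter your Hugging Face API key: ")
```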
{
@@ -51,7 +62,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Inference Endpoint\n",
"<a id=\"section_1\"></a>\n",
"\n",
"## 1. Create an Inference Endpoint\n",
"\n",
"To get started, let's deploy [Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO), a fine-tuned Mixtral model, to Inference Endpoints using TGI.\n",
"\n",
Expand Down Expand Up @@ -116,14 +129,16 @@
"\n",
"Great, we now have a working endpoint!\n",
"\n",
"> Note: When deploying with `huggingface_hub`, your endpoint will scale-to-zero after 15 minutes of idle time by default to optimize cost during periods of inactivity. Check out [the Hub Python Library documentation](https://huggingface.co/docs/huggingface_hub/guides/inference_endpoints) to see all the functionality available for managing your endpoint lifecycle.\n"
"_Note: When deploying with `huggingface_hub`, your endpoint will scale-to-zero after 15 minutes of idle time by default to optimize cost during periods of inactivity. Check out [the Hub Python Library documentation](https://huggingface.co/docs/huggingface_hub/guides/inference_endpoints) to see all the functionality available for managing your endpoint lifecycle._\n"
]
},
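As a quick illustration of that lifecycle management, the `InferenceEndpoint` object returned by `huggingface_hub` exposes methods along these lines (a sketch):

```python
# A few lifecycle operations available on the InferenceEndpoint object
endpoint.pause()   # manually stop the endpoint; billing stops, config is kept
endpoint.resume()  # spin it back up with the same configuration
endpoint.delete()  # permanently remove the endpoint
```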
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Inference Endpoints with OpenAI client libraries\n",
"<a id=\"section_2\"></a>\n",
"\n",
"## 2. Query the Inference Endpoint with OpenAI Client Libraries\n",
"\n",
"As mentioned above, since our model is hosted with TGI it now supports a Messages API meaning we can query it directly using the familiar OpenAI client libraries.\n"
]
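A sketch of what that query looks like with the `openai` Python client; the `endpoint` and `HF_TOKEN` variables are assumed from the earlier steps:

```python
from openai import OpenAI

# Point the standard OpenAI client at the TGI endpoint (note the /v1/ route)
client = OpenAI(
    base_url=endpoint.url + "/v1/",
    api_key=os.environ["HF_TOKEN"],
)

chat_completion = client.chat.completions.create(
    model="tgi",  # the model name is not used for routing; "tgi" is a placeholder
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    stream=True,
    max_tokens=500,
)

# Stream the generated tokens as they arrive
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
```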
Expand Down Expand Up @@ -197,7 +212,7 @@
"source": [
"Behind the scenes, TGI’s Messages API automatically converts the list of messages into the model’s required instruction format using its [chat template](https://huggingface.co/docs/transformers/chat_templating).\n",
"\n",
"> Note: Certain OpenAI features, like function calling, are not compatible with TGI. Currently, the Messages API supports the following chat completion parameters: `stream`, `max_new_tokens`, `frequency_penalty`, `logprobs`, `seed`, `temperature`, and `top_p`.\n"
"_Note: Certain OpenAI features, like function calling, are not compatible with TGI. Currently, the Messages API supports the following chat completion parameters: `stream`, `max_new_tokens`, `frequency_penalty`, `logprobs`, `seed`, `temperature`, and `top_p`._\n"
]
},
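To see what that conversion produces, you can render the model's chat template locally with `transformers`; this is only an illustration of the transformation TGI performs server-side:

```python
from transformers import AutoTokenizer

# Render the messages list into the model's expected prompt string
# via its chat template, mirroring what TGI does behind the scenes.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is open-source software important?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```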
{
Expand Down Expand Up @@ -239,7 +254,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Integrate with LangChain and LlamaIndex\n",
"<a id=\"section_3\"></a>\n",
"\n",
"## 3. Integrate with LangChain and LlamaIndex\n",
"\n",
"Now, let’s see how to use this newly created endpoint with popular RAG frameworks like LangChain and LlamaIndex.\n"
]
@@ -285,6 +302,7 @@
"metadata": {},
"source": [
"We’re able to directly leverage the same `ChatOpenAI` class that we would have used with the OpenAI models. This allows all previous code to work with our endpoint by changing just one line of code.\n",
"\n",
"Let’s now use our Mixtral model in a simple RAG pipeline to answer a question over the contents of a HF blog post.\n"
]
},
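That one-line change might look like the following sketch; the `langchain_openai` import path and the reuse of `endpoint.url` and `HF_TOKEN` are assumptions:

```python
from langchain_openai import ChatOpenAI

# Same ChatOpenAI class used with OpenAI models; only the base URL
# and API key change to target the TGI endpoint instead.
llm = ChatOpenAI(
    model_name="tgi",
    openai_api_key=os.environ["HF_TOKEN"],
    openai_api_base=endpoint.url + "/v1/",
)

llm.invoke("Why is open-source software important?")
```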
@@ -363,7 +381,9 @@
"source": [
"### How to use with LlamaIndex\n",
"\n",
"Similarly, you can also use a TGI endpoint in [LlamaIndex](https://www.llamaindex.ai/). We’ll use the `OpenAILike` class, and instantiate it by configuring some additional arguments (i.e. `is_local`, `is_function_calling_model`, `is_chat_model`, `context_window`). Note that the context window argument should match the value previously set for `MAX_TOTAL_TOKENS` of your endpoint.\n"
"Similarly, you can also use a TGI endpoint in [LlamaIndex](https://www.llamaindex.ai/). We’ll use the `OpenAILike` class, and instantiate it by configuring some additional arguments (i.e. `is_local`, `is_function_calling_model`, `is_chat_model`, `context_window`).\n",
"\n",
"_Note: that the context window argument should match the value previously set for `MAX_TOTAL_TOKENS` of your endpoint._\n"
]
},
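A sketch of that configuration follows. The import path varies across llama-index releases, and the `context_window` value here assumes `MAX_TOTAL_TOKENS` was set to 8192 at deploy time:

```python
from llama_index.llms.openai_like import OpenAILike

# Configure OpenAILike against the TGI endpoint; the flags mirror the
# arguments described above.
llm = OpenAILike(
    model="tgi",
    api_key=os.environ["HF_TOKEN"],
    api_base=endpoint.url + "/v1/",
    is_chat_model=True,
    is_local=False,
    is_function_calling_model=False,
    context_window=8192,  # assumption: matches MAX_TOTAL_TOKENS from deployment
)

print(llm.complete("Why is open-source software important?"))
```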
{