Commit

cleanup wording and internal links
andrewrreed committed Feb 21, 2024
1 parent 1300c16 commit 68820aa
Showing 1 changed file with 30 additions and 10 deletions.
40 changes: 30 additions & 10 deletions notebooks/en/tgi_messages_api_demo.ipynb
@@ -8,7 +8,7 @@
"\n",
"_Authored by: [Andrew Reed](https://huggingface.co/andrewrreed)_\n",
"\n",
"This notebook demonstrates how you can easily transition from OpenAI models for Open LLMs without needing to refactor any existing code.\n",
"This notebook demonstrates how you can easily transition from OpenAI models to Open LLMs without needing to refactor any existing code.\n",
"\n",
"[Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) now offers a Messages API, making it directly compatible with the OpenAI Chat Completion API. This means that any existing scripts that use OpenAI models (via the OpenAI client library or third-party tools like LangChain or LlamaIndex) can be directly swapped out to use any open LLM running on a TGI endpoint!\n",
"\n",
@@ -20,9 +20,9 @@
"\n",
"In this notebook, we'll show you how to:\n",
"\n",
"- [Create Inference Endpoint to Deploy a Model with TGI](#create-an-inference-endpoint)\n",
"- [Query the Inference Endpoint with OpenAI Client Libraries](#using-inference-endpoints-with-openai-client-libraries)\n",
"- [Integrate the Endpoint with LangChain and LlamaIndex Workflows](#integrate-with-langchain-and-llamaindex)\n"
"1. [Create Inference Endpoint to Deploy a Model with TGI](#section_1)\n",
"2. [Query the Inference Endpoint with OpenAI Client Libraries](#section_2)\n",
"3. [Integrate the Endpoint with LangChain and LlamaIndex Workflows](#section_3)\n",
"\n",
"**Let's dive in!**\n"
]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Setup\n",
+ "\n",
+ "First we need to install dependencies and set an HF API key.\n"
+ ]
},
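To make the new Setup cell concrete, here is a minimal sketch of what such a cell might contain; the exact package list is an assumption inferred from the libraries used later in the notebook:

```python
# A minimal setup sketch (package list assumed from the libraries used below)
%pip install --quiet huggingface_hub openai langchain langchain-openai llama-index

import os
from getpass import getpass

# Prompt for the Hugging Face API key rather than hardcoding it
os.environ["HF_TOKEN"] = getpass("Enter your Hugging Face API key: ")
```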
{
@@ -51,7 +62,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Inference Endpoint\n",
"<a id=\"section_1\"></a>\n",
"\n",
"## 1. Create an Inference Endpoint\n",
"\n",
"To get started, let's deploy [Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO), a fine-tuned Mixtral model, to Inference Endpoints using TGI.\n",
"\n",
Expand Down Expand Up @@ -116,14 +129,16 @@
"\n",
"Great, we now have a working endpoint!\n",
"\n",
"> Note: When deploying with `huggingface_hub`, your endpoint will scale-to-zero after 15 minutes of idle time by default to optimize cost during periods of inactivity. Check out [the Hub Python Library documentation](https://huggingface.co/docs/huggingface_hub/guides/inference_endpoints) to see all the functionality available for managing your endpoint lifecycle.\n"
"_Note: When deploying with `huggingface_hub`, your endpoint will scale-to-zero after 15 minutes of idle time by default to optimize cost during periods of inactivity. Check out [the Hub Python Library documentation](https://huggingface.co/docs/huggingface_hub/guides/inference_endpoints) to see all the functionality available for managing your endpoint lifecycle._\n"
]
},
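As a quick illustration of that lifecycle management, the `InferenceEndpoint` object returned by `huggingface_hub` exposes methods along these lines (a sketch):

```python
# A few lifecycle operations available on the InferenceEndpoint object
endpoint.pause()   # manually stop the endpoint; billing stops, config is kept
endpoint.resume()  # spin it back up with the same configuration
endpoint.delete()  # permanently remove the endpoint
```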
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Inference Endpoints with OpenAI client libraries\n",
"<a id=\"section_2\"></a>\n",
"\n",
"## 2. Query the Inference Endpoint with OpenAI Client Libraries\n",
"\n",
"As mentioned above, since our model is hosted with TGI it now supports a Messages API meaning we can query it directly using the familiar OpenAI client libraries.\n"
]
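A sketch of what that query looks like with the `openai` Python client; the `endpoint` and `HF_TOKEN` variables are assumed from the earlier steps:

```python
from openai import OpenAI

# Point the standard OpenAI client at the TGI endpoint (note the /v1/ route)
client = OpenAI(
    base_url=endpoint.url + "/v1/",
    api_key=os.environ["HF_TOKEN"],
)

chat_completion = client.chat.completions.create(
    model="tgi",  # the model name is not used for routing; "tgi" is a placeholder
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    stream=True,
    max_tokens=500,
)

# Stream the generated tokens as they arrive
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
```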
Expand Down Expand Up @@ -197,7 +212,7 @@
"source": [
"Behind the scenes, TGI’s Messages API automatically converts the list of messages into the model’s required instruction format using its [chat template](https://huggingface.co/docs/transformers/chat_templating).\n",
"\n",
"> Note: Certain OpenAI features, like function calling, are not compatible with TGI. Currently, the Messages API supports the following chat completion parameters: `stream`, `max_new_tokens`, `frequency_penalty`, `logprobs`, `seed`, `temperature`, and `top_p`.\n"
"_Note: Certain OpenAI features, like function calling, are not compatible with TGI. Currently, the Messages API supports the following chat completion parameters: `stream`, `max_new_tokens`, `frequency_penalty`, `logprobs`, `seed`, `temperature`, and `top_p`._\n"
]
},
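To see what that conversion produces, you can render the model's chat template locally with `transformers`; this is only an illustration of the transformation TGI performs server-side:

```python
from transformers import AutoTokenizer

# Render the messages list into the model's expected prompt string
# via its chat template, mirroring what TGI does behind the scenes.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is open-source software important?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```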
{
Expand Down Expand Up @@ -239,7 +254,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Integrate with LangChain and LlamaIndex\n",
"<a id=\"section_3\"></a>\n",
"\n",
"## 3. Integrate with LangChain and LlamaIndex\n",
"\n",
"Now, let’s see how to use this newly created endpoint with popular RAG frameworks like LangChain and LlamaIndex.\n"
]
@@ -285,6 +302,7 @@
"metadata": {},
"source": [
"We’re able to directly leverage the same `ChatOpenAI` class that we would have used with the OpenAI models. This allows all previous code to work with our endpoint by changing just one line of code.\n",
"\n",
"Let’s now use our Mixtral model in a simple RAG pipeline to answer a question over the contents of a HF blog post.\n"
]
},
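That one-line change might look like the following sketch; the `langchain_openai` import path and the reuse of `endpoint.url` and `HF_TOKEN` are assumptions:

```python
from langchain_openai import ChatOpenAI

# Same ChatOpenAI class used with OpenAI models; only the base URL
# and API key change to target the TGI endpoint instead.
llm = ChatOpenAI(
    model_name="tgi",
    openai_api_key=os.environ["HF_TOKEN"],
    openai_api_base=endpoint.url + "/v1/",
)

llm.invoke("Why is open-source software important?")
```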
@@ -363,7 +381,9 @@
"source": [
"### How to use with LlamaIndex\n",
"\n",
"Similarly, you can also use a TGI endpoint in [LlamaIndex](https://www.llamaindex.ai/). We’ll use the `OpenAILike` class, and instantiate it by configuring some additional arguments (i.e. `is_local`, `is_function_calling_model`, `is_chat_model`, `context_window`). Note that the context window argument should match the value previously set for `MAX_TOTAL_TOKENS` of your endpoint.\n"
"Similarly, you can also use a TGI endpoint in [LlamaIndex](https://www.llamaindex.ai/). We’ll use the `OpenAILike` class, and instantiate it by configuring some additional arguments (i.e. `is_local`, `is_function_calling_model`, `is_chat_model`, `context_window`).\n",
"\n",
"_Note: that the context window argument should match the value previously set for `MAX_TOTAL_TOKENS` of your endpoint._\n"
]
},
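A sketch of that configuration follows. The import path varies across llama-index releases, and the `context_window` value here assumes `MAX_TOTAL_TOKENS` was set to 8192 at deploy time:

```python
from llama_index.llms.openai_like import OpenAILike

# Configure OpenAILike against the TGI endpoint; the flags mirror the
# arguments described above.
llm = OpenAILike(
    model="tgi",
    api_key=os.environ["HF_TOKEN"],
    api_base=endpoint.url + "/v1/",
    is_chat_model=True,
    is_local=False,
    is_function_calling_model=False,
    context_window=8192,  # assumption: matches MAX_TOTAL_TOKENS from deployment
)

print(llm.complete("Why is open-source software important?"))
```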
{