From 2e8be33c82375e35eb7570d34a7c6c30809694cf Mon Sep 17 00:00:00 2001 From: Jo Kristian Bergum Date: Fri, 2 Feb 2024 13:17:18 +0100 Subject: [PATCH] Add example using nomic embedding model (#668) --- .../examples/nomic-embeddings-cloud.ipynb | 638 ++++++++++++++++++ 1 file changed, 638 insertions(+) create mode 100644 docs/sphinx/source/examples/nomic-embeddings-cloud.ipynb diff --git a/docs/sphinx/source/examples/nomic-embeddings-cloud.ipynb b/docs/sphinx/source/examples/nomic-embeddings-cloud.ipynb new file mode 100644 index 00000000..b0fa4114 --- /dev/null +++ b/docs/sphinx/source/examples/nomic-embeddings-cloud.ipynb @@ -0,0 +1,638 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b3ae8a2b", + "metadata": {}, + "source": [ + "\n", + " \n", + " \n", + " \"#Vespa\"\n", + "\n", + "\n", + "\n", + "# arXiv AI-powered search\n", + "\n", + "This notebook demonstrates how to load an arXiv dataset hosted on [HF datasets](https://huggingface.co/datasets/somewheresystems/dataclysm-arxiv) \n", + "and feed it to a Vespa instance. The dataset comprises English-language arXiv papers from the Cornell/arXiv dataset, with two new columns added: title-embeddings and abstract-embeddings. The embeddings were generated using the [bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) embedding model. \n", + "\n", + "In this notebook, we use Vespa's embedder functionality to include the [bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) embedding\n", + "model in Vespa for query serving. \n", + "\n", + "This is a work in progress - we want to demonstrate more query examples. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4ffa3cbe", + "metadata": {}, + "outputs": [], + "source": [ + "!pip3 install -U pyvespa " + ] + }, + { + "cell_type": "markdown", + "id": "da356d25", + "metadata": {}, + "source": [ + "## Defining the Vespa application\n", + "\n", + "[PyVespa](https://pyvespa.readthedocs.io/en/latest/) helps us build the [Vespa application package](https://docs.vespa.ai/en/application-packages.html). \n", + "A Vespa application package consists of configuration files, schemas, models, and code (plugins). \n", + "\n", + "First, we define a [Vespa schema](https://docs.vespa.ai/en/schemas.html) with the fields we want to store and their types. This is a translation\n", + "of the dataset features:" + ] + }, + { + "cell_type": "code", + "execution_count": 93, + "id": "0dca2378", + "metadata": {}, + "outputs": [], + "source": [ + "from vespa.package import Schema, Document, Field, FieldSet, HNSW\n", + "paper_schema = Schema(\n", + " name=\"paper\",\n", + " mode=\"index\",\n", + " document=Document(\n", + " fields=[\n", + " Field(name=\"id\", type=\"string\", indexing=[\"summary\", \"index\"], match=[\"word\"]),\n", + " Field(name=\"submitter\", type=\"string\", indexing=[\"summary\", \"index\"]),\n", + " Field(name=\"authors\", type=\"string\", indexing=[\"summary\", \"index\"]),\n", + " Field(name=\"title\", type=\"string\", indexing=[\"summary\", \"index\"], index = \"enable-bm25\"),\n", + " Field(name=\"abstract\", type=\"string\", indexing=[\"summary\", \"index\"], index=\"enable-bm25\"),\n", + " Field(name=\"journal_ref\", type=\"string\", indexing=[\"summary\", \"index\"]),\n", + " Field(name=\"doi\", type=\"string\", indexing=[\"summary\", \"index\"]),\n", + " Field(name=\"categories\", type=\"array<string>\", indexing=[\"summary\", \"index\"], match=[\"word\"]),\n", + " Field(name=\"title_embedding\", type=\"tensor(x[384])\",\n", + " indexing=[\"attribute\", \"index\"],\n", + " 
ann=HNSW(distance_metric=\"angular\")\n", + " ),\n", + " Field(name=\"abstract_embedding\", type=\"tensor(x[384])\",\n", + " indexing=[\"attribute\", \"index\"],\n", + " ann=HNSW(distance_metric=\"angular\")\n", + " ),\n", + " ],\n", + " ),\n", + " fieldsets=[\n", + " FieldSet(name = \"default\", fields = [\"title\", \"abstract\", \"authors\", \"submitter\"])\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 94, + "id": "66c5da1d", + "metadata": {}, + "outputs": [], + "source": [ + "from vespa.package import ApplicationPackage, Component, Parameter\n", + "\n", + "vespa_app_name = \"arxivsearch\"\n", + "vespa_application_package = ApplicationPackage(\n", + " name=vespa_app_name,\n", + " schema=[paper_schema],\n", + " components=[Component(id=\"bge\", type=\"hugging-face-embedder\",\n", + " parameters=[\n", + " Parameter(\"transformer-model\", {\"url\": \"https://huggingface.co/Xenova/bge-small-en-v1.5/resolve/main/onnx/model.onnx\"}),\n", + " Parameter(\"tokenizer-model\", {\"url\": \"https://huggingface.co/Xenova/bge-small-en-v1.5/raw/main/tokenizer.json\"}),\n", + " Parameter(\"pooling-strategy\", args=dict(), children=\"cls\")\n", + " ]\n", + " )]\n", + ") " + ] + }, + { + "cell_type": "markdown", + "id": "7fe3d7bd", + "metadata": {}, + "source": [ + "In the last step, we configure [ranking](https://docs.vespa.ai/en/ranking.html) by adding `rank-profile`s to the schema. \n", + "\n", + "Vespa supports [phased ranking](https://docs.vespa.ai/en/phased-ranking.html) and has a rich set of built-in [rank-features](https://docs.vespa.ai/en/reference/rank-features.html), including many\n", + "text-matching features such as:\n", + "\n", + "- [BM25](https://docs.vespa.ai/en/reference/bm25.html).\n", + "- [nativeRank](https://docs.vespa.ai/en/reference/nativerank.html) and many more. \n", + "\n", + "Users can also define custom functions using [ranking expressions](https://docs.vespa.ai/en/reference/ranking-expressions.html). 
\n", + "\n", + "The following defines a `hybrid` Vespa ranking profile and a plain `bm25` profile." + ] + }, + { + "cell_type": "code", + "execution_count": 101, + "id": "a8ce5624", + "metadata": {}, + "outputs": [], + "source": [ + "from vespa.package import RankProfile, FirstPhaseRanking, GlobalPhaseRanking\n", + "\n", + "bm25 = RankProfile(\n", + " name=\"bm25\", \n", + " inputs=[(\"query(q)\", \"tensor(x[384])\")],\n", + " \n", + " first_phase=FirstPhaseRanking(\n", + " expression=\"bm25(title) + bm25(abstract)\",\n", + " )\n", + ")\n", + "\n", + "hybrid = RankProfile(\n", + " name=\"hybrid\", \n", + " inputs=[(\"query(q)\", \"tensor(x[384])\")],\n", + " first_phase=FirstPhaseRanking(\n", + " expression=\"closeness(field, title_embedding) + closeness(field, abstract_embedding)\"\n", + " ),\n", + " global_phase=GlobalPhaseRanking(\n", + " expression=\"reciprocal_rank_fusion(closeness(field,title_embedding), bm25(title), bm25(abstract), closeness(field,abstract_embedding))\"\n", + " ),\n", + " match_features=[\"bm25(title)\", \"bm25(abstract)\", \"closeness(field, title_embedding)\", \"closeness(field, abstract_embedding)\"]\n", + ")\n", + "paper_schema.add_rank_profile(bm25)\n", + "paper_schema.add_rank_profile(hybrid)" + ] + }, + { + "cell_type": "markdown", + "id": "846545f9", + "metadata": {}, + "source": [ + "## Deploy the application to Vespa Cloud\n", + "\n", + "With the configured application, we can deploy it to [Vespa Cloud](https://cloud.vespa.ai/en/). \n", + "It is also possible to deploy the app using Docker; see the [Hybrid Search - Quickstart](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa.html) guide for\n", + "an example of deploying it to a local Docker container. " + ] + }, + { + "cell_type": "markdown", + "id": "16179d9b", + "metadata": {}, + "source": [ + "Install the Vespa CLI using [homebrew](https://brew.sh/) - or download a binary from GitHub as demonstrated below. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "343981ce", + "metadata": {}, + "outputs": [], + "source": [ + "!brew install vespa-cli" + ] + }, + { + "cell_type": "markdown", + "id": "863d0700", + "metadata": {}, + "source": [ + "Alternatively, if running in Colab, download the Vespa CLI:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d5670bb6", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "res = requests.get(url=\"https://api.github.com/repos/vespa-engine/vespa/releases/latest\").json()\n", + "os.environ[\"VERSION\"] = res[\"tag_name\"].replace(\"v\", \"\")\n", + "!curl -fsSL https://github.com/vespa-engine/vespa/releases/download/v${VERSION}/vespa-cli_${VERSION}_linux_amd64.tar.gz | tar -zxf -\n", + "!ln -sf /content/vespa-cli_${VERSION}_linux_amd64/bin/vespa /bin/vespa" + ] + }, + { + "cell_type": "markdown", + "id": "0ff00727", + "metadata": {}, + "source": [ + "To deploy the application to Vespa Cloud, we need a Vespa Cloud tenant:\n", + "\n", + "Create a tenant at [console.vespa-cloud.com](https://console.vespa-cloud.com/) (unless you already have one). \n", + "This step requires a Google or GitHub account, and will start your [free trial](https://cloud.vespa.ai/en/free-trial). \n", + "Make note of the tenant name; it is used in the next steps." + ] + }, + { + "cell_type": "markdown", + "id": "df9f9a1c", + "metadata": {}, + "source": [ + "### Configure Vespa Cloud data-plane security\n", + "\n", + "Create a Vespa Cloud data-plane mTLS certificate/key pair. The mutual certificate pair is used to talk to your Vespa Cloud endpoints. See [Vespa Cloud Security Guide](https://cloud.vespa.ai/en/security/guide) for details.\n", + "\n", + "We save the paths to the credentials for later data-plane access without using pyvespa APIs. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b6a766d6", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "os.environ[\"TENANT_NAME\"] = \"vespa-team\" # Replace with your tenant name\n", + "\n", + "vespa_cli_command = f'vespa config set application {os.environ[\"TENANT_NAME\"]}.{vespa_app_name}'\n", + "\n", + "!vespa config set target cloud\n", + "!{vespa_cli_command}\n", + "!vespa auth cert -N " + ] + }, + { + "cell_type": "markdown", + "id": "b228381b", + "metadata": {}, + "source": [ + "Validate that we have the expected data-plane credential files:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "1f0b97c8", + "metadata": {}, + "outputs": [], + "source": [ + "from os.path import exists\n", + "from pathlib import Path\n", + "\n", + "cert_path = Path.home() / \".vespa\" / f\"{os.environ['TENANT_NAME']}.{vespa_app_name}.default/data-plane-public-cert.pem\"\n", + "key_path = Path.home() / \".vespa\" / f\"{os.environ['TENANT_NAME']}.{vespa_app_name}.default/data-plane-private-key.pem\"\n", + "\n", + "if not exists(cert_path) or not exists(key_path):\n", + " print(\"ERROR: set the correct paths to security credentials. Correct paths above and rerun until you do not see this error\")" + ] + }, + { + "cell_type": "markdown", + "id": "85ce80e0", + "metadata": {}, + "source": [ + "Note that the subsequent Vespa Cloud deploy call below will add `data-plane-public-cert.pem` to the application before deploying it to Vespa Cloud, so that\n", + "you have access to both the private key and the public certificate. At the same time, Vespa Cloud only knows the public certificate. \n", + "\n", + "### Configure Vespa Cloud control-plane security \n", + "\n", + "Authenticate to generate a tenant level control plane API key for deploying the applications to Vespa Cloud, and save the path to it. 
\n", + "\n", + "The generated tenant API key must be added in the Vespa Console before attempting to deploy the application. \n", + "\n", + "```\n", + "To use this key in Vespa Cloud click 'Add custom key' at\n", + "https://console.vespa-cloud.com/tenant/TENANT_NAME/account/keys\n", + "and paste the entire public key including the BEGIN and END lines.\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5bf8731c", + "metadata": {}, + "outputs": [], + "source": [ + "!vespa auth api-key\n", + "\n", + "from pathlib import Path\n", + "api_key_path = Path.home() / \".vespa\" / f\"{os.environ['TENANT_NAME']}.api-key.pem\"" + ] + }, + { + "cell_type": "markdown", + "id": "21db1010", + "metadata": {}, + "source": [ + "### Deploy to Vespa Cloud\n", + "\n", + "Now that we have data-plane and control-plane credentials ready, we can deploy our application to Vespa Cloud! \n", + "\n", + "`PyVespa` supports deploying apps to the [development zone](https://cloud.vespa.ai/en/reference/environments#dev-and-perf).\n", + "\n", + ">Note: Deployments to dev and perf expire after 7 days of inactivity, i.e., 7 days after running deploy. This applies to all plans, not only the Free Trial. Use the Vespa Console to extend the expiry period, or redeploy the application to add 7 more days." + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "id": "b5fddf9f", + "metadata": {}, + "outputs": [], + "source": [ + "from vespa.deployment import VespaCloud\n", + "\n", + "def read_secret():\n", + " \"\"\"Read the API key from the environment variable. 
This is \n", + " only used for CI/CD purposes.\"\"\"\n", + " t = os.getenv(\"VESPA_TEAM_API_KEY\")\n", + " if t:\n", + " return t.replace(r\"\\n\", \"\\n\")\n", + " else:\n", + " return t\n", + "\n", + "vespa_cloud = VespaCloud(\n", + " tenant=os.environ[\"TENANT_NAME\"],\n", + " application=vespa_app_name,\n", + " key_content=read_secret() if read_secret() else None,\n", + " key_location=api_key_path,\n", + " application_package=vespa_application_package)" + ] + }, + { + "cell_type": "markdown", + "id": "fa9baa5a", + "metadata": {}, + "source": [ + "Now deploy the app to Vespa Cloud dev zone. \n", + "\n", + "The first deployment typically takes 2 minutes until the endpoint is up. " + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "id": "fe954dc4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Deployment started in run 7 of dev-aws-us-east-1c for samples.arxivsearch. This may take a few minutes the first time.\n", + "INFO [12:01:11] Deploying platform version 8.284.4 and application dev build 7 for dev-aws-us-east-1c of default ...\n", + "INFO [12:01:11] Using CA signed certificate version 0\n", + "INFO [12:01:12] Using 1 nodes in container cluster 'arxivsearch_container'\n", + "INFO [12:01:13] Using 1 nodes in container cluster 'arxivsearch_container'\n", + "INFO [12:01:15] Deployment successful.\n", + "INFO [12:01:15] Session 247 for tenant 'samples' prepared and activated.\n", + "INFO [12:01:15] ######## Details for all nodes ########\n", + "INFO [12:01:15] h90001f.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n", + "INFO [12:01:15] --- platform vespa/cloud-tenant-rhel8:8.284.4\n", + "INFO [12:01:15] --- logserver-container on port 4080 has config generation 247, wanted is 247\n", + "INFO [12:01:15] --- metricsproxy-container on port 19092 has config generation 247, wanted is 247\n", + "INFO [12:01:15] h90001g.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be 
UP\n", + "INFO [12:01:15] --- platform vespa/cloud-tenant-rhel8:8.284.4\n", + "INFO [12:01:15] --- container-clustercontroller on port 19050 has config generation 247, wanted is 247\n", + "INFO [12:01:15] --- metricsproxy-container on port 19092 has config generation 247, wanted is 247\n", + "INFO [12:01:15] h90024a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n", + "INFO [12:01:15] --- platform vespa/cloud-tenant-rhel8:8.284.4\n", + "INFO [12:01:15] --- container on port 4080 has config generation 247, wanted is 247\n", + "INFO [12:01:15] --- metricsproxy-container on port 19092 has config generation 247, wanted is 247\n", + "INFO [12:01:15] h90026a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n", + "INFO [12:01:15] --- platform vespa/cloud-tenant-rhel8:8.284.4\n", + "INFO [12:01:15] --- storagenode on port 19102 has config generation 246, wanted is 247\n", + "INFO [12:01:15] --- searchnode on port 19107 has config generation 247, wanted is 247\n", + "INFO [12:01:15] --- distributor on port 19111 has config generation 246, wanted is 247\n", + "INFO [12:01:15] --- metricsproxy-container on port 19092 has config generation 247, wanted is 247\n", + "INFO [12:01:21] Found endpoints:\n", + "INFO [12:01:21] - dev.aws-us-east-1c\n", + "INFO [12:01:21] |-- https://fa63b7b7.e9029380.z.vespa-app.cloud/ (cluster 'arxivsearch_container')\n", + "INFO [12:01:21] Installation succeeded!\n", + "Using mTLS (key,cert) Authentication against endpoint https://fa63b7b7.e9029380.z.vespa-app.cloud//ApplicationStatus\n", + "Application is up!\n", + "Finished deployment.\n" + ] + } + ], + "source": [ + "from vespa.application import Vespa\n", + "app:Vespa = vespa_cloud.deploy()" + ] + }, + { + "cell_type": "markdown", + "id": "27d17774", + "metadata": {}, + "source": [ + "## Index the dataset\n", + "\n", + "The following streams the Hugging Face dataset into the Vespa instance. Notice the mapping of the dataset fields to the Vespa feed\n", + "format. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 105, + "id": "8f422178", + "metadata": {}, + "outputs": [], + "source": [ + "# app:Vespa = vespa_cloud.deploy()\n", + "\n", + "from datasets import load_dataset\n", + "dataset = load_dataset(\"somewheresystems/dataclysm-arxiv\", split=\"train\", streaming=True).take(100)\n", + "vespa_feed = dataset.map(lambda x: \n", + "{\n", + " \"id\": x[\"id\"],\n", + " \"fields\" : {\n", + " \"id\": x[\"id\"],\n", + " \"title\": x[\"title\"],\n", + " \"abstract\": x[\"abstract\"],\n", + " \"title_embedding\": x[\"title_embedding\"],\n", + " \"abstract_embedding\": x[\"abstract_embedding\"],\n", + " \"journal_ref\": x.get(\"journal-ref\",None),\n", + " \"doi\": x.get(\"doi\",None),\n", + " \"categories\": x[\"categories\"],\n", + " \"authors\": x[\"authors\"],\n", + " \"submitter\": x[\"submitter\"]\n", + " }\n", + "})\n", + "from vespa.io import VespaResponse\n", + "\n", + "def callback(response:VespaResponse, id:str):\n", + " if not response.is_successful():\n", + " print(f\"Document {id} failed to feed with status code {response.status_code}, url={response.url} response={response.json}\")\n", + " else:\n", + " print(f\"Document {id} success.\")\n", + "\n", + "app.feed_iterable(schema=\"paper\", iter=vespa_feed, callback=callback, max_connections=12, max_workers=14, max_queue_size=10000)\n" + ] + }, + { + "cell_type": "markdown", + "id": "20b007ec", + "metadata": {}, + "source": [ + "## Querying data\n", + "\n", + "Now we can start querying the arXiv papers. \n", + "\n", + "The query request uses the Vespa Query API and the `Vespa.query()` function \n", + "supports passing any of the Vespa Query API parameters. 
\n", + "\n", + "Read more about querying Vespa in:\n", + "\n", + "- [Vespa Query API](https://docs.vespa.ai/en/query-api.html)\n", + "- [Vespa Query API reference](https://docs.vespa.ai/en/reference/query-api-reference.html)\n", + "- [Vespa Query Language API (YQL)](https://docs.vespa.ai/en/query-language.html)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 109, + "id": "b9349fb4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " {\n", + " \"id\": \"index:arxivsearch_content/0/cfdff72f28cffdb0b73f6026\",\n", + " \"relevance\": 0.06384129063829451,\n", + " \"source\": \"arxivsearch_content\",\n", + " \"fields\": {\n", + " \"matchfeatures\": {\n", + " \"bm25(abstract)\": 0.0,\n", + " \"bm25(title)\": 0.0,\n", + " \"closeness(field,abstract_embedding)\": 0.6178772298066597,\n", + " \"closeness(field,title_embedding)\": 0.6288338602029975\n", + " },\n", + " \"id\": \"0812.3122\",\n", + " \"title\": \"Cosmological constraints on unifying Dark Fluid models\"\n", + " }\n", + " },\n", + " {\n", + " \"id\": \"index:arxivsearch_content/0/c77e9d766bd90c894a5d0481\",\n", + " \"relevance\": 0.06198484047241319,\n", + " \"source\": \"arxivsearch_content\",\n", + " \"fields\": {\n", + " \"matchfeatures\": {\n", + " \"bm25(abstract)\": 0.0,\n", + " \"bm25(title)\": 0.0,\n", + " \"closeness(field,abstract_embedding)\": 0.5754037589718138,\n", + " \"closeness(field,title_embedding)\": 0.6644048114912198\n", + " },\n", + " \"id\": \"0711.0466\",\n", + " \"title\": \"A Model for Dark Matter Halos\"\n", + " }\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "from vespa.io import VespaQueryResponse\n", + "import json\n", + "\n", + "response:VespaQueryResponse = app.query(\n", + " yql=\"select title, id from paper where ({targetHits:10}nearestNeighbor(title_embedding,q)) or ({targetHits:10}nearestNeighbor(abstract_embedding,q))\",\n", + " ranking=\"hybrid\",\n", + " query=\"dark matter field fluid model\",\n", + " 
body={\n", + " \"presentation.format.tensors\": \"short-value\",\n", + " \"input.query(q)\": \"embed(bge, \\\"dark matter field fluid model\\\")\",\n", + " }\n", + ")\n", + "assert(response.is_successful())\n", + "print(json.dumps(response.hits[0:2], indent=2))" + ] + }, + { + "cell_type": "code", + "execution_count": 108, + "id": "405cdb72", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " {\n", + " \"id\": \"index:arxivsearch_content/0/cfdff72f28cffdb0b73f6026\",\n", + " \"relevance\": 31.398304828681407,\n", + " \"source\": \"arxivsearch_content\",\n", + " \"fields\": {\n", + " \"id\": \"0812.3122\",\n", + " \"title\": \"Cosmological constraints on unifying Dark Fluid models\"\n", + " }\n", + " },\n", + " {\n", + " \"id\": \"index:arxivsearch_content/0/6033639d686a018894cdd4ec\",\n", + " \"relevance\": 30.574650705468287,\n", + " \"source\": \"arxivsearch_content\",\n", + " \"fields\": {\n", + " \"id\": \"0812.3611\",\n", + " \"title\": \"Dark Energy vs. Dark Matter: Towards a Unifying Scalar Field?\"\n", + " }\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "\n", + "\n", + "response:VespaQueryResponse = app.query(\n", + " yql=\"select title, id from paper where userQuery()\",\n", + " ranking=\"bm25\",\n", + " query=\"dark matter field fluid model\",\n", + ")\n", + "assert(response.is_successful())\n", + "print(json.dumps(response.hits[0:2], indent=2))" + ] + }, + { + "cell_type": "markdown", + "id": "4d3ca1da", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "This notebook demonstrated how to stream a Hugging Face dataset into Vespa, configure an embedding model in Vespa for query serving, and run hybrid and BM25 queries. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "71e310e3", + "metadata": {}, + "outputs": [], + "source": [ + "vespa_cloud.delete()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.11.4 64-bit", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + }, + "vscode": { + "interpreter": { + "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}