diff --git a/docs/sphinx/source/index.rst b/docs/sphinx/source/index.rst
index 9d706e1b..a284a781 100644
--- a/docs/sphinx/source/index.rst
+++ b/docs/sphinx/source/index.rst
@@ -17,7 +17,7 @@ Vespa python API
    getting-started-pyvespa-cloud
    application-packages
    query
-   exchange-data-with-app
+   reads-writes
    reference-api
    troubleshooting
    examples
diff --git a/docs/sphinx/source/exchange-data-with-app.ipynb b/docs/sphinx/source/reads-writes.ipynb
similarity index 78%
rename from docs/sphinx/source/exchange-data-with-app.ipynb
rename to docs/sphinx/source/reads-writes.ipynb
index d83b2496..7c9b2b4f 100644
--- a/docs/sphinx/source/exchange-data-with-app.ipynb
+++ b/docs/sphinx/source/reads-writes.ipynb
@@ -7,21 +7,41 @@
    "source": [
     "![Vespa logo](https://vespa.ai/assets/vespa-logo-color.png)\n",
     "\n",
-    "# Exchange data with applications\n",
+    "# Read and write operations\n",
     "\n",
-    "This notebook demonstrates ways to feed, get, update and delete data.\n",
+    "This notebook documents ways to feed, get, update and delete data:\n",
     "\n",
-    "See end of this notebook to use the Vespa CLI for high-throughput feeding, instead of using pyvespa functions.\n",
+    "* Batch feeding vs feeding single operations\n",
+    "* Asynchronous vs synchronous operations\n",
+    "* Using the Vespa CLI for high-throughput feeding, instead of using pyvespa functions.\n",
+    "\n",
+    "<div class=\"alert alert-info\">\n",
+    "\n",
+    "**Note**: The asynchronous code below runs as-is in a Jupyter Notebook\n",
+    "because Jupyter already has an async event loop running in the background.\n",
+    "You must create your own event loop when running this code in an environment without one,\n",
+    "just like any asyncio code requires.\n",
+    "</div>\n",
+    "\n",
+    "## Deploy a sample application\n",
     "\n",
     "[Install pyvespa](https://pyvespa.readthedocs.io/) and start Docker, validate minimum 4G available:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 22,
    "id": "166bc50c",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " Total Memory: 15.63GiB\r\n"
+     ]
+    }
+   ],
    "source": [
     "!docker info | grep \"Total Memory\""
    ]
@@ -36,7 +56,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 23,
    "id": "congressional-friendly",
    "metadata": {},
    "outputs": [
@@ -110,7 +130,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 24,
    "id": "mental-amazon",
    "metadata": {},
    "outputs": [
@@ -120,7 +140,7 @@
        "['text', 'dataset', 'questions', 'context_id', 'sentence_embedding']"
       ]
      },
-     "execution_count": 5,
+     "execution_count": 24,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -136,19 +156,11 @@
   },
   {
    "cell_type": "markdown",
-   "id": "distributed-tribute",
+   "id": "furnished-wound",
    "metadata": {},
    "source": [
     "## Feed data\n",
     "\n",
-    "Feed data in a batch or one document at a time."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "furnished-wound",
-   "metadata": {},
-   "source": [
     "### Batch\n",
     "\n",
     "Prepare the data as a list of dicts having the `id` key holding a unique id of the data point\n",
@@ -157,7 +169,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 25,
    "id": "breeding-steal",
    "metadata": {},
    "outputs": [],
@@ -176,12 +188,12 @@
    "id": "hybrid-dominant",
    "metadata": {},
    "source": [
-    "Feed the batch using [feed_batch](reference-api.rst#vespa.application.Vespa.feed_batch):"
+    "Feed using [feed_batch](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.feed_batch):"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 26,
    "id": "meaning-jamaica",
    "metadata": {},
    "outputs": [
@@ -204,13 +216,13 @@
    "metadata": {},
    "source": [
     "### Individual data points\n",
-    "#### Synchronous\n",
-    "Syncronously feeding individual data points is similar to batch feeding, except that you have more control when looping through your dataset:"
+    "\n",
+    "Synchronously feeding individual data points is similar to batch feeding:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 27,
    "id": "electric-moisture",
    "metadata": {},
    "outputs": [],
@@ -227,15 +239,14 @@
    "id": "exciting-tourist",
    "metadata": {},
    "source": [
-    "#### Asynchronous\n",
-    "`app.asyncio()` returns a `VespaAsync` instance that contains async operations such as `feed_data_point`.\n",
-    "Using the `async with` context manager ensures that we open and close the appropriate connections\n",
-    "required for async feeding:"
+    "`app.asyncio()` returns a `VespaAsync` instance that contains async operations such as\n",
+    "[feed_data_point](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.feed_data_point).\n",
+    "Using the `async with` context manager ensures that we open and close the connections for async feeding:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 7,
    "id": "settled-talent",
    "metadata": {},
    "outputs": [],
@@ -253,12 +264,12 @@
    "id": "voluntary-convenience",
    "metadata": {},
    "source": [
-    "We can then use asyncio constructs like `create_task` and `wait` to create different types of asynchronous flows:"
+    "Use asyncio constructs like `create_task` and `wait` to create different types of asynchronous flows:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 8,
    "id": "protected-marine",
    "metadata": {},
    "outputs": [],
@@ -281,37 +292,22 @@
     "    response = [x.result() for x in feed]"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "racial-border",
-   "metadata": {},
-   "source": [
-    "<div class=\"alert alert-info\">\n",
-    "\n",
-    "**Note**: The code above runs from a Jupyter Notebook\n",
-    "because it already has its async event loop running in the background.\n",
-    "You must create your event loop when running this code on an environment without one,\n",
-    "just like any asyncio code requires.\n",
-    "</div>\n"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "drawn-closure",
    "metadata": {},
    "source": [
     "## Get data\n",
-    "Similarly to the examples about feeding, we can get a batch of data or get individual data points.\n",
     "\n",
     "### Batch\n",
     "Prepare the data as a list of dicts having the `id` key holding a unique id of the data point.\n",
-    "We then get the batch from the desired schema using the\n",
-    "[get_batch](reference-api.rst#vespa.application.Vespa.get_batch) method."
+    "Get the batch from the schema using\n",
+    "[get_batch](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.get_batch)."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 9,
    "id": "growing-pioneer",
    "metadata": {},
    "outputs": [],
@@ -326,14 +322,13 @@
    "metadata": {},
    "source": [
     "### Individual data points\n",
-    "We can get individual data points synchronously or asynchronously.\n",
     "\n",
-    "#### Synchronous"
+    "Synchronous:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 10,
    "id": "interpreted-warrant",
    "metadata": {},
    "outputs": [],
@@ -346,12 +341,12 @@
    "id": "surface-spending",
    "metadata": {},
    "source": [
-    "#### Asynchronous"
+    "Asynchronous:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 11,
    "id": "aggressive-pocket",
    "metadata": {},
    "outputs": [],
@@ -360,24 +355,12 @@
     "    response = await async_app.get_data(schema=\"sentence\",data_id=0)"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "apart-legislature",
-   "metadata": {},
-   "source": [
-    "<div class=\"alert alert-info\">\n",
-    "\n",
-    "**Note**: The code above runs from a Jupyter Notebook because it already has its async event loop running in the background. You must create your event loop when running this code on an environment without one, just like any asyncio code requires.\n",
-    "</div>\n"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "circular-session",
    "metadata": {},
    "source": [
     "## Update data\n",
-    "Similarly to the examples about feeding, we can update a batch of data or update individual data points.\n",
     "\n",
     "### Batch\n",
     "Prepare the data as a list of dicts having the `id` key holding a unique id of the data point,\n",
@@ -388,7 +371,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 12,
    "id": "induced-correction",
    "metadata": {},
    "outputs": [],
@@ -409,12 +392,13 @@
    "id": "presidential-kitchen",
    "metadata": {},
    "source": [
-    "We then update the batch using [update_batch](reference-api.rst#vespa.application.Vespa.update_batch):"
+    "Read more about [create-if-nonexistent](https://docs.vespa.ai/en/document-v1-api-guide.html#create-if-nonexistent).\n",
+    "Update using [update_batch](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.update_batch):"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 13,
    "id": "otherwise-directive",
    "metadata": {},
    "outputs": [],
@@ -428,14 +412,13 @@
    "metadata": {},
    "source": [
     "### Individual data points\n",
-    "We can update individual data points synchronously or asynchronously.\n",
     "\n",
-    "#### Synchronous"
+    "Synchronous:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": 14,
    "id": "varied-radio",
    "metadata": {},
    "outputs": [],
@@ -448,12 +431,12 @@
    "id": "champion-light",
    "metadata": {},
    "source": [
-    "#### Asynchronous"
+    "Asynchronous:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 18,
+   "execution_count": 15,
    "id": "grave-china",
    "metadata": {},
    "outputs": [],
@@ -462,34 +445,22 @@
     "    response = await async_app.update_data(schema=\"sentence\",data_id=0, fields=sentence_data[0], create=True)"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "organized-montreal",
-   "metadata": {},
-   "source": [
-    "<div class=\"alert alert-info\">\n",
-    "\n",
-    "**Note**: The code above runs from a Jupyter Notebook because it already has its async event loop running in the background. You must create your event loop when running this code on an environment without one, just like any asyncio code requires.\n",
-    "</div>\n"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "cross-serum",
    "metadata": {},
    "source": [
     "## Delete data\n",
-    "Similarly to the examples about feeding, we can delete a batch of data or delete individual data points.\n",
     "\n",
     "### Batch\n",
     "Prepare the data as a list of dicts having the `id` key holding a unique id of the data point.\n",
-    "We then delete the batch from the desired schema using the\n",
-    "[delete_batch](reference-api.rst#vespa.application.Vespa.delete_batch) method."
+    "Delete from the schema using\n",
+    "[delete_batch](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.delete_batch)."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": 16,
    "id": "healthy-spell",
    "metadata": {},
    "outputs": [],
@@ -504,14 +475,13 @@
    "metadata": {},
    "source": [
     "### Individual data points\n",
-    "We can delete individual data points synchronously or asynchronously.\n",
     "\n",
-    "#### Synchronous"
+    "Synchronous:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": 17,
    "id": "white-chamber",
    "metadata": {},
    "outputs": [],
@@ -524,12 +494,12 @@
    "id": "pacific-implement",
    "metadata": {},
    "source": [
-    "#### Asynchronous"
+    "Asynchronous:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 21,
+   "execution_count": 18,
    "id": "wrapped-actor",
    "metadata": {},
    "outputs": [],
@@ -538,17 +508,6 @@
     "    response = await async_app.delete_data(schema=\"sentence\",data_id=0)"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "entitled-conservative",
-   "metadata": {},
-   "source": [
-    "<div class=\"alert alert-info\">\n",
-    "\n",
-    "**Note**: The code above runs from a Jupyter Notebook because it already has its async event loop running in the background. You must create your event loop when running this code on an environment without one, just like any asyncio code requires.\n",
-    "</div>\n"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "dd299858",
@@ -563,7 +522,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 25,
+   "execution_count": 21,
    "id": "7feda6ab",
    "metadata": {},
    "outputs": [
@@ -572,11 +531,11 @@
      "output_type": "stream",
      "text": [
       "\u001b[34m==>\u001b[0m \u001b[1mDownloading https://formulae.brew.sh/api/formula.jws.json\u001b[0m\n",
-      "#=#=# \n",
+      "##O=# # \n",
       "\u001b[34m==>\u001b[0m \u001b[1mDownloading https://formulae.brew.sh/api/cask.jws.json\u001b[0m\n",
-      "######################################################################### 100.0%\n",
-      "\u001b[33mWarning:\u001b[0m vespa-cli 8.198.18 is already installed and up-to-date.\n",
-      "To reinstall 8.198.18, run:\n",
+      "##O=# # \n",
+      "\u001b[33mWarning:\u001b[0m vespa-cli 8.209.11 is already installed and up-to-date.\n",
+      "To reinstall 8.209.11, run:\n",
       "  brew reinstall vespa-cli\n"
      ]
     }
@@ -587,7 +546,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 20,
    "id": "f3c8c6fd",
    "metadata": {},
    "outputs": [
@@ -596,7 +555,7 @@
      "output_type": "stream",
      "text": [
       "{\r\n",
-      "  \"feeder.seconds\": 0.073,\r\n",
+      "  \"feeder.seconds\": 0.142,\r\n",
       "  \"feeder.ok.count\": 3,\r\n",
       "  \"feeder.ok.rate\": 3.000,\r\n",
       "  \"feeder.error.count\": 0,\r\n",
@@ -609,9 +568,9 @@
       "  \"http.response.bytes\": 246,\r\n",
       "  \"http.response.MBps\": 0.000,\r\n",
       "  \"http.response.error.count\": 0,\r\n",
-      "  \"http.response.latency.millis.min\": 51,\r\n",
-      "  \"http.response.latency.millis.avg\": 58,\r\n",
-      "  \"http.response.latency.millis.max\": 72,\r\n",
+      "  \"http.response.latency.millis.min\": 121,\r\n",
+      "  \"http.response.latency.millis.avg\": 127,\r\n",
+      "  \"http.response.latency.millis.max\": 139,\r\n",
       "  \"http.response.code.counts\": {\r\n",
       "    \"200\": 3\r\n",
       "  }\r\n",
@@ -659,6 +618,16 @@
     "vespa_docker.container.stop()\n",
     "vespa_docker.container.remove()"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "40fc7079",
+   "metadata": {},
+   "source": [
+    "## Next steps\n",
+    "\n",
+    "Read more on writing to Vespa in [reads-and-writes](https://docs.vespa.ai/en/reads-and-writes.html)."
+   ]
   }
  ],
  "metadata": {
diff --git a/questions.jsonl b/questions.jsonl
index 020e4b4a..ce6eebd9 100644
--- a/questions.jsonl
+++ b/questions.jsonl
@@ -20,22 +20,22 @@
 {"update": "id:pyvespa-p:paragraph::pyvespa/getting-started-pyvespa-cloud.html-deploy-to-vespa-cloud", "fields": {"questions": {"assign": ["What is Vespa.ai?", "What are tenants and instances?", "How do I deploy an instance?", "How can I interact with my application?", "What is Sphinx used for?", "What organization owns Vespa.ai?"]}}}
 {"update": "id:pyvespa-p:paragraph::pyvespa/examples.html-", "fields": {"questions": {"assign": ["What examples are included in pyvespa?", "How can I exchange data with pyvespa applications?"]}}}
 {"update": "id:pyvespa-p:paragraph::pyvespa/examples.html-examples", "fields": {"questions": {"assign": ["How do I create and deploy the application for question answering?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-", "fields": {"questions": {"assign": ["How to feed data?", "How to get data?", "How to update data?", "How to delete data?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-exchange-data-with-applications", "fields": {"questions": {"assign": ["How to get data?", "How to update data?", "How to delete data?", "How to deploy a sample test application?", "What is sentence_embedding?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-feed-data", "fields": {"questions": {"assign": ["Can data be fed in batches?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-synchronous", "fields": {"questions": {"assign": ["What is synchronous feeding?", "What is data_id?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-asynchronous", "fields": {"questions": {"assign": ["What does app.asyncio() return?", "What does async with guarantee?", "How do you use asyncio constructs?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-get-data", "fields": {"questions": {"assign": ["What is the similarity between feeding and getting data?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-batch", "fields": {"questions": {"assign": ["How do we prepare data?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-individual-data-points", "fields": {"questions": {"assign": ["What is synchronous data?", "What is asynchronous data?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-asynchronous", "fields": {"questions": {"assign": ["What is the purpose of app.asyncio()?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-update-data", "fields": {"questions": {"assign": ["What is data feeding?", "How do you update data?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-individual-data-points", "fields": {"questions": {"assign": ["Can data points be updated asynchronously?", "Can data points be updated synchronously?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-synchronous", "fields": {"questions": {"assign": ["What is data_id?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-delete-data", "fields": {"questions": {"assign": ["What can be deleted in batches?", "What is individual data deletion?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-batch", "fields": {"questions": {"assign": ["How to delete batch in Vespa?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-individual-data-points", "fields": {"questions": {"assign": ["How can data points be deleted?", "What is synchronous deletion?", "What is asynchronous deletion?"]}}}
-{"update": "id:pyvespa-p:paragraph::pyvespa/exchange-data-with-app.html-synchronous", "fields": {"questions": {"assign": ["What does app.delete_data do?", "What data is being deleted?", "How to delete data with Vespa.ai?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-", "fields": {"questions": {"assign": ["How to feed data?", "How to get data?", "How to update data?", "How to delete data?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-exchange-data-with-applications", "fields": {"questions": {"assign": ["How to get data?", "How to update data?", "How to delete data?", "How to deploy a sample test application?", "What is sentence_embedding?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-feed-data", "fields": {"questions": {"assign": ["Can data be fed in batches?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-synchronous", "fields": {"questions": {"assign": ["What is synchronous feeding?", "What is data_id?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-asynchronous", "fields": {"questions": {"assign": ["What does app.asyncio() return?", "What does async with guarantee?", "How do you use asyncio constructs?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-get-data", "fields": {"questions": {"assign": ["What is the similarity between feeding and getting data?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-batch", "fields": {"questions": {"assign": ["How do we prepare data?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-individual-data-points", "fields": {"questions": {"assign": ["What is synchronous data?", "What is asynchronous data?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-asynchronous", "fields": {"questions": {"assign": ["What is the purpose of app.asyncio()?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-update-data", "fields": {"questions": {"assign": ["What is data feeding?", "How do you update data?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-individual-data-points", "fields": {"questions": {"assign": ["Can data points be updated asynchronously?", "Can data points be updated synchronously?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-synchronous", "fields": {"questions": {"assign": ["What is data_id?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-delete-data", "fields": {"questions": {"assign": ["What can be deleted in batches?", "What is individual data deletion?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-batch", "fields": {"questions": {"assign": ["How to delete batch in Vespa?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-individual-data-points", "fields": {"questions": {"assign": ["How can data points be deleted?", "What is synchronous deletion?", "What is asynchronous deletion?"]}}}
+{"update": "id:pyvespa-p:paragraph::pyvespa/reads-writes.html-synchronous", "fields": {"questions": {"assign": ["What does app.delete_data do?", "What data is being deleted?", "How to delete data with Vespa.ai?"]}}}
 {"update": "id:pyvespa-p:paragraph::pyvespa/genindex.html-%5C_", "fields": {"questions": {"assign": ["What is VespaCloud?", "What is FieldSet?", "What is QueryField?", "What is QueryProfile?", "What is QueryTypeField?"]}}}
 {"update": "id:pyvespa-p:paragraph::pyvespa/genindex.html-d", "fields": {"questions": {"assign": ["What methods are available for deleting documents?", "What method can be used for deployment?"]}}}
 {"update": "id:pyvespa-p:paragraph::pyvespa/genindex.html-f", "fields": {"questions": {"assign": ["What is feed_batch?", "What is feed_data_point?", "What is FieldSet?", "What does from_container_name_or_id do?"]}}}
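The note added to the notebook says its asynchronous cells rely on Jupyter's running event loop, and that a standalone environment must create its own. The sketch below shows the `create_task`/`wait` flow in a plain Python script, using `asyncio.run()` to create that loop. It assumes a hypothetical `StubAsyncApp` in place of the `VespaAsync` object returned by `app.asyncio()`, so it runs without a Vespa instance. The stub's `feed_data_point` signature and its response dict are illustrative assumptions, not the pyvespa API.

```python
import asyncio
from typing import Any, Dict, List


class StubAsyncApp:
    """Hypothetical stand-in for the object returned by `app.asyncio()`.

    It only imitates an async feed call so this sketch runs without Vespa;
    it is not part of pyvespa.
    """

    def __init__(self) -> None:
        self.fed: Dict[str, Dict[str, Any]] = {}

    async def __aenter__(self) -> "StubAsyncApp":
        # A real client would open its connections here.
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        # ...and close them here; returning None lets exceptions propagate.
        return None

    async def feed_data_point(
        self, schema: str, data_id: str, fields: Dict[str, Any]
    ) -> Dict[str, Any]:
        await asyncio.sleep(0.01)  # simulate network latency
        self.fed[data_id] = fields
        return {"schema": schema, "id": data_id, "status_code": 200}


async def feed_all(
    async_app: StubAsyncApp, docs: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    async with async_app:
        # One task per document, so the feed requests run concurrently.
        feed = [
            asyncio.create_task(
                async_app.feed_data_point(
                    schema="sentence", data_id=doc["id"], fields=doc["fields"]
                )
            )
            for doc in docs
        ]
        if not feed:  # asyncio.wait() rejects an empty collection
            return []
        # Wait until every task has finished before reading the results.
        await asyncio.wait(feed, return_when=asyncio.ALL_COMPLETED)
        # Results keep the input order because we iterate over the task list.
        return [task.result() for task in feed]


if __name__ == "__main__":
    docs = [{"id": str(i), "fields": {"text": f"sentence {i}"}} for i in range(3)]
    # A plain script has no running event loop, so asyncio.run() creates one.
    responses = asyncio.run(feed_all(StubAsyncApp(), docs))
    print([r["status_code"] for r in responses])
```

With a real application, `async with app.asyncio() as async_app:` would take the place of the stub. Inside a notebook cell, `await feed_all(...)` directly instead. Calling `asyncio.run()` there raises `RuntimeError` because a loop is already running.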