Adds documentation about byField rerank processor (#8593)
* Adds documentation about byField rerank processor

Signed-off-by: Brian Flores <iflorbri@amazon.com>

* Polishes example and fixes spelling mistakes

Signed-off-by: Brian Flores <iflorbri@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Doc review

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Polish example to work with curl request

If you use Postman or Dev Tools, the example won't work because there are quotes in the index, so this had to be changed. It also had to be made clear where the search pipeline is applied when running a search.

Signed-off-by: Brian Flores <iflorbri@amazon.com>

* added book-index endpoint to rerank-processor.md

Signed-off-by: Brian Flores <iflorbri@amazon.com>

---------

Signed-off-by: Brian Flores <iflorbri@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
4 people authored Oct 23, 2024
1 parent a500ef7 commit f43dcfa
Showing 4 changed files with 438 additions and 127 deletions.
118 changes: 97 additions & 21 deletions _search-plugins/search-pipelines/rerank-processor.md
@@ -11,33 +11,49 @@ grand_parent: Search pipelines
Introduced 2.12
{: .label .label-purple }

The `rerank` search request processor intercepts search results and passes them to a cross-encoder model to be reranked. The model reranks the results, taking into account the scoring context. Then the processor orders documents in the search results based on their new scores.
The `rerank` search response processor intercepts and reranks search results. The processor orders documents in the search results based on their new scores.

OpenSearch supports the following rerank types.

Type | Description | Earliest available version
:--- | :--- | :---
[`ml_opensearch`](#the-ml_opensearch-rerank-type) | Applies an OpenSearch-provided cross-encoder model. | 2.12
[`by_field`](#the-by_field-rerank-type) | Applies reranking based on a user-provided field. | 2.18

## Request body fields

The following table lists all available request fields.

Field | Data type | Description
:--- | :--- | :---
`<reranker_type>` | Object | The reranker type provides the rerank processor with static information needed across all reranking calls. Required.
`context` | Object | Provides the rerank processor with information necessary for generating reranking context at query time.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
Field | Data type | Required/Optional | Description
:--- | :--- | :--- | :---
`<rerank_type>` | Object | Required | The rerank type for document reranking. Valid values are `ml_opensearch` and `by_field`.
`context` | Object | Required for the `ml_opensearch` rerank type. Optional and does not affect the results for the `by_field` rerank type. | Provides the `rerank` processor with information necessary for reranking at query time.
`tag` | String | Optional | The processor's identifier.
`description` | String | Optional | A description of the processor.
`ignore_failure` | Boolean | Optional | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Default is `false`.

<!-- vale off -->
## The ml_opensearch rerank type
<!-- vale on -->
Introduced 2.12
{: .label .label-purple }

### The `ml_opensearch` reranker type
To rerank results using a cross-encoder model, specify the `ml_opensearch` rerank type.

The `ml_opensearch` reranker type is designed to work with the cross-encoder model provided by OpenSearch. For this reranker type, specify the following fields.
### Prerequisite

Before using the `ml_opensearch` rerank type, you must configure a cross-encoder model. For information about using an OpenSearch-provided model, see [Cross-encoder models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#cross-encoder-models). For information about using a custom model, see [Custom local models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/custom-local-models/).

The `ml_opensearch` rerank type supports the following fields. All fields are required.

Field | Data type | Description
:--- | :--- | :---
`ml_opensearch` | Object | Provides the rerank processor with model information. Required.
`ml_opensearch.model_id` | String | The model ID for the cross-encoder model. Required. For more information, see [Using ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
`context.document_fields` | Array | An array of document fields that specifies the fields from which to retrieve context for the cross-encoder model. Required.
`ml_opensearch.model_id` | String | The model ID of the cross-encoder model for reranking. For more information, see [Using ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
`context.document_fields` | Array | An array of document fields that specifies the fields from which to retrieve context for the cross-encoder model.

## Example
### Example

The following example demonstrates using a search pipeline with a `rerank` processor.
The following example demonstrates using a search pipeline with a `rerank` processor implemented using the `ml_opensearch` rerank type. For a complete example, see [Reranking using a cross-encoder model]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/rerank-cross-encoder/).

### Creating a search pipeline

@@ -108,11 +124,71 @@ POST /_search?search_pipeline=rerank_pipeline
```
{% include copy-curl.html %}

The `query_context` object contains the following fields.
The `query_context` object contains the following fields. You must provide either `query_text` or `query_text_path` but cannot provide both simultaneously.

Field name | Required/Optional | Description
:--- | :--- | :---
`query_text` | Exactly one of `query_text` or `query_text_path` is required. | The natural language text of the question that you want to use to rerank the search results.
`query_text_path` | Exactly one of `query_text` or `query_text_path` is required. | The full JSON path to the text of the question that you want to use to rerank the search results. The maximum number of characters allowed in the path is `1000`.
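
As a minimal sketch of the first option, the following request passes the reranking text directly in `query_text`. The `text_representation` field and the query string are hypothetical placeholders, and the `rerank_pipeline` name is assumed to match the pipeline used by the search request in this section:

```json
POST /_search?search_pipeline=rerank_pipeline
{
  "query": {
    "match": {
      "text_representation": {
        "query": "Where is Albuquerque?"
      }
    }
  },
  "ext": {
    "rerank": {
      "query_context": {
        "query_text": "Where is Albuquerque?"
      }
    }
  }
}
```
{% include copy-curl.html %}

To reuse the text already present in the query instead of repeating it, you would replace `query_text` with `query_text_path` and point it at the full JSON path of that text, which in this hypothetical request is `query.match.text_representation.query`.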


<!-- vale off -->
## The by_field rerank type
<!-- vale on -->
Introduced 2.18
{: .label .label-purple }

To rerank results by a document field, specify the `by_field` rerank type.

The `by_field` object supports the following fields.

Field | Data type | Required/Optional | Description
:--- | :--- | :--- | :---
`target_field` | String | Required | Specifies the field name or a dot path to the field containing the score to use for reranking.
`remove_target_field` | Boolean | Optional | If `true`, the response does not include the `target_field` used to perform reranking. Default is `false`.
`keep_previous_score` | Boolean | Optional | If `true`, the response includes a `previous_score` field, which contains the score calculated before reranking and can be useful when debugging. Default is `false`.

### Example

The following example demonstrates using a search pipeline with a `rerank` processor implemented using the `by_field` rerank type. For a complete example, see [Reranking by a document field]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/rerank-by-field/).

### Creating a search pipeline

The following request creates a search pipeline whose `rerank` response processor uses the `by_field` rerank type to rank documents by the `reviews.stars` field and to return each document's original score:

```json
PUT /_search/pipeline/rerank_byfield_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "by_field": {
          "target_field": "reviews.stars",
          "keep_previous_score": true
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
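
As a variation on the preceding request, the following sketch combines `target_field` with `remove_target_field` so that the field used for reranking is dropped from the returned documents. The pipeline name `rerank_byfield_clean_pipeline` is a hypothetical name chosen for illustration:

```json
PUT /_search/pipeline/rerank_byfield_clean_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "by_field": {
          "target_field": "reviews.stars",
          "remove_target_field": true,
          "keep_previous_score": false
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}

Because `keep_previous_score` defaults to `false`, it is spelled out here only to make the contrast with the preceding pipeline explicit.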

### Using the search pipeline

To apply the search pipeline to a query, provide the search pipeline name in the `search_pipeline` query parameter:

```json
POST /book-index/_search?search_pipeline=rerank_byfield_pipeline
{
  "query": {
    "match_all": {}
  }
}
```
{% include copy-curl.html %}
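
If you prefer not to create a named pipeline, search pipelines can also be defined inline for a single request. The following sketch assumes the same `book-index` and moves the `by_field` configuration into a `search_pipeline` object in the request body. Treat it as an illustration under those assumptions rather than a verified request:

```json
POST /book-index/_search
{
  "query": {
    "match_all": {}
  },
  "search_pipeline": {
    "response_processors": [
      {
        "rerank": {
          "by_field": {
            "target_field": "reviews.stars",
            "keep_previous_score": true
          }
        }
      }
    ]
  }
}
```
{% include copy-curl.html %}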

Field name | Description
:--- | :---
`query_text` | The natural language text of the question that you want to use to rerank the search results. Either `query_text` or `query_text_path` (not both) is required.
`query_text_path` | The full JSON path to the text of the question that you want to use to rerank the search results. Either `query_text` or `query_text_path` (not both) is required. The maximum number of characters in the path is `1000`.
## Next steps

For more information about setting up reranking, see [Reranking search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/).
- Learn more about [reranking search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/).
- See a complete example of [reranking using a cross-encoder model]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/rerank-cross-encoder/).
- See a complete example of [reranking by a document field]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/rerank-by-field/).
208 changes: 208 additions & 0 deletions _search-plugins/search-relevance/rerank-by-field.md
@@ -0,0 +1,208 @@
---
layout: default
title: Reranking by a field
parent: Reranking search results
grand_parent: Search relevance
has_children: false
nav_order: 20
---

# Reranking search results by a field
Introduced 2.18
{: .label .label-purple }

You can use a `by_field` rerank type to rerank search results by a document field. Reranking search results by a field is useful if a model has already run and produced a numerical score for your documents or if a previous search response processor was applied and you want to rerank documents differently based on an aggregated field.

To implement reranking, you need to configure a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline intercepts search results and applies the [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/) to them. The `rerank` processor evaluates the search results and sorts them based on the new scores obtained from a document field.

## Running a search with reranking

To run a search with reranking, follow these steps:

1. [Configure a search pipeline](#step-1-configure-a-search-pipeline).
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion).
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index).
1. [Search using reranking](#step-4-search-using-reranking).

## Step 1: Configure a search pipeline

Configure a search pipeline with a [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/) and specify the `by_field` rerank type. The pipeline sorts by the `reviews.stars` field (specified by a complete dot path to the field) and returns the original query scores for all documents along with their new scores:

```json
PUT /_search/pipeline/rerank_byfield_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "by_field": {
          "target_field": "reviews.stars",
          "keep_previous_score": true
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}

For more information about the request fields, see [Request body fields]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#request-body-fields).

## Step 2: Create an index for ingestion

In order to use the `rerank` processor defined in your pipeline, create an OpenSearch index and add the pipeline created in the previous step as the default pipeline:

```json
PUT /book-index
{
  "settings": {
    "index.search.default_pipeline": "rerank_byfield_pipeline"
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "author": {
        "type": "text"
      },
      "genre": {
        "type": "keyword"
      },
      "reviews": {
        "properties": {
          "stars": {
            "type": "float"
          }
        }
      },
      "description": {
        "type": "text"
      }
    }
  }
}
```
{% include copy-curl.html %}
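
If the index already exists, the default pipeline can instead be attached by updating the index settings. A minimal sketch, assuming the `book-index` and `rerank_byfield_pipeline` names used in this tutorial:

```json
PUT /book-index/_settings
{
  "index.search.default_pipeline": "rerank_byfield_pipeline"
}
```
{% include copy-curl.html %}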

## Step 3: Ingest documents into the index

To ingest documents into the index created in the previous step, send the following bulk request:

```json
POST /_bulk
{ "index": { "_index": "book-index", "_id": "1" } }
{ "title": "The Lost City", "author": "Jane Doe", "genre": "Adventure Fiction", "reviews": { "stars": 4.2 }, "description": "An exhilarating journey through a hidden civilization in the Amazon rainforest." }
{ "index": { "_index": "book-index", "_id": "2" } }
{ "title": "Whispers of the Past", "author": "John Smith", "genre": "Historical Mystery", "reviews": { "stars": 4.7 }, "description": "A gripping tale set in Victorian England, unraveling a century-old mystery." }
{ "index": { "_index": "book-index", "_id": "3" } }
{ "title": "Starlit Dreams", "author": "Emily Clark", "genre": "Science Fiction", "reviews": { "stars": 4.5 }, "description": "In a future where dreams can be shared, one girl discovers her imaginations power." }
{ "index": { "_index": "book-index", "_id": "4" } }
{ "title": "The Enchanted Garden", "author": "Alice Green", "genre": "Fantasy", "reviews": { "stars": 4.8 }, "description": "A magical garden holds the key to a young girls destiny and friendship." }

```
{% include copy-curl.html %}

## Step 4: Search using reranking

As an example, run a `match_all` query on your index:

```json
POST /book-index/_search
{
  "query": {
    "match_all": {}
  }
}
```
{% include copy-curl.html %}

The response contains documents sorted in descending order based on the `reviews.stars` field. Each document contains the original query score in the `previous_score` field:

```json
{
  "took": 33,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 4.8,
    "hits": [
      {
        "_index": "book-index",
        "_id": "4",
        "_score": 4.8,
        "_source": {
          "reviews": {
            "stars": 4.8
          },
          "author": "Alice Green",
          "genre": "Fantasy",
          "description": "A magical garden holds the key to a young girls destiny and friendship.",
          "previous_score": 1,
          "title": "The Enchanted Garden"
        }
      },
      {
        "_index": "book-index",
        "_id": "2",
        "_score": 4.7,
        "_source": {
          "reviews": {
            "stars": 4.7
          },
          "author": "John Smith",
          "genre": "Historical Mystery",
          "description": "A gripping tale set in Victorian England, unraveling a century-old mystery.",
          "previous_score": 1,
          "title": "Whispers of the Past"
        }
      },
      {
        "_index": "book-index",
        "_id": "3",
        "_score": 4.5,
        "_source": {
          "reviews": {
            "stars": 4.5
          },
          "author": "Emily Clark",
          "genre": "Science Fiction",
          "description": "In a future where dreams can be shared, one girl discovers her imaginations power.",
          "previous_score": 1,
          "title": "Starlit Dreams"
        }
      },
      {
        "_index": "book-index",
        "_id": "1",
        "_score": 4.2,
        "_source": {
          "reviews": {
            "stars": 4.2
          },
          "author": "Jane Doe",
          "genre": "Adventure Fiction",
          "description": "An exhilarating journey through a hidden civilization in the Amazon rainforest.",
          "previous_score": 1,
          "title": "The Lost City"
        }
      }
    ]
  },
  "profile": {
    "shards": []
  }
}
```
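
To compare the reranked order with the original one, one option is to bypass the index's default pipeline for a single request by setting the `search_pipeline` query parameter to `_none`. A sketch, assuming the default pipeline configured in Step 2:

```json
POST /book-index/_search?search_pipeline=_none
{
  "query": {
    "match_all": {}
  }
}
```
{% include copy-curl.html %}

With the pipeline bypassed, each hit should carry the score computed by the query itself rather than the `reviews.stars` value, which makes it easier to confirm that the reordering comes from the `rerank` processor.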

## Next steps

- Learn more about the [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/).