Change name of normalization processor in text (#5611)
* Change name of normalization processor in text

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Fix link

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
kolchfa-aws authored Nov 16, 2023
1 parent 6625a06 commit bb269db
Showing 3 changed files with 7 additions and 7 deletions.
2 changes: 1 addition & 1 deletion _query-dsl/compound/hybrid.md
@@ -12,7 +12,7 @@ You can use a hybrid query to combine relevance scores from multiple queries int

## Example

-Before using a `hybrid` query, you must configure a search pipeline with a [`normalization_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) (see [this example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor#example)).
+Before using a `hybrid` query, you must configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) (see [this example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor#example)).

To try out the example, follow the [Semantic search tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search#tutorial).

10 changes: 5 additions & 5 deletions _search-plugins/search-pipelines/normalization-processor.md
@@ -9,7 +9,7 @@ grand_parent: Search pipelines

# Normalization processor

-The `normalization_processor` is a search phase results processor that runs between the query and fetch phases of search execution. It intercepts the query phase results and then normalizes and combines the document scores from different query clauses before passing the documents to the fetch phase.
+The `normalization-processor` is a search phase results processor that runs between the query and fetch phases of search execution. It intercepts the query phase results and then normalizes and combines the document scores from different query clauses before passing the documents to the fetch phase.

## Score normalization and combination

@@ -21,7 +21,7 @@ OpenSearch supports two search types: `query_then_fetch` and `dfs_query_then_fet

![Normalization processor flow diagram]({{site.url}}{{site.baseurl}}/images/normalization-processor.png)

-When you send a search request to a node, the node becomes a _coordinating node_. During the first phase of search, the _query phase_, the coordinating node routes the search request to all shards in the index, including primary and replica shards. Each shard then runs the search query locally and returns metadata about the matching documents, which includes their document IDs and relevance scores. The `normalization_processor` then normalizes and combines scores from different query clauses. The coordinating node merges and sorts the local lists of results, compiling a global list of top documents that match the query. After that, search execution enters a _fetch phase_, in which the coordinating node requests the documents in the global list from the shards where they reside. Each shard returns the documents' `_source` to the coordinating node. Finally, the coordinating node sends a search response containing the results back to you.
+When you send a search request to a node, the node becomes a _coordinating node_. During the first phase of search, the _query phase_, the coordinating node routes the search request to all shards in the index, including primary and replica shards. Each shard then runs the search query locally and returns metadata about the matching documents, which includes their document IDs and relevance scores. The `normalization-processor` then normalizes and combines scores from different query clauses. The coordinating node merges and sorts the local lists of results, compiling a global list of top documents that match the query. After that, search execution enters a _fetch phase_, in which the coordinating node requests the documents in the global list from the shards where they reside. Each shard returns the documents' `_source` to the coordinating node. Finally, the coordinating node sends a search response containing the results back to you.
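
The normalize-and-combine step described above can be illustrated with a minimal Python sketch. It is not the processor's implementation: the function names are hypothetical, and two behaviors are assumptions made for illustration only (a subquery whose scores are all equal normalizes to 1.0, and a document missing from a subquery contributes 0 to the mean). It only shows the arithmetic behind the `min_max` and `arithmetic_mean` techniques named in this document.

```python
def min_max_normalize(scores):
    """Rescale one subquery's scores to the [0, 1] range (min-max)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: identical scores all map to 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_arithmetic_mean(per_query_scores):
    """Normalize each subquery's scores, then average them per document."""
    normalized = [min_max_normalize(s) for s in per_query_scores]
    docs = set().union(*normalized)
    # Assumption: a document absent from a subquery contributes 0.0.
    return {
        doc: sum(n.get(doc, 0.0) for n in normalized) / len(normalized)
        for doc in docs
    }


# Hypothetical scores from two query clauses (for example, keyword and neural).
keyword_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
neural_scores = {"a": 0.2, "b": 0.8}

combined = combine_arithmetic_mean([keyword_scores, neural_scores])
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)  # ['b', 'a', 'c']
```

Normalizing first puts the two clauses on a common scale, so the raw keyword score of 10.0 does not dominate the neural scores, which fall below 1.0.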

## Request fields

@@ -38,11 +38,11 @@ Field | Data type | Description

## Example

-The following example demonstrates using a search pipeline with a `normalization_processor`. To try out this example, follow the [Semantic search tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search#tutorial).
+The following example demonstrates using a search pipeline with a `normalization-processor`. To try out this example, follow the [Semantic search tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search#tutorial).

### Creating a search pipeline

-The following request creates a search pipeline containing a `normalization_processor` that uses the `min_max` normalization technique and the `arithmetic_mean` combination technique:
+The following request creates a search pipeline containing a `normalization-processor` that uses the `min_max` normalization technique and the `arithmetic_mean` combination technique:

```json
PUT /_search/pipeline/nlp-search-pipeline
@@ -114,4 +114,4 @@ For more information, see [Hybrid query]({{site.url}}{{site.baseurl}}/query-dsl/

To improve search relevance, we recommend increasing the sample size.

-If the hybrid query does not return some expected results, it may be because the subqueries return too few documents. The `normalization_processor` only transforms the results returned by each subquery; it does not perform any additional sampling. During our experiments, we used [nDCG@10](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) to measure quality of information retrieval depending on the number of documents returned (the size). We have found that a size in the [100, 200] range works best for datasets of up to 10M documents. We do not recommend increasing the size beyond the recommended values because higher size values do not improve search relevance but increase search latency.
+If the hybrid query does not return some expected results, it may be because the subqueries return too few documents. The `normalization-processor` only transforms the results returned by each subquery; it does not perform any additional sampling. During our experiments, we used [nDCG@10](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) to measure quality of information retrieval depending on the number of documents returned (the size). We have found that a size in the [100, 200] range works best for datasets of up to 10M documents. We do not recommend increasing the size beyond the recommended values because higher size values do not improve search relevance but increase search latency.
2 changes: 1 addition & 1 deletion _search-plugins/search-pipelines/search-processors.md
@@ -46,7 +46,7 @@ The following table lists all supported search request processors.

Processor | Description | Earliest available version
:--- | :--- | :---
-[`normalization_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Intercepts the query phase results and normalizes and combines the document scores before passing the documents to the fetch phase. | 2.10
+[`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) | Intercepts the query phase results and normalizes and combines the document scores before passing the documents to the fetch phase. | 2.10

## Viewing available processor types

