[DOC] Document the Derived Field feature #6943

rishabhmaurya · 2024-04-11T17:41:10Z

What do you want to do?

Request a change to existing documentation
Add new documentation
Report a technical problem with the documentation
Other

Tell us about your request. Provide a summary of the request and all versions that are affected.

Derived fields allow users to add new fields or manipulate existing indexed fields by running scripts on the _source document. This eliminates the need to index or store these fields separately while still enabling queries on them. However, this flexibility comes at the cost of query latency: each candidate document is evaluated against the derived fields by loading its _source document and running the script to determine whether the document is a match.
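The query-time behavior described above can be pictured with a small Python simulation. This is only a sketch of the idea, not how OpenSearch implements it: the `search_with_derived_field` helper, the in-memory `docs` list, and the `total_cost` callable are hypothetical stand-ins for the index, the _source loading, and the script.

```python
from typing import Any, Callable, Dict, List

Source = Dict[str, Any]


def search_with_derived_field(
    docs: List[Source],
    script: Callable[[Source], Any],
    matches: Callable[[Any], bool],
) -> List[Source]:
    """Return the docs whose derived value satisfies `matches`.

    The derived value is computed from each doc's source at query time
    and is never written back to the stored document.
    """
    hits = []
    for source in docs:          # every candidate document is visited
        value = script(source)   # run the "script" against its source
        if matches(value):
            hits.append(source)
    return hits


# Hypothetical data, used only for this illustration.
docs = [{"price": 2.0, "quantity": 3}, {"price": 10.0, "quantity": 1}]
calls = 0


def total_cost(source: Source) -> float:
    """Stand-in for a derived field script; counts its own invocations."""
    global calls
    calls += 1
    return source["price"] * source["quantity"]


hits = search_with_derived_field(docs, total_cost, lambda v: v >= 8)
print(hits)   # [{'price': 10.0, 'quantity': 1}]
print(calls)  # 2, one script run per candidate document
```

In this sketch the stored documents never gain a `total_cost` key, which mirrors the storage benefit, while the call counter shows where the latency cost comes from.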

Meta issue: opensearch-project/OpenSearch#12281
RFC: opensearch-project/OpenSearch#1133

Key Benefits

Adds or modifies fields on-the-fly during query time.
Reduces storage requirements by avoiding direct indexing of derived fields.
Enables dynamic data transformations and enrichments.

Supported types

boolean, keyword, date, long, double, geo_point, ip.

What other resources are available?
Steps for testing:

Example 1

Step 1: Create Index Mapping with Derived Fields

curl -X PUT "localhost:9200/my_index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "product_name": { "type": "keyword" },
      "price": { "type": "double" },
      "quantity": { "type": "integer" }
    },
    "derived": {
      "total_cost": {
        "type": "double",
        "script": {
          "source": "emit(doc[\"price\"].value * doc[\"quantity\"].value)"
        }
      }
    }
  }
}
'

In this example, we define a derived field total_cost that calculates the total cost by multiplying the price and quantity fields.

Step 2: Ingest Documents

curl -X POST "localhost:9200/my_index/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index":{}}
{"product_name": "Widget", "price": 10.5, "quantity": 100}
{"index":{}}
{"product_name": "Gadget", "price": 15.75, "quantity": 50}
{"index":{}}
{"product_name": "Tool", "price": 8.25, "quantity": 200}
{"index":{}}
{"product_name": "Appliance", "price": 50.0, "quantity": 10}
{"index":{}}
{"product_name": "Accessory", "price": 5.0, "quantity": 300}
'

We have indexed 5 documents representing various products with their prices and quantities.

Step 3: Query Based on Derived Field

curl -X GET "localhost:9200/my_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "total_cost": {
        "gte": 1000
      }
    }
  },
  "fields" : ["*"]
}
'

This query retrieves documents where the total_cost is greater than or equal to 1000.
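As a sanity check on what this query should return, the arithmetic for the five documents from Step 2 can be worked through in Python. This is a plain calculation over the sample data, not a call to OpenSearch; the `total_cost` function is a hypothetical mirror of the script in the mapping.

```python
# The five documents ingested in Step 2.
products = [
    {"product_name": "Widget", "price": 10.5, "quantity": 100},
    {"product_name": "Gadget", "price": 15.75, "quantity": 50},
    {"product_name": "Tool", "price": 8.25, "quantity": 200},
    {"product_name": "Appliance", "price": 50.0, "quantity": 10},
    {"product_name": "Accessory", "price": 5.0, "quantity": 300},
]


def total_cost(source):
    """Same arithmetic as the derived field script: price * quantity."""
    return source["price"] * source["quantity"]


costs = {p["product_name"]: total_cost(p) for p in products}

# Equivalent of the range clause {"gte": 1000}.
matching = [name for name, cost in costs.items() if cost >= 1000]

print(costs)
# {'Widget': 1050.0, 'Gadget': 787.5, 'Tool': 1650.0,
#  'Appliance': 500.0, 'Accessory': 1500.0}
print(matching)
# ['Widget', 'Tool', 'Accessory']
```

So a correct test run should return Widget, Tool, and Accessory, and exclude Gadget and Appliance.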

Example 2

User sentiment analysis

Key Components

Derived Field: sentiment - Calculates the sentiment of the text based on predefined rules.
Indexed Fields: Additional fields for context-based queries.
User ID Filter: Narrowing down search results based on user ID.

Step 1: Create Index Mapping with Derived Field

curl -X PUT "localhost:9200/sentiment_analysis?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "text": { "type": "text", "index": false },
      "category": { "type": "keyword" },
      "user_id": { "type": "keyword" }
    },
    "derived": {
      "sentiment": {
        "type": "keyword",
        "script": {
          "source": "if (params._source[\"text\"].contains(\"happy\")) { emit(\"positive\") } else if (params._source[\"text\"].contains(\"sad\")) { emit(\"negative\") } else { emit(\"neutral\") }"
        }
      }
    }
  }
}
'

This mapping defines a derived field sentiment that assigns sentiment labels (positive, negative, neutral) based on specific keywords in the text, along with indexed fields category and user_id for context-based queries and user filtering.
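The rules in the script can be restated in Python to make their edge cases easier to see. This is a rough equivalent for illustration only: the `sentiment` function and the sample strings are hypothetical, and the sketch assumes the usual case-sensitive substring behavior of `contains`.

```python
def sentiment(source):
    """Python restatement of the derived field script's rules."""
    text = source["text"]
    if "happy" in text:      # checked first, so it wins over "sad"
        return "positive"
    elif "sad" in text:
        return "negative"
    return "neutral"


# Hypothetical sample inputs.
print(sentiment({"text": "I am feeling happy today!"}))   # positive
print(sentiment({"text": "The news made me sad."}))       # negative
print(sentiment({"text": "Nothing special."}))            # neutral

# Edge cases worth keeping in mind when testing:
print(sentiment({"text": "Happy birthday"}))               # neutral (capital H)
print(sentiment({"text": "happy at first, sad later"}))    # positive (first rule wins)
```

The match is on substrings rather than whole words, so a text such as "unhappy" would also be labeled positive under these rules.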

Step 2: Ingest Documents

curl -X POST "localhost:9200/sentiment_analysis/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index":{}}
{"text": "I am feeling happy today!", "category": "personal", "user_id": "123"}
{"index":{}}
{"text": "The news made me sad.", "category": "news", "user_id": "456"}
{"index":{}}
{"text": "The weather is neutral.", "category": "weather", "user_id": "789"}
'

We've indexed 3 documents with varying sentiments, categories, and user IDs for context-based queries and user filtering.

Step 3: Query Based on Sentiment, Indexed Fields, and User ID

Positive Sentiment in Personal Category for User ID "123"

curl -X GET "localhost:9200/sentiment_analysis/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "sentiment": "positive" } },
        { "match": { "category": "personal" } },
        { "match": { "user_id": "123" } }
      ]
    }
  },
  "fields" : ["*"]
}
'

This query retrieves documents with a positive sentiment in the personal category for the user with ID "123", combining sentiment analysis with indexed field queries and user filtering.
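To predict the expected hit for this query, the three `must` clauses can be simulated over the sample documents in Python. This is a sketch under stated assumptions, not OpenSearch's query execution: `sentiment` and `matches_all` are hypothetical helpers, and exact string equality stands in for `match` on keyword fields.

```python
# The three documents ingested in Step 2.
docs = [
    {"text": "I am feeling happy today!", "category": "personal", "user_id": "123"},
    {"text": "The news made me sad.", "category": "news", "user_id": "456"},
    {"text": "The weather is neutral.", "category": "weather", "user_id": "789"},
]


def sentiment(source):
    """Python restatement of the derived field script."""
    text = source["text"]
    if "happy" in text:
        return "positive"
    elif "sad" in text:
        return "negative"
    return "neutral"


def matches_all(source):
    """A bool/must query matches only if every clause matches."""
    clauses = [
        sentiment(source) == "positive",   # derived field clause
        source["category"] == "personal",  # indexed field clause
        source["user_id"] == "123",        # indexed field clause
    ]
    return all(clauses)


hits = [d for d in docs if matches_all(d)]
print([d["user_id"] for d in hits])  # ['123']
```

Only the first document satisfies all three clauses, so a correct test run should return exactly one hit.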

Definition of derived field in search request

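Derived fields can also be defined in the search request instead of the index mapping. Here is the same sentiment example with the definition moved into the request. The index is created without a derived section this time; if the sentiment_analysis index from Example 2 already exists, delete it before running Step 1.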

Step 1: Create index

curl -X PUT "localhost:9200/sentiment_analysis?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "text": { "type": "text", "index": false },
      "category": { "type": "keyword" },
      "user_id": { "type": "keyword" }
    }
  }
}
'

Step 2: Ingest Documents

curl -X POST "localhost:9200/sentiment_analysis/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index":{}}
{"text": "I am feeling happy today!", "category": "personal", "user_id": "123"}
{"index":{}}
{"text": "The news made me sad.", "category": "news", "user_id": "456"}
{"index":{}}
{"text": "The weather is neutral.", "category": "weather", "user_id": "789"}
'

Step 3: Query Based on Sentiment, Indexed Fields, and User ID

curl -X GET "localhost:9200/sentiment_analysis/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "sentiment": "positive" } },
        { "match": { "category": "personal" } },
        { "match": { "user_id": "123" } }
      ]
    }
  },
  "fields" : ["*"],
  "derived": {
    "sentiment": {
      "type": "keyword",
      "script": {
        "source": "if (params._source[\"text\"].contains(\"happy\")) { emit(\"positive\") } else if (params._source[\"text\"].contains(\"sad\")) { emit(\"negative\") } else { emit(\"neutral\") }"
      }
    }
  }
}
'
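For anyone scripting these tests, the request body above can be assembled programmatically rather than hand-escaped inside a shell string. The following is a minimal sketch using only the Python standard library; `build_search_body` is a hypothetical helper, and the sketch only builds and serializes the body without sending it anywhere.

```python
import json

# The derived field script from the request above, without shell escaping.
SENTIMENT_SCRIPT = (
    'if (params._source["text"].contains("happy")) { emit("positive") } '
    'else if (params._source["text"].contains("sad")) { emit("negative") } '
    'else { emit("neutral") }'
)


def build_search_body(query, derived):
    """Combine a query with search-time derived field definitions."""
    return {"query": query, "fields": ["*"], "derived": derived}


derived = {
    "sentiment": {
        "type": "keyword",
        "script": {"source": SENTIMENT_SCRIPT},
    }
}
query = {
    "bool": {
        "must": [
            {"match": {"sentiment": "positive"}},
            {"match": {"category": "personal"}},
            {"match": {"user_id": "123"}},
        ]
    }
}

body = build_search_body(query, derived)
payload = json.dumps(body, indent=2)  # handles quote escaping in the script
print(payload)
```

The same `derived` dictionary has the shape of the `derived` section of the mapping in Example 2, so one definition can be reused for both ways of declaring the field.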

hdhalter · 2024-04-24T23:24:25Z

Hi @rishabhmaurya , what is your ETA for producing the doc PR? To meet entrance criteria, the PR must be open and in review by 4/30. Thanks!

getsaurabh02 · 2024-04-29T20:51:09Z

@rishabhmaurya to add the PR for 2.14 related doc content.

hdhalter · 2024-04-29T20:58:52Z

Doc in progress.

rishabhmaurya · 2024-04-30T22:50:30Z

Given that this feature isn't complete yet (it's missing aggregation and scoring support), we will move it out of 2.14.

cc: @smacrakis @getsaurabh02

hdhalter · 2024-06-06T17:49:01Z

@rishabhmaurya will submit PR by end of day 6/6. Thanks!

rishabhmaurya · 2024-06-07T01:34:36Z

@hdhalter Here is the ~~draft~~ PR: https://github.com/opensearch-project/documentation-website/pull/7329/files
Feel free to take a look. I need to work on the last section before I publish it.

rishabhmaurya added untriaged v2.14.0 labels Apr 11, 2024

hdhalter assigned rishabhmaurya and Naarcha-AWS Apr 15, 2024

hdhalter added 1 - Backlog - DEV Developer assigned to issue is responsible for creating PR. and removed untriaged labels Apr 15, 2024

hdhalter added this to the v2.14 milestone Apr 15, 2024

mgodwan mentioned this issue Apr 19, 2024

Add metadata fields for mappings (content gap initiative) #6933

Merged

1 task

rishabhmaurya added v2.15.0 and removed v2.14.0 labels Apr 30, 2024

hdhalter modified the milestones: v2.14, v2.15 May 2, 2024

This was referenced May 16, 2024

[Feature Request] Support for object type in Derived Fields opensearch-project/OpenSearch#13143

Closed

[Derived Field] Integration tests for derived fields opensearch-project/OpenSearch#13721

Merged

rishabh6788 mentioned this issue May 16, 2024

test-commit rishabh6788/OpenSearch#1

Closed

9 tasks

rishabhmaurya mentioned this issue Jun 7, 2024

Add documentation of derived fields #7329

Merged

1 task

hdhalter added 2 - In progress Issue/PR: The issue or PR is in progress. and removed 1 - Backlog - DEV Developer assigned to issue is responsible for creating PR. labels Jun 7, 2024

kolchfa-aws closed this as completed in #7329 Jun 14, 2024

hdhalter added the 3 - Done Issue is done/complete label Jun 14, 2024

hdhalter removed the 2 - In progress Issue/PR: The issue or PR is in progress. label Jun 14, 2024
