From eb0bb09e4496f3f839237e9c1f95bf67cf9c2b73 Mon Sep 17 00:00:00 2001 From: Sagar Upadhyaya Date: Mon, 18 Mar 2024 09:00:12 -0700 Subject: [PATCH 01/10] Adding documentation for cache plugin and tiered cache Signed-off-by: Sagar Upadhyaya --- _search-plugins/cache-plugins/index.md | 25 ++++++++ _search-plugins/cache-plugins/tiered-cache.md | 60 +++++++++++++++++++ 2 files changed, 85 insertions(+) create mode 100644 _search-plugins/cache-plugins/index.md create mode 100644 _search-plugins/cache-plugins/tiered-cache.md diff --git a/_search-plugins/cache-plugins/index.md b/_search-plugins/cache-plugins/index.md new file mode 100644 index 0000000000..233533ef4d --- /dev/null +++ b/_search-plugins/cache-plugins/index.md @@ -0,0 +1,25 @@ +# Cache plugins +This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated GitHub issue. + +Cache plugins gives an ability to use different kind of cache stores(like on-heap, disk and tiered) within Opensearch. + +## Background + +OpenSearch relies heavily on different on-heap caches to speed up the data retrieval process and thereby providing significant improvement in search latencies. But cache size is limited by the amount of memory available on a node. In cases where we are dealing with larger datasets which can potentially be cached, this causes a lot of cache evictions/misses and potentially impacting performance as we need to process/compute the query again causing high resource consumption. + +Different on-heap cache types within OpenSearch(as of today): +- Request cache: Caches the local results on each shard. This allows frequently used (and potentially heavy) search requests to return results almost instantly. +- Query cache: The shard-level query cche aches data when a similar query is used. The query cache is even more granular and can cache data that is reused between different queries. 
+- Field data cache: The field data cache contains field data and global ordinals, which are both used to support aggregations on certain field types. + +## New cache stores + +In addition to on-heap cache, we have introduced new cache implementations as listed below: + +- Disk cache: This cache uses disk to store precomputed result of a query and can be used to extend the capabilities of OpenSearch. This can be used to cache much larger dataset provided that the disk latencies are acceptable. +- Tiered cache: This is basically a multi level cache with each tier having it’s own characteristics and performance levels. For example, it can contain on-heap and disk tier. This tries to utilize the combination of different tiers and provide a balance between performance and size. To learn more about this, see {} + +## Integration points + +As of now, we have integrated below caches within OpenSearch with above plugin and provided an ability to extend its capabilities to use tiered or disk cache. +- Request cache diff --git a/_search-plugins/cache-plugins/tiered-cache.md b/_search-plugins/cache-plugins/tiered-cache.md new file mode 100644 index 0000000000..dcba3cd460 --- /dev/null +++ b/_search-plugins/cache-plugins/tiered-cache.md @@ -0,0 +1,60 @@ +# Tiered cache + +A multi level cache with each tier having it’s own characteristics and performance levels. This tries to utilize the combination of different tiers and provide a balance between performance and size. + +## Get started + +Tiered caching feature is an experimental feature as of OpenSearch 2.13. To begin using this feature, you need to first enable it using the `opensearch.experimental.feature.pluggable.caching.enabled` feature flag. + +## Types of tiered cache + +As of today, we have below implementations available for tiered cache: +- Tiered spillover cache: This implementation spills the evicted items from upper to lower tiers. 
Here upper tier is relatively smaller in size but offers better latency like on-heap tier. Lower is relatively larger in size but is slower(in terms of latency) compared to upper tiers. Example for lower tier can be a disk tier. As of now, it offers on-heap and disk tier. + +### Installing required plugins + +Tiered cache provides you a way to plugin any kind of disk or on-heap tier implementation. You can install desired plugins which you intend to use in Tiered cache. As of now, we only have ```cache-ehcache``` plugin available which essentially provides a disk cache implementation which can be used within tiered cache as a disk tier. Also note that failing to install this plugin and not setting disk cache properties appropriately will fail to initialize tiered cache. + + +### Tiered cache settings + +Currently we have extended OpenSearch request cache capabilites to use tiered cache. This section provides with desired instruction to appropriately set desired settings in ```opensearch.yml``` file. +Below instructions takes Request cache as an example as that is the only option as of today. + +#### 1. Set the cache store name + +Here tiered_spillover signifies that we intend to use a tiered spilover cache as mentioned above. + +```indices.request.cache.store.name: tiered_spillover``` + +#### 2. Set the underlying onHeap and disk stores for tiered cache + +Here ```opensearch_onheap``` is the inbuilt/default on-heap cache available within OpenSearch. +```ehcache_disk``` is the disk cache implementation from ehcache. This is provided via plugin, so needs to be installed as a pre-requisite. + +``` +indices.request.cache.tiered_spillover.onheap.store.name: opensearch_onheap +indices.request.cache.tiered_spillover.disk.store.name: ehcache_disk +``` + +#### 3. Set appropriate configs for on-heap and disk store + + +OnHeap cache store settings for ```opensearch_onheap``` store. 
+ +Setting | Default | Description +:--- | :--- | :--- +`indices.request.cache.opensearch_onheap.size` | 1% of the heap | Size of on heap cache. Optional. +`indices.request.cache.opensearch_onheap.expire` | MAX_VALUE(disabled) | Specify a time-to-live(ttl) for the cached results. Optional. + +Disk cache store setting for ```ehcache_disk``` store. + +Setting | Default | Description +:--- | :--- | :--- +`indices.request.cache.ehcache_disk.max_size_in_bytes` | 1073741824 (1gb) | Defines size of the disk cache. Optional. +`indices.request.cache.ehcache_disk.storage.path` | "" | Defines storage path for disk cache. Required. +`indices.request.cache.ehcache_disk.expire_after_access` | MAX_VALUE(disabled) | Specify a time-to-live(ttl) for the cached results. Optional. +`indices.request.cache.ehcache_disk.alias` | ehcacheDiskCache#INDICES_REQUEST_CACHE (taking requets cache as an example) | Specify an alias for disk cache. Optional. +`indices.request.cache.ehcache_disk.segments` | 16 | Defines how many segments the disk cache is separated into. Used for concurrency. Optional. +`indices.request.cache.ehcache_disk.concurrency` | 1 | Defines distinct write queues created for disk store where a group of segments share a write queue. Optional. 
+ From 8afa27594c0d8aba8575893d3586047806463f59 Mon Sep 17 00:00:00 2001 From: Sagar Upadhyaya Date: Mon, 18 Mar 2024 09:10:36 -0700 Subject: [PATCH 02/10] Adding tiered cache github link and fixing typos Signed-off-by: Sagar Upadhyaya --- _search-plugins/cache-plugins/index.md | 8 +++++++- _search-plugins/cache-plugins/tiered-cache.md | 18 +++++++++++++----- 2 files changed, 20 insertions(+), 6 deletions(-) diff --git a/_search-plugins/cache-plugins/index.md b/_search-plugins/cache-plugins/index.md index 233533ef4d..0fb94a34f1 100644 --- a/_search-plugins/cache-plugins/index.md +++ b/_search-plugins/cache-plugins/index.md @@ -1,5 +1,11 @@ +--- +layout: default +title: Cache plugins +nav_order: 100 +--- + # Cache plugins -This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated GitHub issue. +This is an experimental feature and is not recommended for use in a production environment. Cache plugins gives an ability to use different kind of cache stores(like on-heap, disk and tiered) within Opensearch. diff --git a/_search-plugins/cache-plugins/tiered-cache.md b/_search-plugins/cache-plugins/tiered-cache.md index dcba3cd460..44133d1dfe 100644 --- a/_search-plugins/cache-plugins/tiered-cache.md +++ b/_search-plugins/cache-plugins/tiered-cache.md @@ -1,16 +1,24 @@ +--- +layout: default +title: Tiered cache +parent: Cache plugins +nav_order: 65 +--- + # Tiered cache +Tiered caching feature is an experimental feature as of OpenSearch 2.13. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024). A multi level cache with each tier having it’s own characteristics and performance levels. This tries to utilize the combination of different tiers and provide a balance between performance and size. 
-## Get started - -Tiered caching feature is an experimental feature as of OpenSearch 2.13. To begin using this feature, you need to first enable it using the `opensearch.experimental.feature.pluggable.caching.enabled` feature flag. - ## Types of tiered cache As of today, we have below implementations available for tiered cache: - Tiered spillover cache: This implementation spills the evicted items from upper to lower tiers. Here upper tier is relatively smaller in size but offers better latency like on-heap tier. Lower is relatively larger in size but is slower(in terms of latency) compared to upper tiers. Example for lower tier can be a disk tier. As of now, it offers on-heap and disk tier. +## Get started + +Tiered caching feature is an experimental feature as of OpenSearch 2.13. To begin using this feature, you need to first enable it using the `opensearch.experimental.feature.pluggable.caching.enabled` feature flag. + ### Installing required plugins Tiered cache provides you a way to plugin any kind of disk or on-heap tier implementation. You can install desired plugins which you intend to use in Tiered cache. As of now, we only have ```cache-ehcache``` plugin available which essentially provides a disk cache implementation which can be used within tiered cache as a disk tier. Also note that failing to install this plugin and not setting disk cache properties appropriately will fail to initialize tiered cache. @@ -54,7 +62,7 @@ Setting | Default | Description `indices.request.cache.ehcache_disk.max_size_in_bytes` | 1073741824 (1gb) | Defines size of the disk cache. Optional. `indices.request.cache.ehcache_disk.storage.path` | "" | Defines storage path for disk cache. Required. `indices.request.cache.ehcache_disk.expire_after_access` | MAX_VALUE(disabled) | Specify a time-to-live(ttl) for the cached results. Optional. 
-`indices.request.cache.ehcache_disk.alias` | ehcacheDiskCache#INDICES_REQUEST_CACHE (taking requets cache as an example) | Specify an alias for disk cache. Optional. +`indices.request.cache.ehcache_disk.alias` | ehcacheDiskCache#INDICES_REQUEST_CACHE (request cache as an example) | Specify an alias for disk cache. Optional. `indices.request.cache.ehcache_disk.segments` | 16 | Defines how many segments the disk cache is separated into. Used for concurrency. Optional. `indices.request.cache.ehcache_disk.concurrency` | 1 | Defines distinct write queues created for disk store where a group of segments share a write queue. Optional. From 2e3e4e60055afa884b48baa83fcda72d1264efd8 Mon Sep 17 00:00:00 2001 From: Sagar Upadhyaya Date: Mon, 18 Mar 2024 09:12:09 -0700 Subject: [PATCH 03/10] Refactor tiered cache doc Signed-off-by: Sagar Upadhyaya --- _search-plugins/cache-plugins/tiered-cache.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_search-plugins/cache-plugins/tiered-cache.md b/_search-plugins/cache-plugins/tiered-cache.md index 44133d1dfe..578f9b9797 100644 --- a/_search-plugins/cache-plugins/tiered-cache.md +++ b/_search-plugins/cache-plugins/tiered-cache.md @@ -8,6 +8,7 @@ nav_order: 65 # Tiered cache Tiered caching feature is an experimental feature as of OpenSearch 2.13. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024). + A multi level cache with each tier having it’s own characteristics and performance levels. This tries to utilize the combination of different tiers and provide a balance between performance and size. 
## Types of tiered cache From 9249d9187703448d74f55c2a6b704691d0367269 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Tue, 19 Mar 2024 14:13:10 -0400 Subject: [PATCH 04/10] Doc review Signed-off-by: Fanit Kolchina --- _search-plugins/cache-plugins/index.md | 37 +++++----- _search-plugins/cache-plugins/tiered-cache.md | 74 +++++++++++-------- 2 files changed, 61 insertions(+), 50 deletions(-) diff --git a/_search-plugins/cache-plugins/index.md b/_search-plugins/cache-plugins/index.md index 0fb94a34f1..2f1ea14fd7 100644 --- a/_search-plugins/cache-plugins/index.md +++ b/_search-plugins/cache-plugins/index.md @@ -1,31 +1,32 @@ --- layout: default -title: Cache plugins +title: Cache types +parent: Improving search performance +has_children: true nav_order: 100 --- -# Cache plugins -This is an experimental feature and is not recommended for use in a production environment. +# Cache types -Cache plugins gives an ability to use different kind of cache stores(like on-heap, disk and tiered) within Opensearch. +OpenSearch relies heavily on different types of on-heap cache to accelerate data retrieval, providing significant improvement in search latencies. However, cache size is limited by the amount of memory available on a node. If you are processing a larger dataset that can potentially be cached, the cache size limit causes a lot of cache evictions and misses. The increasing number of evictions impacts performance because OpenSearch needs to process the query again, causing high resource consumption. -## Background +Prior to version 2.13, OpenSearch supported the following on-heap cache types: -OpenSearch relies heavily on different on-heap caches to speed up the data retrieval process and thereby providing significant improvement in search latencies. But cache size is limited by the amount of memory available on a node. 
In cases where we are dealing with larger datasets which can potentially be cached, this causes a lot of cache evictions/misses and potentially impacting performance as we need to process/compute the query again causing high resource consumption. +- **Request cache**: Caches the local results on each shard. This allows frequently used (and potentially resource-heavy) search requests to return results almost instantly. +- **Query cache**: The shard-level query cache caches common data from similar queries. The query cache is more granular than the request cache and can cache data that is reused between different queries. +- **Field data cache**: The field data cache contains field data and global ordinals, which are both used to support aggregations on certain field types. -Different on-heap cache types within OpenSearch(as of today): -- Request cache: Caches the local results on each shard. This allows frequently used (and potentially heavy) search requests to return results almost instantly. -- Query cache: The shard-level query cche aches data when a similar query is used. The query cache is even more granular and can cache data that is reused between different queries. -- Field data cache: The field data cache contains field data and global ordinals, which are both used to support aggregations on certain field types. +## Additional cache stores +**Introduced 2.13** +{: .label .label-purple } -## New cache stores +This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024). 
+{: .warning} -In addition to on-heap cache, we have introduced new cache implementations as listed below: +In addition to existing OpenSearch cache types, you can use the following cache stores with the help of cache plugins: -- Disk cache: This cache uses disk to store precomputed result of a query and can be used to extend the capabilities of OpenSearch. This can be used to cache much larger dataset provided that the disk latencies are acceptable. -- Tiered cache: This is basically a multi level cache with each tier having it’s own characteristics and performance levels. For example, it can contain on-heap and disk tier. This tries to utilize the combination of different tiers and provide a balance between performance and size. To learn more about this, see {} +- **Disk cache**: This cache stores a precomputed result of a query on disk. You can use disk cache to cache much larger datasets, provided that the disk latencies are acceptable. +- **Tiered cache**: This is a multi-level cache, in which each tier has its own characteristics and performance levels. For example, a tiered cache can contain on-heap and disk tiers. By combining different tiers, you can achieve a balance between cache performance and size. To learn more, see [Tiered cache]({{site.url}}{{site.baseurl}}/search-plugins/cache-plugins/tiered-cache/). -## Integration points - -As of now, we have integrated below caches within OpenSearch with above plugin and provided an ability to extend its capabilities to use tiered or disk cache. -- Request cache +In OpenSearch 2.13, request cache is integrated with cache plugins. You can use tiered or disk cache on a request level. 
+{: .note} \ No newline at end of file diff --git a/_search-plugins/cache-plugins/tiered-cache.md b/_search-plugins/cache-plugins/tiered-cache.md index 578f9b9797..a7d79d2111 100644 --- a/_search-plugins/cache-plugins/tiered-cache.md +++ b/_search-plugins/cache-plugins/tiered-cache.md @@ -1,69 +1,79 @@ --- layout: default title: Tiered cache -parent: Cache plugins -nav_order: 65 +parent: Cache types +grand_parent: Improving search performance +nav_order: 10 --- # Tiered cache -Tiered caching feature is an experimental feature as of OpenSearch 2.13. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024). +This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024). +{: .warning} -A multi level cache with each tier having it’s own characteristics and performance levels. This tries to utilize the combination of different tiers and provide a balance between performance and size. +Tiered cache is a multi-level cache, in which each tier has its own characteristics and performance levels. By combining different tiers, you can achieve a balance between cache performance and size. ## Types of tiered cache -As of today, we have below implementations available for tiered cache: -- Tiered spillover cache: This implementation spills the evicted items from upper to lower tiers. Here upper tier is relatively smaller in size but offers better latency like on-heap tier. Lower is relatively larger in size but is slower(in terms of latency) compared to upper tiers. Example for lower tier can be a disk tier. As of now, it offers on-heap and disk tier. +OpenSearch 2.13 provides an implementation of _tiered spillover cache_. 
This implementation spills the evicted items from upper to lower tiers. The upper tier is smaller in size but offers better latency like on-heap tier. The lower tier is larger in size but is slower in terms of latency compared to the upper tier. An example of a lower tier is disk cache. OpenSearch 2.13 offers on-heap and disk tiers. -## Get started +## Enabling tiered cache -Tiered caching feature is an experimental feature as of OpenSearch 2.13. To begin using this feature, you need to first enable it using the `opensearch.experimental.feature.pluggable.caching.enabled` feature flag. +To enable tiered cache, configure the following setting: -### Installing required plugins +```yaml +opensearch.experimental.feature.pluggable.caching.enabled: true +``` +{% include copy.html %} -Tiered cache provides you a way to plugin any kind of disk or on-heap tier implementation. You can install desired plugins which you intend to use in Tiered cache. As of now, we only have ```cache-ehcache``` plugin available which essentially provides a disk cache implementation which can be used within tiered cache as a disk tier. Also note that failing to install this plugin and not setting disk cache properties appropriately will fail to initialize tiered cache. +For more information about ways to enable experimental features, see [Experimental feature flags]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/). +## Installing required plugins -### Tiered cache settings +Tiered cache provides a way to plug in any disk or on-heap tier implementation. You can install the plugins you intend to use in tiered cache. As of OpenSearch 2.13, the available cache plugin is the `cache-ehcache` plugin. This plugin provides a disk cache implementation to use within tiered cache as a disk tier. -Currently we have extended OpenSearch request cache capabilites to use tiered cache. 
This section provides with desired instruction to appropriately set desired settings in ```opensearch.yml``` file. -Below instructions takes Request cache as an example as that is the only option as of today. +Tiered cache will fail to initialize if the `cache-ehcache` plugin is not installed or disk cache properties are not set. +{: .warning} -#### 1. Set the cache store name +## Tiered cache settings -Here tiered_spillover signifies that we intend to use a tiered spilover cache as mentioned above. +In OpenSearch 2.13, request cache can use tiered cache. To start, configure the following settings in the `opensearch.yml` file. -```indices.request.cache.store.name: tiered_spillover``` +### Cache store name -#### 2. Set the underlying onHeap and disk stores for tiered cache +Setting the cache store name to `tiered_spillover` signals to use the OpenSearch-provided tiered spillover cache implementation: +```yaml +indices.request.cache.store.name: tiered_spillover +``` +{% include copy.html %} -Here ```opensearch_onheap``` is the inbuilt/default on-heap cache available within OpenSearch. -```ehcache_disk``` is the disk cache implementation from ehcache. This is provided via plugin, so needs to be installed as a pre-requisite. +### Setting on-heap and disk store tiers -``` +The `opensearch_onheap` setting is the built-in on-heap cache available in OpenSearch. The `ehcache_disk` setting is the disk cache implementation from [Ehcache](https://www.ehcache.org/). This requires installing a plugin: + +```yaml indices.request.cache.tiered_spillover.onheap.store.name: opensearch_onheap indices.request.cache.tiered_spillover.disk.store.name: ehcache_disk ``` +{% include copy.html %} -#### 3. Set appropriate configs for on-heap and disk store - +### Configuring on-heap and disk store -OnHeap cache store settings for ```opensearch_onheap``` store. +The following table lists the cache store settings for the `opensearch_onheap` store. 
Setting | Default | Description :--- | :--- | :--- -`indices.request.cache.opensearch_onheap.size` | 1% of the heap | Size of on heap cache. Optional. -`indices.request.cache.opensearch_onheap.expire` | MAX_VALUE(disabled) | Specify a time-to-live(ttl) for the cached results. Optional. +`indices.request.cache.opensearch_onheap.size` | 1% of the heap | Size of on-heap cache. Optional. +`indices.request.cache.opensearch_onheap.expire` | `MAX_VALUE` (disabled) | Specify a time-to-live (TTL) for the cached results. Optional. -Disk cache store setting for ```ehcache_disk``` store. +The following table lists the disk cache store settings for the `ehcache_disk` store. Setting | Default | Description :--- | :--- | :--- -`indices.request.cache.ehcache_disk.max_size_in_bytes` | 1073741824 (1gb) | Defines size of the disk cache. Optional. -`indices.request.cache.ehcache_disk.storage.path` | "" | Defines storage path for disk cache. Required. -`indices.request.cache.ehcache_disk.expire_after_access` | MAX_VALUE(disabled) | Specify a time-to-live(ttl) for the cached results. Optional. -`indices.request.cache.ehcache_disk.alias` | ehcacheDiskCache#INDICES_REQUEST_CACHE (request cache as an example) | Specify an alias for disk cache. Optional. -`indices.request.cache.ehcache_disk.segments` | 16 | Defines how many segments the disk cache is separated into. Used for concurrency. Optional. -`indices.request.cache.ehcache_disk.concurrency` | 1 | Defines distinct write queues created for disk store where a group of segments share a write queue. Optional. +`indices.request.cache.ehcache_disk.max_size_in_bytes` | `1073741824` (1 GB) | Defines size of the disk cache. Optional. +`indices.request.cache.ehcache_disk.storage.path` | `""` | Defines the storage path for disk cache. Required. +`indices.request.cache.ehcache_disk.expire_after_access` | `MAX_VALUE` (disabled) | Specify a time-to-live (TTL) for the cached results. Optional. 
+`indices.request.cache.ehcache_disk.alias` | `ehcacheDiskCache#INDICES_REQUEST_CACHE` (this is an example of request cache) | Specify an alias for disk cache. Optional. +`indices.request.cache.ehcache_disk.segments` | `16` | Defines the number of segments the disk cache is separated into. Used for concurrency. Optional. +`indices.request.cache.ehcache_disk.concurrency` | `1` | Defines the number of distinct write queues created for disk store, where a group of segments share a write queue. Optional. From 70b35c075ece6be12161cccd5ae605c90d4d8aa9 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Wed, 20 Mar 2024 15:47:32 -0400 Subject: [PATCH 05/10] Review comments Signed-off-by: Fanit Kolchina --- _search-plugins/{cache-plugins => caching}/index.md | 10 +++++----- .../{cache-plugins => caching}/tiered-cache.md | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) rename _search-plugins/{cache-plugins => caching}/index.md (88%) rename _search-plugins/{cache-plugins => caching}/tiered-cache.md (99%) diff --git a/_search-plugins/cache-plugins/index.md b/_search-plugins/caching/index.md similarity index 88% rename from _search-plugins/cache-plugins/index.md rename to _search-plugins/caching/index.md index 2f1ea14fd7..f3cc4709b8 100644 --- a/_search-plugins/cache-plugins/index.md +++ b/_search-plugins/caching/index.md @@ -1,12 +1,12 @@ --- layout: default -title: Cache types +title: Cache stores parent: Improving search performance has_children: true nav_order: 100 --- -# Cache types +# Cache stores OpenSearch relies heavily on different types of on-heap cache to accelerate data retrieval, providing significant improvement in search latencies. However, cache size is limited by the amount of memory available on a node. If you are processing a larger dataset that can potentially be cached, the cache size limit causes a lot of cache evictions and misses. 
The increasing number of evictions impacts performance because OpenSearch needs to process the query again, causing high resource consumption. @@ -23,10 +23,10 @@ Prior to version 2.13, OpenSearch supported the following on-heap cache types: This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024). {: .warning} -In addition to existing OpenSearch cache types, you can use the following cache stores with the help of cache plugins: +In addition to existing OpenSearch custom on-heap cache stores, cache plugins provide the following cache stores: - **Disk cache**: This cache stores a precomputed result of a query on disk. You can use disk cache to cache much larger datasets, provided that the disk latencies are acceptable. -- **Tiered cache**: This is a multi-level cache, in which each tier has its own characteristics and performance levels. For example, a tiered cache can contain on-heap and disk tiers. By combining different tiers, you can achieve a balance between cache performance and size. To learn more, see [Tiered cache]({{site.url}}{{site.baseurl}}/search-plugins/cache-plugins/tiered-cache/). +- **Tiered cache**: This is a multi-level cache, in which each tier has its own characteristics and performance levels. For example, a tiered cache can contain on-heap and disk tiers. By combining different tiers, you can achieve a balance between cache performance and size. To learn more, see [Tiered cache]({{site.url}}{{site.baseurl}}/search-plugins/caching/tiered-cache/). -In OpenSearch 2.13, request cache is integrated with cache plugins. You can use tiered or disk cache on a request level. +In OpenSearch 2.13, request cache is integrated with cache plugins. You can use tiered or disk cache as a request-level cache. 
{: .note} \ No newline at end of file diff --git a/_search-plugins/cache-plugins/tiered-cache.md b/_search-plugins/caching/tiered-cache.md similarity index 99% rename from _search-plugins/cache-plugins/tiered-cache.md rename to _search-plugins/caching/tiered-cache.md index a7d79d2111..10658b0d24 100644 --- a/_search-plugins/cache-plugins/tiered-cache.md +++ b/_search-plugins/caching/tiered-cache.md @@ -1,7 +1,7 @@ --- layout: default title: Tiered cache -parent: Cache types +parent: Cache stores grand_parent: Improving search performance nav_order: 10 --- From 7ed77923e7004e54fefe06c61d2e2c0a5cf9c9b8 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Wed, 20 Mar 2024 15:59:17 -0400 Subject: [PATCH 06/10] Add plugin instructions Signed-off-by: Fanit Kolchina --- _search-plugins/caching/tiered-cache.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/_search-plugins/caching/tiered-cache.md b/_search-plugins/caching/tiered-cache.md index 10658b0d24..09230e9388 100644 --- a/_search-plugins/caching/tiered-cache.md +++ b/_search-plugins/caching/tiered-cache.md @@ -42,6 +42,7 @@ In OpenSearch 2.13, request cache can use tiered cache. To start, configure the ### Cache store name Setting the cache store name to `tiered_spillover` signals to use the OpenSearch-provided tiered spillover cache implementation: + ```yaml indices.request.cache.store.name: tiered_spillover ``` @@ -49,7 +50,7 @@ indices.request.cache.store.name: tiered_spillover {% include copy.html %} ### Setting on-heap and disk store tiers -The `opensearch_onheap` setting is the built-in on-heap cache available in OpenSearch. The `ehcache_disk` setting is the disk cache implementation from [Ehcache](https://www.ehcache.org/). This requires installing a plugin: +The `opensearch_onheap` setting is the built-in on-heap cache available in OpenSearch. The `ehcache_disk` setting is the disk cache implementation from [Ehcache](https://www.ehcache.org/). 
This requires installing the `cache-ehcache` plugin: ```yaml indices.request.cache.tiered_spillover.onheap.store.name: opensearch_onheap @@ -57,6 +58,8 @@ indices.request.cache.tiered_spillover.disk.store.name: ehcache_disk ``` {% include copy.html %} +For more information about installing non-bundled plugins, see [Additional plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/#additional-plugins). + ### Configuring on-heap and disk store The following table lists the cache store settings for the `opensearch_onheap` store. From 3e842dc1a45bdd492759d015e1894ca4e80756f4 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 22 Mar 2024 09:28:06 -0400 Subject: [PATCH 07/10] Renamed topic to caching Signed-off-by: Fanit Kolchina --- _search-plugins/caching/index.md | 4 ++-- _search-plugins/caching/tiered-cache.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/_search-plugins/caching/index.md b/_search-plugins/caching/index.md index f3cc4709b8..5f4c07b964 100644 --- a/_search-plugins/caching/index.md +++ b/_search-plugins/caching/index.md @@ -1,12 +1,12 @@ --- layout: default -title: Cache stores +title: Caching parent: Improving search performance has_children: true nav_order: 100 --- -# Cache stores +# Caching OpenSearch relies heavily on different types of on-heap cache to accelerate data retrieval, providing significant improvement in search latencies. However, cache size is limited by the amount of memory available on a node. If you are processing a larger dataset that can potentially be cached, the cache size limit causes a lot of cache evictions and misses. The increasing number of evictions impacts performance because OpenSearch needs to process the query again, causing high resource consumption. 
diff --git a/_search-plugins/caching/tiered-cache.md b/_search-plugins/caching/tiered-cache.md index 09230e9388..71e9a1cba1 100644 --- a/_search-plugins/caching/tiered-cache.md +++ b/_search-plugins/caching/tiered-cache.md @@ -1,7 +1,7 @@ --- layout: default title: Tiered cache -parent: Cache stores +parent: Caching grand_parent: Improving search performance nav_order: 10 --- From 5a1a2f12d61df2b1d17558c8a14de3b1494845bb Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 22 Mar 2024 10:07:07 -0400 Subject: [PATCH 08/10] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/caching/index.md | 4 ++-- _search-plugins/caching/tiered-cache.md | 30 ++++++++++++------------- 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/_search-plugins/caching/index.md b/_search-plugins/caching/index.md index 5f4c07b964..f40aeab3bf 100644 --- a/_search-plugins/caching/index.md +++ b/_search-plugins/caching/index.md @@ -13,7 +13,7 @@ OpenSearch relies heavily on different types of on-heap cache to accelerate data Prior to version 2.13, OpenSearch supported the following on-heap cache types: - **Request cache**: Caches the local results on each shard. This allows frequently used (and potentially resource-heavy) search requests to return results almost instantly. -- **Query cache**: The shard-level query cache caches common data from similar queries. The query cache is more granular than the request cache and can cache data that is reused between different queries. +- **Query cache**: The shard-level query cache caches common data from similar queries. The query cache is more granular than the request cache and can cache data that is reused in different queries. - **Field data cache**: The field data cache contains field data and global ordinals, which are both used to support aggregations on certain field types. 
## Additional cache stores @@ -28,5 +28,5 @@ In addition to existing OpenSearch custom on-heap cache stores, cache plugins pr - **Disk cache**: This cache stores a precomputed result of a query on disk. You can use disk cache to cache much larger datasets, provided that the disk latencies are acceptable. - **Tiered cache**: This is a multi-level cache, in which each tier has its own characteristics and performance levels. For example, a tiered cache can contain on-heap and disk tiers. By combining different tiers, you can achieve a balance between cache performance and size. To learn more, see [Tiered cache]({{site.url}}{{site.baseurl}}/search-plugins/caching/tiered-cache/). -In OpenSearch 2.13, request cache is integrated with cache plugins. You can use tiered or disk cache as a request-level cache. +In OpenSearch 2.13, the request cache is integrated with cache plugins. You can use a tiered or disk cache as a request-level cache. {: .note} \ No newline at end of file diff --git a/_search-plugins/caching/tiered-cache.md b/_search-plugins/caching/tiered-cache.md index 71e9a1cba1..3842ebe5a9 100644 --- a/_search-plugins/caching/tiered-cache.md +++ b/_search-plugins/caching/tiered-cache.md @@ -11,15 +11,15 @@ nav_order: 10 This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024). {: .warning} -Tiered cache is a multi-level cache, in which each tier has its own characteristics and performance levels. By combining different tiers, you can achieve a balance between cache performance and size. +A tiered cache is a multi-level cache, in which each tier has its own characteristics and performance levels. By combining different tiers, you can achieve a balance between cache performance and size. 
-## Types of tiered cache +## Types of tiered caches -OpenSearch 2.13 provides an implementation of _tiered spillover cache_. This implementation spills the evicted items from upper to lower tiers. The upper tier is smaller in size but offers better latency like on-heap tier. The lower tier is larger in size but is slower in terms of latency compared to the upper tier. An example of a lower tier is disk cache. OpenSearch 2.13 offers on-heap and disk tiers. +OpenSearch 2.13 provides an implementation of _tiered spillover cache_. This implementation spills the evicted items from upper to lower tiers. The upper tier is smaller in size but offers better latency, like the on-heap tier. The lower tier is larger in size but is slower in terms of latency compared to the upper tier. A disk cache is an example of a lower tier. OpenSearch 2.13 offers on-heap and disk tiers. -## Enabling tiered cache +## Enabling a tiered cache -To enable tiered cache, configure the following setting: +To enable a tiered cache, configure the following setting: ```yaml opensearch.experimental.feature.pluggable.caching.enabled: true @@ -30,18 +30,18 @@ For more information about ways to enable experimental features, see [Experiment ## Installing required plugins -Tiered cache provides a way to plug in any disk or on-heap tier implementation. You can install the plugins you intend to use in tiered cache. As of OpenSearch 2.13, the available cache plugin is the `cache-ehcache` plugin. This plugin provides a disk cache implementation to use within tiered cache as a disk tier. +A tiered cache provides a way to plug in any disk or on-heap tier implementation. You can install the plugins you intend to use in the tiered cache. As of OpenSearch 2.13, the available cache plugin is the `cache-ehcache` plugin. This plugin provides a disk cache implementation to use within a tiered cache as a disk tier. 
-Tiered cache will fail to initialize if the `cache-ehcache` plugin is not installed or disk cache properties are not set. +A tiered cache will fail to initialize if the `cache-ehcache` plugin is not installed or disk cache properties are not set. {: .warning} ## Tiered cache settings -In OpenSearch 2.13, request cache can use tiered cache. To start, configure the following settings in the `opensearch.yml` file. +In OpenSearch 2.13, a request cache can use a tiered cache. To begin, configure the following settings in the `opensearch.yml` file. ### Cache store name -Setting the cache store name to `tiered_spillover` signals to use the OpenSearch-provided tiered spillover cache implementation: +Set the cache store name to `tiered_spillover` to use the OpenSearch-provided tiered spillover cache implementation: ```yaml indices.request.cache.store.name: tiered_spillover: true @@ -60,23 +60,23 @@ indices.request.cache.tiered_spillover.disk.store.name: ehcache_disk For more information about installing non-bundled plugins, see [Additional plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/#additional-plugins). -### Configuring on-heap and disk store +### Configuring on-heap and disk stores The following table lists the cache store settings for the `opensearch_onheap` store. Setting | Default | Description :--- | :--- | :--- -`indices.request.cache.opensearch_onheap.size` | 1% of the heap | Size of on-heap cache. Optional. +`indices.request.cache.opensearch_onheap.size` | 1% of the heap | The size of the on-heap cache. Optional. `indices.request.cache.opensearch_onheap.expire` | `MAX_VALUE` (disabled) | Specify a time-to-live (TTL) for the cached results. Optional. The following table lists the disk cache store settings for the `ehcache_disk` store. Setting | Default | Description :--- | :--- | :--- -`indices.request.cache.ehcache_disk.max_size_in_bytes` | `1073741824` (1 GB) | Defines size of the disk cache. Optional. 
-`indices.request.cache.ehcache_disk.storage.path` | `""` | Defines the storage path for disk cache. Required. +`indices.request.cache.ehcache_disk.max_size_in_bytes` | `1073741824` (1 GB) | Defines the size of the disk cache. Optional. +`indices.request.cache.ehcache_disk.storage.path` | `""` | Defines the storage path for the disk cache. Required. `indices.request.cache.ehcache_disk.expire_after_access` | `MAX_VALUE` (disabled) | Specify a time-to-live (TTL) for the cached results. Optional. -`indices.request.cache.ehcache_disk.alias` | `ehcacheDiskCache#INDICES_REQUEST_CACHE` (this is an example of request cache) | Specify an alias for disk cache. Optional. +`indices.request.cache.ehcache_disk.alias` | `ehcacheDiskCache#INDICES_REQUEST_CACHE` (this is an example of request cache) | Specify an alias for the disk cache. Optional. `indices.request.cache.ehcache_disk.segments` | `16` | Defines the number of segments the disk cache is separated into. Used for concurrency. Optional. -`indices.request.cache.ehcache_disk.concurrency` | `1` | Defines the number of distinct write queues created for disk store, where a group of segments share a write queue. Optional. +`indices.request.cache.ehcache_disk.concurrency` | `1` | Defines the number of distinct write queues created for the disk store, where a group of segments share a write queue. Optional. 
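Taken together, the settings described above could be combined in `opensearch.yml` roughly as follows. This is a minimal sketch rather than a tested configuration: it assumes the `cache-ehcache` plugin is already installed, it assumes the store name is given as the plain value `tiered_spillover`, and the storage path is a hypothetical placeholder that you would replace with a directory on your node. Optional settings are left at their defaults unless shown.

```yaml
# Enable the experimental pluggable caching feature.
opensearch.experimental.feature.pluggable.caching.enabled: true

# Use the tiered spillover implementation for the request cache
# (assumption: the value is the plain store name).
indices.request.cache.store.name: tiered_spillover

# Upper tier: built-in on-heap cache. Lower tier: Ehcache disk cache.
indices.request.cache.tiered_spillover.onheap.store.name: opensearch_onheap
indices.request.cache.tiered_spillover.disk.store.name: ehcache_disk

# Required: storage path for the disk cache (hypothetical example path).
indices.request.cache.ehcache_disk.storage.path: /path/to/disk/cache

# Optional: disk cache size in bytes (shown here at its documented 1 GB default).
indices.request.cache.ehcache_disk.max_size_in_bytes: 1073741824
```

Because the storage path is the only required disk store setting, it is the one most likely to cause the initialization failure noted in the warning above if omitted.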
From bb092bc432bd7e6de28a4f5d8d919d0c090fea7b Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 22 Mar 2024 10:11:18 -0400 Subject: [PATCH 09/10] Update _search-plugins/caching/index.md Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/caching/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/caching/index.md b/_search-plugins/caching/index.md index f40aeab3bf..ec61ed5903 100644 --- a/_search-plugins/caching/index.md +++ b/_search-plugins/caching/index.md @@ -8,7 +8,7 @@ nav_order: 100 # Caching -OpenSearch relies heavily on different types of on-heap cache to accelerate data retrieval, providing significant improvement in search latencies. However, cache size is limited by the amount of memory available on a node. If you are processing a larger dataset that can potentially be cached, the cache size limit causes a lot of cache evictions and misses. The increasing number of evictions impacts performance because OpenSearch needs to process the query again, causing high resource consumption. +OpenSearch relies heavily on different on-heap cache types to accelerate data retrieval, providing significant improvement in search latencies. However, cache size is limited by the amount of memory available on a node. If you are processing a larger dataset that can potentially be cached, the cache size limit causes a lot of cache evictions and misses. The increasing number of evictions impacts performance because OpenSearch needs to process the query again, causing high resource consumption. 
Prior to version 2.13, OpenSearch supported the following on-heap cache types: From 5706e81e11c982d3fe0868c1f18276de1e8a34e0 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 22 Mar 2024 10:11:39 -0400 Subject: [PATCH 10/10] Update _search-plugins/caching/index.md Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/caching/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/caching/index.md b/_search-plugins/caching/index.md index ec61ed5903..4d0173fdc7 100644 --- a/_search-plugins/caching/index.md +++ b/_search-plugins/caching/index.md @@ -25,7 +25,7 @@ This is an experimental feature and is not recommended for use in a production e In addition to existing OpenSearch custom on-heap cache stores, cache plugins provide the following cache stores: -- **Disk cache**: This cache stores a precomputed result of a query on disk. You can use disk cache to cache much larger datasets, provided that the disk latencies are acceptable. +- **Disk cache**: This cache stores the precomputed result of a query on disk. You can use a disk cache to cache much larger datasets, provided that the disk latencies are acceptable. - **Tiered cache**: This is a multi-level cache, in which each tier has its own characteristics and performance levels. For example, a tiered cache can contain on-heap and disk tiers. By combining different tiers, you can achieve a balance between cache performance and size. To learn more, see [Tiered cache]({{site.url}}{{site.baseurl}}/search-plugins/caching/tiered-cache/). In OpenSearch 2.13, the request cache is integrated with cache plugins. You can use a tiered or disk cache as a request-level cache.