Query grouping framework for Top N queries and group by query similarity #66

deshsidd · 2024-08-02T18:46:12Z

Description

For Top N queries by latency, some (or most) of the Top N entries can be duplicates of the same query. For example, if the same dashboard query is triggered continuously and happens to be the most expensive query in terms of latency, the Top N queries by latency will likely be filled with copies of that one query. To overcome such scenarios and get a more detailed view of the Top N query patterns, we have implemented grouping of Top N queries by similarity. As a follow-up, we can also use this framework to group Top N queries by frequency, user_id, etc.

Major changes:

Query Grouping Service that groups queries based on a group_id and uses a Min and Max priority queue approach as discussed in the RFC
Created the Measurement class as an abstraction for number that is used to store the measurement for the specific MetricType. Measurement can support DimensionType (Average, Sum) for the specific measurement. For grouping by similarity we use the average latency, average cpu and average memory to maintain the ordering.
We have a GroupingType enum that describes how we group the Top N queries (similarity, user_id)
The Grouping setting applies to ALL metric types and we cannot set this only for a subset of MetricType as discussed in the RFC.
Each TopQueriesService has its own instance of QueryGroupingService. We have one TopQueriesService for each MetricType.
In QueryInsightsService we add ALL the records to the queryRecordsQueue for the TopQueriesService to consume if the search.query.metric feature is enabled or if grouping is enabled. Note that in this case we skip the optimization that drops records too small to make the Top N.

    public boolean addRecord(final SearchQueryRecord record) {
        boolean shouldAdd = isSearchQueryMetricsFeatureEnabled() || isGroupingEnabled();
        if (!shouldAdd) {
            for (Map.Entry<MetricType, TopQueriesService> entry : topQueriesServices.entrySet()) {
                if (!enableCollect.get(entry.getKey())) {
                    continue;
                }
                List<SearchQueryRecord> currentSnapshot = entry.getValue().getTopQueriesCurrentSnapshot();
                // skip add to top N queries store if the incoming record is smaller than the Nth record
                if (currentSnapshot.size() < entry.getValue().getTopNSize()
                    || SearchQueryRecord.compare(record, currentSnapshot.get(0), entry.getKey()) > 0) {
                    shouldAdd = true;
                    break;
                }
            }
        }
        if (shouldAdd) {
            return queryRecordsQueue.offer(record);
        }
        return false;
    }

Added exhaustive unit tests for QueryGroupingService.
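
To illustrate the Measurement idea from the list above, here is a minimal sketch of a value that carries a count and an aggregation type, so that a group can be ordered by its average. The names `AggregationKind` and `RunningMeasurement` are hypothetical and are not the plugin's actual classes. Treating NONE as "keep only the latest value" is an assumption made for the example.

```java
// Hypothetical names for illustration only; the plugin's real classes may differ.
enum AggregationKind { NONE, SUM, AVERAGE }

final class RunningMeasurement {
    private final AggregationKind kind;
    private double total; // sum of all values seen (or just the latest value for NONE)
    private long count;   // number of values folded into this measurement

    RunningMeasurement(AggregationKind kind, double initialValue) {
        this.kind = kind;
        this.total = initialValue;
        this.count = 1;
    }

    // Fold another observation of the same metric into this measurement.
    void add(double value) {
        if (kind == AggregationKind.NONE) {
            // Assumption: NONE performs no aggregation and keeps only the latest value.
            total = value;
            count = 1;
            return;
        }
        total += value;
        count++;
    }

    // Value used to order groups in the Top N.
    double value() {
        return kind == AggregationKind.AVERAGE ? total / count : total;
    }

    long count() {
        return count;
    }
}

class RunningMeasurementDemo {
    public static void main(String[] args) {
        RunningMeasurement latency = new RunningMeasurement(AggregationKind.AVERAGE, 774);
        latency.add(11);
        latency.add(9);
        System.out.println(latency.value() + " ms average over " + latency.count() + " queries");
    }
}
```

Keeping the sum and count, rather than the average itself, lets a new observation be folded in with a single addition.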

Issues Resolved

addresses #13357

Configure Grouping

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "search.insights.top_queries.group_by": "similarity"
  }
}
'
{"acknowledged":true,"persistent":{"search":{"insights":{"top_queries":{"group_by":"similarity"}}}},"transient":{}}

Configure Grouping Error Response

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "search.insights.top_queries.group_by": "similarit"
  }
}'

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"illegal value can't update [search.insights.top_queries.group_by] from [similarity] to [similarit]"}],"type":"illegal_argument_exception","reason":"illegal value can't update [search.insights.top_queries.group_by] from [similarity] to [similarit]","caused_by":{"type":"illegal_argument_exception","reason":"Invalid grouping type [similarit], type should be one of [SIMILARITY, USER_ID, NONE]"}},"status":400}
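
The validation behind an error like the one above could be sketched as follows. `GroupingKind` is a hypothetical stand-in for the GroupingType enum, and the case-insensitive parsing is an assumption. Only the wording of the error message is taken from the response shown.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical stand-in for the plugin's GroupingType enum.
enum GroupingKind {
    SIMILARITY, USER_ID, NONE;

    // Parse a setting value, rejecting anything that is not a known grouping type.
    static GroupingKind parse(String raw) {
        try {
            return valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Invalid grouping type [" + raw + "], type should be one of " + Arrays.toString(values())
            );
        }
    }
}

class GroupingKindDemo {
    public static void main(String[] args) {
        System.out.println(GroupingKind.parse("similarity"));
        try {
            GroupingKind.parse("similarit");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```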

Get Top N Queries by latency with grouping enabled, group_by SIMILARITY

curl -XGET "http://localhost:9200/_insights/top_queries"
{
  "top_queries": [
    {
      "timestamp": 1722630496342,
      "query_hashcode": 29791,
      "search_type": "query_then_fetch",
      "task_resource_usages": [
        {
          "action": "indices:data/read/search[phase/query]",
          "taskId": 135,
          "parentTaskId": 134,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 625000,
            "memory_in_bytes": 41512
          }
        },
        {
          "action": "indices:data/read/search",
          "taskId": 134,
          "parentTaskId": -1,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 84000,
            "memory_in_bytes": 3264
          }
        }
      ],
      "source": {},
      "indices": ["my_test_index"],
      "total_shards": 1,
      "labels": {},
      "phase_latency_map": {
        "expand": 0,
        "query": 774,
        "fetch": 0
      },
      "node_id": "zp2vxuVsRwawzBK2u7f7FA",
      "measurements": {
        "latency": {
          "metricType": "latency",
          "number": 774,
          "count": 1,
          "dimensionType": "AVERAGE"
        }
      }
    },
    {
      "timestamp": 1722630528201,
      "query_hashcode": 709023605,
      "search_type": "query_then_fetch",
      "task_resource_usages": [
        {
          "action": "indices:data/read/search[phase/query]",
          "taskId": 163,
          "parentTaskId": 162,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 9412000,
            "memory_in_bytes": 618968
          }
        },
        {
          "action": "indices:data/read/search",
          "taskId": 162,
          "parentTaskId": -1,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 158000,
            "memory_in_bytes": 3720
          }
        }
      ],
      "source": {
        "sort": [
          {
            "age": {
              "order": "asc"
            }
          }
        ]
      },
      "indices": ["my_test_index"],
      "total_shards": 1,
      "labels": {},
      "phase_latency_map": {
        "expand": 0,
        "query": 10,
        "fetch": 0
      },
      "node_id": "zp2vxuVsRwawzBK2u7f7FA",
      "measurements": {
        "latency": {
          "metricType": "latency",
          "number": 11,
          "count": 1,
          "dimensionType": "AVERAGE"
        }
      }
    },
    {
      "timestamp": 1722630499772,
      "query_hashcode": -1204891025,
      "search_type": "query_then_fetch",
      "task_resource_usages": [
        {
          "action": "indices:data/read/search[phase/query]",
          "taskId": 137,
          "parentTaskId": 136,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 7603000,
            "memory_in_bytes": 477600
          }
        },
        {
          "action": "indices:data/read/search",
          "taskId": 136,
          "parentTaskId": -1,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 127000,
            "memory_in_bytes": 3232
          }
        }
      ],
      "source": {
        "query": {
          "match": {
            "occupation": {
              "query": "Software Engineer",
              "operator": "OR",
              "prefix_length": 0,
              "max_expansions": 50,
              "fuzzy_transpositions": true,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": true,
              "boost": 1.0
            }
          }
        }
      },
      "indices": ["my_test_index"],
      "total_shards": 1,
      "labels": {},
      "phase_latency_map": {
        "expand": 0,
        "query": 8,
        "fetch": 0
      },
      "node_id": "zp2vxuVsRwawzBK2u7f7FA",
      "measurements": {
        "latency": {
          "metricType": "latency",
          "number": 9,
          "count": 1,
          "dimensionType": "AVERAGE"
        }
      }
    }
  ]
}

Get Top N queries with group_by NONE

{
  "top_queries": [
    {
      "timestamp": 1722632764895,
      "query_hashcode": -1204891025,
      "search_type": "query_then_fetch",
      "task_resource_usages": [
        {
          "action": "indices:data/read/search[phase/query]",
          "taskId": 953,
          "parentTaskId": 952,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 1640000,
            "memory_in_bytes": 120760
          }
        },
        {
          "action": "indices:data/read/search",
          "taskId": 952,
          "parentTaskId": -1,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 150000,
            "memory_in_bytes": 3232
          }
        }
      ],
      "source": {
        "query": {
          "match": {
            "occupation": {
              "query": "Software Engineer",
              "operator": "OR",
              "prefix_length": 0,
              "max_expansions": 50,
              "fuzzy_transpositions": true,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": true,
              "boost": 1.0
            }
          }
        }
      },
      "indices": ["my_test_index"],
      "total_shards": 1,
      "labels": {},
      "phase_latency_map": {
        "expand": 0,
        "query": 2,
        "fetch": 0
      },
      "node_id": "zp2vxuVsRwawzBK2u7f7FA",
      "measurements": {
        "latency": {
          "metricType": "latency",
          "number": 3,
          "count": 1,
          "dimensionType": "NONE"
        }
      }
    },
    {
      "timestamp": 1722632770456,
      "query_hashcode": 605146258,
      "search_type": "query_then_fetch",
      "task_resource_usages": [
        {
          "action": "indices:data/read/search[phase/query]",
          "taskId": 959,
          "parentTaskId": 958,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 852000,
            "memory_in_bytes": 49328
          }
        },
        {
          "action": "indices:data/read/search",
          "taskId": 958,
          "parentTaskId": -1,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 120000,
            "memory_in_bytes": 3240
          }
        }
      ],
      "source": {
        "query": {
          "range": {
            "age": {
              "from": 30,
              "to": null,
              "include_lower": false,
              "include_upper": true,
              "boost": 1.0
            }
          }
        }
      },
      "indices": ["my_test_index"],
      "total_shards": 1,
      "labels": {},
      "phase_latency_map": {
        "expand": 0,
        "query": 1,
        "fetch": 0
      },
      "node_id": "zp2vxuVsRwawzBK2u7f7FA",
      "measurements": {
        "latency": {
          "metricType": "latency",
          "number": 2,
          "count": 1,
          "dimensionType": "NONE"
        }
      }
    },
    {
      "timestamp": 1722632769697,
      "query_hashcode": 605146258,
      "search_type": "query_then_fetch",
      "task_resource_usages": [
        {
          "action": "indices:data/read/search[phase/query]",
          "taskId": 957,
          "parentTaskId": 956,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 980000,
            "memory_in_bytes": 49328
          }
        },
        {
          "action": "indices:data/read/search",
          "taskId": 956,
          "parentTaskId": -1,
          "nodeId": "zp2vxuVsRwawzBK2u7f7FA",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 130000,
            "memory_in_bytes": 3240
          }
        }
      ],
      "source": {
        "query": {
          "range": {
            "age": {
              "from": 30,
              "to": null,
              "include_lower": false,
              "include_upper": true,
              "boost": 1.0
            }
          }
        }
      },
      "indices": ["my_test_index"],
      "total_shards": 1,
      "labels": {},
      "phase_latency_map": {
        "expand": 0,
        "query": 1,
        "fetch": 0
      },
      "node_id": "zp2vxuVsRwawzBK2u7f7FA",
      "measurements": {
        "latency": {
          "metricType": "latency",
          "number": 2,
          "count": 1,
          "dimensionType": "NONE"
        }
      }
    }
  ]
}

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ansjcy

The overall class design looks good to me.

But I have concerns about the correctness of the logic in QueryGroupingService that implements the algorithm proposed in opensearch-project/OpenSearch#13357 (comment). Please see the individual comments for details.

Also, a lot of the heap operations are O(n) and O(total possible number of groups), which is not acceptable. Please refer to the comment: opensearch-project/OpenSearch#13357 (comment) to resolve it.

src/main/java/org/opensearch/plugin/insights/rules/model/DimensionType.java

src/main/java/org/opensearch/plugin/insights/rules/model/Measurement.java

src/main/java/org/opensearch/plugin/insights/rules/model/SearchQueryRecord.java

src/main/java/org/opensearch/plugin/insights/settings/QueryInsightsSettings.java

src/test/java/org/opensearch/plugin/insights/QueryInsightsTestUtils.java

src/test/java/org/opensearch/plugin/insights/core/service/QueryGroupingServiceTests.java

ansjcy · 2024-08-05T05:30:53Z

Please also add integration tests for this feature - We are already lacking integration test coverage for many features in Query Insights.

deshsidd · 2024-08-05T20:46:33Z

We can run some benchmarks to view the performance here and keep this feature as experimental/beta and also limit the number of groups. If needed we can use an indexed priority queue from here as followup changes. Let me know your thoughts.

ansjcy · 2024-08-06T05:22:44Z

If needed we can use an indexed priority queue as followup changes.

I don't think this is a good idea. The whole algorithm mentioned in opensearch-project/OpenSearch#13357 (comment) is based on using an indexed PQ to store the groups. Otherwise we are storing all the query groups, and an update or deletion can take O(total possible number of groups) in the worst case, which is not acceptable.

deshsidd · 2024-08-06T20:28:22Z

Made all the required refactoring based on the comments. Highlights include:

Decouple Measurement and MetricType
Ensure NONE aggregation type performs no aggregations.
Refactor unit tests to re-use code whenever applicable
Added one missing edge case in QueryGroupingService algorithm

The only major open question is regarding the Java priority queue versus an indexed priority queue.

An indexed priority queue has O(log n) updates, while with the Java PQ an update takes O(n) since we have to remove (O(n)) and then re-add (O(log n)). Note that n is the number of groups in a Top N window.
The only viable indexed priority queue I found was from here. I tried including it in Gradle and got errors due to the following issue. There seem to be other ways to add this library, but I am not sure we want to pursue these unconventional routes. Furthermore, I am not sure about the stability and community support for this library.
We can also consider trying to limit the number of groups per window or implementing our own version of indexed PQ to improve the performance.
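
For reference, a home-grown indexed priority queue does not need much code. The sketch below is a hypothetical `IndexedMinHeap` keyed by group id, not a library class and not part of this PR. It keeps a key-to-position map next to a binary heap, so changing a group's priority is O(log n) instead of the O(n) remove plus O(log n) re-add that java.util.PriorityQueue requires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an indexed min-heap keyed by group id.
final class IndexedMinHeap {
    private final List<String> heap = new ArrayList<>();           // keys in heap order
    private final Map<String, Integer> position = new HashMap<>(); // key -> index in heap
    private final Map<String, Double> priority = new HashMap<>();  // key -> current priority

    int size() {
        return heap.size();
    }

    String peekKey() {
        return heap.isEmpty() ? null : heap.get(0);
    }

    // Insert a new key or change an existing key's priority in O(log n).
    void upsert(String key, double newPriority) {
        Integer idx = position.get(key);
        priority.put(key, newPriority);
        if (idx == null) {
            heap.add(key);
            idx = heap.size() - 1;
            position.put(key, idx);
        }
        siftDown(siftUp(idx));
    }

    // Remove and return the key with the smallest priority in O(log n).
    String pollKey() {
        if (heap.isEmpty()) {
            return null;
        }
        String top = heap.get(0);
        swap(0, heap.size() - 1);
        heap.remove(heap.size() - 1);
        position.remove(top);
        priority.remove(top);
        if (!heap.isEmpty()) {
            siftDown(0);
        }
        return top;
    }

    private int siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (!less(i, parent)) {
                break;
            }
            swap(i, parent);
            i = parent;
        }
        return i;
    }

    private void siftDown(int i) {
        int n = heap.size();
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int smallest = i;
            if (left < n && less(left, smallest)) smallest = left;
            if (right < n && less(right, smallest)) smallest = right;
            if (smallest == i) {
                return;
            }
            swap(i, smallest);
            i = smallest;
        }
    }

    private boolean less(int a, int b) {
        return priority.get(heap.get(a)) < priority.get(heap.get(b));
    }

    private void swap(int a, int b) {
        String ka = heap.get(a);
        String kb = heap.get(b);
        heap.set(a, kb);
        heap.set(b, ka);
        position.put(kb, a);
        position.put(ka, b);
    }
}
```

The trade-off is two extra maps per heap and more code to maintain, which is part of why limiting the number of groups is a simpler first step.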

Let's discuss more and figure out a path forward!

jainankitk

Overall the logic for query grouping seems complex, and I am wondering if there is a way to simplify some of it by making some reasonable assumptions.

src/main/java/org/opensearch/plugin/insights/core/listener/QueryInsightsListener.java

src/main/java/org/opensearch/plugin/insights/core/service/QueryShape.java

src/main/java/org/opensearch/plugin/insights/core/service/QueryGroupingService.java

src/main/java/org/opensearch/plugin/insights/core/service/TopQueriesService.java

src/main/java/org/opensearch/plugin/insights/rules/model/AggregationType.java

deshsidd · 2024-08-29T17:30:38Z

Ran some benchmarks to figure out a reasonable number for the cardinality of the groups and here are the results:

logging heavy: http_logs
Here are the number of groups logged at the end of a window cycle over the course of approx 3 days:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 3, 7, 6, 3, 0, 0, 0, 0, 0, 0, 0, 0, 7, 6, 3, 0, 0, 0, 0, 0, 0, 0, 0, 7, 6, 3, 0]
Maximum: 8
search heavy: nyc_taxis
Here are the number of groups logged at the end of a window cycle over the course of approx 3 days:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 2, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0]
Maximum: 8
custom workload simulating real world traffic:
[0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 4, 0, 0, 0, 0, 0, 0]
Maximum: 7

Note that the window size was set to 1 hour.

IMHO it might be reasonable to have a max_groups setting that sets the maximum number of groups and limits the PQ to that number. If we exceed this number, we can drop the new group and add debug logs. The max_groups value should be validated so that it cannot be set beyond 10,000.
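
A rough sketch of what such a limit could look like is below. `MaxGroupsLimit` and its method names are hypothetical, and the lower bound of 0 is an assumption. Only the 10,000 upper bound comes from the proposal above.

```java
// Hypothetical helper; names and the lower bound of 0 are assumptions for illustration.
final class MaxGroupsLimit {
    static final int MIN_VALUE = 0;
    static final int MAX_VALUE = 10_000; // upper bound proposed in the comment above

    private final int maxGroups;
    private long droppedGroups; // how many new groups were rejected so far

    MaxGroupsLimit(int maxGroups) {
        if (maxGroups < MIN_VALUE || maxGroups > MAX_VALUE) {
            throw new IllegalArgumentException(
                "max_groups must be between " + MIN_VALUE + " and " + MAX_VALUE + ", got " + maxGroups
            );
        }
        this.maxGroups = maxGroups;
    }

    // Returns true if a new group may be created given the current group count.
    boolean tryAdmit(int currentGroupCount) {
        if (currentGroupCount >= maxGroups) {
            droppedGroups++; // a real implementation would also write a debug log here
            return false;
        }
        return true;
    }

    long droppedGroups() {
        return droppedGroups;
    }
}

class MaxGroupsLimitDemo {
    public static void main(String[] args) {
        MaxGroupsLimit limit = new MaxGroupsLimit(100);
        System.out.println(limit.tryAdmit(8));   // well under the limit
        System.out.println(limit.tryAdmit(100)); // at the limit, so the new group is dropped
    }
}
```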

src/main/java/org/opensearch/plugin/insights/core/listener/QueryInsightsListener.java

src/main/java/org/opensearch/plugin/insights/core/service/QueryInsightsService.java

src/main/java/org/opensearch/plugin/insights/core/service/QueryGrouper.java

src/main/java/org/opensearch/plugin/insights/core/service/QueryInsightsService.java

deshsidd · 2024-08-30T18:18:46Z

I personally think we should not even add a record if the feature is disabled. Whenever it is switched on, we start calculating from that point. Otherwise it looks like more of a leak.

If only query metrics is enabled we always add the records.
If only top N is enabled we perform an optimization and skip adding the records if they do not make it to the Top N.
If grouping is enabled we need to add all the records since we cannot perform the optimization above.

Not sure what you are referring to here?

src/main/java/org/opensearch/plugin/insights/core/service/QueryGrouper.java

src/main/java/org/opensearch/plugin/insights/core/service/QueryInsightsService.java

deshsidd · 2024-08-30T22:59:19Z

As discussed added interfaces and implementation for the following:
interface -> implementation

QueryGrouper -> MinMaxHeapQueryGrouper
TopQueriesStore -> PriorityQueueTopQueriesStore

src/main/java/org/opensearch/plugin/insights/core/service/TopQueriesService.java

src/main/java/org/opensearch/plugin/insights/core/service/grouper/QueryGrouper.java

src/main/java/org/opensearch/plugin/insights/core/service/grouper/MinMaxHeapQueryGrouper.java

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

src/main/java/org/opensearch/plugin/insights/settings/QueryInsightsSettings.java

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

ansjcy

The security based integ tests are failing for the change: #85
Let's double check if we are missing anything for this change on the permission side before merging.

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

deshsidd · 2024-09-04T00:52:00Z

The security based integ tests are failing for the change: #85
Let's double check if we are missing anything for this change on the permission side before merging.

Thanks for checking! The integration test PR build is failing due to grouping settings not found. This PR needs to be merged for the builds to pass there. The security ITs are run in the checks for this PR. Also ran the security ITs locally and they are passing.

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

ansjcy · 2024-09-04T14:26:21Z

src/main/java/org/opensearch/plugin/insights/core/service/grouper/MinMaxHeapQueryGrouper.java

+        if (maxHeapQueryStore.size() > 0) {
+            addToMaxPQPromoteToMinPQ(aggregateSearchQueryRecord, groupId);
+        } else {
+            addToMinPQOverflowToMaxPQ(aggregateSearchQueryRecord, groupId);
+        }


Why do we need to do this if/else here? Can't we simply always add to max/min queue and then do a swap top?
Also, the "else" logic looks ineffective to me. If execution enters this else branch, we have already removed a record from the min queue and the max queue is empty. So when we call addToMinPQOverflowToMaxPQ, we are adding the previous (but updated) record back to the min queue and nothing will overflow to the max queue.

ansjcy · 2024-09-04T15:24:30Z

src/main/java/org/opensearch/plugin/insights/core/service/grouper/MinMaxHeapQueryGrouper.java

+    }
+
+    private boolean checkMaxGroupsLimitReached(String groupId) {
+        if (maxGroups <= maxHeapQueryStore.size() && minHeapTopQueriesStore.size() >= topNSize) {


We should emit a metric for this as well.

ansjcy · 2024-09-04T15:37:48Z

src/main/java/org/opensearch/plugin/insights/settings/QueryInsightsSettings.java

+    public static final GroupingType DEFAULT_GROUPING_TYPE = GroupingType.NONE;
+    public static final int DEFAULT_GROUPS_EXCLUDING_TOPN_LIMIT = 100;
+
+    public static final int MAX_GROUPS_EXCLUDING_TOPN_LIMIT = 10000;


How much memory would 10000 records consume based on the benchmark results?

Did not capture this and we did not reach anywhere close to the 10000 limit in the benchmarks.

If a search query record is around 1 KB, then 10,000 groups means we will consume at most 10 MB of memory for this feature, which should be fine. But we still need to watch the memory consumption here for queries with a large source.

This analysis seems reasonable, but we would need to keep watching the memory consumption here.

ansjcy · 2024-09-04T15:38:51Z

IMHO it might be reasonable to having a setting max_groups to set the maximum number of groups and limit the PQ to that number. If we exceed this number we can drop and add debug logs. The max_groups value should have a validation such that it cannot be set beyond 10,000.

The benchmark numbers look good. I think it's a good idea to confirm a reasonable upper bound for the number of groups as well, so that we won't consume too much memory.

ansjcy · 2024-09-04T15:56:48Z

src/main/java/org/opensearch/plugin/insights/core/service/grouper/MinMaxHeapQueryGrouper.java

+     * @return return the search query record that represents the group
+     */
+    @Override
+    public SearchQueryRecord add(SearchQueryRecord searchQueryRecord) {


This grouper can be simplified with something like:

    // groupId and aggregateSearchQueryRecord are assumed to be declared as in the existing method
    public SearchQueryRecord add(SearchQueryRecord searchQueryRecord) {
        if (!groupIdToAggSearchQueryRecord.containsKey(groupId)) {
            boolean maxGroupsLimitReached = checkMaxGroupsLimitReached(groupId);
            if (maxGroupsLimitReached) {
                return null;
            }
            aggregateSearchQueryRecord = searchQueryRecord;
            aggregateSearchQueryRecord.setGroupingId(groupId);
            aggregateSearchQueryRecord.setMeasurementAggregation(metricType, aggregationType);
            addToMinPQ(aggregateSearchQueryRecord, groupId);
        } else {
            aggregateSearchQueryRecord = groupIdToAggSearchQueryRecord.get(groupId).v1();
            boolean isPresentInMinPQ = groupIdToAggSearchQueryRecord.get(groupId).v2();
            if (isPresentInMinPQ) {
                minHeapTopQueriesStore.remove(aggregateSearchQueryRecord);
            } else {
                maxHeapQueryStore.remove(aggregateSearchQueryRecord);
            }
            addAndPromote(searchQueryRecord, aggregateSearchQueryRecord, groupId);
        }
        return aggregateSearchQueryRecord;
    }

    private void addToMinPQ(SearchQueryRecord searchQueryRecord, String groupId) {
        minHeapTopQueriesStore.add(searchQueryRecord);
        groupIdToAggSearchQueryRecord.put(groupId, new Tuple<>(searchQueryRecord, true));
        overflow();
    }

    private void addAndPromote(SearchQueryRecord searchQueryRecord, SearchQueryRecord aggregateSearchQueryRecord, String groupId) {
        Number measurementToAdd = searchQueryRecord.getMeasurement(metricType);
        aggregateSearchQueryRecord.addMeasurement(metricType, measurementToAdd);
        addToMinPQ(aggregateSearchQueryRecord, groupId);
        if (maxHeapQueryStore.isEmpty()) {
            return;
        }
        if (SearchQueryRecord.compare(maxHeapQueryStore.peek(), minHeapTopQueriesStore.peek(), metricType) > 0) {
            SearchQueryRecord recordMovedFromMaxToMin = maxHeapQueryStore.poll();
            addToMinPQ(recordMovedFromMaxToMin, recordMovedFromMaxToMin.getGroupingId());
        }
    }

    private void overflow() {
        if (minHeapTopQueriesStore.size() > topNSize) {
            SearchQueryRecord recordMovedFromMinToMax = minHeapTopQueriesStore.poll();
            maxHeapQueryStore.add(recordMovedFromMinToMax);
            groupIdToAggSearchQueryRecord.put(recordMovedFromMinToMax.getGroupingId(), new Tuple<>(recordMovedFromMinToMax, false));
        }
    }
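
The same "always add to the min heap, then overflow and promote" idea can be tried out in isolation with plain JDK types. The sketch below is a simplified, hypothetical `TwoHeapGrouper` that groups by a string id and orders groups by average latency. It leaves out the max-groups limit and is not the plugin's MinMaxHeapQueryGrouper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical, simplified grouper: a min-heap holds the Top N groups, a max-heap holds the rest.
final class TwoHeapGrouper {
    static final class Group {
        final String id;
        double sum;
        long count;
        boolean inTopN;

        Group(String id) {
            this.id = id;
        }

        double average() {
            return sum / count;
        }
    }

    private final int topN;
    private final Map<String, Group> groups = new HashMap<>();
    // Root is the weakest group that is still in the Top N.
    private final PriorityQueue<Group> topHeap = new PriorityQueue<>((a, b) -> Double.compare(a.average(), b.average()));
    // Root is the strongest group that is not in the Top N.
    private final PriorityQueue<Group> restHeap = new PriorityQueue<>((a, b) -> Double.compare(b.average(), a.average()));

    TwoHeapGrouper(int topN) {
        this.topN = topN;
    }

    void add(String groupId, double latency) {
        Group g = groups.get(groupId);
        if (g == null) {
            g = new Group(groupId);
            groups.put(groupId, g);
        } else {
            // Remove before mutating so the heap never holds a stale ordering. This is the O(n) step.
            (g.inTopN ? topHeap : restHeap).remove(g);
        }
        g.sum += latency;
        g.count++;
        g.inTopN = true;
        topHeap.add(g);
        overflow();
        // Promote the best of the rest if it now beats the weakest Top N group.
        if (!restHeap.isEmpty() && restHeap.peek().average() > topHeap.peek().average()) {
            Group promoted = restHeap.poll();
            promoted.inTopN = true;
            topHeap.add(promoted);
            overflow();
        }
    }

    private void overflow() {
        if (topHeap.size() > topN) {
            Group demoted = topHeap.poll();
            demoted.inTopN = false;
            restHeap.add(demoted);
        }
    }

    // Top N group ids, highest average first.
    List<String> topIds() {
        List<Group> copy = new ArrayList<>(topHeap);
        copy.sort((a, b) -> Double.compare(b.average(), a.average()));
        List<String> ids = new ArrayList<>();
        for (Group g : copy) {
            ids.add(g.id);
        }
        return ids;
    }
}
```

Because every update goes through the same add-then-rebalance path, there is no need for a separate branch depending on whether the max heap is empty.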

ansjcy · 2024-09-04T15:57:19Z

Overall it looks good and I'm fine approving it. But I still have some concerns and we need follow-ups to resolve them.

Since we are not considering an indexed PQ in this PR, removing an element from the max PQ becomes an O(number of groups) operation, and remember this is O(number of groups) per search request, so potentially it could be very bad. So the possible number of groups in the real world matters a lot in this case. We should emit a metric to track this very important information so we can decide whether to increase or decrease the max number of groups limit.
If a search query record is around 1 KB, then 10,000 groups means we will consume at most 10 MB of memory for this feature, which should be fine. But we still need to watch the memory consumption here for queries with a large source.
The logic in the grouper is too complicated and refactoring is needed to simplify it.

ansjcy · 2024-09-04T16:02:06Z

src/main/java/org/opensearch/plugin/insights/rules/model/SearchQueryRecord.java

-        measurements = new HashMap<>();
-        in.readMap(MetricType::readFromStream, StreamInput::readGenericValue)
-            .forEach(((metricType, o) -> measurements.put(metricType, metricType.parseValue(o))));
+        if (in.getVersion().onOrAfter(Version.V_2_17_0)) {


I'm wondering why this is needed. SearchQueryRecord is only used internally and we are not providing any clients that could cause a version mismatch.

This is used when getting the top queries : https://github.com/opensearch-project/query-insights/blob/main/src/main/java/org/opensearch/plugin/insights/rules/action/top_queries/TopQueries.java#L36
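
As background on why a version check matters when records travel between nodes, here is a minimal sketch of version-gated serialization using plain java.io streams. The `VersionGatedCodec` class, the integer wire version, and the field layout are all hypothetical. The actual change uses the stream's version (`in.getVersion().onOrAfter(Version.V_2_17_0)` in the diff above).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical codec: the extra "count" field is only written to, and read from, peers that understand it.
final class VersionGatedCodec {
    static final int GROUPING_WIRE_VERSION = 2; // hypothetical version that introduced the new field

    static final class Decoded {
        final double number;
        final int count;

        Decoded(double number, int count) {
            this.number = number;
            this.count = count;
        }
    }

    static void write(DataOutputStream out, int peerVersion, double number, int count) throws IOException {
        out.writeDouble(number);
        if (peerVersion >= GROUPING_WIRE_VERSION) {
            out.writeInt(count); // older peers would not expect this field
        }
    }

    static Decoded read(DataInputStream in, int peerVersion) throws IOException {
        double number = in.readDouble();
        // Legacy format carries a single value, so the count defaults to 1.
        int count = peerVersion >= GROUPING_WIRE_VERSION ? in.readInt() : 1;
        return new Decoded(number, count);
    }

    // Convenience helper: serialize and deserialize as if talking to a peer on the given version.
    static Decoded roundTrip(int peerVersion, double number, int count) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), peerVersion, number, count);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), peerVersion);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a mixed-version cluster, both sides agree on the lower of the two versions, so the writer never emits bytes the reader cannot parse.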

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

deshsidd · 2024-09-04T18:07:04Z

Thanks @ansjcy.

Will add metrics to track the number of groups discarded as a followup
Refactored the logic for grouper

…ity (#66) * Query grouping framework and group by query similarity Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Spotless apply Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Build fix Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Properly configure settings update consumer Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Address review comments Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Refactor unit tests Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Decouple Measurement and MetricType Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Aggregate type NONE will ensure no aggregations computed Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Perform renaming Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Integrate query shape library with grouping Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Spotless Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Create and consume string hashcode interface Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Health checks in code Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Fix tests and spotless apply Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Minor fixes Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Max groups setting and unit tests Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Address review comments Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Address review comments Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Create query grouper interface and top query store interface Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Address review comments Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Removed unused interface Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Rebase main and spotless Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Renaming variable Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Remove TopQueriesStore 
interface Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Drain top queries service on group change Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Rename max groups setting and allow minimum 0 Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Make write/read from io backword compatible Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Minor fix Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Refactor query grouper Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> --------- Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> (cherry picked from commit 65e4489) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…ity (#66) (#86) (cherry picked from commit 65e4489) Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…ity (#66) (#104)

* Query grouping framework and group by query similarity
* Spotless apply
* Build fix
* Properly configure settings update consumer
* Address review comments
* Refactor unit tests
* Decouple Measurement and MetricType
* Aggregate type NONE will ensure no aggregations computed
* Perform renaming
* Integrate query shape library with grouping
* Spotless
* Create and consume string hashcode interface
* Health checks in code
* Fix tests and spotless apply
* Minor fixes
* Max groups setting and unit tests
* Address review comments
* Address review comments
* Create query grouper interface and top query store interface
* Address review comments
* Removed unused interface
* Rebase main and spotless
* Renaming variable
* Remove TopQueriesStore interface
* Drain top queries service on group change
* Rename max groups setting and allow minimum 0
* Make write/read from io backword compatible
* Minor fix
* Refactor query grouper

---------

(cherry picked from commit 65e4489)
Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

deshsidd requested review from ansjcy, jainankitk and dzane17 as code owners August 2, 2024 18:46

deshsidd changed the title ~~Query grouping framework and group by query similarity~~ Query grouping framework for Top N queries and group by query similarity Aug 2, 2024

ansjcy requested changes Aug 5, 2024

deshsidd mentioned this pull request Aug 5, 2024

[Feature Request] [RFC] Grouping similar Top N Queries by Latency and Resource Usage opensearch-project/OpenSearch#13357

Closed

jainankitk reviewed Aug 8, 2024

deshsidd force-pushed the sid/query-shape branch 2 times, most recently from f36d2a9 to 886ca2f August 28, 2024 20:26

dzane17 reviewed Aug 30, 2024

src/main/java/org/opensearch/plugin/insights/core/listener/QueryInsightsListener.java

src/main/java/org/opensearch/plugin/insights/core/service/QueryInsightsService.java Outdated

dzane17 approved these changes Aug 30, 2024

sgup432 reviewed Aug 30, 2024

deshsidd added the backport 2.x label Aug 30, 2024

sgup432 reviewed Aug 30, 2024

src/main/java/org/opensearch/plugin/insights/core/service/grouper/MinMaxHeapQueryGrouper.java Outdated

sgup432 reviewed Aug 30, 2024

src/main/java/org/opensearch/plugin/insights/core/service/grouper/MinMaxHeapQueryGrouper.java Outdated

deshsidd added 6 commits August 30, 2024 16:57

Query grouping framework and group by query similarity

c14385f

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

Spotless apply

0a923b8

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

Build fix

f56df30

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

Properly configure settings update consumer

0995a3a

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

Address review comments

af90f5c

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

Refactor unit tests

a337ef0

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

deshsidd mentioned this pull request Sep 3, 2024

Query Grouping Integration Tests #85

Merged

jainankitk reviewed Sep 3, 2024

src/main/java/org/opensearch/plugin/insights/settings/QueryInsightsSettings.java Outdated

src/main/java/org/opensearch/plugin/insights/settings/QueryInsightsSettings.java Outdated

Rename max groups setting and allow minimum 0

07ce827

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

ansjcy requested changes Sep 4, 2024

Make write/read from io backword compatible

357d2cd

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

jainankitk approved these changes Sep 4, 2024

Minor fix

d891909

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

deshsidd requested a review from ansjcy September 4, 2024 01:15

ansjcy reviewed Sep 4, 2024

ansjcy approved these changes Sep 4, 2024

ansjcy reviewed Sep 4, 2024

Refactor query grouper

3493e65

Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com>

ansjcy approved these changes Sep 4, 2024

ansjcy merged commit 65e4489 into opensearch-project:main Sep 4, 2024
16 checks passed

opensearch-trigger-bot bot mentioned this pull request Sep 4, 2024

[Backport 2.x] Query grouping framework for Top N queries and group by query similarity #86

Merged

deshsidd mentioned this pull request Sep 5, 2024

[META] Grouping similar Top N Queries by Latency and Resource Usage opensearch-project/OpenSearch#13419

Closed

6 tasks

ansjcy added the backport 2.17 label Sep 5, 2024

opensearch-trigger-bot bot mentioned this pull request Sep 5, 2024

[Backport 2.17] Query grouping framework for Top N queries and group by query similarity #104

Merged

deshsidd mentioned this pull request Sep 11, 2024

Grouping Top N queries documentation opensearch-project/documentation-website#8173

Merged

1 task
