
[BUG] Search requests with size=0 & terminate_after sometimes return incorrect hit count #10435

Closed
jed326 opened this issue Oct 5, 2023 · 15 comments · Fixed by #12179
Labels: bug, Search

@jed326
Collaborator

jed326 commented Oct 5, 2023

Describe the bug
When both size=0 and terminate_after parameters are provided to a search request, an incorrect total hit count is sometimes returned. This was discovered during the investigation of a flaky test; see #9946 (comment)

To Reproduce
Set size=0 in the search request in the test below.

./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.simple.SimpleSearchIT" -Dtests.method="testSimpleTerminateAfterCount {p0={"search.concurrent_segment_search.enabled":"true"}}" -Dtests.seed=C37FD3C5A66D02BC -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=he-IL -Dtests.timezone=Asia/Qostanay -Druntime.java=17

Expected behavior
The above test should always pass.

@msfroh
Collaborator

msfroh commented Oct 11, 2023

If terminate_after is specified, won't collection terminate after some amount of time? In that case, it won't collect all the hits, so the total hit count may be off?

@msfroh msfroh removed the untriaged label Oct 11, 2023
@jed326
Collaborator Author

jed326 commented Oct 11, 2023

@msfroh these tests are not flaky whenever size != 0, and specifically the tests assert that hit_count == terminate_after count. size=0 is a special case where the TopDocsCollector is not created, since we won't need to return any top docs; however, I'm not sure yet what specifically is causing the flakiness in that code path.

Ref:

for (int i = 1; i < max; i++) {
searchResponse = client().prepareSearch("test")
.setQuery(QueryBuilders.rangeQuery("field").gte(1).lte(max))
.setTerminateAfter(i)
.setSize(size)
.setTrackTotalHits(true)
.get();
assertHitCount(searchResponse, i);
assertTrue(searchResponse.isTerminatedEarly());
assertEquals(Math.min(i, size), searchResponse.getHits().getHits().length);
}

@austintlee
Contributor

I have narrowed down the cause of this "issue" to a change introduced in this PR - #4043. (cc @reta)

Specifically, this change:

// See please https://github.com/apache/lucene/pull/964
collector.setWeight(weight);
leafCollector = collector.getLeafCollector(ctx);

That PR brought in a Lucene feature (https://issues.apache.org/jira/browse/LUCENE-10620) that allows you to take advantage of Weights that can speed up Collectors' work during search.

Now, what is really interesting about this change is that when it is combined with these three (I think ALL three) search conditions: 1) size=0, 2) terminate_after is set, and 3) track_total_hits set to true, it hits the following code path, where weight.count() returns the total count for a Lucene segment (it's an optimization).

https://github.com/apache/lucene/blob/bbf197fdc2822e9f1ba1767f83ae23037336ed27/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollector.java#L47-L52

public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
    int leafCount = weight == null ? -1 : weight.count(context);
    if (leafCount != -1) {
      totalHits += leafCount;
      throw new CollectionTerminatedException();
    }
    // ... otherwise falls through to per-document counting
}
And, of course, that's exactly the setup for this test. But it is not a bug. I would say it's just incompatible with what you might expect when you set terminate_after; it makes use of the intended optimization without actually violating the "contract" of the terminate_after flag (since it does terminate early, and terminatedEarly is set to true in the response as well).

As for the non-deterministic manifestation of this issue, two things seem to contribute. One is that the test uses indexRandom to randomly distribute documents, which (by design) leads to a non-deterministic layout of Lucene segments each time. The other is that the Weight objects are stored in an LRU cache. Also, the test makes a range query, and this too seems to contribute to how the Weights are used, depending on the layout and sequence of document IDs within a segment.

@reta
Collaborator

reta commented Oct 23, 2023

And, of course, that's exactly the setup for this test. But it is not a bug. I would say it's just incompatible with what you might expect when you set terminate_after, but I think it makes use of the intended optimization without actually violating the "contract" of the terminate_after flag (since it does terminate early and terminatedEarly is set to true in the response as well).

Thanks a lot for digging into this one. The issue seems to manifest only in the concurrent search scenario, which probably makes sense since we may go over segments concurrently (depending on the random distribution).

UPD: it does not in fact manifest only during the concurrent search scenario.

Tests with failures:
 - org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterCountSize0 {p0={"search.concurrent_segment_search.enabled":"true"}}
 - org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterCountSize0 {p0={"search.concurrent_segment_search.enabled":"false"}}

@austintlee
Contributor

austintlee commented Oct 23, 2023

It has nothing to do with concurrent search, and I don't know if concurrent search makes it more likely. It has more to do with what's in a segment and how Weight#count is being used.

(Interestingly, it was your concurrent search blog that drew me to this. I thought I'd help with a small contribution.)

@reta
Collaborator

reta commented Oct 23, 2023

It has nothing to do with concurrent search

Right, I corrected myself in #10435 (comment). Sadly we didn't have tests for this specific case before, so the issue stayed unnoticed.

@jed326
Collaborator Author

jed326 commented Oct 23, 2023

@austintlee Thanks for looking into this!

But it is not a bug. I would say it's just incompatible with what you might expect when you set terminate_after, but I think it makes use of the intended optimization without actually violating the "contract" of the terminate_after flag (since it does terminate early and terminatedEarly is set to true in the response as well).

I think there's some more discussion needed on what the correct "contract" for terminate_after actually looks like. The docs define it as "The maximum number of documents OpenSearch should process before terminating the request", which to me sounds like: if I pass a specific terminate_after value in a search request, I would expect the total hit count to have an upper bound of the terminate_after value if terminated_early is true, which is not necessarily the case in the edge case you've identified here.

This is something that we've struggled with for concurrent search as well -- it's not straightforward to have concurrent search terminate early at the exact terminate_after count either. See #8371

Curious to hear your thoughts on this too @sohami

@msfroh
Collaborator

msfroh commented Oct 24, 2023

From the docs it's defined as "The maximum number of documents OpenSearch should process before terminating the request", which to me sounds like if I pass a specific terminate_after value in a search request, I would expect the total hit count to have an upper bound of the terminate_after value if terminated_early is true, which is not necessarily true in the edge case you've identified here.

But it's not processing that many docs -- it's using the count returned from the Weight. If the Weight returned a count, it did so in O(1) time. See https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/search/Weight.html#count(org.apache.lucene.index.LeafReaderContext):

The default implementation returns -1 for every query. This indicates that the count could not be computed in O(1) time.

So, if the Weight returned a value other than -1, it did so in O(1) time, without processing documents. The contract is still upheld.
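The Weight#count contract described above can be modeled with a minimal plain-Java sketch. To be clear, this is not the actual Lucene code; the interface and class names here are hypothetical stand-ins chosen to illustrate the "return the count in O(1) or return -1" convention:

```java
// Hypothetical model of Lucene's Weight#count contract, for illustration only.
interface SegmentWeight {
    // Returns the exact match count for a segment if it can be computed in
    // O(1) time (e.g. a match-all query over a segment with no deletions);
    // otherwise -1, meaning the caller must iterate documents.
    int count(int segmentDocCount);
}

public class WeightCountSketch {
    // A weight that can answer the count cheaply: O(1), no docs processed.
    static final SegmentWeight MATCH_ALL = docs -> docs;
    // The default behavior: the count is unknown without iterating.
    static final SegmentWeight UNKNOWN = docs -> -1;

    // Mimics the shortcut in TotalHitCountCollector#getLeafCollector:
    // take the O(1) answer when available, otherwise count doc-by-doc.
    static int totalHits(SegmentWeight w, int segmentDocCount) {
        int leafCount = w.count(segmentDocCount);
        if (leafCount != -1) {
            return leafCount; // counted without visiting a single document
        }
        int hits = 0;
        for (int doc = 0; doc < segmentDocCount; doc++) {
            hits++; // per-document collection path
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(totalHits(MATCH_ALL, 21)); // shortcut path
        System.out.println(totalHits(UNKNOWN, 21));   // iteration path
    }
}
```

Both paths report the same count here; the difference is purely in cost, which is why taking the shortcut does not "process" any documents in the sense the terminate_after docs describe.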

@sohami
Collaborator

sohami commented Oct 26, 2023

Sorry for delayed response.

@msfroh Even though the weight.count optimization is an O(1) operation, it is counting more hits than what the user expected and not really holding to the terminate_after contract. I think hit_count should reflect the number of processed documents as well. Is there any other case where the hit_count returned is greater than the number of processed documents? This can be even more confusing when aggregators are in the picture in the query.

Let's assume there is a single leaf with 21 matches. So the collector chain for this test query is of the form: EarlyTerminatingCollector(maxHitCount=2, say) ---> TotalHitCountCollector(). If the optimization for weight.count() kicks in, then TotalHitCountCollector will record that value while creating the leafCollector itself and will not participate in collection anymore, leaving EarlyTerminatingCollector to process the collection of this leaf's docs. EarlyTerminatingCollector will terminate the collection after the 3rd match, but the total hits returned will be 21 in this case.

Now assume there is an Aggregator in the mix, like a histo agg. What will happen is the collector tree will be of the form: EarlyTerminatingCollector(maxHitCount=2, say) ---> TotalHitCountCollector() ---> BucketCollector(histo). Again, TotalHitCountCollector will say the hit count is 21 and will get removed from the leaf collector chain, leaving behind EarlyTerminatingCollector ---> BucketCollector. Then after collecting 2 docs the query will be terminated. The user will see only 2 docs collected in the aggs output, but the hit count will be 21.
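The collector-chain behavior described above can be sketched as a toy simulation. This is plain Java with illustrative names, not the real OpenSearch/Lucene collectors: when the weight-count shortcut is available, the hit counter records the whole segment's count up front and drops out of the chain, while early termination still caps how many docs the downstream collectors actually see:

```java
// Toy model of the collector chain described above; class and method names
// are illustrative, not the actual OpenSearch/Lucene classes.
public class TerminateAfterSketch {
    // Searches one segment and returns { reported total hits, docs that the
    // downstream collectors (e.g. an aggregation) actually collected }.
    static int[] search(int segmentMatches, int terminateAfter, boolean weightCountAvailable) {
        int totalHits = 0;
        int docsCollected = 0;
        if (weightCountAvailable) {
            // TotalHitCountCollector takes the O(1) weight.count() shortcut,
            // records the full segment count, and leaves the chain.
            totalHits = segmentMatches;
        }
        // EarlyTerminatingCollector still drives per-doc collection for the
        // remaining collectors, stopping once terminate_after is reached.
        for (int doc = 0; doc < segmentMatches; doc++) {
            if (docsCollected >= terminateAfter) {
                break; // early termination
            }
            docsCollected++;
            if (!weightCountAvailable) {
                totalHits++; // counted the slow way, one doc at a time
            }
        }
        return new int[] { totalHits, docsCollected };
    }

    public static void main(String[] args) {
        int[] withShortcut = search(21, 2, true);
        int[] withoutShortcut = search(21, 2, false);
        // With the shortcut: total hits reports 21, yet only 2 docs reached
        // the downstream collectors.
        System.out.println(withShortcut[0] + " hits / " + withShortcut[1] + " collected");
        // Without it: both numbers are capped by terminate_after.
        System.out.println(withoutShortcut[0] + " hits / " + withoutShortcut[1] + " collected");
    }
}
```

This reproduces the divergence discussed in the thread: the 21 vs. 2 split only appears when the O(1) count is available, which matches the observation that the flakiness depends on segment layout.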

I guess the question is: in the terminate_after + track_total_hits=true scenario, what is the point of returning this extra hit count above the terminate_after value? To the user it can be confusing, because somehow OS is able to find a total count of docs matching the query which is greater than terminate_after, while the aggs or any collector in the chain has only collected at most terminate_after documents. I tried with a sample test and here is the response:

{
  "took": 148625,
  "timed_out": false,
  "terminated_early": true,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": { "total": { "value": 21, "relation": "eq" }, "max_score": null, "hits": [] },
  "aggregations": {
    "histo": {
      "buckets": [
        { "key": 15.0, "doc_count": 1 },
        { "key": 16.0, "doc_count": 0 },
        { "key": 17.0, "doc_count": 0 },
        { "key": 18.0, "doc_count": 0 },
        { "key": 19.0, "doc_count": 0 },
        { "key": 20.0, "doc_count": 0 },
        { "key": 21.0, "doc_count": 1 }
      ]
    }
  }
}

@austintlee
Contributor

@sohami Did you run that query with size=0? Also, keep in mind that this specific instance of the problem doesn't always occur, so I am not sure what conclusion we can draw from the response you posted above...

I think we should document this behavior as best we can. I don't think anyone is proposing we make further code changes to change this behavior.

@msfroh
Collaborator

msfroh commented Oct 26, 2023

IMO, we should just clearly document that the purpose of terminate_after is to avoid iterating through too many documents -- it's essentially a way of saying "I'm willing to accept approximately-correct results in order to limit the cost of my query", not "I would like incorrect results and will be offended if you give me the correct answer efficiently".

@sohami
Collaborator

sohami commented Oct 26, 2023

@sohami Did you run that query with size = 0? Also, keep in mind that the specific instance of this problem doesn't always occur so I am not sure what conclusion we can draw from the response you posted above...

@austintlee Yes, the query was with size=0. I just updated the same query in the test to include the aggregation. It will not occur if the optimization to fetch the hit count via weight.count doesn't kick in, which depends upon the weights created for the query.

I think we should document this behavior as best we can. I don't think anyone is proposing we make further code changes to change this behavior.

That's what I am discussing here: whether we need to handle this case separately instead of just keeping the current behavior with proper documentation. Seems like the current alignment is to document it properly.

IMO, we should just clearly document that the purpose of terminate_after is to avoid iterating through too many documents

I guess then in the documentation we should be very clear that hits.count in the response is not controlled by terminate_after. The latter controls the number of matched documents collected/processed, whereas the former is the count of matching docs, and both pieces of information can be fetched independently for each shard?

@austintlee
Contributor

Later controls collecting/processing number of the matched document whereas former is count of matching docs and both the information can be fetched independently for each shard ?

"Independently" would be a mischaracterization, as they are still correlated.

How about:

The latter controls the number of matched documents collected/processed, whereas the former is the count of matching docs. The actual values of hits.count and terminate_after can differ when Weight#count, a sub-linear-time counting optimization, contributes a higher count than is needed to satisfy the counting requirement set by terminate_after. Put another way, OpenSearch can yield higher output (hits.count) while maintaining the same input cost (terminate_after) thanks to the aforementioned optimization.

@jed326
Collaborator Author

jed326 commented Nov 8, 2023

Apologies for the super delayed response here. I'm also in agreement that the best path forward is to document this difference. In addition to the reasons previously discussed, this "issue" has been present since OpenSearch 2.2.0 and there don't seem to have been any complaints, so adding an explanation to the docs seems sufficient.

I'm happy to close out this issue with PR #10836 from @austintlee -- I will create a docs issue to follow up on the documentation updates.
