Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force merge with only_expunge_deletes should honor max segment size #7644

Closed
msfroh opened this issue May 19, 2023 · 2 comments · Fixed by #10036
Closed

Force merge with only_expunge_deletes should honor max segment size #7644

msfroh opened this issue May 19, 2023 · 2 comments · Fixed by #10036
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing v2.12.0 Issues and PRs related to version 2.12.0

Comments

@msfroh
Copy link
Collaborator

msfroh commented May 19, 2023

Is your feature request related to a problem? Please describe.
There is longstanding logic that ignores the max segment size when doing any kind of forced merge, here https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/OpenSearchTieredMergePolicy.java#L59.

This makes sense when you have an index that stops receiving updates, has been switched to read-only, and you want to merge down to a single segment.

There are times, though, when someone just wants to periodically expunge deletes, because soft deletes mean that they have accumulated way too many deleted docs. As mentioned in e.g. the answer to this Stackoverflow question, this may result in segments that are larger than the max segment size, so they will never merge away deletes going forward (unless there are so many deletes that the segment could drop back below the max segment size).

Describe the solution you'd like
I would like a force merge with only_expunge_deletes to honor the max segment size.

The simple way to do that is to change this line to use regularMergePolicy instead of forcedMergePolicy.

Describe alternatives you've considered
I guess we could add another option to the force merge API to say "merge away deletes, but still honor the max segment size" (or even just add a general "honor the max segment size" argument).

In my opinion, only_expunge_deletes already implies that I just want to get rid of deletes and am not trying to merge to one big segment.

Additional context
N/A

@msfroh msfroh added enhancement Enhancement or improvement to existing feature or request untriaged labels May 19, 2023
@andrross andrross added the Indexing Indexing, Bulk Indexing and anything related to indexing label May 30, 2023
@msfroh msfroh self-assigned this Sep 13, 2023
@andrross
Copy link
Member

This makes sense when you have an index that stops receiving updates, has been switched to read-only, and you want to merge down to a single segment.

Does supplying max_num_segments=1 accomplish this explicitly?

Is there some other reason why the current behavior of only_expunge_deletes might be desirable? I agree with your analysis but it seems odd it wasn't implement the way you've proposed here from the beginning.

@msfroh
Copy link
Collaborator Author

msfroh commented Sep 15, 2023

Does supplying max_num_segments=1 accomplish this explicitly?

Yeah, specifying max_num_segments will really merge down to a single segment, ignoring the configured max segment size.

When Lucene made the changes to ensure that any force-merge would honor the max segment size constraint, it broke that behavior. That's why this composite merge policy was added -- to get back that old "merge down to a single segment regardless of max segment size" behavior.

Is there some other reason why the current behavior of only_expunge_deletes might be desirable? I agree with your analysis but it seems odd it wasn't implement the way you've proposed here from the beginning.

I could be mistaken, but I think there was a desire to just get back the old force-merge behavior, essentially undoing the Lucene change from https://issues.apache.org/jira/browse/LUCENE-7976.

In my opinion, this was wrong. I love the singleton-merging of large segments (to expunge deletes) that https://issues.apache.org/jira/browse/LUCENE-7976 introduced.

@msfroh msfroh added the v2.12.0 Issues and PRs related to version 2.12.0 label Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing v2.12.0 Issues and PRs related to version 2.12.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants