SOLR-16879: add queue for expensive admin operations #1761
Just some little comments but overall looks good
Once solr/CHANGES.txt is updated, I think it's ready to merge.
While testing this internally, I found a minor issue (will push a fix soon): the flag that marks an operation as expensive is not passed correctly.
Hi @dsmiley,
Any thoughts on how this bug slipped by our notice?
Just an idea I want to write down...
One way to look at this problem is that maybe "async" should only be used for potentially expensive things. If we agreed on this, then the essential change would only be to make CoreAdminHandler.parallelExecutor's thread pool configurable. Secondarily, anywhere within Solr that a core admin request is made, we would ask ourselves: is this cheap or expensive? Today, the SolrCloud commands (which in turn invoke these core commands) often use async or not based on the caller's choice (i.e. the user's), but (A) this is harder, sometimes requiring dual code paths, and (B) I think the commands know internally what's potentially expensive and need not use async internally just because async is used at a high level. In other words, just because a user chooses to create a collection in the async style doesn't mean we are compelled to use async internally for the core admin operations that complete the task.
@@ -506,30 +507,23 @@ private static NamedList<Object> waitForCoreAdminAsyncCallToComplete(
    }

    String r = (String) srsp.getSolrResponse().getResponse().get("STATUS");
    if (r.equals("running")) {
Why don't we change this to a switch statement? Even IntelliJ is recommending this :-) It seems an underlying bug here would have been averted had a switch statement been used, as it sort of forces you to consider all the enum values.
We aren't checking an enum here, but string values.
I often feel a switch statement on strings is error-prone and doesn't protect against regressions.
What do you think?
I'm suggesting the switch cases should be on the enum instances, not string literals.
I may be missing something. Do you suggest introducing a new enum for this? I don't think there is one for request statuses.
Here PENDING is just a constant string.
org.apache.solr.client.solrj.response.RequestStatusState, but it looks like I confused collection async status with core async status. Too bad they don't use the same enum!
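A minimal sketch of the switch-on-enum idea from this thread, for illustration only. `CoreAsyncStatus`, `fromKey`, and `AsyncStatusPoller` are hypothetical names and the string keys are assumptions; as noted above, core async status is currently handled as plain string constants, and `RequestStatusState` covers collection async status instead.

```java
// Hypothetical enum for core async statuses; not part of Solr.
enum CoreAsyncStatus {
  RUNNING("running"),
  COMPLETED("completed"),
  FAILED("failed"),
  NOT_FOUND("notfound");

  private final String key;

  CoreAsyncStatus(String key) {
    this.key = key;
  }

  // Parse the raw STATUS string once, at the boundary, and fail on unknown values.
  static CoreAsyncStatus fromKey(String raw) {
    for (CoreAsyncStatus s : values()) {
      if (s.key.equals(raw)) {
        return s;
      }
    }
    throw new IllegalArgumentException("Unknown async status: " + raw);
  }
}

final class AsyncStatusPoller {
  // Returns true when the caller should keep polling for completion.
  static boolean shouldKeepWaiting(String rawStatus) {
    CoreAsyncStatus status = CoreAsyncStatus.fromKey(rawStatus);
    switch (status) {
      case RUNNING:
        return true;
      case COMPLETED:
      case FAILED:
      case NOT_FOUND:
        return false;
      default:
        // A status added later fails loudly here instead of being silently ignored.
        throw new IllegalStateException("Unhandled status: " + status);
    }
  }
}
```

With this shape, a status string that nobody anticipated is rejected in `fromKey` rather than falling through a chain of `equals` checks.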
I read through this. I'm not really familiar with any of this code, but the change looks good to me. Another one for my personal understanding: when the status of a task is reported as
I was unable to write a unit test that actually performs an expensive operation (backup or restore). The low-level queue mechanism is well tested, but I'm not sure how to write an end-to-end test with expensive operations at scale.
Yes, that would be another option.
You're correct, RUNNING status means 'sent to executor'. I added the
I like Alex's idea of a second thread pool / executor; I think it would mean we would not need the explicit expensiveTaskQueue that is fiddly to work with and could rely on the more familiar Executor (that has an impl using a queue but it's not our code to worry about). But I don't have a strong opinion.
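A rough sketch of the second-executor alternative mentioned above, under stated assumptions: `TwoPoolAdminExecutor` and its parameters are hypothetical names, not code from this PR. It relies on `Executors.newFixedThreadPool`, which queues excess tasks internally, so no hand-written queue is needed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper: one pool for regular async tasks, one small pool for expensive ones.
final class TwoPoolAdminExecutor {
  private final ExecutorService regularPool;
  private final ExecutorService expensivePool;

  TwoPoolAdminExecutor(int regularThreads, int expensiveThreads) {
    this.regularPool = Executors.newFixedThreadPool(regularThreads);
    // A fixed pool runs at most expensiveThreads tasks at once;
    // further submissions wait in the pool's own internal queue.
    this.expensivePool = Executors.newFixedThreadPool(expensiveThreads);
  }

  // Route the task to the pool that matches its cost.
  Future<?> submit(Runnable task, boolean expensive) {
    return (expensive ? expensivePool : regularPool).submit(task);
  }

  void shutdown() {
    regularPool.shutdown();
    expensivePool.shutdown();
  }
}
```

The trade-off is that the limit on expensive operations becomes the size of the second pool, and queued work is held by the JDK executor rather than by handler code.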
Closed in favor of #1864
https://issues.apache.org/jira/browse/SOLR-16879
Description
Solution
Add queue for expensive admin operations
In CoreAdminHandler, we count the number of in-flight expensive operations. If more than the limit (currently 5 by default) are already in-flight, we don't submit any new ones to the thread pool, but we add them into a queue.
Tests
Added in CoreAdminHandlerTest
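The queueing described under Solution could look roughly like the sketch below. This is an illustration under assumptions, not the PR's actual code: `ExpensiveTaskLimiter` and its methods are hypothetical names, and the limit is passed in by the caller (the description mentions a default of 5).

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Hypothetical limiter: caps in-flight expensive tasks and queues the overflow.
final class ExpensiveTaskLimiter {
  private final Executor executor;
  private final int maxInFlight;
  private final Queue<Runnable> waiting = new ArrayDeque<>();
  private int inFlight = 0;

  ExpensiveTaskLimiter(Executor executor, int maxInFlight) {
    this.executor = executor;
    this.maxInFlight = maxInFlight;
  }

  // Hand the task to the executor if a slot is free; otherwise park it in the queue.
  synchronized void submit(Runnable task) {
    if (inFlight < maxInFlight) {
      inFlight++;
      executor.execute(wrap(task));
    } else {
      waiting.add(task);
    }
  }

  // Ensure the slot is released (or handed over) even if the task throws.
  private Runnable wrap(Runnable task) {
    return () -> {
      try {
        task.run();
      } finally {
        onTaskDone();
      }
    };
  }

  private synchronized void onTaskDone() {
    Runnable next = waiting.poll();
    if (next != null) {
      // The finished task's slot passes to the next queued task, so inFlight is unchanged.
      executor.execute(wrap(next));
    } else {
      inFlight--;
    }
  }

  synchronized int inFlight() {
    return inFlight;
  }

  synchronized int queued() {
    return waiting.size();
  }
}
```

For example, submitting seven tasks to `new ExpensiveTaskLimiter(pool, 5)` hands five to the pool and leaves two queued until a running task finishes.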