SOLR-16871: Fix for duplicated replica added from first coordinator node #1794
https://issues.apache.org/jira/browse/SOLR-16871
Description
PR #1762 fixes various race conditions for the coordinator node. One of the fixes restricts the name of the synthetic core to ensure that at most 1 core is created per coordinator node.
Unfortunately, unit tests still failed because 2 cores could be added for the first coordinator node: the 1st from collection creation with the default naming scheme, and the 2nd from the addReplica call with a different name.
The failure shows up in TestCoordinatorRole#testConcurrentAccess, which is supposed to create only 4 cores, one for each of the 4 coordinator nodes.
Solution
Instead of restricting the core name, which is hard to get right, perhaps we can synchronize the replica creation block. This block should rarely be called: only once per collection, on the first query after node start. The replica creation itself is even less frequent: it happens only the very first time a coordinator node encounters a new config. So I think it's probably better to simply synchronize the block.
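To illustrate the idea, here is a minimal sketch of a synchronized check-then-create block. It is not the code in this PR: the class `SyntheticReplicaSketch`, the methods `ensureReplica` and `replicaCount`, the lock object, and the core naming are all hypothetical, and an in-memory map stands in for the real cluster state. It only shows why holding one lock across both the "does this node already have a core?" check and the creation keeps concurrent callers from each adding a core.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the coordinator's synthetic collection state; not Solr's API.
class SyntheticReplicaSketch {
  // node name -> names of the synthetic cores hosted on that node
  private final Map<String, List<String>> replicasByNode = new HashMap<>();
  private final Object replicaLock = new Object();
  private int nextId = 1;

  /** Returns the node's core name, creating a core only if the node has none yet. */
  String ensureReplica(String nodeName) {
    // The check and the creation happen under the same lock, so two threads
    // cannot both observe "no core yet" and both create one.
    synchronized (replicaLock) {
      List<String> cores = replicasByNode.computeIfAbsent(nodeName, n -> new ArrayList<>());
      if (cores.isEmpty()) {
        cores.add("synthetic_core_" + nextId++); // assumed naming, for illustration only
      }
      return cores.get(0);
    }
  }

  int replicaCount(String nodeName) {
    synchronized (replicaLock) {
      List<String> cores = replicasByNode.get(nodeName);
      return cores == null ? 0 : cores.size();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SyntheticReplicaSketch state = new SyntheticReplicaSketch();
    int threads = 16;
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
      String node = "node" + (i % 4); // 4 simulated coordinator nodes
      pool.submit(() -> {
        try {
          start.await(); // release all threads at once to provoke contention
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        state.ensureReplica(node);
      });
    }
    start.countDown();
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    for (int n = 0; n < 4; n++) {
      System.out.println("node" + n + " replicas: " + state.replicaCount("node" + n));
    }
  }
}
```

Because the block is entered so rarely, a single coarse lock like this should cost very little, which is the trade-off argued for above.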
Tests
Re-ran the test case 10 times and confirmed that all runs passed:
./gradlew :solr:core:beast -Ptests.dups=10 --tests "org.apache.solr.search.TestCoordinatorRole.testConcurrentAccess"