[Enhancement] only call getAliveComputeNodes once per OlapScanNode (backport #52168) #52267
Why I'm doing:
I found some queries which were slow (order of 3-5 seconds) and which were bottlenecked in the frontend. Their query profiles indicated that much of the query execution time was spent planning. I did some jstack profiling of the frontends while sending this type of query; see jstack_example9.txt for an example of the profile. The takeaway is that the large majority of threads were busy doing `WarehouseManager.getAliveComputeNodes` from `OlapScanNode.addScanRangeLocations`, just to check if there are any living compute nodes. This is done once per `PhysicalPartition`, even though the check for living CN is not parameterized by anything other than warehouse id. This is wasteful and seriously slow when there are large partition/tablet counts. We can eliminate this bottleneck.

What I'm doing:
Ensuring that `getAliveComputeNodes` is called once per instance of `OlapScanNode` (once per query).

This can be seen as a follow up to #46913
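
To illustrate the idea, here is a minimal, self-contained sketch of caching the liveness check on the scan node so it runs once instead of once per partition. It is not the code from this PR: `FakeWarehouseManager`, `FakeScanNode`, the `hasAliveNodes` field, and the method signatures are hypothetical stand-ins, and only the names `getAliveComputeNodes` and `addScanRangeLocations` are borrowed from the description above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ScanPlanningSketch {
    /** Hypothetical stand-in for WarehouseManager; counts how often the lookup runs. */
    static class FakeWarehouseManager {
        final AtomicInteger calls = new AtomicInteger();
        private final List<Long> aliveNodeIds;

        FakeWarehouseManager(List<Long> aliveNodeIds) {
            this.aliveNodeIds = aliveNodeIds;
        }

        List<Long> getAliveComputeNodes(long warehouseId) {
            calls.incrementAndGet();
            return aliveNodeIds;
        }
    }

    /** Hypothetical stand-in for a scan node that plans one scan range per partition. */
    static class FakeScanNode {
        private final FakeWarehouseManager manager;
        private final long warehouseId;
        // Assumption: the result depends only on the warehouse id, so it can be
        // computed lazily once and reused for every partition of this node.
        private Boolean hasAliveNodes;

        FakeScanNode(FakeWarehouseManager manager, long warehouseId) {
            this.manager = manager;
            this.warehouseId = warehouseId;
        }

        private boolean hasAliveNodes() {
            if (hasAliveNodes == null) {
                hasAliveNodes = !manager.getAliveComputeNodes(warehouseId).isEmpty();
            }
            return hasAliveNodes;
        }

        /** Returns the number of partitions planned; fails if no compute node is alive. */
        int addScanRangeLocations(List<String> partitions) {
            int planned = 0;
            for (String partition : partitions) {
                if (!hasAliveNodes()) {
                    throw new IllegalStateException(
                            "no alive compute node in warehouse " + warehouseId
                                    + " while planning partition " + partition);
                }
                planned++;
            }
            return planned;
        }
    }

    public static void main(String[] args) {
        FakeWarehouseManager manager = new FakeWarehouseManager(List.of(1L, 2L));
        FakeScanNode node = new FakeScanNode(manager, 0L);
        int planned = node.addScanRangeLocations(List.of("p1", "p2", "p3"));
        System.out.println(planned + " partitions planned, "
                + manager.calls.get() + " lookup(s)");
    }
}
```

In this sketch the lookup count stays at one no matter how many partitions are passed in, which is the property the change is after. The trade-off is that the node will not notice a compute node dying partway through planning, which seems acceptable for a check scoped to a single query.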
Fixes #issue
What type of PR is this:
This is a fix for a performance issue, which I'll call an enhancement.
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
This is an automatic backport of pull request #52168 done by [Mergify](https://mergify.com).