Improving VM conversion performance. #18957
Merged
+261
−185
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The major change here is using a precomputed import table in VM conversion patterns. This removes the symbol lookup that was happening on each call. In models with 100k calls to imports this speeds things up a lot.
Also squashed a few more perf issues involving symbol lookups while profiling and made some passes that could nest on function-like ops do so.
These changes drop VM translation of the 405b model from 3.5mins to ~1.5min. Disabling verification (
-verify-each=0
to iree-opt or-verify=false
to iree-compile) takes it to 1min.Remaining work is mostly around parallelizing some passes that are not trivially parallelizable (FoldGlobals, DropUnusedCalls, etc) and parallelizing some analysis (Explorer global init, call graph walking) that tends to get real expensive when there are 250k calls and 500k ops. Any place that does a symbol use walk is going to suffer. Many of these fixes are in our code but there's several upstream components that fall over with this amount of IR (CallGraph, DataFlowSolver, the verifier, etc).