
Improving VM conversion performance. #18957

Merged: 3 commits merged into main on Oct 30, 2024

Conversation

benvanik (Collaborator) commented:

The major change here is using a precomputed import table in the VM conversion patterns, which removes the symbol lookup that was happening on each call. In models with 100k calls to imports this is a substantial speedup.
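
Roughly, the pattern-side change looks like the following minimal sketch. It uses upstream MLIR types only; `ImportTable`, `buildImportTable`, and `ConvertCallToImport` are illustrative names rather than the actual IREE code:

```c++
// Sketch: build the name -> import mapping once per module, then hand it to the
// conversion patterns so each rewritten call is a hash lookup instead of a
// SymbolTable::lookupNearestSymbolFrom() walk.
#include "llvm/ADT/DenseMap.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Transforms/DialectConversion.h"

using ImportTable = llvm::DenseMap<mlir::StringAttr, mlir::func::FuncOp>;

// Built once before the conversion runs: one walk over the module instead of
// one symbol lookup per call site.
static ImportTable buildImportTable(mlir::ModuleOp module) {
  ImportTable table;
  for (auto funcOp : module.getOps<mlir::func::FuncOp>()) {
    if (funcOp.isExternal())  // externals stand in for vm.import in this sketch
      table[funcOp.getSymNameAttr()] = funcOp;
  }
  return table;
}

struct ConvertCallToImport
    : public mlir::OpConversionPattern<mlir::func::CallOp> {
  ConvertCallToImport(mlir::MLIRContext *context,
                      const ImportTable &importTable)
      : OpConversionPattern(context), importTable(importTable) {}

  mlir::LogicalResult matchAndRewrite(
      mlir::func::CallOp callOp, OpAdaptor adaptor,
      mlir::ConversionPatternRewriter &rewriter) const override {
    // Previously: a symbol lookup relative to callOp on every application.
    auto it = importTable.find(callOp.getCalleeAttr().getAttr());
    if (it == importTable.end())
      return rewriter.notifyMatchFailure(callOp, "callee is not an import");
    // The real pattern would emit the VM import call here; rebinding the call
    // to the resolved op keeps the sketch self-contained.
    rewriter.replaceOpWithNewOp<mlir::func::CallOp>(callOp, it->second,
                                                    adaptor.getOperands());
    return mlir::success();
  }

  const ImportTable &importTable;  // owned by the pass; outlives the patterns
};
```

Keying the map by `StringAttr` makes each lookup a hash of a uniqued pointer, and the table is built once in the pass and shared by reference across all pattern applications.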

While profiling I also squashed a few more perf issues involving symbol lookups, and made some passes that could nest on function-like ops do so, which lets the pass manager run them in parallel across functions.
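
For the pass-nesting part, the general MLIR mechanism is to anchor a pass on function-like ops so the pass manager can schedule the bodies concurrently. A sketch using an upstream pass for illustration (not the specific IREE passes that were changed):

```c++
// Sketch: a pass anchored on function-like ops lets mlir::PassManager run one
// instance per function concurrently, instead of a single serial run over the
// whole module.
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/Passes.h"

void buildExamplePipeline(mlir::PassManager &pm) {
  // Instead of a module-scoped pass that a single thread walks serially...
  //   pm.addPass(mlir::createCanonicalizerPass());
  // ...anchor it on function-like ops so each body is processed in parallel:
  pm.addNestedPass<mlir::func::FuncOp>(mlir::createCanonicalizerPass());
}
```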

These changes drop VM translation of the 405b model from ~3.5min to ~1.5min. Disabling verification (`-verify-each=0` to iree-opt or `-verify=false` to iree-compile) takes it to 1min.

Remaining work is mostly around parallelizing some passes that are not trivially parallelizable (FoldGlobals, DropUnusedCalls, etc.) and parallelizing some analyses (Explorer global init, call graph walking) that get really expensive when there are 250k calls and 500k ops. Any place that does a symbol use walk is going to suffer. Many of these fixes are in our code, but there are several upstream components that fall over with this amount of IR (CallGraph, DataFlowSolver, the verifier, etc.).
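
On the symbol-use-walk point, the usual mitigation is to batch the queries with a cached map rather than re-walking the IR per symbol. A minimal sketch using upstream MLIR's `SymbolUserMap`; the erase-unused-private-functions logic is only for illustration, not the actual FoldGlobals/DropUnusedCalls code:

```c++
// Sketch: with ~250k calls, SymbolTable::getSymbolUses() per symbol re-walks
// the module every time (roughly O(symbols * ops)). SymbolUserMap walks once
// and caches users, so each query afterwards is cheap.
#include "llvm/ADT/STLExtras.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/SymbolTable.h"

void eraseUnusedPrivateFuncs(mlir::ModuleOp module) {
  mlir::SymbolTableCollection symbolTables;
  mlir::SymbolUserMap userMap(symbolTables, module);  // single walk, cached
  for (auto funcOp :
       llvm::make_early_inc_range(module.getOps<mlir::func::FuncOp>())) {
    // Cheap per-symbol query instead of another full symbol-use walk.
    if (funcOp.isPrivate() && userMap.getUsers(funcOp).empty())
      funcOp.erase();
  }
}
```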

@benvanik added the compiler/dialects (Relating to the IREE compiler dialects (flow, hal, vm)) and performance ⚡ (Performance/optimization related work across the compiler and runtime) labels on Oct 30, 2024
@ScottTodd (Member) commented:

👀 tagging #11994 on this, and the notes about verification also point to #12095. Thanks for improving compilation time!

@ScottTodd (Member) left a comment:

LGTM when tests pass

@benvanik marked this pull request as ready for review on October 30, 2024 at 23:12
@benvanik merged commit 2ec9017 into main on Oct 30, 2024
39 checks passed
@benvanik deleted the users/benvanik/vm-conversion-perf branch on October 30, 2024 at 23:18