
Move more pass from Flow stage to GlobalOptimization stage. #14707

Merged: 14 commits merged into iree-org:main from the flow-shuffle branch on Aug 28, 2023

Conversation

hanhanW (Contributor) commented on Aug 16, 2023:

- Move four more passes to the GlobalOptimization stage (a sketch of the resulting pipeline shape follows below):
  - ConvertElementwiseToLinalgPass
  - GeneralizeLinalgNamedOpsPass
  - FuseDequantizationMatmulPass
  - FoldUnitExtentDimsPass
- Move the Flow transformation_pipeline.mlir test to GlobalOptimization/test. It mainly tests the ConvertElementwiseToLinalg pass, which is also tested upstream, so we can probably remove it as a follow-up.
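A minimal sketch of that pipeline shape after the move. The pass names come from the list above; the builder name and factory signatures are illustrative assumptions for this sketch, not the literal IREE source:

```cpp
#include <memory>

#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"

// Forward declarations for the four moved passes. The pass names come from
// the PR description; these factory signatures are assumed for the sketch.
std::unique_ptr<mlir::Pass> createConvertElementwiseToLinalgPass();
std::unique_ptr<mlir::Pass> createGeneralizeLinalgNamedOpsPass();
std::unique_ptr<mlir::Pass> createFuseDequantizationMatmulPass();
std::unique_ptr<mlir::Pass> createFoldUnitExtentDimsPass();

// The four passes now run during GlobalOptimization, before the Flow stage
// begins, so Flow sees already-generalized, unit-dim-folded linalg ops.
void buildGlobalOptimizationPassPipeline(mlir::OpPassManager &pm) {
  pm.addPass(createConvertElementwiseToLinalgPass());
  pm.addPass(createGeneralizeLinalgNamedOpsPass());
  pm.addPass(createFuseDequantizationMatmulPass());
  pm.addPass(createFoldUnitExtentDimsPass());
}
```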

hanhanW added labels on Aug 16, 2023: infrastructure/benchmark (Relating to benchmarking infrastructure), benchmarks:cuda (Run default CUDA benchmarks), benchmarks:x86_64 (Run default x86_64 benchmarks), benchmarks:comp-stats (Run default compilation statistics benchmarks), benchmarks:android-cpu (Run default Android CPU benchmarks), benchmarks:android-gpu (Run default Android GPU benchmarks)
github-actions bot commented on Aug 16, 2023:

Abbreviated Benchmark Summary

@ commit 96335ad4164fa6c47702f65a2f51fde33430d79f (vs. base 31a51206afaccd6293c1e7a77e5b9c2ebdcaff7e)

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV2_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags] local_task(embedded_elf)[4-thread,full-inference,system-scheduling] with zeros @ pixel-4[big-core] | 14.671 (vs. 13.529, 8.44%↑) | 14.064 | 1.334 |
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with zeros @ pixel-6-pro[gpu] | 78.804 (vs. 74.269, 6.11%↑) | 78.777 | 0.563 |

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags] local_sync(embedded_elf)[full-inference,default-flags] with zeros @ pixel-6-pro[little-core] | 65.093 (vs. 73.163, 11.03%↓) | 65.090 | 0.030 |
| MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local_task(vmvx_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-4[big-core] | 5101.562 (vs. 5687.836, 10.31%↓) | 5139.404 | 146.600 |
| MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local_task(vmvx_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-4[big-core] | 991.911 (vs. 1090.495, 9.04%↓) | 988.962 | 17.841 |

[Top 3 out of 10 results shown]

Improved Stream IR Dispatch Count (# of cmd.dispatch ops) 🎉

| Benchmark Name | Stream IR Dispatch Count (# of cmd.dispatch ops) |
| --- | --- |
| Unet2dPT(linalg) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] | 958 (vs. 980, 2.24%↓) |

For more information:

Source Workflow Run

hanhanW marked this pull request as ready for review on August 16, 2023 21:41
hanhanW (Contributor, Author) commented on Aug 16, 2023:

The results are interesting... overall they look positive to me.

MaheshRavishankar (Contributor) commented:
I'd try to triage the difference in the number of dispatches created.

MaheshRavishankar (Contributor) left a review comment:

I think this needs a bit of triage on the number of dispatches created.

hanhanW (Contributor, Author) commented on Aug 17, 2023:

Agreed that we need more investigation; I was just trying to see what happens in this case. I'd like to scope this down to moving SetEncoding to the FlowPreprocessing stage. (The investigation can happen when we work on improving the const-eval heuristic. At that point we will need to do some basic fusion and move some passes to the preprocessing stage.)

@hanhanW hanhanW changed the title [Flow] Move more passes to FlowPreprocessing stage. [Flow] Move SetEncoding pass to FlowPreprocessing stage. Aug 18, 2023
```diff
-      .addPass(IREE::Flow::createConvert1X1FilterConv2DToMatmulPass);
-  passManager.addPass(IREE::Flow::createEraseUnusedLinalgOperands());
+      .addPass(IREE::Flow::createConvert1X1FilterConv2DToMatmulPass)
+      .addPredicatedPass(clEnableDataTiling, createSetEncodingPass);

   // Start of Flow pipeline, verify input legality.
   passManager.addPass(IREE::Flow::createVerifyInputLegalityPass());
```
A review comment on this diff (Contributor):

Do you want to move this also before the preprocessing passes?
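For context on the `.addPredicatedPass(clEnableDataTiling, createSetEncodingPass)` call in the diff above, a minimal sketch of what such a helper might look like; this wrapper class is hypothetical, not IREE's actual utility:

```cpp
#include <functional>
#include <memory>

#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"

// Hypothetical fluent wrapper around OpPassManager: each pass is added only
// when its predicate (typically a command-line flag) is true.
class PredicatedPassManager {
public:
  explicit PredicatedPassManager(mlir::OpPassManager &pm) : pm(pm) {}

  PredicatedPassManager &
  addPass(std::function<std::unique_ptr<mlir::Pass>()> create) {
    pm.addPass(create());
    return *this;
  }

  PredicatedPassManager &
  addPredicatedPass(bool enable,
                    std::function<std::unique_ptr<mlir::Pass>()> create) {
    // Skip the pass entirely when the flag is off, keeping the pipeline
    // declaration readable as a single chained expression.
    if (enable)
      pm.addPass(create());
    return *this;
  }

private:
  mlir::OpPassManager &pm;
};
```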

@hanhanW hanhanW changed the title [Flow] Move SetEncoding pass to FlowPreprocessing stage. Move more pass from Flow stage to GlobalOptimization stage. Aug 24, 2023
hanhanW (Contributor, Author) commented on Aug 28, 2023:

@MaheshRavishankar We can't move the raising-special-ops pass before global optimization right now, because it introduces a huge regression on CUDA; something falls back to sequential computation. So far we can move a few passes to the global optimization phase. Please take a look and see if it is okay to move them there. It helps us enable const-eval for data tiling (i.e., #14792) because we won't need to handle rank-reduced cases: we can run FoldUnitExtentDimsPass before the SetEncoding pass (see the ordering sketch below).
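A small ordering sketch of that last point, with assumed factory names and an assumed helper function (this is not the literal pipeline code):

```cpp
#include <memory>

#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"

// Assumed factories, named after the passes discussed above.
std::unique_ptr<mlir::Pass> createFoldUnitExtentDimsPass();
std::unique_ptr<mlir::Pass> createSetEncodingPass();

// Folding unit-extent dims before SetEncoding means the encoding logic (and
// const-eval for data tiling) never has to handle rank-reduced tensors.
void addDataTilingPasses(mlir::OpPassManager &pm, bool enableDataTiling) {
  pm.addPass(createFoldUnitExtentDimsPass());
  if (enableDataTiling)
    pm.addPass(createSetEncodingPass());
}
```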

benvanik (Collaborator) commented:
(wonder if this is CUDA transform dialect matchers or some other CUDA-specific patterns special cased on rank or something?)

hanhanW (Contributor, Author) commented on Aug 28, 2023:

> (wonder if this is CUDA transform dialect matchers or some other CUDA-specific patterns special cased on rank or something?)

I don't know at this moment. :) My take is that the RaiseSpecialOps pass is preprocessing for fusion, so if we move it before const-eval, const-eval could break the behavior: it hoists some ops into globals, which breaks up the graph, and then the matchers no longer work. It is one of the passes that should run right before fusion.

qedawkins (Contributor) commented:
Can we run RaiseSpecialOps in multiple places? I could see it being worth running both before and after FoldUnitExtentDims/other passes.

benvanik (Collaborator) commented:
RaiseSpecialOps indeed would be useful to run at various stages, including possibly as part of a fixed point iteration around many such passes.
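A rough sketch of what a fixed-point loop around such passes could look like, assuming convergence is detected by comparing printed IR snapshots (a real implementation would use something cheaper, such as an IR fingerprint):

```cpp
#include <string>

#include "llvm/Support/raw_ostream.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Support/LogicalResult.h"

// Print the module to a string so two snapshots can be compared. This is an
// expensive but simple convergence check, for illustration only.
static std::string snapshot(mlir::ModuleOp module) {
  std::string ir;
  llvm::raw_string_ostream os(ir);
  module->print(os);
  os.flush();
  return ir;
}

// Run the given pipeline repeatedly until the IR stops changing, with an
// iteration cap to guard against oscillating pattern sets.
mlir::LogicalResult runToFixedPoint(mlir::PassManager &pm,
                                    mlir::ModuleOp module,
                                    int maxIterations = 8) {
  std::string before = snapshot(module);
  for (int i = 0; i < maxIterations; ++i) {
    if (mlir::failed(pm.run(module)))
      return mlir::failure();
    std::string after = snapshot(module);
    if (after == before)
      return mlir::success(); // converged
    before = std::move(after);
  }
  return mlir::success(); // cap reached; accept the current IR
}
```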

qedawkins (Contributor) commented:
Cool, that matches my thinking about the pass as well. It should basically contain patterns that are always worth applying; we just don't know exactly when they will apply.

MaheshRavishankar (Contributor) commented:
I'd rather wait for things to stabilize before we run them multiple times...

MaheshRavishankar (Contributor) left a review comment:

Ok, this is fine for now.

hanhanW (Contributor, Author) commented on Aug 28, 2023:

I think we can run it multiple times if we know where we want to apply them, but I'm not going to study whether it can go in multiple places in this PR. The intention here is to move more passes to the global optimization phase where it makes sense.

hanhanW merged commit afa74de into iree-org:main on Aug 28, 2023
58 checks passed
hanhanW deleted the flow-shuffle branch on August 28, 2023 23:43
stellaraccident (Collaborator) commented:

Note that this increased the amount of constant folding by 20-30x on llama2: it seems to be mostly memorizing a bunch of collapse_shapes. We may want to deny-list collapse_shape as a constexpr leaf node, because there is seldom any real value in const-evaling a metadata change.

It does not seem to have made any significant change to latency, either at runtime or in compile time (except for one compile-time outlier that I assume was a fluke).
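A hedged sketch of that deny-list idea, assuming a policy hook in the const-eval hoisting heuristic that decides whether an op is worth materializing as a constant (the hook name is made up for this example):

```cpp
#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/IR/Operation.h"

// Hypothetical hoisting-policy hook: metadata-only reshapes are essentially
// free at runtime, so const-evaluating them just duplicates the underlying
// constant in memory without saving any real work.
bool isWorthConstEvaling(mlir::Operation *op) {
  if (mlir::isa<mlir::tensor::CollapseShapeOp, mlir::tensor::ExpandShapeOp>(op))
    return false;
  return true;
}
```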
