[DNS] Test batch mmt4d #14542

pzread · 2023-08-01T18:26:39Z

No description provided.

github-actions · 2023-08-09T19:42:39Z

Abbreviated Benchmark Summary

@ commit 6c18616a89ce611893e00b192f672268817f64fc (vs. base 62f52876418b68945c4c75c763e8f4fcd35a9276)

Improved Latencies 🎉

Benchmark Name	Average Latency (ms)	Median Latency (ms)	Latency Standard Deviation (ms)
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu]	20.780 (vs. 23.142, 10.21%↓)	20.783	0.068

Regressed Stream IR Dispatch Count (# of cmd.dispatch ops) 🚩

Benchmark Name	Stream IR Dispatch Count (# of cmd.dispatch ops)
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel,compile-stats]	823 (vs. 751, 9.59%↑)
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel,compile-stats]	416 (vs. 380, 9.47%↑)

For more information:

Source Workflow Run

pzread · 2023-08-11T16:04:26Z

Initial results on large models running with data tiling + ukernel, after vs. before this change. Each benchmark ran on c2-standard-60 for 50 iterations.

We see slight regressions on the BertLarge and T5 models. I'll need to look deeper to find the root causes.

The ResNet50 VMFBs are identical before and after the change, as expected, so their values can be treated as a reference for the stability of the VM.

Benchmark Name	Average Latency (ms)	Median Latency (ms)	Latency Standard Deviation (ms)
T5LargeTFBatch32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu]	36637.602 (vs. 33026.190, 10.93%↑)	36617.765	106.204
T5LargeTFBatch16(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu]	18432.568 (vs. 16764.591, 9.95%↑)	18414.192	56.613
T5LargeTFBatch1(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu]	1839.990 (vs. 1737.543, 5.90%↑)	1839.125	7.967
BertLargeTFBatch64(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu]	22836.697 (vs. 21648.951, 5.49%↑)	22822.126	77.858
BertLargeTFBatch32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu]	12265.789 (vs. 11760.119, 4.30%↑)	12249.995	50.335
BertLargeTFBatch1(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu]	455.370 (vs. 434.912, 4.70%↑)	454.843	3.093
Resnet50TFBatch128(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu]	3637.981 (vs. 3556.597, 2.29%↑)	3631.393	32.384
Resnet50TFBatch64(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu]	1853.184 (vs. 1812.858, 2.22%↑)	1848.911	25.854
Resnet50TFBatch1(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu]	34.068 (vs. 33.796, 0.80%↑)	33.869	0.478
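For anyone re-checking the numbers, here is a minimal sketch of how the percentages in the table can be recomputed and compared against the ResNet50 rows. The helper name `pct_change` and the idea of using the largest ResNet50 delta as a rough noise floor are assumptions made for illustration; they are not part of the benchmark tooling.

```python
def pct_change(new: float, base: float) -> float:
    """Relative change of `new` vs. `base` in percent (positive = slower)."""
    return (new - base) / base * 100.0


# (after, before) average latencies in ms, copied from the table above.
results = {
    "T5LargeTFBatch32": (36637.602, 33026.190),
    "BertLargeTFBatch1": (455.370, 434.912),
    "Resnet50TFBatch128": (3637.981, 3556.597),
    "Resnet50TFBatch64": (1853.184, 1812.858),
    "Resnet50TFBatch1": (34.068, 33.796),
}

changes = {name: pct_change(new, base) for name, (new, base) in results.items()}

# Assumption: the ResNet50 VMFBs are identical, so their largest delta is
# treated here as a rough run-to-run noise floor.
noise_floor = max(
    abs(delta) for name, delta in changes.items() if name.startswith("Resnet50")
)

for name, delta in changes.items():
    verdict = "above noise floor" if abs(delta) > noise_floor else "within noise floor"
    print(f"{name}: {delta:+.2f}% ({verdict})")
```

Under this reading, the ResNet50 rows alone move by a bit over 2%, so only deltas clearly larger than that would count as regressions worth investigating.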


pzread force-pushed the batch-matmul-test branch from a1b2ae4 to d6b0730 Compare August 9, 2023 18:17

pzread added the benchmarks:x86_64 Run default x86_64 benchmarks label Aug 9, 2023

pzread force-pushed the batch-matmul-test branch from d6b0730 to bcc14d2 Compare August 9, 2023 19:04

pzread mentioned this pull request Aug 11, 2023

Support data tiling + microkernels for batch_matmul #14431

Closed

pzread force-pushed the batch-matmul-test branch 5 times, most recently from 210f7a4 to f187f1f Compare August 15, 2023 22:57

pzread removed the benchmarks:x86_64 Run default x86_64 benchmarks label Aug 15, 2023

pzread force-pushed the batch-matmul-test branch 11 times, most recently from 7159e5d to e11c6b4 Compare August 17, 2023 22:48

pzread force-pushed the batch-matmul-test branch 3 times, most recently from 1c4e61a to 588591b Compare August 23, 2023 08:11

pzread added the benchmarks:x86_64 Run default x86_64 benchmarks label Aug 23, 2023

pzread force-pushed the batch-matmul-test branch 3 times, most recently from 9cc4914 to 8e7011a Compare August 23, 2023 21:03

pzread force-pushed the batch-matmul-test branch 3 times, most recently from 46c4e62 to 3726959 Compare August 28, 2023 20:38

Jerry Wu added 2 commits August 29, 2023 04:58

Set encoding for batch_matmul

23202a2

Drop unit dims

a0da52a

pzread force-pushed the batch-matmul-test branch from ab75999 to f49d76e Compare August 29, 2023 05:45

Check LHS batch pack

a170aa5

pzread force-pushed the batch-matmul-test branch from f49d76e to a170aa5 Compare August 29, 2023 07:22

Revert "Drop unit dims"

4cc440b

This reverts commit a0da52a.

pzread removed the benchmarks:x86_64 Run default x86_64 benchmarks label Aug 30, 2023

pzread force-pushed the batch-matmul-test branch from 68400ea to 857af86 Compare August 30, 2023 19:17

Add bench scripts

4550842

pzread force-pushed the batch-matmul-test branch from 857af86 to 4550842 Compare August 30, 2023 19:23

pzread closed this Sep 1, 2023
