[DNS] Test batch mmt4d #14542

Closed
wants to merge 5 commits into from

pzread
Contributor

@pzread pzread commented Aug 1, 2023

No description provided.

@pzread pzread added the benchmarks:x86_64 Run default x86_64 benchmarks label Aug 9, 2023
@github-actions

github-actions bot commented Aug 9, 2023

Abbreviated Benchmark Summary

@ commit 6c18616a89ce611893e00b192f672268817f64fc (vs. base 62f52876418b68945c4c75c763e8f4fcd35a9276)

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 20.780 (vs. 23.142, 10.21%↓) | 20.783 | 0.068 |

Regressed Stream IR Dispatch Count (# of cmd.dispatch ops) 🚩

| Benchmark Name | Stream IR Dispatch Count (# of cmd.dispatch ops) |
| --- | --- |
| BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel,compile-stats] | 823 (vs. 751, 9.59%↑) |
| MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel,compile-stats] | 416 (vs. 380, 9.47%↑) |

For more information:

Source Workflow Run

@pzread
Contributor Author

pzread commented Aug 11, 2023

Initial results for large models running with data tiling + ukernel on c2-standard-60, after vs. before this change. Each benchmark ran for 50 iterations.

We see slight regressions on the BertLarge and T5 models. I'll need to look deeper to find the root causes.

The ResNet50 VMFBs are identical before and after the change, as expected, so their values can serve as a reference for the stability of the VM.

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| T5LargeTFBatch32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 36637.602 (vs. 33026.190, 10.93%↑) | 36617.765 | 106.204 |
| T5LargeTFBatch16(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 18432.568 (vs. 16764.591, 9.95%↑) | 18414.192 | 56.613 |
| T5LargeTFBatch1(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 1839.990 (vs. 1737.543, 5.90%↑) | 1839.125 | 7.967 |
| BertLargeTFBatch64(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 22836.697 (vs. 21648.951, 5.49%↑) | 22822.126 | 77.858 |
| BertLargeTFBatch32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 12265.789 (vs. 11760.119, 4.30%↑) | 12249.995 | 50.335 |
| BertLargeTFBatch1(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 455.370 (vs. 434.912, 4.70%↑) | 454.843 | 3.093 |
| Resnet50TFBatch128(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 3637.981 (vs. 3556.597, 2.29%↑) | 3631.393 | 32.384 |
| Resnet50TFBatch64(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 1853.184 (vs. 1812.858, 2.22%↑) | 1848.911 | 25.854 |
| Resnet50TFBatch1(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,data-tiling,ukernel] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 34.068 (vs. 33.796, 0.80%↑) | 33.869 | 0.478 |
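
For reference, the percentages in the table above are consistent with a plain relative change against the base latency. The sketch below recomputes them from the average latencies and compares each model against the largest ResNet50 delta. Treating that delta as a noise floor is only an assumption for illustration, and the names `relative_change`, `AVERAGE_LATENCY_MS`, and `noise_floor` are hypothetical rather than part of any benchmark tooling.

```python
def relative_change(current: float, base: float) -> float:
    """Percent change of `current` relative to `base`; positive means slower."""
    return (current - base) / base * 100.0


# (after, before) average latencies in ms, copied from the table above.
AVERAGE_LATENCY_MS = {
    "T5LargeTFBatch32": (36637.602, 33026.190),
    "T5LargeTFBatch16": (18432.568, 16764.591),
    "T5LargeTFBatch1": (1839.990, 1737.543),
    "BertLargeTFBatch64": (22836.697, 21648.951),
    "BertLargeTFBatch32": (12265.789, 11760.119),
    "BertLargeTFBatch1": (455.370, 434.912),
    "Resnet50TFBatch128": (3637.981, 3556.597),
    "Resnet50TFBatch64": (1853.184, 1812.858),
    "Resnet50TFBatch1": (34.068, 33.796),
}

changes = {
    name: relative_change(after, before)
    for name, (after, before) in AVERAGE_LATENCY_MS.items()
}

# Assumption: the ResNet50 VMFBs are identical before and after, so their
# largest delta is used here as a rough estimate of run-to-run noise.
noise_floor = max(
    pct for name, pct in changes.items() if name.startswith("Resnet50")
)

# Models whose slowdown exceeds that (assumed) noise floor.
above_noise = {name: pct for name, pct in changes.items() if pct > noise_floor}

for name, pct in changes.items():
    marker = "above noise" if name in above_noise else "within noise"
    print(f"{name}: {pct:+.2f}% ({marker})")
```

Under this rough heuristic, every T5 and BertLarge entry lands above the ResNet50 reference, which matches the regressions noted above. It says nothing about the root cause.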

@pzread pzread force-pushed the batch-matmul-test branch 5 times, most recently from 210f7a4 to f187f1f Compare August 15, 2023 22:57
@pzread pzread removed the benchmarks:x86_64 Run default x86_64 benchmarks label Aug 15, 2023
@pzread pzread force-pushed the batch-matmul-test branch 11 times, most recently from 7159e5d to e11c6b4 Compare August 17, 2023 22:48
@pzread pzread force-pushed the batch-matmul-test branch 3 times, most recently from 1c4e61a to 588591b Compare August 23, 2023 08:11
@pzread pzread added the benchmarks:x86_64 Run default x86_64 benchmarks label Aug 23, 2023
@pzread pzread force-pushed the batch-matmul-test branch 3 times, most recently from 9cc4914 to 8e7011a Compare August 23, 2023 21:03
@pzread pzread force-pushed the batch-matmul-test branch 3 times, most recently from 46c4e62 to 3726959 Compare August 28, 2023 20:38
This reverts commit a0da52a.
@pzread pzread removed the benchmarks:x86_64 Run default x86_64 benchmarks label Aug 30, 2023