[Epic][CPU] Enable predictable performance for matrix multiplication using data-tiling and micro-kernels #13216
Labels: codegen/llvm (LLVM code generation compiler backend), codegen (shared code generation infrastructure and dialects), epic (tracking issues for related deliverables, as in Agile dev), p1
This epic tracks all micro-kernel-related work.
Tasks related to core functionality.
Blocking issues with data-tiling on e2e workloads
Amortizing costs related to packing/unpacking by using propagation
Enable propagation in models where packing can be done without padding, starting with MobileBert fp32.
Some support has already been developed in -upstream. To enable it for MobileBert, we need
Evaluating propagation in models where packing needs padding as well.
Propagating tensor.pack ops with padding values is very challenging. We can currently propagate them only in a limited set of cases; the general case needs more investigation.
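To make the padding/no-padding distinction above concrete, here is a minimal toy sketch in plain Python. It is not IREE code and does not reproduce the exact semantics of `tensor.pack`; the helpers `pack`, `map_packed`, and `map_matrix` are hypothetical names introduced only for illustration. It shows that an elementwise op commutes with packing when the shape is a multiple of the tile size, while with padding the two orders can disagree in the padded positions (here because the op does not map the pad value to itself). This is only one simple way padding can complicate propagation, not necessarily the blocker in the models mentioned above.

```python
def pack(matrix, tile_rows, tile_cols, pad=None):
    """Pack a 2-D list into [outer_r][outer_c][tile_rows][tile_cols] tiles.

    Hypothetical helper: if a dimension is not a multiple of its tile size,
    `pad` fills the gap; with pad=None such shapes are rejected.
    """
    rows, cols = len(matrix), len(matrix[0])
    if pad is None and (rows % tile_rows or cols % tile_cols):
        raise ValueError("shape is not a multiple of the tile size; padding needed")
    outer_r = -(-rows // tile_rows)  # ceiling division
    outer_c = -(-cols // tile_cols)
    return [
        [
            [
                [
                    matrix[r][c] if r < rows and c < cols else pad
                    for c in range(oc * tile_cols, (oc + 1) * tile_cols)
                ]
                for r in range(orow * tile_rows, (orow + 1) * tile_rows)
            ]
            for oc in range(outer_c)
        ]
        for orow in range(outer_r)
    ]


def map_packed(fn, packed):
    """Apply an elementwise function to every value of a packed layout."""
    return [[[[fn(v) for v in row] for row in tile] for tile in tiles] for tiles in packed]


def map_matrix(fn, matrix):
    """Apply an elementwise function to every value of an unpacked matrix."""
    return [[fn(v) for v in row] for row in matrix]


def add_one(v):
    return v + 1


# Case 1: 2x4 matrix, 2x2 tiles. No padding is needed, so the elementwise op
# can be moved across the pack without changing the result.
aligned = [[1, 2, 3, 4], [5, 6, 7, 8]]
no_pad_commutes = (
    pack(map_matrix(add_one, aligned), 2, 2)
    == map_packed(add_one, pack(aligned, 2, 2))
)

# Case 2: 2x3 matrix, 2x2 tiles. The last tile is padded with 0. Applying the
# op after packing also rewrites the pad values (0 -> 1), so the two orders
# no longer produce the same packed data.
ragged = [[1, 2, 3], [4, 5, 6]]
pack_after_op = pack(map_matrix(add_one, ragged), 2, 2, pad=0)
op_after_pack = map_packed(add_one, pack(ragged, 2, 2, pad=0))
pad_commutes = pack_after_op == op_after_pack

print(no_pad_commutes, pad_commutes)
```

In this toy model the mismatch is confined to the padded lanes, which an unpack would discard; it becomes a correctness concern once a later op (for example a reduction) reads those lanes.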