
[CPU] Lower linalg.batch.matmul to micro-kernels #14445

Closed
dcaballe opened this issue Jul 19, 2023 · 1 comment
Labels
codegen/llvm LLVM code generation compiler backend

Comments

@dcaballe (Contributor)

It looks like we are not lowering `linalg.batch_matmul` to micro-kernels. This is an example from BERT Large:

hal.executable public @forward_dispatch_23 {
  hal.executable.variant public @system_elf_x86_64, target = <"llvm-cpu", "system-elf-x86_64", {cpu = "cascadelake", cpu_features = "+cmov,+mmx,+popcnt,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+avx,+avx2,+fma,+avx512f,+bmi,+bmi2,+aes,+pclmul,+avx512vl,+avx512bw,+avx512dq,+avx512cd,+avx512vnni,+adx,+clflushopt,+clwb,+cx16,+cx8,+crc32,+f16c,+fsgsbase,+fxsr,+invpcid,+lzcnt,+movbe,+pku,+prfchw,+rdrnd,+rdseed,+sahf,+x87,+xsave,+xsavec,+xsaveopt,+xsaves", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-unknown-linux-elf", ukernels = true}> {
    hal.executable.export public @forward_dispatch_23_batch_matmul_1024x384x64x384_f32 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer>]>]>) {
    ^bb0(%arg0: !hal.device):
      %x, %y, %z = flow.dispatch.workgroup_count_from_slice 
      hal.return %x, %y, %z : index, index, index
    }
    builtin.module {
      func.func @forward_dispatch_23_batch_matmul_1024x384x64x384_f32() {
        %c503414784 = arith.constant 503414784 : index
        %c100761600 = arith.constant 100761600 : index
        %c201424896 = arith.constant 201424896 : index
        %cst = arith.constant 0.000000e+00 : f32
        %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c503414784) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x384x384xf32>>
        %1 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c100761600) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x384x64xf32>>
        %2 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c201424896) : !flow.dispatch.tensor<writeonly:tensor<1024x384x64xf32>>
        %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1024, 384, 384], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x384x384xf32>> -> tensor<1024x384x384xf32>
        %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [1024, 384, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x384x64xf32>> -> tensor<1024x384x64xf32>
        %5 = tensor.empty() : tensor<1024x384x64xf32>
        %6 = linalg.fill ins(%cst : f32) outs(%5 : tensor<1024x384x64xf32>) -> tensor<1024x384x64xf32>
        %7 = linalg.batch_matmul ins(%3, %4 : tensor<1024x384x384xf32>, tensor<1024x384x64xf32>) outs(%6 : tensor<1024x384x64xf32>) -> tensor<1024x384x64xf32>
        flow.dispatch.tensor.store %7, %2, offsets = [0, 0, 0], sizes = [1024, 384, 64], strides = [1, 1, 1] : tensor<1024x384x64xf32> -> !flow.dispatch.tensor<writeonly:tensor<1024x384x64xf32>>
        return
      }
    }
  }
}
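
One possible direction (a hedged sketch only, not necessarily the lowering IREE would pick): peel the batch dimension with an `scf.for` loop so each iteration runs a plain `linalg.matmul` on 2-D slices, which is the shape the existing matmul micro-kernel path already handles. The function name and the explicit loop structure below are illustrative assumptions:

```mlir
// Illustrative sketch: rewrite the 1024x384x64 batch_matmul as a loop over
// the batch dimension, performing a 384x384 * 384x64 matmul per batch slice.
func.func @batch_matmul_as_loop(%lhs: tensor<1024x384x384xf32>,
                                %rhs: tensor<1024x384x64xf32>,
                                %init: tensor<1024x384x64xf32>) -> tensor<1024x384x64xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c1024 = arith.constant 1024 : index
  // Loop over the batch dimension, threading the accumulator tensor through.
  %res = scf.for %b = %c0 to %c1024 step %c1
      iter_args(%acc = %init) -> (tensor<1024x384x64xf32>) {
    // Extract rank-reduced 2-D slices for this batch index.
    %lhs_s = tensor.extract_slice %lhs[%b, 0, 0] [1, 384, 384] [1, 1, 1]
        : tensor<1024x384x384xf32> to tensor<384x384xf32>
    %rhs_s = tensor.extract_slice %rhs[%b, 0, 0] [1, 384, 64] [1, 1, 1]
        : tensor<1024x384x64xf32> to tensor<384x64xf32>
    %out_s = tensor.extract_slice %acc[%b, 0, 0] [1, 384, 64] [1, 1, 1]
        : tensor<1024x384x64xf32> to tensor<384x64xf32>
    // A plain 2-D matmul, eligible for the existing micro-kernel lowering.
    %mm = linalg.matmul
        ins(%lhs_s, %rhs_s : tensor<384x384xf32>, tensor<384x64xf32>)
        outs(%out_s : tensor<384x64xf32>) -> tensor<384x64xf32>
    %ins = tensor.insert_slice %mm into %acc[%b, 0, 0] [1, 384, 64] [1, 1, 1]
        : tensor<384x64xf32> into tensor<1024x384x64xf32>
    scf.yield %ins : tensor<1024x384x64xf32>
  }
  return %res : tensor<1024x384x64xf32>
}
```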
@dcaballe dcaballe added the codegen/llvm LLVM code generation compiler backend label Jul 19, 2023
@dcaballe (Contributor, Author)

Duplicate of #14431
