
[CPU] Lower linalg.batch.matmul to micro-kernels #14445

Closed
dcaballe opened this issue Jul 19, 2023 · 1 comment
Labels
codegen/llvm LLVM code generation compiler backend

Comments

@dcaballe (Contributor)

It looks like we are not lowering `linalg.batch_matmul` to micro-kernels. This is an example from BERT Large:

hal.executable public @forward_dispatch_23 {
  hal.executable.variant public @system_elf_x86_64, target = <"llvm-cpu", "system-elf-x86_64", {cpu = "cascadelake", cpu_features = "+cmov,+mmx,+popcnt,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+avx,+avx2,+fma,+avx512f,+bmi,+bmi2,+aes,+pclmul,+avx512vl,+avx512bw,+avx512dq,+avx512cd,+avx512vnni,+adx,+clflushopt,+clwb,+cx16,+cx8,+crc32,+f16c,+fsgsbase,+fxsr,+invpcid,+lzcnt,+movbe,+pku,+prfchw,+rdrnd,+rdseed,+sahf,+x87,+xsave,+xsavec,+xsaveopt,+xsaves", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-unknown-linux-elf", ukernels = true}> {
    hal.executable.export public @forward_dispatch_23_batch_matmul_1024x384x64x384_f32 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer>]>]>) {
    ^bb0(%arg0: !hal.device):
      %x, %y, %z = flow.dispatch.workgroup_count_from_slice 
      hal.return %x, %y, %z : index, index, index
    }
    builtin.module {
      func.func @forward_dispatch_23_batch_matmul_1024x384x64x384_f32() {
        %c503414784 = arith.constant 503414784 : index
        %c100761600 = arith.constant 100761600 : index
        %c201424896 = arith.constant 201424896 : index
        %cst = arith.constant 0.000000e+00 : f32
        %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c503414784) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x384x384xf32>>
        %1 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c100761600) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x384x64xf32>>
        %2 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c201424896) : !flow.dispatch.tensor<writeonly:tensor<1024x384x64xf32>>
        %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1024, 384, 384], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x384x384xf32>> -> tensor<1024x384x384xf32>
        %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [1024, 384, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x384x64xf32>> -> tensor<1024x384x64xf32>
        %5 = tensor.empty() : tensor<1024x384x64xf32>
        %6 = linalg.fill ins(%cst : f32) outs(%5 : tensor<1024x384x64xf32>) -> tensor<1024x384x64xf32>
        %7 = linalg.batch_matmul ins(%3, %4 : tensor<1024x384x384xf32>, tensor<1024x384x64xf32>) outs(%6 : tensor<1024x384x64xf32>) -> tensor<1024x384x64xf32>
        flow.dispatch.tensor.store %7, %2, offsets = [0, 0, 0], sizes = [1024, 384, 64], strides = [1, 1, 1] : tensor<1024x384x64xf32> -> !flow.dispatch.tensor<writeonly:tensor<1024x384x64xf32>>
        return
      }
    }
  }
}
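
One possible direction (a hedged sketch only, not necessarily the lowering IREE would pick): peel the batch dimension with an `scf.for` loop so each iteration runs a plain `linalg.matmul` on 2-D slices, which is the shape the existing matmul micro-kernel path already handles. The function name and the explicit loop structure below are illustrative assumptions:

```mlir
// Illustrative sketch: rewrite the 1024x384x64 batch_matmul as a loop over
// the batch dimension, performing a 384x384 * 384x64 matmul per batch slice.
func.func @batch_matmul_as_loop(%lhs: tensor<1024x384x384xf32>,
                                %rhs: tensor<1024x384x64xf32>,
                                %init: tensor<1024x384x64xf32>) -> tensor<1024x384x64xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c1024 = arith.constant 1024 : index
  // Loop over the batch dimension, threading the accumulator tensor through.
  %res = scf.for %b = %c0 to %c1024 step %c1
      iter_args(%acc = %init) -> (tensor<1024x384x64xf32>) {
    // Extract rank-reduced 2-D slices for this batch index.
    %lhs_s = tensor.extract_slice %lhs[%b, 0, 0] [1, 384, 384] [1, 1, 1]
        : tensor<1024x384x384xf32> to tensor<384x384xf32>
    %rhs_s = tensor.extract_slice %rhs[%b, 0, 0] [1, 384, 64] [1, 1, 1]
        : tensor<1024x384x64xf32> to tensor<384x64xf32>
    %out_s = tensor.extract_slice %acc[%b, 0, 0] [1, 384, 64] [1, 1, 1]
        : tensor<1024x384x64xf32> to tensor<384x64xf32>
    // A plain 2-D matmul, eligible for the existing micro-kernel lowering.
    %mm = linalg.matmul
        ins(%lhs_s, %rhs_s : tensor<384x384xf32>, tensor<384x64xf32>)
        outs(%out_s : tensor<384x64xf32>) -> tensor<384x64xf32>
    %ins = tensor.insert_slice %mm into %acc[%b, 0, 0] [1, 384, 64] [1, 1, 1]
        : tensor<384x64xf32> into tensor<1024x384x64xf32>
    scf.yield %ins : tensor<1024x384x64xf32>
  }
  return %res : tensor<1024x384x64xf32>
}
```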
@dcaballe dcaballe added the codegen/llvm LLVM code generation compiler backend label Jul 19, 2023
@dcaballe (Contributor, Author)

Duplicate of #14431
