[GPUCodegen] Characterize performance for dynamic fused self attention #18931

Open · manupak opened this issue Oct 29, 2024 · 8 comments

manupak commented Oct 29, 2024

We would like to characterize the performance of fused self attention across the following dynamic parameters (one sweep point is sketched after the list):

M dynamic lengths: 1024, 2048, 3072, 4096, 5120, 6144, 7168, 8192, 16384
M tile sizes: 16, 32, 64 and 128
K1 values to be used: 64, 128
K2 values: equal to M, since this is self-attention
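
As a concrete illustration of a single sweep point (a sketch, not part of the original request): with M = K2 = 2048, head dimensions K1 = N = 64, batch 1, and f16 as in the IR further down this thread, the operand types would be:

// Sketch of tensor types for one sweep point: M = K2 = 2048, K1 = N = 64, batch 1, f16.
!Q = tensor<1x2048x64xf16>   // queries: batch x M  x K1
!K = tensor<1x2048x64xf16>   // keys:    batch x K2 x K1 (K2 = M for self-attention)
!V = tensor<1x2048x64xf16>   // values:  batch x K2 x N
!O = tensor<1x2048x64xf16>   // output:  batch x M  x N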

manupak (Author) commented Oct 29, 2024

@MaheshRavishankar @Groverkss I've summarized the information about what needs to be analyzed here.
Let me know if this should be something different.

I think for LLMs we only care about self-attention, thus K2 = M.

Groverkss (Contributor) commented:

For K1/N you can use 64/128. You can probably ignore 256.

manupak changed the title from "[GPUCodegen] Characterize performance for dynamic fused self? attention" to "[GPUCodegen] Characterize performance for dynamic fused self attention" on Oct 29, 2024
manupak (Author) commented Oct 29, 2024

Do we want this done for MI300X in CPX?

Groverkss (Contributor) commented:

> Do we want this done for MI300X in CPX?

Either SPX or CPX on MI300X should be fine.

manupak (Author) commented Oct 29, 2024

I'll start with CPX then...

MaheshRavishankar (Contributor) commented:

Either should be fine, but I think we have SPX available more easily. The trends should be the same.

manupak (Author) commented Oct 29, 2024

!dtype = f16
!Q     = tensor<1x?x64xf16>
!K     = tensor<1x?x64xf16>
!V     = tensor<1x?x64xf16>
!O     = tensor<1x?x64xf16>

#tuning = #iree_codegen.compilation_info<lowering_config = #iree_gpu.lowering_config<{ workgroup = [1, 64, 0, 0, 0], reduction = [0, 0, 0, 0, 32] }>, translation_info = #iree_codegen.translation_info<LLVMGPUVectorDistribute workgroup_size = [64, 4] subgroup_size = 64 ,{mma_schedule = #iree_gpu.mma_schedule<intrinsic = #iree_gpu.mma_layout<MFMA_F32_32x32x8_F16>, subgroup_m_count = 4, subgroup_n_count = 1> , llvm_func_attrs = { "amdgpu-waves-per-eu" = "2","denormal-fp-math-f32" = "preserve-sign" }}>>
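// The #tuning above tiles M by 64 per workgroup and the K2 reduction by 32, and selects the
// LLVMGPUVectorDistribute pipeline with a 256-thread workgroup (4 subgroups of 64) using the
// MFMA_F32_32x32x8_F16 intrinsic, with all 4 subgroups distributed along M.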


#Q = affine_map<(b, m, n, k1, k2) -> (b, m, k1)>
#K = affine_map<(b, m, n, k1, k2) -> (b, k2, k1)>
#V = affine_map<(b, m, n, k1, k2) -> (b, k2, n)>
#S = affine_map<(b, m, n, k1, k2) -> ()>
#O = affine_map<(b, m, n, k1, k2) -> (b, m, n)>
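// Iteration space (b, m, n, k1, k2): Q indexes (b, m, k1), K and V index the key/value
// sequence dimension k2, and #S maps to () because the scale operand is a scalar.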

func.func @main(%Q : !Q, %K : !K, %V : !V) -> !O {
  %scale = arith.constant 1.0 : !dtype
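  // Query the dynamic sequence length (dim 1 of %Q) and use it to size the output below.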
  %c1 = arith.constant 1 : index
  %size1 = tensor.dim %Q, %c1 : !Q
  %empty = tensor.empty(%size1) : !O
  %O = iree_linalg_ext.attention 
       { indexing_maps = [#Q, #K, #V, #S, #O]
         ,compilation_info = #tuning
       }
       ins(%Q, %K, %V, %scale : !Q, !K, !V, !dtype) outs(%empty : !O) {
          ^bb0(%score: f32):
            iree_linalg_ext.yield %score : f32
        } -> !O
  return %O : !O
}

I've managed to generate IR as above, but it is not compiling as of now.
I suspect I'm missing something simpler, given that dynamic attention kernels are supported. Is that right?

Is there any test/example of a dynamic attention kernel in the codebase?

MaheshRavishankar (Contributor) commented:

Maybe we should try static sizes for those shapes. Making the shape dynamic will not give us as clear a signal yet.
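
For illustration, a minimal sketch of a static-shape variant of the IR above for one sweep point (assuming M = K2 = 2048 and K1 = N = 64; the compilation_info tuning attribute is left out for brevity, so this is an untuned illustration rather than IR taken from this issue):

// Sketch only: static-shape self attention for one sweep point (M = K2 = 2048, K1 = N = 64).
!dtype = f16
!Q = tensor<1x2048x64xf16>
!K = tensor<1x2048x64xf16>
!V = tensor<1x2048x64xf16>
!O = tensor<1x2048x64xf16>

#Q = affine_map<(b, m, n, k1, k2) -> (b, m, k1)>
#K = affine_map<(b, m, n, k1, k2) -> (b, k2, k1)>
#V = affine_map<(b, m, n, k1, k2) -> (b, k2, n)>
#S = affine_map<(b, m, n, k1, k2) -> ()>
#O = affine_map<(b, m, n, k1, k2) -> (b, m, n)>

func.func @main_static(%Q : !Q, %K : !K, %V : !V) -> !O {
  %scale = arith.constant 1.0 : !dtype
  // All dimensions are static, so no tensor.dim is needed to size the output.
  %empty = tensor.empty() : !O
  %O = iree_linalg_ext.attention
       { indexing_maps = [#Q, #K, #V, #S, #O] }
       ins(%Q, %K, %V, %scale : !Q, !K, !V, !dtype) outs(%empty : !O) {
          ^bb0(%score: f32):
            iree_linalg_ext.yield %score : f32
        } -> !O
  return %O : !O
}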
