
iree-codegen-iree-comprehensive-bufferize generates memrefs with dynamic offset #847

Closed · makslevental opened this issue Oct 16, 2024 · 7 comments
Labels: bug (Something isn't working)

makslevental (Collaborator) commented Oct 16, 2024

#845 is blocked because, at that commit of IREE, iree-codegen-iree-comprehensive-bufferize generates memrefs with dynamic offsets, and we hit an error here.

@MaheshRavishankar any clue what changed recently that might produce this behavior? Perhaps @pashu123 can give a hint (I'm seeing recent changes in git-blame...).

cc @jtuyls @yzhang93 @newling @Abhishek-Varma

The failing snippet follows. What stands out to me as odd (and a likely clue) is that each hal.interface.binding.subspan is now followed by a memref.assume_alignment on a memref type with a dynamic offset:

func.func @mm_in_bf16_out_f32_dispatch_0_matmul_64x64x64_bf16xbf16xf32() attributes {translation_info = #iree_codegen.translation_info<Custom>} {
  %c0 = arith.constant 0 : index
  %cst = arith.constant 0.000000e+00 : f32
  %alloc = memref.alloc() : memref<1x1x8x4x8x4xbf16, 2 : i32>
  %alloc_0 = memref.alloc() : memref<1x1x4x8x4x8xbf16, 2 : i32>
  %alloc_1 = memref.alloc() : memref<1x2x32x32xbf16, 1 : i32>
  %alloc_2 = memref.alloc() : memref<2x1x32x32xbf16, 1 : i32>
  %alloc_3 = memref.alloc() : memref<2x2x8x8x4x4xf32, 2 : i32>
  %alloc_4 = memref.alloc() : memref<2x2x32x32xf32, 1 : i32>
  %0:3 = util.assume.int 
      %c0<umin = 0, umax = 0>, 
      %c0<umin = 0, umax = 0>, 
      %c0<umin = 0, umax = 0>
    : index, index, index
  %1 = hal.interface.binding.subspan layout(<bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(0) alignment(64) offset(%0#0) flags("ReadOnly|Indirect") : memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
  memref.assume_alignment %1, 1 : memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
  %2 = hal.interface.binding.subspan layout(<bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(1) alignment(64) offset(%0#1) flags("ReadOnly|Indirect") : memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
  memref.assume_alignment %2, 1 : memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
  %3 = hal.interface.binding.subspan layout(<bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(2) alignment(64) offset(%0#2) flags(Indirect) : memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
  memref.assume_alignment %3, 1 : memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
  scf.forall (%arg0, %arg1) = (0, 0) to (64, 64) step (64, 64) {
    %subview = memref.subview %1[%arg0, 0] [64, 64] [1, 1] : memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
    %subview_5 = memref.subview %2[0, %arg1] [64, 64] [1, 1] : memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
    %subview_6 = memref.subview %3[%arg0, %arg1] [64, 64] [1, 1] : memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
    %subview_7 = memref.subview %subview[0, 0] [64, 32] [1, 1] : memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<64x32xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
    iree_linalg_ext.pack %subview_7 inner_dims_pos = [0, 1] inner_tiles = [32, 32] into %alloc_2 : (memref<64x32xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> memref<2x1x32x32xbf16, 1 : i32>)
    %subview_8 = memref.subview %subview_5[0, 0] [32, 64] [1, 1] : memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<32x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
    iree_linalg_ext.pack %subview_8 outer_dims_perm = [0, 1] inner_dims_pos = [0, 1] inner_tiles = [32, 32] into %alloc_1 : (memref<32x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> memref<1x2x32x32xbf16, 1 : i32>)
    scf.forall (%arg2, %arg3) in (2, 2) {
      %subview_12 = memref.subview %alloc_2[%arg2, 0, 0, 0] [1, 1, 32, 32] [1, 1, 1, 1] : memref<2x1x32x32xbf16, 1 : i32> to memref<1x1x32x32xbf16, strided<[1024, 1024, 32, 1], offset: ?>, 1 : i32>
      iree_linalg_ext.pack %subview_12 outer_dims_perm = [0, 1, 3, 2] inner_dims_pos = [2, 3] inner_tiles = [4, 8] into %alloc_0 : (memref<1x1x32x32xbf16, strided<[1024, 1024, 32, 1], offset: ?>, 1 : i32> memref<1x1x4x8x4x8xbf16, 2 : i32>)
      %subview_13 = memref.subview %alloc_1[0, %arg3, 0, 0] [1, 1, 32, 32] [1, 1, 1, 1] : memref<1x2x32x32xbf16, 1 : i32> to memref<1x1x32x32xbf16, strided<[2048, 1024, 32, 1], offset: ?>, 1 : i32>
      iree_linalg_ext.pack %subview_13 outer_dims_perm = [0, 1, 3, 2] inner_dims_pos = [2, 3] inner_tiles = [8, 4] into %alloc : (memref<1x1x32x32xbf16, strided<[2048, 1024, 32, 1], offset: ?>, 1 : i32> memref<1x1x8x4x8x4xbf16, 2 : i32>)
      %subview_14 = memref.subview %alloc_3[%arg2, %arg3, 0, 0, 0, 0] [1, 1, 8, 8, 4, 4] [1, 1, 1, 1, 1, 1] : memref<2x2x8x8x4x4xf32, 2 : i32> to memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>
      linalg.fill ins(%cst : f32) outs(%subview_14 : memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>)
      linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8) -> (d0, d2, d5, d3, d6, d8)>, affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8) -> (d2, d1, d4, d5, d8, d7)>, affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8) -> (d0, d1, d4, d3, d6, d7)>], iterator_types = ["parallel", "parallel", "reduction", "parallel", "parallel", "reduction", "parallel", "parallel", "reduction"]} ins(%alloc_0, %alloc : memref<1x1x4x8x4x8xbf16, 2 : i32>, memref<1x1x8x4x8x4xbf16, 2 : i32>) outs(%subview_14 : memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>) attrs =  {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[64, 64], [0, 0, 1], [1, 1, 0, 0, 0, 0]]>, packing_config = #amdaie.packing_config<packing_config = [{packedSizes = [32, 32, 32], transposePackIndices = [1], unpackEmpty = [false], innerPerm = [[1, 0]], outerPerm = [[0, 1]]}, {packedSizes = [0, 0, 0, 4, 4, 8], transposePackIndices = [0, 1, 2], unpackEmpty = [false, false, true], innerPerm = [[0, 1], [1, 0], [0, 1]], outerPerm = [[0, 1, 3, 2], [0, 1, 3, 2], [0, 1, 3, 2]]}]>} {
      ^bb0(%in: bf16, %in_16: bf16, %out: f32):
        %4 = arith.extf %in : bf16 to f32
        %5 = arith.extf %in_16 : bf16 to f32
        %6 = arith.mulf %4, %5 : f32
        %7 = arith.addf %out, %6 : f32
        linalg.yield %7 : f32
      }
      %subview_15 = memref.subview %alloc_3[%arg2, %arg3, 0, 0, 0, 0] [1, 1, 8, 8, 4, 4] [1, 1, 1, 1, 1, 1] : memref<2x2x8x8x4x4xf32, 2 : i32> to memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>
      linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d2, d3, d4, d5)>, affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d2, d3, d4, d5)>], iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel", "parallel"]} ins(%subview_14 : memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>) outs(%subview_15 : memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>) {
      ^bb0(%in: f32, %out: f32):
        linalg.yield %in : f32
      }
    } {mapping = [#gpu.thread<y>, #gpu.thread<x>]}
    %subview_9 = memref.subview %subview[0, 32] [64, 32] [1, 1] : memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<64x32xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
    iree_linalg_ext.pack %subview_9 inner_dims_pos = [0, 1] inner_tiles = [32, 32] into %alloc_2 : (memref<64x32xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> memref<2x1x32x32xbf16, 1 : i32>)
    %subview_10 = memref.subview %subview_5[32, 0] [32, 64] [1, 1] : memref<64x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<32x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
    iree_linalg_ext.pack %subview_10 outer_dims_perm = [0, 1] inner_dims_pos = [0, 1] inner_tiles = [32, 32] into %alloc_1 : (memref<32x64xbf16, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> memref<1x2x32x32xbf16, 1 : i32>)
    scf.forall (%arg2, %arg3) in (2, 2) {
      %subview_12 = memref.subview %alloc_2[%arg2, 0, 0, 0] [1, 1, 32, 32] [1, 1, 1, 1] : memref<2x1x32x32xbf16, 1 : i32> to memref<1x1x32x32xbf16, strided<[1024, 1024, 32, 1], offset: ?>, 1 : i32>
      iree_linalg_ext.pack %subview_12 outer_dims_perm = [0, 1, 3, 2] inner_dims_pos = [2, 3] inner_tiles = [4, 8] into %alloc_0 : (memref<1x1x32x32xbf16, strided<[1024, 1024, 32, 1], offset: ?>, 1 : i32> memref<1x1x4x8x4x8xbf16, 2 : i32>)
      %subview_13 = memref.subview %alloc_1[0, %arg3, 0, 0] [1, 1, 32, 32] [1, 1, 1, 1] : memref<1x2x32x32xbf16, 1 : i32> to memref<1x1x32x32xbf16, strided<[2048, 1024, 32, 1], offset: ?>, 1 : i32>
      iree_linalg_ext.pack %subview_13 outer_dims_perm = [0, 1, 3, 2] inner_dims_pos = [2, 3] inner_tiles = [8, 4] into %alloc : (memref<1x1x32x32xbf16, strided<[2048, 1024, 32, 1], offset: ?>, 1 : i32> memref<1x1x8x4x8x4xbf16, 2 : i32>)
      %subview_14 = memref.subview %alloc_3[%arg2, %arg3, 0, 0, 0, 0] [1, 1, 8, 8, 4, 4] [1, 1, 1, 1, 1, 1] : memref<2x2x8x8x4x4xf32, 2 : i32> to memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>
      linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8) -> (d0, d2, d5, d3, d6, d8)>, affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8) -> (d2, d1, d4, d5, d8, d7)>, affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8) -> (d0, d1, d4, d3, d6, d7)>], iterator_types = ["parallel", "parallel", "reduction", "parallel", "parallel", "reduction", "parallel", "parallel", "reduction"]} ins(%alloc_0, %alloc : memref<1x1x4x8x4x8xbf16, 2 : i32>, memref<1x1x8x4x8x4xbf16, 2 : i32>) outs(%subview_14 : memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>) attrs =  {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[64, 64], [0, 0, 1], [1, 1, 0, 0, 0, 0]]>, packing_config = #amdaie.packing_config<packing_config = [{packedSizes = [32, 32, 32], transposePackIndices = [1], unpackEmpty = [false], innerPerm = [[1, 0]], outerPerm = [[0, 1]]}, {packedSizes = [0, 0, 0, 4, 4, 8], transposePackIndices = [0, 1, 2], unpackEmpty = [false, false, true], innerPerm = [[0, 1], [1, 0], [0, 1]], outerPerm = [[0, 1, 3, 2], [0, 1, 3, 2], [0, 1, 3, 2]]}]>} {
      ^bb0(%in: bf16, %in_18: bf16, %out: f32):
        %4 = arith.extf %in : bf16 to f32
        %5 = arith.extf %in_18 : bf16 to f32
        %6 = arith.mulf %4, %5 : f32
        %7 = arith.addf %out, %6 : f32
        linalg.yield %7 : f32
      }
      %subview_15 = memref.subview %alloc_4[%arg2, %arg3, 0, 0] [1, 1, 32, 32] [1, 1, 1, 1] : memref<2x2x32x32xf32, 1 : i32> to memref<1x1x32x32xf32, strided<[2048, 1024, 32, 1], offset: ?>, 1 : i32>
      iree_linalg_ext.unpack %subview_14 outer_dims_perm = [0, 1, 3, 2] inner_dims_pos = [2, 3] inner_tiles = [4, 4] into %subview_15 : (memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32> memref<1x1x32x32xf32, strided<[2048, 1024, 32, 1], offset: ?>, 1 : i32>)
      %subview_16 = memref.subview %alloc_3[%arg2, %arg3, 0, 0, 0, 0] [1, 1, 8, 8, 4, 4] [1, 1, 1, 1, 1, 1] : memref<2x2x8x8x4x4xf32, 2 : i32> to memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>
      linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d2, d3, d4, d5)>, affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d2, d3, d4, d5)>], iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel", "parallel"]} ins(%subview_14 : memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>) outs(%subview_16 : memref<1x1x8x8x4x4xf32, strided<[2048, 1024, 128, 16, 4, 1], offset: ?>, 2 : i32>) {
      ^bb0(%in: f32, %out: f32):
        linalg.yield %in : f32
      }
      %subview_17 = memref.subview %alloc_4[%arg2, %arg3, 0, 0] [1, 1, 32, 32] [1, 1, 1, 1] : memref<2x2x32x32xf32, 1 : i32> to memref<1x1x32x32xf32, strided<[2048, 1024, 32, 1], offset: ?>, 1 : i32>
      linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%subview_15 : memref<1x1x32x32xf32, strided<[2048, 1024, 32, 1], offset: ?>, 1 : i32>) outs(%subview_17 : memref<1x1x32x32xf32, strided<[2048, 1024, 32, 1], offset: ?>, 1 : i32>) {
      ^bb0(%in: f32, %out: f32):
        linalg.yield %in : f32
      }
    } {mapping = [#gpu.thread<y>, #gpu.thread<x>]}
    iree_linalg_ext.unpack %alloc_4 inner_dims_pos = [0, 1] inner_tiles = [32, 32] into %subview_6 : (memref<2x2x32x32xf32, 1 : i32> memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>)
    %subview_11 = memref.subview %3[%arg0, %arg1] [64, 64] [1, 1] : memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
    linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%subview_6 : memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>) outs(%subview_11 : memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>) {
    ^bb0(%in: f32, %out: f32):
      linalg.yield %in : f32
    }
  } {mapping = [#gpu.block<y>, #gpu.block<x>]}
  linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%3 : memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>) outs(%3 : memref<64x64xf32, strided<[64, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  }
  memref.dealloc %alloc_4 : memref<2x2x32x32xf32, 1 : i32>
  memref.dealloc %alloc_3 : memref<2x2x8x8x4x4xf32, 2 : i32>
  memref.dealloc %alloc_2 : memref<2x1x32x32xbf16, 1 : i32>
  memref.dealloc %alloc_1 : memref<1x2x32x32xbf16, 1 : i32>
  memref.dealloc %alloc_0 : memref<1x1x4x8x4x8xbf16, 2 : i32>
  memref.dealloc %alloc : memref<1x1x8x4x8x4xbf16, 2 : i32>
  return
}
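For context on the `strided<[64, 1], offset: ?>` types above: a strided memref layout maps an index tuple to a linear address as offset + Σ iₖ·strideₖ, and `offset: ?` means the base offset is a runtime SSA value (here fed by the subspan's offset(%0#N) operand) rather than a constant baked into the type. A minimal, hypothetical Python sketch of that address computation (illustrative only, not IREE code):

```python
def linearize(indices, strides, offset):
    # Element address in a memref with layout
    # strided<[s0, s1, ...], offset: off>:  off + sum(i_k * s_k)
    assert len(indices) == len(strides)
    return offset + sum(i * s for i, s in zip(indices, strides))

# memref<64x64xbf16, strided<[64, 1], offset: ?>>: the offset arrives at
# runtime; the util.assume.int facts above say it is always 0 here.
runtime_offset = 0
print(linearize((3, 5), (64, 1), runtime_offset))  # 3*64 + 5*1 = 197
```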
makslevental self-assigned this Oct 16, 2024
makslevental added the bug (Something isn't working) label Oct 16, 2024
makslevental changed the title from "iree-codegen-iree-comprehensive-bufferize generates memref.subview with dynamic size" to "iree-codegen-iree-comprehensive-bufferize generates memrefs with dynamic offset" Oct 16, 2024
pashu123 commented

I've made a change to duplicate empty tensor ops here: https://github.com/iree-org/iree/blob/05bbcf1385146d075829cd940a52bf06961614d0/compiler/src/iree/compiler/Codegen/Common/IREEComprehensiveBufferizePass.cpp#L177. Since we are not using destination-passing style as a preprocessing step for distribute-using-for-all, we had to make that decision. If your pipeline uses the convert-to-destination-passing-style pass, it shouldn't make a difference. @MaheshRavishankar, do you think the error might be caused by this change?

yzhang93 (Contributor) commented Oct 17, 2024

> I've made a change to duplicate empty tensor ops here: https://github.com/iree-org/iree/blob/05bbcf1385146d075829cd940a52bf06961614d0/compiler/src/iree/compiler/Codegen/Common/IREEComprehensiveBufferizePass.cpp#L177. Since we are not using destination-passing style as a preprocessing step for distribute-using-for-all, we had to make that decision. If your pipeline uses the convert-to-destination-passing-style pass, it shouldn't make a difference. @MaheshRavishankar, do you think the error might be caused by this change?

No, I don't think the error is caused by your change.

The reason is, as @makslevental mentioned, the following:

%0:3 = util.assume.int 
      %c0<umin = 0, umax = 0>, 
      %c0<umin = 0, umax = 0>, 
      %c0<umin = 0, umax = 0>
    : index, index, index
  %1 = hal.interface.binding.subspan layout(<bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(0) alignment(64) offset(%0#0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<128x128xi32>>
  %2 = hal.interface.binding.subspan layout(<bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(1) alignment(64) offset(%0#1) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<128x128xi32>>
  %3 = hal.interface.binding.subspan layout(<bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(2) alignment(64) offset(%0#2) flags(Indirect) : !flow.dispatch.tensor<writeonly:tensor<128x128xi32>>

After bufferization, this produces memref.assume_alignment ops on memrefs with a dynamic offset.

I don't know how to get rid of the dynamic offsets, but if we remove this check for now, we can proceed without problems.
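To illustrate why a check on the type can fail even though the offset is provably zero (a hypothetical Python model, not IREE's actual check): the util.assume.int facts (umin = umax = 0) hold at the value level, but the memref type still says `offset: ?`, and a check requiring a statically-known offset inspects only the type:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StridedLayout:
    # Model of an MLIR strided layout attribute; offset=None stands for
    # the dynamic marker "offset: ?".
    strides: tuple
    offset: Optional[int]

def has_static_zero_offset(layout: StridedLayout) -> bool:
    # A type-level check like the one the issue trips over: it cannot
    # see value-range facts such as util.assume.int's umin=0/umax=0.
    return layout.offset == 0

static_t = StridedLayout((64, 1), 0)      # strided<[64, 1], offset: 0>
dynamic_t = StridedLayout((64, 1), None)  # strided<[64, 1], offset: ?>

print(has_static_zero_offset(static_t))   # True
print(has_static_zero_offset(dynamic_t))  # False -> the reported error
```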

MaheshRavishankar (Collaborator) commented

yzhang93 (Contributor) commented

I think Stella's optimization PRs from yesterday solved the problem; my local build with the new IREE bump works. I'll update the branch later after fixing some other conflicts.

makslevental (Collaborator, Author) commented

> I think Stella's optimization PRs from yesterday solved the problem; my local build with the new IREE bump works. I'll update the branch later after fixing some other conflicts.

that's like two wrongs make a right lol. cool.

MaheshRavishankar (Collaborator) commented

> that's like two wrongs make a right lol. cool.

Hey maybe this is two rights!!

makslevental (Collaborator, Author) commented

Fixed by #845
