[Bugfix] Fix improper touched buffer assignment of Pass MergeSharedMemoryAllocations #17438

LeiWang1999 · 2024-10-04T16:44:28Z

As discussed in issue #17375, the current rule for assigning touched buffers is not appropriate. Consider the following example:

code_block_0
for k in range(0, 10):  # (the gen points of A_shared and B_shared are injected into this for loop)
    for i in range(0, 10):
        A_shared <- A
    for i in range(0, 10):
        B_shared <- B
    code_block_1 (consume A_shared and B_shared)
code_block_2 (produce and consume C_shared)

This setup works only by chance in simple GEMM scenarios. The correct approach is:

code_block_0
for k in range(0, 10):
    for i in range(0, 10):
        A_shared <- A  # (the gen point of A_shared should be bound to this BufferStore node)
    for i in range(0, 10):
        B_shared <- B  # (the gen point of B_shared should be bound to this BufferStore node)
    code_block_1 (consume A_shared and B_shared)
code_block_2 (produce and consume C_shared)

This approach works correctly even in more complex scenarios, such as batched GEMM, where the naive template would fail.
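To make the difference between the two gen-point placements concrete, here is a minimal toy sketch in Python. It is not TVM's implementation: `live_intervals` and the flattened statement lists are hypothetical, and the loop nest is simplified to a linear sequence where each entry lists the buffers assigned to it as "touched".

```python
def live_intervals(stmts):
    """Return {buffer: (gen_index, kill_index)} for a linear statement list.

    Each entry is (label, touched_buffers). The gen point is the first
    entry that touches a buffer; the kill point is the last one.
    """
    intervals = {}
    for idx, (_label, touched) in enumerate(stmts):
        for buf in touched:
            gen = intervals[buf][0] if buf in intervals else idx
            intervals[buf] = (gen, idx)
    return intervals


# Old rule (toy version): both stores are attributed to the outer loop.
coarse = [
    ("for k", {"A_shared", "B_shared"}),
    ("A_shared <- A", set()),
    ("B_shared <- B", set()),
    ("code_block_1", {"A_shared", "B_shared"}),
    ("code_block_2", {"C_shared"}),
]

# Proposed rule (toy version): each buffer is attributed to its own store.
fine = [
    ("for k", set()),
    ("A_shared <- A", {"A_shared"}),
    ("B_shared <- B", {"B_shared"}),
    ("code_block_1", {"A_shared", "B_shared"}),
    ("code_block_2", {"C_shared"}),
]

coarse_intervals = live_intervals(coarse)
fine_intervals = live_intervals(fine)
```

In this toy model the coarse assignment gives A_shared and B_shared the same gen point (the loop entry), while the fine assignment gives each buffer a gen point at the statement that actually writes it.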

This pull request makes a simple modification to the MergeSharedMemoryAllocations pass to enable the correct analysis, and always disables the naive shared memory buffer fusion in the StorageRewrite pass when the kernel uses dynamic shared memory.

LeiWang1999 · 2024-10-04T16:45:41Z

src/tir/transforms/merge_shared_memory_allocations.cc

      }
    }
  }

-  void VisitExpr_(const CallNode* op) final {


Remove this visit function: it only visits the indices, so some buffer loads are never traced.
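The effect described in this comment can be sketched with a toy expression tree in Python. Everything here is hypothetical and only mirrors the shape of the problem: `Var`, `BufferLoad`, `Call`, and both collector functions are illustrative stand-ins, not TVM classes or the removed C++ code.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    name: str


@dataclass
class BufferLoad:
    buffer: str
    indices: list = field(default_factory=list)


@dataclass
class Call:
    op: str
    args: list = field(default_factory=list)


def touched_indices_only(expr, out):
    """Collector whose Call handler descends only into a load's indices."""
    if isinstance(expr, Call):
        for arg in expr.args:
            if isinstance(arg, BufferLoad):
                # The load itself is skipped, so arg.buffer is never recorded.
                for index in arg.indices:
                    touched_indices_only(index, out)
            else:
                touched_indices_only(arg, out)
    elif isinstance(expr, BufferLoad):
        out.add(expr.buffer)
        for index in expr.indices:
            touched_indices_only(index, out)
    return out


def touched_default(expr, out):
    """Collector with no special Call handling: every argument is visited."""
    if isinstance(expr, Call):
        for arg in expr.args:
            touched_default(arg, out)
    elif isinstance(expr, BufferLoad):
        out.add(expr.buffer)
        for index in expr.indices:
            touched_default(index, out)
    return out


# A call (hypothetical op name) that takes a shared-memory load as its argument.
expr = Call("hypothetical_op", [BufferLoad("A_shared", [Var("i")])])

missed = touched_indices_only(expr, set())
traced = touched_default(expr, set())
```

With the special-case handler, `A_shared` never appears in the touched set; falling back to the default traversal records it.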

LeiWang1999 · 2024-10-04T16:47:48Z

src/tir/transforms/storage_rewrite.cc

@@ -1755,7 +1793,7 @@ Pass StorageRewrite() {
    // padded out to 32 bits) would require either rewriting
    // AllocateConst::data, or would require the code generators to
    // handle vectorized constants.
-    return PointerValueTypeRewrite(std::move(f), true, false, false, true, true, true, false,
+    return PointerValueTypeRewrite(std::move(f), true, false, false, false, true, true, false,


The fourth boolean flag must be false, because the vectorized buffer merge (for example, merging half B_shared[1024] into halfx8 B_shared[128]) occasionally leads to unhandled behavior during the async copy lowering phase.
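The arithmetic behind the half B_shared[1024] to halfx8 B_shared[128] example can be sketched as follows. This is a plain Python illustration of the index remapping only, under the assumption that the buffer is split evenly into 8-lane vectors; the helper names are hypothetical and do not correspond to TVM functions.

```python
LANES = 8             # halfx8: eight half-precision lanes per vector element
SCALAR_EXTENT = 1024  # half B_shared[1024]


def vectorized_extent(scalar_extent, lanes):
    """Number of vector elements needed to hold scalar_extent scalars."""
    if scalar_extent % lanes != 0:
        raise ValueError("extent must be a multiple of the lane count")
    return scalar_extent // lanes


def to_vector_index(scalar_index, lanes):
    """Map a scalar index to (vector element index, lane within that element)."""
    return divmod(scalar_index, lanes)


vector_extent = vectorized_extent(SCALAR_EXTENT, LANES)
last_element = to_vector_index(SCALAR_EXTENT - 1, LANES)
```

Every access that previously used a scalar index must be rewritten consistently to the (element, lane) form after such a merge, which is why a later lowering step that still expects scalar indexing can be affected.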

LeiWang1999 added 2 commits October 4, 2024 16:30

fix liveness analysis.

e00b4e4

always disable storage fuse within dynamic

e4d3bdd


tqchen requested review from spectrometerHBH and vinx13 October 4, 2024 17:16
