Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] Fix improper touched buffer assignment of Pass MergeSharedMemoryAllocations #17438

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

LeiWang1999
Copy link
Contributor

As discussed in issue #17375, the current rule for assigning touched buffers is not appropriate. Consider the following example:

code_block_0
for k in range(0, 10): # (the gen point of A_shared and B_shared will be injected into this for expression)
    for i in range(0, 10):
          A_shared <- A
    for i in range(0, 10):
          B_shared <- B
    code_block_1 (consume A_shared and B_shared)
code_block_2 (produce and consume C_shared)

This setup works by chance in simple GEMM scenarios. However, the correct approach should be

code_block_0
for k in range(0, 10): 
    for i in range(0, 10):
          A_shared <- A # (the gen point of A_shared should be bind into this BufferStore Node)
    for i in range(0, 10):
          B_shared <- B # (the gen point of B_shared be bind into this BufferStore Node)
    code_block_1 (consume A_shared and B_shared)
code_block_2 (produce and consume C_shared)

This approach works correctly even in more complex scenarios, such as batched GEMM, where the naive template would fail.

This pull request made a simple modification for MergeSharedMemory Pass to enable the right analysis, and always disable the naive naive shared memory buffer fuse if kernel with dynamic in StorageRewrite Pass

}
}
}

void VisitExpr_(const CallNode* op) final {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this visit function, as it only allow visit indices, which will lead to some buffer load statement not be traced.

@@ -1755,7 +1793,7 @@ Pass StorageRewrite() {
// padded out to 32 bits) would require either rewriting
// AllocateConst::data, or would require the code generators to
// handle vectorized constants.
return PointerValueTypeRewrite(std::move(f), true, false, false, true, true, true, false,
return PointerValueTypeRewrite(std::move(f), true, false, false, false, true, true, false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fourth condition must be false, as the vectorized buffer merge (for example, merge half B_shared[1024] into halfx8 B_shared[128]) will occasional lead to a unhandled behavior during async copy lowering phase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant