-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flang][OpenMP] Implement more robust loop-nest detection logic #127
base: amd-trunk-dev
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
! Tests loop-nest detection algorithm for do-concurrent mapping. | ||
|
||
! REQUIRES: asserts | ||
|
||
! RUN: %flang_fc1 -emit-hlfir -fopenmp -fdo-concurrent-parallel=host \ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did this to show loops on the Fortran source level. Just makes it easy to correlate for which loop nests do we detect that they are perfectly nested. If you don't think this is a good enough arugment to have the test on the fortran level, I will replace it with MLIR instead. |
||
! RUN: -mmlir -debug %s -o - 2> %t.log || true | ||
|
||
! RUN: FileCheck %s < %t.log | ||
|
||
program main | ||
implicit none | ||
|
||
contains | ||
|
||
subroutine foo(n) | ||
implicit none | ||
integer :: n, m | ||
integer :: i, j, k | ||
integer :: x | ||
integer, dimension(n) :: a | ||
integer, dimension(n, n, n) :: b | ||
|
||
! NOTE This for sure is a perfect loop nest. However, the way `do-concurrent` | ||
! loops are now emitted by flang is probably not correct. This is being looked | ||
! into at the moment and once we have flang emitting proper loop headers, we | ||
! will revisit this. | ||
! | ||
! CHECK: Loop pair starting at location | ||
! CHECK: loc("{{.*}}":[[# @LINE + 1]]:{{.*}}) is not perfectly nested | ||
do concurrent(i=1:n, j=1:bar(n*m, n/m)) | ||
a(i) = n | ||
end do | ||
|
||
! NOTE same as above. | ||
! | ||
! CHECK: Loop pair starting at location | ||
! CHECK: loc("{{.*}}":[[# @LINE + 1]]:{{.*}}) is not perfectly nested | ||
do concurrent(i=bar(n, x):n, j=1:bar(n*m, n/m)) | ||
a(i) = n | ||
end do | ||
|
||
! NOTE This is **not** a perfect nest since the inner call to `bar` will allocate | ||
! memory for the temp results of `n*m` and `n/m` **inside** the outer loop. | ||
! | ||
! CHECK: Loop pair starting at location | ||
! CHECK: loc("{{.*}}":[[# @LINE + 1]]:{{.*}}) is not perfectly nested | ||
do concurrent(i=bar(n, x):n) | ||
do concurrent(j=1:bar(n*m, n/m)) | ||
a(i) = n | ||
end do | ||
end do | ||
|
||
! CHECK: Loop pair starting at location | ||
! CHECK: loc("{{.*}}":[[# @LINE + 1]]:{{.*}}) is not perfectly nested | ||
do concurrent(i=1:n) | ||
x = 10 | ||
do concurrent(j=1:m) | ||
b(i,j,k) = i * j + k | ||
end do | ||
end do | ||
|
||
! CHECK: Loop pair starting at location | ||
! CHECK: loc("{{.*}}":[[# @LINE + 1]]:{{.*}}) is not perfectly nested | ||
do concurrent(i=1:n) | ||
do concurrent(j=1:m) | ||
b(i,j,k) = i * j + k | ||
end do | ||
x = 10 | ||
end do | ||
|
||
! CHECK: Loop pair starting at location | ||
! CHECK: loc("{{.*}}":[[# @LINE + 1]]:{{.*}}) is perfectly nested | ||
do concurrent(i=1:n) | ||
do concurrent(j=1:m) | ||
b(i,j,k) = i * j + k | ||
x = 10 | ||
end do | ||
end do | ||
end subroutine | ||
|
||
pure function bar(n, m) | ||
implicit none | ||
integer, intent(in) :: n, m | ||
integer :: bar | ||
|
||
bar = n + m | ||
end function | ||
|
||
end program main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please alter the test to not rely on debug statements for failure/success determination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not ideal indeed. Do you have any suggestions on how to do this differently?
I thought about doing this as a unit test, but setting up the test and later reading the test I think would be more complicated than a lit test the clearly shows what the loops look like and what the outcome should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can a special flag to print loop info to
llvm::outs()
. But I am not sure this is worth it tbh.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't like relying on debug output either. It randomly interleaves with stdout and only works with assertions-builds. Additionally, it makes the test dependent on how often/in which order the
isPerfectlyNested
function is called internally, making it sensitive to even NFC patches. But I have seen this in LLVM and Clang enough to be consider a established pattern there, although not for MLIR/Flang.If you do this, you still must:
2>
instead&>
REQUIRES: asserts
or it will fail in release buildsIn MLIR I indeed usually see an internal option enabling additional printing. or print debug counters E.g.
-test-print-shape-mapping
, or a pass that just prints the analysis result, e.g.-pass-pipeline=...test-print-dominance
, or test the diagnostic from-Rpass
output.Since it is a transformation pass, one would typically (also) test whether the output is what was expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the info @Meinersbur, really useful.
I used
2>
andREQUIRES: asserts
.All other
do-concurrent-conversion
tests verify the output. I wanted this one to test only one thing which is loop-nest detection. My reasoning is that this test isolates this particular part of the pass so that we debug issues in nest detection more easily.