Rewrite the 3D buffer copy example using different uniformElements loops #2377
@psychocoderHPC @SimeonEhrig Do you prefer that the three kernels use three different approaches (just to showcase them), or only one of them? And in that case, which one?
I like the comparison of the three approaches.
@fwyzard Can you please push again? It looks like the CI trigger didn't work.
force-pushed from 85fa529 to c67a881
It's still not working :(
@fwyzard Please amend and force-push this PR again; the CI is fixed.
force-pushed from c67a881 to a45dac3
c272fd5
force-pushed from a45dac3 to c272fd5
force-pushed from d7b1039 to ee766b1
force-pushed from dfbe6c8 to 4f046dc
force-pushed from e009662 to de6ca89
One job in the CI failed with
I restarted the job to see whether it is a temporary issue, but typically it means that the GPU device was given invalid kernel launch parameters, e.g. too many threads per block.
I checked the register footprint of this example locally to see whether high register usage could be the cause: more registers per thread reduces the largest valid block size, so we may no longer be able to use as many threads per block.
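To illustrate the register argument above, here is a rough C++ sketch of how per-thread register usage bounds the block size. The helper `maxThreadsForRegisters` is hypothetical and not part of alpaka or CUDA. The figure of 65536 registers per block and the warp size of 32 are assumptions that hold for many NVIDIA GPUs but should be checked against the actual device properties. Real hardware also allocates registers with a coarser granularity, so treat the result as an optimistic upper bound.

```cpp
#include <cstdint>

// Hypothetical helper (not an alpaka or CUDA API): estimates the largest
// block size allowed by register usage alone, rounded down to whole warps.
// Other limits (e.g. the device's maximum threads per block) still apply.
constexpr std::uint32_t maxThreadsForRegisters(
    std::uint32_t regsPerBlock, // assumed register budget per block
    std::uint32_t regsPerThread, // registers the compiled kernel uses per thread
    std::uint32_t warpSize = 32) // assumed warp size
{
    if(regsPerThread == 0 || warpSize == 0)
        return 0;
    std::uint32_t const threads = regsPerBlock / regsPerThread;
    return (threads / warpSize) * warpSize;
}

// With an assumed budget of 65536 registers per block:
static_assert(maxThreadsForRegisters(65536u, 64u) == 1024u, "64 registers per thread still allows 1024 threads");
static_assert(maxThreadsForRegisters(65536u, 80u) == 800u, "80 registers per thread caps the block at 800 threads");
```

Under these assumptions, a kernel that grows from 64 to 80 registers per thread can no longer be launched with 1024 threads per block, which would match the kind of launch failure described above.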
IMO the problem that we run into the error
Note: I am not against using the new iterator for this example. I think an easy fix is adding
I opened #2382 to track possible optimizations.
Note: I am a little confused by the test, even in the develop branch version.
It is a printf bug.
force-pushed from de6ca89 to c2c9b07
force-pushed from c2c9b07 to cc40198