
Weird test failures #249

Open
chillenzer opened this issue Mar 27, 2024 · 3 comments
@chillenzer
Contributor

chillenzer commented Mar 27, 2024

On the system at hand (HAL), all tests pass individually but fail if and only if "2D AccGpuCudaRt" and "3D AccGpuCudaRt" are both run in the same test run. So the following work

./tests <any test name>
./tests -f list.txt

whenever list.txt doesn't contain both of the above, while the latter fails for any list.txt that does contain both, in particular

2D AccGpuCudaRt
3D AccGpuCudaRt

as an MWE. From what I can tell, neither the ordering of the two nor other tests run in between changes this observation. The error message from ./tests is always

FAILED:
due to unexpected exception with message:
  /home/lenz93/workspace/src/mallocMC-makeReservationPolicyOwning/alpaka/
  include/alpaka/platform/PlatformUniformCudaHipRt.hpp(125) 'rc' A previous API
  call (not this one) set the error  : 'cudaErrorMisalignedAddress':
  'misaligned address'!

while

$ compute-sanitizer --show-backtrace=yes  --tool memcheck ./tests
========= COMPUTE-SANITIZER
========= Invalid __global__ atomic of size 4 bytes
=========     at atomicSub(unsigned int *, unsigned int)+0x4c0 in /opt/spack-modules/opt/spack/linux-ubuntu22.04-zen2/gcc-12.2.0/cuda-12.3.0-vhkdpnlaplsol7vzlt4wj55im3uemjkq/include/device_atomic_functions.hpp:122
=========     by thread (0,1,0) in block (0,3,1)
=========     Address 0x7ff15fe22051 is misaligned
=========     and is inside the nearest allocation at 0x7ff15fe00000 of size 1,048,576 bytes
=========     Device Frame:alpakaGlobal::AlpakaBuiltInAtomic<alpaka::AtomicSub, unsigned int, alpaka::hierarchy::Grids, void>::atomic(unsigned int *, unsigned int)+0x480 in /home/lenz93/workspace/src/mallocMC-makeReservationPolicyOwning/alpaka/include/alpaka/atomic/AtomicUniformCudaHip.hpp:161
[...]

(sometimes also an out-of-bounds access instead of a misaligned one).

Gonna investigate further in the near future but any comments and hints are welcome!

@psychocoderHPC
Member

Even if you execute the 2D or the 3D case twice, it will fail. The reason is that the distribution policy XMallocSIMD has a bug.

Changing

mallocMC::DistributionPolicies::XMallocSIMD<DistributionConfig>,

to

mallocMC::DistributionPolicies::Noop,

will fix the test.

@psychocoderHPC
Member

The problem in the XMallocSIMD policy is how we communicate between threads in a warp:

threadcount = popc(ballot(coalescible));

Since Volta, threads in a warp can diverge, so we should use in-warp communication instead of shared memory.
One problem is that alpaka's in-warp thread shuffle is a collective warp operation, so we need to use native CUDA/ROCm calls.
An example can be found in PIConGPU's aggregated atomic increment:

https://github.com/ComputationalRadiationPhysics/picongpu/blob/7edbba8d21172f97cad0e6bd84fdb93098773844/include/pmacc/kernel/atomic.hpp#L199-L204

@chillenzer
Contributor Author

Great catch! Thanks! Gonna try and fix it but not sure if I'll make it before the holidays.
