Cuda error invalid device ordinal #443

Open
lcebaman opened this issue Jul 13, 2023 · 0 comments
lcebaman commented Jul 13, 2023

Describe the issue:

When running on more than one GPU (four in this example), one error entry appears for each additional GPU:

Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149

Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149

Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149
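One possible reading of these messages, which nothing in this report confirms: CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES starting from 0, so a process that sees a single GPU can only address ordinal 0. If the application then selected its device by node-local rank (an assumption), ranks 1 to 3 would request an out-of-range ordinal. The sketch below models only that range check in plain shell. The helper names `count_visible` and `ordinal_is_valid` are hypothetical and do not belong to Grid or CUDA.

```shell
#!/bin/bash
# Hypothetical helpers, not part of Grid or CUDA. They only model the rule
# that valid device ordinals run from 0 to (number of visible devices - 1).

# Prints how many devices a CUDA_VISIBLE_DEVICES value such as "3" or "0,1,2,3" exposes.
count_visible() {
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    old_ifs=$IFS
    IFS=','
    set -- $1          # split the comma-separated list into positional parameters
    IFS=$old_ifs
    echo $#
}

# $1 = CUDA_VISIBLE_DEVICES value, $2 = ordinal the process would request.
ordinal_is_valid() {
    visible_count=$(count_visible "$1")
    if [ "$2" -ge 0 ] && [ "$2" -lt "$visible_count" ]; then
        echo "valid"
    else
        echo "invalid device ordinal"
    fi
}

# Assumed scenario: each rank sees one GPU but requests ordinal = local rank.
for lrank in 0 1 2 3; do
    echo "rank ${lrank}: $(ordinal_is_valid "${lrank}" "${lrank}")"
done
```

Under those assumptions only rank 0 passes the check, which would match one error line per additional GPU. Whether Grid actually selects the device this way would need to be checked against its source and configure options.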

Code example:

$ mpirun -np 4 ./wrapper.sh Benchmark_ITT --mpi 1.1.1.4

$ cat wrapper.sh
#!/bin/bash
lrank=$OMPI_COMM_WORLD_LOCAL_RANK

export OMP_NUM_THREADS=1
case ${lrank} in
    [0])
        GPU=0
        CPUBIND="0-19"
        ;;
    [1])
        GPU=1
        CPUBIND="20-39"
        ;;
    [2])
        GPU=2
        CPUBIND="40-59"
        ;;
    [3])
        GPU=3
        CPUBIND="50-79"
        ;;
esac

CMD="env CUDA_VISIBLE_DEVICES=${GPU} numactl --physcpubind=${CPUBIND}"
echo "$CMD $@"

$CMD "$@"
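The per-rank table in the wrapper can also be computed rather than written out by hand. This is a minimal sketch, assuming 20 logical CPUs per rank (read off the first three ranges in the script) and one GPU per rank. `CORES_PER_RANK` and `binding_for_rank` are hypothetical names, not anything from Grid or Open MPI.

```shell
#!/bin/bash
# Assumption: 20 logical CPUs per rank, inferred from the 0-19 / 20-39 / 40-59
# ranges in wrapper.sh. Adjust for the actual node layout.
CORES_PER_RANK=20

# Hypothetical helper: prints "<gpu> <first_cpu>-<last_cpu>" for a node-local rank.
binding_for_rank() {
    lrank="$1"
    first=$(( lrank * CORES_PER_RANK ))
    last=$(( first + CORES_PER_RANK - 1 ))
    echo "${lrank} ${first}-${last}"
}

# Print the GPU index and CPU range each of the four local ranks would get.
for lrank in 0 1 2 3; do
    binding_for_rank "${lrank}"
done
```

With these assumptions rank 3 maps to CPUs 60-79, whereas the script above lists 50-79 for that rank, which overlaps the 40-59 range of rank 2. That may be intentional or a typo, and it is probably unrelated to the CUDA error itself.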

Target platform:

Intel (40 cores/node) + 4xA100

Configure options:

../configure --enable-comms=mpi          \
             --enable-simd=GPU           \
             --enable-accelerator=cuda   \
             --prefix $prefix            \
             CXX=nvcc                    \
             LDFLAGS=-L$prefix/lib/      \
             CXXFLAGS="-ccbin mpicxx -gencode arch=compute_80,code=sm_80 -I$prefix/include/ -std=c++14"

lcebaman added the bug label Jul 13, 2023