
S4 intel environment is not working for MPI related ctests #1356

Open

srherbener opened this issue Oct 22, 2024 · 5 comments

Labels: bug (Something is not working), INFRA (JEDI Infrastructure)

@srherbener (Collaborator)

Describe the bug
The spack-stack-1.8.0 intel compiler unified-env appears to work for building jedi-bundle, but I'm having trouble getting the MPI-related ctests to run. When I run the SLURM script below, I get many test failures with messages like this:

28/2409 Testing: test_generic_unstructured_global_interpolator_parallel
28/2409 Test: test_generic_unstructured_global_interpolator_parallel
Command: "/usr/bin/srun" "-n" "4" "/data/users/sherbener/jedi/build/bin/test_oops_generic_global_interpolator" "test/testinput/unstructured_global_interpolator.yaml"
Directory: /data/users/sherbener/jedi/build/oops/src
"test_generic_unstructured_global_interpolator_parallel" start time: Oct 21 20:08 UTC
Output:
----------------------------------------------------------
[1729541309.956681] [s4-204-c2:1760273:0]          select.c:630  UCX  ERROR   no active messages transport to <no debug data>: self/memory - Destination is unreachable
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)............:
MPID_Init(1548)..................:
MPIDI_OFI_mpi_init_hook(1682)....:
insert_addr_table_roots_only(462): OFI get address vector map failed
In: PMI_Abort(1615247, Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)............:
MPID_Init(1548)..................:
MPIDI_OFI_mpi_init_hook(1682)....:
insert_addr_table_roots_only(462): OFI get address vector map failed)
[1729541309.956681] [s4-204-c2:1760276:0]          select.c:630  UCX  ERROR   no active messages transport to <no debug data>: self/memory - Destination is unreachable
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)............:
MPID_Init(1548)..................:
MPIDI_OFI_mpi_init_hook(1682)....:
insert_addr_table_roots_only(462): OFI get address vector map failed
In: PMI_Abort(1615247, Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)............:
MPID_Init(1548)..................:
MPIDI_OFI_mpi_init_hook(1682)....:
insert_addr_table_roots_only(462): OFI get address vector map failed)
[1729541309.956681] [s4-204-c2:1760277:0]          select.c:630  UCX  ERROR   no active messages transport to <no debug data>: self/memory - Destination is unreachable
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)............:
MPID_Init(1548)..................:
MPIDI_OFI_mpi_init_hook(1682)....:
insert_addr_table_roots_only(462): OFI get address vector map failed
In: PMI_Abort(1615247, Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)............:
MPID_Init(1548)..................:
MPIDI_OFI_mpi_init_hook(1682)....:
insert_addr_table_roots_only(462): OFI get address vector map failed)
PANIC: ::pthread_mutex_destroy(&mutex_) in  (/data/prod/jedi/spack-stack/spack-stack-1.8.0/cache/build_stage/jedipara/spack-stage-eckit-1.27.0-q65bkkvbbpfsb726w2wosbniooxv2c2j/spack-src/src/eckit/thread/Mutex.cc:31 ~Mutex)
PANIC: ::pthread_mutex_destroy(&mutex_) in  (/data/prod/jedi/spack-stack/spack-stack-1.8.0/cache/build_stage/jedipara/spack-stage-eckit-1.27.0-q65bkkvbbpfsb726w2wosbniooxv2c2j/spack-src/src/eckit/thread/Mutex.cc:31 ~Mutex)
----------------------------------------
BACKTRACE
----------------------------------------
PANIC: ::pthread_mutex_destroy(&mutex_) in  (/data/prod/jedi/spack-stack/spack-stack-1.8.0/cache/build_stage/jedipara/spack-stage-eckit-1.27.0-q65bkkvbbpfsb726w2wosbniooxv2c2j/spack-src/src/eckit/thread/Mutex.cc:31 ~Mutex)
...

It's quite possible that I have something configured incorrectly, for example using srun instead of mpiexec as the MPI launcher for the ctest system. That choice was based on a simple test on the compute nodes, where I ran the following SLURM script:

#!/usr/bin/bash
#SBATCH --job-name=srh-ctest
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=4:00:00
#SBATCH --mail-user=stephenh@ucar.edu

# Insert the module purge and load statements in here
source /etc/bashrc
source /data/users/sherbener/jedi/setup.sh
module list
ulimit -s unlimited
ulimit -v unlimited

export SLURM_EXPORT_ENV=ALL
export HDF5_USE_FILE_LOCKING=FALSE
# Required for Intel so that serial jobs of MPI-enabled executables
# run without having to call them through mpiexec/mpirun
unset "${!SLURM@}"

cd /data/users/sherbener/jedi/build/oops/src
echo "****** srun *****"
srun -n 4 echo hello world

echo "****** mpiexec *****"
mpiexec -n 4 echo hello world

This gave the following result:

****** srun *****
hello world
hello world
hello world
hello world
TOTCPU=00:00:32 ELAP=00:00:01 REQMEM=23500M REQCPUS=4 ALLOCCPUS=32 TIMELIMIT=01:00:00 PART=s4 ACCT=star
MAXRSS=388K MAXVMSIZE=388K
________________________________________________________________
Job Resource Usage Summary for 6858

  CPU Time Used : 00:00:32
  Memory Used : 388K
  Virtual Memory Used : 388K
  Walltime Used : 00:00:01

  Memory Requested : 23500M (n=per node; c=per core)
  CPUs Requested / Allocated : 4 / 32
  Walltime Requested : 01:00:00

  Execution Queue : s4
  Head Node : 
  Charged to : star

  Job Stopped : Tue Oct 22 20:37:09 UTC 2024
_____________________________________________________________________

****** mpiexec *****
[mpiexec@s4-204-c1.s4] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on s4-204-c1.s4 (pid 2351096, exit code 256)
[mpiexec@s4-204-c1.s4] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@s4-204-c1.s4] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@s4-204-c1.s4] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1065): error waiting for event
[mpiexec@s4-204-c1.s4] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1026): error setting up the bootstrap proxies
[mpiexec@s4-204-c1.s4] Possible reasons:
[mpiexec@s4-204-c1.s4] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@s4-204-c1.s4] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@s4-204-c1.s4] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@s4-204-c1.s4] 4. slurm bootstrap cannot launch processes on remote host. You may try using -bootstrap option to select alternative launcher.

To Reproduce
Steps to reproduce the behavior:

Set up the intel environment by sourcing the following settings:

#!/bin/bash

echo "Loading EWOK-SKYLAB Environment Using Spack-Stack 1.8.0"

# load modules
module purge

module use /data/prod/jedi/spack-stack/spack-stack-1.8.0/envs/ue-intel-2021.10.0/install/modulefiles/Core
module load stack-intel/2021.10.0
module load stack-intel-oneapi-mpi/2021.10.0
module load stack-python/3.11.7
module unuse /opt/apps/modulefiles/Compiler/intel/23

# Load JEDI modules
module load jedi-fv3-env
module load ewok-env
module load soca-env

Run ecbuild for jedi-bundle, followed by make:

ecbuild -DPython3_EXECUTABLE=$(which python3) -DMPIEXEC_EXECUTABLE=$(which srun) -DMPIEXEC_NUMPROC_FLAG="-n" $JEDI_SRC

make -j8

Run the ctests on the compute nodes using the following SLURM script:

#!/usr/bin/bash
#SBATCH --job-name=srh-ctest
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=4:00:00
#SBATCH --mail-user=stephenh@ucar.edu

# Insert the module purge and load statements in here
source /etc/bashrc
source /data/users/sherbener/jedi/setup.sh
module list

ulimit -s unlimited
ulimit -v unlimited

export SLURM_EXPORT_ENV=ALL
export HDF5_USE_FILE_LOCKING=FALSE
# Required for Intel so that serial jobs of MPI-enabled executables
# run without having to call them through mpiexec/mpirun
unset "${!SLURM@}"

cd /data/users/sherbener/jedi/build
ctest
#ctest -R fv3
#ctest -R iodaconv

exit 0
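
To narrow things down while debugging, a single failing MPI test can be run with verbose output instead of the whole suite. A minimal sketch (the test name is taken from the failure output above; adjust the build path as needed):

# Hypothetical focused run for debugging one MPI test
cd /data/users/sherbener/jedi/build
# -R selects tests by regex, -VV prints the full launch command and test output
ctest -VV -R test_generic_unstructured_global_interpolator_parallel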

Expected behavior
All tests, including the MPI-related ctests, complete successfully (pass).

System:
What system(s) are you running the code on?
S4, Intel

Additional context

srherbener added the bug (Something is not working) and INFRA (JEDI Infrastructure) labels on Oct 22, 2024
@climbfuji (Collaborator) commented Oct 22, 2024

Please report this error to the SSEC helpdesk; we use their Intel module and do not compile the Intel compiler or MPI library ourselves. However, a 3-minute internet search turned up this: openucx/ucx#4742

[dheinzeller@s4-submit ~]$ UCX_TLS=ud,sm,self srun -n 4 ./hello_world.x
Hello world from processor s4-204-c2.s4, rank 1 out of 4 processors
Hello world from processor s4-204-c2.s4, rank 2 out of 4 processors
Hello world from processor s4-204-c2.s4, rank 3 out of 4 processors
Hello world from processor s4-204-c2.s4, rank 0 out of 4 processors
TOTCPU=00:00:32 ELAP=00:00:01 REQMEM=0 REQCPUS=0 ALLOCCPUS=32 TIMELIMIT= PART= ACCT=jcsda
MAXRSS=0 MAXVMSIZE=44K
________________________________________________________________
Job Resource Usage Summary for 6925

  CPU Time Used : 00:00:32
  Memory Used : 0
  Virtual Memory Used : 44K
  Walltime Used : 00:00:01

  Memory Requested : 0 (n=per node; c=per core)
  CPUs Requested / Allocated : 0 / 32
  Walltime Requested :

  Execution Queue :
  Head Node :
  Charged to : jcsda

  Job Stopped : Tue Oct 22 22:43:09 UTC 2024
_____________________________________________________________________

It might be a good idea to let Jesse know; maybe he can set those variables, or something equivalent, in the Intel module so that we don't have to set them ourselves.
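
Until that happens, a minimal sketch of applying the same workaround in the ctest job script (assuming the UCX_TLS value from the srun example above also applies to the srun commands that ctest launches):

# Hypothetical interim workaround in the SLURM job script, before running ctest:
# restrict UCX to transports that work here (value taken from the srun test above)
export UCX_TLS=ud,sm,self
cd /data/users/sherbener/jedi/build
ctest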

@InnocentSouopgui-NOAA commented

I emailed Jesse.

@srherbener (Collaborator, Author) commented

> I emailed Jesse.

@InnocentSouopgui-NOAA, thanks for taking care of this

@srherbener (Collaborator, Author) commented

Jesse recently modified the setting of FI_PROVIDER in the intel/2023.2 lua module script (setenv("FI_PROVIDER", "psm3")), which seems to help. The test failure I reported earlier now passes. I still had a bunch of test failures, but those appear to be due to another environment issue unrelated to MPI. I've repaired that and am building and testing from scratch again.

@srherbener (Collaborator, Author) commented

With further testing I have discovered that the FI_PROVIDER="psm3" setting gives only intermittent success. Jesse mentioned that FI_PROVIDER="verbs" helped get WRF runs executing successfully, so I tried FI_PROVIDER="verbs" with the jedi-bundle ctests, and all of the srun/MPI execution issues were cleared up. I think this is the solution for us, and I've asked Jesse if we can change the FI_PROVIDER setting to "verbs" in the intel/2023.2 lua script.
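
For anyone hitting this before the module change lands, a minimal sketch of the interim workaround in the ctest job script (this assumes FI_PROVIDER is inherited by the srun-launched MPI ranks; the value comes from the discussion above):

# Hypothetical interim workaround until intel/2023.2 exports this from the modulefile:
# tell libfabric / Intel MPI to use the verbs provider
export FI_PROVIDER=verbs
# optional sanity check that the verbs provider is present (fi_info ships with libfabric)
fi_info -p verbs | head
cd /data/users/sherbener/jedi/build
ctest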

Note that I still see 9 test failures, but in this run of ctest there were no MPI crashes:

The following tests FAILED:
        500 - saber_test_dirac_spectralb_from_CS_1-1 (Failed)
        1272 - test_ioda_bufr_python_encoder (Failed)
        1729 - ufo_test_tier1_test_ufo_qc_historycheck (Failed)
        1730 - ufo_test_tier1_test_ufo_qc_historycheck_mpi (Failed)
        1731 - ufo_test_tier1_test_ufo_qc_historycheck_unittests (Failed)
        1764 - ufo_test_tier1_test_ufo_qc_variableassignment (Failed)
        2318 - test_soca_linearization_error (Failed)
        2344 - test_mpasjedi_geometry (Failed)
        2396 - test_mpasjedi_lgetkf_height_vloc (Failed)
Errors while running CTest
