
Increased gsi.x and enkf.x executable wall times following Orion Rocky 9 upgrade #1166

Open
RussTreadon-NOAA opened this issue Jun 27, 2024 · 31 comments
Labels: bug, NOAA-EMC, OAR-EPIC

@RussTreadon-NOAA

Describe the bug
gsi.x and enkf.x wall times increase significantly following the Orion Rocky 9 upgrade. See this comment in GSI issue #754 for details.

To Reproduce
Steps to reproduce the behavior:

  1. clone and build GSI source code in GSI PR #764 on Orion
  2. run ctests

Expected behavior
We expect Orion Rocky 9 gsi.x and enkf.x wall times to be comparable to what we observed when building and running on Orion CentOS 7.

System:
Orion

Additional context
The modulefiles used to build gsi.x and enkf.x are gsi_orion.intel.lua and gsi_common.lua. gsi_orion.intel.lua sets

prepend_path("MODULEPATH", "/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-rocky9/install/modulefiles/Core")

Ticket RDHPCS#2024062754000098 has been opened with the Orion helpdesk. Perhaps system settings changed with the Orion Rocky 9 upgrade.

@RussTreadon-NOAA
Author

The MSU helpdesk replied. They have not experienced any slowdown in executable wall times following the Rocky 9 upgrade. There was one MSU user whose jobs ran slower, but this was traced to an errant environment variable.

The GSI build uses the following modulefiles to build gsi.x and enkf.x on Orion

Were all these modules recompiled for Orion Rocky 9? Was there any change in build options? Do all the modules perform as efficiently on Rocky 9 as they did on CentOS 7?

@AlexanderRichert-NOAA
Collaborator

We're using the same set of versions/build options as before. The only difference that jumps out at me is the change in compiler and MPI. You could try the 'unified-env-intel-2023.2.4' environment (also under spack-stack-1.6.0 on Orion) and see if that makes any difference: /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-intel-2023.2.4/install/modulefiles/Core
Do you have any way of getting insight into whether it could be filesystem vs. network fabric vs...? Such as any info on timing within jobs that could be compared with pre-Rocky runs?

@RussTreadon-NOAA
Author

The GSI does not build using /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-intel-2023.2.4/install/modulefiles/Core; it requires the gsi-addon-env to build. I confirmed this by trying unified-env-intel-2023.2.4. The module load failed with

orion-login-3:/work2/noaa/da/rtreadon/git/gsi/develop/modulefiles$ module load gsi_orion.intel.lua
Lmod has detected the following error:  These module(s) or extension(s) exist but cannot be loaded as
requested: "stack-intel/2021.9.0", "netcdf-fortran/4.6.1", "cmake/3.23.1", "python/3.11.6"
   Try: "module spider stack-intel/2021.9.0 netcdf-fortran/4.6.1 cmake/3.23.1 python/3.11.6" to see how to load the
module(s).

Executing this command requires loading "stack-intel/2021.9.0" which failed while processing the following
module(s):

    Module fullname  Module Filename
    ---------------  ---------------
    gsi_orion.intel  /work2/noaa/da/rtreadon/git/gsi/develop/modulefiles/gsi_orion.intel.lua

Executing this command requires loading "python/3.11.6" which failed while processing the following module(s):

    Module fullname  Module Filename
    ---------------  ---------------
    gsi_orion.intel  /work2/noaa/da/rtreadon/git/gsi/develop/modulefiles/gsi_orion.intel.lua

Executing this command requires loading "cmake/3.23.1" which failed while processing the following module(s):

    Module fullname  Module Filename
    ---------------  ---------------
    gsi_orion.intel  /work2/noaa/da/rtreadon/git/gsi/develop/modulefiles/gsi_orion.intel.lua

Executing this command requires loading "netcdf-fortran/4.6.1" which failed while processing the following module(s):

    Module fullname  Module Filename
    ---------------  ---------------
    gsi_common       /work2/noaa/da/rtreadon/git/gsi/develop/modulefiles/gsi_common.lua
    gsi_orion.intel  /work2/noaa/da/rtreadon/git/gsi/develop/modulefiles/gsi_orion.intel.lua

We did not profile gsi.x and enkf.x on Orion CentOS 7. Given that gsi.x and enkf.x run efficiently on Hercules, we did not expect slower-running executables on Orion Rocky 9.

Watching stdout as gsi.x runs suggests that the slowdown occurs while reading either bufr observation dump files or netcdf background files (both are read in parallel). This, however, is a qualitative statement. We do not have quantitative performance statistics.

@AlexanderRichert-NOAA
Collaborator

Right, sorry, forgot about the addon environment. Please try /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-intel-2023.2.4/install/modulefiles/Core
Note that the compiler and MPI versions will need to be updated to '2021.10.0'.
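
For reference, a minimal interactive sketch of switching to this environment (module names are taken from the module listing later in this thread and may differ):

# Point Lmod at the intel-2023.2.4 gsi-addon environment and load the updated stack
module purge
module use /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-intel-2023.2.4/install/modulefiles/Core
module load stack-intel/2021.10.0 stack-intel-oneapi-mpi/2021.10.0
module avail   # confirm the remaining gsi-addon modules are visible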

@RussTreadon-NOAA
Author

@AlexanderRichert-NOAA , the following modifications were made to a copy of gsi_orion.intel.lua

@@ -1,11 +1,12 @@
 help([[
 ]])

-prepend_path("MODULEPATH", "/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-rocky9/install/modulefiles/Core")
+--prepend_path("MODULEPATH", "/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-rocky9/install/modulefiles/Core")
+prepend_path("MODULEPATH", "/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-intel-2023.2.4/install/modulefiles/Core")

 local stack_python_ver=os.getenv("stack_python_ver") or "3.11.6"
-local stack_intel_ver=os.getenv("stack_intel_ver") or "2021.9.0"
-local stack_impi_ver=os.getenv("stack_impi_ver") or "2021.9.0"
+local stack_intel_ver=os.getenv("stack_intel_ver") or "2021.10.0"
+local stack_impi_ver=os.getenv("stack_impi_ver") or "2021.10.0"
 local cmake_ver=os.getenv("cmake_ver") or "3.23.1"
 local prod_util_ver=os.getenv("prod_util_ver") or "2.1.1"

However, module load gsi_orion.intel fails with

orion-login-4:/work/noaa/da/rtreadon/git/gsi/test_orion/modulefiles$ module load gsi_orion.intel
Lmod has detected the following error:  Cannot load module "stack-intel/2021.10.0" without these
module(s) loaded:
   icc/2023.2.4

While processing the following module(s):
    Module fullname        Module Filename
    ---------------        ---------------
    stack-intel/2021.10.0  /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-intel-2023.2.4/install/modulefiles/Core/stack-intel/2021.10.0.lua
    gsi_orion.intel        /work/noaa/da/rtreadon/git/gsi/test_orion/modulefiles/gsi_orion.intel.lua

Adding load("icc/2023.2.4") before the stack-intel load did not help. module spider icc returns Unable to find: "icc".

@RussTreadon-NOAA
Author

Added timers to the GSI source code. A comparison of Orion and Hercules gsi.x timings shows the greatest wall time difference occurs while processing observations from bufr files.

Orion

timer stats: read_obs:finish read obs loop avg   142.585     min   66.6217     max   551.629

Hercules

timer stats: read_obs:finish read obs loop avg   37.0401     min   18.3169     max   144.775

See GSI issue #771 for details.

@AlexanderRichert-NOAA
Collaborator

Sorry about that, please try again. I forgot to add a line to the stack-intel module.

@RussTreadon-NOAA
Author

@AlexanderRichert-NOAA , got a bit further but still encountered an error. Below is what I see

orion-login-2:/work/noaa/da/rtreadon/git/gsi/test_orion/ush$ ./build.sh
++++ readlink -f -n ./build.sh
+++ dirname /work/noaa/da/rtreadon/git/gsi/test_orion/ush/build.sh
++ cd /work/noaa/da/rtreadon/git/gsi/test_orion/ush/..
++ pwd -P
+ readonly DIR_ROOT=/work/noaa/da/rtreadon/git/gsi/test_orion
+ DIR_ROOT=/work/noaa/da/rtreadon/git/gsi/test_orion
+ BUILD_VERBOSE=YES
+ BUILD_TYPE=Release
+ CMAKE_OPTS=
+ COMPILER=intel
+ BUILD_DIR=/work/noaa/da/rtreadon/git/gsi/test_orion/build
+ INSTALL_PREFIX=/work/noaa/da/rtreadon/git/gsi/test_orion/install
+ GSI_MODE=Regional
+ ENKF_MODE=GFS
+ REGRESSION_TESTS=YES
+ source /work/noaa/da/rtreadon/git/gsi/test_orion/ush/detect_machine.sh
++ [[ -n '' ]]
++ case $(hostname -f) in
+++ hostname -f
++ MACHINE_ID=orion
++ [[ orion == \U\N\K\N\O\W\N ]]
++ MACHINE_ID=orion
++ [[ orion != \U\N\K\N\O\W\N ]]
++ return
+ set +x
Loading icc version 2023.2.4
Loading compiler-rt version 2023.2.4

Currently Loaded Modules:
  1) compiler-rt/2023.2.4              12) sqlite/3.43.2           23) openblas/0.3.24       34) wrf-io/1.2.0
  2) icc/2023.2.4                      13) util-linux-uuid/2.38.1  24) py-setuptools/63.4.3  35) ncio/1.1.2
  3) stack-intel/2021.10.0             14) python/3.11.6           25) py-numpy/1.23.4       36) crtm-fix/2.4.0.1_emc
  4) mpi/2021.10.0                     15) nghttp2/1.57.0          26) bufr/11.7.0           37) git-lfs/3.1.2
  5) stack-intel-oneapi-mpi/2021.10.0  16) curl/8.4.0              27) bacio/2.4.1           38) crtm/2.4.0.1
  6) zlib/1.2.13                       17) cmake/3.23.1            28) w3emc/2.10.0          39) gsi-ncdiag/1.1.2
  7) pigz/2.7                          18) snappy/1.1.10           29) sp/2.5.0              40) gsi_common
  8) zstd/1.5.2                        19) c-blosc/1.21.5          30) ip/4.3.0              41) prod_util/2.1.1
  9) tar/1.34                          20) hdf5/1.14.0             31) sigio/2.3.2           42) intel-oneapi-mkl/2022.2.1
 10) gettext/0.21.1                    21) netcdf-c/4.9.2          32) sfcio/1.4.1           43) gsi_orion.intel
 11) libxcrypt/4.4.35                  22) netcdf-fortran/4.6.1    33) nemsio/2.5.4



+ CONTROLPATH=/work/noaa/da/rtreadon/git/gsi/test_orion/../develop/install/bin
+ CMAKE_OPTS+=' -DCMAKE_BUILD_TYPE=Release'
+ CMAKE_OPTS+=' -DCMAKE_INSTALL_PREFIX=/work/noaa/da/rtreadon/git/gsi/test_orion/install'
+ CMAKE_OPTS+=' -DGSI_MODE=Regional -DENKF_MODE=GFS'
+ [[ YES =~ [yYtT] ]]
+ CMAKE_OPTS+=' -DBUILD_REG_TESTING=ON -DCONTROLPATH=/work/noaa/da/rtreadon/git/gsi/test_orion/../develop/install/bin'
+ [[ YES =~ [yYtT] ]]
+ rm -rf /work/noaa/da/rtreadon/git/gsi/test_orion/build
+ mkdir -p /work/noaa/da/rtreadon/git/gsi/test_orion/build
+ cd /work/noaa/da/rtreadon/git/gsi/test_orion/build
+ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/work/noaa/da/rtreadon/git/gsi/test_orion/install -DGSI_MODE=Regional -DENKF_MODE=GFS -DBUILD_REG_TESTING=ON -DCONTROLPATH=/work/noaa/da/rtreadon/git/gsi/test_orion/../develop/install/bin /work/noaa/da/rtreadon/git/gsi/test_orion
-- The C compiler identification is Intel 2021.10.0.20230609
-- The Fortran compiler identification is Intel 2021.10.0.20230609
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /work/noaa/epic/role-epic/spack-stack/orion/compilers/oneapi/compiler/2023.2.4/linux/bin/intel64/icc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /work/noaa/epic/role-epic/spack-stack/orion/compilers/oneapi/compiler/2023.2.4/linux/bin/intel64/ifort - skipped
-- OPENMP ................. OFF
-- ENABLE_MKL ............. ON
-- BUILD_GSDCLOUD ......... OFF
-- BUILD_MGBF ............. ON
-- BUILD_GSI .............. ON
-- BUILD_ENKF ............. ON
-- BUILD_REG_TESTING ...... ON
-- Building MGBF library
-- Building GSI
-- GSI: OPENMP ................. OFF
-- GSI: ENABLE_MKL ............. ON
-- GSI: USE_GSDCLOUD ........... OFF
-- GSI: USE_MGBF ............... ON
-- GSI: GSI_MODE ............... Regional
-- Found MKL: /apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2022.2.1-ifnryw36ggh5z5fxmdn2w3n5syxhvnr5/mkl/2022.2.1/lib/intel64/libmkl_intel_lp64.so;/apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2022.2.1-ifnryw36ggh5z5fxmdn2w3n5syxhvnr5/mkl/2022.2.1/lib/intel64/libmkl_sequential.so;/apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2022.2.1-ifnryw36ggh5z5fxmdn2w3n5syxhvnr5/mkl/2022.2.1/lib/intel64/libmkl_core.so
-- Checking for module 'mpi-c'
--   Package 'mpi-c', required by 'virtual:world', not found
-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS)
-- Checking for module 'mpi-fort'
--   Package 'mpi-fort', required by 'virtual:world', not found
-- Could NOT find MPI_Fortran (missing: MPI_Fortran_LIB_NAMES MPI_Fortran_F77_HEADER_DIR MPI_Fortran_MODULE_DIR MPI_Fortran_WORKS)
CMake Error at /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-intel-2023.2.4/install/intel/2021.10.0/cmake-3.23.1-ovxmp4p/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_C_FOUND MPI_Fortran_FOUND)
Call Stack (most recent call first):
  /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-intel-2023.2.4/install/intel/2021.10.0/cmake-3.23.1-ovxmp4p/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-intel-2023.2.4/install/intel/2021.10.0/cmake-3.23.1-ovxmp4p/share/cmake-3.23/Modules/FindMPI.cmake:1830 (find_package_handle_standard_args)
  src/gsi/CMakeLists.txt:60 (find_package)


-- Configuring incomplete, errors occurred!
See also "/work/noaa/da/rtreadon/git/gsi/test_orion/build/CMakeFiles/CMakeOutput.log".
See also "/work/noaa/da/rtreadon/git/gsi/test_orion/build/CMakeFiles/CMakeError.log".

The modified module file is orion-login-2:/work/noaa/da/rtreadon/git/gsi/test_orion/modulefiles/gsi_orion.intel.lua. It contains the following local changes

@@ -1,11 +1,12 @@
 help([[
 ]])

-prepend_path("MODULEPATH", "/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-rocky9/install/modulefiles/Core")
+--prepend_path("MODULEPATH", "/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-rocky9/install/modulefiles/Core")
+prepend_path("MODULEPATH", "/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-intel-2023.2.4/install/modulefiles/Core")

 local stack_python_ver=os.getenv("stack_python_ver") or "3.11.6"
-local stack_intel_ver=os.getenv("stack_intel_ver") or "2021.9.0"
-local stack_impi_ver=os.getenv("stack_impi_ver") or "2021.9.0"
+local stack_intel_ver=os.getenv("stack_intel_ver") or "2021.10.0"
+local stack_impi_ver=os.getenv("stack_impi_ver") or "2021.10.0"
 local cmake_ver=os.getenv("cmake_ver") or "3.23.1"
 local prod_util_ver=os.getenv("prod_util_ver") or "2.1.1"

@RussTreadon-NOAA
Author

dclock calls added to read_obs.F90 indicate that reading bufr files on Orion takes 3 to 4x longer than reading the same bufr files on Hercules. Why would processing bufr files be 3 to 4x slower on Orion than on Hercules?

Below are wall times in seconds for indicated bufr observation reads on Orion and Hercules.

Orion

    21: timer stats: READ_CRIS                28.6474
    22: timer stats: READ_CRIS                28.6474
    26: timer stats: READ_CRIS                28.6474
    23: timer stats: READ_CRIS                28.6474
    29: timer stats: READ_CRIS                28.6474
    16: timer stats: READ_CRIS                28.6468
    31: timer stats: READ_CRIS                28.6474
    17: timer stats: READ_CRIS                28.6474
    24: timer stats: READ_CRIS                28.6474
    18: timer stats: READ_CRIS                28.6474
    25: timer stats: READ_CRIS                28.6474
    19: timer stats: READ_CRIS                28.6474
    27: timer stats: READ_CRIS                28.6474
    20: timer stats: READ_CRIS                28.6476
    28: timer stats: READ_CRIS                28.6474
    30: timer stats: READ_CRIS                28.6474
    63: timer stats: READ_IASI                54.4328
    62: timer stats: READ_IASI                54.4329
    55: timer stats: READ_IASI                54.4332
    61: timer stats: READ_IASI                54.4330
    59: timer stats: READ_IASI                54.4332
    60: timer stats: READ_IASI                54.4330
    49: timer stats: READ_IASI                54.4333
    50: timer stats: READ_IASI                54.4332
    51: timer stats: READ_IASI                54.4333
    52: timer stats: READ_IASI                54.4333
    53: timer stats: READ_IASI                54.4333
    54: timer stats: READ_IASI                54.4332
    56: timer stats: READ_IASI                54.4332
    57: timer stats: READ_IASI                54.4332
    58: timer stats: READ_IASI                54.4331
    48: timer stats: READ_IASI                54.4341
    15: timer stats: READ_IASI                57.7201
    14: timer stats: READ_IASI                57.7204
    13: timer stats: READ_IASI                57.7207
    12: timer stats: READ_IASI                57.7209
    10: timer stats: READ_IASI                57.7245
    11: timer stats: READ_IASI                57.7212
     8: timer stats: READ_IASI                57.7249
     9: timer stats: READ_IASI                57.7249
     1: timer stats: READ_IASI                57.7212
     2: timer stats: READ_IASI                57.7213
     3: timer stats: READ_IASI                57.7241
     4: timer stats: READ_IASI                57.7241
     5: timer stats: READ_IASI                57.7245
     6: timer stats: READ_IASI                57.7250
     7: timer stats: READ_IASI                57.7241
     0: timer stats: READ_IASI                57.7223
    23: timer stats: READ_AVHRR               40.0085
    22: timer stats: READ_AVHRR               40.0086
    19: timer stats: READ_AVHRR               40.0086
    21: timer stats: READ_AVHRR               40.0086
    18: timer stats: READ_AVHRR               40.0086
    20: timer stats: READ_AVHRR               40.0086
    17: timer stats: READ_AVHRR               40.0086
    16: timer stats: READ_AVHRR               40.0086
    31: timer stats: READ_AVHRR               45.7324
    29: timer stats: READ_AVHRR               45.7325
    26: timer stats: READ_AVHRR               45.7326
    27: timer stats: READ_AVHRR               45.7325
    30: timer stats: READ_AVHRR               45.7325
    28: timer stats: READ_AVHRR               45.7327
    25: timer stats: READ_AVHRR               45.7328
    24: timer stats: READ_AVHRR               45.7328
    47: timer stats: READ_CRIS                87.6466
    46: timer stats: READ_CRIS                87.6467
    33: timer stats: READ_CRIS                87.6470
    45: timer stats: READ_CRIS                87.6468
    34: timer stats: READ_CRIS                87.6469
    44: timer stats: READ_CRIS                87.6468
    35: timer stats: READ_CRIS                87.6469
    43: timer stats: READ_CRIS                87.6468
    42: timer stats: READ_CRIS                87.6470
    41: timer stats: READ_CRIS                87.6469
    40: timer stats: READ_CRIS                87.6470
    38: timer stats: READ_CRIS                87.6470
    39: timer stats: READ_CRIS                87.6470
    36: timer stats: READ_CRIS                87.6470
    37: timer stats: READ_CRIS                87.6472
    32: timer stats: READ_CRIS                87.6475
    95: timer stats: READ_VIIRS               119.945
    94: timer stats: READ_VIIRS               119.945
    93: timer stats: READ_VIIRS               119.946
    91: timer stats: READ_VIIRS               119.946
    90: timer stats: READ_VIIRS               119.946
    89: timer stats: READ_VIIRS               119.946
    81: timer stats: READ_VIIRS               119.946
    86: timer stats: READ_VIIRS               119.946
    82: timer stats: READ_VIIRS               119.946
    92: timer stats: READ_VIIRS               119.946
    83: timer stats: READ_VIIRS               119.946
    85: timer stats: READ_VIIRS               119.946
    88: timer stats: READ_VIIRS               119.946
    87: timer stats: READ_VIIRS               119.946
    84: timer stats: READ_VIIRS               119.946
    80: timer stats: READ_VIIRS               119.946
    93: timer stats: READ_OZONE              0.353938
    79: timer stats: READ_VIIRS               121.121
    78: timer stats: READ_VIIRS               121.122
    76: timer stats: READ_VIIRS               121.122
    75: timer stats: READ_VIIRS               121.122
    77: timer stats: READ_VIIRS               121.122
    74: timer stats: READ_VIIRS               121.122
    71: timer stats: READ_VIIRS               121.122
    73: timer stats: READ_VIIRS               121.122
    72: timer stats: READ_VIIRS               121.122
    67: timer stats: READ_VIIRS               121.122
    68: timer stats: READ_VIIRS               121.122
    66: timer stats: READ_VIIRS               121.122
    69: timer stats: READ_VIIRS               121.122
    70: timer stats: READ_VIIRS               121.122
    65: timer stats: READ_VIIRS               121.122
    64: timer stats: READ_VIIRS               121.123
    85: timer stats: READ_OZONE               1.62620
    89: timer stats: READ_OZONE               1.98980
    94: timer stats: READ_OZONE               2.11263
    81: timer stats: READ_BUFRTOVS            2.68223
    75: timer stats: READ_PREPBUFR            4.38559
    79: timer stats: READ_NSSTBUFR            5.04728
    47: timer stats: READ_AVHRR               40.6250
    46: timer stats: READ_AVHRR               40.6249
    45: timer stats: READ_AVHRR               40.6250
    44: timer stats: READ_AVHRR               40.6249
    42: timer stats: READ_AVHRR               40.6249
    43: timer stats: READ_AVHRR               40.6249
    41: timer stats: READ_AVHRR               40.6248
    40: timer stats: READ_AVHRR               40.6250
    82: timer stats: READ_BUFRTOVS            8.67870
    86: timer stats: READ_BUFRTOVS            8.98850
    65: timer stats: READ_BUFRTOVS            8.09661
    64: timer stats: READ_BUFRTOVS            8.09578
    69: timer stats: READ_BUFRTOVS            8.43694
    68: timer stats: READ_BUFRTOVS            8.43661
    67: timer stats: READ_BUFRTOVS            8.61997
    66: timer stats: READ_BUFRTOVS            8.60470
    80: timer stats: READ_GPS                 9.92920
    36: timer stats: READ_AVHRR               43.3248
    39: timer stats: READ_AVHRR               43.3251
    35: timer stats: READ_AVHRR               43.3250
    38: timer stats: READ_AVHRR               43.3252
    33: timer stats: READ_AVHRR               43.3251
    37: timer stats: READ_AVHRR               43.3253
    34: timer stats: READ_AVHRR               43.3251
    32: timer stats: READ_AVHRR               43.3242
    88: timer stats: READ_BUFRTOVS            11.3951
    74: timer stats: READ_PREPBUFR            11.5022
    77: timer stats: READ_PREPBUFR            13.0966
    72: timer stats: READ_PREPBUFR            13.3955
    15: timer stats: READ_CRIS                85.2315
    14: timer stats: READ_CRIS                85.2314
    12: timer stats: READ_CRIS                85.2309
    13: timer stats: READ_CRIS                85.2311
     6: timer stats: READ_CRIS                85.2307
     7: timer stats: READ_CRIS                85.2307
     8: timer stats: READ_CRIS                85.2307
     9: timer stats: READ_CRIS                85.2307
    10: timer stats: READ_CRIS                85.2307
    11: timer stats: READ_CRIS                85.2307
     1: timer stats: READ_CRIS                85.2307
     2: timer stats: READ_CRIS                85.2307
     3: timer stats: READ_CRIS                85.2307
     4: timer stats: READ_CRIS                85.2307
     5: timer stats: READ_CRIS                85.2307
     0: timer stats: READ_CRIS                85.2297
     1: timer stats: READ_OZONE              0.433676
    73: timer stats: READ_PREPBUFR            23.1917
    70: timer stats: READ_PREPBUFR            23.2368
    71: timer stats: READ_PREPBUFR            23.2690
     7: timer stats: READ_OZONE               1.97127
     3: timer stats: READ_BUFRTOVS            9.23574
    76: timer stats: READ_PREPBUFR            36.9617
    62: timer stats: READ_ABI                 111.597
    58: timer stats: READ_ABI                 111.597
    60: timer stats: READ_ABI                 111.598
    59: timer stats: READ_ABI                 111.597
    63: timer stats: READ_ABI                 111.598
    57: timer stats: READ_ABI                 111.597
    61: timer stats: READ_ABI                 111.598
    56: timer stats: READ_ABI                 111.597
    50: timer stats: READ_ABI                 116.369
    51: timer stats: READ_ABI                 116.369
    52: timer stats: READ_ABI                 116.369
    53: timer stats: READ_ABI                 116.369
    54: timer stats: READ_ABI                 116.369
    55: timer stats: READ_ABI                 116.369
    49: timer stats: READ_ABI                 116.369
    48: timer stats: READ_ABI                 116.366
     4: timer stats: READ_AHI                 30.5457
     2: timer stats: READ_OZONE               35.9112
     0: timer stats: READ_OZONE               35.9427
     6: timer stats: READ_OZONE               36.9671
    91: timer stats: READ_ATMS                115.133
    90: timer stats: READ_ATMS                116.143
     5: timer stats: READ_ATMS                115.214
    84: timer stats: READ_SSMIS               155.442
    83: timer stats: READ_SSMIS               172.926
    92: timer stats: READ_GMI                 318.892
    78: timer stats: READ_SATWND              435.596

Hercules

    25: timer stats: READ_CRIS                7.89926
    18: timer stats: READ_CRIS                7.89897
    24: timer stats: READ_CRIS                7.89927
    20: timer stats: READ_CRIS                7.89895
    26: timer stats: READ_CRIS                7.89927
    23: timer stats: READ_CRIS                7.89895
    27: timer stats: READ_CRIS                7.89926
    16: timer stats: READ_CRIS                7.89836
    28: timer stats: READ_CRIS                7.89924
    17: timer stats: READ_CRIS                7.89899
    29: timer stats: READ_CRIS                7.89933
    19: timer stats: READ_CRIS                7.89900
    30: timer stats: READ_CRIS                7.89930
    21: timer stats: READ_CRIS                7.89901
    31: timer stats: READ_CRIS                7.89931
    22: timer stats: READ_CRIS                7.89900
    63: timer stats: READ_IASI                14.2766
    58: timer stats: READ_IASI                14.2768
    60: timer stats: READ_IASI                14.2767
    59: timer stats: READ_IASI                14.2769
    61: timer stats: READ_IASI                14.2767
    56: timer stats: READ_IASI                14.2770
    62: timer stats: READ_IASI                14.2766
    53: timer stats: READ_IASI                14.2770
    54: timer stats: READ_IASI                14.2770
    55: timer stats: READ_IASI                14.2770
    57: timer stats: READ_IASI                14.2769
    49: timer stats: READ_IASI                14.2771
    50: timer stats: READ_IASI                14.2770
    51: timer stats: READ_IASI                14.2771
    52: timer stats: READ_IASI                14.2771
    48: timer stats: READ_IASI                14.2615
    15: timer stats: READ_IASI                15.1522
    14: timer stats: READ_IASI                15.1524
    13: timer stats: READ_IASI                15.1525
    12: timer stats: READ_IASI                15.1524
    10: timer stats: READ_IASI                15.1559
    11: timer stats: READ_IASI                15.1526
     1: timer stats: READ_IASI                15.1526
     2: timer stats: READ_IASI                15.1527
     3: timer stats: READ_IASI                15.1552
     4: timer stats: READ_IASI                15.1559
     5: timer stats: READ_IASI                15.1554
     6: timer stats: READ_IASI                15.1552
     7: timer stats: READ_IASI                15.1555
     8: timer stats: READ_IASI                15.1561
     9: timer stats: READ_IASI                15.1555
     0: timer stats: READ_IASI                15.1533
    23: timer stats: READ_AVHRR               10.2355
    19: timer stats: READ_AVHRR               10.2356
    21: timer stats: READ_AVHRR               10.2355
    22: timer stats: READ_AVHRR               10.2355
    17: timer stats: READ_AVHRR               10.2356
    18: timer stats: READ_AVHRR               10.2356
    20: timer stats: READ_AVHRR               10.2356
    16: timer stats: READ_AVHRR               10.2356
    31: timer stats: READ_AVHRR               11.8628
    26: timer stats: READ_AVHRR               11.8629
    27: timer stats: READ_AVHRR               11.8629
    28: timer stats: READ_AVHRR               11.8629
    29: timer stats: READ_AVHRR               11.8629
    30: timer stats: READ_AVHRR               11.8628
    25: timer stats: READ_AVHRR               11.8630
    24: timer stats: READ_AVHRR               11.8553
    47: timer stats: READ_CRIS                23.1123
    34: timer stats: READ_CRIS                23.1128
    44: timer stats: READ_CRIS                23.1125
    35: timer stats: READ_CRIS                23.1128
    45: timer stats: READ_CRIS                23.1125
    33: timer stats: READ_CRIS                23.1130
    46: timer stats: READ_CRIS                23.1124
    41: timer stats: READ_CRIS                23.1127
    42: timer stats: READ_CRIS                23.1126
    43: timer stats: READ_CRIS                23.1125
    36: timer stats: READ_CRIS                23.1127
    37: timer stats: READ_CRIS                23.1127
    38: timer stats: READ_CRIS                23.1128
    39: timer stats: READ_CRIS                23.1127
    40: timer stats: READ_CRIS                23.1127
    32: timer stats: READ_CRIS                23.1135
    79: timer stats: READ_VIIRS               30.1726
    76: timer stats: READ_VIIRS               30.1728
    65: timer stats: READ_VIIRS               30.1726
    77: timer stats: READ_VIIRS               30.1727
    71: timer stats: READ_VIIRS               30.1727
    78: timer stats: READ_VIIRS               30.1727
    75: timer stats: READ_VIIRS               30.1727
    66: timer stats: READ_VIIRS               30.1727
    72: timer stats: READ_VIIRS               30.1729
    67: timer stats: READ_VIIRS               30.1726
    74: timer stats: READ_VIIRS               30.1729
    68: timer stats: READ_VIIRS               30.1726
    73: timer stats: READ_VIIRS               30.1730
    69: timer stats: READ_VIIRS               30.1727
    70: timer stats: READ_VIIRS               30.1727
    64: timer stats: READ_VIIRS               30.1731
    92: timer stats: READ_VIIRS               30.5467
    89: timer stats: READ_VIIRS               30.5468
    90: timer stats: READ_VIIRS               30.5468
    91: timer stats: READ_VIIRS               30.5468
    93: timer stats: READ_VIIRS               30.5468
    94: timer stats: READ_VIIRS               30.5467
    95: timer stats: READ_VIIRS               30.5469
    84: timer stats: READ_VIIRS               30.5470
    85: timer stats: READ_VIIRS               30.5469
    86: timer stats: READ_VIIRS               30.5469
    87: timer stats: READ_VIIRS               30.5469
    88: timer stats: READ_VIIRS               30.5468
    81: timer stats: READ_VIIRS               30.5471
    82: timer stats: READ_VIIRS               30.5470
    83: timer stats: READ_VIIRS               30.5470
    80: timer stats: READ_VIIRS               30.5475
    93: timer stats: READ_OZONE              0.115166
    94: timer stats: READ_OZONE              0.517453
    85: timer stats: READ_OZONE              0.587360
    89: timer stats: READ_OZONE              0.649517
    81: timer stats: READ_BUFRTOVS           0.829050
    75: timer stats: READ_PREPBUFR            1.47294
    79: timer stats: READ_NSSTBUFR            1.90115
    65: timer stats: READ_BUFRTOVS            2.64807
    64: timer stats: READ_BUFRTOVS            2.64672
    69: timer stats: READ_BUFRTOVS            2.66557
    68: timer stats: READ_BUFRTOVS            2.66498
    67: timer stats: READ_BUFRTOVS            2.81424
    66: timer stats: READ_BUFRTOVS            2.79706
    82: timer stats: READ_BUFRTOVS            2.49398
    86: timer stats: READ_BUFRTOVS            2.60691
    44: timer stats: READ_AVHRR               10.2207
    45: timer stats: READ_AVHRR               10.2207
    46: timer stats: READ_AVHRR               10.2207
    47: timer stats: READ_AVHRR               10.2208
    41: timer stats: READ_AVHRR               10.2206
    42: timer stats: READ_AVHRR               10.2206
    43: timer stats: READ_AVHRR               10.2206
    40: timer stats: READ_AVHRR               10.2208
    36: timer stats: READ_AVHRR               11.0368
    35: timer stats: READ_AVHRR               11.0369
    38: timer stats: READ_AVHRR               11.0368
    34: timer stats: READ_AVHRR               11.0372
    39: timer stats: READ_AVHRR               11.0368
    37: timer stats: READ_AVHRR               11.0369
    33: timer stats: READ_AVHRR               11.0373
    32: timer stats: READ_AVHRR               11.0363
    80: timer stats: READ_GPS                 3.62201
    88: timer stats: READ_BUFRTOVS            3.62867
    74: timer stats: READ_PREPBUFR            4.07518
    72: timer stats: READ_PREPBUFR            4.60147
    77: timer stats: READ_PREPBUFR            5.14285
    15: timer stats: READ_CRIS                21.6790
    13: timer stats: READ_CRIS                21.6790
     6: timer stats: READ_CRIS                21.6788
    14: timer stats: READ_CRIS                21.6790
     7: timer stats: READ_CRIS                21.6788
    12: timer stats: READ_CRIS                21.6789
     8: timer stats: READ_CRIS                21.6789
     9: timer stats: READ_CRIS                21.6788
    10: timer stats: READ_CRIS                21.6789
    11: timer stats: READ_CRIS                21.6789
     1: timer stats: READ_CRIS                21.6788
     2: timer stats: READ_CRIS                21.6788
     3: timer stats: READ_CRIS                21.6788
     4: timer stats: READ_CRIS                21.6788
     5: timer stats: READ_CRIS                21.6788
     0: timer stats: READ_CRIS                21.6780
     1: timer stats: READ_OZONE              0.129847
     7: timer stats: READ_OZONE              0.651767
    73: timer stats: READ_PREPBUFR            8.52165
    71: timer stats: READ_PREPBUFR            8.64337
    70: timer stats: READ_PREPBUFR            8.68001
     3: timer stats: READ_BUFRTOVS            2.54108
    60: timer stats: READ_ABI                 26.8622
    59: timer stats: READ_ABI                 26.8622
    62: timer stats: READ_ABI                 26.8623
    57: timer stats: READ_ABI                 26.8621
    61: timer stats: READ_ABI                 26.8623
    58: timer stats: READ_ABI                 26.8622
    63: timer stats: READ_ABI                 26.8623
    56: timer stats: READ_ABI                 26.8621
    49: timer stats: READ_ABI                 27.4099
    50: timer stats: READ_ABI                 27.4099
    51: timer stats: READ_ABI                 27.4099
    52: timer stats: READ_ABI                 27.4099
    53: timer stats: READ_ABI                 27.4099
    54: timer stats: READ_ABI                 27.4100
    55: timer stats: READ_ABI                 27.4100
    48: timer stats: READ_ABI                 27.4071
     4: timer stats: READ_AHI                 6.75684
    76: timer stats: READ_PREPBUFR            14.2980
     0: timer stats: READ_OZONE               10.9982
     2: timer stats: READ_OZONE               11.0451
     6: timer stats: READ_OZONE               11.4480
    91: timer stats: READ_ATMS                33.2561
    90: timer stats: READ_ATMS                33.5066
    84: timer stats: READ_SSMIS               36.9816
     5: timer stats: READ_ATMS                33.2350
    83: timer stats: READ_SSMIS               44.8104
    92: timer stats: READ_GMI                 76.9536
    78: timer stats: READ_SATWND              116.393
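
For reference, a small shell sketch of how these per-task timings can be reduced to a per-reader summary for comparison ('stdout.orion' is an illustrative file name; the field layout matches the 'timer stats:' lines above):

# Maximum wall time per bufr reader from a gsi.x stdout file
awk '/timer stats: READ_/ {
       name = $4; t = $5 + 0
       if (t > max[name] + 0) max[name] = t
     }
     END { for (n in max) printf "%-16s %10.3f\n", n, max[n] }' stdout.orion | sort -k2 -nr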

@RussTreadon-NOAA
Author

As a test, compiled the same GSI code on Hera and Dogwood (WCOSS2) and ran gsi.x using the global_4denvar ctest. Processing of the satellite wind bufr file, read_satwnd, took the most time of all the bufr readers. A comparison of timings across the machines shows that Hercules, Hera, and Dogwood are comparable; Orion is the outlier.

machine     total read_obs time (s)   read_satwnd time (s)   total gsi.x wall time (s)
dogwood     169.988                   120.264                403.800888
hera        194.612                   137.553                466.024682
hercules    164.571                   116.393                444.405860
orion       608.372                   435.489                964.591370

The same comment has been added to GSI issue #771.

@RussTreadon-NOAA
Author

RussTreadon-NOAA commented Jul 9, 2024

@jbathegit, @jack-woollen, @KateFriedman-NOAA, @DavidHuber-NOAA : have any of you noticed an increase in the time it takes to read bufr files on Orion following the Rocky 9 upgrade?

gsi.x is seeing significant increases in the time it takes to read global bufr dump files on Orion after the Rocky 9 upgrade. FYI, gsi.x is built using bufr/11.7.0 on all machines.

See GSI issue #771 for additional details.

@RussTreadon-NOAA
Author

bufr/12.0.1 test
Recompiled gsi.x on Orion using bufr/12.0.1 and reran the global_4denvar ctest. The gsi.x wall time was 1143.031401 seconds, even slower than the bufr/11.7.0 build. read_obs took 771.891 seconds (read_satwnd accounts for 562.736 s).

@RussTreadon-NOAA
Author

debufr test

Used the bufr utility debufr to decode 5 gdas bufr dump files from the global_4denvar ctests. debufr was run on WCOSS2 (Dogwood), Hera, Hercules, and Orion. The debufr timings are 3x to 4x greater on Orion than on the other machines. Opened NCEPLIBS-bufr issue #608 to document this behavior.
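
For reference, a minimal sketch of the timing comparison (the dump file name is illustrative; the actual files came from the global_4denvar ctest data):

# Time the same gdas bufr dump file with the same debufr build on each machine
time debufr gdas.t00z.satwnd.tm00.bufr_d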

@DavidHuber-NOAA
Collaborator

@AlexanderRichert-NOAA suggested compiling the GSI and ENKF on Hercules and then running the ctests on Orion with those executables. After compiling, gsi_common.lua was modified to point to the Hercules BUFR installation, and then the ctests were run. Runtimes were largely the same as reported by @RussTreadon-NOAA:

runtime.global_4denvar_hiproc_contrl.txt: The total amount of wall time = 1394.165947
runtime.global_4denvar_hiproc_updat.txt: The total amount of wall time = 751.449358
runtime.global_4denvar_loproc_contrl.txt: The total amount of wall time = 1059.320163
runtime.global_4denvar_loproc_updat.txt: The total amount of wall time = 986.416513
runtime.global_enkf_hiproc_contrl.txt: The total amount of wall time = 159.291133
runtime.global_enkf_hiproc_updat.txt: The total amount of wall time = 160.180497
runtime.global_enkf_loproc_contrl.txt: The total amount of wall time = 204.069604
runtime.global_enkf_loproc_updat.txt: The total amount of wall time = 186.185513
runtime.hafs_3denvar_hybens_hiproc_contrl.txt: The total amount of wall time = 635.210342
runtime.hafs_3denvar_hybens_hiproc_updat.txt: The total amount of wall time = 625.259733
runtime.hafs_3denvar_hybens_loproc_contrl.txt: The total amount of wall time = 719.252038
runtime.hafs_3denvar_hybens_loproc_updat.txt: The total amount of wall time = 664.469968
runtime.hafs_4denvar_glbens_hiproc_contrl.txt: The total amount of wall time = 686.031241
runtime.hafs_4denvar_glbens_hiproc_updat.txt: The total amount of wall time = 677.697564
runtime.hafs_4denvar_glbens_loproc_contrl.txt: The total amount of wall time = 804.612061
runtime.hafs_4denvar_glbens_loproc_updat.txt: The total amount of wall time = 738.414877
runtime.rtma_hiproc_contrl.txt: The total amount of wall time = 354.441046
runtime.rtma_hiproc_updat.txt: The total amount of wall time = 351.002411
runtime.rtma_loproc_contrl.txt: The total amount of wall time = 362.241694
runtime.rtma_loproc_updat.txt: The total amount of wall time = 366.629561

@srherbener
Collaborator

I asked around, and JCSDA is not seeing specific cases of severe file I/O slowdown in applications. But someone mentioned that it is important to use srun instead of mpiexec, since srun gets more attention from the Orion IT folks with regard to tuning and optimization. I figure everyone is probably aware of this, so apologies for repeating information, but I just wanted to make sure this point is made.
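
For reference, a minimal Slurm sketch of the suggested launch method (resource values are illustrative):

#!/bin/bash
#SBATCH --job-name=gsi_test
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=12
#SBATCH --time=01:00:00
# Use srun rather than mpiexec/mpirun so the job goes through the
# Slurm-integrated launcher that the Orion admins tune
srun ./gsi.x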

@climbfuji
Collaborator

Thanks for checking @srherbener

@srherbener
Collaborator

Update: JCSDA has now seen SOCA running much slower than it should on Orion. Here are excerpts from a Slack discussion today:

Travis: has anyone noticed orion going slower than it should? I haven't used it since the upgrade, but I'm seeing today the marine skylab 3dvar going a lot slower than it should. I'm not sure if it's orion being orion, or if a regression slipped into soca lately.

Cory mentions this issue

Travis: yep, that tracks... soca var is taking 2x as long as it should

Travis added: yeah, it's more than file I/O though. The 3dvar is slowing down to a crawl inside the minimizer where no I/O is being done

@climbfuji
Collaborator

Is that with GNU only (UFS and JEDI), or is that also with Intel?

@srherbener
Collaborator

Is that with GNU only (UFS and JEDI), or is that also with Intel?

@travissluka can you help clarify? Thanks!

@travissluka

Is that with GNU only (UFS and JEDI), or is that also with Intel?

@travissluka can you help clarify? Thanks!

GNU (I didn't try intel), and JEDI only

@climbfuji
Collaborator

It would be useful to know if the GSI slowdown reported above for Intel also happens with GNU, and vice versa, if the JEDI slowdown also happens with Intel. From what it sounds like, yes. That would be another good data point that the problem is somewhere in the underlying fabric or the Slurm integration with it.

@travissluka

travissluka commented Jul 12, 2024

It would be useful to know if the GSI slowdown reported above for Intel also happens with GNU, and vice versa, if the JEDI slowdown also happens with Intel. From what it sounds like, yes. That would be another good data point that the problem is somewhere in the underlying fabric or the Slurm integration with it.

Another data point: I'm trying on hercules/intel and it's going just as slow (maybe even slower, actually) for SOCA. File I/O doesn't seem to be a particular problem; it seems to be MPI related. If I throw more nodes at it, the variational solver goes even slower (inverse scalability!)

@DavidHuber-NOAA
Collaborator

I believe I found the issue with the Intel/GCC compilers on Orion. They are bitwise identical to those on Hercules. The intel-oneapi-compilers/2023.1.0.lua module file for Orion indicates that the Intel compilers are built for Icelake infrastructure, but Orion's CPUs are Skylake. This may create potentially illegal/less-than-optimal instructions in certain circumstances. I have notified the Orion team and am awaiting a response, but will follow up here when I hear back.

Since the gcc compiler on Orion (/usr/bin/gcc) is identical to the one on Hercules, I believe it may share the same problem. To test this, I installed Intel 2023.1.0 (built for Skylake infrastructure) via spack and compiled BUFR with it. The debufr application still runs just as slowly this way as with the Icelake installation of the Intel compilers. I believe, but cannot yet prove, that this is because Intel is referencing Icelake GNU libraries. To prove it, I think I would need to set up a container on Orion with its own GCC and GNU libraries. Not having much experience with containers, it would be nice to have some help setting this up.
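
For reference, a few quick checks that could help confirm which instruction sets are actually in play (the binary path is illustrative; the nm check mirrors what is done later in this thread):

# Count AVX-512 (zmm-register) instructions present in the executable
objdump -d ./install/bin/debufr | grep -c '%zmm'
# List Intel AVX/AVX-512 runtime dispatch symbols
nm ./install/bin/debufr | grep -i avx
# Show which AVX variants the host CPU reports
lscpu | grep -o 'avx[0-9a-z_]*' | sort -u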

@RussTreadon-NOAA
Author

Rebuild GSI develop at 3e27bb8 on Orion following 7/24/2024 maintenance. No appreciable change in gsi.x wall time. The GSI runs slower on Orion Rocky 9 than it did on Orion Centos 7.

@srherbener
Collaborator

I believe I found the issue with the Intel/GCC compilers on Orion. They are bitwise identical to those on Hercules. The intel-oneapi-compilers/2023.1.0.lua module file for Orion indicates that the Intel compilers are built for Icelake infrastructure, but Orion's CPUs are Skylake. This may create potentially illegal/less-than-optimal instructions in certain circumstances. I have notified the Orion team and am awaiting a response, but will follow up here when I hear back.

Since the gcc compiler on Orion (/usr/bin/gcc) is identical to the one on Hercules, I believe it may share the same problem. To test this, I installed Intel 2023.1.0 (built for Skylake infrastructure) via spack and compiled BUFR with it. The debufr application still runs just as slowly this way as with the Icelake installation of the Intel compilers. I believe, but cannot yet prove, that this is because Intel is referencing Icelake GNU libraries. To prove it, I think I would need to set up a container on Orion with its own GCC and GNU libraries. Not having much experience with containers, it would be nice to have some help setting this up.

From a discussion with an Intel contact (app engineer) I think this is a good avenue to pursue. Here are his comments:

  • The instructions generated for IceLake should be compatible with SkyLake
  • The usage of AVX512 instructions may be performance impacted
    • If the AVX512 instructions are sprinkled about in different code blocks, there will be a definite slowdown
    • If the AVX512 instructions are confined to highly vectorizable code, then the impact should be minimal
    • If it is not obvious right off the bat, I don't think it's worth putting much effort into investigating this. The point rather is that the possibility of creating a bottleneck exists.
  • The core counts are different on IceLake and SkyLake so if anything is being done to take advantage of the IceLake core count, that same scheme will be detrimental on SkyLake.
  • There is a handy diagnostic tool provided by Intel called turbostat
    • It basically shows which cores are in turbo mode and which ones are not. This might be useful for identifying a bottleneck when the load balancing is suboptimal
    • Example usage: turbostat -qS -i1
    • I'm not familiar with turbostat but thought I would pass it on

@DavidHuber-NOAA
Collaborator

DavidHuber-NOAA commented Aug 12, 2024

@srherbener Thank you for this very useful information. Based on the differences in AVX512 instructions between Skylake and Icelake cores, I performed a test on Orion where I added the compiler flags
-march=ivybridge
to the Fortran and C flags on compilation. The ivybridge architecture does not support AVX512 instructions, so this will force compilation using AVX1 instructions only. I then reran time debufr, which returned a runtime of 3m55s. I performed this same test on Hera for comparison, which returned a runtime of 2m0s. Unfortunately, that does not seem to explain the timing differences.
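
For reference, a minimal sketch of the kind of rebuild described above, using the standard CMake flag variables (source/build layout and dump file name are illustrative):

# Rebuild NCEPLIBS-bufr (and its debufr utility) restricted to ivybridge instructions
cd NCEPLIBS-bufr/build
cmake -DCMAKE_C_FLAGS="-march=ivybridge" -DCMAKE_Fortran_FLAGS="-march=ivybridge" -DCMAKE_INSTALL_PREFIX=$PWD/install ..
make -j4 && make install
time ./install/bin/debufr gdas.t00z.satwnd.tm00.bufr_d   # dump file name illustrative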

If possible, I would still appreciate a container on Orion with its own GLIBC (2.34) and GCC (11.3.0), just so I could rule out any issues in the system GLIBC library.

I attempted to install my own GLIBC/GCC via a crosstool-ng installation here:
/work/noaa/global/dhuber/GCC/crosstool-ng/crosstool-ng/.build/install/x86_64-unknown-linux-gnu

I then
  • linked in just the core utilities from /usr/bin to /work/noaa/global/dhuber/GCC/crosstool-ng/crosstool-ng/.build/install/x86_64-unknown-linux-gnu/bin
  • installed miniconda in /work/noaa/global/dhuber/GCC/crosstool-ng/crosstool-ng/.build/install/x86_64-unknown-linux-gnu/usr/miniconda3
  • downloaded spack to /work/noaa/global/dhuber/GCC/crosstool-ng/crosstool-ng/.build/install/x86_64-unknown-linux-gnu/usr/spack
  • created the environment /work/noaa/global/dhuber/GCC/crosstool-ng/crosstool-ng/.build/install/x86_64-unknown-linux-gnu/usr/spack/var/environments/intel
  • added just intel-oneapi-compilers@2023.2.4 to the environment, then concretized and attempted to install.

However, a test gcc executable failed to run, causing the prerequisite patchelf to fail. I am not sure where to go from here and feel that a container would be much simpler for this test.
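
For reference, a sketch of the spack environment workflow described above (clone location and environment name are illustrative; the compiler spec is the one named in the comment):

# Build a spack environment containing only the Intel oneAPI compilers
git clone https://github.com/spack/spack.git
source spack/share/spack/setup-env.sh
spack env create intel
spack env activate intel
spack add intel-oneapi-compilers@2023.2.4
spack concretize -f
spack install   # this is where the build failed: a test gcc executable would not run, so patchelf could not build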

@AlexanderRichert-NOAA
Collaborator

@DavidHuber-NOAA can you provide the path for your AVX512-less debufr executable? I had tried to build without AVX512 but it wasn't clear that it was correctly leaving out all those functions. Some kind of instruction-related issue would seem to be consistent with my observation that there's no one piece of the code where things are slow (for example, eliminating exponentiation, and subsequently Intel's special 'pow' functions, sped up the code disproportionately on Orion compared with Hercules, and yet didn't explain even the majority of the time difference...).

@DavidHuber-NOAA
Collaborator

@AlexanderRichert-NOAA Sure, you can find it here: /work/noaa/global/dhuber/LIBS/bufr_11.7.0_noavx512/build/install

@AlexanderRichert-NOAA
Collaborator

AlexanderRichert-NOAA commented Aug 14, 2024

Hm, I'm not sure how significant it is, but it seems it's still using some AVX512 routines (I seem to recall that the Intel memcpy/memset routines were among the top items in the gprof runs I did):

$ nm /work/noaa/global/dhuber/LIBS/bufr_11.7.0_noavx512/build/install/bin/debufr | grep avx
$PWD/debufr | grep avx
00000000004f1ff0 T __intel_avx_memmove
00000000004ef750 T __intel_avx_rep_memcpy
00000000004f0f20 T __intel_avx_rep_memset
00000000004ec300 T __intel_mic_avx512f_memcpy
00000000004ede80 T __intel_mic_avx512f_memset

Edit: I just noticed the 'mic' in those two avx512 routines, so those may not be getting used on regular CPUs anyway. In any case, it might be worth running the avx512-less one with gprof to see if there are certain routines that are running slow, vs. just generally running slower.
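
A sketch of that gprof suggestion, assuming the non-AVX512 CMake build described earlier (flags, paths, and the dump file name are illustrative):

# Rebuild with profiling instrumentation on top of the ivybridge flags
cmake -DCMAKE_C_FLAGS="-march=ivybridge -pg" -DCMAKE_Fortran_FLAGS="-march=ivybridge -pg" ..
make -j4 && make install
# Running the instrumented binary writes gmon.out to the working directory
./install/bin/debufr gdas.t00z.satwnd.tm00.bufr_d
# Flat profile: which routines the time is actually spent in
gprof ./install/bin/debufr gmon.out | head -n 40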

@DavidHuber-NOAA
Collaborator

Interesting. I compiled that executable with -march=ivybridge, which should have applied, at most, AVX-1 instructions. I may have missed this at the linking step, though. I will give it another try in a different directory and try out the same nm test.

@DavidHuber-NOAA
Collaborator

DavidHuber-NOAA commented Aug 14, 2024

It looks like the linking step also included -march=ivybridge:

/apps/spack-managed/gcc-11.3.1/intel-oneapi-compilers-2023.2.4-cuxy653alrskny7hlvpp6hdx767d3xnt/compiler/2023.2.4/linux/bin/intel64/icc  -g -traceback -O3 -march=ivybridge CMakeFiles/debufr.dir/debufr.c.o CMakeFiles/debufr.dir/debufr.F90.o -o debufr  ../src/libbufr_4.a -lifport -lifcoremt -lpthread

nm shows the same avx entries you saw as well.
