Scope out things with rhel8 on Weaver #841

Open
ikalash opened this issue Sep 27, 2022 · 16 comments
@ikalash
Collaborator

ikalash commented Sep 27, 2022

@ikalash will reactivate the nightly and report what happens, since results are not currently pushed to the CDash site.

@jewatkins will scope things out depending on results of above.

@ikalash ikalash added Testing Stuff related to testing Albany (including nightly tests) CUDA labels Sep 27, 2022
@ikalash
Collaborator Author

ikalash commented Sep 29, 2022

I did this today, and can confirm that Albany builds and runs with RHEL8. The modules and configure scripts to use are here:

https://github.com/sandialabs/Albany/blob/master/doc/dashboards/weaver.sandia.gov/do-cmake-albany-rhel8
https://github.com/sandialabs/Albany/blob/master/doc/dashboards/weaver.sandia.gov/do-cmake-weaver-trilinos-rhel8
https://github.com/sandialabs/Albany/blob/master/doc/dashboards/weaver.sandia.gov/weaver_modules_cuda_rhel8.sh

Note that I could not load a cmake module that works with the devpack I am using, so I had to hack the configure scripts to point to a high-enough cmake.

One of the weaver sysadmins is working on the issue of not being able to push to the CDash site from the RHEL8 queue. One workaround would always be to do the pushing at the end from the head node, but this is somewhat cumbersome.

@jewatkins
Collaborator

I also got this working last night with the rhel8 modules. I had to update the trilinos configure script, but the only issue I saw was that you have to configure and build on a rhel8 node. The login node is supposed to be updated soon.

@jewatkins
Collaborator

I could post a PR and we can push when they update the login node?

@ikalash
Collaborator Author

ikalash commented Sep 30, 2022

@jewatkins: I am testing your PR right now. I don't think it's a problem that the builds need to happen on a compute node, because the weaver nightlies are already set up that way. The real problem is not being able to push to the CDash site from the rhel8 compute nodes. Kevin, a weaver sysadmin, sent me a proposed fix, which I need to test. If it works, we can change the nightlies and push this PR. Hopefully I will have more info tomorrow.

@jewatkins
Collaborator

Sounds good, thanks.

@jewatkins
Collaborator

@ikalash It looks like we're close. The weaver builds are back up, but the performance tests are failing:

CMake Error at CMakeLists.txt:2 (cmake_minimum_required):
  CMake 3.17.0 or higher is required.  You are running version 3.6.2


-- Configuring incomplete, errors occurred!

It looks like it's loading the wrong modules. I'll try updating those files and we can see what happens tomorrow.
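
For anyone hitting the same error, a version guard run before configuring can flag a wrong module load early. This is only a minimal sketch: `version_ge` is a hypothetical helper (not part of the Albany or Trilinos scripts), the 3.17.0 minimum is taken from the error message above, and it assumes GNU `sort -V` is available on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the Albany scripts:
# succeeds (exit 0) when version $1 is greater than or equal to version $2.
version_ge() {
  # sort -V orders version strings numerically per component, so the
  # required version must sort first (or tie) for the check to pass.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="3.17.0"   # minimum reported by cmake_minimum_required in the log above

# Only inspect cmake if one is actually on the PATH.
if command -v cmake >/dev/null 2>&1; then
  # First line of `cmake --version` reads "cmake version X.Y.Z".
  found="$(cmake --version | head -n 1 | awk '{print $3}')"
  if version_ge "$found" "$required"; then
    echo "cmake $found satisfies the $required minimum"
  else
    echo "cmake $found is older than $required; check which modules are loaded" >&2
  fi
fi
```

Running this right after sourcing the module file would show whether the 3.6.2 cmake is still first on the PATH.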

@ikalash
Collaborator Author

ikalash commented Oct 4, 2022

I have updated the modules in the appropriate directory. Hopefully things will be clean tomorrow.

@ikalash
Collaborator Author

ikalash commented Oct 5, 2022

It seems some of the performance tests are failing due to too many procs being required.

--------------------------------------------------------------------------
Your job has requested more processes than the ppr for
this topology can support:

  App: /home/projects/albany/nightlyCDashWeaver/build/AlbBuildSFad/src/Albany
  Number of procs:  8
  PPR: 2:socket

Please revise the conflict and try again.
--------------------------------------------------------------------------

@jewatkins, is this due to a difference between the rhel7 and rhel8 queues?
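
The arithmetic behind that message can be sketched as follows. The process count and the `2:socket` PPR come from the log above; the value of two sockets per node is an assumption about the weaver compute nodes, not something stated in this thread, so adjust it if the hardware differs.

```shell
#!/usr/bin/env bash
# Values taken from the error message above.
nprocs=8            # "Number of procs: 8"
ppr=2               # "PPR: 2:socket" means 2 processes per socket

# ASSUMPTION: two sockets per compute node (not confirmed in this thread).
sockets_per_node=2

# Processes that fit on one node under this mapping.
per_node=$(( ppr * sockets_per_node ))

# Ceiling division: nodes required to place all requested processes.
nodes_needed=$(( (nprocs + per_node - 1) / per_node ))

echo "capacity per node: $per_node processes"
echo "nodes needed for $nprocs processes: $nodes_needed"
```

Under that assumption, an 8-process run with this mapping does not fit on a single node, so the job would need either more nodes in its allocation or a different mapping.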

@jewatkins
Collaborator

It's possible. I might have to change the run line a bit. I'll look into it today.

@ikalash
Collaborator Author

ikalash commented Oct 5, 2022

Thanks @jewatkins !

@ikalash
Collaborator Author

ikalash commented Oct 6, 2022

Looks like now the Albany weaver tests fail to configure:

CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
CMAKE_FIND_LIBRARY_PREFIXES
CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
CMAKE_FIND_LIBRARY_SUFFIXES
CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
CMAKE_FIND_LIBRARY_PREFIXES
CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
CMAKE_FIND_LIBRARY_SUFFIXES
-- Unable to find cudart library.
CMake Error at /home/projects/ppc64le-pwr9-nvidia/spack/opt/spack/linux-rhel8-ppc64le/gcc-8.3.1/cmake-3.21.2-z4zaxd77odasv2tpmdofmz4cevwmdz5b/share/cmake-3.21/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find CUDAToolkit (missing: CUDA_CUDART) (found version
  "11.2.152")
Call Stack (most recent call first):
  /home/projects/ppc64le-pwr9-nvidia/spack/opt/spack/linux-rhel8-ppc64le/gcc-8.3.1/cmake-3.21.2-z4zaxd77odasv2tpmdofmz4cevwmdz5b/share/cmake-3.21/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /home/projects/ppc64le-pwr9-nvidia/spack/opt/spack/linux-rhel8-ppc64le/gcc-8.3.1/cmake-3.21.2-z4zaxd77odasv2tpmdofmz4cevwmdz5b/share/cmake-3.21/Modules/FindCUDAToolkit.cmake:801 (find_package_handle_standard_args)
  /home/projects/ppc64le-pwr9-nvidia/spack/opt/spack/linux-rhel8-ppc64le/gcc-8.3.1/cmake-3.21.2-z4zaxd77odasv2tpmdofmz4cevwmdz5b/share/cmake-3.21/Modules/CMakeFindDependencyMacro.cmake:47 (find_package)
  /home/projects/albany/nightlyCDashWeaver/build/TrilinosInstall/lib/external_packages/CUDA/CUDAConfig.cmake:2 (find_dependency)
  /home/projects/albany/nightlyCDashWeaver/build/TrilinosInstall/lib/cmake/KokkosCore/KokkosCoreConfig.cmake:156 (include)
  /home/projects/albany/nightlyCDashWeaver/build/TrilinosInstall/lib/cmake/TeuchosCore/TeuchosCoreConfig.cmake:196 (include)
  /home/projects/albany/nightlyCDashWeaver/build/TrilinosInstall/lib/cmake/PanzerCore/PanzerCoreConfig.cmake:159 (include)
  /home/projects/albany/nightlyCDashWeaver/build/TrilinosInstall/lib/cmake/Panzer/PanzerConfig.cmake:162 (include)
  /home/projects/albany/nightlyCDashWeaver/build/TrilinosInstall/lib/cmake/Trilinos/TrilinosConfig.cmake:123 (include)
  CMakeLists.txt:29 (FIND_PACKAGE)


-- Configuring incomplete, errors occurred!

https://sems-cdash-son.sandia.gov/cdash/build/40129/configure

Did something change recently?

@jewatkins
Collaborator

It worked the day before so I'm not sure why that happened. I tried to configure this morning and everything still works.

@jewatkins
Collaborator

Also, there's some issue with running on multiple nodes, which is why the performance tests fail. We have to wait for the weaver admins to respond.

@ikalash
Collaborator Author

ikalash commented Oct 6, 2022

Hmmm, weird. We can see what happens tomorrow. Yes, I saw your issue about the multiple nodes.

@jewatkins
Collaborator

We need them to install cmake 3.22 on weaver-rhel8 too: trilinos/Trilinos#10355

jewatkins added a commit that referenced this issue Oct 11, 2022
add c++17 to weaver build
 - towards #841 & #842
@jewatkins
Collaborator

The "old" cmake builds work on rhel8
