Crusher: Work around tcmalloc issue.
cce/15.0.0 with GPU MPI buffers can crash in a system library like this:

#4  0x00007fffe159e35b in (anonymous namespace)::do_free_with_callback(void*, void (*)(void*)) [clone .constprop.0] () from /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libtcmalloc_minimal.so.1
#5  0x00007fffe15a8f16 in tc_free () from /opt/cray/pe/cce/15.0.0/cce/x86_64/lib/libtcmalloc_minimal.so.1
#6  0x00007fffe99c2bcd in _dlerror_run () from /lib64/libdl.so.2
#7  0x00007fffe99c2481 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#8  0x00007fffea7bce42 in _ad_cray_lock_init () from /opt/cray/pe/lib64/libmpi_cray.so.12
#9  0x00007fffed7eb37a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#10 0x00007fffed7eb496 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#11 0x00007fffed7dc58a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#12 0x0000000000000001 in ?? ()
#13 0x00007fffffff42e7 in ?? ()
#14 0x0000000000000000 in ?? ()

Work around this by using cce/14.0.3.
ambrad committed Apr 28, 2023
1 parent e9ece39 commit 3791fab
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions cime_config/machines/config_machines.xml
@@ -857,6 +857,8 @@
<!-- See SCREAM issue #2080. -->
<command name="load">craype-accel-amd-gfx90a</command>
<command name="load">rocm/5.1.0</command>
<!-- Work around tcmalloc crash. -->
<command name="load">cce/14.0.3</command>
</modules>
<modules>
<command name="load">cray-python/3.9.4.2</command>
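
Because the workaround depends on one pinned module version, a small check can confirm that a modules block still loads the intended compiler. The following is a minimal sketch, not part of the commit: the `<fragment>` wrapper element, the `FRAGMENT` string, and the `loaded_versions` helper are hypothetical names introduced for illustration, and the real `config_machines.xml` has more structure around these elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down fragment modelled on the hunk above.
# The <fragment> wrapper is an assumption made so the snippet parses on its own.
FRAGMENT = """
<fragment>
  <modules>
    <command name="load">craype-accel-amd-gfx90a</command>
    <command name="load">rocm/5.1.0</command>
    <!-- Work around tcmalloc crash. -->
    <command name="load">cce/14.0.3</command>
  </modules>
  <modules>
    <command name="load">cray-python/3.9.4.2</command>
  </modules>
</fragment>
"""


def loaded_versions(xml_text, module_name):
    """Return the versions requested by <command name="load"> entries for module_name."""
    root = ET.fromstring(xml_text)
    versions = []
    for cmd in root.iter("command"):
        # Only "load" commands with a module string are relevant here.
        if cmd.get("name") != "load" or not cmd.text:
            continue
        # Split "name/version"; entries without a slash get an empty version.
        name, _, version = cmd.text.strip().partition("/")
        if name == module_name:
            versions.append(version)
    return versions


cce_versions = loaded_versions(FRAGMENT, "cce")
print(cce_versions)
```

A check like this only inspects the configuration text; it does not confirm which `libtcmalloc_minimal.so.1` an executable resolves at run time.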