
why so many memory leaks in chgres_cube? #729

Open
edwardhartnett opened this issue Dec 11, 2022 · 6 comments
Labels
question Further information is requested

Comments

@edwardhartnett
Collaborator

Are these known issues? Why are there so many memory leaks? Here is the LeakSanitizer output from the ftst_read_atm_grib2 test:

==13925==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 578062080 byte(s) in 278 object(s) allocated from:
    #0 0x7f2076ee2808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x555b2a70a3d7 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0xf53d7)

Direct leak of 12940304 byte(s) in 248852 object(s) allocated from:
    #0 0x7f2076ee2808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x555b2a707d42 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0xf2d42)

Direct leak of 77748 byte(s) in 1023 object(s) allocated from:
    #0 0x7f2076ee2808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x555b2a7082c0 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0xf32c0)

Direct leak of 73376 byte(s) in 1022 object(s) allocated from:
    #0 0x7f2076ee2808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x555b2a709a1c in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0xf4a1c)
    #2 0xfffffffffffffffe  (<unknown module>)

Direct leak of 61440 byte(s) in 1024 object(s) allocated from:
    #0 0x7f2076ee2808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x555b2a708f24 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0xf3f24)

Direct leak of 14688 byte(s) in 17 object(s) allocated from:
    #0 0x7f2076ee4587 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cc:104
    #1 0x555b2a925b2c in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x310b2c)

Direct leak of 1208 byte(s) in 1 object(s) allocated from:
    #0 0x7f2076ee4587 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cc:104
    #1 0x555b2adca9e5 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x7b59e5)
    #2 0x555b2ac0b249 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x5f6249)
    #3 0x555b2a933e3d in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x31ee3d)
    #4 0x60200002fbaf  (<unknown module>)

Direct leak of 1208 byte(s) in 1 object(s) allocated from:
    #0 0x7f2076ee4587 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cc:104
    #1 0x555b2adca9e5 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x7b59e5)
    #2 0x555b2ac0b249 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x5f6249)
    #3 0x555b2a9340a0 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x31f0a0)
    #4 0x60200003416f  (<unknown module>)

Direct leak of 1208 byte(s) in 1 object(s) allocated from:
    #0 0x7f2076ee4587 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cc:104
    #1 0x555b2adca9e5 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x7b59e5)
    #2 0x555b2ac0b249 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x5f6249)
    #3 0x555b2a9340a0 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x31f0a0)
    #4 0x60200003782f  (<unknown module>)

Direct leak of 1208 byte(s) in 1 object(s) allocated from:
    #0 0x7f2076ee4587 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cc:104
    #1 0x555b2adca9e5 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x7b59e5)
    #2 0x555b2ac0b249 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x5f6249)
    #3 0x555b2a9340a0 in  (/home/runner/work/UFS_UTILS/UFS_UTILS/ufs_utils/build/tests/chgres_cube/ftst_read_atm_grib2+0x31f0a0)
    #4 0x60200003812f  (<unknown module>)
@edwardhartnett edwardhartnett added the question Further information is requested label Dec 11, 2022
@LarissaReames-NOAA
Collaborator

I decided to take another look at this today while going through the Issues. All of the memory leaks I'm seeing in the current version of chgres_cube appear to be associated with either g2 or ESMF routines. None of the leak reports point to a location in any chgres_cube source file anymore, so I'm not sure they're something we can fix.

I do, however, see some memory leaks originating from an fvcom_tools test, ftst_readfvcomnetcdf.F90. That might need to be addressed, but I'm not sure this issue is the place to do it.

Does anyone else see any chgres_cube tests that have memory leaks that trace back to actual test code and not just external library routines? If not, should we go ahead and close this?

@edwardhartnett
Collaborator Author

Just because the leaked memory was allocated inside another library does not mean that UFS_UTILS is not responsible for it. I can't speak for ESMF, but the latest release of g2 has no memory leaks (provided g2_finalize() is called at the end of all processing).

For many libraries, including g2, a user causes memory to be allocated, and then must call other library functions to free the memory.

Now, I can't actually speak for ESMF. But if they have done things correctly, every resource of theirs that gets opened should also be closable.
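To make that ownership pattern concrete, here is a minimal C sketch under stated assumptions. The library, its per-file index cache, and the routines `lib_read_record()`, `lib_cached_count()`, and `lib_finalize()` are all hypothetical and invented for illustration; they are not the g2 or ESMF API. The point is only that the caller never holds a pointer to the cached buffers, so the library's own finalize routine is the only way to release them.

```c
#include <stdlib.h>

/* Hypothetical library state: buffers the library keeps between calls.
 * The caller never receives these pointers. */
typedef struct cache_entry {
    int unit;                 /* file unit the cached index belongs to */
    size_t nbytes;            /* size of the cached buffer */
    unsigned char *buf;       /* library-owned allocation */
    struct cache_entry *next;
} cache_entry;

static cache_entry *cache_head = NULL;

/* Hypothetical read routine: builds (or reuses) a cached index for `unit`.
 * Returns 0 on success, -1 if allocation fails. */
int lib_read_record(int unit)
{
    for (cache_entry *it = cache_head; it != NULL; it = it->next)
        if (it->unit == unit)
            return 0; /* index already cached for this unit; reuse it */

    cache_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->unit = unit;
    e->nbytes = 4096;
    e->buf = calloc(e->nbytes, 1);
    if (e->buf == NULL) {
        free(e);
        return -1;
    }
    e->next = cache_head;
    cache_head = e;
    return 0;
}

/* Number of cached buffers the library is still holding. */
size_t lib_cached_count(void)
{
    size_t n = 0;
    for (cache_entry *it = cache_head; it != NULL; it = it->next)
        n++;
    return n;
}

/* Hypothetical finalize routine: frees everything the library allocated.
 * Intended to be called once, after the last lib_read_record() call. */
void lib_finalize(void)
{
    cache_entry *e = cache_head;
    while (e != NULL) {
        cache_entry *next = e->next;
        free(e->buf);
        free(e);
        e = next;
    }
    cache_head = NULL;
}
```

In use, an application would make all of its reads (for example `lib_read_record(10)`, `lib_read_record(11)`, then `lib_read_record(10)` again, which reuses the first cache entry) and only then call `lib_finalize()` once. Calling it earlier would not break anything in this sketch; the next read would simply allocate a fresh cache entry.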

@LarissaReames-NOAA
Collaborator

So how would one go about actually tracing back to where the memory leaks occurred in chgres_cube code if the address sanitizer output doesn't even list a file in the chgres_cube repository? For example:
5: Direct leak of 61440 byte(s) in 1024 object(s) allocated from:
5:     #0 0x2b4e0eb3a1c8 in __interceptor_malloc /tmp/Role.Apps/spack-stage/spack-stage-gcc-9.2.0-wqdecm4rkyyhejagxwmnabt6lscgm45d/spack-src/libsanitizer/asan/asan_malloc_linux.cc:144
5:     #1 0x620fb4 in gf_unpack4_ /scratch1/NCEPDEV/nems/role.epic/hpc-stack/src/gnu-9.2/pkg/g2-v3.4.5/src/gf_unpack4.f:85

@LarissaReames-NOAA
Collaborator

I ran one test (ftst_read_atm_grib2) through valgrind, and nearly every call to ESMF_FieldCreate, ESMF_GridCreate, and getgb2 is flagged as a possible or definite memory leak. I double- and triple-checked that every Field and Grid whose creation call is flagged is in fact destroyed at the end of the test file. I'm not at all sure what to make of every single getgb2 call being flagged. As an example:
==62912== 60,301,440 bytes in 29 blocks are possibly lost in loss record 243 of 244
==62912==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==62912==    by 0x55F527: gf_unpack7_ (gf_unpack7.f:62)
==62912==    by 0x55A9B0: getgb2r_ (getgb2r.f:272)
==62912==    by 0x55A511: getgb2_ (getgb2.f:271)
==62912==    by 0x4EC1B5: __atm_input_data_MOD_read_input_atm_grib2_file (atm_input_data.F90:2515)
==62912==    by 0x52D7F1: __atm_input_data_MOD_read_input_atm_data (atm_input_data.F90:137)
==62912==    by 0x4D138F: MAIN__ (ftst_read_atm_grib2.F90:169)
==62912==    by 0x4D4A15: main (ftst_read_atm_grib2.F90:12)

@edwardhartnett
Collaborator Author

The g2 library allocates unreachable memory in the getgb2() and related subroutines.

The only way to free this memory is to call the new subroutine g2_finalize() after all g2 operations are complete. This will free the g2 memory. Documentation here: https://noaa-emc.github.io/NCEPLIBS-g2/getidx_8F90.html#ac02dafd6109baa69a66dc278d6b0c083

So this is a great example of how memory checking can find real bugs and let us fix them. This leak was found (and gf_finalize() added) in response to exactly this kind of memory-checking effort by an application that uses g2.

As for ESMF, I suggest you contact their team and ask. Probably there is some function that needs to be called to free memory, either a finalize function or some function to close open ESMF objects.

@LarissaReames-NOAA
Collaborator

UFS_UTILS currently uses g2 v3.4.5, which does not have that subroutine, so it won't be possible to remove the g2-related leaks unless we update our libraries.

I'll see what the ESMF folks have to say.
