Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gcc 12.2 misoptimizes the code with ipa-ra when linking hdr_histogram_static into an .so file #124

Open
TimWolla opened this issue Apr 11, 2024 · 0 comments

Comments

@TimWolla
Copy link

TimWolla commented Apr 11, 2024

This is probably not an issue in this library, but rather a gcc bug, but I am nevertheless filing this issue for visibility:

Consider the following files:

app.c:

#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	int foo, bar;

	void *handle = dlopen("./lib.so", RTLD_LAZY);


	int64_t (*hdr_test)(int *, int *) = dlsym(handle, "hdr_test");

	int64_t count = hdr_test(&foo, &bar);

	printf("%ld\n", count);
}

lib.c:

#include <stdint.h>
#include "hdr/hdr_histogram.h"

int64_t hdr_test(int *dummy, int *dummy2)
{
	struct hdr_histogram *hdr;

	int res = hdr_init(1, 60000, 2, &hdr);
	hdr->counts[100] = 5;
	hdr->counts[121] = 1;

	hdr_reset_internal_counters(hdr);

	return hdr->total_count;
}

When compiling HdrHistogram_c as follows with gcc 12.2 (e.g. within a Debian 12 Docker Container):

env CFLAGS="-O2 -ggdb" cmake .
make
make install

And then compiling lib.c and app.c as follows:

gcc -shared -g -O0 lib.c -lhdr_histogram_static -lm -o lib.so

gcc app.c -o app

Then executing ./app will return some arbitrary value, instead of the expected value of 6:

$ ./app
129089121466008
$ ./app
126604316296856
$ ./app
131478882424472

I was able to track this down to the ipa-ra optimization of gcc, which is defined as:

Use caller save registers for allocation if those registers are not used by any called function. In that case it is not necessary to save and restore them around calls. This is only possible if called functions are part of same compilation unit as current function and they are compiled before it.

Enabled at levels -O2, -O3, -Os, however the option is disabled if generated code will be instrumented for profiling (-p, or -pg) or if callee’s register usage cannot be known exactly (this happens on targets that do not expose prologues and epilogues in RTL).

and it's enabled by default for -O2.

  • CFLAGS="-O2 -fno-ipa-ra -ggdb" prevents the issue.
  • CFLAGS="-O1 -fipa-ra -ggdb" causes the issue.

The following GitHub Action workflow tests against both gcc and clang and different optimization flags:

name: Test

on:
  push:
  pull_request:

permissions:
  contents: read

jobs:
  Test:
    name: Test
    runs-on: ubuntu-22.04
    steps:
    - uses: actions/checkout@v4
    - uses: actions/checkout@v4
      with:
        repository: HdrHistogram/HdrHistogram_c
        path: HdrHistogram_c
    - run: |
        sudo apt-get update
        sudo apt-get install -y clang
    - run: |
        for CC in clang gcc; do
          for optimization in "-O0" "-O1" "-O2" "-O1 -fipa-ra" "-O2 -fno-ipa-ra"; do
          
            pushd HdrHistogram_c > /dev/null
            
            git clean -fdx > /dev/null
            if ! env CFLAGS="$optimization -ggdb" "CC=$CC" cmake . > /dev/null 2> /dev/null; then
              popd > /dev/null
              continue
            fi
            make > /dev/null
            sudo make install > /dev/null
            
            popd > /dev/null
            
            gcc -shared -g -O0 lib.c -lhdr_histogram_static -lm -o lib.so
            
            gcc app.c -o app

    
            printf "CC=%s O=%s -> " "$CC" "$optimization"
            ./app
          done
        done

An example run can be found here: https://github.com/TimWolla/hdrhistogram-gcc/actions/runs/8643179941/job/23695657623. For posterity, the output is as follows:

CC=clang O=-O0 -> 6
CC=clang O=-O1 -> 6
CC=clang O=-O2 -> 6
CC=gcc O=-O0 -> 6
CC=gcc O=-O1 -> 6
CC=gcc O=-O2 -> 140067486107456
CC=gcc O=-O1 -fipa-ra -> 140009274693666
CC=gcc O=-O2 -fno-ipa-ra -> 6

For me observed_total_count is stored in the R10 register. Watching it in gdb exposes what is happening:

(gdb) break hdr_reset_internal_counters
Function "hdr_reset_internal_counters" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (hdr_reset_internal_counters) pending.
(gdb) run
Starting program: /pwd/app 
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, hdr_reset_internal_counters (h=0x576dd996e1d0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:281
281	    for (i = 0; i < h->counts_len; i++)
(gdb) watch $r10
Watchpoint 2: $r10
(gdb) cont
Continuing.

Watchpoint 2: $r10

Old value = 137376174487648
New value = 0
0x00007cf16214c8b5 in hdr_reset_internal_counters (h=0x576dd996e1d0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:277
277	    int max_index = -1;
(gdb) cont
Continuing.

Watchpoint 2: $r10

Old value = 0
New value = 5
hdr_reset_internal_counters (h=0x576dd996e1d0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:289
289	            if (min_non_zero_index == -1 && i != 0)
(gdb) cont
Continuing.

Watchpoint 2: $r10

Old value = 5
New value = 6
hdr_reset_internal_counters (h=0x576dd996e1d0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:289
289	            if (min_non_zero_index == -1 && i != 0)
(gdb) cont
Continuing.

Watchpoint 2: $r10

Old value = 6
New value = 137376174486472
_dl_fixup (l=0x576dd996a2c0, reloc_arg=14) at ../sysdeps/generic/ldsodefs.h:81
81	../sysdeps/generic/ldsodefs.h: No such file or directory.
(gdb) bt
#0  _dl_fixup (l=0x576dd996a2c0, reloc_arg=14) at ../sysdeps/generic/ldsodefs.h:81
#1  0x00007cf16234e02a in _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:130
#2  0x00007cf16214c91c in highest_equivalent_value (value=<optimized out>, h=0x576dd996a2c0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:256
#3  hdr_reset_internal_counters (h=0x576dd996a2c0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:303
#4  0x00007cf16214c27d in hdr_test (dummy=0x7ffe295437d4, dummy2=0x7ffe295437d0) at lib.c:12
#5  0x0000576dd95551a7 in main ()

The dl-trampoline changes the value and gcc did not save the register on the stack due to the optimization and it not realizing that the call to hdr_next_non_equivalent_value in highest_equivalent_value will clobber the register due to the trampoline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant