otel: Add OpenTelemetry functionality to NGINX Unit #1463

Open
wants to merge 6 commits into
base: master
from

Conversation

javorszky
Contributor

@javorszky javorszky commented Oct 18, 2024

Adds OpenTelemetry implementation via a Rust crate compiled to a static C library and linked into the existing Unit codebase with the necessary build and configuration steps.

  • Each commit, in order, builds, runs, and works as intended; each one exports traces to Grafana from 290fffd (configuration items and their validation) onwards.
  • The --otel build argument works from fb15df7 (add build tooling to include otel code) onwards.
  • Adding settings.telemetry to Unit's config.json works from 290fffd (configuration items and their validation) onwards
  • Each commit has the Signed-off-by trailers, and all of them are co-authored between @avahahn and myself.

Closes #1283

@avahahn
Contributor

avahahn commented Oct 21, 2024

OpenTelemetry performance test results!

The tests use wrk and were run on a 12c/24t system with 64 GB of RAM. The wrk invocation was as follows:
./wrk -t22 -c880 -d30s http://127.0.0.1:80/
Here, Unit is serving the welcome page.

With otel change, tracing enabled (sample rate 1.0, batch size 20, HTTP Transport)

Latency:

  • Avg: 1.55ms
  • Stdev: 2.67ms, 95.46%

Requests per second:

  • Avg: 31.70k
  • Stdev: 5.87k, 79.66%

With otel change, tracing NOT enabled:

Latency:

  • Avg: 1.00ms
  • Stdev: 777.66us, 75.06%

Requests per second:

  • Avg: 41.76k
  • Stdev: 11.33k, 55.51%

Built without OpenTelemetry support:

Latency:

  • Avg: 0.97ms
  • Stdev: 673.74us, 77.44%

Requests per second:

  • Avg: 42.25k
  • Stdev: 4.88k, 76.60%

I conclude that the enclosed change does not measurably impact Unit's performance when telemetry is not configured. The difference between the second and third datasets is far smaller than either of their standard deviations, and I find the data is consistently similar across test runs.

I also conclude that in a very high-throughput deployment, OpenTelemetry tracing will have a small impact. This test represents a high-throughput deployment with the sampling ratio set unreasonably high. The best practice in any situation where Unit receives this kind of traffic is to reduce the sampling ratio, which minimizes the load of OpenTelemetry span generation and processing. Even with the sampling ratio at its maximum (100%), OpenTelemetry processing for every single request (over 30k requests per second) adds only about half a millisecond of latency on average.
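The comparisons behind the two conclusions above can be sketched in C. This is only an illustration and is not part of the test scripts: `sample_t`, `avg_delta`, and `within_noise` are hypothetical names, and "smaller than both standard deviations" is a rough rule of thumb rather than a statistical test.

```c
#include <assert.h>
#include <stdbool.h>

/* One wrk figure: an average and its standard deviation, in the same unit. */
typedef struct {
    double  avg;
    double  stdev;
} sample_t;

/* Absolute difference between two averages. */
double
avg_delta(sample_t a, sample_t b)
{
    double  d;

    d = a.avg - b.avg;

    return (d < 0.0) ? -d : d;
}

/* True when the gap between the averages is smaller than both deviations. */
bool
within_noise(sample_t a, sample_t b)
{
    double  d;

    assert(a.stdev >= 0.0 && b.stdev >= 0.0);

    d = avg_delta(a, b);

    return d < a.stdev && d < b.stdev;
}
```

With the latency figures of the second and third datasets (1.00 ms with a deviation of 0.77766 ms, and 0.97 ms with a deviation of 0.67374 ms), the averages differ by 0.03 ms and `within_noise` returns true. The tracing-enabled and tracing-disabled latency averages (1.55 ms and 1.00 ms) differ by 0.55 ms.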

For more information you can reproduce my tests with these scripts:
https://github.com/avahahn/unit-perf-test

@avahahn
Contributor

avahahn commented Oct 21, 2024

Just wanted to offer a note on configuration. The settings object has been extended with a new telemetry field. This field currently contains four items:

  • endpoint: The endpoint for the OpenTelemetry Collector
    This is a required field.
    It takes a URL to either a gRPC or an HTTP API.
    Example: http://lgtm:4318/v1/traces
  • protocol: Determines the protocol used to communicate with the endpoint
    This is a required field.
    Can be specified either in all caps or in all lowercase.
    One of "HTTP" or "GRPC".
  • batch_size: Number of spans to cache before triggering a transaction with the configured endpoint
    This allows the user to cache up to N spans before the OpenTelemetry background thread sends them over the network to the collector. This value is expected to scale with the throughput of the user's environment and with any bandwidth constraints on the user's network.
    Must be a positive integer.
  • sampling_ratio: fraction of requests to trace
    This allows the user to trace anywhere from 0% to 100% of the requests that hit Unit. In high-throughput environments this ratio will typically be lower. This saves space when storing span data and still lets the user collect request metrics, such as the time taken to decode headers, without storing massive amounts of duplicate, superfluous data.
    Must be a positive floating point number; judging by the example below, 1 corresponds to 100%.
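The constraints listed above can be sketched as a small standalone check in C. This is not the validation code from the PR: `telemetry_conf_t`, `telemetry_protocol_ok`, and `telemetry_conf_ok` are hypothetical names, the endpoint is only tested for being non-empty, and the accepted sampling range of 0.0 to 1.0 inclusive is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of the four telemetry options; not Unit's own type. */
typedef struct {
    const char  *endpoint;        /* required, URL of the collector */
    const char  *protocol;        /* required, "HTTP"/"http" or "GRPC"/"grpc" */
    long        batch_size;       /* must be a positive integer */
    double      sampling_ratio;   /* assumed range: 0.0 to 1.0 inclusive */
} telemetry_conf_t;

/* Accept the protocol only in all caps or all lowercase. */
bool
telemetry_protocol_ok(const char *p)
{
    assert(p != NULL);

    return strcmp(p, "HTTP") == 0
           || strcmp(p, "http") == 0
           || strcmp(p, "GRPC") == 0
           || strcmp(p, "grpc") == 0;
}

/* Check every field against the rules described in the list above. */
bool
telemetry_conf_ok(const telemetry_conf_t *c)
{
    assert(c != NULL);

    if (c->endpoint == NULL || c->endpoint[0] == '\0') {
        return false;
    }

    if (c->protocol == NULL || !telemetry_protocol_ok(c->protocol)) {
        return false;
    }

    if (c->batch_size <= 0) {
        return false;
    }

    return c->sampling_ratio >= 0.0 && c->sampling_ratio <= 1.0;
}
```

A mixed-case protocol such as "Http" is rejected here, which follows the "all caps or all lowercase" wording; whether Unit itself is this strict is not confirmed by this sketch.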

Example configuration:

curl -X PUT 127.0.0.1:8080/config -d '{
    "settings": {
        "telemetry": {
            "batch_size": 20,
            "endpoint": "http://lgtm:4318/v1/traces",
            "protocol": "http",
            "sampling_ratio": 1
        }
    },
    "listeners": {
        "*:80": {
            "pass": "routes"
        }
    },
    "routes": [
        {
            "match": {
                "headers": {
                    "accept": "*text/html*"
                }
            },
            "action": {
                "share": "/usr/share/unit/welcome/welcome.html"
            }
        },
        {
            "action": {
                "share": "/usr/share/unit/welcome/welcome.md"
            }
        }
    ]
}'

Member

@ac000 ac000 left a comment

For the commits with Ava's Co-authored-by, the commit tags should look like

Co-developed-by: Ava Hahn <a.hahn@f5.com>
Signed-off-by: Ava Hahn <a.hahn@f5.com>
Signed-off-by: Gabor Javorszky <g.javorszky@f5.com>

I.e. A c-[ad]-b tag should be immediately followed by a s-o-b from the same person.

Also note we use Co-developed-by.
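The ordering rule above can be sketched as a small C check over a list of trailer lines. This is a hypothetical helper (`trailers_ok` and `trailer_value` are not part of Unit or git); it only verifies that each Co-developed-by or Co-authored-by line is immediately followed by a Signed-off-by line carrying the same identity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the text after "prefix" if the line starts with it, else NULL. */
static const char *
trailer_value(const char *line, const char *prefix)
{
    size_t  n;

    n = strlen(prefix);

    return (strncmp(line, prefix, n) == 0) ? line + n : NULL;
}

/* True when every c-[ad]-b tag is directly followed by a matching s-o-b. */
bool
trailers_ok(const char *const *lines, size_t count)
{
    size_t      i;
    const char  *who, *signer;

    assert(lines != NULL || count == 0);

    for (i = 0; i < count; i++) {
        who = trailer_value(lines[i], "Co-developed-by: ");

        if (who == NULL) {
            who = trailer_value(lines[i], "Co-authored-by: ");
        }

        if (who == NULL) {
            continue;
        }

        if (i + 1 == count) {
            return false;
        }

        signer = trailer_value(lines[i + 1], "Signed-off-by: ");

        if (signer == NULL || strcmp(who, signer) != 0) {
            return false;
        }
    }

    return true;
}
```

The three-line example above passes this check, while a message that ends with a Co-authored-by line after both Signed-off-by lines does not.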

For the

tools: fix bracket balance of editorconfig file

commit. tools is really for the tools/ directory. Here I'd just use the filename as the prefix, i.e.

.editorconfig: fix bracket imbalance

Member

@ac000 ac000 left a comment

In commit

    otel: add build tooling to include otel code

I think the following would be better in ./configure

diff --git ./auto/make ./auto/make
index f21a2dfc..e7d29ba3 100644
--- ./auto/make
+++ ./auto/make
@@ -7,6 +7,11 @@
 
 $echo "creating $NXT_MAKEFILE"
 
+if [ $NXT_OTEL = "NO" ];  then
+    NXT_OTEL_LIB_LOC=
+    NXT_OTEL_BUILD_FLAG=
+    NXT_OTEL_LIB_DIR=
+fi
 
 cat << END > $NXT_MAKEFILE
 
@@ -138,14 +143,14 @@ cat << END >> $NXT_MAKEFILE
 
 libnxt:        $NXT_BUILD_DIR/lib/$NXT_LIB_SHARED $NXT_BUILD_DIR/lib/$NXT_LIB_STATIC
 
-$NXT_BUILD_DIR/lib/$NXT_LIB_SHARED: \$(NXT_LIB_OBJS)
+$NXT_BUILD_DIR/lib/$NXT_LIB_SHARED: \$(NXT_LIB_OBJS) $NXT_OTEL_LIB_LOC
        \$(PP_LD) \$@
        \$(v)\$(NXT_SHARED_LOCAL_LINK) -o \$@ \$(NXT_LIB_OBJS) \\
-               $NXT_LIBM $NXT_LIBS $NXT_LIB_AUX_LIBS
+               $NXT_LIBM $NXT_LIBS $NXT_LIB_AUX_LIBS $NXT_OTEL_LIB_LOC
 
-$NXT_BUILD_DIR/lib/$NXT_LIB_STATIC: \$(NXT_LIB_OBJS)
+$NXT_BUILD_DIR/lib/$NXT_LIB_STATIC: \$(NXT_LIB_OBJS) $NXT_OTEL_LIB_LOC
        \$(PP_AR) \$@
-       \$(v)$NXT_STATIC_LINK \$@ \$(NXT_LIB_OBJS)
+       \$(v)$NXT_STATIC_LINK \$@ \$(NXT_LIB_OBJS) $NXT_OTEL_LIB_LOC
 
 $NXT_BUILD_DIR/lib/$NXT_LIB_UNIT_STATIC: \$(NXT_LIB_UNIT_OBJS) \\
                $NXT_BUILD_DIR/share/pkgconfig/unit.pc \\
@@ -359,11 +364,11 @@ $echo >> $NXT_MAKEFILE
 cat << END >> $NXT_MAKEFILE
 
 $NXT_BUILD_DIR/sbin/$NXT_DAEMON:       $NXT_BUILD_DIR/lib/$NXT_LIB_STATIC \\
-                               \$(NXT_OBJS)
+                               \$(NXT_OBJS) $NXT_OTEL_LIB_LOC
        \$(PP_LD) \$@
        \$(v)\$(NXT_EXEC_LINK) -o \$@ \$(CFLAGS) \\
                \$(NXT_OBJS) $NXT_BUILD_DIR/lib/$NXT_LIB_STATIC \\
-               $NXT_LIBM $NXT_LIBS $NXT_LIB_AUX_LIBS
+               $NXT_LIBM $NXT_LIBS $NXT_LIB_AUX_LIBS $NXT_OTEL_LIB_LOC
 
 END
 

Where you only need to adjust the NXT_LIB_AUX_CFLAGS and NXT_LIB_AUX_LIBS variables.

E.g. in my http compression patches I do

diff --git ./configure ./configure
index 6929d41d..f33134b7 100755
--- ./configure
+++ ./configure
@@ -127,6 +127,7 @@ NXT_LIBRT=
 . auto/unix
 . auto/os/conf
 . auto/ssltls
+. auto/compression
 
 if [ $NXT_REGEX = YES ]; then
     . auto/pcre
@@ -169,11 +170,13 @@ END
 
 NXT_LIB_AUX_CFLAGS="$NXT_OPENSSL_CFLAGS $NXT_GNUTLS_CFLAGS \\
                     $NXT_CYASSL_CFLAGS $NXT_POLARSSL_CFLAGS \\
-                    $NXT_PCRE_CFLAGS"
+                    $NXT_PCRE_CFLAGS $NXT_ZLIB_CFLAGS $NXT_ZSTD_CFLAGS \\
+                    $NXT_BROTLI_CFLAGS"
 
 NXT_LIB_AUX_LIBS="$NXT_OPENSSL_LIBS $NXT_GNUTLS_LIBS \\
                     $NXT_CYASSL_LIBS $NXT_POLARSSL_LIBS \\
-                    $NXT_PCRE_LIB"
+                    $NXT_PCRE_LIB $NXT_ZLIB_LIBS $NXT_ZSTD_LIBS \\
+                    $NXT_BROTLI_LIBS"
 
 if [ $NXT_NJS != NO ]; then
     . auto/njs

@ac000
Member

ac000 commented Oct 22, 2024

For

tools/openapi: update OpenAPI references

I would make the prefix docs/openapi as docs/unit-openapi.yaml is the source of truth.

Member

@ac000 ac000 left a comment

otel: configuration items and their validation

Adds code responsible for users to apply the `telemetry` configuration
options.

I would like to see example Unit configuration here...

Comment on lines +244 to +257
#if (NXT_HAVE_OTEL)
nxt_inline nxt_int_t nxt_otel_validate_endpoint(nxt_conf_validation_t *vldt,
nxt_conf_value_t *value,
void *data);
nxt_int_t nxt_otel_validate_batch_size(nxt_conf_validation_t *vldt,
nxt_conf_value_t *value,
void *data);
nxt_int_t nxt_otel_validate_sample_ratio(nxt_conf_validation_t *vldt,
nxt_conf_value_t *value,
void *data);
nxt_int_t nxt_otel_validate_protocol(nxt_conf_validation_t *vldt,
nxt_conf_value_t *value,
void *data);
#endif
Member

These should be static

@@ -1465,6 +1515,73 @@ nxt_conf_validate(nxt_conf_validation_t *vldt)
"a number, a string, an array, or an object"



#if (NXT_HAVE_OTEL)
inline nxt_int_t
Member

Remove the inline, or simply remove this function altogether is probably better...

double batch_size;
batch_size = nxt_conf_get_number(value);
if (batch_size <= 0) {
return NXT_ERROR;
Member

Oops, indentation...

Comment on lines +1535 to +1536
double batch_size;
batch_size = nxt_conf_get_number(value);
Member

Blank line after variable declarations please.


return NXT_ERROR;

happy:
Member

Don't indent goto labels...

Comment on lines +1567 to +1568
if (nxt_str_eq(&proto, "HTTP", 4) ||
nxt_str_eq(&proto, "http", 4)) {
Member

In Unit we put the operator on the next line, and when we have multi-line if's the { goes on its own line.
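The layout described above might look like the following. This is a standalone sketch rather than the code under review: standard `strcmp` stands in for `nxt_str_eq`, and `proto_is_http` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the protocol check, laid out as the review asks. */
bool
proto_is_http(const char *proto)
{
    assert(proto != NULL);

    /* Operator leads the continuation line; { sits on its own line. */
    if (strcmp(proto, "HTTP") == 0
        || strcmp(proto, "http") == 0)
    {
        return true;
    }

    return false;
}
```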

@@ -1613,6 +1617,12 @@ static nxt_conf_map_t nxt_router_websocket_conf[] = {
};


#if (NXT_HAVE_OTEL)
static void nxt_otel_log_callback(u_char *arg) {
Member

Opening { of functions always on their own line please...

nxt_conf_get_string(otel_endpoint, &telemetry_endpoint);
nxt_conf_get_string(otel_proto, &telemetry_proto);
telemetry_batching = otel_batching ? nxt_conf_get_number(otel_batching) : NXT_OTEL_BATCH_DEFAULT;
telemetry_sample_fraction = otel_sampling ? nxt_conf_get_number(otel_sampling) : NXT_OTEL_SAMPLING_DEFAULT;
Member

Need to break this line up...

Comment on lines +2210 to +2214
nxt_otel_rs_init(&nxt_otel_log_callback,
&telemetry_endpoint,
&telemetry_proto,
telemetry_sample_fraction,
telemetry_batching);
Member

Could do with a wee re-alignment to the opening (

Member

I wonder also if these are just asking to be put in a struct?
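One possible shape for such a struct is sketched below. Everything here is hypothetical: `otel_params_t`, `otel_params_fill`, and the two default values are placeholders for illustration and do not reflect the actual `nxt_otel_rs_init` signature or the real `NXT_OTEL_*_DEFAULT` values.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder defaults; the real NXT_OTEL_*_DEFAULT values may differ. */
#define OTEL_BATCH_DEFAULT     20.0
#define OTEL_SAMPLING_DEFAULT  1.0

/* Hypothetical grouping of the values passed to the init call. */
typedef struct {
    const char  *endpoint;
    const char  *protocol;
    double      sample_fraction;
    double      batching;
} otel_params_t;

/* Fill the struct, falling back to defaults for absent (NULL) options. */
void
otel_params_fill(otel_params_t *p, const char *endpoint, const char *protocol,
                 const double *batching, const double *sampling)
{
    assert(p != NULL);

    p->endpoint = endpoint;
    p->protocol = protocol;

    p->batching = (batching != NULL) ? *batching
                                     : OTEL_BATCH_DEFAULT;

    p->sample_fraction = (sampling != NULL) ? *sampling
                                            : OTEL_SAMPLING_DEFAULT;
}
```

An init function could then take a single `const otel_params_t *` instead of four separate arguments, which also keeps the call site within the line-length limit.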

This is purely the source code of the rust end of opentelemetry. It does
not have build tooling wired up yet, nor is this used from the C code.

Signed-off-by: Ava Hahn <a.hahn@f5.com>
Signed-off-by: Gabor Javorszky <g.javorszky@f5.com>

Co-authored-by: Ava Hahn <a.hahn@f5.com>
Comment on lines +194 to +192
body_key = (nxt_str_t){
.start = body_size_buf,
.length = nxt_length(body_size_buf),
};
body_val = (nxt_str_t){
.start = body_buf,
.length = nxt_length(body_buf),
};

This comment was marked as duplicate.

@ac000

This comment was marked as duplicate.

javorszky and others added 5 commits October 23, 2024 16:33
Adds the --otel flag to the configure command and the various build time
variables and checks that are needed in this flow.

It also includes the nxt_otel.c and nxt_otel.h files that are needed for
the rest of Unit to talk to the compiled static library that's generated
from the rust crate.

Signed-off-by: Ava Hahn <a.hahn@f5.com>
Signed-off-by: Gabor Javorszky <g.javorszky@f5.com>

Co-authored-by: Ava Hahn <a.hahn@f5.com>

Enables Unit to parse the tracestate and traceparent headers and add it
to the list, as well as calls to nxt_otel_test_and_call_state.

Signed-off-by: Ava Hahn <a.hahn@f5.com>

Adds code responsible for users to apply the `telemetry` configuration
options.

configuration snippet as follows:
{
    "settings": {
        "telemetry": {
            "batch_size": 20,
            "endpoint": "http://lgtm:4318/v1/traces",
            "protocol": "http",
            "sampling_ratio": 1
        }
    },
    "listeners": {
        "*:80": {
            "pass": "routes"
        }
    },
    "routes": [
        {
            "match": {
                "headers": {
                    "accept": "*text/html*"
                }
            },
            "action": {
                "share": "/usr/share/unit/welcome/welcome.html"
            }
        },
        {
            "action": {
                "share": "/usr/share/unit/welcome/welcome.md"
            }
        }
    ]
}

Signed-off-by: Ava Hahn <a.hahn@f5.com>
Signed-off-by: Gabor Javorszky <g.javorszky@f5.com>

Co-authored-by: Ava Hahn <a.hahn@f5.com>

Tiny bracket balance fix.

Signed-off-by: Ava Hahn <a.hahn@f5.com>
Signed-off-by: Gabor Javorszky <g.javorszky@f5.com>

These changes are generated by the openapi generator through a make
command.

Signed-off-by: Ava Hahn <a.hahn@f5.com>
Signed-off-by: Gabor Javorszky <g.javorszky@f5.com>

#[repr(C)]
pub struct nxt_str_t {
pub len: usize,
Member

Could we make this length please, so it matches the C code (and wasm-wasi-component)?


Successfully merging this pull request may close these issues.

OpenTelemetry: Implement distributed tracing for requests
3 participants