
Timeout Errors and Unhealthy Upstreams during Health Checks #130

Open
surenraju-careem opened this issue Jul 6, 2023 · 1 comment

@surenraju-careem

We are experiencing frequent timeout errors during the health checks of our services. While the health APIs work fine when invoked directly from the nodes, we encounter issues during the health checks performed by Kong's lua-resty-healthcheck library.

The timeout errors are logged as follows:

Unhealthy TIMEOUT increment (10/3) for 'my-service.my-domain.com(10.123.321.234:443)', context: ngx.timer
Failed to receive status line from 'my-service.my-domain.com(10.123.321.234:443)': timeout, context: ngx.timer
Failed SSL handshake with 'my-service.my-domain.com(10.123.321.234:443)': handshake failed, context: ngx.timer

It is important to note that this issue affects specific upstreams, and only one or two pods at a time experience this problem. The upstreams remain in an unhealthy state and do not recover automatically. The issue is resolved temporarily by restarting the affected Kong pod, which sets the upstream to a healthy state again.

Upon investigating the code used by Kong's lua-resty-healthcheck library, it appears that the health check query is performed using HTTP/1.0. The relevant code snippet is as follows:

local request = ("GET %s HTTP/1.0\r\n%sHost: %s\r\n\r\n"):format(path, headers, hostheader or hostname or ip)

Considering this, we suspect that the timeouts might be related to the usage of HTTP/1.0 instead of HTTP/1.1. We believe that updating the health check query to use HTTP/1.1 might help mitigate these timeout errors.
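
As a rough illustration of the proposed change, the request string could be built along these lines. This is only a sketch: `build_request` and its `http_version` parameter are hypothetical names, not part of lua-resty-healthcheck, and the `Connection: close` header is an assumption added because HTTP/1.1 connections are persistent by default. Whether this actually resolves the timeouts is untested.

```lua
-- Hypothetical helper, NOT part of lua-resty-healthcheck.
-- `headers` is expected to be an already formatted header string
-- (each header terminated by "\r\n"), as in the snippet above.
local function build_request(path, headers, host, http_version)
  http_version = http_version or "1.1"

  local extra = ""
  if http_version == "1.1" then
    -- Assumption: ask the server to close the connection after responding,
    -- because HTTP/1.1 keeps connections open by default.
    extra = "Connection: close\r\n"
  end

  return ("GET %s HTTP/%s\r\n%s%sHost: %s\r\n\r\n"):format(
    path, http_version, headers or "", extra, host)
end

-- Example: build an HTTP/1.1 probe for a placeholder host.
local request = build_request("/health", "", "my-service.my-domain.com")
```

Passing "1.0" as the last argument reproduces the current request format, so the two versions can be compared against the same upstream.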

We kindly request that the lua-resty-healthcheck library be updated to use HTTP/1.1 for health checks. We expect this change to improve the reliability of the health checks and to help prevent upstreams from getting stuck in an unhealthy state.

@nowNick
Contributor

nowNick commented Nov 8, 2023

Hi @surenraju-careem

Thanks for this deep investigation and kind request. Just to make sure we're not having parallel conversations, I'd like to link this one to #128, as it seems to be a duplicate of that one 🤔
