When trying to deploy our project with a gems upgrade that included upgrading health_check from v3.0.0 to v3.1.0, our new instances failed the health check with the message "Cache is returning garbage. ". As a result, our deploy failed.
Digging into why, it seems that e3f5f14 changed the value written to the health check cache. Before, the health check:
- writes "ok"
- checks that the read value is "ok"
Now, it:
- writes "ok #{Time.now.to_i}"
- checks that the read value matches /^ok (\d+)$/
The problem is that the two formats are incompatible, yet the old and new code use the same cache key. During a rolling deploy, if a load balancer is constantly hitting the health check, old and new instances will conflict with each other. In our case, machines running the old health_check code were writing "ok", and the new health_check code treated that value as "garbage".
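To make the conflict concrete, here is a minimal sketch of the interleaving. It is not the gem's actual code: a plain Hash stands in for the shared cache, and the method names (`old_write`, `old_check`, `new_write`, `new_check`) are hypothetical approximations of the two behaviours described above.

```ruby
# A plain Hash stands in for the shared cache (an assumption for illustration).
KEY = "__health_check_cache_test__"

# Approximation of the v3.0.0 behaviour: write "ok", expect exactly "ok".
def old_write(cache)
  cache[KEY] = "ok"
end

def old_check(cache)
  cache[KEY] == "ok"
end

# Approximation of the v3.1.0 behaviour: write "ok <timestamp>", expect that shape.
def new_write(cache, now = Time.now)
  cache[KEY] = "ok #{now.to_i}"
end

def new_check(cache)
  !!(cache[KEY] =~ /^ok (\d+)$/)
end

shared = {}

# An old instance writes just before a new instance reads.
old_write(shared)
new_reads_old = new_check(shared)

# A new instance writes just before an old instance reads.
new_write(shared)
old_reads_new = old_check(shared)

puts "new code accepts old value: #{new_reads_old}"
puts "old code accepts new value: #{old_reads_new}"
```

Both checks fail in this sketch, so whichever version loses the race reports a bad cache, regardless of whether the cache itself is healthy.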
We fixed this simply by temporarily disabling the cache check, deploying the new gem version, then re-enabling the cache check in the next deploy. But I think if the new code just used a different cache key (something other than __health_check_cache_test__), this would be unnecessary. Alternatively, the key could be made instance-dependent somehow (although the implementation might vary by application).
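Both suggestions could look roughly like the sketch below. Everything except `__health_check_cache_test__` is hypothetical: `VERSIONED_KEY`, `instance_key`, `write_probe`, and `probe_ok?` are illustrative names, a Hash again stands in for the shared cache, and hostname plus PID is only one possible way to make the key instance-dependent.

```ruby
require "socket"

LEGACY_KEY    = "__health_check_cache_test__"    # key used by the old code
VERSIONED_KEY = "__health_check_cache_test_v2__" # hypothetical new key

# Hypothetical instance-dependent key built from hostname and process ID.
def instance_key(host = Socket.gethostname, pid = Process.pid)
  "#{VERSIONED_KEY}:#{host}:#{pid}"
end

# Write the timestamped probe value under the given key.
def write_probe(cache, key, now = Time.now)
  cache[key] = "ok #{now.to_i}"
end

# Check that the value under the given key has the new format.
def probe_ok?(cache, key)
  cache[key].to_s.match?(/\Aok \d+\z/)
end

shared = {}

# New instance writes, then an old instance overwrites the legacy key
# before the new instance reads its value back.
write_probe(shared, VERSIONED_KEY)
write_probe(shared, instance_key)
shared[LEGACY_KEY] = "ok"

puts "versioned key ok: #{probe_ok?(shared, VERSIONED_KEY)}"
puts "instance key ok:  #{probe_ok?(shared, instance_key)}"
```

With a separate key, writes from old instances no longer land where the new code reads, so the two versions can coexist during a rolling deploy.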