Implement Circuit-breaker assistance for recovery of a failing Akka Cluster #13

Open
arunkpatra opened this issue Jun 19, 2020 · 0 comments
Labels
enhancement (New feature or request), research (Requires specialized research)

Comments

@arunkpatra
Owner

Implement Circuit-breaker at appropriate inter-service interactions
The two microservices, the API and the Backend, talk over gRPC. In production, Thingverse would typically be installed on a Kubernetes cluster. We use the Linkerd service mesh for TLS offloading, retries, and load balancing.

As discussed in #12, a network partition is a dreaded situation in which things spiral out of control. We therefore need to be calculated and conservative with retry logic, so that retries do not further degrade an already precarious situation.
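To make "conservative" concrete, here is a minimal sketch of a bounded retry with capped, jittered exponential backoff. It is written in Python purely for illustration and is not Thingverse or Linkerd code. The function name `call_with_retry`, its parameters, and the choice of `ConnectionError` as the retryable failure are all assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,          # hypothetical default: hard cap on total attempts
    base_delay_s: float = 0.1,      # hypothetical default: first backoff ceiling
    max_delay_s: float = 2.0,       # hypothetical default: backoff never exceeds this
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call operation, retrying on ConnectionError with capped, jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                # Attempts exhausted: surface the failure instead of piling on.
                raise
            # Full jitter: wait a random time in [0, min(cap, base * 2^(attempt-1))]
            # so that many clients do not retry in lockstep.
            ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(rng() * ceiling)
    raise AssertionError("unreachable")
```

The hard cap on attempts and the delay ceiling are what keep retries from amplifying load on a cluster that is already struggling. The `sleep` and `rng` parameters are injectable only so the behaviour can be checked without real waiting.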

What could be done
I believe we could start with conservative retry mechanisms to deal with the situation. Beyond that, we should cut off inbound traffic to the failing nodes altogether and see whether the remaining nodes can handle requests if the cluster is still in a healthy state. At the moment we are focusing on handling retries at the service mesh layer, but the design needs to keep in mind the Omnibus release, which would run outside of K8s (2.x release train?).
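As a starting point for discussion, the idea of cutting off traffic to a failing node can be sketched as a small circuit breaker with the usual closed, open, and half-open states. This is an illustrative Python sketch, not the proposed implementation. The class names, thresholds, and the injectable `clock` are assumptions, and Akka's own `akka.pattern.CircuitBreaker` would be the more natural candidate inside the Backend.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(
        self,
        max_failures: int = 3,           # hypothetical threshold
        reset_timeout_s: float = 30.0,   # hypothetical cool-down before probing
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the breaker is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # cool-down elapsed: allow a probe call through
        return "open"

    def call(self, operation: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            # Fail fast so the unhealthy node receives no further traffic.
            raise CircuitOpenError("backend marked unhealthy; failing fast")
        try:
            result = operation()
        except Exception:
            self._failures += 1
            # A failed probe, or too many failures while closed, (re)opens the breaker.
            if state == "half-open" or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

With one breaker per backend node, a caller could skip nodes whose breaker is open and route to the remaining ones. Because the logic lives in the application rather than the mesh, the same approach would also apply to a deployment outside of K8s.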

@arunkpatra added the enhancement and research labels on Jun 19, 2020
@arunkpatra added this to the Thingverse - v1.0.0.M3 milestone on Jun 19, 2020