Mark self(backend) as defunct if cannot communicate with leader and notify associated recorder #61

Open
anvinjain opened this issue Mar 14, 2017 · 2 comments


anvinjain commented Mar 14, 2017

The sequence to be ensured here is:
t0 = The backend observes a failure to report load to the leader. When a failure is observed, the tick is not incremented in the next report.
t1.a1 = Failures continue; the leader marks the backend as defunct and stops assigning new process groups to it.
t1.a2 = Reports succeed; the system is back to the state before t0.
t2.a1.b1 = Failures continue; the backend marks itself as defunct and responds with 5xx to /poll requests from recorders already assigned to it. Ensure t2 > t1, i.e. the backend marks itself as defunct only after the leader has marked it as defunct.
t3.a1.b1 = The backend starts sending exploratory reports (tick=0).
t4.a1.b1 = The recorder gives up on the backend because it keeps receiving 5xx, and calls /associate to get a new backend. The leader assigns a new backend to the recorder's process group and de-associates the defunct backend.
t5.a1.b1 = N successive exploratory ticks succeed; the backend marks itself as available again and resumes sending the usual reports to the leader.
t6.a1.b1 = Reports succeed; eventually the leader marks the backend as available again.
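
The backend side of this timeline could be sketched roughly as below. This is only an illustration under stated assumptions, not the project's implementation: the class name, field names, and threshold values are all hypothetical, and the issue does not say what the tick should be after recovery (the sketch simply leaves it unchanged). To honour t2 > t1, the backend's failure threshold would have to be larger than whatever the leader uses to declare a backend defunct.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Backend's own view of its link to the leader (all names hypothetical)."""
    defunct_after_failures: int = 5    # assumed value; must exceed the leader's threshold so t2 > t1
    recover_after_successes: int = 3   # "N" in t5; value assumed
    defunct: bool = False
    tick: int = 0
    consecutive_failures: int = 0
    exploratory_successes: int = 0

    def next_report_tick(self) -> int:
        # t3: while defunct, reports are exploratory and carry tick=0.
        return 0 if self.defunct else self.tick

    def on_report_result(self, success: bool) -> None:
        if self.defunct:
            # t5: only N *successive* exploratory successes restore availability.
            if success:
                self.exploratory_successes += 1
            else:
                self.exploratory_successes = 0
            if self.exploratory_successes >= self.recover_after_successes:
                self.defunct = False
                self.consecutive_failures = 0
                self.exploratory_successes = 0
            return
        if success:
            # t1.a2: a successful report clears the failure streak.
            self.consecutive_failures = 0
            self.tick += 1
            return
        # t0: on failure the tick is not incremented for the next report.
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.defunct_after_failures:
            # t2: enough successive failures, backend marks itself defunct.
            self.defunct = True
            self.exploratory_successes = 0
```

Counting only successive outcomes in both directions keeps a flapping link from toggling the state on every report.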

Description:
If the backend is not able to talk to the leader (for example, during a network partition) for several successive load reports, it should mark itself as defunct and respond to all /poll requests with 5xx. /profile calls should not return errors.
This is required so that the recorder detects that something is wrong with the backend and calls /associate again to get a healthy backend via the leader.
The backend should mark itself as available again once it can communicate with the leader for several successive report intervals.
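
As a rough illustration of the request handling described above, the sketch below shows a defunct backend failing only /poll, and a recorder deciding when to go back to the leader. The function names, the choice of 503 as the specific 5xx code, and the give-up threshold are assumptions made for the example; the issue only specifies "5xx" and does not say how many failures the recorder tolerates.

```python
from http import HTTPStatus
from typing import Sequence


def backend_status_for(path: str, backend_defunct: bool) -> int:
    """Status code a backend would return for a request path (hypothetical helper)."""
    if path == "/poll" and backend_defunct:
        # Any 5xx would do; 503 is an assumed choice.
        return int(HTTPStatus.SERVICE_UNAVAILABLE)
    # /profile (and /poll on a healthy backend) is not errored.
    return int(HTTPStatus.OK)


def recorder_should_reassociate(poll_statuses: Sequence[int], give_up_after: int = 3) -> bool:
    """True once the last `give_up_after` /poll responses were all 5xx (threshold assumed)."""
    recent = list(poll_statuses)[-give_up_after:]
    return len(recent) == give_up_after and all(500 <= s < 600 for s in recent)
```

Leaving /profile untouched means a recorder that is mid-upload can still deliver its data while its subsequent /poll failures push it to re-associate.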

@janmejay

A pictorial description of the sequence of events would help.

@anvinjain

Updated the description with a textual timeline.
