Operator tooling: cluster repair #17

Open
sanmiguel opened this issue Aug 8, 2016 · 1 comment

@sanmiguel
Contributor

There have been scheduler bugs in the past that have led to orphaned executors. Currently, the only way to deal with these is to forcibly kill them off (e.g. using riak-mesos framework teardown) and start over.

We should investigate how we can provide tooling that lets an operator manually bring a node back under the control of a scheduler.

/cc @seanjensengrey

@seanjensengrey

Not only scheduler bugs, but also ZooKeeper (ZK) corruption, etc. If we are to support running Riak clusters on Mesos with the same uptime and longevity we see on bare metal, we need ways to transition a node back to a normal operating state without killing it.
