Better cluster management and discovery #189

Open
ian-r-rose opened this issue May 20, 2021 · 3 comments

@ian-r-rose
Collaborator

AKA Let's delete dask-labextension's ClusterManager

Goal

For a few years now this extension has had a built-in cluster manager, which is what powers the user interface on the frontend. The reasons for having this are basically twofold:

  1. When the labextension has a handle on the Cluster object it is easier to start, stop, and scale those clusters, and build a user interface to do so.
  2. If the labextension knows about what clusters are running it is easier to set up proxies for the dashboards.

However, the uptake of the labextension cluster management facility has been limited. Some people use it, but more often I see people create clusters in their notebooks and then use the magnifying glass icon to connect to the dashboard. I think there are several reasons why the cluster manager hasn't been particularly popular:

  • Clusters are launched within the Jupyter server process. This can be nice because they can outlive your notebook kernel, but it can also be annoying to have them tied to your JupyterLab process (killing your server is pretty common, and you may not want that to kill your cluster connections as well).
  • Related to the above, even if your clusters outlive a restart of the server process, there is no real discovery mechanism to reconnect to them. They might as well be gone, except now they might be costing you money.
  • Using the dask config to customize cluster creation args is annoying, and changes only take effect when you restart your server.
  • It doesn't allow you to create more than one type of cluster (e.g., a KubeCluster and a LocalCluster).
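
To make the last two points concrete, here is a rough sketch of a config-driven factory. It assumes the config holds a single `factory` entry naming a module and a class; the key names and the `make_cluster` helper are illustrative, not copied from this extension's source, and a stdlib class stands in for a real cluster class so the sketch runs without dask installed.

```python
import importlib

# Stand-in for the relevant section of the dask config. The key layout is an
# assumption for illustration; types.SimpleNamespace plays the role of a
# cluster class such as LocalCluster.
config = {
    "factory": {
        "module": "types",
        "class": "SimpleNamespace",
        "args": [],
        "kwargs": {"n_workers": 4},
    }
}


def make_cluster(cfg):
    """Import and instantiate the one configured factory class (hypothetical helper)."""
    factory = cfg["factory"]
    module = importlib.import_module(factory["module"])
    cls = getattr(module, factory["class"])
    return cls(*factory["args"], **factory["kwargs"])


# Every cluster created this way has the same type and arguments. Offering a
# second cluster type would need a second factory entry, which this layout
# has no room for.
cluster = make_cluster(config)
```

Because the config is read by the server process, editing the `kwargs` here would only affect clusters created after that process picks up the new values.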

Proposal

@jacobtomlinson has a new project, dask-ctl, which has many of the same goals as (and a similar API to) the ClusterManager in this project. However, it also has some benefits that address some of the above problems:

  • It has an extensible entrypoints-based system for registering new kinds of clusters with the discovery mechanism
  • It can handle multiple cluster types
  • Cluster discovery can survive Jupyter server restarts
  • I would get to delete some code here :)
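
As a minimal sketch of the registration idea in the first two bullets: each cluster type contributes a discovery function, and a manager walks all of them. Everything here is hypothetical (the `register` decorator, the registry dict, the cluster names); it is not dask-ctl's actual API. In a real entrypoints-based system the registry would be populated from installed package metadata rather than a decorator in one file.

```python
# Hypothetical in-memory registry standing in for entry points declared by
# installed packages.
DISCOVERY_REGISTRY = {}


def register(kind):
    """Register a discovery function under a cluster-type label."""
    def decorator(func):
        DISCOVERY_REGISTRY[kind] = func
        return func
    return decorator


@register("local")
def discover_local():
    # A real implementation would look for running clusters of this type;
    # the name below is made up for the example.
    yield "local-1234", "LocalCluster"


@register("kube")
def discover_kube():
    yield "kube-abcd", "KubeCluster"


def discover_all():
    """Collect (kind, name, cluster_type) for every registered cluster type."""
    found = []
    for kind, func in DISCOVERY_REGISTRY.items():
        for name, cluster_type in func():
            found.append((kind, name, cluster_type))
    return found


clusters = discover_all()
```

Since discovery asks each backend what is currently running instead of relying on state held in the Jupyter server process, the result does not depend on that process having stayed alive.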

Let's explore replacing the ClusterManager in this package with dask-ctl!

Possible stumbling blocks

  1. dask-ctl is still being incubated as a contrib package. How cautious should we be about adopting it as a dependency?
  2. There would likely be some features to upstream to that package (e.g., I would want it to also be able to produce a code snippet for connecting to an existing cluster that can be used for insertion into notebooks). Fortunately, I suspect the maintainers would be amenable to such things.

@jacobtomlinson
Member

Yay for all of this! I am very keen to improve dask-ctl to a point where it can replace the cluster manager here.

> I would want it to also be able to produce a code snippet for connecting to an existing cluster that can be used for insertion into notebooks

That code sounds pretty specific to the lab extension. Maybe we can add some things to dask-ctl which yield enough information for the lab extension to put that snippet together.

@ian-r-rose
Collaborator Author

> > I would want it to also be able to produce a code snippet for connecting to an existing cluster that can be used for insertion into notebooks
>
> That code sounds pretty specific to the lab extension. Maybe we can add some things to dask-ctl which yield enough information for the lab extension to put that snippet together.

Yeah, I certainly don't need a literal code snippet, but something that could be sent over the wire would be very helpful. Also, this doesn't have to live in dask-ctl, but it feels like it might be appropriate. Basically, it would be nice to have a JSON-serializable version of from_name that allows us to reconstruct a cluster in the client.
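
A rough sketch of what such a JSON-serializable description could look like, and how the lab extension might turn it into a snippet on its side. The `ClusterDescriptor` class, its fields, and the generated snippet are all hypothetical; the sketch also assumes the named cluster class implements `from_name`, which not every cluster type does.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ClusterDescriptor:
    """Hypothetical wire format: just enough to call from_name elsewhere."""

    name: str        # the cluster's name as reported by discovery
    module: str      # importable module that provides the cluster class
    class_name: str  # the cluster class, assumed to implement from_name

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload):
        return cls(**json.loads(payload))

    def connection_snippet(self):
        """Build the text a frontend could insert into a notebook cell."""
        return (
            f"from {self.module} import {self.class_name}\n"
            "from dask.distributed import Client\n"
            "\n"
            f"cluster = {self.class_name}.from_name({self.name!r})\n"
            "client = Client(cluster)\n"
        )


# Round trip: the server serializes, the frontend side rebuilds the descriptor.
sent = ClusterDescriptor("demo", "dask_kubernetes", "KubeCluster")
received = ClusterDescriptor.from_json(sent.to_json())
snippet = received.connection_snippet()
```

Keeping the snippet template out of the descriptor means the upstream package only has to expose the data, while the wording of the inserted code stays with the lab extension.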

One thing I haven't really wrapped my mind around is what the relationship of this effort to dask-gateway should be. That project also has some lifecycle management, as well as multi-cluster-type capabilities. There is some discussion on how to bring them closer together in #135, but we never really landed on anything.

@jacobtomlinson
Member

The lifecycle management in Dask Gateway is great, but only if you use Dask Gateway. I'm trying to bring these features to everyone else.

I could see somewhere down the line Dask Gateway supporting dask-ctl as a backend. This would open Dask Gateway up to other platforms like the cloud without having to reimplement all of dask-cloudprovider.
