Help connecting to existing KubeCluster using the built-in Discovery Mechanism #255

Open
jerrygb opened this issue Jan 10, 2023 · 2 comments

Comments

@jerrygb

jerrygb commented Jan 10, 2023

Describe the issue:

I am able to create clusters, connect using Dask clients, and perform Dask operations without issues using the KubeCluster operator from a notebook. I am also able to reach the status dashboard by port-forwarding to the scheduler.

However, I am not able to connect to these clusters when using the lab extension. When I move to an active notebook and click search in the Dask lab extension, it does pick up a remote cluster address. The dashboard URLs picked up by the extension code look like:

http://internal-scheduler.namespace:8787/

But the extension does not seem to be able to connect to it, and I do not see any logs pertaining to this action.

Do these dashboards need to be externally reachable (i.e., are these connections made from the browser or from a backend service)?
Since I was not sure about this, I set up an AWS NLB and tried connecting to the NLB address using the Client, as shown in the second snippet below.
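
For context, this is the kind of check I would expect the backend to be able to make if the connection happens server-side (a hypothetical snippet using requests, not something the extension itself runs):

# Hypothetical reachability check from the notebook/backend side, not from the browser
import requests

resp = requests.get("http://internal-scheduler.namespace:8787/status", timeout=5)
print(resp.status_code)  # 200 would mean the in-cluster dashboard address is reachable from the backend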

Minimal Complete Verifiable Example:

All of the following code snippets work fine from the notebook.

# Create a cluster
from dask_kubernetes.operator import make_cluster_spec, make_worker_spec
from dask_kubernetes.operator import KubeCluster
from dask.distributed import Client
import dask.dataframe as dd
import os

profile_name = namespace_name  # namespace_name is defined earlier in the notebook

custom_spec = make_cluster_spec(
    name=profile_name,
    image='ghcr.io/dask/dask:latest',
    resources={"requests": {"memory": "512Mi"}, "limits": {"cpu": "4", "memory": "8Gi"}},
)

custom_spec['spec']['scheduler']['spec']['serviceAccount'] = 'default-editor'
custom_spec['spec']['worker']['spec']['serviceAccount'] = 'default-editor'

custom_worker_spec = make_worker_spec(
    image='ghcr.io/dask/dask:latest',
    n_workers=6,
    resources={"requests": {"memory": "512Mi"}, "limits": {"memory": "12Gi"}},
)
custom_worker_spec['spec']['serviceAccount'] = 'default-editor'
custom_worker_spec

cluster = KubeCluster(custom_cluster_spec=custom_spec, n_workers=0)
cluster.add_worker_group(name='highmem', custom_spec=custom_worker_spec)
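
For completeness, this is roughly how the cluster is used from the same notebook (everything here works fine; the small dataframe is just an illustrative workload):

# Connect a client to the cluster created above and run a trivial computation
from dask.distributed import Client
import dask.dataframe as dd
import pandas as pd

client = Client(cluster)
print(cluster.dashboard_link)  # the same URL the lab extension later picks up

df = dd.from_pandas(pd.DataFrame({"x": range(100)}), npartitions=4)
print(df.x.sum().compute())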

As mentioned, assume I have an AWS NLB-type LoadBalancer/Ingress service in front of the scheduler. The Dask Client is then able to interact with ports 8786 and 8787 on the scheduler to manage workers and jobs externally.

# Connect to external endpoint works fine
import dask; from dask.distributed import Client
dask.config.set({'scheduler-address': 'tcp://nlb-address.region.elb.amazonaws.com:8786'})
client = Client()
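
Passing the address to the Client directly works as well (same NLB address as above, shown only for reference):

# Equivalent to the config-based approach above
from dask.distributed import Client

client = Client("tcp://nlb-address.region.elb.amazonaws.com:8786")
print(client.dashboard_link)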

Anything else we need to know?:

Another thing I noticed is that the dask-labextension relies on the testDaskDashboard function to pick up the URL info (defined in https://github.com/dask/dask-labextension/blob/main/src/dashboard.tsx#L588).

In the console, I can see,

Found the dashboard link at 'http://internal-scheduler.namespace:8787/status'

However, the subsequent dashboard-check request to the backend is missing a / in the protocol.

See the GET request below,

GET https://website/notebook/internal/test-dask-1/dask/dashboard-check/http%3A%2Finternal-scheduler.namespace%3A8787%2F?1673363416491

To be a bit more verbose,

http%3A%2Finternal-scheduler.namespace%3A8787%2F?1673363416491 decodes to http:/internal-scheduler.namespace:8787/?1673363416491, i.e. the // after the scheme has been collapsed into a single /.
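
A minimal sketch that reproduces the decoding above (plain urllib, just to show that a correctly encoded URL should contain two %2F after the scheme):

from urllib.parse import quote, unquote

dashboard = "http://internal-scheduler.namespace:8787/"
print(quote(dashboard, safe=""))
# -> http%3A%2F%2Finternal-scheduler.namespace%3A8787%2F  (two %2F after the scheme)

print(unquote("http%3A%2Finternal-scheduler.namespace%3A8787%2F"))
# -> http:/internal-scheduler.namespace:8787/  (only one slash, as seen in the request)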

I am not sure if this is expected or a bug.

Environment:

  • Dask version: 2022.12.1
  • Dask Kubernetes: 2022.12.0
  • @dask-labextension: v6.0.0
  • @jupyterlab/server-proxy: v3.2.2
  • Python version: 3.8.10
  • Platform: Kubeflow
  • Install method (conda, pip, source): pip
@thedeg123

I'm having the same issue. I can connect to the dashboard fine from the notebook, but not from the lab extension. Looking at the failed network request I see ERR_NAME_NOT_RESOLVED for a GET request to my-dask-scheduler.namespace/cluster-map. Did you ever solve this issue?

@jacobtomlinson
Member

You will need to configure the dashboard address to use the Jupyter proxy; this varies between setups, so it's hard for us to set a sane default.

If you create the cluster with KubeCluster then the dashboard port will be proxied to the node where Jupyter is running. You likely need to set DASK_DISTRIBUTED__DASHBOARD__LINK="proxy/{host}:{port}/status" for this to work correctly, but this will vary depending on how you've configured your Jupyter environment.
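
For example, a minimal sketch of setting this from Python before creating the cluster (the proxy/ mount point is an assumption and depends on how jupyter-server-proxy is configured in your setup):

import dask

# Equivalent to the DASK_DISTRIBUTED__DASHBOARD__LINK environment variable above;
# {host} and {port} are filled in by distributed when it renders the dashboard link.
dask.config.set({"distributed.dashboard.link": "proxy/{host}:{port}/status"})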
