Access dashboard from scheduler with no public IP? #194

Open
valpesendorfer opened this issue May 31, 2021 · 9 comments

Comments

@valpesendorfer

Is it possible to connect to a scheduler without a public IP running in the same AWS VPC?

What happened:

I'm running a jupyter lab on an AWS EC2 instance and I'm trying to connect to a scheduler with a private IP that was created with dask-cloudprovider running in the same VPC as the lab.

The dashboard gets picked up by the extension as expected, all buttons become orange:

image

But if I open any of them, they remain blank for a while until they error out with a "took too long to respond" message:

image

image

What you expected to happen:

The extension showing the requested plots and metrics.

Anything else we need to know?:

Accessing the dashboard from the lab terminal with curl works, showing that it's not a basic networking issue:

image

Nothing apparent in the console:

image

Extensions:

jovyan@201435007e31:~$ jupyter serverextension list
config dir: /usr/local/etc/jupyter
    dask_labextension  enabled 
    - Validating...
      dask_labextension 5.0.1 OK
    jupyter_server_proxy  enabled 
    - Validating...
      jupyter_server_proxy  OK
    jupyterlab  enabled 
    - Validating...
      jupyterlab 3.0.16 OK

Environment:

  • Dask version: 2021.5.0
  • Python version: 3.8.8
  • Operating System: Ubuntu Bionic
  • Install method (conda, pip, source): pip
@jacobtomlinson
Member

jacobtomlinson commented Jun 1, 2021

Sure that makes sense. The dashboards are being accessed by your browser, so the scheduler needs a public IP for this to work. The address discovered by the extension is where the scheduler thinks the dashboard is, which in this case is correct, it's just not accessible to you.

It looks like you have the jupyter proxy extension installed so you should be able to use that to access it that way. You may need to add a little config to allow access but my guess is your dashboard will be available at

http://<Jupyter IP>/proxy/10.50.172.202/8787/status
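A small sketch of how such a proxy URL could be assembled. One caveat, stated as an assumption to check against your installed version: the jupyter-server-proxy documentation describes remote hosts as `/proxy/<host>:<port>/...`, with a colon between host and port, so the helper below uses that form. The function name `proxy_url` and the example base URL are hypothetical.

```python
def proxy_url(jupyter_base: str, host: str, port: int = 8787, path: str = "status") -> str:
    """Build a jupyter-server-proxy URL for a dashboard on a remote host.

    Assumes the `/proxy/<host>:<port>/<path>` route used for remote hosts.
    """
    base = jupyter_base.rstrip("/")   # avoid a double slash after the base URL
    path = path.lstrip("/")           # accept "status" and "/status" alike
    return f"{base}/proxy/{host}:{port}/{path}"


# Hypothetical Jupyter address; the scheduler IP is the one from this thread.
print(proxy_url("http://jupyter.example:8888", "10.50.172.202"))
```

The host still has to be permitted by the proxy's allowlist before this URL will respond.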

@valpesendorfer
Author

thanks @jacobtomlinson, makes sense indeed. Maybe I got a bit too excited with the orange buttons and was hoping the extension would proxy itself ...

I've tried to access the dashboard, but looks like it's being blocked ... digging through the code and docs, it looks like I need to add the host to host_allowlist - does that go into the jupyter lab config file? Or can you point me to some example?

The downside is that I'd need to add / change the IP for every cluster, maybe not a permanent solution

@ian-r-rose
Collaborator

> thanks @jacobtomlinson, makes sense indeed. Maybe I got a bit too excited with the orange buttons and was hoping the extension would proxy itself ...

In the medium-term I would like to make this possible, it requires some internal plumbing, but is doable. I would probably do this in concert with some other proxy-related work, cf #190. But for the time being @jacobtomlinson is correct, it needs to be visible to your browser, not just the JupyterLab server.

> I've tried to access the dashboard, but looks like it's being blocked ... digging through the code and docs, it looks like I need to add the host to host_allowlist - does that go into the jupyter lab config file? Or can you point me to some example?

Yes, this should go in the jupyter config file, something like c.ServerProxy.host_allowlist = ["10.x.x.x"].

> The downside is that I'd need to add / change the IP for every cluster, maybe not a permanent solution

Yeah, a better long-term solution would be to allow this extension to handle this dynamically. But I think that host_allowlist can take a callable, so you may be able to do something like

c.ServerProxy.host_allowlist = lambda ip: ip.split(".")[0] == "10"

(I have not tried this myself)
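A slightly more defensive variant of that callable can be sketched with the standard-library `ipaddress` module. Two assumptions here: the `10.0.0.0/8` range is only a stand-in for your own VPC CIDR, and the `(handler, host)` signature follows what the jupyter-server-proxy docs show for callable allowlists, so a one-argument lambda may need adapting.

```python
import ipaddress

# Assumption: replace with the CIDR of your own VPC / internal network.
ALLOWED_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def host_allowlist(handler, host):
    """Return True only for literal IPs inside ALLOWED_NETWORK.

    `handler` is the proxy request handler (unused here); `host` is the
    host part of the requested proxy URL.
    """
    try:
        return ipaddress.ip_address(host) in ALLOWED_NETWORK
    except ValueError:
        # Not a literal IP address (e.g. a hostname): reject it.
        return False


# In a Jupyter config file, `c` is provided by the config loader:
# c.ServerProxy.host_allowlist = host_allowlist
```

Checking membership in a network avoids string matching on the first octet and also rejects hostnames, which this sketch does not try to resolve.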

@valpesendorfer
Author

thanks @ian-r-rose ... required some fiddling but it works now! Not going into details as this is more a proxy thing. I'd leave this issue open since this might be integrated into the extension. Feel free to close. Thanks!

@rmcsqrd

rmcsqrd commented Jun 7, 2022

@valpesendorfer Do you mind sharing what steps you took to get it working? I have my notebook and scheduler deployed as containers on ECS and am able to successfully curl http://Dask-Scheduler:8787/status but am having trouble getting it to proxy so I can use the dask lab extension. (Dask-Scheduler is based on the docker network. This also returns the expected response if I run curl 34.xx.xx.xx/status where 34.xx.xx.xx is the external IP address from the ECS task).

Based on discussion above I:

  • added 34.xx.xx.xx to ~/.jupyter/jupyter_lab_config.py. I am unclear if modifications to this file will automatically be loaded? It is difficult to include the IP address to the container before launching it because I think it is automatically assigned. I was unable to find ways online to reload this configuration file while jupyter lab is actively running.
  • Tried accessing the following URL from my browser http://sagem-loadb-xxx.elb.us-west-2.amazonaws.com:8888/proxy/34.xx.xx.xx/8787/status where
    • http://sagem-loadb-xxx.elb.us-west-2.amazonaws.com:8888 is the URL of the loadbalancer that my notebook is accessible through.
    • 34.xx.xx.xx was looked up within the AWS console.

Trying to access the URL above resulted in a 404 error. I've tried several other URL combinations without luck as well. Thanks.

@valpesendorfer
Author

@rmcsqrd sorry, while I did make it work, I abandoned this idea and haven't used anything like it since, so I forgot all the details. But essentially it's like the steps outlined above: add the jupyter proxy extension, generate a config file if not already present, set the host allowlist so it allows only IPs from your internal CIDR and that's it. After that, your dashboard should be available through the proxy. But I haven't used it with a containerized scheduler or sagemaker, not sure if it makes any difference (shouldn't though)

@rmcsqrd

rmcsqrd commented Jun 9, 2022

@valpesendorfer Thanks for the response and the outline of steps.

RE generating the config file, I am assuming you did that by running jupyter lab --generate-config then adding a line similar to c.ServerProxy.host_allowlist = ["10.x.x.x"].

Do you know if jupyter lab will "hot reload" config changes if it is currently running or if it needs to be restarted? I am running into an issue where my jupyter lab container entrypoint immediately starts running jupyter lab; I tried to generate the config file externally then build it into the container but am unclear if that worked. I tried googling about the "hot reload" config thing but couldn't find anything. Thanks in advance for any insight you might have.

@valpesendorfer
Author

@rmcsqrd If I remember well, I used a callable as the comment suggests. If you specify a string like you do in the example, it'd probably take the xs literally.

RE hot-reload, I have no idea, sorry. But my gut feeling is no.

@ian-r-rose
Collaborator

Thanks for the additional detail @valpesendorfer.

I don't expect hot-reloading to work. You will typically need to make sure that the configuration is present before the server starts up (some hosted systems allow you to configure a start script or similar, though I'm not familiar with how sagemaker does it).
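One way to get the configuration in place before startup is to write it from a small script that the container entrypoint runs before launching Jupyter. This is only a sketch under assumptions: the helper name `write_proxy_config` is hypothetical, the file name `jupyter_server_config.py` and the `~/.jupyter` directory are assumed to be on your server's config path, and the `10.` prefix check is a placeholder for your internal range.

```python
from pathlib import Path

# Config text appended to the Jupyter config file. `get_config` is
# injected by Jupyter's config loader when the file is read.
CONFIG_SNIPPET = '''
c = get_config()  # noqa
# Assumption: internal hosts all start with "10."; adjust to your network.
c.ServerProxy.host_allowlist = lambda handler, host: host.startswith("10.")
'''


def write_proxy_config(config_dir):
    """Append the proxy allowlist setting to jupyter_server_config.py."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "jupyter_server_config.py"
    with target.open("a") as f:
        f.write(CONFIG_SNIPPET)
    return target


# Example use in an entrypoint script, before starting `jupyter lab`:
# write_proxy_config(Path.home() / ".jupyter")
```

Running `jupyter --paths` inside the container lists the config directories that are actually searched, which is a quick way to confirm the file landed somewhere Jupyter will read it.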
