[DigitalOcean] droplet integration #3832
@asaiacai I would be really interested in using this work -- do you know when you might be able to land this PR?
Hello! I’m aiming for this to land in the next two weeks, provided testing goes smoothly. Stay tuned!
@Michaelvll I think this PR is finally ready, once you have a free moment! I've currently disabled tests for GPU droplets as they are still in early access, but I think I got the remaining tests to work fine on normal CPU droplets. Let me know if you find any issues.
Also, this catalog update is required to pass the
Thanks for this amazing work @asaiacai ! This would be very helpful. Left some discussions ;)
sky/clouds/do.py
clouds.CloudImplementationFeatures.DOCKER_IMAGE:
    'Docker container images as runtime environments'
    f' are not supported in {_REPR}. Try using in `run`',
Why don't we support it? I saw relevant code snippet in the ray yaml file.
I included a startup script that installs Docker, but SSH comes up in parallel, before the startup script finishes running, so the Docker environment setup fails. Docker does eventually get set up on the machine. If I, for example, retry the launch on the same cluster after waiting about 10 seconds, I can successfully use the Docker environment, but in my tests it never worked on freshly provisioned instances. Let me know what would make sense here.
Is it possible to use a base image with Docker installed? Or can we wait for Docker to finish setting up on DO?
skypilot/sky/provision/docker_utils.py
Lines 336 to 343 in e870839
def _check_docker_installed(self):
    no_exist = 'NoExist'
    cleaned_output = self._run(
        f'command -v {self.docker_cmd} || echo {no_exist!r}')
    if no_exist in cleaned_output or 'docker' not in cleaned_output:
        logger.error(
            f'{self.docker_cmd.capitalize()} not installed. Please use an '
            f'image with {self.docker_cmd.capitalize()} installed.')
I added waiting for Docker to _check_docker_installed, but let me know if this should be specific to DO only.
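For readers following the thread, the kind of bounded wait described here could look roughly like the sketch below. It is not the code from this PR: `wait_for_docker`, its parameters, and the injected `run` callable (standing in for the runner's `self._run`) are hypothetical names chosen for illustration, and the timeout values are arbitrary.

```python
import time
from typing import Callable


def wait_for_docker(run: Callable[[str], str],
                    docker_cmd: str = 'docker',
                    timeout_s: float = 60.0,
                    interval_s: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until `docker_cmd` is on PATH or the timeout expires.

    `run` is a hypothetical stand-in for a remote command runner: it takes
    a shell command and returns its cleaned stdout as a string.
    """
    no_exist = 'NoExist'
    deadline = clock() + timeout_s
    while True:
        # Same probe as in _check_docker_installed above.
        output = run(f'command -v {docker_cmd} || echo {no_exist!r}')
        if no_exist not in output and docker_cmd in output:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would log the "not installed" error only when this returns `False`. The drawback is that an image that never installs Docker waits for the whole timeout before failing.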
My concern is that it will take too long to error out for an image without docker installation. Is it possible that we somehow detect if the docker installation is in progress, and immediately return if we found no installation?
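One possible way to address this concern is sketched below, under the assumption that the startup script is delivered as cloud-init user data, so that `cloud-init status` reports `status: running` while it is still executing. That assumption is not confirmed anywhere in this thread, and `classify_docker_state`, `ensure_docker`, and the injected `run` callable are hypothetical names rather than SkyPilot APIs.

```python
import time
from typing import Callable


def classify_docker_state(run: Callable[[str], str],
                          docker_cmd: str = 'docker') -> str:
    """Return 'installed', 'installing', or 'missing'.

    Assumes (not confirmed in this thread) that Docker is installed by a
    cloud-init user-data script, so a running cloud-init means the
    installation may still be in progress.
    """
    output = run(f"command -v {docker_cmd} || echo 'NoExist'")
    if 'NoExist' not in output and docker_cmd in output:
        return 'installed'
    status = run('cloud-init status 2>/dev/null || echo unknown')
    if 'status: running' in status:
        return 'installing'
    return 'missing'


def ensure_docker(run: Callable[[str], str],
                  timeout_s: float = 120.0,
                  interval_s: float = 2.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> bool:
    """Wait only while an installation appears to be in progress."""
    deadline = clock() + timeout_s
    while True:
        state = classify_docker_state(run)
        if state == 'installed':
            return True
        if state == 'missing':
            # No installer running: fail fast instead of waiting.
            return False
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With this shape, an image without Docker and without a running startup script errors out on the first probe, while a freshly provisioned droplet keeps polling until the script finishes or the timeout is reached.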
Co-authored-by: Tian Xia <cblmemo@gmail.com>
@cblmemo I reran the smoke tests. If you have a moment to review again, thanks in advance!
This adds DigitalOcean droplets to the sky.
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py --do