Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serve] HTTPS Support #3380

Open
wants to merge 23 commits into
base: master
Choose a base branch
from
Open

[Serve] HTTPS Support #3380

wants to merge 23 commits into from

Conversation

cblmemo
Copy link
Collaborator

@cblmemo cblmemo commented Mar 28, 2024

Support HTTPS (SSL) Certificate on SkyServe Load Balancer.

Resolves (partially) #3198 . This PR only add TLS on load balancer - still need to figure out how to encrypt between LB and replica. #3458 might help this situation.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • The example in this PR
  • All Sky Serve smoke tests: pytest tests/test_smoke.py --serve
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_skyserve_https
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@cblmemo cblmemo force-pushed the serve-https-example-prototype branch from 030f39b to 56f2c04 Compare June 5, 2024 01:47
@cblmemo cblmemo marked this pull request as ready for review June 5, 2024 02:08
@cblmemo
Copy link
Collaborator Author

cblmemo commented Sep 5, 2024

Revamping this as a user is requesting the feature. Based on our previous discussion I disabled the update of the SSL, as it is not likely to update this - the most possible situation is the ssl file is not working and thus the service is failed anyway, so it should be fine to let users to restart the service.

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the example @cblmemo! Left several comments. Could we also add a smoke test for the HTTPS support?

We should also add a doc for this in another PR.

Also, does this mean the communication between load balancer and replicas is still going through non-secured HTTP, which might be fine for now, but would be good to mention it in comments or the doc.

examples/serve/https/cert.pem Outdated Show resolved Hide resolved
examples/serve/https/service.yaml Outdated Show resolved Hide resolved
examples/serve/https/service.yaml Outdated Show resolved Hide resolved
sky/serve/load_balancer.py Outdated Show resolved Hide resolved
@cblmemo cblmemo changed the title [Serve] HTTPS example [Serve] HTTPS Support Sep 17, 2024
@cblmemo
Copy link
Collaborator Author

cblmemo commented Sep 17, 2024

TODO: Remove http:// schema in smoke test.

Done!

@cblmemo
Copy link
Collaborator Author

cblmemo commented Sep 18, 2024

All Sky Serve smoke test passed and this is ready for another look @Michaelvll !

@cblmemo
Copy link
Collaborator Author

cblmemo commented Sep 23, 2024

Thanks for adding the example @cblmemo! Left several comments. Could we also add a smoke test for the HTTPS support?

We should also add a doc for this in another PR.

Also, does this mean the communication between load balancer and replicas is still going through non-secured HTTP, which might be fine for now, but would be good to mention it in comments or the doc.

Doc added in #3972

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the support of HTTPS @cblmemo! Left several comments : )

Also, we should do some backward compatibility test to make sure existing service won't be affected by this change.

examples/serve/https/service.yaml Outdated Show resolved Hide resolved
@@ -127,6 +159,8 @@ def up(
controller_utils.maybe_translate_local_file_mounts_and_sync_up(task,
path='serve')

tls_template_vars = _rewrite_tls_credential_paths(service_name, task)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename the function? Seems not actually rewrite?

Suggested change
tls_template_vars = _rewrite_tls_credential_paths(service_name, task)
tls_template_vars = _get_tls_template_vars(service_name, task)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we are rewriting the task.tls_credential in the function. Renamed to _rewrite_tls_credential_paths_and_get_tls_env_vars now.

sky/serve/core.py Outdated Show resolved Hide resolved
sky/serve/load_balancer.py Outdated Show resolved Hide resolved
@@ -337,6 +341,7 @@ def _get_service_from_row(row) -> Dict[str, Any]:
'requested_resources': pickle.loads(requested_resources)
if requested_resources is not None else None,
'requested_resources_str': requested_resources_str,
'tls_encrypted': bool(tls_encrypted),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For existing service, this return value will be None, and bool(tls_encrypted) can fail?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, bool(None) will return False:

python
Python 3.9.18 (main, Sep 11 2023, 13:41:44) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> bool(None)
False

But I agree that it is confusing. Changed it to 0 🫡

@@ -68,6 +68,9 @@ def create_table(cursor: 'sqlite3.Cursor', conn: 'sqlite3.Connection') -> None:
db_utils.add_column_to_table(_DB.cursor, _DB.conn, 'services',
'active_versions',
f'TEXT DEFAULT {json.dumps([])!r}')
# Whether the service's load balancer is encrypted with TLS.
db_utils.add_column_to_table(_DB.cursor, _DB.conn, 'services', 'tls_encrypted',
'INTEGER DEFAULT NULL')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add default value to existing entries?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a NULL default? Though i changed this to 0 to avoid confusion 🫡

sky/serve/serve_utils.py Outdated Show resolved Hide resolved
tests/skyserve/https/service.yaml.j2 Outdated Show resolved Hide resolved
@cblmemo
Copy link
Collaborator Author

cblmemo commented Oct 24, 2024

@Michaelvll All comments resolved and now running the smoke test. PTAL agin!

@cblmemo
Copy link
Collaborator Author

cblmemo commented Oct 24, 2024

HTTPS smoke test passed!

@cblmemo cblmemo added this to the v0.7 milestone Oct 25, 2024
@romilbhardwaj romilbhardwaj removed this from the v0.7 milestone Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants