CLI: Deprecate cpunode/gpunode/tpunode, hide admin (#2800)
* CLI: deprecate + hide interactive node commands and `admin`

* Purge interactive node mentions in docs.

* Update docs/source/examples/gpu-jupyter.rst

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* updates

* add TODO

---------

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
concretevitamin and Michaelvll authored Nov 18, 2023
1 parent 77a32d2 commit 3a7c858
Showing 10 changed files with 79 additions and 213 deletions.
43 changes: 9 additions & 34 deletions docs/source/examples/auto-failover.rst
@@ -16,11 +16,9 @@ searching for regions (or clouds) that can provide the requested resources.

.. tip::

No action is required to use this feature.

Auto-failover is automatically enabled whenever a new cluster is to be
provisioned, such as during :code:`sky launch` or the :ref:`interactive node
commands <interactive-nodes>` :code:`sky {gpunode,cpunode,tpunode}`.
No action is required to use this feature. Auto-failover is automatically
enabled whenever a new cluster is to be provisioned, such as during :code:`sky
launch`.

If a specific :code:`cloud`, ``region``, or ``zone`` is requested for a
task, auto-failover retries only within the specified location.
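
For instance, a minimal sketch of pinning the failover search to one cloud and
region (the cluster name and region below are illustrative):

.. code-block:: bash
# Failover will only retry zones within GCP us-central1 (illustrative values)
sky launch -c gpu --gpus V100 --cloud gcp --region us-central1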
@@ -36,16 +34,8 @@ provisioner handles such a request:

.. code-block::
$ sky gpunode -c gpu --gpus V100
I 02-11 21:17:43 optimizer.py:211] Defaulting estimated time to 1 hr. Call Task.set_time_estimator() to override.
I 02-11 21:17:43 optimizer.py:317] Optimizer - plan minimizing cost (~$3.0):
I 02-11 21:17:43 optimizer.py:332]
I 02-11 21:17:43 optimizer.py:332] TASK BEST_RESOURCE
I 02-11 21:17:43 optimizer.py:332] gpunode GCP(n1-highmem-8, {'V100': 1.0})
I 02-11 21:17:43 optimizer.py:332]
I 02-11 21:17:43 optimizer.py:285] Considered resources -> cost
I 02-11 21:17:43 optimizer.py:286] {AWS(p3.2xlarge): 3.06, GCP(n1-highmem-8, {'V100': 1.0}): 2.953212}
I 02-11 21:17:43 optimizer.py:286]
$ sky launch -c gpu --gpus V100
... # optimizer output
I 02-11 21:17:43 cloud_vm_ray_backend.py:1034] Creating a new cluster: "gpu" [1x GCP(n1-highmem-8, {'V100': 1.0})].
I 02-11 21:17:43 cloud_vm_ray_backend.py:1034] Tip: to reuse an existing cluster, specify --cluster-name (-c) in the CLI or use sky.launch(.., cluster_name=..) in the Python API. Run `sky status` to see existing clusters.
I 02-11 21:17:43 cloud_vm_ray_backend.py:614] To view detailed progress: tail -n100 -f sky_logs/sky-2022-02-11-21-17-43-171661/provision.log
@@ -78,17 +68,9 @@ AWS, where it succeeded after two regions:

.. code-block::
$ sky gpunode --gpus V100:8
I 02-23 16:39:59 optimizer.py:213] Defaulting estimated time to 1 hr. Call Task.set_time_estimator() to override.
I 02-23 16:39:59 optimizer.py:323] Optimizer - plan minimizing cost (~$20.3):
I 02-23 16:39:59 optimizer.py:337]
I 02-23 16:39:59 optimizer.py:337] TASK BEST_RESOURCE
I 02-23 16:39:59 optimizer.py:337] gpunode GCP(n1-highmem-8, {'V100': 8.0})
I 02-23 16:39:59 optimizer.py:337]
I 02-23 16:39:59 optimizer.py:290] Considered resources -> cost
I 02-23 16:39:59 optimizer.py:292] {GCP(n1-highmem-8, {'V100': 8.0}): 20.313212, AWS(p3.16xlarge): 24.48}
I 02-23 16:39:59 optimizer.py:292]
I 02-23 16:39:59 cloud_vm_ray_backend.py:1010] Creating a new cluster: "sky-gpunode-zongheng" [1x GCP(n1-highmem-8, {'V100': 8.0})].
$ sky launch -c v100-8 --gpus V100:8
... # optimizer output
I 02-23 16:39:59 cloud_vm_ray_backend.py:1010] Creating a new cluster: "v100-8" [1x GCP(n1-highmem-8, {'V100': 8.0})].
I 02-23 16:39:59 cloud_vm_ray_backend.py:1010] Tip: to reuse an existing cluster, specify --cluster-name (-c) in the CLI or use sky.launch(.., cluster_name=..) in the Python API. Run `sky status` to see existing clusters.
I 02-23 16:39:59 cloud_vm_ray_backend.py:658] To view detailed progress: tail -n100 -f sky_logs/sky-2022-02-23-16-39-58-577551/provision.log
I 02-23 16:39:59 cloud_vm_ray_backend.py:668]
@@ -112,14 +94,7 @@ AWS, where it succeeded after two regions:
E 02-23 16:41:50 cloud_vm_ray_backend.py:746] Failed to acquire resources in all regions/zones (requested GCP(n1-highmem-8, {'V100': 8.0})). Try changing resource requirements or use another cloud.
W 02-23 16:41:50 cloud_vm_ray_backend.py:891]
W 02-23 16:41:50 cloud_vm_ray_backend.py:891] Provision failed for GCP(n1-highmem-8, {'V100': 8.0}). Trying other launchable resources (if any)...
I 02-23 16:41:50 optimizer.py:213] Defaulting estimated time to 1 hr. Call Task.set_time_estimator() to override.
I 02-23 16:41:50 optimizer.py:323] Optimizer - plan minimizing cost (~$24.5):
I 02-23 16:41:50 optimizer.py:337]
I 02-23 16:41:50 optimizer.py:337] TASK BEST_RESOURCE
I 02-23 16:41:50 optimizer.py:337] gpunode AWS(p3.16xlarge)
I 02-23 16:41:50 optimizer.py:337]
I 02-23 16:41:50 cloud_vm_ray_backend.py:658] To view detailed progress: tail -n100 -f sky_logs/sky-2022-02-23-16-39-58-577551/provision.log
I 02-23 16:41:50 cloud_vm_ray_backend.py:668]
...
I 02-23 16:41:50 cloud_vm_ray_backend.py:668] Launching on AWS us-east-1 (us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1e,us-east-1f)
W 02-23 16:42:15 cloud_vm_ray_backend.py:477] Got error(s) in all zones of us-east-1:
W 02-23 16:42:15 cloud_vm_ray_backend.py:479] create_instances: Attempt failed with An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation (reached max retries: 0): We currently do not have sufficient p3.16xlarge capacity in the Availability Zone you requested (us-east-1a). Our system will be working on provisioning additional capacity. You can currently get p3.16xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1b, us-east-1d, us-east-1f., retrying.
12 changes: 5 additions & 7 deletions docs/source/examples/gpu-jupyter.rst
@@ -5,22 +5,20 @@ Jupyter notebooks are a useful tool for interactive development, debugging, and
visualization. SkyPilot makes the process of running a GPU-backed Jupyter notebook
simple by automatically managing provisioning and port forwarding.

To get a machine with a GPU attached, we recommend using an interactive **GPU node**.
You can read more about interactive nodes :ref:`here <interactive-nodes>`.
To get a machine with a GPU attached, use:

.. code-block:: bash
# Launch a VM with 1 NVIDIA GPU and forward port 8888 to localhost
sky gpunode -p 8888 -c jupyter-vm --gpus K80:1
sky launch -c jupyter-vm --gpus K80:1
ssh -L 8888:localhost:8888 jupyter-vm
.. note::

View the supported GPUs with the :code:`sky show-gpus` command.
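
For instance, a quick sketch of checking accelerator availability before
launching (the ``K80`` filter is illustrative):

.. code-block:: bash
# List the supported GPU/TPU accelerator types
sky show-gpus
# Show counts and instance details for a specific accelerator
sky show-gpus K80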


The above command will automatically log in to the cluster once the cluster is provisioned (or re-use an existing one).

Inside the VM, you can run the following commands to start a Jupyter session:
Use ``ssh jupyter-vm`` to SSH into the VM. Inside the VM, you can run the
following commands to start a Jupyter session:

.. code-block:: bash
17 changes: 15 additions & 2 deletions docs/source/getting-started/quickstart.rst
@@ -123,7 +123,7 @@ This may show multiple clusters, if you have created several:
.. code-block::
NAME       LAUNCHED    RESOURCES             COMMAND                            STATUS
gcp        1 day ago   1x GCP(n1-highmem-8)  sky cpunode -c gcp --cloud gcp     STOPPED
mygcp      1 day ago   1x GCP(n1-highmem-8)  sky launch -c mygcp --cloud gcp    STOPPED
mycluster  4 mins ago  1x AWS(p3.2xlarge)    sky exec mycluster hello_sky.yaml  UP
@@ -152,6 +152,9 @@ Simply run :code:`ssh <cluster_name>` to log into a cluster:
The above are achieved by adding appropriate entries to ``~/.ssh/config``.

Because SkyPilot exposes SSH access to clusters, they can be used directly inside
tools such as `Visual Studio Code Remote <https://code.visualstudio.com/docs/remote/remote-overview>`_.
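
As a quick illustration (the cluster name ``mycluster`` and paths are
hypothetical), the SSH alias works with any SSH-based tool:

.. code-block:: bash
# Log in interactively via the alias added to ~/.ssh/config
ssh mycluster
# Any SSH-based tool (scp, rsync, VS Code Remote) can reuse the same alias
rsync -av mycluster:~/outputs/ ./outputs/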

Transfer files
===============

@@ -178,6 +181,16 @@ To terminate a cluster instead, run :code:`sky down`:
$ sky down mycluster
.. note::

Stopping a cluster preserves the data on its attached disks (billing for the
instances stops, while the disks are still charged). Those disks will be
reattached when the cluster is restarted.

Terminating a cluster will delete all associated resources (all billing
stops), and any data on the attached disks will be lost. Terminated
clusters cannot be restarted.
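
A minimal sketch of the cluster lifecycle commands (the cluster name
``mycluster`` is illustrative):

.. code-block:: bash
# Stop the cluster; disks are kept and the cluster can be restarted later
sky stop mycluster
# Restart the stopped cluster with its disks reattached
sky start mycluster
# Terminate the cluster and delete all associated resources
sky down mycluster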

Find more commands that manage the lifecycle of clusters in the :ref:`CLI reference <cli>`.

Scaling out
@@ -186,7 +199,7 @@ Scaling out
So far, we have used SkyPilot's CLI to submit work to and interact with a single cluster.
When you are ready to scale out (e.g., run 10s or 100s of jobs), SkyPilot supports two options:

- Queue jobs on one or more clusters with ``sky exec`` (see :ref:`Job Queue <job-queue>`); or
- Queue many jobs on your cluster(s) with ``sky exec`` (see :ref:`Job Queue <job-queue>`);
- Use :ref:`Managed Spot Jobs <spot-jobs>` to run on auto-managed spot instances
(users need not interact with the underlying clusters)
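
As a rough sketch of both options (the cluster name and YAML file are
illustrative):

.. code-block:: bash
# Queue several jobs on an existing cluster; they run as resources free up
sky exec mycluster job.yaml
# Launch a managed spot job; SkyPilot provisions and recovers spot instances
sky spot launch job.yaml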

1 change: 0 additions & 1 deletion docs/source/index.rst
@@ -132,7 +132,6 @@ Documentation
examples/docker-containers
examples/ports
reference/tpu
reference/interactive-nodes
reference/logging
reference/faq

17 changes: 0 additions & 17 deletions docs/source/reference/cli.rst
@@ -69,23 +69,6 @@ Managed Spot Jobs CLI
:prog: sky spot logs
:nested: full

Interactive Node CLI
-----------------------

.. click:: sky.cli:cpunode
:prog: sky cpunode
:nested: full

.. _sky-gpunode:
.. click:: sky.cli:gpunode
:prog: sky gpunode
:nested: full

.. click:: sky.cli:tpunode
:prog: sky tpunode
:nested: full


Storage CLI
------------

128 changes: 0 additions & 128 deletions docs/source/reference/interactive-nodes.rst

This file was deleted.

18 changes: 10 additions & 8 deletions docs/source/reference/tpu.rst
@@ -16,15 +16,17 @@ ML researchers and students are encouraged to apply for free TPU access through
Getting TPUs in one command
===========================

Like :ref:`GPUs <interactive-nodes>`, SkyPilot provides a simple command to quickly get TPUs for development:
Use one command to quickly get TPU nodes for development:

.. code-block:: bash
sky tpunode # By default TPU v2-8 is used
sky tpunode --use-spot # Preemptible TPUs
sky tpunode --tpus tpu-v3-8 # Change TPU type to tpu-v3-8
sky tpunode --instance-type n1-highmem-16 # Change the host VM type to n1-highmem-16
sky tpunode --tpu-vm # Use TPU VM (instead of TPU Node)
sky launch --gpus tpu-v2-8
# Preemptible TPUs:
sky launch --gpus tpu-v2-8 --use-spot
# Change TPU type to tpu-v3-8:
sky launch --gpus tpu-v3-8
# Change the host VM type to n1-highmem-16:
sky launch --gpus tpu-v3-8 -t n1-highmem-16
After the command finishes, you will be dropped into a TPU host VM and can start developing code right away.
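
For example, naming the cluster makes it easy to log back in later (the
cluster name ``mytpu`` is illustrative):

.. code-block:: bash
# Provision a TPU v2-8 development VM under a named cluster
sky launch -c mytpu --gpus tpu-v2-8
# SSH into the TPU host VM at any time
ssh mytpu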

@@ -48,7 +50,7 @@ More details can be found on GCP `documentation <https://cloud.google.com/tpu/do
TPU VMs
-------

To use TPU VMs, set the following in a task YAML's ``resources`` field:

.. code-block:: yaml
@@ -223,7 +225,7 @@ To use a TPU Pod, simply change the ``accelerators`` field in the task YAML (e.
:emphasize-lines: 2-2
resources:
accelerators: tpu-v2-32  # Pods have > 8 cores (the last number)
accelerator_args:
runtime_version: tpu-vm-base
tpu_vm: True
