Commit

fix
Michaelvll committed Oct 28, 2024
1 parent bea7fe0 commit 987df3d
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions docs/source/examples/managed-jobs.rst
Expand Up @@ -290,6 +290,7 @@ By default, SkyPilot will try to recover a job when its underlying cluster is pr

In some cases, you may want a job to restart automatically when it fails on its own, e.g., when a training job crashes due to an NVIDIA driver issue or an NCCL timeout. To specify this, you
can set :code:`max_restarts_on_failure` in :code:`resources.job_recovery` in the job YAML file.

.. code-block:: yaml
resources:
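The diff view cuts the YAML example off after :code:`resources:`. As a minimal sketch of the nesting the paragraph describes (:code:`max_restarts_on_failure` under :code:`resources.job_recovery`), a job YAML might look like the following. The value :code:`3` is an illustrative assumption, not taken from the commit, and the rest of the original block is not reproduced here.

.. code-block:: yaml

    resources:
      job_recovery:
        # Hypothetical value: allow up to 3 restarts when the job itself fails.
        max_restarts_on_failure: 3
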
