[k8s] On-demand single-host TPU support on GKE #3947

Open

landscapepainter wants to merge 70 commits into master from k8s-tpu-support-on-gke

Changes shown from 39 of 70 commits.

Commits
a929474
initial version of TPU support on GKE
landscapepainter Sep 16, 2024
80e1877
revert unnecesary change
landscapepainter Sep 16, 2024
70a07ab
revert
landscapepainter Sep 16, 2024
0cba9a5
use TPU_LABEL_KEY constant
landscapepainter Sep 17, 2024
17bcbd8
nit
landscapepainter Sep 17, 2024
9233bf5
nit
landscapepainter Sep 17, 2024
12e62c0
update detect_gpu_label_formatter() to use match_label_key()
landscapepainter Sep 17, 2024
c795fe7
tidy get_gpu_label_key_value
landscapepainter Sep 17, 2024
1c895f0
nit
landscapepainter Sep 17, 2024
a8f5b6b
update method name
landscapepainter Sep 17, 2024
bdb3469
update get_gke_accelerator_name to support TPU
landscapepainter Sep 17, 2024
1d2d243
add support for get_label_keys method due to TPU label key
landscapepainter Sep 17, 2024
92f4f38
syntax
landscapepainter Sep 17, 2024
2662ec8
update get_tpu_topology_label_key_value
landscapepainter Sep 17, 2024
58f8ad6
nit
landscapepainter Sep 20, 2024
1cf82b6
refactor error surfacing methods to have it work with TPU support
landscapepainter Sep 20, 2024
7b551c9
update toleration comment
landscapepainter Sep 21, 2024
81a05ee
support listing available TPUs and show-gpus for TPUs
landscapepainter Sep 21, 2024
e8764f1
nit
landscapepainter Sep 21, 2024
3497aee
update help message
landscapepainter Sep 21, 2024
724806a
Update /tmp/tpu_logs dir's write permission
landscapepainter Sep 22, 2024
e8d73fe
nit
landscapepainter Sep 22, 2024
7ac5036
nit
landscapepainter Sep 22, 2024
4470dbe
comment update on TPU resource lackage error handling
landscapepainter Sep 22, 2024
0860e45
Update to use global constant instead of hard coded string of nvidia.…
landscapepainter Sep 22, 2024
35f3c80
add smoke test and make exec work on TPU pods
landscapepainter Sep 23, 2024
2b56a9e
update smoke test to check if TPU is reachable.
landscapepainter Sep 24, 2024
305705c
add comment
landscapepainter Sep 24, 2024
c2b5bfc
nit
landscapepainter Sep 24, 2024
2ba5537
Comment on number of requested TPU chips for multi- and single- host …
landscapepainter Sep 24, 2024
92cd77d
update method to check GKE supported TPU name
landscapepainter Sep 24, 2024
d085a5b
nit
landscapepainter Sep 24, 2024
7860679
move is_tpu_pod_slice to kubernetes_utils
landscapepainter Sep 25, 2024
96924a7
update get_accelerator_from_label_value to use is_tpu_pod_slice method
landscapepainter Sep 25, 2024
1bbac21
nit
landscapepainter Sep 25, 2024
4f7ea03
format
landscapepainter Sep 25, 2024
16b6c29
nit
landscapepainter Sep 25, 2024
ad5089f
Merge branch 'master' of https://github.com/landscapepainter/skypilot
landscapepainter Sep 26, 2024
aa8efc3
Merge branch 'master' into k8s-tpu-support-on-gke
landscapepainter Sep 26, 2024
e390843
check acc count support
landscapepainter Oct 18, 2024
884f0a2
preemptive TPU check
landscapepainter Oct 18, 2024
ee28466
Merge branch 'master' into k8s-tpu-support-on-gke
landscapepainter Oct 19, 2024
11142e5
update check_tpu_fits
landscapepainter Oct 19, 2024
de55663
error msg update
landscapepainter Oct 19, 2024
a500555
merge get_tpu_topology_label_key_value into get_gpu_label_key_value
landscapepainter Oct 19, 2024
bce8731
Update sky/provision/kubernetes/utils.py
landscapepainter Oct 19, 2024
0e8366c
nit fixes
landscapepainter Oct 20, 2024
f67ad0f
format
landscapepainter Oct 20, 2024
05c37aa
nit
landscapepainter Oct 20, 2024
06d3879
Implement method for reading acc counts from node/pod object
landscapepainter Oct 20, 2024
9a2046c
assertion update for is_tpu_vm
landscapepainter Oct 20, 2024
62b235f
Exclude multi-host TPUs to displayed from show-gpus
landscapepainter Oct 21, 2024
4db1e63
Notify users that multi-host TPUs are not supported from 'sky show-gpus'
landscapepainter Oct 21, 2024
5923f10
format
landscapepainter Oct 21, 2024
fa2e670
nit
landscapepainter Oct 21, 2024
c1ee117
display warning message from show-gpus conditionally
landscapepainter Oct 21, 2024
cbce4d5
update sky show-gpus
landscapepainter Oct 23, 2024
241efc0
update get_accelerator_label_key_value
landscapepainter Oct 25, 2024
61b01d1
Merge branch 'master' into k8s-tpu-support-on-gke
landscapepainter Oct 25, 2024
2fbb4eb
format
landscapepainter Oct 25, 2024
5dc92f3
Merge branch 'master' into k8s-tpu-support-on-gke
landscapepainter Oct 26, 2024
9e8d53d
format
landscapepainter Oct 26, 2024
932e073
Merge branch 'master' into k8s-tpu-support-on-gke
landscapepainter Nov 1, 2024
0a0eac2
format
landscapepainter Nov 1, 2024
3bc95b9
Merge branch 'k8s-tpu-support-on-gke' of https://github.com/landscape…
landscapepainter Nov 1, 2024
9dbaa72
update comment
landscapepainter Nov 1, 2024
f5e1d37
resolve review comments
landscapepainter Nov 1, 2024
688c0b4
update tpuvm_mnist.yaml
landscapepainter Nov 2, 2024
2dec7f9
resolve comments
landscapepainter Nov 3, 2024
dc23e88
update display message for show-gpus
landscapepainter Nov 4, 2024
4 changes: 2 additions & 2 deletions sky/cli.py
@@ -3086,8 +3086,8 @@ def _get_kubernetes_node_info_table():
for node_name, node_info in node_info_dict.items():
node_table.add_row([
node_name, node_info.gpu_type,
node_info.total['nvidia.com/gpu'],
node_info.free['nvidia.com/gpu']
node_info.total['accelerator_count'],
node_info.free['accelerators_available']
])
return node_table

13 changes: 12 additions & 1 deletion sky/clouds/kubernetes.py
@@ -261,11 +261,19 @@ def make_deploy_resources_variables(

k8s_acc_label_key = None
k8s_acc_label_value = None
k8s_tpu_topology_label_key = None
k8s_tpu_topology_label_value = None
tpu_requested = False

# If GPUs are requested, set node label to match the GPU type.
# If GPU/TPUs are requested, set node label to match the GPU/TPU type.
if acc_count > 0 and acc_type is not None:
k8s_acc_label_key, k8s_acc_label_value = \
kubernetes_utils.get_gpu_label_key_value(acc_type)
if (k8s_acc_label_key ==
kubernetes_utils.GKELabelFormatter.TPU_LABEL_KEY):
tpu_requested = True
k8s_tpu_topology_label_key, k8s_tpu_topology_label_value = (
kubernetes_utils.get_tpu_topology_label_key_value(acc_type))
Collaborator: Let's change the get_gpu_label_key_value function to include the ability of get_tpu_topology_label_key_value, i.e. return 4 values or a dict of values?

Author (landscapepainter): Resolved at a500555
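As a rough, non-authoritative sketch of the 4-value shape being suggested (the label-key constants, placeholder topology value, and exact signature are assumptions for illustration; the actual implementation is whatever was merged in a500555):

from typing import Optional, Tuple

# Illustrative GKE node label keys; the real constants live in
# sky/provision/kubernetes/utils.py (GKELabelFormatter).
GKE_GPU_LABEL_KEY = 'cloud.google.com/gke-accelerator'
GKE_TPU_LABEL_KEY = 'cloud.google.com/gke-tpu-accelerator'
GKE_TPU_TOPOLOGY_LABEL_KEY = 'cloud.google.com/gke-tpu-topology'


def get_accelerator_label_key_value(
    acc_type: str
) -> Tuple[str, str, Optional[str], Optional[str]]:
    """Returns (acc_label_key, acc_label_value, topology_key, topology_value).

    The topology pair is None for GPUs, so a single call covers both GPU
    and TPU node selection instead of two separate helpers.
    """
    if acc_type.startswith('tpu-'):
        # The topology value would normally be derived from the requested
        # chip count; '2x2' is only a placeholder for a 4-chip v5e slice.
        return (GKE_TPU_LABEL_KEY, acc_type, GKE_TPU_TOPOLOGY_LABEL_KEY,
                '2x2')
    # Real code maps SkyPilot accelerator names to GKE label values; the
    # pass-through here is only a placeholder.
    return (GKE_GPU_LABEL_KEY, acc_type, None, None)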


port_mode = network_utils.get_port_mode(None)

@@ -330,6 +338,9 @@ def make_deploy_resources_variables(
'k8s_skypilot_system_namespace': _SKYPILOT_SYSTEM_NAMESPACE,
'k8s_spot_label_key': spot_label_key,
'k8s_spot_label_value': spot_label_value,
'tpu_requested': tpu_requested,
'k8s_tpu_topology_label_key': k8s_tpu_topology_label_key,
'k8s_tpu_topology_label_value': k8s_tpu_topology_label_value,
'image_id': image_id,
}

117 changes: 64 additions & 53 deletions sky/clouds/service_catalog/kubernetes_catalog.py
@@ -79,12 +79,12 @@ def list_accelerators_realtime(
if not has_gpu:
return {}, {}, {}

label_formatter, _ = kubernetes_utils.detect_gpu_label_formatter()
if not label_formatter:
lf, _ = kubernetes_utils.detect_gpu_label_formatter()
if not lf:
return {}, {}, {}

accelerators_qtys: Set[Tuple[str, int]] = set()
key = label_formatter.get_label_key()
keys = lf.get_label_keys()
nodes = kubernetes_utils.get_kubernetes_nodes()
# Get the pods to get the real-time GPU usage
pods = kubernetes_utils.get_kubernetes_pods()
@@ -95,56 +95,67 @@
min_quantity_filter = quantity_filter if quantity_filter else 1

for node in nodes:
if key in node.metadata.labels:
allocated_qty = 0
accelerator_name = label_formatter.get_accelerator_from_label_value(
node.metadata.labels.get(key))

# Check if name_filter regex matches the accelerator_name
regex_flags = 0 if case_sensitive else re.IGNORECASE
if name_filter and not re.match(
name_filter, accelerator_name, flags=regex_flags):
continue

accelerator_count = int(
node.status.allocatable.get('nvidia.com/gpu', 0))

# Generate the GPU quantities for the accelerators
if accelerator_name and accelerator_count > 0:
for count in range(1, accelerator_count + 1):
accelerators_qtys.add((accelerator_name, count))

for pod in pods:
# Get all the pods running on the node
if (pod.spec.node_name == node.metadata.name and
pod.status.phase in ['Running', 'Pending']):
# Iterate over all the containers in the pod and sum the
# GPU requests
for container in pod.spec.containers:
if container.resources.requests:
allocated_qty += int(
container.resources.requests.get(
'nvidia.com/gpu', 0))

accelerators_available = accelerator_count - allocated_qty

if accelerator_count >= min_quantity_filter:
quantized_count = (min_quantity_filter *
(accelerator_count // min_quantity_filter))
if accelerator_name not in total_accelerators_capacity:
total_accelerators_capacity[
accelerator_name] = quantized_count
else:
total_accelerators_capacity[
accelerator_name] += quantized_count

if accelerator_name not in total_accelerators_available:
total_accelerators_available[accelerator_name] = 0
if accelerators_available >= min_quantity_filter:
quantized_availability = min_quantity_filter * (
accelerators_available // min_quantity_filter)
total_accelerators_available[
accelerator_name] += quantized_availability
for key in keys:
if key in node.metadata.labels:
allocated_qty = 0
accelerator_name = lf.get_accelerator_from_label_value(
node.metadata.labels.get(key))

# Check if name_filter regex matches the accelerator_name
regex_flags = 0 if case_sensitive else re.IGNORECASE
if name_filter and not re.match(
name_filter, accelerator_name, flags=regex_flags):
continue

accelerator_count = 0
if kubernetes_utils.GPU_RESOURCE_KEY in node.status.allocatable:
accelerator_count = int(node.status.allocatable[
kubernetes_utils.GPU_RESOURCE_KEY])
elif (kubernetes_utils.TPU_RESOURCE_KEY
in node.status.allocatable):
accelerator_count = int(node.status.allocatable[
kubernetes_utils.TPU_RESOURCE_KEY])
Collaborator: nit: can we have a function to get a value from a dictionary with a default value? Something like this:

def get_node_attribute(attribute_dict: dict, default_value=None) -> Any:
    assert not (kubernetes_utils.GPU_RESOURCE_KEY in attribute_dict and
                kubernetes_utils.TPU_RESOURCE_KEY in attribute_dict), (
        'Cannot have both GPU and TPU resources on the same node.')
    if kubernetes_utils.GPU_RESOURCE_KEY in attribute_dict:
        return attribute_dict[kubernetes_utils.GPU_RESOURCE_KEY]
    if kubernetes_utils.TPU_RESOURCE_KEY in attribute_dict:
        return attribute_dict[kubernetes_utils.TPU_RESOURCE_KEY]
    return default_value

Author (landscapepainter), Oct 20, 2024: Implemented at 06d3879
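A quick usage sketch of the suggested helper, purely illustrative, assuming kubernetes_utils.GPU_RESOURCE_KEY == 'nvidia.com/gpu' and kubernetes_utils.TPU_RESOURCE_KEY == 'google.com/tpu' as used elsewhere in this diff:

# A single-host TPU node exposing 4 chips in status.allocatable.
allocatable = {'google.com/tpu': '4', 'cpu': '8'}
accelerator_count = int(get_node_attribute(allocatable, default_value=0))
assert accelerator_count == 4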


# Generate the GPU quantities for the accelerators
if accelerator_name and accelerator_count > 0:
for count in range(1, accelerator_count + 1):
accelerators_qtys.add((accelerator_name, count))
Collaborator: Maybe a quick way of addressing the show-gpus issue is to change this logic to show only the exact count, not the whole range, if it is a TPU type:

                if accelerator_name and accelerator_count > 0:
                  if accelerator is TPU:
                    accelerators_qtys.add((accelerator_name, accelerator_count))
                  else:
                    for count in range(1, accelerator_count + 1):
                        accelerators_qtys.add((accelerator_name, count))

Author (landscapepainter), Oct 23, 2024: @cblmemo @romilbhardwaj fixed at cbce4d5

$ sky show-gpus --cloud kubernetes
Kubernetes GPUs (context: gke_skypilot-375900_us-south1-a_mix-tpu-dy)
GPU                   QTY_PER_NODE  TOTAL_GPUS  TOTAL_FREE_GPUS
tpu-v5-lite-podslice  1, 4          5           5

Kubernetes per node accelerator availability (Note: Multi-host TPUs are not supported.)
NODE_NAME                                  GPU_NAME              TOTAL_GPUS  FREE_GPUS
gke-mix-tpu-dy-default-pool-439ab6e7-7vk4  None                  0           0
gke-mix-tpu-dy-default-pool-439ab6e7-fjdh  None                  0           0
gke-tpu-18503f8f-v441                      tpu-v5-lite-podslice  4           4
gke-tpu-5af36f0c-q74l                      tpu-v5-lite-podslice  1           1
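
Conceptually, the fix lists only the exact chip count for TPU slices while keeping the 1..N range for GPUs. A small self-contained illustration (hypothetical helper, not the code merged in cbce4d5):

from typing import Set, Tuple

def accelerator_quantities(acc_name: str, acc_count: int,
                           is_tpu_slice: bool) -> Set[Tuple[str, int]]:
    """Quantities to advertise for one node in `sky show-gpus`.

    TPU pod slices are only schedulable at their full chip count, so only
    the exact count is listed; GPUs may be requested in any quantity up to
    the node total.
    """
    if not acc_name or acc_count <= 0:
        return set()
    if is_tpu_slice:
        return {(acc_name, acc_count)}
    return {(acc_name, count) for count in range(1, acc_count + 1)}

# Example: the 4-chip tpu-v5-lite-podslice node is listed only at quantity
# 4, which together with the 1-chip node yields the '1, 4' QTY_PER_NODE
# column shown above.
assert accelerator_quantities('tpu-v5-lite-podslice', 4, True) == {
    ('tpu-v5-lite-podslice', 4)}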


for pod in pods:
# Get all the pods running on the node
if (pod.spec.node_name == node.metadata.name and
pod.status.phase in ['Running', 'Pending']):
# Iterate over all the containers in the pod and sum the
# GPU requests
for container in pod.spec.containers:
if container.resources.requests:
allocated_qty += int(
container.resources.requests.get(
kubernetes_utils.GPU_RESOURCE_KEY, 0))
allocated_qty += int(
container.resources.requests.get(
kubernetes_utils.TPU_RESOURCE_KEY, 0))

accelerators_available = accelerator_count - allocated_qty

if accelerator_count >= min_quantity_filter:
quantized_count = (
min_quantity_filter *
(accelerator_count // min_quantity_filter))
if accelerator_name not in total_accelerators_capacity:
total_accelerators_capacity[
accelerator_name] = quantized_count
else:
total_accelerators_capacity[
accelerator_name] += quantized_count

if accelerator_name not in total_accelerators_available:
total_accelerators_available[accelerator_name] = 0
if accelerators_available >= min_quantity_filter:
quantized_availability = min_quantity_filter * (
accelerators_available // min_quantity_filter)
total_accelerators_available[
accelerator_name] += quantized_availability

result = []

4 changes: 4 additions & 0 deletions sky/clouds/utils/gcp_utils.py
@@ -17,6 +17,7 @@
from sky import sky_logging
from sky import skypilot_config
from sky.provision.gcp import constants
from sky.provision.kubernetes import utils as kubernetes_utils
from sky.utils import subprocess_utils

if typing.TYPE_CHECKING:
@@ -36,6 +37,9 @@ def is_tpu_vm(resources: Optional['resources_lib.Resources']) -> bool:
if not is_tpu(resources):
return False
assert resources is not None
acc, _ = list(resources.accelerators.items())[0]
if kubernetes_utils.is_tpu_pod_slice(acc):
return False
if resources.accelerator_args is None:
return True
return resources.accelerator_args.get('tpu_vm', True)
155 changes: 103 additions & 52 deletions sky/provision/kubernetes/instance.py
@@ -1,7 +1,7 @@
"""Kubernetes instance provisioning."""
import copy
import time
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List, Optional, Union
import uuid

from sky import exceptions
@@ -43,59 +43,79 @@ def head_service_selector(cluster_name: str) -> Dict[str, str]:
return {'component': f'{cluster_name}-head'}


def _formatted_resource_requirements(pod_or_spec: Union[Any, dict]) -> str:
# Returns a formatted string of resource requirements for a pod.
resource_requirements = {}

if isinstance(pod_or_spec, dict):
containers = pod_or_spec.get('spec', {}).get('containers', [])
else:
containers = pod_or_spec.spec.containers

for container in containers:
if isinstance(container, dict):
resources = container.get('resources', {})
requests = resources.get('requests', {})
else:
resources = container.resources
requests = resources.requests or {}

for resource, value in requests.items():
if resource not in resource_requirements:
resource_requirements[resource] = 0
if resource == 'memory':
int_value = kubernetes_utils.parse_memory_resource(value)
else:
int_value = kubernetes_utils.parse_cpu_or_gpu_resource(value)
resource_requirements[resource] += int(int_value)
return ', '.join(f'{resource}={value}'
for resource, value in resource_requirements.items())


def _formatted_node_selector(pod_or_spec: Union[Any, dict]) -> Optional[str]:
# Returns a formatted string of node selectors for a pod.
node_selectors = []

if isinstance(pod_or_spec, dict):
selectors = pod_or_spec.get('spec', {}).get('nodeSelector', {})
else:
selectors = pod_or_spec.spec.node_selector

if not selectors:
return None

for label_key, label_value in selectors.items():
node_selectors.append(f'{label_key}={label_value}')
return ', '.join(node_selectors)


def _lack_resource_msg(resource: str,
pod_or_spec: Union[Any, dict],
extra_msg: Optional[str] = None,
details: Optional[str] = None) -> str:
resource_requirements = _formatted_resource_requirements(pod_or_spec)
node_selectors = _formatted_node_selector(pod_or_spec)
node_selector_str = f' and labels ({node_selectors})' if (
node_selectors) else ''
msg = (f'Insufficient {resource} capacity on the cluster. '
f'Required resources ({resource_requirements}){node_selector_str} '
'were not found in a single node. Other SkyPilot tasks or pods may '
'be using resources. Check resource usage by running '
'`kubectl describe nodes`.')
if extra_msg:
msg += f' {extra_msg}'
if details:
msg += f'\nFull error: {details}'
return msg


def _raise_pod_scheduling_errors(namespace, context, new_nodes):
"""Raise pod scheduling failure reason.

When a pod fails to schedule in Kubernetes, the reasons for the failure
are recorded as events. This function retrieves those events and raises
descriptive errors for better debugging and user feedback.
"""

def _formatted_resource_requirements(pod):
# Returns a formatted string of resource requirements for a pod.
resource_requirements = {}
for container in pod.spec.containers:
for resource, value in container.resources.requests.items():
if resource not in resource_requirements:
resource_requirements[resource] = 0
if resource == 'memory':
int_value = kubernetes_utils.parse_memory_resource(value)
else:
int_value = kubernetes_utils.parse_cpu_or_gpu_resource(
value)
resource_requirements[resource] += int_value
return ', '.join(f'{resource}={value}'
for resource, value in resource_requirements.items())

def _formatted_node_selector(pod) -> Optional[str]:
# Returns a formatted string of node selectors for a pod.
node_selectors = []
if pod.spec.node_selector is None:
return None
for label_key, label_value in pod.spec.node_selector.items():
node_selectors.append(f'{label_key}={label_value}')
return ', '.join(node_selectors)

def _lack_resource_msg(resource: str,
pod,
extra_msg: Optional[str] = None,
details: Optional[str] = None) -> str:
resource_requirements = _formatted_resource_requirements(pod)
node_selectors = _formatted_node_selector(pod)
node_selector_str = f' and labels ({node_selectors})' if (
node_selectors) else ''
msg = (
f'Insufficient {resource} capacity on the cluster. '
f'Required resources ({resource_requirements}){node_selector_str} '
'were not found in a single node. Other SkyPilot tasks or pods may '
'be using resources. Check resource usage by running '
'`kubectl describe nodes`.')
if extra_msg:
msg += f' {extra_msg}'
if details:
msg += f'\nFull error: {details}'
return msg

for new_node in new_nodes:
pod = kubernetes.core_api(context).read_namespaced_pod(
new_node.metadata.name, namespace)
@@ -144,8 +164,8 @@ def _lack_resource_msg(resource: str,
'`kubectl delete pods -n skypilot-system -l name=smarter-device-manager`.' # pylint: disable=line-too-long
f' Full error: {event_message}')
gpu_lf_keys = [
lf.get_label_key()
for lf in kubernetes_utils.LABEL_FORMATTER_REGISTRY
key for lf in kubernetes_utils.LABEL_FORMATTER_REGISTRY
for key in lf.get_label_keys()
]
if pod.spec.node_selector:
for label_key in pod.spec.node_selector.keys():
@@ -497,7 +517,7 @@ def _create_pods(region: str, cluster_name_on_cloud: str,
'For more details, refer to https://skypilot.readthedocs.io/en/latest/reference/config.html') # pylint: disable=line-too-long

needs_gpus = (pod_spec['spec']['containers'][0].get('resources', {}).get(
'limits', {}).get('nvidia.com/gpu', 0) > 0)
'limits', {}).get(kubernetes_utils.GPU_RESOURCE_KEY, 0) > 0)
if nvidia_runtime_exists and needs_gpus:
pod_spec['spec']['runtimeClassName'] = 'nvidia'

@@ -542,8 +562,39 @@ def _create_pods(region: str, cluster_name_on_cloud: str,
}
}

pod = kubernetes.core_api(context).create_namespaced_pod(
namespace, pod_spec)
# TPU slice nodes are given a taint, google.com/tpu=present:NoSchedule.
# This is to prevent non-TPU workloads from being scheduled on TPU
# slice nodes. We need this toleration to allow the pod to be scheduled
# on TPU nodes.
# Reference: https://cloud.google.com/kubernetes-engine/docs/concepts/tpus#how_tpus_work # pylint: disable=line-too-long
tpu_label = kubernetes_utils.GKELabelFormatter.TPU_LABEL_KEY
if tpu_label in config.node_config.get('spec',
{}).get('nodeSelector', {}):
tpu_toleration = {
'key': kubernetes_utils.TPU_RESOURCE_KEY,
'operator': 'Equal',
'value': 'present',
'effect': 'NoSchedule'
}
pod_spec['spec']['tolerations'] = [tpu_toleration]

try:
pod = kubernetes.core_api(context).create_namespaced_pod(
namespace, pod_spec)
except kubernetes.api_exception() as e:
error_msg = str(e)
# Unlike other errors from lack of CPU/GPU/memory resources, the TPU
# shortage error is raised when pod creation is attempted.
if 'Invalid resource requests for google.com/tpu.' in error_msg:
extra_msg = ('Verify if the cluster has a TPU slice node with '
'a topology matching the number of TPU(s) '
'requested.')
raise config_lib.KubernetesError(
_lack_resource_msg('TPU',
pod_spec,
details=error_msg,
extra_msg=extra_msg))
raise
created_pods[pod.metadata.name] = pod
if head_pod_name is None:
head_pod_name = pod.metadata.name