[Op] Canonicalize SaveV2 Op device spec in distributed training. (#925)
Signed-off-by: 泊霆 <hujunqi.hjq@alibaba-inc.com>
Mesilenceki authored Nov 9, 2023
1 parent fc4f9f5 commit 29d9b46
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions tensorflow/python/training/saver.py
@@ -550,8 +550,12 @@ def _GroupByDevices(self, saveables):
     """
     per_device = collections.defaultdict(lambda: [])
     for saveable in saveables:
-      canonical_device = set(
-          pydev.canonical_name(spec.tensor.device) for spec in saveable.specs)
+      canonical_device = set()
+      for spec in saveable.specs:
+        device_name = pydev.canonical_name(spec.tensor.device)
+        device_spec = pydev.DeviceSpec.from_string(device_name)
+        device_spec.device_type = "CPU"
+        canonical_device.add(device_spec.to_string())
       if len(canonical_device) != 1:
         raise ValueError("All tensors of a saveable object must be "
                          "on the same device: %s" % saveable.name)
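As I read the hunk above, each spec's device string is rewritten to use the CPU device type before the "same device" check, so a saveable whose tensors sit on the GPU and the CPU of one task no longer trips the ValueError and is grouped under that task's CPU device. The sketch below illustrates that effect in plain Python without importing TensorFlow. `force_cpu` and `group_by_device` are hypothetical helpers written for this illustration, not part of `pydev` or `saver.py`, and the device strings are made-up examples. Like the patch, the sketch only swaps the device type and leaves the device index unchanged.

```python
import collections
import re

# Hypothetical stand-in for the DeviceSpec round trip in the patch.
# Matches a trailing "/device:<TYPE>:<INDEX>" component of a device string.
_DEVICE_RE = re.compile(r"/device:(?P<type>[A-Za-z_]+):(?P<index>\d+|\*)$")


def force_cpu(device_name):
    """Return device_name with its device type replaced by CPU.

    The device index is kept as-is, mirroring the patch, which only
    assigns device_type. Strings without a device component are
    returned unchanged.
    """
    match = _DEVICE_RE.search(device_name)
    if match is None:
        return device_name
    return device_name[:match.start()] + "/device:CPU:" + match.group("index")


def group_by_device(saveables):
    """Group saveable names by their CPU-normalized device.

    saveables: mapping of saveable name -> list of device strings,
    one per tensor spec (an assumed, simplified data model).
    """
    per_device = collections.defaultdict(list)
    for name, devices in saveables.items():
        canonical = {force_cpu(d) for d in devices}
        if len(canonical) != 1:
            raise ValueError(
                "All tensors of a saveable object must be "
                "on the same device: %s" % name)
        per_device[canonical.pop()].append(name)
    return dict(per_device)


# Made-up example: one saveable spans GPU:0 and CPU:0 of the same worker.
saveables = {
    "embedding": [
        "/job:worker/replica:0/task:0/device:GPU:0",
        "/job:worker/replica:0/task:0/device:CPU:0",
    ],
    "dense": ["/job:ps/replica:0/task:0/device:CPU:0"],
}
groups = group_by_device(saveables)
```

Without the normalization step, "embedding" would yield two distinct device strings and raise the ValueError. Because the index is preserved, tensors on GPU:0 and GPU:1 of one task would still map to different CPU devices in this sketch.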