Accessing nested children via consolidated metadata fails #2358

Open
jhamman opened this issue Oct 14, 2024 · 1 comment · May be fixed by #2363
Labels
bug Potential issues with the zarr-python library
Milestone
3.0.0

Comments

@jhamman
Member

jhamman commented Oct 14, 2024

Zarr version

3.0.0.beta

Numcodecs version

0.13

Python Version

3.11

Operating System

Mac

Installation

pip

Description

In pydata/xarray#9552, I noticed that accessing a nested child with a multi-level path such as `group['foo/bar']` fails when the group has consolidated metadata.

Steps to reproduce

import zarr

store = zarr.storage.MemoryStore(mode='w')

# create hierarchy root + foo/bar
root = zarr.open_group(store=store, attributes={'a': 'b'}, mode='w')
root.create_array('foo/bar', shape=(2, 2), attributes={'d': 4})

# consolidate metadata
out = zarr.consolidate_metadata(store)

out['foo/bar']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/group.py:670, in AsyncGroup._getitem_consolidated(self, store_path, key, prefix)
    669 try:
--> 670     metadata = self.metadata.consolidated_metadata.metadata[key]
    671 except KeyError as e:
    672     # The Group Metadata has consolidated metadata, but the key
    673     # isn't present. We trust this to mean that the key isn't in
    674     # the hierarchy, and *don't* fall back to checking the store.

KeyError: 'foo/bar'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[20], line 12
      9 # consolidate metadata
     10 out = zarr.consolidate_metadata(store)
---> 12 out['foo/bar']

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/group.py:1330, in Group.__getitem__(self, path)
   1329 def __getitem__(self, path: str) -> Array | Group:
-> 1330     obj = self._sync(self._async_group.getitem(path))
   1331     if isinstance(obj, AsyncArray):
   1332         return Array(obj)

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/sync.py:185, in SyncMixin._sync(self, coroutine)
    182 def _sync(self, coroutine: Coroutine[Any, Any, T]) -> T:
    183     # TODO: refactor this to to take *args and **kwargs and pass those to the method
    184     # this should allow us to better type the sync wrapper
--> 185     return sync(
    186         coroutine,
    187         timeout=config.get("async.timeout"),
    188     )

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/sync.py:141, in sync(coro, loop, timeout)
    138 return_result = next(iter(finished)).result()
    140 if isinstance(return_result, BaseException):
--> 141     raise return_result
    142 else:
    143     return return_result

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/sync.py:100, in _runner(coro)
     95 """
     96 Await a coroutine and return the result of running it. If awaiting the coroutine raises an
     97 exception, the exception will be returned.
     98 """
     99 try:
--> 100     return await coro
    101 except Exception as ex:
    102     return ex

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/group.py:608, in AsyncGroup.getitem(self, key)
    606 # Consolidated metadata lets us avoid some I/O operations so try that first.
    607 if self.metadata.consolidated_metadata is not None:
--> 608     return self._getitem_consolidated(store_path, key, prefix=self.name)
    610 # Note:
    611 # in zarr-python v2, we first check if `key` references an Array, else if `key` references
    612 # a group,using standalone `contains_array` and `contains_group` functions. These functions
    613 # are reusable, but for v3 they would perform redundant I/O operations.
    614 # Not clear how much of that strategy we want to keep here.
    615 elif self.metadata.zarr_format == 3:

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/group.py:676, in AsyncGroup._getitem_consolidated(self, store_path, key, prefix)
    671 except KeyError as e:
    672     # The Group Metadata has consolidated metadata, but the key
    673     # isn't present. We trust this to mean that the key isn't in
    674     # the hierarchy, and *don't* fall back to checking the store.
    675     msg = f"'{key}' not found in consolidated metadata."
--> 676     raise KeyError(msg) from e
    678 # update store_path to ensure that AsyncArray/Group.name is correct
    679 if prefix != "/":

KeyError: "'foo/bar' not found in consolidated metadata."
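Both KeyErrors come from looking up the full path `'foo/bar'` as a single key. A minimal pure-Python model of that failure, assuming (as the error suggests) that each group's consolidated metadata maps only its direct children, with child groups carrying their own nested mapping. The dict layout and the `flat_lookup` helper are hypothetical stand-ins, not zarr's actual classes:

```python
from typing import Any

# Hypothetical model (an assumption, not zarr's real data structures): the root's
# consolidated metadata lists only direct children; "foo" holds its own mapping.
consolidated: dict[str, Any] = {
    "foo": {
        "node_type": "group",
        "consolidated_metadata": {
            "bar": {"node_type": "array", "shape": (2, 2), "attributes": {"d": 4}},
        },
    },
}


def flat_lookup(metadata: dict[str, Any], key: str) -> dict[str, Any]:
    # Mirrors the failing pattern: the whole path is used as one dict key.
    return metadata[key]


try:
    flat_lookup(consolidated, "foo/bar")
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'foo/bar'

# A single-segment key works, because "foo" is a direct child of the root.
print(flat_lookup(consolidated, "foo")["node_type"])  # group
```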

Additional output

No response

jhamman added the bug (Potential issues with the zarr-python library) label Oct 14, 2024
jhamman added this to the 3.0.0 milestone Oct 14, 2024
@TomAugspurger
Contributor

Oh, I didn't know that was valid. I'll push a fix up today.

TomAugspurger added a commit to TomAugspurger/zarr-python that referenced this issue Oct 14, 2024
This fixes `Group.__getitem__` when indexing with a key
like 'subgroup/array'. The basic idea is to rewrite the indexing
operation as `group['subgroup']['array']` by splitting the key
and doing each operation independently. This is fine for consolidated
metadata, which doesn't need to do any I/O.

There's a complication around unconsolidated metadata, though: what
if we encounter a node where `Group.getitem` returns a subgroup
without consolidated metadata? Then we need to fall back to
non-consolidated metadata. Since `_getitem_consolidated` is written
as a regular (non-async) function, we need to pop back up to
the async caller and have *it* fall back.

Closes zarr-developers#2358

2 participants