Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed consolidated Group getitem with multi-part key #2363

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

TomAugspurger
Copy link
Contributor

This fixes Group.__getitem__ when indexing with a key
like 'subgroup/array'. The basic idea is to rewrite the indexing
operation as group['subgroup']['array'] by splitting the key
and doing each operation independently. This is fine for consolidated
metadata which doesn't need to do IO.

There's a complication around unconsolidated metadata, though. What
if we encounter a node where Group.getitem returns a sub Group
without consolidated metadata. Then we need to fall back to
non-consolidated metadata. We've written _getitem_consolidated
as a regular (non-async) function so we need to pop back up to
the async caller and have it fall back.

Closes #2358

This fixes `Group.__getitem__` when indexing with a key
like 'subgroup/array'. The basic idea is to rewrite the indexing
operation as `group['subgroup']['array']` by splitting the key
and doing each operation independently. This is fine for consolidated
metadata which doesn't need to do IO.

There's a complication around unconsolidated metadata, though. What
if we encounter a node where `Group.getitem` returns a sub Group
without consolidated metadata. Then we need to fall back to
non-consolidated metadata. We've written _getitem_consolidated
as a regular (non-async) function so we need to pop back up to
the async caller and have *it* fall back.

Closes zarr-developers#2358
@@ -97,6 +97,24 @@ def _parse_async_node(
raise TypeError(f"Unknown node type, got {type(node)}")


class _MixedConsolidatedMetadataException(Exception):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this MixConsolidatedMetadataException stuff makes sense. The crux of the issue is that Group.__getitem__ on a Group with consolidated metadata can be IO-free. To make that explicit, we define it as a plain (non-async) function.

But this group['subgroup']['subarray'] presents a new challenge. What if group has consolidated metadata, but subgroup doesn't? Its consolidated metadata might be None, meaning we need to fall back to the non-consolidated metadata. We discover this fact in AsyncGroup._getitem_consolidated, which isn't an async function, so it can't call the async non-consolidated getitem.

To break through that tangle, I've added this custom exception. The expectation is that users never see it; we require (through docs & being careful?) that all the callers of _getitem_consolidated handle this case by catching the error and falling back if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention: this "mixed" case isn't going to be common. Under normal operation, where users call consolidate_metadata and don't mutate the group afterwards we consolidate everything and so won't hit this. I believe the only time this can happen through the public API is when users add a new group to group with consolidated metadata, and don't then re-consolidate.

@TomAugspurger TomAugspurger marked this pull request as ready for review October 14, 2024 16:16
Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work here @TomAugspurger - The (somewhat messy) handling of mixed consolidated / non-consolidated metadata makes me wonder if we want to revisit supporting it at all. It would be pretty reasonable to ask callers to switch to consolidated=False if they go looking for a key outside of the range of the consolidated metadata.

But we can have that conversation another day.

@jhamman jhamman changed the base branch from v3 to main October 14, 2024 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accessing nested children via consolidated metadata fails
2 participants