Compatibility for zarr-python 3.x #9552
This set of changes should be backwards compatible and work with zarr-python 2.x (so reading and writing zarr v2 data).
I'll work through zarr-python 3.x now. I think we might want to parametrize most of these tests by zarr_version=[2, 3] to confirm that we can read / write zarr v2 data with zarr-python 3.x.
@@ -75,8 +89,10 @@ def __init__(self, zarr_array):
        self.shape = self._array.shape

        # preserve vlen string object dtype (GH 7328)
        if self._array.filters is not None and any(
            [filt.codec_id == "vlen-utf8" for filt in self._array.filters]
        if (
zarr-developers/zarr-python#2036 is probably relevant here.
Confirm whether we need any logic on the branch where we do have Zarr V3.
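A minimal sketch of the vlen-utf8 check, assuming zarr-python 2.x arrays expose numcodecs filters with a codec_id attribute and that filters can be None (for example on an array with no filters configured). The SimpleNamespace objects are hypothetical stand-ins for real filters, and has_vlen_utf8_filter is an illustrative name; whether the V3 branch needs extra logic is still the open question above.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def has_vlen_utf8_filter(filters: Optional[Iterable[object]]) -> bool:
    """Return True if any filter is the numcodecs vlen-utf8 codec.

    ``filters`` is None when the array has no filters configured; objects
    without a ``codec_id`` attribute are treated as non-matching.
    """
    if filters is None:
        return False
    # A generator short-circuits on the first match, unlike a list inside any()
    return any(getattr(f, "codec_id", None) == "vlen-utf8" for f in filters)


# Hypothetical stand-ins for numcodecs filter objects
vlen = SimpleNamespace(codec_id="vlen-utf8")
delta = SimpleNamespace(codec_id="delta")

print(has_vlen_utf8_filter([delta, vlen]))  # True
print(has_vlen_utf8_filter(None))  # False
```

Using getattr with a default keeps the check from raising if a V3 codec object without codec_id ever ends up in the iterable.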
xarray/backends/zarr.py
if _zarr_v3() and zarr_array.metadata.zarr_format == 3:
    encoding["codec_pipeline"] = [
        x.to_dict() for x in zarr_array.metadata.codecs
Maybe this instead?
- x.to_dict() for x in zarr_array.metadata.codecs
+ zarr_array.metadata.to_dict()["codecs"]
A bit wasteful since everything has to be serialized, but presumably zarr knows better how to serialize the codec pipeline than we do here?
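To make the trade-off concrete, here is a sketch with hypothetical Codec and ArrayMetadata dataclasses standing in for zarr's classes (they are not zarr's API). In this stand-in both spellings give the same list because the metadata-level to_dict delegates to each codec; the reviewer's point is that with real zarr objects the metadata-level serialization is the one zarr controls, at the cost of serializing the whole document.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Codec:
    # Hypothetical stand-in for a zarr v3 codec object
    name: str
    configuration: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return {"name": self.name, "configuration": dict(self.configuration)}


@dataclass
class ArrayMetadata:
    # Hypothetical stand-in for zarr v3 array metadata
    zarr_format: int
    codecs: List[Codec]

    def to_dict(self) -> Dict[str, Any]:
        # Serializes the whole document, including fields the encoding ignores
        return {
            "zarr_format": self.zarr_format,
            "node_type": "array",
            "codecs": [c.to_dict() for c in self.codecs],
        }


meta = ArrayMetadata(
    zarr_format=3,
    codecs=[
        Codec("bytes", {"endian": "little"}),
        Codec("zstd", {"level": 0, "checksum": False}),
    ],
)

# Option 1: serialize each codec ourselves
per_codec = [c.to_dict() for c in meta.codecs]
# Option 2: let the metadata object serialize everything, then pick "codecs"
from_metadata = meta.to_dict()["codecs"]

assert per_codec == from_metadata
```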
* removed open_consolidated workarounds
* removed _store_version check
* pass through zarr_version
- skip write_empty_chunks on 3.x
- update patch targets
Great progress here @TomAugspurger. I'm impressed by how little you've changed in the backend itself and I'm noting the pain around testing (I felt some of that w/ dask as well).
if consolidated is None:
    try:
        zarr_group = zarr.open_consolidated(store, **open_kwargs)
-   except KeyError:
+   except (ValueError, KeyError):
On the Zarr side, it may be nice to raise a custom exception when consolidated metadata is not found. Something like:
class ConsolidatedMetadataNotFound(FileNotFoundError):
    pass
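A minimal sketch of the consolidated=None path using the proposed exception, assuming dict-like stores. open_consolidated and open_group here are hypothetical stand-ins, not zarr's functions, and the ".zmetadata" key is the zarr v2 consolidated-metadata location. The order (try consolidated, warn, fall back) mirrors the current default behavior discussed in this thread; zarr-python may or may not adopt the exception.

```python
import warnings
from typing import Any, Dict, Mapping


class ConsolidatedMetadataNotFound(FileNotFoundError):
    """Raised when a store has no consolidated metadata (proposed above)."""


def open_consolidated(store: Mapping[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in: zarr v2 keeps consolidated metadata in ".zmetadata"
    if ".zmetadata" not in store:
        raise ConsolidatedMetadataNotFound(".zmetadata")
    return {"consolidated": True, "metadata": store[".zmetadata"]}


def open_group(store: Mapping[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for a non-consolidated open that lists the store
    return {"consolidated": False, "keys": sorted(store)}


def open_with_fallback(store: Mapping[str, Any]) -> Dict[str, Any]:
    """Mimic consolidated=None: try consolidated metadata, warn, then fall back."""
    try:
        return open_consolidated(store)
    except (ConsolidatedMetadataNotFound, ValueError, KeyError):
        # The dedicated exception avoids guessing which built-in error a given
        # zarr-python version raises; ValueError/KeyError cover older behavior
        warnings.warn(
            "Failed to open consolidated metadata; falling back to "
            "non-consolidated metadata.",
            RuntimeWarning,
            stacklevel=2,
        )
        return open_group(store)
```

Subclassing FileNotFoundError keeps existing `except OSError` handlers working while letting callers catch the missing-metadata case precisely.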
I'm going to spend some time today working on the last few failures here. Some things I'm noticing that will need some attention:
set use_consolidated to false when user provides consolidated=False
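A sketch of the mapping that commit describes, assuming zarr-python 3.x's group-opening functions accept a use_consolidated keyword. consolidated_open_kwargs is a hypothetical helper name. In this sketch only an explicit consolidated=False is forwarded; None keeps the try-then-fall-back default and True goes through a dedicated consolidated open.

```python
from typing import Any, Dict, Optional


def consolidated_open_kwargs(
    consolidated: Optional[bool], zarr_v3: bool
) -> Dict[str, Any]:
    """Translate xarray's ``consolidated`` argument into extra open kwargs.

    Hypothetical helper: returns an empty dict unless the user explicitly
    disabled consolidated metadata while running zarr-python 3.x.
    """
    kwargs: Dict[str, Any] = {}
    if zarr_v3 and consolidated is False:
        # Assumed zarr-python 3.x keyword: don't look for consolidated metadata
        kwargs["use_consolidated"] = False
    return kwargs
```

Checking `consolidated is False` rather than `not consolidated` keeps None (the default) from being treated as an explicit opt-out.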
Fix/cm import
fix: relax instrumented store checks for v3
@TomAugspurger - a thought that may help limit the scope of this PR. We may consider punting on full datatree support for zarr-v3 in this PR and fixing that up in follow-on work. What do you think about adding
I haven't looked at the datatree side of things yet, so that sounds good to me :)
Another idea to simplify things would be to disallow (or at least discourage) consolidated metadata with V3 data, given the uncertain status of consolidated metadata in the V3 spec. Maybe default to
What exactly is the scenario/version where this doesn't work? I would like to release a version of Xarray where you can still use DataTree to open a Zarr V2 store. Then I wouldn't mind us fixing other cases later.
@TomNicholas - my proposal would maintain existing datatree functionality for Zarr-Python 2 but would postpone the integration work for Zarr-Python 3 to another PR. The specific issues are mostly upstream and may take a few days to sort out.
That sounds fine to me!
This might be a bit tricky to implement. The current default behavior is to try consolidated metadata, emit a warning, and fall back to non-consolidated metadata. However, we might not know whether we have V2 or V3 data until after we've read it, so we couldn't warn until after we've fallen back to non-consolidated metadata and discovered what we have. I'm not sure about the write side.
IMO, the downsides of lacking consolidated metadata, and my confidence that something like consolidated metadata will end up in V3, push me to try to support it with the current API. If we do need to adjust anything to comply with the spec, I think we'll be able to paper over it in code and not have to change the user-facing API.
I'll have a fix for the failing TestInstrumentedStore tests soon.
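Assuming, as the name suggests, that the TestInstrumentedStore tests assert on how many times store methods are called, here is a dict-backed sketch of that idea. InstrumentedStore and check_calls are hypothetical names, and treating expected counts as upper bounds is only one possible reading of "relax instrumented store checks for v3", not necessarily what the fix does.

```python
from collections import Counter
from collections.abc import Iterator, MutableMapping
from typing import Any, Dict


class InstrumentedStore(MutableMapping):
    """Dict-backed store that counts calls to each mapping method."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self.calls = Counter()

    def __getitem__(self, key: str) -> Any:
        self.calls["__getitem__"] += 1
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self.calls["__setitem__"] += 1
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        self.calls["__delitem__"] += 1
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        self.calls["__iter__"] += 1
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


def check_calls(actual: Counter, expected: Dict[str, int], relaxed: bool) -> None:
    """Exact match by default; with ``relaxed``, expected counts are upper bounds."""
    for method, count in expected.items():
        if relaxed:
            assert actual[method] <= count, (method, actual[method], count)
        else:
            assert actual[method] == count, (method, actual[method], count)
```

Usage: write and read one key, then compare counts, e.g. `store["zarr.json"] = b"{}"` followed by `check_calls(store.calls, {"__setitem__": 1}, relaxed=False)`.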
skip datatree zarr tests w/ zarr 3 for now
Currently, the only zarr store that supports storage options is
Let's skip this test with v3.
This is a WIP for compatibility with zarr-python 3.x. It's intended to be run against zarr-python v3 + the open PRs referenced in #9515.
All of the zarr test cases should be parameterized by zarr_format=[2, 3] with zarr-python 3.x to exercise reading and writing both formats.
This is currently passing with zarr-python==2.18.3. zarr-python 3.x has about 61 failures, all of which are related to data types that aren't yet implemented in zarr-python 3.x.
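A stdlib-only sketch of running one round-trip test per zarr_format, using unittest's subTest as a stand-in for pytest parametrization. write_array_metadata and read_array_metadata are hypothetical and only model where each format keeps array metadata (".zarray" for V2, "zarr.json" with a node_type field for V3); a real test would write and read an xarray Dataset through the zarr backend instead.

```python
import json
import unittest
from typing import Dict, Tuple

ZARR_FORMATS = (2, 3)


def write_array_metadata(shape: Tuple[int, ...], zarr_format: int) -> Dict[str, bytes]:
    """Hypothetical writer: emit array metadata under the format's key."""
    if zarr_format == 2:
        doc = {"zarr_format": 2, "shape": list(shape)}
        return {".zarray": json.dumps(doc).encode()}
    if zarr_format == 3:
        doc = {"zarr_format": 3, "node_type": "array", "shape": list(shape)}
        return {"zarr.json": json.dumps(doc).encode()}
    raise ValueError(f"unsupported zarr_format: {zarr_format}")


def read_array_metadata(store: Dict[str, bytes]) -> Dict:
    """Hypothetical reader: detect the format from which metadata key exists."""
    for key in ("zarr.json", ".zarray"):
        if key in store:
            return json.loads(store[key])
    raise KeyError("no array metadata found")


class TestRoundtrip(unittest.TestCase):
    def test_roundtrip_all_formats(self) -> None:
        for zarr_format in ZARR_FORMATS:
            # subTest reports each format separately, like a parametrized test
            with self.subTest(zarr_format=zarr_format):
                meta = read_array_metadata(write_array_metadata((4, 5), zarr_format))
                self.assertEqual(meta["zarr_format"], zarr_format)
                self.assertEqual(meta["shape"], [4, 5])
```

With pytest the same idea is `@pytest.mark.parametrize("zarr_format", [2, 3])` on each test, or a fixture with `params=[2, 3]` so every zarr test picks it up.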
I'll also note that #5475 is going to become a larger issue once people start writing Zarr-V3 datasets.