Over in pydata/xarray#5475, we've been discussing an issue that's affecting xarray with Zarr v3. xarray currently interprets values equal to fill_value as "missing" and casts them to NaN (apparently in NetCDF (or CF conventions?) there's some understanding that _FillValue is supposed to lie outside the "valid range" of the data).
There's lots of discussion there, but one thing zarr-python could do to help would be to choose default fill values that are less likely to overlap with valid data. Exactly what counts as valid is domain / application / dataset specific, but I think that 0 (or the equivalent for a given dtype) is somewhat more likely to be valid data than many other values, and so might be a worse default.
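To illustrate why a fill value of 0 is risky, here is a NumPy-only sketch of the decoding behavior described above (this is not xarray's actual implementation; `decode_with_fill` is a hypothetical helper):

```python
import numpy as np

# Sketch of the decoding step: values equal to the fill value are
# treated as missing and become NaN on read.
def decode_with_fill(data, fill_value):
    data = np.asarray(data, dtype=float)
    return np.where(data == fill_value, np.nan, data)

# With fill_value=0, legitimate zeros in the data are silently lost:
decode_with_fill([0.0, 1.0, 2.0], fill_value=0)  # -> [nan, 1.0, 2.0]
```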
What do people think about the following kinds of rules?

- signed integer: intmin (np.iinfo(dtype).min)
- unsigned integer: intmax (np.iinfo(dtype).max)
- float: nan
- complex: nan+nanj
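The rules above could be sketched as a small helper; `proposed_default_fill` is a hypothetical function for illustration, not existing zarr-python API:

```python
import numpy as np

def proposed_default_fill(dtype):
    """Hypothetical helper: return the default fill value proposed
    above for a given dtype."""
    dtype = np.dtype(dtype)
    if dtype.kind == "i":   # signed integer -> intmin
        return np.iinfo(dtype).min
    if dtype.kind == "u":   # unsigned integer -> intmax
        return np.iinfo(dtype).max
    if dtype.kind == "f":   # float -> nan
        return float("nan")
    if dtype.kind == "c":   # complex -> nan in both components
        return complex(float("nan"), float("nan"))
    raise TypeError(f"no proposed default for dtype {dtype}")

proposed_default_fill("int8")    # -> -128
proposed_default_fill("uint16")  # -> 65535
```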
Steps to reproduce
na
Additional output
No response
I'm not opposed to moving the defaults, but it's probably worth hearing from people in other domains. In my experience in bioimaging, for both raw images and segmentations, 0 is conventionally used as a background label and people very often rely on application default values (which is probably where the convention came from in the first place).
Zarr version
v3
Numcodecs version
na
Python Version
na
Operating System
na
Installation
na