Change default fill values #2265

Open
TomAugspurger opened this issue Sep 27, 2024 · 2 comments
Labels
bug Potential issues with the zarr-python library V3 Affects the v3 branch
Milestone

Comments

@TomAugspurger
Contributor

Zarr version

v3

Numcodecs version

na

Python Version

na

Operating System

na

Installation

na

Description

Over in pydata/xarray#5475, we've been discussing an issue that affects xarray with Zarr v3. xarray currently interprets values equal to fill_value as "missing" and casts them to NaN (apparently NetCDF, or perhaps the CF conventions, treats _FillValue as lying outside the "valid range" of the data).
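
To make the failure mode concrete, here is a minimal NumPy-only sketch of masking by fill value. `mask_fill_value` is a hypothetical helper written for illustration; it is not xarray's or zarr-python's actual code path, only an approximation of the behaviour described above.

```python
import numpy as np


def mask_fill_value(data, fill_value):
    """Hypothetical helper: return a float copy with fill_value entries set to NaN."""
    data = np.asarray(data)
    out = data.astype(np.float64)
    out[data == fill_value] = np.nan
    return out


# Zero is a perfectly valid count here.
counts = np.array([0, 3, 0, 7], dtype=np.int32)

# With a fill value of 0, the valid zeros are silently turned into NaN.
masked_zero = mask_fill_value(counts, fill_value=0)

# With the dtype's minimum as the fill value, the valid zeros survive.
masked_min = mask_fill_value(counts, fill_value=np.iinfo(np.int32).min)
```

In this sketch the only thing that changes between the two calls is the fill value, which is why the choice of default matters for downstream readers that mask on it.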

There's lots of discussion there, but one thing zarr-python could do to help would be to choose default fill values that are less likely to overlap with valid data. Exactly what's valid is domain / application / dataset specific, but I think that 0 (or the equivalent for some dtype) is slightly more likely to be valid than many other values, and so might be a worse default.

What do people think about the following kinds of rules?

  • (signed) integer: intmin / np.iinfo(dtype).min
  • unsigned integer: intmax / np.iinfo(dtype).max
  • float: nan
  • complex: nan+nanj
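
A rough sketch of how these rules might look in code, assuming NumPy dtypes. `proposed_default_fill_value` is a hypothetical name used only for illustration and is not part of zarr-python; dtypes that the list above does not cover (bool, strings, datetimes) are deliberately left out.

```python
import numpy as np


def proposed_default_fill_value(dtype):
    """Hypothetical helper mapping a dtype to the default fill value proposed above."""
    dtype = np.dtype(dtype)
    if dtype.kind == "i":  # signed integer: smallest representable value
        return np.iinfo(dtype).min
    if dtype.kind == "u":  # unsigned integer: largest representable value
        return np.iinfo(dtype).max
    if dtype.kind == "f":  # float: NaN
        return dtype.type(np.nan)
    if dtype.kind == "c":  # complex: NaN in both real and imaginary parts
        return dtype.type(complex(np.nan, np.nan))
    # Not covered by the rules in this issue.
    raise TypeError(f"no proposed default fill value for dtype {dtype}")


int8_fill = proposed_default_fill_value("int8")
uint16_fill = proposed_default_fill_value("uint16")
float32_fill = proposed_default_fill_value("float32")
complex64_fill = proposed_default_fill_value("complex64")
```

Note that NaN never compares equal to itself, so any reader that detects fill values with `==` would need `np.isnan` for the float and complex cases.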

Steps to reproduce

na

Additional output

No response

@TomAugspurger TomAugspurger added the bug Potential issues with the zarr-python library label Sep 27, 2024
@jhamman
Member

jhamman commented Sep 27, 2024

I'm a big +1 on changing the defaults. I don't really care about the values but will note NetCDF4 has also defined defaults:

In [1]: from netCDF4 import default_fillvals

In [2]: default_fillvals
Out[2]:
{'S1': '\x00',
 'i1': -127,
 'u1': 255,
 'i2': -32767,
 'u2': 65535,
 'i4': -2147483647,
 'u4': 4294967295,
 'i8': -9223372036854775806,
 'u8': 18446744073709551614,
 'f4': 9.969209968386869e+36,
 'f8': 9.969209968386869e+36}
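
For comparison with the intmin / intmax rules proposed above, the following sketch measures how far each NetCDF4 integer default sits from the extreme of its dtype. The values are copied by hand from the listing above rather than imported from netCDF4, and `distance_from_extreme` is a name made up for this example.

```python
import numpy as np

# Integer defaults copied from the netCDF4 listing above.
netcdf4_int_defaults = {
    "i1": -127,
    "u1": 255,
    "i2": -32767,
    "u2": 65535,
    "i4": -2147483647,
    "u4": 4294967295,
    "i8": -9223372036854775806,
    "u8": 18446744073709551614,
}

distance_from_extreme = {}
for code, fill in netcdf4_int_defaults.items():
    info = np.iinfo(np.dtype(code))
    # Signed types are compared against the minimum, unsigned against the maximum.
    # int() keeps the arithmetic in Python integers, so nothing overflows.
    extreme = int(info.min) if code.startswith("i") else int(info.max)
    distance_from_extreme[code] = abs(fill - extreme)

for code, distance in distance_from_extreme.items():
    print(f"{code}: {distance} away from the dtype extreme")
```

The NetCDF4 defaults are at or within a couple of steps of the extremes, so they follow the same spirit as the proposal without matching it exactly for every dtype.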

@d-v-b
Contributor

d-v-b commented Sep 28, 2024

I'm not opposed to moving the defaults, but it's probably worth hearing from people in other domains. In my experience in bioimaging, for both raw images and segmentations, 0 is conventionally used as a background label and people very often rely on application default values (which is probably where the convention came from in the first place).

cc @jni

Maybe in zarr v4 we can have proper support for nullable types to avoid this problem altogether ;)

@jhamman jhamman added the V3 Affects the v3 branch label Sep 28, 2024
@jhamman jhamman added this to the 3.0.0 milestone Sep 28, 2024
Projects
Status: Todo
Development

No branches or pull requests

3 participants