Change default fill values #2265

TomAugspurger · 2024-09-27T17:28:07Z

Zarr version

v3

Numcodecs version

na

Python Version

na

Operating System

na

Installation

na

Description

Over in pydata/xarray#5475, we've been discussing an issue that affects xarray with Zarr v3. xarray currently interprets values equal to fill_value as "missing" and casts them to NaN (apparently NetCDF, or perhaps the CF conventions, treats _FillValue as lying outside the "valid range" of the data).

There's lots of discussion there, but one thing zarr-python could do to help would be to choose default fill values that are less likely to overlap with valid data. Exactly what's valid is domain / application / dataset specific, but I think that 0 (or the equivalent for some dtype) is slightly more likely to be valid than many other values, and so might be a worse default.
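To make the concern concrete, here is a minimal sketch of the masking behaviour described above. The `mask_fill` helper is hypothetical and only imitates the "values equal to fill_value become NaN" rule; it is not xarray's or zarr-python's actual code.

```python
import math


def mask_fill(values, fill_value):
    """Hypothetical helper: replace entries equal to fill_value with NaN."""
    return [math.nan if v == fill_value else float(v) for v in values]


# Assume these zeros are real measurements, not missing data.
data = [0, 3, 0, 7]
INT16_MIN = -(2 ** 15)  # same value as np.iinfo("int16").min

# With a fill value of 0, the valid zeros are lost.
masked_zero = mask_fill(data, fill_value=0)

# With a fill value at the edge of the dtype's range, they survive.
masked_min = mask_fill(data, fill_value=INT16_MIN)
```

Under these assumptions, `masked_zero` has NaN in place of both zeros, while `masked_min` keeps all four values.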

What do people think about the following kinds of rules?

(signed) integer: intmin / np.iinfo(dtype).min
unsigned integer: intmax / np.iinfo(dtype).max
float: nan
complex: nan+nanj
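As a rough sketch of how those rules might look in code: the function name `proposed_default_fill` and its `(kind, itemsize)` signature are made up for illustration and are not part of zarr-python. It uses only the standard library; the integer branches give the same values as `np.iinfo(dtype).min` and `np.iinfo(dtype).max`.

```python
import math


def proposed_default_fill(kind, itemsize):
    """Hypothetical sketch of the proposed default fill value rules.

    kind: NumPy-style kind code ('i', 'u', 'f' or 'c').
    itemsize: size of one element in bytes (only used for integers).
    """
    bits = 8 * itemsize
    if kind == "i":
        # Signed integer: smallest representable value.
        return -(2 ** (bits - 1))
    if kind == "u":
        # Unsigned integer: largest representable value.
        return 2 ** bits - 1
    if kind == "f":
        return math.nan
    if kind == "c":
        # NaN in both the real and imaginary parts.
        return complex(math.nan, math.nan)
    raise ValueError(f"no proposed default for kind {kind!r}")
```

For example, `proposed_default_fill("i", 2)` gives -32768 and `proposed_default_fill("u", 1)` gives 255. Other kinds (strings, booleans, datetimes) are not covered by the rules above, so the sketch raises for them rather than guessing.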

Steps to reproduce

na

Additional output

No response


jhamman · 2024-09-27T21:31:13Z

I'm a big +1 on changing the defaults. I don't really care about the values but will note NetCDF4 has also defined defaults:

In [1]: from netCDF4 import default_fillvals

In [2]: default_fillvals
Out[2]:
{'S1': '\x00',
 'i1': -127,
 'u1': 255,
 'i2': -32767,
 'u2': 65535,
 'i4': -2147483647,
 'u4': 4294967295,
 'i8': -9223372036854775806,
 'u8': 18446744073709551614,
 'f4': 9.969209968386869e+36,
 'f8': 9.969209968386869e+36}
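For comparison with the intmin / intmax rule proposed above, this small sketch measures how far each integer default sits from the extreme of its dtype. The dictionary is copied by hand from the output above (integer entries only), so it does not need netCDF4 installed; `dtype_extreme` is a made-up helper name.

```python
# Integer defaults copied from the netCDF4 output above.
netcdf4_defaults = {
    "i1": -127,
    "u1": 255,
    "i2": -32767,
    "u2": 65535,
    "i4": -2147483647,
    "u4": 4294967295,
    "i8": -9223372036854775806,
    "u8": 18446744073709551614,
}


def dtype_extreme(code):
    """Minimum for signed codes ('i'), maximum for unsigned codes ('u')."""
    kind, bits = code[0], 8 * int(code[1:])
    return -(2 ** (bits - 1)) if kind == "i" else 2 ** bits - 1


# Difference between each netCDF4 default and the dtype's extreme value.
offsets = {
    code: value - dtype_extreme(code)
    for code, value in netcdf4_defaults.items()
}
```

Going by the values quoted above, the signed defaults are close to, but not exactly, the dtype minimum (off by 1, or by 2 for i8), and the unsigned defaults equal the dtype maximum except for u8, which is one below it.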

d-v-b · 2024-09-28T01:48:49Z

I'm not opposed to moving the defaults, but it's probably worth hearing from people in other domains. In my experience in bioimaging, for both raw images and segmentations, 0 is conventionally used as a background label and people very often rely on application default values (which is probably where the convention came from in the first place).

cc @jni

Maybe in zarr v4 we can have proper support for nullable types to avoid this problem altogether ;)

TomAugspurger added the bug (Potential issues with the zarr-python library) label Sep 27, 2024

jhamman added the V3 (Affects the v3 branch) label Sep 28, 2024

jhamman added this to the 3.0.0 milestone Sep 28, 2024

TomAugspurger mentioned this issue Sep 29, 2024

Zarr Python 3 tracking issue pydata/xarray#9515

