to_unstacked_dataset unable to reconstruct original dimensions #9541

aFarchi · 2024-09-24T10:19:15Z

What is your issue?

Hello,

I am trying to stack and then unstack a dataset. According to the documentation, the round trip should recover the original dataset, but that is not what I observe.

>>> import xarray as xr
>>> import numpy as np
>>> ds = xr.Dataset(
...     data_vars=dict(
...         var_a=(('sample', 'dim_a'), np.random.randn(2, 1)),
...         var_b=(('sample', 'dim_b'), np.random.randn(2, 4)),
...     ),
... )
>>> ds
<xarray.Dataset> Size: 80B
Dimensions:  (sample: 2, dim_a: 1, dim_b: 4)
Dimensions without coordinates: sample, dim_a, dim_b
Data variables:
    var_a    (sample, dim_a) float64 16B -0.5696 -0.8579
    var_b    (sample, dim_b) float64 64B 0.0585 -1.219 1.702 ... 1.244 0.7397

Stacking the dataset looks correct:

>>> stacked = ds.to_stacked_array('output_feature', sample_dims=('sample',))
>>> stacked
<xarray.DataArray 'var_a' (sample: 2, output_feature: 5)> Size: 80B
array([[-0.56958696,  0.058498  , -1.21899832,  1.70180735, -0.06674016],
       [-0.85787833,  1.86201164, -1.71474761,  1.24400992,  0.73965765]])
Coordinates:
  * output_feature  (output_feature) object 40B MultiIndex
  * variable        (output_feature) <U5 100B 'var_a' 'var_b' ... 'var_b'
  * dim_a           (output_feature) object 40B 0 nan nan nan nan
  * dim_b           (output_feature) object 40B nan 0 1 2 3
Dimensions without coordinates: sample

But unstacking seems incorrect:

>>> stacked.to_unstacked_dataset('output_feature')
<xarray.Dataset> Size: 176B
Dimensions:         (sample: 2, output_feature: 4)
Coordinates:
  * output_feature  (output_feature) object 32B MultiIndex
  * dim_a           (output_feature) object 32B nan nan nan nan
  * dim_b           (output_feature) object 32B 0 1 2 3
Dimensions without coordinates: sample
Data variables:
    var_a           (sample) float64 16B -0.5696 -0.8579
    var_b           (sample, output_feature) float64 64B 0.0585 ... 0.7397

var_a should have dimensions (sample, dim_a) and var_b should have (sample, dim_b).

The issue seems even worse when len(dim_a)>1:

>>> import xarray as xr
>>> import numpy as np
>>> ds = xr.Dataset(
...     data_vars=dict(
...         var_a=(('sample', 'dim_a'), np.random.randn(2, 2)),
...         var_b=(('sample', 'dim_b'), np.random.randn(2, 4)),
...     ),
... )
>>> stacked = ds.to_stacked_array('output_feature', sample_dims=('sample',))
>>> stacked.to_unstacked_dataset('output_feature', level=0)
<xarray.Dataset> Size: 336B
Dimensions:         (output_feature: 6, sample: 2)
Coordinates:
  * output_feature  (output_feature) object 48B MultiIndex
  * dim_a           (output_feature) object 48B 0 1 nan nan nan nan
  * dim_b           (output_feature) object 48B nan nan 0 1 2 3
Dimensions without coordinates: sample
Data variables:
    var_a           (sample, output_feature) float64 96B 0.6215 -1.72 ... nan
    var_b           (sample, output_feature) float64 96B nan nan ... 0.2421

Could it be related to the level argument of to_unstacked_dataset()?
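
For reference, the round trip expected above can be sketched by slicing the stacked array by position instead of going through the MultiIndex. This is only a rough sketch: `unstack_like` is a hypothetical helper, not part of xarray. It assumes the original dataset is still available as a template, that every variable has at most one non-sample dimension, and that the stacked columns follow the order of the data variables, as in the output shown above.

```python
import numpy as np
import xarray as xr


def unstack_like(stacked, template, new_dim, sample_dims):
    """Rebuild a Dataset from `stacked`, taking names, dims and sizes from `template`.

    Hypothetical helper (not an xarray API). Assumes at most one non-sample
    dimension per variable and columns ordered like `template.data_vars`.
    """
    sample_dims = tuple(sample_dims)
    # Put the stacked dimension last so columns can be sliced by position.
    values = stacked.transpose(*sample_dims, new_dim).values
    sample_shape = values.shape[:-1]

    data_vars = {}
    offset = 0
    for name, var in template.data_vars.items():
        feature_dims = tuple(d for d in var.dims if d not in sample_dims)
        if len(feature_dims) > 1:
            raise ValueError(f"{name!r} has more than one non-sample dimension")
        feature_shape = tuple(var.sizes[d] for d in feature_dims)
        width = int(np.prod(feature_shape, dtype=int))
        # Take this variable's columns and restore its own dimension.
        block = values[..., offset:offset + width].reshape(sample_shape + feature_shape)
        restored = xr.DataArray(block, dims=sample_dims + feature_dims)
        # Match the dimension order of the original variable.
        data_vars[name] = restored.transpose(*var.dims).variable
        offset += width
    return xr.Dataset(data_vars, coords=template.coords)


ds = xr.Dataset(
    data_vars=dict(
        var_a=(("sample", "dim_a"), np.random.randn(2, 1)),
        var_b=(("sample", "dim_b"), np.random.randn(2, 4)),
    ),
)
stacked = ds.to_stacked_array("output_feature", sample_dims=("sample",))
restored = unstack_like(stacked, ds, "output_feature", ("sample",))
```

With this helper, `restored` keeps `dim_a` even when its length is 1, which is the behavior I expected from to_unstacked_dataset().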

Note that I used the latest released version for this test:

>>> xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:07:06) [Clang 17.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.9.0
pandas: 2.2.3
numpy: 2.1.1
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 74.1.2
pip: 24.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

welcome · 2024-09-24T10:19:18Z

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

aFarchi added the "needs triage" label (Issue that has not been reviewed by xarray team member) · Sep 24, 2024
