Rechunking retrospective runs to more approachable chunks in Zarr output.

NCAR/rechunk_retro_nwm_v21

Rechunking National Water Model v2.1 retrospective simulations to more (cloud-)approachable chunks in Zarr format.

Authors: James McCreight (NCAR), Ishita Srivastava (NCAR), Rich Signell (USGS), and Yongxin Zhang (NCAR)

Overview

The National Water Model (NWM) version 2.1 retrospective simulation spans 42 years (Feb 1979 - Dec 2020). The model domain is the continental US. Inputs are hourly and outputs are provided at hourly or 3-hourly resolution. Additional details are provided below and in this retrospective overview document.

The model writes a separate file at each output time. Within those individual files the data are not chunked in space. To open a full time series at a single point or over a sub-region, a user would therefore have to read the entire data set: a very inefficient data access pattern for a very common use case.

Enter rechunking. The goal of rechunking this model dataset is to provide chunks (data pieces partitioning the dimensions of the data) that support efficient data access for most use cases. When a specific, intensive use case would benefit from a different chunk scheme than the one provided, the provided datasets can be rechunked to accommodate that pattern. Examples of use cases are supplied, including re-rechunking.
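To make the access-pattern argument concrete, here is a rough sketch that counts how many chunks (and stored values) a single-point, full-length time series read touches under two layouts. The dimension sizes and chunk shapes are hypothetical round numbers chosen for illustration; they are not the actual NWM dimensions or the chunk schemes used in these stores.

```python
import math

def chunks_touched(request, chunk_shape):
    """Number of chunks overlapped by a request that starts at index 0 in each dimension."""
    return math.prod(math.ceil(r / c) for r, c in zip(request, chunk_shape))

def values_read(request, chunk_shape):
    """Stored values that must be read, since chunks are read whole."""
    return chunks_touched(request, chunk_shape) * math.prod(chunk_shape)

# Hypothetical (time, feature) sizes, NOT the real NWM dimensions.
n_time, n_feature = 10_000, 50_000

# The common use case: every time step at one location.
request = (n_time, 1)

# Layout resembling the raw output: one file per time, not chunked in space.
per_timestep = (1, n_feature)
# Illustrative rechunked layout: long in time, short in space.
rechunked = (1_000, 100)

for name, shape in [("per-timestep", per_timestep), ("rechunked", rechunked)]:
    print(name, chunks_touched(request, shape), values_read(request, shape))
```

With these made-up sizes, the per-timestep layout forces a read of every value in the data set to recover one time series, while the rechunked layout reads only the handful of chunks that contain the requested point.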

Data overview

Six separate Zarr stores have been created, corresponding closely to the model output files. The time resolution is noted for each product.

  • lakeout: Output from the lake model (hourly, 5.5GB)
  • gwout: Output from the groundwater model (hourly, 1.7TB)
  • chrtout: Output from the streamflow model (hourly, 1.4TB)
  • precip: Input precipitation fields from the OWP AORC forcing data set (hourly, 2.0TB)
  • ldasout: Output from the NoahMP land surface model (3-hourly, pending)
  • rtout: Output from the overland and subsurface terrain routing model (3-hourly, pending)

Additional detail on these stores (variables contained and space-time information) is provided in the data description section below and via accompanying notebooks.

Data Access

Cloud on AWS

Landing page
NWM v2.1 Zarr Bucket

NCAR glade

For those with access to NCAR computing resources, these can alternatively be found at the following paths:

/glade/campaign/ncar/USGS_Water/NWMV21_retro_zarr/lakeout.zarr
/glade/campaign/ncar/USGS_Water/NWMV21_retro_zarr/gwout.zarr
/glade/campaign/ncar/USGS_Water/NWMV21_retro_zarr/chrtout.zarr
/glade/campaign/ncar/USGS_Water/NWMV21_retro_zarr/precip.zarr
/glade/campaign/ncar/USGS_Water/NWMV21_retro_zarr/ldasout.zarr
/glade/campaign/ncar/USGS_Water/NWMV21_retro_zarr/rtout.zarr
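As a minimal sketch, one of these stores could be opened lazily with xarray.open_zarr roughly as follows. The helper names `store_path` and `open_store` are hypothetical and not part of this repository; the sketch assumes xarray and zarr are installed and that the glade file system is mounted.

```python
# Root directory taken from the paths listed above.
GLADE_ROOT = "/glade/campaign/ncar/USGS_Water/NWMV21_retro_zarr"

# Store names from the data overview.
STORES = ("lakeout", "gwout", "chrtout", "precip", "ldasout", "rtout")

def store_path(name, root=GLADE_ROOT):
    """Build the path to one store (hypothetical helper)."""
    if name not in STORES:
        raise ValueError(f"unknown store: {name!r}")
    return f"{root}/{name}.zarr"

def open_store(name, root=GLADE_ROOT):
    """Open a store lazily; no array data is read until values are requested."""
    import xarray as xr  # imported here so the path helper works without xarray
    return xr.open_zarr(store_path(name, root))

# Usage on a machine with glade access:
# ds = open_store("chrtout")
# print(ds)
```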

Data Description

A description of the data as accessed by xarray.open_zarr can be found in the accompanying notebook (html) (jupyter notebook). This includes metadata, chunking schemes, and data types for all variables and coordinates.

Further details are provided in this accompanying notebook (html) (jupyter notebook), including the xarray dataset reports and the xarray and Zarr details for each variable: storage data types, levels of compression, and other information. Note that differences in data types between xarray and Zarr result from the use of scale_factor and add_offset metadata in the underlying Zarr data set, which xarray uses to recover floating-point variables from the stored integers.
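The integer-to-float recovery follows the CF packing convention, in which the decoded value equals the stored integer times scale_factor plus add_offset. The sketch below illustrates that arithmetic with NumPy. The scale_factor, add_offset, and sample values are hypothetical and are not taken from the actual stores; the `encode` and `decode` helpers are illustrative, not xarray functions.

```python
import numpy as np

def encode(values, scale_factor, add_offset, dtype=np.int32):
    """Pack floats into integers: stored = round((value - add_offset) / scale_factor)."""
    return np.round((values - add_offset) / scale_factor).astype(dtype)

def decode(stored, scale_factor, add_offset):
    """Unpack integers into floats: value = stored * scale_factor + add_offset."""
    return stored.astype(np.float64) * scale_factor + add_offset

# Hypothetical packing parameters and sample values.
scale_factor, add_offset = 0.01, 0.0
values = np.array([0.0, 1.25, 103.47])

stored = encode(values, scale_factor, add_offset)
recovered = decode(stored, scale_factor, add_offset)

print(stored.dtype, stored)        # integer type, as Zarr would report it
print(recovered.dtype, recovered)  # float type, as xarray would report it
```

Packing is lossy to the precision of scale_factor, so the recovered values match the originals only to within half a scale_factor step.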

Use Cases

  • Example of retrieving and plotting a single time series from the chrtout store (html) (jupyter notebook)

  • Example of subsetting and rechunking the store to optimize a data access pattern: selecting only streamflow gages from chrtout (html) (jupyter notebook)

Code overview

An overview of the code used can be found in README_code.md.
