Fix how forcing data is read in #56

Open
wants to merge 4 commits into main

Conversation

mnlevy1981
Collaborator

Using xr.open_mfdataset() in data_wrangling.py caused our forcing dataset to be chunked in time. This didn't play nicely with xr.map_blocks(), resulting in the wrong forcing data being available when reading multiple netCDF files (such as the 0.1 degree POP time series files). Using xr.open_dataset() and then merging all the datasets does not introduce chunking in the time dimension, so xr.map_blocks() receives the entire forcing dataset.
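For reference, a minimal sketch of the two patterns; the file glob (forcing.*.nc) is illustrative, not the repo's actual naming:

```python
import glob

import xarray as xr

# The old pattern: xr.open_mfdataset() returns a dask-backed dataset
# chunked along the concatenation dimension (time), so each
# xr.map_blocks() block sees only a slice of the forcing data.
#   forcing = xr.open_mfdataset("forcing.*.nc")

# The pattern this PR switches to: open each file eagerly and merge,
# so the combined dataset carries no time chunking and xr.map_blocks()
# receives the entire forcing dataset.
files = sorted(glob.glob("forcing.*.nc"))
forcing = xr.merge([xr.open_dataset(f) for f in files])
```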

Note that this increases the memory footprint, especially in the Run Multiple Years (highres) notebook. I've had trouble getting enough resources on Casper to run two years at a time.

@rmshkv -- do you want to play with this branch and see if you can get two years per run with the 0.1 degree forcing? Or should we bring it in as-is and then figure out how to update the notebook later?

mnlevy1981 requested a review from rmshkv on August 31, 2023
To decrease memory usage, create temporary forcing stream files that only include years that might be used by the current run (based on start_year and nyears): that is, start_year - 1, start_year, ..., start_year + nyears - 1 (end_year), and start_year + nyears.
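A minimal sketch of that year window, assuming start_year and nyears are integers; the helper name forcing_years is hypothetical, not a function from the repo:

```python
def forcing_years(start_year: int, nyears: int) -> list[int]:
    # Pad one year on each side of the run span
    # [start_year, start_year + nyears - 1].
    return list(range(start_year - 1, start_year + nyears + 1))

# e.g. a 2-year run starting in year 5 needs forcing years 4 through 7
print(forcing_years(5, 2))  # [4, 5, 6, 7]
```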
Also update Run Multiple Years (highres).ipynb to use only one year of forcing at a time (by creating temporary forcing stream files).