Fix how forcing data is read in #56

Open
wants to merge 4 commits into main

Conversation

mnlevy1981
Collaborator

Using xr.open_mfdataset() in data_wrangling.py caused our forcing dataset to be chunked in time. This didn't play nicely with xr.map_blocks(), resulting in the wrong forcing data being available when reading multiple netCDF files (such as the 0.1 degree POP time series files). Using xr.open_dataset() and then merging all the datasets does not introduce chunking in the time dimension, so xr.map_blocks() receives the entire forcing dataset.
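For reference, a minimal sketch of the two patterns; the file glob (forcing.*.nc) is illustrative, not the repo's actual naming:

```python
import glob

import xarray as xr

# The old pattern: xr.open_mfdataset() returns a dask-backed dataset
# chunked along the concatenation dimension (time), so each
# xr.map_blocks() block sees only a slice of the forcing data.
#   forcing = xr.open_mfdataset("forcing.*.nc")

# The pattern this PR switches to: open each file eagerly and merge,
# so the combined dataset carries no time chunking and xr.map_blocks()
# receives the entire forcing dataset.
files = sorted(glob.glob("forcing.*.nc"))
forcing = xr.merge([xr.open_dataset(f) for f in files])
```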

Note that this increases the memory footprint, especially in the Run Multiple Years (highres) notebook. I've had trouble getting enough resources on Casper to run two years at a time.

@rmshkv -- do you want to play with this branch and see if you can get two years per run with the 0.1 degree forcing? Or should we bring it in as-is and then figure out how to update the notebook later?

mnlevy1981 requested a review from rmshkv on August 31, 2023
To decrease memory usage, create temporary forcing stream files that only include years that might be used by the current run (based on start_year and nyears): that is, start_year - 1, start_year, ..., start_year + nyears - 1 (end_year), and start_year + nyears.
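A minimal sketch of that year window, assuming start_year and nyears are integers; the helper name forcing_years is hypothetical, not a function from the repo:

```python
def forcing_years(start_year: int, nyears: int) -> list[int]:
    # Pad one year on each side of the run span
    # [start_year, start_year + nyears - 1].
    return list(range(start_year - 1, start_year + nyears + 1))

# e.g. a 2-year run starting in year 5 needs forcing years 4 through 7
print(forcing_years(5, 2))  # [4, 5, 6, 7]
```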
Also update Run Multiple Years (highres).ipynb to use only one year of forcing at a time (by creating temporary forcing stream files).