sfc_climo_gen - use scale parameter to read scaled datasets #958

Open
wants to merge 21 commits into
base: develop

sanatcumar
Collaborator

@sanatcumar sanatcumar commented Jun 3, 2024

DESCRIPTION OF CHANGES:

Update sfc_climo_gen to apply an optional scaling parameter when reading surface fields from scaled datasets.
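
For readers unfamiliar with scaled datasets, here is a minimal sketch of what applying a scale parameter on read usually means. It assumes the common netCDF/CF packing convention (physical value = packed value * scale_factor + add_offset). The function name `unpack_scaled`, the sample values, and the fill value are hypothetical; the Fortran changes in this PR are not shown in this thread and may differ.

```python
from typing import List, Optional, Sequence


def unpack_scaled(
    raw: Sequence[float],
    scale_factor: float = 1.0,
    add_offset: float = 0.0,
    fill_value: Optional[float] = None,
) -> List[Optional[float]]:
    """Convert packed values to physical values (hypothetical helper).

    Assumes CF-style packing: physical = raw * scale_factor + add_offset.
    Points equal to fill_value are returned as None instead of being scaled.
    """
    unpacked: List[Optional[float]] = []
    for value in raw:
        if fill_value is not None and value == fill_value:
            unpacked.append(None)  # keep missing points out of the scaling
        else:
            unpacked.append(value * scale_factor + add_offset)
    return unpacked


if __name__ == "__main__":
    # Illustrative packed integers only; not taken from the real LAI file.
    packed = [0, 25, 130, -999]
    print(unpack_scaled(packed, scale_factor=0.1, fill_value=-999))
```

With the default `scale_factor=1.0` and `add_offset=0.0` the values pass through unchanged, which mirrors the "optional" nature of the parameter described above.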

TESTS CONDUCTED:

If there are changes to the build or source code, the tests below must be conducted. Contact a repository manager if you need assistance.

  • Compile branch on all Tier 1 machines using Intel (Orion, Jet, Hera, Hercules and WCOSS2).
  • Compile branch on Hera using GNU.
  • Compile branch in 'Debug' mode on WCOSS2.
  • Run unit tests locally on any Tier 1 machine.
  • Run relevant consistency tests locally on all Tier 1 machines.

Describe any additional tests performed.

DEPENDENCIES:

Add any links to pending PRs that are required prior to merging this PR. For example:

ufs-community/UFS_UTILS/pull/<pr_number>

DOCUMENTATION:

If this PR is contributing new capabilities that need to be documented, please also include updates to the RST files in the docs/source directory as supporting material.

ISSUE:

Fixes #957.

@GeorgeGayno-NOAA
Collaborator

@sanatcumar - there should be a list of tests in the description section (added automatically by this file: https://github.com/ufs-community/UFS_UTILS/blob/develop/.github/PULL_REQUEST_TEMPLATE).

I will add them manually.

@GeorgeGayno-NOAA GeorgeGayno-NOAA changed the title from "use scale parameter to read scaled datasets" to "sfc_climo_gen - use scale parameter to read scaled datasets" Jun 3, 2024
@GeorgeGayno-NOAA GeorgeGayno-NOAA self-requested a review June 3, 2024 19:02
@GeorgeGayno-NOAA
Collaborator

@sanatcumar - what is the status of this PR?

@sanatcumar
Collaborator Author

sanatcumar commented Jun 24, 2024

Hi George, I was waiting to use these new data sets in some test runs before doing this. This also involves adding a new data set to the fix directories. Could you refresh my memory on whom to contact, or where to raise a new dependency issue?

I see you have added the test list from the template. Thanks. I will do some quick checks and request a review soon.

Cheers and thanks again.

@GeorgeGayno-NOAA
Collaborator

If you want new 'fixed' data added to the official baselines, make your request here (fixed file update): https://github.com/NOAA-EMC/global-workflow/issues/new/choose

Collaborator

@GeorgeGayno-NOAA GeorgeGayno-NOAA left a comment

Your branch looks out of date. Merge the latest updates from 'develop'.

Also, I see a branch called 'use_scale' in the authoritative fork. What is that? All branches should be in user forks.

@sanatcumar
Collaborator Author

Hi George, I just synced to the latest develop and also deleted the branch in the authoritative fork. Thanks for pointing it out.

@GeorgeGayno-NOAA
Collaborator

@sanatcumar - did you perform any of the tests under "TESTS CONDUCTED"? If you don't have access to a machine listed, let me know and I can run the tests for you.

@sanatcumar
Collaborator Author

sanatcumar commented Aug 22, 2024

Hi @GeorgeGayno-NOAA, thanks.
I have run it on Hera and can run it on Orion. I do not have access to the others.
I changed my phone, so my RSA needs to be set up again for access to Orion. I am working on it.

@GeorgeGayno-NOAA
Collaborator

@sanatcumar - what is the status of this PR?

@sanatcumar
Collaborator Author

Hi George,
I am running into out-of-memory issues when running this on Orion. I tried setting the bigmem option, and I tried increasing the nodes and decreasing the tasks per node, to no avail. Do you have any suggestions?
Sanath

@GeorgeGayno-NOAA
Collaborator

I would need to see your test script. How big is this new LAI data?

@sanatcumar
Collaborator Author

Hi @GeorgeGayno-NOAA, the file size is about 10 GB. You can see the log file at /home/skumar/UFS_UTILS/driver_scripts/log.fv3_grid_driver. The relevant part is:

"
/home/skumar/UFS_UTILS/driver_scripts/../fix/sfc_climo/LAI_climo_pnnl.nc
- CALL FieldScatter FOR SOURCE GRID DATA.
- CALL FieldScatter FOR SOURCE GRID DATA.
..........
- CALL FieldRegridStore.
slurmstepd: error: Detected 1 oom_kill event in StepId=19020013.0. Some of the step tasks have been OOM Killed.
srun: error: orion-25-65: task 0: Out Of Memory
"

I used:
#SBATCH --nodes=6 --ntasks-per-node=12
#SBATCH --partition=bigmem
I also tried reducing the nodes to 2 and ntasks to 6, to no avail.

Hope you can help me resolve this.
Cheers and thanks,
Sanath

@GeorgeGayno-NOAA
Collaborator

You can try increasing the nodes: try 8 nodes and 12 tasks per node. The LAI uses conservative interpolation, which is likely very expensive. If increasing the nodes does not work, you may have to upscale the LAI dataset to 0.03 or 0.05 degrees, as we have done with the soil and vegetation type datasets. Or you can try a different interpolation method.
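
As a rough illustration of the upscaling idea, the sketch below coarsens a regular lat/lon grid by averaging fixed-size blocks of cells and skipping missing values. It is only an assumption about how such upscaling could be done: the function name `coarsen_block_mean`, the coarsening factor, and the missing-value flag are hypothetical, the native resolution of the LAI file is not stated in this thread, and the method used for the soil and vegetation type datasets may be different (categorical fields in particular would not be averaged).

```python
from typing import List, Optional, Sequence


def coarsen_block_mean(
    grid: Sequence[Sequence[float]],
    factor: int,
    missing: Optional[float] = None,
) -> List[List[Optional[float]]]:
    """Average factor x factor blocks of a regular grid (hypothetical helper).

    Cells equal to `missing` are skipped; a block with no valid cells is
    set to `missing`. A plain mean is used, with no area weighting.
    """
    nrows, ncols = len(grid), len(grid[0])
    if nrows % factor or ncols % factor:
        raise ValueError("grid dimensions must be divisible by factor")

    coarse: List[List[Optional[float]]] = []
    for i in range(0, nrows, factor):
        row: List[Optional[float]] = []
        for j in range(0, ncols, factor):
            # Collect the valid cells of the current block.
            valid = [
                grid[ii][jj]
                for ii in range(i, i + factor)
                for jj in range(j, j + factor)
                if grid[ii][jj] != missing
            ]
            row.append(sum(valid) / len(valid) if valid else missing)
        coarse.append(row)
    return coarse


if __name__ == "__main__":
    # Tiny 4x4 example with -1 as an illustrative missing-value flag.
    fine = [
        [1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, -1, -1],
        [3, 3, -1, -1],
    ]
    print(coarsen_block_mean(fine, factor=2, missing=-1))
```

Coarsening by a factor of `factor` in each direction reduces the number of source points by `factor` squared, which is what would shrink the cost of the conservative regridding step.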

Since we are running into these memory issues more frequently, perhaps we should replace sfc_climo_gen with a program that does not rely on ESMF. But that is a 'day two' project. ESMF is convenient when interpolating from tiles to tiles. But interpolating from lat/lon to tiles could be done using the NCEPLIBS IPOLATES library.

Development

Successfully merging this pull request may close these issues.

Update sfc_climo gen to read scaled data sets
2 participants