Optimize the reading of ensembles and setup for global multiscale runs #594

jderber-NOAA · 2023-07-26T14:10:23Z

This update improves the efficiency of the GSI, especially for multiscale runs. Details can be found in issue #585.

The runs produce identical results except when ensembles are used. Identical results can also be produced with ensembles by changing 3 lines of code. As written, these lines zero out negative moistures before creating virtual temperatures and use the original sensible temperatures rather than ones recomputed from virtual temperatures (which were themselves created from the original sensible temperatures).

All regression tests passed except for differences caused by these 3 lines. When those 3 lines were changed back, all regression tests passed. The changes in results due to these 3 lines were very minor.
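As a rough illustration of why those 3 lines can change results at round-off level, here is a minimal Python sketch (not the GSI Fortran). The constant `FV = 0.608` and both function names are assumptions made for this example; the GSI constant may differ in its later digits.

```python
# Assumed approximation of Rv/Rd - 1; illustrative only, not taken from GSI.
FV = 0.608


def virtual_temperature(t_sensible, q):
    """Virtual temperature from sensible temperature and specific humidity."""
    q = max(q, 0.0)  # zero out negative moisture before the conversion
    return t_sensible * (1.0 + FV * q)


def sensible_from_virtual(tv, q):
    """Invert the conversion above (hypothetical helper)."""
    q = max(q, 0.0)
    return tv / (1.0 + FV * q)


t = 287.3    # original sensible temperature (K)
q = 0.0123   # specific humidity (kg/kg)

tv = virtual_temperature(t, q)
t_back = sensible_from_virtual(tv, q)

# The round trip T -> Tv -> T can differ from the original T in the last bits,
# which is why reusing the original sensible temperature avoids extra round-off.
print(t_back - t)

# A slightly negative moisture is clipped, so Tv equals T exactly.
print(virtual_temperature(250.0, -1.0e-6) == 250.0)
```

Any difference printed by the first line is at most a few units in the last place, which matches the "very minor" changes described above.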

I performed all testing on Hera.

See issue #585 for examples of the speed-ups that resulted from this change.

Fixes #585

Checklist

[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[x] New and existing tests pass with my changes
Any dependent changes have been merged and published

DUE DATE for this PR is 9/6/2023. If this PR is not merged into develop by this date, the PR will be closed and returned to the developer.

jderber-NOAA · 2023-09-03T20:25:02Z

Updated to the head of the trunk and removed a commented-out line from build.sh.

jderber-NOAA · 2023-09-04T19:45:02Z

Regression tests were rerun with this updated version of the code. All regression tests passed except 4denvar. The last update to the trunk appears to have introduced a very small change into the 4denvar results. The difference is at the scale of round-off, and there is no difference in the initial penalty. The first 3 iterations of the control and update are given below.

< cost,grad,step,b,step? = 1 0 6.592319865747931181E+05 1.700824233239468640E+03 1.057506532852620307E+00 0.000000000000000000E+00 good
< cost,grad,step,b,step? = 1 1 6.555948920937654329E+05 2.114103169411071576E+03 1.927472355822770655E+00 1.302893266057407962E+00 good
< cost,grad,step,b,step? = 1 2 6.469553380174754420E+05 1.349404725181090953E+03 2.810796397253191525E+00 1.212777155871386014E+00 good

cost,grad,step,b,step? = 1 0 6.592319865747931181E+05 1.700824233239468640E+03 1.057506532852620751E+00 0.000000000000000000E+00 good
cost,grad,step,b,step? = 1 1 6.555948920937654329E+05 2.114103169411072486E+03 1.927472355822761774E+00 1.302893266057409294E+00 good
cost,grad,step,b,step? = 1 2 6.469553380174754420E+05 1.349404725181082085E+03 2.810796397253162660E+00 1.212777155871367807E+00 good

I am still trying to find the reason for the small difference.
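To put a number on "the scale of round-off", the following Python sketch parses two of the log lines quoted above (iteration 2 from each block) and reports the largest relative difference among the cost, grad, step and b fields. The helper names are hypothetical and not part of the GSI regression scripts.

```python
def parse_iteration(line):
    """Return [cost, grad, step, b] as floats from a 'cost,grad,step,b,step?' line."""
    fields = line.split("=", 1)[1].split()
    # fields: outer loop, iteration, cost, grad, step, b, status flag
    return [float(value) for value in fields[2:6]]


def max_relative_difference(line_a, line_b):
    """Largest relative difference over the non-zero numeric fields."""
    a = parse_iteration(line_a)
    b = parse_iteration(line_b)
    return max(abs(x - y) / abs(x) for x, y in zip(a, b) if x != 0.0)


first = ("< cost,grad,step,b,step? = 1 2 6.469553380174754420E+05 "
         "1.349404725181090953E+03 2.810796397253191525E+00 "
         "1.212777155871386014E+00 good")
second = ("cost,grad,step,b,step? = 1 2 6.469553380174754420E+05 "
          "1.349404725181082085E+03 2.810796397253162660E+00 "
          "1.212777155871367807E+00 good")

worst = max_relative_difference(first, second)
print(f"largest relative difference: {worst:.2e}")
```

The result is on the order of 1e-14, a few tens of double-precision machine epsilons, which is consistent with accumulated round-off rather than an algorithmic change.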

RussTreadon-NOAA · 2023-09-07T13:41:13Z

WCOSS2 ctests
Install jderber-NOAA:optimize3 at 3e918e1 on Dogwood. Run ctests with following results

russ.treadon@dlogin08:/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr594/build> ctest -j 9
Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr594/build
    Start 1: global_3dvar
    Start 2: global_4dvar
    Start 3: global_4denvar
    Start 4: hwrf_nmm_d2
    Start 5: hwrf_nmm_d3
    Start 6: rtma
    Start 7: rrfs_3denvar_glbens
    Start 8: netcdf_fv3_regional
    Start 9: global_enkf
1/9 Test #8: netcdf_fv3_regional ..............***Failed  483.81 sec
2/9 Test #7: rrfs_3denvar_glbens ..............   Passed  605.92 sec
3/9 Test #9: global_enkf ......................   Passed  678.08 sec
4/9 Test #5: hwrf_nmm_d3 ......................   Passed  797.90 sec
5/9 Test #4: hwrf_nmm_d2 ......................   Passed  1026.16 sec
6/9 Test #6: rtma .............................   Passed  1272.35 sec
7/9 Test #3: global_4denvar ...................***Failed  1503.70 sec
8/9 Test #2: global_4dvar .....................   Passed  1744.93 sec
9/9 Test #1: global_3dvar .....................   Passed  1983.04 sec

78% tests passed, 2 tests failed out of 9

Total Test time (real) = 1983.04 sec

The following tests FAILED:
          3 - global_4denvar (Failed)
          8 - netcdf_fv3_regional (Failed)
Errors while running CTest

The netcdf_fv3_regional failure is due to

The memory for netcdf_fv3_regional_loproc_updat is 352488 KBs.  This has exceeded maximum allowable memory of 238774 KBs, resulting in Failure memthresh of the regression test.

A check of the task 0 maximum resident set sizes for the updat (jderber-NOAA:optimize3) and contrl (develop) confirms that the loproc_updat uses more memory than loproc_contrl

netcdf_fv3_regional_hiproc_contrl/stdout:The maximum resident set size (KB)                   = 364344
netcdf_fv3_regional_hiproc_updat/stdout:The maximum resident set size (KB)                   = 364280
netcdf_fv3_regional_loproc_contrl/stdout:The maximum resident set size (KB)                   = 217068
netcdf_fv3_regional_loproc_updat/stdout:The maximum resident set size (KB)                   = 352488

For the other ctests, the loproc_updat maximum resident set size is much closer to that of loproc_contrl. It's not clear why the difference is so much larger for netcdf_fv3_regional. This failure, however, is not viewed as a fatal fail.
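The numbers in the memthresh message are consistent with a limit of 10% above the control's memory. That tolerance is inferred here from the logged values, not taken from the regression scripts, so the Python sketch below treats it as an assumption.

```python
# Values copied from the stdout excerpts above (KB).
control_kb = 217068          # netcdf_fv3_regional_loproc_contrl
update_kb = 352488           # netcdf_fv3_regional_loproc_updat
reported_limit_kb = 238774   # "maximum allowable memory" in the failure message

# Assumption: the limit is the control memory plus a 10% tolerance.
ASSUMED_TOLERANCE = 1.10
derived_limit_kb = int(control_kb * ASSUMED_TOLERANCE)

growth = update_kb / control_kb

print(derived_limit_kb == reported_limit_kb)   # the assumed rule reproduces the limit
print(f"update uses {growth:.2f}x the control memory")
print("memthresh exceeded" if update_kb > reported_limit_kb else "within limit")
```

Under that assumed rule the update would need to stay below roughly 1.1 times the control, while the logged value is about 1.6 times the control.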

The global_4denvar failure is due to

The results (penalty) between the two runs are nonreproducible,
thus the regression test has Failed on cost for global_4denvar_loproc_updat and global_4denvar_loproc_contrl analyses.

The case has Failed the scalability test.
The slope for the update (54.378945 seconds per node) is less than that for the control (59.441120 seconds per node).

A check of the wall times shows that the updat code runs faster than the contrl

global_4denvar_hiproc_contrl/stdout:The total amount of wall time                        = 278.396694
global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 262.273809
global_4denvar_loproc_contrl/stdout:The total amount of wall time                        = 337.837814
global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 305.776965

This is consistent with the optimization purpose of this PR. This is not a fatal fail.
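The size of the speed-up can be read directly from the wall times above. This Python sketch parses the grep-style lines and prints the percentage reduction for each task count; the parsing helper is hypothetical and only assumes the line layout shown in this comment.

```python
import re

GREP_OUTPUT = """\
global_4denvar_hiproc_contrl/stdout:The total amount of wall time                        = 278.396694
global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 262.273809
global_4denvar_loproc_contrl/stdout:The total amount of wall time                        = 337.837814
global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 305.776965
"""

LINE_PATTERN = re.compile(
    r"(\w+)_(hiproc|loproc)_(contrl|updat)/stdout:.*=\s*([\d.]+)"
)


def wall_times(text):
    """Map (task count label, run label) to wall time in seconds."""
    times = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            _, procs, run, seconds = match.groups()
            times[(procs, run)] = float(seconds)
    return times


times = wall_times(GREP_OUTPUT)
reduction = {}
for procs in ("hiproc", "loproc"):
    contrl = times[(procs, "contrl")]
    updat = times[(procs, "updat")]
    reduction[procs] = 100.0 * (contrl - updat) / contrl
    print(f"{procs}: {reduction[procs]:.1f}% less wall time")
```

For this single run the update is roughly 6% faster with the larger task count and roughly 9.5% faster with the smaller one; a single timing on a shared machine is only indicative.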

The non-reproducible results between the updat and contrl are more puzzling. The initial total penalty and gradient are identical between the two codes. Differences show up in the step size for the second iteration of the first outer loop. 15 of the 19 printed digits are identical. Differences in the last four digits are at the level of real(8) numerical round-off.

updat

Initial cost function =  6.592319865747931181E+05
Initial gradient norm =  1.700824233239470232E+03
cost,grad,step,b,step? =   1   0  6.592319865747931181E+05  1.700824233239470232E+03  1.057506532852622527E+00  0.000000000000000000E+00  good
cost,grad,step,b,step? =   1   1  6.555948920937654329E+05  2.114103169411077488E+03  1.927472355822775318E+00  1.302893266057414179E+00  good

contrl

Initial cost function =  6.592319865747931181E+05
Initial gradient norm =  1.700824233239470232E+03
cost,grad,step,b,step? =   1   0  6.592319865747931181E+05  1.700824233239470232E+03  1.057506532852622527E+00  0.000000000000000000E+00  good
cost,grad,step,b,step? =   1   1  6.555948920937654329E+05  2.114103169411077488E+03  1.927472355822779093E+00  1.302893266057414179E+00  good

John found similar behavior in his tests.
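The digit count quoted above can be checked mechanically. This Python sketch compares the two printed step sizes from iteration 1 and counts their identical leading digits; it assumes both values are printed with the same exponent, which holds for the lines shown.

```python
import sys


def matching_leading_digits(a, b):
    """Count identical leading significant digits of two printed mantissas.

    Assumes both strings use the same exponent.
    """
    digits_a = a.split("E")[0].replace(".", "")
    digits_b = b.split("E")[0].replace(".", "")
    count = 0
    for x, y in zip(digits_a, digits_b):
        if x != y:
            break
        count += 1
    return count


updat_step = "1.927472355822775318E+00"
contrl_step = "1.927472355822779093E+00"

matched = matching_leading_digits(updat_step, contrl_step)
print(matched)               # identical leading digits in the two printed values
print(sys.float_info.dig)    # decimal digits a double reliably preserves
```

Both numbers come out as 15, so the two step sizes agree to about the precision that a 64-bit real can guarantee.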

jderber-NOAA · 2023-09-07T16:51:37Z

Working on the nonreproducible issue. I have not found anything that I changed that should affect the round-off. Hope to find something soon. Surprised at the large increase in memory. Will look at it after finding the cause of the nonreproducible results.

RussTreadon-NOAA · 2023-09-14T13:20:21Z

@jderber-NOAA, when you have time, would you please update jderber-NOAA:optimize3 with the current head of the authoritative GSI develop? We may need to do this a few more times before this PR is merged into develop.

jderber-NOAA · 2023-09-21T13:55:58Z

Reason for reproducibility issue found. It was documented earlier in this development. With the change of 3 lines in get_gefs_ensperts_dualres.f90 (around line 190) all regression tests passed.

While looking for the reproducibility problem a few changes were made.

A major error in control_vectors.f90 was found. The results from partsum were not saved from one value of nsubwin to the next. Saving the values of partsum eliminates this problem.
A very minor change was made in general_spectral_transforms.f90. The lines were already in place but commented out. real(grd%nlon,r_kind) is now used rather than float(grd%nlon).
In hybrid_ensemble_isotropic.F90, subroutine bkerror_a_en, the indices of the alphacvarsclgrpmat array were reversed. This is a symmetric matrix, so it is not a real issue, but it should be made right.
In read_prepbufr.f90, subroutine read_prepbufr, the initialization of uob, vob and oelev used the literal 0.0. This was changed to use the constant zero.
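The first item describes a common class of bug: a partial sum that is overwritten on each pass of a loop instead of being kept. The Python sketch below is one plausible reading of that description, not the code in control_vectors.f90; the function names and the toy data are hypothetical.

```python
def dot_last_window_only(x_windows, y_windows):
    """Buggy pattern: partsum is overwritten for every sub-window."""
    partsum = 0.0
    for x, y in zip(x_windows, y_windows):
        # Earlier sub-windows are lost because partsum is reassigned here.
        partsum = sum(a * b for a, b in zip(x, y))
    return partsum


def dot_all_windows(x_windows, y_windows):
    """Fixed pattern: keep one partial sum per sub-window, then combine."""
    partsums = []
    for x, y in zip(x_windows, y_windows):
        partsums.append(sum(a * b for a, b in zip(x, y)))
    return sum(partsums)


# Two sub-windows of two elements each (toy data).
x = [[1.0, 2.0], [3.0, 4.0]]
y = [[1.0, 1.0], [1.0, 1.0]]

buggy = dot_last_window_only(x, y)
fixed = dot_all_windows(x, y)
print(buggy, fixed)
```

In this sketch the two versions agree whenever there is only one sub-window, so a bug of this kind would only show up in configurations that use more than one.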

Regression tests were rerun. All passed except rrfs_3denvar_glbens. Not sure why this did not pass. Results were the same.
rrfs_3denvar_glbens_hiproc_contrl/stdout:The total amount of wall time = 93.271814
rrfs_3denvar_glbens_hiproc_updat/stdout:The total amount of wall time = 81.411227
rrfs_3denvar_glbens_loproc_contrl/stdout:The total amount of wall time = 135.722101
rrfs_3denvar_glbens_loproc_updat/stdout:The total amount of wall time = 111.039343

Run times were faster.
rrfs_3denvar_glbens_hiproc_contrl/stdout:The maximum resident set size (KB) = 1136508
rrfs_3denvar_glbens_hiproc_updat/stdout:The maximum resident set size (KB) = 1136728
rrfs_3denvar_glbens_loproc_contrl/stdout:The maximum resident set size (KB) = 1793072
rrfs_3denvar_glbens_loproc_updat/stdout:The maximum resident set size (KB) = 1793692

Slightly more memory. This must have been the reason for the failure. But results are reasonable.

Updating to head of trunk.

RussTreadon-NOAA · 2023-09-21T15:06:33Z

WCOSS2 ctests
Install fresh clone of jderber-NOAA:optimize3 at 0b6bde9 on Cactus. Run ctests with following results

russ.treadon@clogin04:/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr594/build> ctest -j 9
Test project /lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/pr594/build
    Start 1: global_3dvar
    Start 2: global_4dvar
    Start 3: global_4denvar
    Start 4: hwrf_nmm_d2
    Start 5: hwrf_nmm_d3
    Start 6: rtma
    Start 7: rrfs_3denvar_glbens
    Start 8: netcdf_fv3_regional
    Start 9: global_enkf
1/9 Test #8: netcdf_fv3_regional ..............***Failed  483.92 sec
2/9 Test #5: hwrf_nmm_d3 ......................   Passed  493.34 sec
3/9 Test #7: rrfs_3denvar_glbens ..............   Passed  605.97 sec
4/9 Test #4: hwrf_nmm_d2 ......................   Passed  607.00 sec
5/9 Test #9: global_enkf ......................   Passed  610.84 sec
6/9 Test #6: rtma .............................   Passed  1210.05 sec
7/9 Test #3: global_4denvar ...................   Passed  1442.23 sec
8/9 Test #1: global_3dvar .....................   Passed  1502.22 sec
9/9 Test #2: global_4dvar .....................   Passed  1563.07 sec

89% tests passed, 1 tests failed out of 9

Total Test time (real) = 1563.08 sec

The following tests FAILED:
          8 - netcdf_fv3_regional (Failed)

The netcdf_fv3_regional failure is due to the timing scalability check.

The case has Failed the scalability test.
The slope for the update (.601893 seconds per node) is less than that for the control (1.270841 seconds per node).

Examination of the updat and contrl wall times does not show any anomalous behavior

russ.treadon@clogin04:/lfs/h2/emc/ptmp/russ.treadon/pr594/tmpreg_netcdf_fv3_regional> grep wall */stdout
netcdf_fv3_regional_hiproc_contrl/stdout:The total amount of wall time                        = 63.280665
netcdf_fv3_regional_hiproc_updat/stdout:The total amount of wall time                        = 63.970015
netcdf_fv3_regional_loproc_contrl/stdout:The total amount of wall time                        = 64.551506
netcdf_fv3_regional_loproc_updat/stdout:The total amount of wall time                        = 64.451530

This is not a fatal fail.
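For reference, the raw loproc-minus-hiproc wall-time differences can be computed from the grep output above. The formula used by the regression script for its "slope" is not shown in this thread, so this Python sketch only reports the differences and compares them with the quoted slopes.

```python
# Wall times (seconds) copied from the grep output above.
wall = {
    ("hiproc", "contrl"): 63.280665,
    ("hiproc", "updat"): 63.970015,
    ("loproc", "contrl"): 64.551506,
    ("loproc", "updat"): 64.451530,
}


def lo_minus_hi(run):
    """Wall-time difference between the low and high task-count runs."""
    return wall[("loproc", run)] - wall[("hiproc", run)]


contrl_diff = lo_minus_hi("contrl")
updat_diff = lo_minus_hi("updat")

print(f"contrl: {contrl_diff:.6f} s (reported slope 1.270841)")
print(f"updat:  {updat_diff:.6f} s (reported slope .601893)")
```

The control difference matches its reported slope, while the update difference (about 0.48 s) does not equal the reported .601893, so the script evidently applies an additional normalization to the update. Either way, all four wall times lie within about 1.3 s of each other, which supports reading this as timing noise rather than a real scalability problem.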

jderber-NOAA · 2023-09-21T15:34:39Z

After updating to the head of the trunk (both the control and updat), all regression tests passed on Hera.

Test project /scratch1/NCEPDEV/da/John.Derber/converge4/GSI/build
Start 1: global_3dvar
Start 2: global_4dvar
Start 3: global_4denvar
Start 4: hwrf_nmm_d2
Start 5: hwrf_nmm_d3
Start 6: rtma
Start 7: rrfs_3denvar_glbens
Start 8: netcdf_fv3_regional
Start 9: global_enkf
1/9 Test #5: hwrf_nmm_d3 ...................... Passed 560.00 sec
2/9 Test #9: global_enkf ...................... Passed 560.73 sec
3/9 Test #8: netcdf_fv3_regional .............. Passed 606.70 sec
4/9 Test #7: rrfs_3denvar_glbens .............. Passed 610.09 sec
5/9 Test #4: hwrf_nmm_d2 ...................... Passed 610.56 sec
6/9 Test #6: rtma ............................. Passed 1456.16 sec
7/9 Test #3: global_4denvar ................... Passed 1636.21 sec
8/9 Test #2: global_4dvar ..................... Passed 1811.06 sec
9/9 Test #1: global_3dvar ..................... Passed 1813.70 sec

100% tests passed, 0 tests failed out of 9

Total Test time (real) = 1813.71 sec

I will put the 3 lines that give a minor difference back into the code. These lines result in fewer conversions between variables and should produce slightly more consistent results.

RussTreadon-NOAA

Approve pending peer reviews

TingLei-NOAA

Finished another test using a 3 km CONUS domain RRFS case. This PR gives identical results (final cost and gradients) compared with the EMC GSI trunk.
Thanks for this continual improvement of GSI.

TingLei-NOAA · 2023-09-21T19:56:24Z

A note: in my RRFS run, I found small differences between builds of this PR's GSI with and without debug mode.
For example (after a 1 hr 30 min run), the final cost, gradient and so on for GSI built in debug mode are:

 cost,grad,step,b,step? =   2  31  5.108161372585962818E+04  4.728171079078897776E+01  2.902662923838958076E+00  1.425154427382457012E+00  good

For GSI built in "release" mode, they are:

 cost,grad,step,b,step? =   2  31  5.108161597988699941E+04  4.728289846049271006E+01  2.902840431607158767E+00  1.425206942448039582E+00  good

The EMC GSI trunk shows the same behavior. Namely, for a given build type (Release or Debug), this PR and the EMC GSI trunk give identical results, and the small differences above exist between the Release and Debug builds for both branches.
I didn't notice this behavior before. If it is an issue to be investigated further, it is not specific to this PR.
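One common reason for Debug and Release builds to differ is that optimization can reorder floating-point operations, and floating-point addition is not associative. That cause is not confirmed for this case; the Python sketch below only demonstrates the effect and puts a number on the size of the cost difference quoted above.

```python
# Floating-point addition is not associative: grouping changes the last bit.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)   # False
print(abs(left_to_right - right_to_left))

# Relative difference between the final costs quoted above.
debug_cost = 5.108161372585962818E+04
release_cost = 5.108161597988699941E+04
relative = abs(release_cost - debug_cost) / debug_cost
print(f"relative cost difference: {relative:.2e}")
```

The relative cost difference is about 4e-8. That is well above a single round-off error, but such errors can grow over two outer loops and many inner iterations of the minimization.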

CatherineThomas-NOAA

Looks great. Thanks @jderber-NOAA!

@TingLei-NOAA Thanks for looking into that reproducibility issue. If this is also an issue in the develop branch, I don't see a need to hold up this PR.

TingLei-NOAA · 2023-09-22T13:56:30Z

@CatherineThomas-NOAA Agree!

jderber-NOAA · 2023-09-22T16:06:06Z

After the latest update to the head of the trunk, the regression tests were rerun. Results were as expected. The update did not affect the changes in this PR.

RussTreadon-NOAA · 2023-09-22T16:15:02Z

Given the following

jderber-NOAA:optimize3 at 982425d is up to date with authoritative develop
approvals from two peer reviews
ctests from current head of jderber-NOAA:optimize3 pass
GSI Handling Review notified. OK received

proceed to merge jderber-NOAA:optimize3 into authoritative develop

jderber-NOAA added 30 commits August 24, 2021 13:36

GitHub Issue NOAA-EMC#175. Use the global 127L B-Matrix in regional …

b846b29

…FV3 DA (FV3LAMDA)

Merge remote-tracking branch 'upstream/master'

05f1c1f

Merge remote-tracking branch 'upstream/master'

059e402

GitHub Issue NOAA-EMC#219 Improve Minimization and fix bug in vqc

f938842

Merge remote-tracking branch 'upstream/master'

fafadac

fix setupw

52c5ae6

Merge remote-tracking branch 'upstream/master'

f00e377

Merge remote-tracking branch 'upstream/master'

7703367

Merge remote-tracking branch 'upstream/master'

9eb9606

Merge remote-tracking branch 'upstream/develop' into develop

3eb0e13

Merge remote-tracking branch 'upstream/develop' into develop

ddced98

Merge remote-tracking branch 'upstream/develop' into develop

f60343b

Merge remote-tracking branch 'upstream/develop' into develop

8dbfbd1

Merge remote-tracking branch 'upstream/develop' into develop

1554f65

Merge remote-tracking branch 'upstream/develop' into develop

bf060fd

Merge remote-tracking branch 'upstream/develop' into develop

3f073fa

Merge remote-tracking branch 'upstream/develop' into develop

7f62d1c

Merge remote-tracking branch 'upstream/develop' into develop

85cbdb1

Merge remote-tracking branch 'upstream/develop' into develop

51a444b

Merge remote-tracking branch 'upstream/develop' into develop

0be4126

Merge remote-tracking branch 'upstream/develop' into develop

57fda95

Merge remote-tracking branch 'upstream/develop' into develop

6336b79

Merge remote-tracking branch 'upstream/develop' into develop

a10841d

Merge remote-tracking branch 'upstream/develop' into develop

7261674

Optimization first step

2fc7545

optimization improvements

d50ae8d

Optimization changes 2

254c59d

Working version 1

a899e45

additional optimizations

d0b9848

Additional optimization changes.

baf7fa7

Merge branch 'develop' into optimize3

813b6bf

Remove line from build.sh

3e918e1

jderber-NOAA dismissed CatherineThomas-NOAA’s stale review via 3e918e1 September 3, 2023 20:30

RussTreadon-NOAA mentioned this pull request Sep 7, 2023

Cads for andrew #616

Merged

6 tasks

jderber-NOAA added 3 commits September 21, 2023 13:56

Final update of branch. Fixes error in control_vectors.

299a1e5

Merge remote-tracking branch 'upstream/develop' into develop

7e4f656

Merge branch 'develop' into optimize3

0b6bde9

switch back to slightly non-reproducible version

4639bfe

jderber-NOAA requested review from CatherineThomas-NOAA and TingLei-NOAA September 21, 2023 15:44

RussTreadon-NOAA self-requested a review September 21, 2023 18:46

RussTreadon-NOAA approved these changes Sep 21, 2023

TingLei-NOAA approved these changes Sep 21, 2023

CatherineThomas-NOAA reviewed Sep 21, 2023

CatherineThomas-NOAA approved these changes Sep 22, 2023

jderber-NOAA added 2 commits September 22, 2023 15:01

Merge remote-tracking branch 'upstream/develop' into develop

f8f1bbf

Merge branch 'develop' into optimize3

982425d

RussTreadon-NOAA merged commit 2f4e7fe into NOAA-EMC:develop Sep 22, 2023
4 checks passed
