Increased gsi.x and enkf.x executable wall times following Orion Rocky 9 upgrade #771
Comments
Add Orion: total
Hercules: total
The largest difference in Orion and Hercules timings occurs when reading bufr observations in parallel. Orion took 407 seconds longer for all tasks to process bufr observations than did Hercules. Orion
Hercules
Timers can be added to |
Add timers to Orion
Hercules
Comparison of the wall times for the same observation type shows that Orion times are 3 to 4 times greater than Hercules. This increase is unexpected. |
As a test, compile the same GSI code on Hera and Dogwood (WCOSS2) and run
The same comment has been added to spack-stack issue #1166 |
debufr test Use bufr utility |
@AlexanderRichert-NOAA suggested compiling the GSI and ENKF on Hercules, then running the ctests on Orion with those executables. After compiling, the
runtime.global_4denvar_hiproc_contrl.txt: The total amount of wall time = 1394.165947
runtime.global_4denvar_hiproc_updat.txt: The total amount of wall time = 751.449358
runtime.global_4denvar_loproc_contrl.txt: The total amount of wall time = 1059.320163
runtime.global_4denvar_loproc_updat.txt: The total amount of wall time = 986.416513
runtime.global_enkf_hiproc_contrl.txt: The total amount of wall time = 159.291133
runtime.global_enkf_hiproc_updat.txt: The total amount of wall time = 160.180497
runtime.global_enkf_loproc_contrl.txt: The total amount of wall time = 204.069604
runtime.global_enkf_loproc_updat.txt: The total amount of wall time = 186.185513
runtime.hafs_3denvar_hybens_hiproc_contrl.txt: The total amount of wall time = 635.210342
runtime.hafs_3denvar_hybens_hiproc_updat.txt: The total amount of wall time = 625.259733
runtime.hafs_3denvar_hybens_loproc_contrl.txt: The total amount of wall time = 719.252038
runtime.hafs_3denvar_hybens_loproc_updat.txt: The total amount of wall time = 664.469968
runtime.hafs_4denvar_glbens_hiproc_contrl.txt: The total amount of wall time = 686.031241
runtime.hafs_4denvar_glbens_hiproc_updat.txt: The total amount of wall time = 677.697564
runtime.hafs_4denvar_glbens_loproc_contrl.txt: The total amount of wall time = 804.612061
runtime.hafs_4denvar_glbens_loproc_updat.txt: The total amount of wall time = 738.414877
runtime.rtma_hiproc_contrl.txt: The total amount of wall time = 354.441046
runtime.rtma_hiproc_updat.txt: The total amount of wall time = 351.002411
runtime.rtma_loproc_contrl.txt: The total amount of wall time = 362.241694
runtime.rtma_loproc_updat.txt: The total amount of wall time = 366.629561 |
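A listing like the one above can be compared pairwise with a short script. The following is a minimal sketch, assuming each line follows the pattern `runtime.<job>_<contrl|updat>.txt: The total amount of wall time = <seconds>`; the function names `parse_wall_times` and `updat_minus_contrl` are hypothetical helpers, not part of the GSI regression test suite. Only four lines of the listing are reproduced to keep the example short.

```python
import re

# A subset of the listing above, copied verbatim.
LISTING = """\
runtime.global_4denvar_hiproc_contrl.txt: The total amount of wall time = 1394.165947
runtime.global_4denvar_hiproc_updat.txt: The total amount of wall time = 751.449358
runtime.global_enkf_loproc_contrl.txt: The total amount of wall time = 204.069604
runtime.global_enkf_loproc_updat.txt: The total amount of wall time = 186.185513
"""

# Assumed line format: job name, then "contrl" or "updat", then the seconds value.
PATTERN = re.compile(
    r"runtime\.(?P<job>\w+)_(?P<kind>contrl|updat)\.txt:.*?=\s*(?P<secs>[\d.]+)"
)


def parse_wall_times(text):
    """Return {job: {"contrl": seconds, "updat": seconds}} for every matching line."""
    times = {}
    for match in PATTERN.finditer(text):
        times.setdefault(match["job"], {})[match["kind"]] = float(match["secs"])
    return times


def updat_minus_contrl(times):
    """Return the wall time difference (updat - contrl) for jobs that have both runs."""
    return {
        job: runs["updat"] - runs["contrl"]
        for job, runs in times.items()
        if {"contrl", "updat"} <= runs.keys()
    }


if __name__ == "__main__":
    for job, diff in sorted(updat_minus_contrl(parse_wall_times(LISTING)).items()):
        print(f"{job}: updat - contrl = {diff:+.1f} s")
```

A negative difference means the updat run finished faster than the contrl run for that job.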
Install GSI
No timing is provided for
For those ctests which did complete there is no improvement in
|
Orion and Hercules ctests Install and build GSI Orion
The rrfs_3denvar_rdasens test did not complete within the specified 3 hour, 15 minute wall clock time. The rrfs_3denvar_rdasens ctest hangs. Tagging @TingLei-NOAA , @ShunLiu-NOAA , and @hu5970 for awareness.
Hercules
The rrfs_3denvar_rdasens test failed due to
This is not a fatal fail. The Orion
Hercules
The Orion wall times are about 2x greater (slower) than Hercules. The run time performance of |
@RussTreadon-NOAA Thank you!
could be changed to (for popts)
Is it OK for you to make this change, or should I open another PR for this? |
@TingLei-NOAA , please follow the procedure outlined under GSI wiki entry How-to-Make-Changes to submit changes to GSI. I modified the Orion job configuration for rrfs_3denvar_rdasens as you indicate. Note that
will not work on Orion. The above oversubscribes the node since we are requesting 40 tasks per node with 2 threads per task. Orion nodes only have 40 slots. Attempts to run with the above settings result in a
Change the above job configuration to
With this change both the loproc and hiproc jobs run to completion on Orion.
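The oversubscription arithmetic described above can be checked with a few lines of code. This is a minimal sketch, assuming each thread needs its own slot; the function names are hypothetical, and the 40-slot figure for Orion nodes comes from the comment above. The value returned by `max_tasks_per_node` is one setting that fits on a node, not necessarily the configuration that was actually adopted.

```python
def cores_requested_per_node(tasks_per_node, threads_per_task):
    """Slots needed on one node, assuming one slot per thread."""
    return tasks_per_node * threads_per_task


def oversubscribes(tasks_per_node, threads_per_task, slots_per_node=40):
    """True when the request needs more slots than the node provides."""
    return cores_requested_per_node(tasks_per_node, threads_per_task) > slots_per_node


def max_tasks_per_node(threads_per_task, slots_per_node=40):
    """Largest tasks-per-node value that still fits on one node."""
    return slots_per_node // threads_per_task


if __name__ == "__main__":
    # 40 tasks per node with 2 threads per task asks for 80 slots on a 40-slot node.
    print(cores_requested_per_node(40, 2), oversubscribes(40, 2))
    # With 2 threads per task, at most 20 tasks fit on a 40-slot node.
    print(max_tasks_per_node(2))
```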
The Hercules test ran much faster
|
@RussTreadon-NOAA Sorry for my oversight in my previous suggestions. |
@DavidHuber-NOAA , do we have any updates on this issue? |
About a month ago, I spoke with Raghu about this issue and he was going to see what he could do. I will message him about it today and see if he has any updates. |
Thank you @DavidHuber-NOAA. It would be nice to eventually get to the bottom of this. |
This issue is the GSI counterpart to spack-stack issue #1166.
gsi.x and enkf.x wall times significantly increased following the Orion Rocky 9 upgrade.
develop at 529bb79 was installed on Orion and Hercules. The standard suite of 6 ctests was run on both machines. Below are the gsi.x and enkf.x wall times for individual jobs in the ctests.
Orion wall times
Hercules wall times
There are no Orion times for the rrfs_3denvar_rdasens test because this ctest hangs on Orion. See issue #766 for details.
Orion gsi.x wall times are 2 to 3 times greater than their Hercules counterparts. We expect Hercules wall times to be a bit faster than Orion but not by a factor of 2 to 3. The increase in Orion enkf.x wall times with respect to Hercules is not as significant.
This issue is opened to document
gsi.x and enkf.x wall times on Orion following the Rocky 9 upgrade.
gsi.x and enkf.x wall times to pre-Rocky 9 values
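The "factor of 2 to 3" comparison between machines amounts to a ratio of wall times per job. Below is a minimal sketch of that calculation, assuming wall times are available as dictionaries keyed by job name; `slowdown` and `flag_slow_jobs` are hypothetical helpers, and the numbers in the usage section are illustrative placeholders, not measurements from this issue.

```python
def slowdown(slow_seconds, fast_seconds):
    """Ratio of two wall times; 2.0 means the first run took twice as long."""
    if fast_seconds <= 0:
        raise ValueError("reference wall time must be positive")
    return slow_seconds / fast_seconds


def flag_slow_jobs(orion, hercules, threshold=2.0):
    """Return {job: ratio} for jobs present on both machines whose ratio meets the threshold."""
    flagged = {}
    for job in sorted(orion.keys() & hercules.keys()):
        ratio = slowdown(orion[job], hercules[job])
        if ratio >= threshold:
            flagged[job] = ratio
    return flagged


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute the measured wall times.
    orion_times = {"job_a": 300.0, "job_b": 110.0}
    hercules_times = {"job_a": 100.0, "job_b": 100.0}
    for job, ratio in flag_slow_jobs(orion_times, hercules_times).items():
        print(f"{job}: Orion is {ratio:.1f}x slower than Hercules")
```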