investigation of unexpected behavior of ctest rrfs_3denvar_rdasens #766
Comments
@TingLei-NOAA and @RussTreadon-NOAA Thank you for the heads-up. Since there is an RDHPCS ticket, we can wait for further action from RDHPCS.
@RussTreadon-NOAA Thanks for that information. I will first study the updates on those issues carefully.
The same behavior is confirmed on hera: with ppn=5 and nodes=4, rrfs_3denvar_rdasens_lopupdat became idle.
An update: the ctest rrfs_3denvar_rdasens is confirmed to pass using 20 MPI tasks on wcoss2. For the time being, we could use task numbers similar to those on hera to let this ctest pass, but I think further investigation will be helpful. I will have more discussions (some offline) with colleagues, and I might submit a ticket for this problem.
A ticket with orion has been opened. A self-contained test case on hera that reproduces this issue was created and sent to R. Reddy at the help desk (thanks a lot!).
@TingLei-NOAA, what is the status of this issue?
@RussTreadon-NOAA I will follow up on this and come back when I have more updates to share.
@TingLei-daprediction, what is the status of this issue? PR #788 is a workaround, not a solution.
@RussTreadon-NOAA The experts at the RDHPCS help desk haven't made progress on this. We agreed that their work on this could be put on hold with the ticket kept open, and I will keep them posted if I have any new findings.
Thank you, @TingLei-NOAA. We periodically cycle through open GSI issues and PRs asking developers for updates. Developer feedback helps with planning and coordinating. Sometimes we even find issues which can be closed or PRs abandoned.
@RussTreadon-NOAA I really appreciate your help on all those issues/problems we encountered in this "transition period"!
As Peter Johnsen suggested via the orion help desk, and with help from @RussTreadon-NOAA, the behavior of the regional GSI after the orion upgrade is being investigated in relation to the issues found on hercules: the netcdf error (when I_MPI_EXTRA_FILESYSTEM) and/or the non-reproducibility issues (#697).
It was found that rrfs_3denvar_rdasens_loproc_updat would become idle (not finished in 1 hour 30 min) using 4 nodes and ppn=5. I had to follow the recent setup (3 nodes, ppn=40) on hera given by @hu5970, and with it the job finished successfully.
It is not clear to me what caused this or whether it is a spontaneous issue (since there have been no other complaints about it up to now). This issue is opened to facilitate collaborative investigation.
In addition to the GSI developers mentioned above, I'd also like to bring this to the attention of @ShunLiu-NOAA and @DavidHuber-NOAA.
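For reference, here is a minimal Python sketch of the total MPI task counts implied by the two layouts mentioned in this report. The helper name `total_tasks` is hypothetical and is not part of GSI or its ctest scripts; it only multiplies nodes by ppn. Since 20 tasks is also the count reported elsewhere in this thread as passing on wcoss2, the total alone may not explain the stall.

```python
# Hypothetical helper for illustration only; not part of GSI or its ctest scripts.
def total_tasks(nodes: int, ppn: int) -> int:
    """Total MPI tasks for a layout of `nodes` nodes with `ppn` tasks per node."""
    return nodes * ppn


# Layouts taken from this report: (nodes, ppn)
layouts = {
    "became idle (4 nodes, ppn=5)": (4, 5),
    "finished (3 nodes, ppn=40)": (3, 40),
}

for label, (nodes, ppn) in layouts.items():
    print(f"{label}: {total_tasks(nodes, ppn)} MPI tasks")
```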