Suggestion/enhancement: single mpas-o block exit when E3SM crashes #84

Open
alicebarthel opened this issue Mar 15, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@alicebarthel
Collaborator

Goal: make better use of mpaso blocks

Problem: mpas-o doesn't abort early enough for the block stats to be helpful in diagnosing the source of errors when a crash happens (as seen during the v3 critical path crashes, e.g. the v3alpha04bigrid crash investigation). Chasing the source of the crash then requires restarting with high-frequency outputs.
It generates 20+ blocks, which I never look at, and ocean_validate() only exits once there are NaNs in too many fields to be informative.

Current state:
I tested a couple of approaches adding a tracer check (back in Oct 2023)
ocn_validate
ocn_validate2
In the record of runs, these fail and exit with a single block -- which I find more useful.
The second one was an attempt to get a more useful error message (detailing the reason for the failure), but it needs more work.
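
To make the idea concrete, here is a minimal sketch of a tracer check that stops at the first bad value and reports a single block along with the reason. It is illustrative Python, not the MPAS-O Fortran and not the code in the ocn_validate branches; the names `TracerFailure`, `first_tracer_failure`, and the bounds dictionary are all hypothetical, and the data layout (block id -> tracer name -> list of cell values) is an assumption made for the example.

```python
import math
from typing import Dict, List, NamedTuple, Optional, Tuple


class TracerFailure(NamedTuple):
    """Hypothetical record describing the first invalid tracer value found."""
    block_id: int
    tracer: str
    cell: int
    value: float
    reason: str


def first_tracer_failure(
    blocks: Dict[int, Dict[str, List[float]]],
    bounds: Dict[str, Tuple[float, float]],
) -> Optional[TracerFailure]:
    """Return the first invalid tracer value, or None if every value is valid.

    Blocks are scanned in ascending id order and the scan stops at the first
    problem, so only one block is ever reported.
    """
    for block_id in sorted(blocks):
        for tracer, values in blocks[block_id].items():
            # Tracers without an entry in `bounds` are only checked for NaN/inf.
            lo, hi = bounds.get(tracer, (-math.inf, math.inf))
            for cell, value in enumerate(values):
                if math.isnan(value):
                    return TracerFailure(block_id, tracer, cell, value, "NaN")
                if math.isinf(value) or not (lo <= value <= hi):
                    reason = f"outside [{lo}, {hi}]"
                    return TracerFailure(block_id, tracer, cell, value, reason)
    return None


if __name__ == "__main__":
    # Made-up values, purely to exercise the check.
    blocks = {
        0: {"temperature": [10.0, 12.0], "salinity": [35.0, 34.5]},
        1: {"temperature": [11.0, float("nan")], "salinity": [35.0, 34.0]},
        2: {"temperature": [float("nan")], "salinity": [float("nan")]},
    }
    bounds = {"temperature": (-5.0, 50.0), "salinity": (0.0, 50.0)}

    failure = first_tracer_failure(blocks, bounds)
    if failure is not None:
        # One message, one block, and the reason for the failure.
        print(
            f"Tracer check failed in block {failure.block_id}: "
            f"{failure.tracer}[{failure.cell}] = {failure.value} ({failure.reason})"
        )
```

Stopping at the first failure is the design choice that matters here: later blocks may also contain NaNs, but reporting them adds noise rather than pointing at where the problem started.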

@cbegeman you may want to add a check like that in your current HR debugging if that helps.

Note: this investigation also revealed a separate issue with the abort() call in framework/mpas_log.F, which hangs instead of exiting cleanly on a distributed layout. A flat/stacked layout is fine. @jonbob was a real help on this!

@alicebarthel alicebarthel added the enhancement New feature or request label Mar 15, 2024