Exit code is zero even when simulator fails #125

AugustoPeres · 2024-02-20T10:06:39Z

Hi there,

I have recently started using the schism simulator and noticed that the exit code is zero even when the simulator fails:

root@a28729b6320d:/Test_Convergence_Grid1# mpirun --np 3 /schism/build/bin/pschism  
 Must have at least 1 cmd argument: # of scribes to run, or -v for version.
 Must have at least 1 cmd argument: # of scribes to run, or -v for version.
 Must have at least 1 cmd argument: # of scribes to run, or -v for version.
root@a28729b6320d:/Test_Convergence_Grid1# echo $?
0

Is there any way to have the exit code reflect the fact that the simulation failed?

pmav99 · 2024-02-20T10:21:12Z

Yeah, we've also had problems with this. As a workaround, we parse the stdout and stderr output and apply some heuristics to determine whether an error occurred after all.

For the record, handling this can be even more complicated because the error codes depend on the MPI implementation, too.
For instance, on some tests we did with an older schism version (5.9):

openmpi + mpirun -n 8 schism -> error code 0
mpich + mpirun -n 8 schism -> error code 0 or 9 - about 50-50 between them

Now, this might be an issue with openmpi/mpich, but it could also be an issue with the way schism's MPI code has been implemented. I haven't really looked deeper into it.
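
For anyone who needs something in the meantime, here is a minimal sketch of what a stdout/stderr parsing workaround could look like. It is not the actual heuristics mentioned above, which are not shown in this thread. The names `ERROR_PATTERNS`, `looks_failed` and `run_checked` are made up for illustration. The first pattern is the message quoted in the original report; the `ABORT` pattern is only a guess and would need tuning against real schism logs, since it can produce false positives.

```python
import re
import subprocess
import sys

# Hypothetical failure markers. The first is the message quoted in this
# thread; the second is an assumption and must be tuned against real logs.
ERROR_PATTERNS = [
    re.compile(r"Must have at least 1 cmd argument"),
    re.compile(r"ABORT"),
]


def looks_failed(returncode, output):
    """Return True if the exit code is non-zero or the output matches a marker."""
    if returncode != 0:
        return True
    return any(pattern.search(output) for pattern in ERROR_PATTERNS)


def run_checked(cmd):
    """Run cmd, echo its output, and return an exit code that reflects failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    if looks_failed(proc.returncode, proc.stdout + proc.stderr):
        # Keep the original non-zero code if there is one, otherwise use 1.
        return proc.returncode or 1
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    sys.exit(run_checked(sys.argv[1:]))
```

Saved under a name of your choice (say `run_checked.py`, also hypothetical), it would be called as `python run_checked.py mpirun --np 3 /schism/build/bin/pschism`, and `echo $?` would then print 1 for the failure shown in the original report. Because a non-zero code from `mpirun` is passed through unchanged, the mpich case that sometimes returns 9 is still reported as a failure.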

AugustoPeres · 2024-02-21T16:10:13Z

@pmav99, thank you very much for your reply.

We will take a look at how to parse the stdout and stderr to detect failed simulations. Could you share a little bit more on the heuristics that you are using to catch failed simulations?

However, it would be great if this worked out of the box :)

josephzhang8 · 2024-02-22T02:59:54Z

The error says you need to specify # of scribe processes; see online manual.

hpenedones · 2024-10-01T13:11:46Z

Any update on this? Has the way schism handles exit codes been updated in the meantime?
