Exit code is zero even when simulator fails #125

Open
AugustoPeres opened this issue Feb 20, 2024 · 4 comments

@AugustoPeres

Hi there,

I have recently started using the schism simulator and noticed that the exit code is zero even when the simulator fails:

root@a28729b6320d:/Test_Convergence_Grid1# mpirun --np 3 /schism/build/bin/pschism  
 Must have at least 1 cmd argument: # of scribes to run, or -v for version.
 Must have at least 1 cmd argument: # of scribes to run, or -v for version.
 Must have at least 1 cmd argument: # of scribes to run, or -v for version.
root@a28729b6320d:/Test_Convergence_Grid1# echo $?
0

Is there any way to have the exit code reflect the fact that the simulation failed?

@pmav99
Contributor

pmav99 commented Feb 20, 2024

Yeah, we've also had problems with this. As a workaround, we parse the stdout and stderr output and apply some heuristics to decide whether an error occurred after all.

For the record, handling this can be even more complicated because the error codes depend on the MPI implementation, too.
For instance, on some tests we did with an older schism version (5.9):

openmpi + mpirun -n 8 schism -> error code 0
mpich + mpirun -n 8 schism -> error code 0 or 9 - about 50-50 between them

Now this might be an issue with openmpi/mpich, but it could also be an issue with the way schism's MPI code has been implemented. We haven't really looked deeper into it.
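For illustration, here is a minimal sketch of what such an output-parsing workaround could look like. It is not the heuristic pmav99's team uses, which is not shown in this thread. The names `ERROR_PATTERNS`, `looks_failed` and `run_checked` are hypothetical, and the only pattern included is the message that appears in the output above; anything else would have to come from your own failed runs.

```python
import re
import subprocess
import sys

# Hypothetical pattern list. Only this message is taken from the output shown
# in this thread; extend it with messages observed in your own failed runs.
ERROR_PATTERNS = [
    re.compile(r"Must have at least 1 cmd argument"),
]


def looks_failed(returncode, output):
    """Return True if the exit code or the captured output suggests a failure."""
    if returncode != 0:
        return True
    return any(pattern.search(output) for pattern in ERROR_PATTERNS)


def run_checked(cmd):
    """Run cmd, echo its combined stdout/stderr, and return 1 on suspected failure, else 0."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so both are scanned
        text=True,
    )
    sys.stdout.write(proc.stdout)
    return 1 if looks_failed(proc.returncode, proc.stdout) else 0
```

A wrapper script could then end with `sys.exit(run_checked(["mpirun", "--np", "3", "/schism/build/bin/pschism"]))`, so that the shell sees a non-zero exit code whenever a known error message shows up, regardless of what the MPI launcher itself returns.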

@AugustoPeres
Author

@pmav99, thank you very much for your reply.

We will take a look at how to parse the stdout and stderr to detect failed simulations. Could you share a little bit more on the heuristics that you are using to catch failed simulations?

However, it would be great if this worked out of the box :)

@josephzhang8
Member

The error says you need to specify # of scribe processes; see online manual.
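As a rough sketch of what that means for the command above: going by the error message, the number of scribes is passed as the first command-line argument to `pschism`. The helper `build_pschism_command` is hypothetical, and the scribe count of 1 in the usage note is only an illustrative value; the valid range for a given setup is described in the online manual.

```python
def build_pschism_command(n_procs, n_scribes, binary="/schism/build/bin/pschism"):
    """Build an mpirun argument list that passes the scribe count to pschism.

    Assumption: the scribe count is the first positional argument, as the
    error message in this thread suggests. The checks below are only basic
    sanity checks, not the limits enforced by SCHISM itself.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    if n_scribes < 0:
        raise ValueError("n_scribes must not be negative")
    return ["mpirun", "--np", str(n_procs), binary, str(n_scribes)]
```

For example, `build_pschism_command(3, 1)` yields the argument list for `mpirun --np 3 /schism/build/bin/pschism 1`. This addresses the missing argument only; it does not change the fact that the failed run above exited with code 0.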

@hpenedones

Any update on this? Has the way schism handles exit codes been updated in the meantime?
