Failing Tempus adjoint tests #1008
Comments
@mperego Yes. I'm trying to track down what exactly happened. EMPIRE is seeing the same issue. I get different results depending on where the factorization is called...
OK. Thanks for looking into that.
Yes, thanks @cgcgcg! If you know what the problem is and are working on it, I will not test this further but will wait for your fix.
Could you pull Trilinos develop and check that it works again?
Our nightly tests are based on Trilinos develop. So if it's OK to wait, we'll know tomorrow morning whether the problem has been fixed.
@cgcgcg: I can test it today. It's easy enough to do. Please stay tuned.
I have verified that the tests now pass with a new Trilinos develop. Thanks @cgcgcg! I will close this issue tomorrow once our CDash is clean.
Nice! For now I just reverted the offending commit. We will try to get this change in again at a later date once we understand what went wrong.
With help from @mperego I was able to build Albany and run the failing tests. Here is what caused the failure:
I printed a stacktrace from the point where the factorization fails:
The matrix is 25x25 and has 75 entries, all of which are identically zero.
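To illustrate why a matrix like this cannot be factorized, here is a minimal sketch in plain Python. It is not Amesos2 or KLU2 code. The periodic three-entries-per-row sparsity pattern is an assumption, chosen only because it yields 25 x 3 = 75 stored entries; the actual pattern of the Albany matrix is not shown in this thread. All function names are hypothetical.

```python
def build_zero_matrix(n=25):
    """Sparse matrix as a list of {column: value} rows, 3 stored zeros per row.

    The periodic tridiagonal pattern is an assumption for illustration only.
    """
    return [{(i - 1) % n: 0.0, i: 0.0, (i + 1) % n: 0.0} for i in range(n)]


def summarize(rows):
    """Return (number of stored entries, number of numerically nonzero entries)."""
    stored = sum(len(row) for row in rows)
    nonzero = sum(1 for row in rows for value in row.values() if value != 0.0)
    return stored, nonzero


def first_zero_pivot(rows):
    """Dense Gaussian elimination with partial pivoting.

    Returns the index of the first column with no usable pivot,
    or None if elimination completes (matrix is nonsingular).
    """
    n = len(rows)
    a = [[rows[i].get(j, 0.0) for j in range(n)] for i in range(n)]
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot row.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            return k
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return None


if __name__ == "__main__":
    matrix = build_zero_matrix()
    print(summarize(matrix))         # stored vs. nonzero entry counts
    print(first_zero_pivot(matrix))  # column where elimination breaks down
```

The matrix has a valid sparsity structure but no nonzero values, so elimination breaks down at the very first column. Any direct solver would have to report the same kind of failure.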
Thanks for digging into this @cgcgcg. I think it makes sense to reopen the issue - do you agree? Unless we want to open a separate Trilinos one.
Sure, let's reopen.
We need to understand why these tests set up a MueLu preconditioner for a singular matrix and then do not use the preconditioner. @ikalash do you have time to look into it?
Perhaps I misunderstood what @cgcgcg wrote, but it seems that it is the matrix at the coarsest grid level that is singular. Is that right? If so, would that suggest that there is something wrong with the matrix problem being solved using AMG?
Sorry, I should have explained better. The problem is so small that this is a one-level method. The matrix is supplied by Albany.
That's interesting. How was it working before? Was it because an iterative solve rather than a direct solve was done? Is there a branch/fork of Trilinos I can use to see the singularity / failure?
The failure was triggered by MueLu moving the factorization of the coarse grid from the first solve to setup. So it seems that Albany constructs the preconditioner but then doesn't use it to solve a system. I can provide a patch against Trilinos tomorrow morning that triggers the behavior.
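The difference between factorizing at first solve and factorizing at setup can be sketched as follows. This is a plain-Python illustration, not MueLu code: `CoarseSolver`, `factorize`, and `SingularMatrixError` are hypothetical names, and for brevity the stand-in factorization handles diagonal matrices only (given as a list of diagonal values).

```python
class SingularMatrixError(Exception):
    """Raised when the stand-in factorization meets a zero pivot."""


def factorize(diagonal):
    """Stand-in for a direct factorization of a diagonal matrix.

    Returns the reciprocals of the diagonal, or raises on a zero entry.
    """
    if any(d == 0.0 for d in diagonal):
        raise SingularMatrixError("zero pivot encountered")
    return [1.0 / d for d in diagonal]


class CoarseSolver:
    """Hypothetical coarse-level solver with a switchable factorization point."""

    def __init__(self, diagonal, factorize_at_setup):
        self.diagonal = diagonal
        self.factors = None
        if factorize_at_setup:
            # Eager: a singular matrix fails here, even if solve() is never called.
            self.factors = factorize(diagonal)

    def solve(self, rhs):
        if self.factors is None:
            # Lazy: a singular matrix is only noticed at the first solve.
            self.factors = factorize(self.diagonal)
        return [f * b for f, b in zip(self.factors, rhs)]


if __name__ == "__main__":
    zero_matrix = [0.0] * 25

    # Lazy factorization: setup succeeds, and nothing fails if solve() is never called.
    CoarseSolver(zero_matrix, factorize_at_setup=False)
    print("lazy setup: ok")

    # Eager factorization: the same matrix now fails during setup.
    try:
        CoarseSolver(zero_matrix, factorize_at_setup=True)
    except SingularMatrixError as err:
        print("eager setup failed:", err)
```

With lazy factorization, a preconditioner that is built for a singular matrix but never applied goes unnoticed, which is consistent with the tests having passed before the change.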
That would be great. I won't get to this until next week, so there is no rush.
I am very sorry but I still haven't had a chance to work on this. Unfortunately I am really swamped right now getting ready for 2 all-hands meetings after the shutdown and working on a few other time-critical things. Does someone else have the time to look at this issue? I can pass along instructions from @cgcgcg on how to reproduce it. Maybe we can discuss this at the Albany meeting tomorrow.
I forgot to say, I am not sure when I would have a chance to look at this.
OK, per the discussion at today's Albany meeting, I switched the problematic tests to use Ifpack2 to avoid this issue, allowing @cgcgcg to merge his PR. We can look further into the cause of the issue in the new year when I or others have more time. @cgcgcg: very sorry for the delay! You should now be able to merge the code that you had reverted earlier due to these test failures.
No problem! Thanks for letting me know!
Sure! Again, my apologies that it took so long!
The demoPDEs tests that use adjoints from Tempus started failing yesterday 11/7:
demoPDEs_Advection1D_Scalar_Param_Adjoint_Sens_Explicit
demoPDEs_Advection1D_with_Source_Dist_Param_Adjoint_Sens_Explicit_ConsistentM
demoPDEs_Thermal1D_with_Source_Dist_Param_Adjoint_Sens_Explicit
https://sems-cdash-son.sandia.gov/cdash/viewTest.php?onlyfailed&buildid=54052
It looks like there is an Amesos2 KLU2 error that happens after the time integration is complete, apparently due to a malformed matrix that the solver is given:
I am wondering if this is related to recent changes to Tempus. Tagging @ccober6 who might have ideas about this theory.
I will investigate further.