[petsc-users] error when solving a linear system with gmres + pilut/euclid
Barry Smith
bsmith at petsc.dev
Mon Aug 24 19:00:25 CDT 2020
Oh yes, it could happen with a NaN.
KSPGMRESClassicalGramSchmidtOrthogonalization() calls KSPCheckDot(ksp,lhh[j]), so it should detect any NaN that appears and set ksp->convergedreason. However, the call to VecMAXPY() is still made before returning, which produces the error message.
We should short-circuit the orthogonalization as soon as it sees a NaN/Inf and return immediately, so that GMRES can clean up and produce a very useful error message.
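To illustrate why a NaN can trigger a "must be same on all processes" error even when every process holds the same NaN: under IEEE arithmetic a NaN never compares equal to anything, including itself. The sketch below is a simplified model and not PETSc's actual check; the function name `same_on_all_ranks` and the list-of-ranks representation are assumptions made purely for illustration.

```python
import math

def same_on_all_ranks(values):
    """Hypothetical model of a collective consistency check.

    `values` holds the scalar seen by each simulated rank. The check
    passes only if every rank's value compares equal to rank 0's.
    """
    reference = values[0]
    return all(v == reference for v in values)

# Identical finite values on every rank pass the check.
finite_ok = same_on_all_ranks([2.5, 2.5, 2.5])

# Identical NaNs on every rank still fail, because NaN != NaN.
nan_ok = same_on_all_ranks([math.nan, math.nan, math.nan])
```

So an equality-based check of this kind would reject a NaN scalar regardless of whether the ranks actually disagree.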
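The early return proposed here can be sketched in a few lines. This is a minimal pure-Python model of one classical Gram-Schmidt step, not the PETSc implementation: `cgs_orthogonalize` and `dot` are hypothetical names, plain lists stand in for distributed vectors, and the comments only mark which step plays the role of VecMDot and VecMAXPY.

```python
import math

def dot(x, y):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, y))

def cgs_orthogonalize(basis, w):
    """One classical Gram-Schmidt step with a NaN/Inf short-circuit.

    Returns (w_new, coeffs, ok). If any coefficient is not finite,
    the update is skipped and w is returned unchanged with ok=False,
    so the caller can clean up and report the failure.
    """
    # All coefficients are computed first (the role VecMDot plays).
    coeffs = [dot(v, w) for v in basis]

    # Short-circuit: return before the update if anything is NaN/Inf.
    if not all(math.isfinite(h) for h in coeffs):
        return w, coeffs, False

    # Update w <- w - sum_j coeffs[j] * basis[j] (the role VecMAXPY plays).
    w_new = list(w)
    for h, v in zip(coeffs, basis):
        for i in range(len(w_new)):
            w_new[i] -= h * v[i]
    return w_new, coeffs, True
```

For example, `cgs_orthogonalize([[1.0, 0.0]], [3.0, 4.0])` performs the update, while passing a `w` that contains a NaN returns with `ok` set to `False` and never reaches the update loop.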
Alfredo,
It is also possible that the hypre preconditioners are producing a NaN because your matrix is too difficult for them to handle, but it would be odd for that to happen only after many iterations.
As I suggested before, run with -pc_type bjacobi to see if you get the same problem.
Barry
> On Aug 24, 2020, at 6:38 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith <bsmith at petsc.dev> wrote:
>
> Alfredo,
>
> This should never happen. The input to the VecMAXPY in GMRES is computed via VecMDot, which produces the same result on all processes.
>
> If you run with -pc_type bjacobi does it also happen?
>
> Is this your custom code or does it happen in PETSc examples also? Like src/snes/tutorials/ex19 -da_refine 5
>
> Could be memory corruption, can you run under valgrind?
>
> Couldn't it happen if something generates a NaN? That also should not happen, but I was allowing that pilut might do it.
>
> Thanks,
>
> Matt
>
> Barry
>
>
> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo <ajaramillopalma at gmail.com> wrote:
> >
> > Dear PETSc developers,
> >
> > I'm trying to solve a linear problem with GMRES preconditioned with pilut from HYPRE. For this I'm using the options:
> >
> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor
> >
> > If I use a single core, GMRES (+ pilut or euclid) converges. However, when using multiple cores the following error appears after some number of iterations:
> >
> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3
> >
> > relative to the function VecMAXPY. I attached a screenshot with more detailed output. The same happens when using euclid. Can you please give me some insight on this?
> >
> > best regards
> > Alfredo
> > <Screenshot from 2020-08-24 17-57-52.png>
>
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/