[petsc-users] error when solving a linear system with gmres + pilut/euclid

Barry Smith bsmith at petsc.dev
Mon Aug 24 19:00:25 CDT 2020


   Oh yes, it could happen with a NaN. 

   KSPGMRESClassicalGramSchmidtOrthogonalization() calls KSPCheckDot(ksp,lhh[j]), so it should detect any NaN that appears and set ksp->reason, but the call to VecMAXPY() is still made before returning, and hence produces the error message.

   We should short-circuit the orthogonalization as soon as it sees a NaN/Inf and return immediately, letting GMRES clean up and produce a genuinely useful error message. 
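
   Roughly, the short-circuit might look like the sketch below (simplified and illustrative only, not the actual PETSc source; the vector names and counts are placeholders):

      ierr = VecMDot(vnew,nvecs,V,lhh);CHKERRQ(ierr);   /* dot products of the new vector against the existing basis */
      for (j=0; j<nvecs; j++) {
        KSPCheckDot(ksp,lhh[j]);                        /* flags a NaN/Inf and records the divergence reason */
        if (ksp->reason) PetscFunctionReturn(0);        /* proposed: return before VecMAXPY() is ever reached */
        lhh[j] = -lhh[j];
      }
      ierr = VecMAXPY(vnew,nvecs,lhh,V);CHKERRQ(ierr);  /* only reached when every coefficient is finite */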

  Alfredo,

     It is also possible that the hypre preconditioners are producing a NaN because your matrix is too difficult for them to handle, but it would be odd for that to happen only after many iterations.

   As I suggested before, run with -pc_type bjacobi to see whether you get the same problem.
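
   For example, something along these lines (the executable name and process count are placeholders for your own setup):

      mpiexec -n 4 ./your_code -ksp_type gmres -pc_type bjacobi -ksp_monitor -ksp_converged_reason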

  Barry


> On Aug 24, 2020, at 6:38 PM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith <bsmith at petsc.dev> wrote:
> 
>    Alfredo,
> 
>       This should never happen. The input to the VecMAXPY in GMRES is computed via VecMDot, which produces the same result on all processes.
> 
>        If you run with -pc_type bjacobi does it also happen?
> 
>        Is this your custom code, or does it also happen in PETSc examples, like src/snes/tutorials/ex19 -da_refine 5?
> 
>       It could be memory corruption; can you run under valgrind?
> 
> Couldn't it happen if something generates a NaN? That also should not happen, but I was allowing that pilut might do it.
> 
>   Thanks,
> 
>     Matt
>  
>     Barry
> 
> 
> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo <ajaramillopalma at gmail.com> wrote:
> > 
> > Dear PETSc developers,
> > 
> > I'm trying to solve a linear problem with GMRES preconditioned with pilut from HYPRE. For this I'm using the options:
> > 
> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor
> > 
> > If I use a single core, GMRES (+ pilut or euclid) converges. However, when using multiple cores, the following error appears after some number of iterations:
> > 
> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3
> > 
> > referring to the function VecMAXPY. I attached a screenshot with more detailed output. The same happens when using euclid. Could you please give me some insight into this?
> > 
> > best regards
> > Alfredo
> > <Screenshot from 2020-08-24 17-57-52.png>
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/
