[petsc-users] error when solving a linear system with gmres + pilut/euclid

Alfredo Jaramillo ajaramillopalma at gmail.com
Mon Aug 24 20:35:42 CDT 2020


Hello Barry, Matthew, thanks for the replies !

Yes, it is our custom code, and it also happens when setting -pc_type
bjacobi. Before testing an iterative solver, we were using MUMPS (-ksp_type
preonly -pc_type lu -pc_factor_mat_solver_type mumps) without issues.
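
For context, that direct-solver setup corresponds roughly to the following
API calls (just a sketch, assuming a KSP object "ksp" that already has the
matrix attached):

    PC pc;
    KSPSetType(ksp, KSPPREONLY);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCLU);
    PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);
    KSPSetFromOptions(ksp);   /* command-line options can still override this */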

Running ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce any
problem.

Trying to reproduce the situation on my computer, I was able to trigger the
error for a small case with -pc_type bjacobi. For that particular case, when
running on the cluster the error appears at the very last iteration:

=====
27 KSP Residual norm 8.230378644666e-06
[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[0]PETSC ERROR: Invalid argument
[0]PETSC ERROR: Scalar value must be same on all processes, argument # 3
====

whereas running on my computer the error is not raised and convergence is
reached instead:

====
Linear interp_ solve converged due to CONVERGED_RTOL iterations 27
====

I will run valgrind to check for possible memory corruption.
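
Probably something along these lines (the executable name and its options
here are just placeholders for our solver):

    mpirun -n 4 valgrind --tool=memcheck -q --num-callers=20 \
           --log-file=valgrind.log.%p ./our_solver <solver options>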

thank you
Alfredo

On Mon, Aug 24, 2020 at 9:00 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>    Oh yes, it could happen with a NaN.
>
>    KSPGMRESClassicalGramSchmidtOrthogonalization()
> calls KSPCheckDot(ksp,lhh[j]); so it should detect any NaN that appears and
> set ksp->reason, but the call to VecMAXPY() is still made before
> returning, hence producing the error message.
>
>    We should short-circuit the orthogonalization as soon as it sees a
> NaN/Inf and return immediately so that GMRES can clean up and produce a
> very useful error message.
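>
>    Roughly, the guard would look like the sketch below (just the idea,
> not the actual change; it uses the same PetscIsInfOrNanScalar() test that
> KSPCheckDot() is built on):
>
>        if (PetscIsInfOrNanScalar(lhh[j])) {
>          ksp->reason = KSP_DIVERGED_NANORINF;
>          PetscFunctionReturn(0);   /* skip the VecMAXPY, let GMRES clean up */
>        }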
>
>   Alfredo,
>
>     It is also possible that the hypre preconditioners are producing a NaN
> because your matrix is too difficult for them to handle, but it would be
> odd for that to happen after so many iterations.
>
>    As I suggested before, run with -pc_type bjacobi to see if you get the
> same problem.
>
>   Barry
>
>
> On Aug 24, 2020, at 6:38 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>    Alfredo,
>>
>>       This should never happen. The input to the VecMAXPY in GMRES is
>> computed via VecMDot(), which produces the same result on all processes.
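>>
>>        For reference, the pattern is roughly the following sketch (w is
>> the new basis vector, V[] holds the previous ones, n is their count):
>>
>>            PetscScalar dots[n];
>>            PetscInt    i;
>>            VecMDot(w, n, V, dots);    /* collective: every rank gets identical dots[] */
>>            for (i = 0; i < n; i++) dots[i] = -dots[i];
>>            VecMAXPY(w, n, dots, V);   /* dots is "argument # 3" in the error message */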
>>
>>        If you run with -pc_type bjacobi does it also happen?
>>
>>        Is this your custom code or does it happen in PETSc examples also?
>> Like src/snes/tutorials/ex19 -da_refine 5
>>
>>       Could be memory corruption, can you run under valgrind?
>>
>
> Couldn't it happen if something generates a NaN? That also should not
> happen, but I was allowing that pilut might do it.
>
>   Thanks,
>
>     Matt
>
>
>>     Barry
>>
>>
>> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo <
>> ajaramillopalma at gmail.com> wrote:
>> >
>> > Dear PETSc developers,
>> >
>> > I'm trying to solve a linear problem with GMRES preconditioned with
>> pilut from HYPRE. For this I'm using the options:
>> >
>> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor
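>> >
>> > For reference, those options correspond roughly to the following API
>> > calls (just a sketch):
>> >
>> >     KSPSetType(ksp, KSPGMRES);
>> >     KSPGetPC(ksp, &pc);
>> >     PCSetType(pc, PCHYPRE);
>> >     PCHYPRESetType(pc, "pilut");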
>> >
>> > If I use a single core, GMRES (+ pilut or euclid) converges. However,
>> when using multiple cores the following error appears after some number of
>> iterations:
>> >
>> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3
>> >
>> > raised by the function VecMAXPY. I attached a screenshot with more
>> detailed output. The same happens when using euclid. Can you please give me
>> some insight into this?
>> >
>> > best regards
>> > Alfredo
>> > <Screenshot from 2020-08-24 17-57-52.png>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
>