[petsc-users] error when solving a linear system with gmres + pilut/euclid

Alfredo Jaramillo ajaramillopalma at gmail.com
Tue Aug 25 08:03:59 CDT 2020


Yes, Barry, that is correct.



On Tue, Aug 25, 2020 at 1:02 AM Barry Smith <bsmith at petsc.dev> wrote:

>
>   On one system you get this error, on another system with the identical
> code and test case you do not get the error?
>
>   You get it with three iterative methods but not with MUMPS?
>
> Barry
>
>
> On Aug 24, 2020, at 8:35 PM, Alfredo Jaramillo <ajaramillopalma at gmail.com>
> wrote:
>
> Hello Barry, Matthew, thanks for the replies !
>
> Yes, it is our custom code, and it also happens when setting -pc_type
> bjacobi. Before testing an iterative solver, we were using MUMPS (-ksp_type
> preonly -pc_type lu -pc_factor_mat_solver_type mumps) without issues.
>
> Running ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce any
> problems.
>
> To compare with my own computer, I reproduced the error on the cluster
> for a small case with -pc_type bjacobi. For that particular case, when
> running on the cluster the error appears at the very last iteration:
>
> =====
> 27 KSP Residual norm 8.230378644666e-06
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3
> ====
>
> whereas on my computer no error is raised and the solve converges
> instead:
>
> ====
> Linear interp_ solve converged due to CONVERGED_RTOL iterations 27
> ====
>
> I will run valgrind to look for possible memory corruption.
>
> thank you
> Alfredo
>
> On Mon, Aug 24, 2020 at 9:00 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>    Oh yes, it could happen with NaN.
>>
>>    KSPGMRESClassicalGramSchmidtOrthogonalization()
>> calls KSPCheckDot(ksp,lhh[j]), so it should detect any NaN that appears and
>> set ksp->convergedreason, but the call to VecMAXPY() is still made before
>> returning, hence producing the error message.
>>
>>    We should short-circuit the orthogonalization as soon as it sees a
>> NaN/Inf and return immediately so that GMRES can clean up and produce a
>> very useful error message.
>>
>>   Alfredo,
>>
>>     It is also possible that the hypre preconditioners are producing a
>> NaN because your matrix is too difficult for them to handle, but it would
>> be odd for that to happen after many iterations.
>>
>>    As I suggested before, run with -pc_type bjacobi to see if you get the
>> same problem.
>>
>>   Barry
>>
>>
>> On Aug 24, 2020, at 6:38 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>    Alfredo,
>>>
>>>       This should never happen. The input to the VecMAXPY in GMRES is
>>> computed via VecMDot, which produces the same result on all processes.
>>>
>>>        If you run with -pc_type bjacobi does it also happen?
>>>
>>>        Is this your custom code or does it happen in PETSc examples
>>> also? Like src/snes/tutorials/ex19 -da_refine 5
>>>
>>>       Could be memory corruption, can you run under valgrind?
>>>
>>
>> Couldn't it happen if something generates a NaN? That also should not
>> happen, but I was allowing that pilut might do it.
>>
>>   Thanks,
>>
>>     Matt
>>
>>
>>>     Barry
>>>
>>>
>>> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo <
>>> ajaramillopalma at gmail.com> wrote:
>>> >
>>> > Dear PETSc developers,
>>> >
>>> > I'm trying to solve a linear problem with GMRES preconditioned with
>>> pilut from HYPRE. For this I'm using the options:
>>> >
>>> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor
>>> >
>>> > If I use a single core, GMRES (+ pilut or euclid) converges. However,
>>> when using multiple cores the following error appears after some number of
>>> iterations:
>>> >
>>> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3
>>> >
>>> > related to the function VecMAXPY. I attached a screenshot with more
>>> detailed output. The same happens when using euclid. Can you please give me
>>> some insight into this?
>>> >
>>> > best regards
>>> > Alfredo
>>> > <Screenshot from 2020-08-24 17-57-52.png>
>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>>
>>
>
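
A note on the NaN explanation quoted above: one way to see why a NaN can
show up as "Scalar value must be same on all processes" is that NaN never
compares equal to itself, so an equality-based consistency check fails even
when every process holds the "same" NaN. The sketch below is only a
simplified model in plain Python, not PETSc's actual implementation: the
list entries stand in for MPI ranks, and the function names
same_on_all_ranks and first_non_finite are hypothetical helpers made up for
this illustration. The second helper shows the kind of early NaN/Inf test
that would let the orthogonalization return before the update is attempted.

```python
import math


def same_on_all_ranks(values):
    """Simplified model of a collective consistency check (assumption:
    each list entry is the scalar held by one simulated rank)."""
    reference = values[0]
    # Plain equality against rank 0; NaN == NaN is False in IEEE arithmetic.
    return all(v == reference for v in values)


def first_non_finite(dots):
    """Return the index of the first NaN/Inf coefficient, or None if all
    coefficients are finite (hypothetical early-exit test)."""
    for j, d in enumerate(dots):
        if not math.isfinite(d):
            return j
    return None


# Four simulated ranks holding an identical, finite dot product.
finite = [0.25] * 4
# The same four ranks after a NaN has propagated to all of them.
nan_everywhere = [float("nan")] * 4

print(same_on_all_ranks(finite))
print(same_on_all_ranks(nan_everywhere))
print(first_non_finite([1.0, float("nan"), 2.0]))
```

In this model the finite case passes the check and the all-NaN case fails
it, which is consistent with the error appearing only once a NaN has entered
the dot products; it says nothing about where the NaN originates.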