[petsc-users] sources of floating point randomness in JFNK in serial

Matthew Knepley knepley at gmail.com
Thu May 4 09:10:42 CDT 2023


On Thu, May 4, 2023 at 8:54 AM Mark Lohry <mlohry at gmail.com> wrote:

> Try -pc_type none.
>>
>
> With -pc_type none the 0 KSP residual looks identical. But *sometimes*
> it's producing exactly the same history and others it's gradually
> changing.  I'm reasonably confident my residual evaluation has no
> randomness, see info after the petsc output.
>

We can try and test this. Replace your MatMFFD with an actual matrix and
run. Do you see any variability?

If not, then it could be your routine, or it could be MatMFFD. So run a few
with -snes_view, and we can see if the
"w" parameter changes.

  Thanks,

     Matt


> solve history 1:
>
>   0 SNES Function norm 3.424003312857e+04
>     0 KSP Residual norm 3.424003312857e+04
>     1 KSP Residual norm 2.871734444536e+04
>     2 KSP Residual norm 2.490276931041e+04
> ...
>    20 KSP Residual norm 7.449686034356e+03
>   Linear solve converged due to CONVERGED_ITS iterations 20
>   1 SNES Function norm 1.085015821006e+04
>
> solve history 2, identical to 1:
>
>   0 SNES Function norm 3.424003312857e+04
>     0 KSP Residual norm 3.424003312857e+04
>     1 KSP Residual norm 2.871734444536e+04
>     2 KSP Residual norm 2.490276931041e+04
> ...
>    20 KSP Residual norm 7.449686034356e+03
>   Linear solve converged due to CONVERGED_ITS iterations 20
>   1 SNES Function norm 1.085015821006e+04
>
> solve history 3, identical KSP at 0 and 1, slight change at 2, growing
> difference to the end:
>   0 SNES Function norm 3.424003312857e+04
>     0 KSP Residual norm 3.424003312857e+04
>     1 KSP Residual norm 2.871734444536e+04
>     2 KSP Residual norm 2.490276930242e+04
> ...
>  20 KSP Residual norm 7.449686095424e+03
>   Linear solve converged due to CONVERGED_ITS iterations 20
>   1 SNES Function norm 1.085015646971e+04
>
>
> Ths is using a standard explicit 3-stage Runge-Kutta smoother for 10
> iterations, so 30 calls of the same residual evaluation, identical
> residuals every time
>
> run 1:
>
> # iteration            rho                 rhou                rhov
>          rhoE                abs_res             rel_res             umin
>              vmax                vmin                elapsed_time
> #
>
>
>           1.00000e+00  1.086860616292e+00  2.782316758416e+02
>  4.482867643761e+00  2.993435920340e+02         2.04353e+02
> 1.00000e+00        -8.23945e-15        -6.15326e-15        -1.35563e-14
>     6.34834e-01
>           2.00000e+00  2.310547487017e+00  1.079059352425e+02
>  3.958323921837e+00  5.058927165686e+02         2.58647e+02
> 1.26568e+00        -1.02539e-14        -9.35368e-15        -1.69925e-14
>     6.40063e-01
>           3.00000e+00  2.361005867444e+00  5.706213331683e+01
>  6.130016323357e+00  4.688968362579e+02         2.36201e+02
> 1.15585e+00        -1.19370e-14        -1.15216e-14        -1.59733e-14
>     6.45166e-01
>           4.00000e+00  2.167518999963e+00  3.757541401594e+01
>  6.313917437428e+00  4.054310291628e+02         2.03612e+02
> 9.96372e-01        -1.81831e-14        -1.28312e-14        -1.46238e-14
>     6.50494e-01
>           5.00000e+00  1.941443738676e+00  2.884190334049e+01
>  6.237106158479e+00  3.539201037156e+02         1.77577e+02
> 8.68970e-01         3.56633e-14        -8.74089e-15        -1.06666e-14
>     6.55656e-01
>           6.00000e+00  1.736947124693e+00  2.429485695670e+01
>  5.996962200407e+00  3.148280178142e+02         1.57913e+02
> 7.72745e-01        -8.98634e-14        -2.41152e-14        -1.39713e-14
>     6.60872e-01
>           7.00000e+00  1.564153212635e+00  2.149609219810e+01
>  5.786910705204e+00  2.848717011033e+02         1.42872e+02
> 6.99144e-01        -2.95352e-13        -2.48158e-14        -2.39351e-14
>     6.66041e-01
>           8.00000e+00  1.419280815384e+00  1.950619804089e+01
>  5.627281158306e+00  2.606623371229e+02         1.30728e+02
> 6.39715e-01         8.98941e-13         1.09674e-13         3.78905e-14
>     6.71316e-01
>           9.00000e+00  1.296115915975e+00  1.794843530745e+01
>  5.514933264437e+00  2.401524522393e+02         1.20444e+02
> 5.89394e-01         1.70717e-12         1.38762e-14         1.09825e-13
>     6.76447e-01
>           1.00000e+01  1.189639693918e+00  1.665381754953e+01
>  5.433183087037e+00  2.222572900473e+02         1.11475e+02
> 5.45501e-01        -4.22462e-12        -7.15206e-13        -2.28736e-13
>     6.81716e-01
>
> run N:
>
>
> #
>
>
> # iteration            rho                 rhou                rhov
>          rhoE                abs_res             rel_res             umin
>              vmax                vmin                elapsed_time
> #
>
>
>           1.00000e+00  1.086860616292e+00  2.782316758416e+02
>  4.482867643761e+00  2.993435920340e+02         2.04353e+02
> 1.00000e+00        -8.23945e-15        -6.15326e-15        -1.35563e-14
>     6.23316e-01
>           2.00000e+00  2.310547487017e+00  1.079059352425e+02
>  3.958323921837e+00  5.058927165686e+02         2.58647e+02
> 1.26568e+00        -1.02539e-14        -9.35368e-15        -1.69925e-14
>     6.28510e-01
>           3.00000e+00  2.361005867444e+00  5.706213331683e+01
>  6.130016323357e+00  4.688968362579e+02         2.36201e+02
> 1.15585e+00        -1.19370e-14        -1.15216e-14        -1.59733e-14
>     6.33558e-01
>           4.00000e+00  2.167518999963e+00  3.757541401594e+01
>  6.313917437428e+00  4.054310291628e+02         2.03612e+02
> 9.96372e-01        -1.81831e-14        -1.28312e-14        -1.46238e-14
>     6.38773e-01
>           5.00000e+00  1.941443738676e+00  2.884190334049e+01
>  6.237106158479e+00  3.539201037156e+02         1.77577e+02
> 8.68970e-01         3.56633e-14        -8.74089e-15        -1.06666e-14
>     6.43887e-01
>           6.00000e+00  1.736947124693e+00  2.429485695670e+01
>  5.996962200407e+00  3.148280178142e+02         1.57913e+02
> 7.72745e-01        -8.98634e-14        -2.41152e-14        -1.39713e-14
>     6.49073e-01
>           7.00000e+00  1.564153212635e+00  2.149609219810e+01
>  5.786910705204e+00  2.848717011033e+02         1.42872e+02
> 6.99144e-01        -2.95352e-13        -2.48158e-14        -2.39351e-14
>     6.54167e-01
>           8.00000e+00  1.419280815384e+00  1.950619804089e+01
>  5.627281158306e+00  2.606623371229e+02         1.30728e+02
> 6.39715e-01         8.98941e-13         1.09674e-13         3.78905e-14
>     6.59394e-01
>           9.00000e+00  1.296115915975e+00  1.794843530745e+01
>  5.514933264437e+00  2.401524522393e+02         1.20444e+02
> 5.89394e-01         1.70717e-12         1.38762e-14         1.09825e-13
>     6.64516e-01
>           1.00000e+01  1.189639693918e+00  1.665381754953e+01
>  5.433183087037e+00  2.222572900473e+02         1.11475e+02
> 5.45501e-01        -4.22462e-12        -7.15206e-13        -2.28736e-13
>     6.69677e-01
>
>
>
>
>
> On Thu, May 4, 2023 at 8:41 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>> ASM is just the sub PC with one proc but gets weaker with more procs
>> unless you use jacobi. (maybe I am missing something).
>>
>> On Thu, May 4, 2023 at 8:31 AM Mark Lohry <mlohry at gmail.com> wrote:
>>
>>>  Please send the output of -snes_view.
>>>>
>>> pasted below. anything stand out?
>>>
>>>
>>> SNES Object: 1 MPI process
>>>   type: newtonls
>>>   maximum iterations=1, maximum function evaluations=-1
>>>   tolerances: relative=0.1, absolute=1e-15, solution=1e-15
>>>   total number of linear solver iterations=20
>>>   total number of function evaluations=22
>>>   norm schedule ALWAYS
>>>   Jacobian is never rebuilt
>>>   Jacobian is applied matrix-free with differencing
>>>   Preconditioning Jacobian is built using finite differences with
>>> coloring
>>>   SNESLineSearch Object: 1 MPI process
>>>     type: basic
>>>     maxstep=1.000000e+08, minlambda=1.000000e-12
>>>     tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>>> lambda=1.000000e-08
>>>     maximum iterations=40
>>>   KSP Object: 1 MPI process
>>>     type: gmres
>>>       restart=30, using Classical (unmodified) Gram-Schmidt
>>> Orthogonalization with no iterative refinement
>>>       happy breakdown tolerance 1e-30
>>>     maximum iterations=20, initial guess is zero
>>>     tolerances:  relative=0.1, absolute=1e-15, divergence=10.
>>>     left preconditioning
>>>     using PRECONDITIONED norm type for convergence test
>>>   PC Object: 1 MPI process
>>>     type: asm
>>>       total subdomain blocks = 1, amount of overlap = 0
>>>       restriction/interpolation type - RESTRICT
>>>       Local solver information for first block is in the following KSP
>>> and PC objects on rank 0:
>>>       Use -ksp_view ::ascii_info_detail to display information for all
>>> blocks
>>>     KSP Object: (sub_) 1 MPI process
>>>       type: preonly
>>>       maximum iterations=10000, initial guess is zero
>>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>       left preconditioning
>>>       using NONE norm type for convergence test
>>>     PC Object: (sub_) 1 MPI process
>>>       type: ilu
>>>         out-of-place factorization
>>>         0 levels of fill
>>>         tolerance for zero pivot 2.22045e-14
>>>         matrix ordering: natural
>>>         factor fill ratio given 1., needed 1.
>>>           Factored matrix follows:
>>>             Mat Object: (sub_) 1 MPI process
>>>               type: seqbaij
>>>               rows=16384, cols=16384, bs=16
>>>               package used to perform factorization: petsc
>>>               total: nonzeros=1277952, allocated nonzeros=1277952
>>>                   block size is 16
>>>       linear system matrix = precond matrix:
>>>       Mat Object: (sub_) 1 MPI process
>>>         type: seqbaij
>>>         rows=16384, cols=16384, bs=16
>>>         total: nonzeros=1277952, allocated nonzeros=1277952
>>>         total number of mallocs used during MatSetValues calls=0
>>>             block size is 16
>>>     linear system matrix followed by preconditioner matrix:
>>>     Mat Object: 1 MPI process
>>>       type: mffd
>>>       rows=16384, cols=16384
>>>         Matrix-free approximation:
>>>           err=1.49012e-08 (relative error in function evaluation)
>>>           Using wp compute h routine
>>>               Does not compute normU
>>>     Mat Object: 1 MPI process
>>>       type: seqbaij
>>>       rows=16384, cols=16384, bs=16
>>>       total: nonzeros=1277952, allocated nonzeros=1277952
>>>       total number of mallocs used during MatSetValues calls=0
>>>           block size is 16
>>>
>>> On Thu, May 4, 2023 at 8:30 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> If you are using MG what is the coarse grid solver?
>>>> -snes_view might give you that.
>>>>
>>>> On Thu, May 4, 2023 at 8:25 AM Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Thu, May 4, 2023 at 8:21 AM Mark Lohry <mlohry at gmail.com> wrote:
>>>>>
>>>>>> Do they start very similarly and then slowly drift further apart?
>>>>>>
>>>>>>
>>>>>> Yes, this. I take it this sounds familiar?
>>>>>>
>>>>>> See these two examples with 20 fixed iterations pasted at the end.
>>>>>> The difference for one solve is slight (final SNES norm is identical to 5
>>>>>> digits), but in the context I'm using it in (repeated applications to solve
>>>>>> a steady state multigrid problem, though here just one level) the
>>>>>> differences add up such that I might reach global convergence in 35
>>>>>> iterations or 38. It's not the end of the world, but I was expecting that
>>>>>> with -np 1 these would be identical and I'm not sure where the root cause
>>>>>> would be.
>>>>>>
>>>>>
>>>>> The initial KSP residual is different, so its the PC. Please send the
>>>>> output of -snes_view. If your ASM is using direct factorization, then it
>>>>> could be randomness in whatever LU you are using.
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>     Matt
>>>>>
>>>>>
>>>>>>   0 SNES Function norm 2.801842107848e+04
>>>>>>     0 KSP Residual norm 4.045639499595e+01
>>>>>>     1 KSP Residual norm 1.917999809040e+01
>>>>>>     2 KSP Residual norm 1.616048521958e+01
>>>>>> [...]
>>>>>>    19 KSP Residual norm 8.788043518111e-01
>>>>>>    20 KSP Residual norm 6.570851270214e-01
>>>>>>   Linear solve converged due to CONVERGED_ITS iterations 20
>>>>>>   1 SNES Function norm 1.801309983345e+03
>>>>>> Nonlinear solve converged due to CONVERGED_ITS iterations 1
>>>>>>
>>>>>>
>>>>>> Same system, identical initial 0 SNES norm, 0 KSP is slightly
>>>>>> different
>>>>>>
>>>>>>   0 SNES Function norm 2.801842107848e+04
>>>>>>     0 KSP Residual norm 4.045639473002e+01
>>>>>>     1 KSP Residual norm 1.917999883034e+01
>>>>>>     2 KSP Residual norm 1.616048572016e+01
>>>>>> [...]
>>>>>>    19 KSP Residual norm 8.788046348957e-01
>>>>>>    20 KSP Residual norm 6.570859588610e-01
>>>>>>   Linear solve converged due to CONVERGED_ITS iterations 20
>>>>>>   1 SNES Function norm 1.801311320322e+03
>>>>>> Nonlinear solve converged due to CONVERGED_ITS iterations 1
>>>>>>
>>>>>> On Wed, May 3, 2023 at 11:05 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>
>>>>>>>
>>>>>>>   Do they start very similarly and then slowly drift further apart?
>>>>>>> That is the first couple of KSP iterations they are almost identical but
>>>>>>> then for each iteration get a bit further. Similar for the SNES iterations,
>>>>>>> starting close and then for more iterations and more solves they start
>>>>>>> moving apart. Or do they suddenly jump to be very different? You can run
>>>>>>> with -snes_monitor -ksp_monitor
>>>>>>>
>>>>>>> On May 3, 2023, at 9:07 PM, Mark Lohry <mlohry at gmail.com> wrote:
>>>>>>>
>>>>>>> This is on a single MPI rank. I haven't checked the coloring, was
>>>>>>> just guessing there. But the solutions/residuals are slightly different
>>>>>>> from run to run.
>>>>>>>
>>>>>>> Fair to say that for serial JFNK/asm ilu0/gmres we should expect
>>>>>>> bitwise identical results?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 3, 2023, 8:50 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>   No, the coloring should be identical every time. Do you see
>>>>>>>> differences with 1 MPI rank? (Or much smaller ones?).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> > On May 3, 2023, at 8:42 PM, Mark Lohry <mlohry at gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > I'm running multiple iterations of newtonls with an MFFD/JFNK
>>>>>>>> nonlinear solver where I give it the sparsity. PC asm, KSP gmres, with
>>>>>>>> SNESSetLagJacobian -2 (compute once and then frozen jacobian).
>>>>>>>> >
>>>>>>>> > I'm seeing slight (<1%) but nonzero differences in residuals from
>>>>>>>> run to run. I'm wondering where randomness might enter here -- does the
>>>>>>>> jacobian coloring use a random seed?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20230504/5e2124dc/attachment-0001.html>


More information about the petsc-users mailing list