[petsc-users] MPI linear solver reproducibility question
Jed Brown
jed at jedbrown.org
Sun Apr 2 08:31:07 CDT 2023
Vector communication used a different code path in 3.13. If you have a reproducer with current PETSc, I'll have a look. Here's a demo showing that the solution is bitwise identical across runs (the sha256sum is the same every time you run it, though it may differ between your computer and mine due to compiler version and flags).
$ mpiexec -n 8 ompi/tests/snes/tutorials/ex5 -da_refine 3 -snes_monitor -snes_view_solution binary && sha256sum binaryoutput
0 SNES Function norm 1.265943996096e+00
1 SNES Function norm 2.831564838232e-02
2 SNES Function norm 4.456686729809e-04
3 SNES Function norm 1.206531765776e-07
4 SNES Function norm 1.740255643596e-12
5410f84e91a9db3a74a2ac33603bbbb1fb48e7eaf739614192cfd53344517986 binaryoutput
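To automate the check above over several runs, a minimal sketch in Python follows. The helper names (sha256_of_file, digests_for_runs, is_bitwise_reproducible) are made up for illustration and are not part of PETSc. The sketch only assumes that the command rewrites the same output file on every run.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of_file(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def digests_for_runs(cmd, output_path, runs=5):
    """Run `cmd` `runs` times and hash `output_path` after each run.

    `cmd` is an argument list as accepted by subprocess.run; it is
    assumed to (re)write `output_path` every time it is executed.
    """
    digests = []
    for _ in range(runs):
        # check=True raises if the solver exits with a nonzero status.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        digests.append(sha256_of_file(output_path))
    return digests


def is_bitwise_reproducible(cmd, output_path, runs=5):
    """True if every run produced a byte-identical output file."""
    return len(set(digests_for_runs(cmd, output_path, runs))) == 1
```

For the demo above, the call would be is_bitwise_reproducible(["mpiexec", "-n", "8", "ompi/tests/snes/tutorials/ex5", "-da_refine", "3", "-snes_view_solution", "binary"], "binaryoutput"). A single mismatching digest is enough to show nondeterminism; matching digests over a handful of runs are only evidence, not proof.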
Mark McClure <mark at resfrac.com> writes:
> In the typical FD implementation, you only set local rows, but with FE and
> sometimes FV, you also create values that need to be communicated and
> summed on other processors.
> Makes sense.
>
> Anyway, in this case, I am certain that I am giving the solver bitwise
> identical matrices from each process. I am not using a preconditioner,
> using BCGS, with Petsc version 3.13.3.
>
> So then, how can I make sure that I am "using an MPI that follows the
> suggestion for implementers about determinism"? I am using MPICH version
> 3.3a2, didn't do anything special when installing it. Does that sound OK?
> If so, I could upgrade to the latest Petsc, try again, and if confirmed
> that it persists, could provide a reproduction scenario.
>
>
>
> On Sat, Apr 1, 2023 at 9:53 PM Jed Brown <jed at jedbrown.org> wrote:
>
>> Mark McClure <mark at resfrac.com> writes:
>>
>> > Thank you, I will try BCGSL.
>> >
>> > And good to know that this is worth pursuing, and that it is possible. Step
>> > 1, I guess I should upgrade to the latest release on Petsc.
>> >
>> > How can I make sure that I am "using an MPI that follows the suggestion for
>> > implementers about determinism"? I am using MPICH version 3.3a2.
>> >
>> > I am pretty sure that I'm assembling the same matrix every time, but I'm
>> > not sure how it would depend on 'how you do the communication'. Each
>> > process is doing a series of MatSetValues with INSERT_VALUES,
>> > assembling the matrix by rows. My understanding of this process is that
>> > it'd be deterministic.
>>
>> In the typical FD implementation, you only set local rows, but with FE and
>> sometimes FV, you also create values that need to be communicated and
>> summed on other processors.
>>
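To illustrate why summed off-process contributions matter for reproducibility, here is a small Python sketch. It is not PETSc code. It only shows that floating-point addition is not associative, so the order in which contributions to one entry arrive and get added can change the last bits of the result. The three values and the function name naive_sum are arbitrary choices for illustration.

```python
import itertools

# Three contributions to a single matrix entry, e.g. one from each of
# three neighboring ranks (values chosen only for illustration).
contributions = [0.1, 0.2, 0.3]


def naive_sum(values):
    """Add values left to right, as they would be added on arrival."""
    total = 0.0
    for v in values:
        total += v
    return total


# Summing in every possible arrival order gives more than one result.
arrival_order_results = {
    naive_sum(p) for p in itertools.permutations(contributions)
}

# Imposing a fixed order (here: sorted) before summing gives one result,
# no matter which order the contributions arrived in.
fixed_order_results = {
    naive_sum(sorted(p)) for p in itertools.permutations(contributions)
}

print(len(arrival_order_results) > 1)   # order-dependent
print(len(fixed_order_results) == 1)    # order-independent
```

When every process sets only its own rows, no such cross-process summation occurs during assembly, which is why that case is deterministic.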