[petsc-users] MPI linear solver reproducibility question
Mark McClure
mark at resfrac.com
Sat Apr 1 23:31:20 CDT 2023
Thank you, I will try BCGSL.
And it's good to know that this is worth pursuing, and that it is possible. As a
first step, I guess I should upgrade to the latest PETSc release.
How can I make sure that I am "using an MPI that follows the suggestion for
implementers about determinism"? I am using MPICH version 3.3a2.
I am pretty sure that I'm assembling the same matrix every time, but I'm
not sure how that would depend on 'how you do the communication'. Each
process does a series of MatSetValues calls with INSERT_VALUES,
assembling the matrix by rows. My understanding is that this process
should be deterministic.
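The following is a minimal single-process C sketch (not PETSc code) of why the assembly path can matter. When several contributions to the same matrix entry are summed, as happens with ADD_VALUES for entries computed on other processes, the rounded result can depend on the order in which the contributions arrive. Row-wise assembly with INSERT_VALUES, where each entry is set by exactly one process, does not involve such a sum. The contribution values and the function name `add_in_order` are hypothetical. The sketch assumes IEEE 754 doubles with round-to-nearest and no extended-precision intermediates (e.g. x86-64 with SSE2, compiled without `-ffast-math`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contributions from three processes to ONE matrix entry.
   Values chosen so that double rounding depends on summation order
   (assumes IEEE 754 round-to-nearest, no extended precision). */
static const double contrib[3] = {1.0e16, 1.0, -1.0e16};

/* ADD-style assembly: the owning process sums the contributions in the
   order they arrive. order[] is the arrival order (a permutation of 0..2).
   Different arrival orders can give different rounded results. */
double add_in_order(const int order[3])
{
    double entry = 0.0;
    for (size_t i = 0; i < 3; ++i) {
        assert(order[i] >= 0 && order[i] < 3);
        entry += contrib[order[i]];
    }
    return entry;
}
```

With the values above, `add_in_order` returns 0.0 for the arrival order {0, 1, 2} and 1.0 for {0, 2, 1}: in the first order, 1.0e16 + 1.0 rounds back to 1.0e16 before the cancellation, so the 1.0 is lost.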
On Sat, Apr 1, 2023 at 9:05 PM Jed Brown <jed at jedbrown.org> wrote:
> If you use unpreconditioned BCGS and ensure that you assemble the same
> matrix (depends how you do the communication for that), I think you'll get
> bitwise reproducible results when using an MPI that follows the suggestion
> for implementers about determinism. Beyond that, it'll depend somewhat on
> the preconditioner.
>
> If you like BCGS, you may want to try BCGSL, which has a longer memory and
> tends to be more robust. But preconditioning is usually critical and the
> place to devote most effort.
>
> Mark McClure <mark at resfrac.com> writes:
>
> > Hello,
> >
> > I have been a user of PETSc for quite a few years, though I haven't updated
> > my version in a few years, so it's possible that my comments below could be
> > 'out of date'.
> >
> > Several years ago, I'd asked you guys about reproducibility. I observed
> > that if I gave an identical matrix to the PETSc linear solver, I would get
> > a bitwise identical result back when running on one processor, but if I
> > ran with MPI, I would see differences in the final significant figures,
> > below the convergence criterion. This happened even when rerunning the
> > exact same calculation on the exact same machine.
> >
> > I.e., with repeated tests, it always converged to the same answer
> > 'within convergence tolerance', but was not consistent in the significant
> > figures beyond the convergence tolerance.
> >
> > At the time, the response was that this was unavoidable, and related to
> > the fact that floating-point arithmetic is not associative, so the timing
> > of when processors recombined information (which was effectively random,
> > a race condition) was causing these differences.
> >
> > Am I remembering correctly? And, if so, is this still a property of the
> > PETSc linear solver with MPI, and is there now any option available to
> > resolve it? I would be willing to accept a performance hit in order to get
> > guaranteed bitwise consistency, even when running with MPI.
> >
> > I am using the solver KSPBCGS, without a preconditioner. I chose it
> > because several years ago I did testing and found that, on the particular
> > linear systems I usually work with, this solver (with no preconditioner)
> > was the most robust, both in terms of consistently converging and in
> > terms of performance. I also tested a variety of linear solvers other
> > than PETSc (including other implementations of BiCGStab), and found that
> > the PETSc BCGS was the best performer. That said, I'm curious: have there
> > been updates to that algorithm in recent years, such that I should
> > consider updating to a newer PETSc build and comparing?
> >
> > Best regards,
> > Mark McClure
>
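The explanation quoted above, that results differ because processes combine partial results in an effectively random order, can be illustrated with a small single-process C simulation. One way to restore determinism is to buffer every partial result (for example, each rank's local piece of a dot product in BCGS) and combine them in a fixed order, here rank order, regardless of arrival time. As far as we understand, this is the spirit of the MPI standard's advice to implementors that MPI_REDUCE should return the same result whenever it is applied to the same arguments in the same order. The function names, the `MAX_RANKS` limit, and the partial values are hypothetical, and the sketch assumes IEEE 754 round-to-nearest doubles without extended-precision intermediates.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANKS 64 /* hypothetical limit for this sketch */

/* Order-dependent pattern: add each rank's partial result as soon as it
   "arrives". arrival[] is a permutation of 0..nranks-1 giving arrival order. */
double sum_in_arrival_order(const double *partial, const int *arrival,
                            size_t nranks)
{
    double s = 0.0;
    for (size_t i = 0; i < nranks; ++i) {
        assert(arrival[i] >= 0 && (size_t)arrival[i] < nranks);
        s += partial[arrival[i]];
    }
    return s;
}

/* Deterministic pattern: store each arriving partial in the slot for its
   source rank, then add the slots in rank order 0, 1, ..., nranks-1.
   Assumes arrival[] is a permutation, so every slot is filled.
   The arrival permutation no longer affects the rounded result. */
double sum_in_rank_order(const double *partial, const int *arrival,
                         size_t nranks)
{
    double slot[MAX_RANKS];
    assert(nranks <= MAX_RANKS);
    for (size_t i = 0; i < nranks; ++i) {
        assert(arrival[i] >= 0 && (size_t)arrival[i] < nranks);
        slot[arrival[i]] = partial[arrival[i]]; /* "receive" from that rank */
    }
    double s = 0.0;
    for (size_t r = 0; r < nranks; ++r)
        s += slot[r];
    return s;
}
```

With partials {1.0e16, 1.0, -1.0e16}, `sum_in_arrival_order` gives 0.0 or 1.0 depending on the arrival permutation, while `sum_in_rank_order` gives 0.0 for every permutation. A fixed order only makes repeated runs with the same number of processes agree; changing the process count changes the partition and therefore the order of the additions.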