[petsc-users] Reaching limit number of communicator with Spectrum MPI

Matthew Knepley knepley at gmail.com
Thu Aug 19 14:14:37 CDT 2021


On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu <yuf2 at rpi.edu> wrote:

> Hi Jed,
>
> In my case, I only have 2 hypre preconditioners at the same time, and
> they do not solve simultaneously, so it might not be case 1.
>
> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on
> my own machine (with OpenMPI); from what I observed, all the
> communicators are freed. I could not test it with Spectrum MPI on the
> clusters immediately because all the dependencies were built in release
> mode. However, as I mentioned, I haven't had this problem with OpenMPI
> before, so I'm not sure whether this is really an MPI implementation
> problem, or just that Spectrum MPI has a lower limit on the number of
> communicators. It may also depend on how many MPI ranks are used, as
> only 2 out of 40 ranks reported the error.
>
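(The number of communicators that can exist at one time is
implementation-defined, so it is plausible that Spectrum MPI simply sets a
lower limit than OpenMPI. A rough way to see where a given MPI gives up is a
standalone probe along these lines; it is only a sketch, not tied to PETSc,
and the cap is arbitrary.)

/* Rough probe of the implementation-defined limit on simultaneously
 * existing communicators: keep duplicating MPI_COMM_SELF (a local
 * operation, so ranks need not stay in lockstep) without freeing,
 * until MPI_Comm_dup fails or an arbitrary cap is reached. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int rank, n = 0;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  /* Return errors instead of aborting so the failure can be counted. */
  MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);
  while (n < 100000) {
    MPI_Comm dup;
    if (MPI_Comm_dup(MPI_COMM_SELF, &dup) != MPI_SUCCESS) break;
    n++;
  }
  printf("[rank %d] created %d duplicates before the first failure (or cap)\n",
         rank, n);
  MPI_Finalize(); /* the leaked duplicates were intentional for this probe */
  return 0;
}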
> As a workaround, I replaced the MPI_Comm_dup() at
> petsc/src/mat/impls/hypre/mhypre.c:2120 with a plain copy assignment, and
> also removed the corresponding MPI_Comm_free() in the hypre destroy
> routine. My code runs fine with Spectrum MPI now, but I don't think this
> is a long-term solution.
>
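(The snippet below only illustrates the difference between the two
approaches; it is not the actual code in src/mat/impls/hypre/mhypre.c, and
the struct and function names are invented. Duplicating gives hypre a
private communicator and tag space but consumes one of the implementation's
communicator slots until it is freed; copying the handle creates nothing
new, at the cost of hypre sharing PETSc's communicator.)

/* Illustration only -- not the real code in mhypre.c.  "HypreCtx" and the
 * function names are hypothetical stand-ins for the object that hands a
 * communicator to hypre. */
#include <mpi.h>

typedef struct {
  MPI_Comm hcomm;     /* communicator given to hypre */
  int      owns_comm; /* nonzero if we must free it */
} HypreCtx;

/* Original behaviour: duplicate the communicator.  hypre gets a private
 * tag space, but every live duplicate counts toward the MPI
 * implementation's communicator limit and needs a matching free. */
static int ctx_init_dup(HypreCtx *ctx, MPI_Comm comm)
{
  ctx->owns_comm = 1;
  return MPI_Comm_dup(comm, &ctx->hcomm);
}

/* Workaround behaviour: copy the handle.  No new communicator is created
 * and nothing is freed later, but hypre now shares the caller's
 * communicator and message tags, which is why this is not a long-term fix. */
static int ctx_init_copy(HypreCtx *ctx, MPI_Comm comm)
{
  ctx->owns_comm = 0;
  ctx->hcomm     = comm;
  return MPI_SUCCESS;
}

static int ctx_destroy(HypreCtx *ctx)
{
  int err = MPI_SUCCESS;
  if (ctx->owns_comm) err = MPI_Comm_free(&ctx->hcomm);
  ctx->hcomm = MPI_COMM_NULL;
  return err;
}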

If that runs, then it is definitely an MPI implementation problem.

  Thanks,

     Matt


> Thanks!
>
> Feimi
>
> On 8/19/21 9:01 AM, Jed Brown wrote:
> > Junchao Zhang <junchao.zhang at gmail.com> writes:
> >
> >> Hi, Feimi,
> >>    I need to consult Jed (cc'ed).
> >>    Jed, is this an example of
> >>
> >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 ?
> >> If Feimi really cannot free matrices, then we just need to attach a
> >> hypre-comm to a petsc inner comm and pass that to hypre.
> > Are there a bunch of solves as in that case?
> >
> > My understanding is that one should be able to
> > MPI_Comm_dup/MPI_Comm_free as many times as one likes, but the
> > implementation has limits on how many communicators can co-exist at any
> > one time. The many-at-once case is what we encountered in that 2018
> > thread.
> >
> > One way to check would be to use a debugger or tracer to examine the
> > stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called.
> >
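(One concrete way to do that tracing, sketched here assuming a Linux/glibc
environment, is to interpose on the two calls through the MPI profiling
interface and print a backtrace at each call. Build it with
mpicc -shared -fPIC and LD_PRELOAD the resulting library, or link it into
the application.)

/* Sketch of a (P)MPI_Comm_dup / (P)MPI_Comm_free tracer using the MPI
 * profiling interface.  backtrace() is glibc-specific; adjust for other
 * platforms. */
#include <mpi.h>
#include <execinfo.h>
#include <stdio.h>

static void log_call(const char *what)
{
  void *frames[32];
  int   depth = backtrace(frames, 32);
  fprintf(stderr, "=== %s ===\n", what);
  backtrace_symbols_fd(frames, depth, 2); /* symbolized frames to stderr */
}

int MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm)
{
  log_call("MPI_Comm_dup");
  return PMPI_Comm_dup(comm, newcomm);
}

int MPI_Comm_free(MPI_Comm *comm)
{
  log_call("MPI_Comm_free");
  return PMPI_Comm_free(comm);
}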
> > case 1: we'll find lots of dups without frees (until the end) because
> > the user really wants lots of these existing at the same time.
> >
> > case 2: dups are unfreed because of a reference-counting issue or
> > inessential references.
> >
> >
> > In case 1, I think the solution is as outlined in the thread: PETSc can
> > create an inner comm for hypre. I think I'd prefer to attach it to the
> > outer comm instead of the PETSc inner comm, but perhaps a case could be
> > made either way.
>
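(Below is one sketch of what attaching an inner communicator to the outer
comm could look like, using plain MPI attribute caching. The names are
invented for illustration and this is not PETSc's actual implementation:
the duplicate is created once per outer communicator, reused on later
requests, and freed by the delete callback when the outer communicator
itself is freed.)

/* Sketch: cache one inner communicator per outer communicator via an MPI
 * attribute, so repeated requests reuse a single duplicate. */
#include <mpi.h>
#include <stdlib.h>

static int inner_keyval = MPI_KEYVAL_INVALID;

/* Delete callback: free the cached duplicate when the outer comm is freed. */
static int free_inner(MPI_Comm comm, int keyval, void *attr, void *extra)
{
  MPI_Comm *inner = (MPI_Comm *)attr;
  MPI_Comm_free(inner);
  free(inner);
  return MPI_SUCCESS;
}

/* Return a communicator private to the library (e.g. hypre), duplicating
 * the outer comm at most once per outer comm. */
static int get_inner_comm(MPI_Comm outer, MPI_Comm *inner)
{
  MPI_Comm *cached;
  int       found;
  if (inner_keyval == MPI_KEYVAL_INVALID)
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, free_inner, &inner_keyval, NULL);
  MPI_Comm_get_attr(outer, inner_keyval, &cached, &found);
  if (!found) {
    cached = (MPI_Comm *)malloc(sizeof(MPI_Comm));
    MPI_Comm_dup(outer, cached);
    MPI_Comm_set_attr(outer, inner_keyval, cached);
  }
  *inner = *cached;
  return MPI_SUCCESS;
}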


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/