[petsc-users] Reaching limit number of communicator with Spectrum MPI

Feimi Yu yuf2 at rpi.edu
Thu Aug 19 14:08:00 CDT 2021


Hi Jed,

In my case, I only have 2 hypre preconditioners at the same time, and 
they do not solve simultaneously, so it might not be case 1.

I checked the stack for all the MPI_Comm_dup/MPI_Comm_free calls on 
my own machine (with OpenMPI), and as far as I can tell all the 
communicators are freed. I could not test this with Spectrum MPI on the 
clusters right away because all the dependencies there were built in 
release mode. However, as I mentioned, I haven't had this problem with 
OpenMPI before, so I'm not sure whether this is really an MPI 
implementation problem, or just that Spectrum MPI has a lower limit on 
the number of communicators; it may also depend on how many MPI ranks 
are used, since only 2 out of 40 ranks reported the error.

As a workaround, I replaced the MPI_Comm_dup() at 
petsc/src/mat/impls/hypre/mhypre.c:2120 with a plain copy assignment, and 
also removed the matching MPI_Comm_free() in the hypre destroy routine. 
My code runs fine with Spectrum MPI now, but I don't think this is a 
long-term solution.

Thanks!

Feimi

On 8/19/21 9:01 AM, Jed Brown wrote:
> Junchao Zhang <junchao.zhang at gmail.com> writes:
>
>> Hi, Feimi,
>>    I need to consult Jed (cc'ed).
>>    Jed, is this an example of
>> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663?
>> If Feimi really can not free matrices, then we just need to attach a
>> hypre-comm to a petsc inner comm, and pass that to hypre.
> Are there a bunch of solves as in that case?
>
> My understanding is that one should be able to call MPI_Comm_dup/MPI_Comm_free as many times as you like, but each implementation has a limit on how many communicators can coexist at any one time. The many-at-once case is what we encountered in that 2018 thread.
>
> One way to check would be to use a debugger or tracer to examine the stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called.
>
> case 1: we'll find lots of dups without frees (until the end) because the user really wants lots of these existing at the same time.
>
> case 2: dups are unfreed because of a reference-counting issue or inessential references.
>
>
> In case 1, I think the solution is as outlined in the thread: PETSc can create an inner comm for Hypre. I think I'd prefer to attach it to the outer comm instead of the PETSc inner comm, but perhaps a case could be made either way.


More information about the petsc-users mailing list