[petsc-users] Reaching limit number of communicator with Spectrum MPI

Feimi Yu yuf2 at rpi.edu
Fri Aug 20 14:02:30 CDT 2021


Sorry, I forgot to destroy the matrix after the loop, but anyway, the 
in-loop preconditioners are destroyed. I've updated the code here and 
on the Google Drive.
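
For reference, the core of the test is essentially the loop below (a 
simplified sketch; the attached hypre_precon_test.cpp is the 
authoritative version):

    /* Sketch: create and destroy many hypre preconditioners in a loop.
       Each PCSetUp()/PCDestroy() pair dups/frees a communicator internally. */
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat            A;
      PetscInt       rstart, rend, i, k;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* small distributed diagonal test matrix */
      ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
                          100, 100, 1, NULL, 0, NULL, &A); CHKERRQ(ierr);
      ierr = MatGetOwnershipRange(A, &rstart, &rend); CHKERRQ(ierr);
      for (i = rstart; i < rend; i++) {
        ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES); CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

      for (k = 0; k < 10000; k++) {            /* 10,000 create/destroy cycles */
        PC pc;
        ierr = PCCreate(PETSC_COMM_WORLD, &pc); CHKERRQ(ierr);
        ierr = PCSetType(pc, PCHYPRE); CHKERRQ(ierr);
        ierr = PCHYPRESetType(pc, "parasails"); CHKERRQ(ierr); /* Euclid in my real code */
        ierr = PCSetOperators(pc, A, A); CHKERRQ(ierr);
        ierr = PCSetUp(pc); CHKERRQ(ierr);
        ierr = PCDestroy(&pc); CHKERRQ(ierr);  /* in-loop preconditioner destroyed */
      }
      ierr = MatDestroy(&A); CHKERRQ(ierr);    /* matrix now destroyed after the loop */
      ierr = PetscFinalize();
      return ierr;
    }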

Feimi

On 8/20/21 2:54 PM, Feimi Yu wrote:
>
> Hi Barry and Junchao,
>
> Actually I did a simple MPI "dup and free" test before with Spectrum 
> MPI, and that one did not have any problem. I'm not a PETSc programmer 
> as I mainly use deal.ii's PETSc wrappers, but I managed to write a 
> minimal program based on petsc/src/mat/tests/ex98.c that reproduces my 
> problem. This piece of code creates and destroys 10,000 instances of 
> hypre ParaSails preconditioners (my own code uses Euclid, but I don't 
> think that matters). It runs fine with OpenMPI but reports the 
> out-of-communicators error with Spectrum MPI. The code is attached to 
> this email; in case the attachment is not available, I also uploaded a 
> copy to my Google Drive:
>
> https://drive.google.com/drive/folders/1DCf7lNlks8GjazvoP7c211ojNHLwFKL6?usp=sharing
>
> Thanks!
>
> Feimi
>
> On 8/20/21 9:58 AM, Junchao Zhang wrote:
>> Feimi, if it is easy to reproduce, could you give instructions on how 
>> to reproduce that?
>>
>> PS: Spectrum MPI is based on OpenMPI.  I don't understand why it has 
>> the problem but OpenMPI does not.  It could be a bug in PETSc or in 
>> the user's code.  For reference counting on MPI_Comm, we already have 
>> the PETSc inner comm; I think we can reuse that.
>>
>> --Junchao Zhang
>>
>>
>> On Fri, Aug 20, 2021 at 12:33 AM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>
>>       It sounds like the Spectrum MPI_Comm_free() may not be
>>     returning the comm to the "pool" as available for future use; a
>>     very buggy MPI implementation. This can easily be checked with a
>>     tiny standalone MPI program that simply dups and frees a comm
>>     thousands of times in a loop. It could even be a configure test
>>     (one that requires running an MPI program). I do not remember if
>>     we ever tested this possibility; maybe we did and I forgot.
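>>
>>     Something along these lines would do it (an untested sketch):
>>
>>         #include <mpi.h>
>>         #include <stdio.h>
>>
>>         int main(int argc, char **argv)
>>         {
>>           MPI_Comm dup;
>>           int      i;
>>
>>           MPI_Init(&argc, &argv);
>>           /* make a failed dup return an error code instead of aborting */
>>           MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>           for (i = 0; i < 100000; i++) {
>>             if (MPI_Comm_dup(MPI_COMM_WORLD, &dup) != MPI_SUCCESS) {
>>               printf("MPI_Comm_dup failed at iteration %d\n", i);
>>               break;
>>             }
>>             MPI_Comm_free(&dup);  /* the comm should become reusable here */
>>           }
>>           MPI_Finalize();
>>           return 0;
>>         }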
>>
>>       If this is the problem, we can provide a workaround that
>>     attaches the new comm (to be passed to hypre) to the old comm as
>>     an attribute, together with a reference count. When the hypre
>>     matrix is created, that count is set to 1; when the hypre matrix
>>     is freed, the count is set to zero (but the comm is not freed).
>>     On the next call that creates a hypre matrix, the attribute is
>>     found with a count of zero, so PETSc knows it can pass the same
>>     comm again to the new hypre matrix.
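>>
>>     A sketch of that idea (hypothetical names, not actual PETSc code):
>>
>>         #include <mpi.h>
>>         #include <stdlib.h>
>>
>>         typedef struct {
>>           MPI_Comm dup;    /* the comm handed to hypre */
>>           int      count;  /* 1 while a hypre matrix holds it, 0 when free */
>>         } HypreCommAttr;
>>
>>         static int keyval = MPI_KEYVAL_INVALID;
>>
>>         /* Get a comm to pass to hypre, reusing the cached dup if available. */
>>         MPI_Comm GetHypreComm(MPI_Comm outer)
>>         {
>>           HypreCommAttr *attr;
>>           int            found;
>>
>>           if (keyval == MPI_KEYVAL_INVALID)
>>             MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,
>>                                    MPI_COMM_NULL_DELETE_FN, &keyval, NULL);
>>           MPI_Comm_get_attr(outer, keyval, &attr, &found);
>>           if (!found) {
>>             attr = (HypreCommAttr *)malloc(sizeof(*attr));
>>             MPI_Comm_dup(outer, &attr->dup);  /* dup once, cache on the outer comm */
>>             attr->count = 0;
>>             MPI_Comm_set_attr(outer, keyval, attr);
>>           }
>>           /* if attr->count is already 1 here, a second simultaneous hypre
>>              matrix would need another entry -- the "pool" model below */
>>           attr->count = 1;
>>           return attr->dup;
>>         }
>>
>>         /* Called instead of MPI_Comm_free when the hypre matrix is destroyed. */
>>         void ReleaseHypreComm(MPI_Comm outer)
>>         {
>>           HypreCommAttr *attr;
>>           int            found;
>>
>>           MPI_Comm_get_attr(outer, keyval, &attr, &found);
>>           if (found) attr->count = 0;  /* comm is kept, just marked available */
>>         }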
>>
>>     This only allows one hypre matrix at a time to be created from
>>     the original comm. To allow multiple simultaneous hypre matrices,
>>     one could keep several comms and counts in the attribute and check
>>     them until an available one is found to reuse (or create yet
>>     another one if all the current ones are busy with hypre matrices).
>>     So it is the same model as DMGetXXVector(), where vectors are
>>     checked out and later checked back in as available. This would
>>     solve the currently reported problem (if it is a buggy MPI that
>>     does not properly free comms), but not the MOOSE problem, where
>>     10,000 comms are needed at the same time.
>>
>>       Barry
>>
>>
>>
>>
>>
>>>     On Aug 19, 2021, at 3:29 PM, Junchao Zhang
>>>     <junchao.zhang at gmail.com> wrote:
>>>
>>>
>>>
>>>
>>>     On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu <yuf2 at rpi.edu> wrote:
>>>
>>>         Hi Jed,
>>>
>>>         In my case, I only have 2 hypre preconditioners at the same
>>>         time, and they do not solve simultaneously, so it might not
>>>         be case 1.
>>>
>>>         I checked the stack for all the calls of
>>>         MPI_Comm_dup/MPI_Comm_free on my own machine (with OpenMPI),
>>>         and from what I observed all the communicators are freed. I
>>>         could not immediately test with Spectrum MPI on the clusters
>>>         because all the dependencies were built in release mode.
>>>         However, as I mentioned, I haven't had this problem with
>>>         OpenMPI before, so I'm not sure whether this is really an MPI
>>>         implementation problem, or just that Spectrum MPI has a lower
>>>         limit on the number of communicators; it may also depend on
>>>         how many MPI ranks are used, as only 2 out of 40 ranks
>>>         reported the error.
>>>
>>>     You can add printf around MPI_Comm_dup/MPI_Comm_free sites on
>>>     the two ranks, e.g., if (myrank == 38) printf(...), to see if
>>>     the dup/free are paired.
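>>>     For example (a rough sketch, placed next to each call site):
>>>
>>>         int myrank;
>>>         MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>>>         if (myrank == 38) printf("dup  at %s:%d\n", __FILE__, __LINE__);
>>>         /* ...and similarly before each MPI_Comm_free: */
>>>         if (myrank == 38) printf("free at %s:%d\n", __FILE__, __LINE__);
>>>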
>>>         As a workaround, I replaced the MPI_Comm_dup() at
>>>         petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy
>>>         assignment, and also
>>>         removed the MPI_Comm_free() in the hypre destroyer. My code
>>>         runs fine
>>>         with Spectrum MPI now, but I don't think this is a long-term
>>>         solution.
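>>>
>>>         Roughly, the change amounts to the following (variable names
>>>         here are placeholders, not the exact ones in mhypre.c):
>>>
>>>             /* before (approximately): dup the PETSc comm for hypre */
>>>             MPI_Comm_dup(PetscObjectComm((PetscObject)B), &hcomm);
>>>             /* workaround: plain copy assignment, no new communicator */
>>>             hcomm = PetscObjectComm((PetscObject)B);
>>>             /* ...and the matching MPI_Comm_free(&hcomm) in the destroy
>>>                path is removed */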
>>>
>>>         Thanks!
>>>
>>>         Feimi
>>>
>>>         On 8/19/21 9:01 AM, Jed Brown wrote:
>>>         > Junchao Zhang <junchao.zhang at gmail.com> writes:
>>>         >
>>>         >> Hi, Feimi,
>>>         >>    I need to consult Jed (cc'ed).
>>>         >>    Jed, is this an example of
>>>         >>
>>>         >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663?
>>>         >> If Feimi really cannot free matrices, then we just need
>>>         >> to attach a hypre-comm to a petsc inner comm, and pass
>>>         >> that to hypre.
>>>         > Are there a bunch of solves as in that case?
>>>         >
>>>         > My understanding is that one should be able to
>>>         MPI_Comm_dup/MPI_Comm_free as many times as one likes, but
>>>         the implementation has limits on how many communicators can
>>>         co-exist at any one time. The many-at-once case is what we
>>>         encountered in that 2018 thread.
>>>         >
>>>         > One way to check would be to use a debugger or tracer to
>>>         examine the stack every time (P)MPI_Comm_dup and
>>>         (P)MPI_Comm_free are called.
>>>         >
>>>         > case 1: we'll find lots of dups without frees (until the
>>>         end) because the user really wants lots of these existing at
>>>         the same time.
>>>         >
>>>         > case 2: dups are unfreed because of a reference-counting
>>>         issue / inessential references
>>>         >
>>>         >
>>>         > In case 1, I think the solution is as outlined in the
>>>         thread: PETSc can create an inner comm for hypre. I think
>>>         I'd prefer to attach it to the outer comm instead of the
>>>         PETSc inner comm, but perhaps a case could be made either way.
>>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hypre_precon_test.cpp
Type: text/x-c++src
Size: 3545 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210820/4a78aca1/attachment.bin>

