[petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran?

Patrick Sanan patrick.sanan at gmail.com
Fri Nov 1 05:41:09 CDT 2019


*Context:* I'm trying to track down an error that arises only when running
a Fortran 90 code that uses PETSc on a new cluster. The code creates and
destroys a linear system (Mat, Vec, and KSP) at each of many timesteps. A
user reported the error message below, which leads me to suspect that
MPI_Comm_dup() is being called many times and that this eventually becomes
a problem for this particular MPI implementation (Open MPI 2.1.0):


[lo-a2-058:21425] *** An error occurred in MPI_Comm_dup
[lo-a2-058:21425] *** reported by process [4222287873,2]
[lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533
[lo-a2-058:21425] *** MPI_ERR_INTERN: internal error
[lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[lo-a2-058:21425] ***    and potentially your MPI job)

*Question:* I remember a recent discussion (I can't find the thread) about
avoiding too many calls to MPI_Comm_dup() from PetscCommDuplicate(), which
would make the (admittedly not optimal) approach used in this application
code safe. Is that understanding correct, and would the fixes made in that
context also apply to Fortran? I don't fully understand the details of the
MPI techniques used, so I thought I'd ask here.

If I hack a simple build-solve-destroy example to run several loops, I see
a notable difference between the C and Fortran examples. The attached
ex223.c and ex221f.F90 just add an outer loop (5 iterations) to the KSP
tutorial examples ex23.c and ex21f.F90, respectively, and produce the
output below. Note that in the Fortran case the communicator appears to be
duplicated in every loop iteration, whereas in the C case this happens only
in the first iteration:

[(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784

[(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex221f.F90
Type: application/octet-stream
Size: 10705 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191101/8eb48d50/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex223.c
Type: application/octet-stream
Size: 7641 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191101/8eb48d50/attachment-0003.obj>