[petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran?

Stefano Zampini stefano.zampini at gmail.com
Fri Nov 1 06:16:59 CDT 2019


From petscinitialize_internal in src/sys/objects/ftn-custom/zstart.c:

PETSC_COMM_WORLD = MPI_COMM_WORLD

Which means that PETSC_COMM_WORLD is the raw MPI_COMM_WORLD, not a PETSc communicator (i.e., one that already carries PETSc's hidden duplicated communicator as an MPI attribute).
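
To make that concrete, here is a minimal sketch of the caching idea behind PetscCommDuplicate(). This is not PETSc's actual source; the names DupCache and get_inner_comm are made up for illustration, but the MPI attribute calls are the standard ones. The inner communicator is stashed on the user's communicator as an attribute, so only the very first call pays for MPI_Comm_dup():

#include <mpi.h>
#include <stdlib.h>

static int dup_keyval = MPI_KEYVAL_INVALID;

typedef struct { MPI_Comm inner; int refct; } DupCache;

static void get_inner_comm(MPI_Comm user, MPI_Comm *inner)
{
  DupCache *cache;
  int       found;

  if (dup_keyval == MPI_KEYVAL_INVALID)
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                           &dup_keyval, NULL);

  MPI_Comm_get_attr(user, dup_keyval, &cache, &found);
  if (!found) {                      /* first call on this comm: really duplicate */
    cache = (DupCache*)malloc(sizeof(*cache));
    MPI_Comm_dup(user, &cache->inner);
    cache->refct = 0;
    MPI_Comm_set_attr(user, dup_keyval, cache);
  }
  cache->refct++;                    /* each PETSc object holds one reference */
  *inner = cache->inner;
}

Since the Fortran PetscInitialize just aliases PETSC_COMM_WORLD to MPI_COMM_WORLD, the very first object creation necessarily takes the !found branch above.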

The first matrix creation duplicates PETSC_COMM_WORLD, and the duplicated communicator is then reused by the other objects.
When you finally destroy the matrix inside the loop, the reference count of this duplicated communicator drops to zero and it is freed.
This is why a communicator is duplicated again at each step (see the sketch after the next paragraph).

However, the C version of PetscInitialize does the same, so I’m not sure why this happens with Fortran and not with C. (Do you perhaps leak an object in the C code, which would keep the communicator alive?)
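
If the per-step MPI_Comm_dup() really is the problem, here is a sketch of a possible workaround (using the public PetscCommDuplicate()/PetscCommDestroy() calls; this is not a fix inside PETSc itself): take one extra reference on the inner communicator before the time loop, so the create/destroy cycle inside the loop can never drop its reference count to zero:

#include <petscsys.h>

int main(int argc, char **argv)
{
  MPI_Comm       pinned;
  PetscInt       step;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* One extra reference on the inner communicator, held for the whole run */
  ierr = PetscCommDuplicate(PETSC_COMM_WORLD, &pinned, NULL);CHKERRQ(ierr);

  for (step = 0; step < 5; step++) {
    /* create Mat/Vec/KSP on PETSC_COMM_WORLD here: they find and reuse the
       cached inner communicator, so no new MPI_Comm_dup() occurs; then
       destroy them again (the refcount drops back to 1, not 0) */
  }

  ierr = PetscCommDestroy(&pinned);CHKERRQ(ierr); /* release our reference */
  ierr = PetscFinalize();
  return ierr;
}

The same two calls should also be available from the Fortran interface, so the equivalent guard ought to work in the user's F90 code as well.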


> On Nov 1, 2019, at 1:41 PM, Patrick Sanan via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Context: I'm trying to track down an error that (only) arises when running a Fortran 90 code, using PETSc, on a new cluster. The code creates and destroys a linear system (Mat, Vec, and KSP) at each of (many) timesteps. The error message from a user looks like this, which leads me to suspect that MPI_Comm_dup() is being called many times and that this eventually becomes a problem for this particular MPI implementation (Open MPI 2.1.0):
> 
> [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup
> [lo-a2-058:21425] *** reported by process [4222287873,2]
> [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533
> [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error
> [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [lo-a2-058:21425] ***    and potentially your MPI job)
> 
> Question: I remember some discussion recently (but can't find the thread) about not calling MPI_Comm_dup() too many times from PetscCommDuplicate(), which would allow one to safely use the (admittedly not optimal) approach used in this application code. Is that a correct understanding, and would the fixes made in that context also apply to Fortran? I don't fully understand the details of the MPI techniques used, so I thought I'd ask here.
> 
> If I hack a simple build-solve-destroy example to run several loops, I see a notable difference between the C and Fortran examples. With the attached ex223.c and ex221f.F90, which just add outer loops (5 iterations) to the KSP tutorial examples ex23.c and ex21f.F90, respectively, I see the following. Note that in the Fortran case, a communicator is actually duplicated in each iteration, but in the C case this happens only in the first iteration:
> 
> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> 
> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> 
> 
> 
> <ex221f.F90><ex223.c>
