[petsc-users] Do the guards against calling MPI_Comm_dup() in PetscCommDuplicate() apply with Fortran?

Patrick Sanan patrick.sanan at gmail.com
Fri Nov 1 10:09:52 CDT 2019


I don't see those interfaces, either. If there's a reason that they're
non-trivial to implement, we should at least note in "Fortran Note:"
sections on the man pages that they don't exist.

In this particular instance, we can get by without those interfaces by
creating the KSP once before the loop and destroying it once afterwards
(the settings are constant), thus holding onto a reference that way.
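
A minimal, untested sketch of that workaround (illustrative names, not the
actual application code): a dummy KSP created before the time loop holds a
reference to the duplicated communicator, so the per-step create/destroy
cycles no longer trigger MPI_Comm_dup().

      program comm_keepalive
#include <petsc/finclude/petscksp.h>
      use petscksp
      implicit none
      PetscErrorCode ierr
      KSP            ksp_keepalive, ksp
      PetscInt       step

      call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
      ! holds a reference to the duplicated PETSC_COMM_WORLD for the whole run
      call KSPCreate(PETSC_COMM_WORLD,ksp_keepalive,ierr)
      do step = 1,5
         ! per-step create/solve/destroy as in the application;
         ! no further MPI_Comm_dup() should be triggered here
         call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
         call KSPDestroy(ksp,ierr)
      end do
      call KSPDestroy(ksp_keepalive,ierr)
      call PetscFinalize(ierr)
      end program comm_keepalive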

I'll wait for our -info run to come back and will then confirm that this
fixes things. Thanks again, Stefano!

On Fri, Nov 1, 2019 at 12:49 PM, Stefano Zampini <
stefano.zampini at gmail.com> wrote:

> It seems we don't have a Fortran wrapper for PetscCommDuplicate() (or at
> least I cannot find it). Is this an oversight?
>
> If we had Fortran wrappers for PetscCommDuplicate()/PetscCommDestroy(), the
> proper fix would be to call PetscCommDuplicate(PETSC_COMM_WORLD,&user_petsc_comm)
> right after PetscInitialize() and PetscCommDestroy(&user_petsc_comm) right
> before PetscFinalize() is called in your app.
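>
> A minimal, untested C sketch of that pattern (user_petsc_comm as in the
> text above), which is what a Fortran wrapper would need to expose:
>
> #include <petsc.h>
>
> int main(int argc, char **argv)
> {
>   PetscErrorCode ierr;
>   MPI_Comm       user_petsc_comm;
>
>   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>   /* take an extra reference to the inner duplicate of PETSC_COMM_WORLD */
>   ierr = PetscCommDuplicate(PETSC_COMM_WORLD, &user_petsc_comm, NULL);CHKERRQ(ierr);
>
>   /* ... time loop creating and destroying Mat, Vec, KSP at each step; the
>      duplicated communicator now survives because of this extra reference ... */
>
>   /* drop the extra reference just before finalizing */
>   ierr = PetscCommDestroy(&user_petsc_comm);CHKERRQ(ierr);
>   ierr = PetscFinalize();
>   return ierr;
> }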
>
> On Nov 1, 2019, at 2:45 PM, Patrick Sanan <patrick.sanan at gmail.com> wrote:
>
> Ah, really interesting! In the attached ex321f.F90, I create a dummy KSP
> before the loop, and indeed the behavior is as you say - no duplications:
> <ex321f.F90>
>
> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex321f -info | grep PetscCommDuplicate
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>
> I've asked the user to re-run with -info, so I'll hopefully be able to see
> whether the duplication is happening as I expect (in which case your
> insight might at least provide a workaround), and whether it's somehow
> choosing a new communicator number each time.
>
> On Nov 1, 2019, at 12:36 PM, Stefano Zampini <stefano.zampini at gmail.com
> > wrote:
>
> I know why your C code does not duplicate the comm at each step: it uses
> PETSC_VIEWER_STDOUT_WORLD, which basically inserts the duplicated comm into
> PETSC_COMM_WORLD as an attribute. Try removing the KSPView call and you
> will see that the C code behaves like the Fortran one.
>
>
> On Nov 1, 2019, at 2:16 PM, Stefano Zampini <stefano.zampini at gmail.com>
> wrote:
>
> From src/sys/objects/ftn-custom/zstart.c, petscinitialize_internal does
>
> PETSC_COMM_WORLD = MPI_COMM_WORLD
>
> which means that PETSC_COMM_WORLD is not a PETSc communicator.
>
> The first matrix creation duplicates PETSC_COMM_WORLD, and the duplicate
> can then be reused for the other objects. When you finally destroy the
> matrix inside the loop, the reference count of this duplicated comm goes to
> zero and it is freed. This is why you duplicate at each step.
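>
> A minimal, untested Fortran sketch of that pattern (illustrative only, not
> the actual application code): everything is created and destroyed inside
> the loop, so the duplicated communicator's reference count returns to zero
> each iteration and the next creation must call MPI_Comm_dup() again.
>
>       program redup_each_step
> #include <petsc/finclude/petscmat.h>
>       use petscmat
>       implicit none
>       PetscErrorCode ierr
>       Mat            A
>       PetscInt       step
>
>       call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
>       do step = 1,5
>          call MatCreate(PETSC_COMM_WORLD,A,ierr)  ! duplicates PETSC_COMM_WORLD
>          call MatDestroy(A,ierr)                  ! last reference dropped: the duplicate is freed
>       end do
>       call PetscFinalize(ierr)
>       end program redup_each_step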
>
> However, the C version of PetscInitialize does the same, so I’m not sure
> why this happens with Fortran and not with C. (Do you leak objects in the C
> code?)
>
>
> On Nov 1, 2019, at 1:41 PM, Patrick Sanan via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> *Context:* I'm trying to track down an error that (only) arises when
> running a Fortran 90 code, using PETSc, on a new cluster. The code creates
> and destroys a linear system (Mat, Vec, and KSP) at each of (many)
> timesteps. The error message from a user looks like this, which leads me to
> suspect that MPI_Comm_dup() is being called many times and that this
> eventually becomes a problem for this particular MPI implementation (Open
> MPI 2.1.0):
>
>
> [lo-a2-058:21425] *** An error occurred in MPI_Comm_dup
> [lo-a2-058:21425] *** reported by process [4222287873,2]
> [lo-a2-058:21425] *** on communicator MPI COMMUNICATOR 65534 DUP FROM 65533
> [lo-a2-058:21425] *** MPI_ERR_INTERN: internal error
> [lo-a2-058:21425] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [lo-a2-058:21425] ***    and potentially your MPI job)
>
> *Question:* I remember some discussion recently (but I can't find the
> thread) about not calling MPI_Comm_dup() too many times from
> PetscCommDuplicate(), which would allow one to safely use the (admittedly
> not optimal) approach used in this application code. Is that a correct
> understanding, and would the fixes made in that context also apply to
> Fortran? I don't fully understand the details of the MPI techniques used,
> so I thought I'd ask here.
>
> If I hack a simple build-solve-destroy example to run several loops, I see
> a notable difference between the C and Fortran examples. With the attached
> ex223.c and ex221f.F90, which just add outer loops (5 iterations) to the
> KSP tutorials examples ex23.c and ex21f.F90, respectively, I see the
> following. Note that in the Fortran case the communicator appears to
> actually be duplicated in each loop iteration, but in the C case this only
> happens in the first iteration:
>
> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex223 -info | grep PetscCommDuplicate
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>
> [(arch-maint-extra-opt) tutorials (maint *$%=)]$ ./ex221f -info | grep PetscCommDuplicate
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 268435455
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
> [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374784
>
>
>
>
> <ex221f.F90><ex223.c>