[petsc-dev] Is DMGetDMKSPWrite() really not collective?
Zhang, Junchao
jczhang at mcs.anl.gov
Fri Jun 14 17:58:34 CDT 2019
Discussed with Mr. Hong. The following two new lines fix the problem. Either line alone also works, but I think we should have both. Similar issues exist in dmsnes and dmts, but I need DM experts' feedback before changing them. Thanks.
diff --git a/src/ksp/ksp/interface/dmksp.c b/src/ksp/ksp/interface/dmksp.c
index 9ce75090..0ab69574 100644
--- a/src/ksp/ksp/interface/dmksp.c
+++ b/src/ksp/ksp/interface/dmksp.c
@@ -80,6 +80,7 @@ PetscErrorCode DMKSPCopy(DMKSP kdm,DMKSP nkdm)
nkdm->rhsctx = kdm->rhsctx;
nkdm->initialguessctx = kdm->initialguessctx;
nkdm->data = kdm->data;
+ nkdm->originaldm = kdm->originaldm;
nkdm->fortran_func_pointers[0] = kdm->fortran_func_pointers[0];
nkdm->fortran_func_pointers[1] = kdm->fortran_func_pointers[1];
@@ -156,6 +157,7 @@ PetscErrorCode DMGetDMKSPWrite(DM dm,DMKSP *kspdm)
ierr = DMKSPCopy(oldkdm,kdm);CHKERRQ(ierr);
ierr = DMKSPDestroy((DMKSP*)&dm->dmksp);CHKERRQ(ierr);
dm->dmksp = (PetscObject)kdm;
+ kdm->originaldm = dm;
}
*kspdm = kdm;
PetscFunctionReturn(0);
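As I read the code quoted below, the point of the two lines is to record which DM owns the freshly copied DMKSP. Without them, the new DMKSP comes out of DMKSPCopy() with originaldm == NULL, so ownership is claimed by whichever DM calls DMGetDMKSPWrite() next; if that is a different DM sharing the context, the original DM's next write access triggers yet another DMKSPCreate(), and every create consumes another tag through PetscCommDuplicate(). A minimal sketch of the behavior I expect after the fix (illustrative only, assuming the DMKSP declarations are in scope via the private KSP header; this is not a test from the repository):

#include <petsc/private/kspimpl.h>  /* for DMKSP and DMGetDMKSPWrite() */

static PetscErrorCode CheckDMKSPWriteAccess(DM dm)
{
  DMKSP          kdm1,kdm2;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = DMGetDMKSPWrite(dm,&kdm1);CHKERRQ(ierr); /* copy-on-write happens at most once */
  ierr = DMGetDMKSPWrite(dm,&kdm2);CHKERRQ(ierr); /* fast path: same DMKSP, no new object, no extra tag */
  if (kdm1 != kdm2) SETERRQ(PetscObjectComm((PetscObject)dm),PETSC_ERR_PLIB,"DMKSP was copied twice");
  PetscFunctionReturn(0);
}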
--Junchao Zhang
On Fri, Jun 14, 2019 at 1:07 PM Junchao Zhang <jczhang at mcs.anl.gov> wrote:
On Fri, Jun 14, 2019 at 1:01 PM Lawrence Mitchell <wence at gmx.li> wrote:
> On 14 Jun 2019, at 18:44, Zhang, Junchao via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>
> Hello,
> I am investigating PETSc issue 306. One can reproduce the problem with src/snes/examples/tutorials/ex9.c and mpirun -n 3 ./ex9 -snes_grid_sequence 3 -snes_converged_reason -pc_type mg
> The program can run to completion, crash, or hang. The error appears in either PetscGatherMessageLengths() or PetscCommBuildTwoSided(). From my debugging, the following routine is suspicious: it is documented as Not Collective, but it may call DMKSPCreate(), which can indirectly call a collective MPI_Comm_dup().
I would have thought that DMKSPCreate could only call MPI_Comm_dup (via PetscCommDuplicate) if the incoming dm has a communicator which is /not/ a PETSc communicator. Given that all PETSc objects must (?) return a PETSc communicator when calling PetscObjectComm, this function is presumably only incidentally not collective (although it is logically collective, I would have thought). Oh, and with PETSC_USE_DEBUG there is a barrier even in the "return immediately with a PETSc comm" case.
Yes, PetscCommDuplicate does not always call MPI_Comm_dup, but it does decrement the MPI tag counter, which leads to an MPI tag mismatch across ranks and crashes the code. If I turn on PETSC_USE_DEBUG, the code sometimes (but not always) hangs in PetscCommDuplicate.
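To spell out the failure mode as I understand it, here is a simplified model of the per-communicator tag counter (the names and layout below are illustrative, not the actual PETSc internals):

/* Simplified model: PETSc attaches a counter to its inner communicator, and every
   object creation (PetscHeaderCreate -> PetscCommDuplicate) takes a fresh tag by
   decrementing it. */
typedef struct { int tag; } TagCounter;

static int GetNewTag(TagCounter *c) { return c->tag--; }

/* If one rank performs an extra DMKSPCreate() (one extra decrement) while another
   does not, the next object created collectively on the same communicator starts at
   tag N-1 on the first rank but tag N on the second. Its sends and receives then
   never match, and the run crashes in PetscGatherMessageLengths()/
   PetscCommBuildTwoSided() or simply hangs. */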
FWIW, the call site looks to be collective (PCSetUp_MG).
Lawrence
> Can someone familiar with the code explain it? Thanks.
>
> /*@C
> DMGetDMKSPWrite - get write access to private DMKSP context from a DM
>
> Not Collective
>
> Input Argument:
> . dm - DM to be used with KSP
>
> Output Argument:
> . kspdm - private DMKSP context
>
> Level: developer
>
> .seealso: DMGetDMKSP()
> @*/
> PetscErrorCode DMGetDMKSPWrite(DM dm,DMKSP *kspdm)
> {
> PetscErrorCode ierr;
> DMKSP kdm;
>
> PetscFunctionBegin;
> PetscValidHeaderSpecific(dm,DM_CLASSID,1);
> ierr = DMGetDMKSP(dm,&kdm);CHKERRQ(ierr);
> if (!kdm->originaldm) kdm->originaldm = dm;
> if (kdm->originaldm != dm) { /* Copy on write */
> DMKSP oldkdm = kdm;
> ierr = PetscInfo(dm,"Copying DMKSP due to write\n");CHKERRQ(ierr);
> ierr = DMKSPCreate(PetscObjectComm((PetscObject)dm),&kdm);CHKERRQ(ierr);
> ierr = DMKSPCopy(oldkdm,kdm);CHKERRQ(ierr);
> ierr = DMKSPDestroy((DMKSP*)&dm->dmksp);CHKERRQ(ierr);
> dm->dmksp = (PetscObject)kdm;
> }
> *kspdm = kdm;
> PetscFunctionReturn(0);
> }
>
> The call stack is:
> ...
> #20 0x00007fcf3e3d23e2 in PetscCommDuplicate (comm_in=comm_in at entry=-2080374782, comm_out=comm_out at entry=0x557a18304db0, first_tag=first_tag at entry=0x557a18304de4) at /home/jczhang/petsc/src/sys/objects/tagm.c:162
> #21 0x00007fcf3e3d7730 in PetscHeaderCreate_Private (h=0x557a18304d70, classid=<optimized out>, class_name=class_name at entry=0x7fcf3f7f762a "DMKSP", descr=descr at entry=0x7fcf3f7f762a "DMKSP", mansec=mansec at entry=0x7fcf3f7f762a "DMKSP", comm=comm at entry=-2080374782, destroy=0x7fcf3f350570 <DMKSPDestroy>, view=0x0) at /home/jczhang/petsc/src/sys/objects/inherit.c:64
> #22 0x00007fcf3f3504c9 in DMKSPCreate (comm=-2080374782, kdm=kdm at entry=0x7ffc1d4d00f8) at /home/jczhang/petsc/src/ksp/ksp/interface/dmksp.c:24
> #23 0x00007fcf3f35150f in DMGetDMKSPWrite (dm=0x557a18541a10, kspdm=kspdm at entry=0x7ffc1d4d01a8) at /home/jczhang/petsc/src/ksp/ksp/interface/dmksp.c:155
> #24 0x00007fcf3f1bb20e in PCSetUp_MG (pc=<optimized out>) at /home/jczhang/petsc/src/ksp/pc/impls/mg/mg.c:682
> #25 0x00007fcf3f204bea in PCSetUp (pc=0x557a17dc1860) at /home/jczhang/petsc/src/ksp/pc/interface/precon.c:894
> #26 0x00007fcf3f32ba4b in KSPSetUp (ksp=0x557a17d73500) at /home/jczhang/petsc/src/ksp/ksp/interface/itfunc.c:377
> #27 0x00007fcf3f41e43e in SNESSolve_VINEWTONRSLS (snes=0x557a17bff210) at /home/jczhang/petsc/src/snes/impls/vi/rs/virs.c:502
> #28 0x00007fcf3f3fa191 in SNESSolve (snes=0x557a17bff210, b=0x0, x=<optimized out>) at /home/jczhang/petsc/src/snes/interface/snes.c:4433
> #29 0x0000557a16432095 in main (argc=<optimized out>, argv=<optimized out>) at /home/jczhang/petsc/src/snes/examples/tutorials/ex9.c:105
>
> --Junchao Zhang