[petsc-users] A bad commit affects MOOSE

Satish Balay balay at mcs.anl.gov
Mon Apr 2 20:44:56 CDT 2018


We do an MPI_Comm_dup() for objects related to external packages.

Looks like we added a new mat type, MATHYPRE, in 3.8 that PCHYPRE
uses. Previously there was one MPI_Comm_dup() for PCHYPRE - now I
think there is one more for MATHYPRE - so there are more calls to
MPI_Comm_dup() in 3.8 vs 3.7:

src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
src/dm/impls/da/hypre/mhyp.c:  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)B),&(ex->hcomm));CHKERRQ(ierr);
src/dm/impls/swarm/data_ex.c:  ierr = MPI_Comm_dup(comm,&d->comm);CHKERRQ(ierr);
src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(jac->comm_hypre));CHKERRQ(ierr);
src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
src/ksp/pc/impls/hypre/hypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ex->hcomm));CHKERRQ(ierr);
src/ksp/pc/impls/spai/ispai.c:  ierr      = MPI_Comm_dup(PetscObjectComm((PetscObject)pc),&(ispai->comm_spai));CHKERRQ(ierr);
src/mat/examples/tests/ex152.c:  ierr   = MPI_Comm_dup(MPI_COMM_WORLD, &comm);CHKERRQ(ierr);
src/mat/impls/aij/mpi/mkl_cpardiso/mkl_cpardiso.c:  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mat_mkl_cpardiso->comm_mkl_cpardiso));CHKERRQ(ierr);
src/mat/impls/aij/mpi/mumps/mumps.c:  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(mumps->comm_mumps));CHKERRQ(ierr);
src/mat/impls/aij/mpi/pastix/pastix.c:    ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->pastix_comm));CHKERRQ(ierr);
src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)A),&(lu->comm_superlu));CHKERRQ(ierr);
src/mat/impls/hypre/mhypre.c:  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr);
src/mat/partition/impls/pmetis/pmetis.c:    ierr   = MPI_Comm_dup(pcomm,&comm);CHKERRQ(ierr);
src/sys/mpiuni/mpi.c:    MPI_COMM_SELF, MPI_COMM_WORLD, and an MPI_Comm_dup() of each of these (duplicates of duplicates return the same communicator)
src/sys/mpiuni/mpi.c:int MPI_Comm_dup(MPI_Comm comm,MPI_Comm *out)
src/sys/objects/pinit.c:      ierr = MPI_Comm_dup(MPI_COMM_WORLD,&local_comm);CHKERRQ(ierr);
src/sys/objects/pinit.c:      ierr = MPI_Comm_dup(MPI_COMM_WORLD,&local_comm);CHKERRQ(ierr);
src/sys/objects/tagm.c:      ierr = MPI_Comm_dup(comm_in,comm_out);CHKERRQ(ierr);
src/sys/utils/mpiu.c:  ierr = MPI_Comm_dup(comm,&local_comm);CHKERRQ(ierr);
src/ts/impls/implicit/sundials/sundials.c:  ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)ts),&(cvode->comm_sundials));CHKERRQ(ierr);

Perhaps we need a PetscCommDuplicateExternalPkg() to somehow avoid these MPI_Comm_dup() calls?
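Something along these lines, perhaps (a minimal sketch of the idea -
the function name and the attribute caching below are hypothetical,
not existing PETSc code): duplicate once per user communicator, cache
the duplicate in an MPI attribute, and hand every external package the
cached comm:

  #include <mpi.h>
  #include <stdlib.h>

  static int pkg_keyval = MPI_KEYVAL_INVALID;

  int PetscCommDuplicateExternalPkg(MPI_Comm comm,MPI_Comm *pkgcomm)
  {
    MPI_Comm *cached;
    int      flag,ierr;

    if (pkg_keyval == MPI_KEYVAL_INVALID) {
      /* a real version would free the cached comm in a delete callback */
      ierr = MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,MPI_COMM_NULL_DELETE_FN,&pkg_keyval,NULL);if (ierr) return ierr;
    }
    ierr = MPI_Comm_get_attr(comm,pkg_keyval,&cached,&flag);if (ierr) return ierr;
    if (!flag) { /* first request on this comm - do the single real MPI_Comm_dup() */
      cached = (MPI_Comm*)malloc(sizeof(MPI_Comm));
      ierr   = MPI_Comm_dup(comm,cached);if (ierr) return ierr;
      ierr   = MPI_Comm_set_attr(comm,pkg_keyval,cached);if (ierr) return ierr;
    }
    *pkgcomm = *cached; /* later requests reuse the cached duplicate */
    return MPI_SUCCESS;
  }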

Satish

On Tue, 3 Apr 2018, Smith, Barry F. wrote:

> 
>   Are we sure this is a PETSc comm issue and not a hypre comm duplication issue?
> 
>  frame #6: 0x00000001061345d9 libpetsc.3.07.dylib`hypre_GenerateSubComm(comm=-1006627852, participate=<unavailable>, new_comm_ptr=<unavailable>) + 409 at gen_redcs_mat.c:531 [opt]
> 
> Looks like hypre needs to generate subcomms - perhaps it generates too many?
> 
>    Barry
> 
> 
> > On Apr 2, 2018, at 7:07 PM, Derek Gaston <friedmud at gmail.com> wrote:
> > 
> > I’m working with Fande on this and I would like to add a bit more.  There are many circumstances where we aren’t working on COMM_WORLD at all (e.g. working on a sub-communicator) but PETSc was initialized using MPI_COMM_WORLD (think multi-level solves)… and we need to create arbitrarily many PETSc vecs/mats/solvers/preconditioners and solve.  We definitely can’t rely on using PETSC_COMM_WORLD to avoid triggering duplication.
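> > For example, schematically (a made-up sketch, not our actual code):
> > 
> >   MPI_Comm    subcomm;
> >   PetscMPIInt rank;
> >   PetscInt    i,nsolves = 100;
> >   MPI_Comm_rank(MPI_COMM_WORLD,&rank);
> >   MPI_Comm_split(MPI_COMM_WORLD,rank%2,rank,&subcomm); /* solves live on a sub-communicator */
> >   for (i=0; i<nsolves; i++) {
> >     KSP ksp;
> >     KSPCreate(subcomm,&ksp);  /* each solver object can trigger communicator duplication */
> >     /* ... set operators, KSPSolve() ... */
> >     KSPDestroy(&ksp);
> >   }
> >   MPI_Comm_free(&subcomm);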
> > 
> > Can you explain why PETSc needs to duplicate the communicator so much?
> > 
> > Thanks for your help in tracking this down!
> > 
> > Derek
> > 
> > On Mon, Apr 2, 2018 at 5:44 PM Kong, Fande <fande.kong at inl.gov> wrote:
> > Why do we not use user-level MPI communicators directly? What are the potential risks here?
> > 
> > 
> > Fande,
> > 
> > On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> > PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to MPI_Comm_dup() - thus potentially avoiding such errors
> > 
> > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommDuplicate.html
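> > i.e. repeated calls on the same user communicator hand back the same
> > inner comm - only the first call actually does an MPI_Comm_dup().
> > A small sketch of that documented behavior:
> > 
> >   MPI_Comm    inner1,inner2;
> >   PetscMPIInt tag1,tag2;
> >   PetscCommDuplicate(PETSC_COMM_WORLD,&inner1,&tag1); /* dups once, caches it in an attribute */
> >   PetscCommDuplicate(PETSC_COMM_WORLD,&inner2,&tag2); /* returns the cached comm with a new tag */
> >   /* inner1 == inner2 - no second MPI_Comm_dup() */
> >   PetscCommDestroy(&inner1);
> >   PetscCommDestroy(&inner2);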
> > 
> > 
> > Satish
> > 
> > On Mon, 2 Apr 2018, Kong, Fande wrote:
> > 
> > > On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> > >
> > > > Does this 'standard test' use MPI_COMM_WORLD to create PETSc objects?
> > > >
> > > > If so - you could try changing to PETSC_COMM_WORLD
> > > >
> > >
> > >
> > > I do not think we are using PETSC_COMM_WORLD when creating PETSc objects.
> > > Why can we not use MPI_COMM_WORLD?
> > >
> > >
> > > Fande,
> > >
> > >
> > > >
> > > > Satish
> > > >
> > > >
> > > > On Mon, 2 Apr 2018, Kong, Fande wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
> > > > > applications. I get an error message for a standard test:
> > > > >
> > > > >
> > > > > preconditioners/pbp.lots_of_variables: MPI had an error
> > > > > preconditioners/pbp.lots_of_variables: ------------------------------------------------
> > > > > preconditioners/pbp.lots_of_variables: Other MPI error, error stack:
> > > > > preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(177)..................: MPI_Comm_dup(comm=0x84000001, new_comm=0x97d1068) failed
> > > > > preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(162)..................:
> > > > > preconditioners/pbp.lots_of_variables: MPIR_Comm_dup_impl(57)..............:
> > > > > preconditioners/pbp.lots_of_variables: MPIR_Comm_copy(739).................:
> > > > > preconditioners/pbp.lots_of_variables: MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048 free on this process; ignore_id=0)
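> > > > > For reference, this limit is easy to hit outside of PETSc/MOOSE as well
> > > > > (a standalone sketch - MPICH has a fixed pool of ~2048 context ids per
> > > > > process, so un-freed duplicates eventually exhaust it):
> > > > >
> > > > >   #include <mpi.h>
> > > > >   int main(int argc,char **argv)
> > > > >   {
> > > > >     MPI_Comm dups[5000];
> > > > >     int      i;
> > > > >     MPI_Init(&argc,&argv);
> > > > >     for (i=0; i<5000; i++) MPI_Comm_dup(MPI_COMM_WORLD,&dups[i]); /* fails near 2048 with the same error */
> > > > >     MPI_Finalize();
> > > > >     return 0;
> > > > >   }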
> > > > >
> > > > >
> > > > > I did "git bisect", and the following commit introduced this issue:
> > > > >
> > > > >
> > > > > commit 49a781f5cee36db85e8d5b951eec29f10ac13593
> > > > > Author: Stefano Zampini <stefano.zampini at gmail.com>
> > > > > Date:   Sat Nov 5 20:15:19 2016 +0300
> > > > >
> > > > >     PCHYPRE: use internal Mat of type MatHYPRE
> > > > >
> > > > >     hpmat already stores two HYPRE vectors
> > > > >
> > > > > Before I debug line-by-line, does anyone have a clue about this?
> > > > >
> > > > >
> > > > > Fande,
> > > > >
> > > >
> > > >
> > >
> > 
> 
> 

