[petsc-users] A bad commit affects MOOSE

Derek Gaston friedmud at gmail.com
Mon Apr 2 19:07:35 CDT 2018


I’m working with Fande on this and I would like to add a bit more.  There
are many circumstances where we aren’t working on COMM_WORLD at all (e.g.
working on a sub-communicator) but PETSc was initialized using
MPI_COMM_WORLD (think multi-level solves)… and we need to create
arbitrarily many PETSc vecs/mats/solvers/preconditioners and solve.  We
definitely can’t rely on using PETSC_COMM_WORLD to avoid triggering
duplication.

Can you explain why PETSc needs to duplicate the communicator so much?

Thanks for your help in tracking this down!

Derek
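
[Editorial sketch] The concern above, creating arbitrarily many objects on a sub-communicator, and the "Too many communicators (0/2048 free ...)" error quoted later in this thread can be illustrated with a toy model. This is not PETSc or MPI code: `ToyMPI`, `create_objects_uncached`, and `create_objects_cached` are hypothetical names, and the fixed pool of 2048 ids is an assumption taken only from the number in the error message. It compares duplicating a communicator once per object against reusing one cached duplicate per user communicator, which is roughly the idea behind the statement that PetscCommDuplicate() tries to minimize calls to MPI_Comm_dup().

```python
"""Toy model (not PETSc or MPI code) of communicator context-id exhaustion."""

# Assumption: pool size copied from the "0/2048 free" error text in this thread.
MAX_CONTEXT_IDS = 2048


class ToyMPI:
    """Hypothetical stand-in for an MPI library with a fixed pool of context ids."""

    def __init__(self, limit=MAX_CONTEXT_IDS):
        self.free_ids = limit

    def comm_dup(self):
        """Consume one context id and return a new communicator handle."""
        if self.free_ids == 0:
            raise RuntimeError("Too many communicators")
        self.free_ids -= 1
        return object()  # placeholder for a communicator handle


def create_objects_uncached(mpi, n_objects):
    """Every object duplicates the user's communicator for itself."""
    return [mpi.comm_dup() for _ in range(n_objects)]


def create_objects_cached(mpi, n_objects, cache):
    """All objects on one user communicator share a single cached duplicate.

    `cache` plays the role of an attribute attached to the user communicator.
    """
    comms = []
    for _ in range(n_objects):
        if "inner" not in cache:
            cache["inner"] = mpi.comm_dup()  # only the first object pays
        comms.append(cache["inner"])
    return comms
```

In this model, `create_objects_cached(ToyMPI(), 5000, {})` consumes a single id however many objects are created, while `create_objects_uncached(ToyMPI(), 2049)` raises `RuntimeError`. Any code path that duplicates the communicator per object, outside the cached route, would behave like the uncached case.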

On Mon, Apr 2, 2018 at 5:44 PM Kong, Fande <fande.kong at inl.gov> wrote:

> Why don't we use user-level MPI communicators directly? What are the
> potential risks here?
>
>
> Fande,
>
> On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay <balay at mcs.anl.gov> wrote:
>
>> PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls to
>> MPI_Comm_dup() - thus potentially avoiding such errors
>>
>>
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscCommDuplicate.html
>
>
>>
>> Satish
>>
>> On Mon, 2 Apr 2018, Kong, Fande wrote:
>>
>> > On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay <balay at mcs.anl.gov> wrote:
>> >
>> > > Does this 'standard test' use MPI_COMM_WORLD to create PETSc objects?
>> > >
>> > > If so - you could try changing to PETSC_COMM_WORLD
>> > >
>> >
>> >
>> > I do not think we are using PETSC_COMM_WORLD when creating PETSc objects.
>> > Why can we not use MPI_COMM_WORLD?
>> >
>> >
>> > Fande,
>> >
>> >
>> > >
>> > > Satish
>> > >
>> > >
>> > > On Mon, 2 Apr 2018, Kong, Fande wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its
>> > > > applications. I get an error message for a standard test:
>> > > >
>> > > > preconditioners/pbp.lots_of_variables: MPI had an error
>> > > > preconditioners/pbp.lots_of_variables: ------------------------------------------------
>> > > > preconditioners/pbp.lots_of_variables: Other MPI error, error stack:
>> > > > preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(177)..................: MPI_Comm_dup(comm=0x84000001, new_comm=0x97d1068) failed
>> > > > preconditioners/pbp.lots_of_variables: PMPI_Comm_dup(162)..................:
>> > > > preconditioners/pbp.lots_of_variables: MPIR_Comm_dup_impl(57)..............:
>> > > > preconditioners/pbp.lots_of_variables: MPIR_Comm_copy(739).................:
>> > > > preconditioners/pbp.lots_of_variables: MPIR_Get_contextid_sparse_group(614): Too many communicators (0/2048 free on this process; ignore_id=0)
>> > > >
>> > > >
>> > > > I did a "git bisect", and the following commit introduced this issue:
>> > > >
>> > > > commit 49a781f5cee36db85e8d5b951eec29f10ac13593
>> > > > Author: Stefano Zampini <stefano.zampini at gmail.com>
>> > > > Date:   Sat Nov 5 20:15:19 2016 +0300
>> > > >
>> > > >     PCHYPRE: use internal Mat of type MatHYPRE
>> > > >
>> > > >     hpmat already stores two HYPRE vectors
>> > > >
>> > > > Before I debug line by line, does anyone have a clue about this?
>> > > >
>> > > >
>> > > > Fande,
>> > > >
>> > >
>> > >
>> >
>>
>>