<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Apr 3, 2018, at 5:43 PM, Fande Kong <<a href="mailto:fdkong.jd@gmail.com" class="">fdkong.jd@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><br class=""><div class="gmail_extra"><br class=""><div class="gmail_quote">On Tue, Apr 3, 2018 at 9:12 AM, Stefano Zampini <span dir="ltr" class=""><<a href="mailto:stefano.zampini@gmail.com" target="_blank" class="">stefano.zampini@gmail.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><br class=""><div class=""><span class="gmail-"><blockquote type="cite" class=""><div class="">On Apr 3, 2018, at 4:58 PM, Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank" class="">balay@mcs.anl.gov</a>> wrote:</div><br class="gmail-m_2524865371699403345Apple-interchange-newline"><div class=""><div class="">On Tue, 3 Apr 2018, Kong, Fande wrote:<br class=""><br class=""><blockquote type="cite" class="">On Tue, Apr 3, 2018 at 1:17 AM, Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank" class="">bsmith@mcs.anl.gov</a>> wrote:<br class=""><br class=""><blockquote type="cite" class=""><br class=""> Each external package definitely needs its own duplicated communicator;<br class="">cannot share between packages.<br class=""><br class=""> The only problem with the dups below is if they are in a loop and get<br class="">called many times.<br class=""><br class=""></blockquote><br class=""><br class="">The "standard test" that has this issue actually has 1K fields.
MOOSE<br class="">creates its own field-split preconditioner (not based on the PETSc<br class="">fieldsplit), and each field is associated with one PC HYPRE. If PETSc<br class="">duplicates communicators, we could easily reach the limit of 2048.<br class=""><br class="">I also want to confirm what extra communicators are introduced in the bad<br class="">commit.<br class=""></blockquote><br class="">To me it looks like there is 1 extra comm created [for MATHYPRE] for each PCHYPRE that is created [which also creates one comm for this object].<br class=""><br class=""></div></div></blockquote><div class=""><br class=""></div></span><div class="">You’re right; however, it was the same before the commit.</div><div class="">I don’t understand how this specific commit is related to this issue, since the error is not in the MPI_Comm_dup inside MatCreate_MATHYPRE. Actually, the error comes from MPI_Comm_create</div><span class="gmail-"><div class=""><br class=""></div><div class=""><i class=""> frame #5: 0x00000001068defd4 libmpi.12.dylib`MPI_Comm_<wbr class="">create + 3492<br class=""> frame #6: 0x00000001061345d9 libpetsc.3.07.dylib`hypre_<wbr class="">GenerateSubComm(comm=-<wbr class="">1006627852, participate=<unavailable>, new_comm_ptr=<unavailable>) + 409 at gen_redcs_mat.c:531 [opt]<br class=""> frame #7: 0x000000010618f8ba libpetsc.3.07.dylib`hypre_<wbr class="">GaussElimSetup(amg_data=<wbr class="">0x00007fe7ff857a00, level=<unavailable>, relax_type=9) + 74 at par_relax.c:4209 [opt]<br class=""> frame #8: 0x0000000106140e93 libpetsc.3.07.dylib`hypre_<wbr class="">BoomerAMGSetup(amg_vdata=<<wbr class="">unavailable>, A=0x00007fe80842aff0, f=0x00007fe80842a980, u=0x00007fe80842a510) + 17699 at par_amg_setup.c:2108 [opt]<br class=""> frame #9: 0x0000000105ec773c libpetsc.3.07.dylib`PCSetUp_<wbr class="">HYPRE(pc=<unavailable>) + 2540 at hypre.c:226 [opt</i></div><div class=""><br class=""></div></span><div class="">How did you perform the bisection?
make clean + make all? Which version of HYPRE are you using?</div></div></div></blockquote><div class=""><br class=""></div><div class="">I did it more aggressively. </div><div class=""><br class=""></div><div class="">"rm -rf arch-darwin-c-opt-bisect "</div><div class=""><br class=""></div><div class="">"./configure --optionsModule=config.compilerOptions -with-debugging=no --with-shared-libraries=1 --with-mpi=1 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-hypre=1 --download-mumps=1 --download-scalapack=1 PETSC_ARCH=arch-darwin-c-opt-bisect"</div><div class=""><br class=""></div></div></div></div></div></blockquote><div><br class=""></div>Good, so this removes some possible sources of errors.<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""><br class=""></div><div class="">HYPRE version:</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div class=""> self.gitcommit = 'v2.11.1-55-g2ea0e43'</div><div class=""> self.download = ['git://<a href="https://github.com/LLNL/hypre" class="">https://github.com/LLNL/hypre</a>','<a href="https://github.com/LLNL/hypre/archive/'+self.gitcommit+'.tar.gz" class="">https://github.com/LLNL/hypre/archive/'+self.gitcommit+'.tar.gz</a>']</div></div><div class=""><br class=""></div><div class=""><br class=""></div></div></div></div></div></blockquote><div><br class=""></div><div>When reconfiguring, the HYPRE version can be different too (that commit is from 11/2016, so the HYPRE version used by the PETSc configure may have been upgraded as well).</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">I do not think this is caused by HYPRE.</div></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div
class="gmail_extra"><div class="gmail_quote"><div class=""><br class=""></div><div class="">Fande,</div><div class=""><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class=""><div class="gmail-h5"><br class=""><blockquote type="cite" class=""><div class=""><div class="">But you might want to verify [by linking with mpi trace library?]<br class=""><br class=""><br class="">There are some debugging hints at <a href="https://lists.mpich.org/pipermail/discuss/2012-December/000148.html" target="_blank" class="">https://lists.mpich.org/<wbr class="">pipermail/discuss/2012-<wbr class="">December/000148.html</a> [wrt mpich] - which I haven't checked..<br class=""><br class="">Satish<br class=""><br class=""><blockquote type="cite" class=""><br class=""><br class="">Fande,<br class=""><br class=""><br class=""><br class=""><blockquote type="cite" class=""><br class=""> To debug the hypre/duplication issue in MOOSE I would run in the<br class="">debugger with a break point in MPI_Comm_dup() and see<br class="">who keeps calling it an unreasonable amount of times. (My guess is this is<br class="">a new "feature" in hypre that they will need to fix but only debugging will<br class="">tell)<br class=""><br class=""> Barry<br class=""><br class=""><br class=""><blockquote type="cite" class="">On Apr 2, 2018, at 7:44 PM, Balay, Satish <<a href="mailto:balay@mcs.anl.gov" target="_blank" class="">balay@mcs.anl.gov</a>> wrote:<br class=""><br class="">We do a MPI_Comm_dup() for objects related to externalpackages.<br class=""><br class="">Looks like we added a new mat type MATHYPRE - in 3.8 that PCHYPRE is<br class="">using. 
Previously there was one MPI_Comm_dup() PCHYPRE - now I think<br class="">is one more for MATHYPRE - so more calls to MPI_Comm_dup in 3.8 vs 3.7<br class=""><br class="">src/dm/impls/da/hypre/mhyp.c: ierr = MPI_Comm_dup(PetscObjectComm((<br class=""></blockquote>PetscObject)B),&(ex->hcomm));<wbr class="">CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/dm/impls/da/hypre/mhyp.c: ierr = MPI_Comm_dup(PetscObjectComm((<br class=""></blockquote>PetscObject)B),&(ex->hcomm));<wbr class="">CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/dm/impls/swarm/data_ex.c: ierr = MPI_Comm_dup(comm,&d->comm);<br class=""></blockquote>CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/ksp/pc/impls/hypre/hypre.<wbr class="">c: ierr = MPI_Comm_dup(PetscObjectComm((<br class=""></blockquote>PetscObject)pc),&(jac->comm_<wbr class="">hypre));CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/ksp/pc/impls/hypre/hypre.<wbr class="">c: ierr = MPI_Comm_dup(PetscObjectComm((<br class=""></blockquote>PetscObject)pc),&(ex->hcomm));<wbr class="">CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/ksp/pc/impls/hypre/hypre.<wbr class="">c: ierr = MPI_Comm_dup(PetscObjectComm((<br class=""></blockquote>PetscObject)pc),&(ex->hcomm));<wbr class="">CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/ksp/pc/impls/spai/ispai.c: ierr =<br class=""></blockquote>MPI_Comm_dup(PetscObjectComm((<wbr class="">PetscObject)pc),&(ispai->comm_<br class="">spai));CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/mat/examples/tests/ex152.<wbr class="">c: ierr = MPI_Comm_dup(MPI_COMM_WORLD,<br class=""></blockquote>&comm);CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/mat/impls/aij/mpi/mkl_<wbr class="">cpardiso/mkl_cpardiso.c: ierr =<br class=""></blockquote>MPI_Comm_dup(PetscObjectComm((<wbr class="">PetscObject)A),&(mat_mkl_<br class="">cpardiso->comm_mkl_cpardiso));<wbr class="">CHKERRQ(ierr);<br 
class=""><blockquote type="cite" class="">src/mat/impls/aij/mpi/mumps/<wbr class="">mumps.c: ierr =<br class=""></blockquote>MPI_Comm_dup(PetscObjectComm((<wbr class="">PetscObject)A),&(mumps->comm_<br class="">mumps));CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/mat/impls/aij/mpi/pastix/<wbr class="">pastix.c: ierr =<br class=""></blockquote>MPI_Comm_dup(PetscObjectComm((<wbr class="">PetscObject)A),&(lu->pastix_<br class="">comm));CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/mat/impls/aij/mpi/superlu_<wbr class="">dist/superlu_dist.c: ierr =<br class=""></blockquote>MPI_Comm_dup(PetscObjectComm((<wbr class="">PetscObject)A),&(lu->comm_<br class="">superlu));CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/mat/impls/hypre/mhypre.c: ierr = MPI_Comm_dup(PetscObjectComm((<br class=""></blockquote>PetscObject)B),&hB->comm);<wbr class="">CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/mat/partition/impls/<wbr class="">pmetis/pmetis.c: ierr =<br class=""></blockquote>MPI_Comm_dup(pcomm,&comm);<wbr class="">CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/sys/mpiuni/mpi.c: MPI_COMM_SELF, MPI_COMM_WORLD, and a<br class=""></blockquote>MPI_Comm_dup() of each of these (duplicates of duplicates return the same<br class="">communictor)<br class=""><blockquote type="cite" class="">src/sys/mpiuni/mpi.c:int MPI_Comm_dup(MPI_Comm comm,MPI_Comm *out)<br class="">src/sys/objects/pinit.c: ierr = MPI_Comm_dup(MPI_COMM_WORLD,&<br class=""></blockquote>local_comm);CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/sys/objects/pinit.c: ierr = MPI_Comm_dup(MPI_COMM_WORLD,&<br class=""></blockquote>local_comm);CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/sys/objects/tagm.c: ierr = MPI_Comm_dup(comm_in,comm_out)<br class=""></blockquote>;CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/sys/utils/mpiu.c: ierr = MPI_Comm_dup(comm,&local_comm)<br 
class=""></blockquote>;CHKERRQ(ierr);<br class=""><blockquote type="cite" class="">src/ts/impls/implicit/<wbr class="">sundials/sundials.c: ierr =<br class=""></blockquote>MPI_Comm_dup(PetscObjectComm((<wbr class="">PetscObject)ts),&(cvode->comm_<br class="">sundials));CHKERRQ(ierr);<br class=""><blockquote type="cite" class=""><br class="">Perhaps we need a PetscCommDuplicateExternalPkg(<wbr class="">) to somehow avoid<br class=""></blockquote>these MPI_Comm_dup() calls?<br class=""><blockquote type="cite" class=""><br class="">Satish<br class=""><br class="">On Tue, 3 Apr 2018, Smith, Barry F. wrote:<br class=""><br class=""><blockquote type="cite" class=""><br class=""> Are we sure this is a PETSc comm issue and not a hypre comm<br class=""></blockquote></blockquote>duplication issue<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><br class="">frame #6: 0x00000001061345d9 libpetsc.3.07.dylib`hypre_<br class=""></blockquote></blockquote>GenerateSubComm(comm=-<wbr class="">1006627852, participate=<unavailable>,<br class="">new_comm_ptr=<unavailable>) + 409 at gen_redcs_mat.c:531 [opt]<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><br class="">Looks like hypre is needed to generate subcomms, perhaps it generates<br class=""></blockquote></blockquote>too many?<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><br class=""> Barry<br class=""><br class=""><br class=""><blockquote type="cite" class="">On Apr 2, 2018, at 7:07 PM, Derek Gaston <<a href="mailto:friedmud@gmail.com" target="_blank" class="">friedmud@gmail.com</a>> wrote:<br class=""><br class="">I’m working with Fande on this and I would like to add a bit more.<br class=""></blockquote></blockquote></blockquote>There are many circumstances where we aren’t working on COMM_WORLD at all<br class="">(e.g. 
working on a sub-communicator) but PETSc was initialized using<br class="">MPI_COMM_WORLD (think multi-level solves)… and we need to create<br class="">arbitrarily many PETSc vecs/mats/solvers/<wbr class="">preconditioners and solve. We<br class="">definitely can’t rely on using PETSC_COMM_WORLD to avoid triggering<br class="">duplication.<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><br class="">Can you explain why PETSc needs to duplicate the communicator so much?<br class=""><br class="">Thanks for your help in tracking this down!<br class=""><br class="">Derek<br class=""><br class="">On Mon, Apr 2, 2018 at 5:44 PM Kong, Fande <<a href="mailto:fande.kong@inl.gov" target="_blank" class="">fande.kong@inl.gov</a>> wrote:<br class="">Why we do not use user-level MPI communicators directly? What are<br class=""></blockquote></blockquote></blockquote>potential risks here?<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><br class=""><br class="">Fande,<br class=""><br class="">On Mon, Apr 2, 2018 at 5:08 PM, Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank" class="">balay@mcs.anl.gov</a>><br class=""></blockquote></blockquote></blockquote>wrote:<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class="">PETSC_COMM_WORLD [via PetscCommDuplicate()] attempts to minimize calls<br class=""></blockquote></blockquote></blockquote>to MPI_Comm_dup() - thus potentially avoiding such errors<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><br class=""><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__www.mcs" target="_blank" class="">https://urldefense.proofpoint.<wbr class="">com/v2/url?u=http-3A__www.mcs</a>.<br class=""></blockquote></blockquote></blockquote>anl.gov_petsc_petsc-2Dcurrent_<wbr 
class="">docs_manualpages_Sys_<br class="">PetscCommDuplicate.html&d=<wbr class="">DwIBAg&c=<wbr class="">54IZrppPQZKX9mLzcGdPfFD1hxrcB_<br class="">_aEkJFOKJFd00&r=DUUt3SRGI0_<wbr class="">JgtNaS3udV68GRkgV4ts7XKfj2opmi<br class="">CY&m=jgv7gpZ3K52d_<wbr class="">FWMgkK9yEScbLA7pkrWydFuJnYflsU<wbr class="">&s=_<br class="">zpWRcyk3kHuEHoq02NDqYExnXIohXp<wbr class="">NnjyabYnnDjU&e=<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><br class=""><br class="">Satish<br class=""><br class="">On Mon, 2 Apr 2018, Kong, Fande wrote:<br class=""><br class=""><blockquote type="cite" class="">On Mon, Apr 2, 2018 at 4:23 PM, Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank" class="">balay@mcs.anl.gov</a>><br class=""></blockquote></blockquote></blockquote></blockquote>wrote:<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" class="">Does this 'standard test' use MPI_COMM_WORLD to create PETSc objects?<br class=""><br class="">If so - you could try changing to PETSC_COMM_WORLD<br class=""><br class=""></blockquote><br class=""><br class="">I do not think we are using PETSC_COMM_WORLD when creating PETSc<br class=""></blockquote></blockquote></blockquote></blockquote>objects.<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class="">Why can we not use MPI_COMM_WORLD?<br class=""><br class=""><br class="">Fande,<br class=""><br class=""><br class=""><blockquote type="cite" class=""><br class="">Satish<br class=""><br class=""><br class="">On Mon, 2 Apr 2018, Kong, Fande wrote:<br class=""><br class=""><blockquote type="cite" class="">Hi All,<br class=""><br class="">I am trying to upgrade PETSc from 3.7.6 to 3.8.3 for MOOSE and its<br class="">applications.
I get an error message for a standard test:<br class=""><br class="">*preconditioners/pbp.lots_of_<wbr class="">variables: MPI had an<br class="">error<br class="">preconditioners/pbp.lots_<wbr class="">of_variables:<br class="">------------------------------<wbr class="">------------------<br class=""></blockquote>preconditioners/pbp.lots_of_<wbr class="">variables:<br class=""><blockquote type="cite" class="">Other MPI error, error stack:<br class="">preconditioners/pbp.<wbr class="">lots_of_variables:<br class="">PMPI_Comm_dup(177)............<wbr class="">......: MPI_Comm_dup(comm=0x84000001,<br class="">new_comm=0x97d1068) failed<br class="">preconditioners/pbp.<wbr class="">lots_of_variables:<br class="">PMPI_Comm_dup(162)............<wbr class="">......:<br class="">preconditioners/pbp.lots_of_<wbr class="">variables:<br class="">MPIR_Comm_dup_impl(57)........<wbr class="">......:<br class="">preconditioners/pbp.lots_of_<wbr class="">variables:<br class="">MPIR_Comm_copy(739)...........<wbr class="">......:<br class="">preconditioners/pbp.lots_of_<wbr class="">variables:<br class="">MPIR_Get_contextid_sparse_<wbr class="">group(614): Too many communicators<br class=""></blockquote></blockquote></blockquote></blockquote></blockquote></blockquote>(0/2048<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class="">free<br class=""><blockquote type="cite" class="">on this process; ignore_id=0)*<br class=""><br class=""><br class="">I did "git bisect", and the following commit introduced this issue:<br class=""><br class="">*commit 49a781f5cee36db85e8d5b951eec29<wbr class="">f10ac13593<br class="">Author: Stefano<br
class=""></blockquote></blockquote></blockquote></blockquote></blockquote></blockquote>Zampini<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><<a href="mailto:stefano.zampini@gmail.com" target="_blank" class="">stefano.zampini@gmail.com</a> <<a href="mailto:stefano.zampini@gmail.com" target="_blank" class="">stefano.zampini@gmail.com</a>>><wbr class="">Date: Sat<br class=""></blockquote></blockquote></blockquote></blockquote></blockquote></blockquote>Nov 5<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class="">20:15:19 2016 +0300 PCHYPRE: use internal Mat of type MatHYPRE<br class="">hpmat already stores two HYPRE vectors*<br class=""><br class="">Before I debug line-by-line, anyone has a clue on this?<br class=""><br class=""><br class="">Fande,<br class=""><br class=""></blockquote><br class=""><br class=""></blockquote><br class=""></blockquote><br class=""></blockquote><br class=""></blockquote></blockquote><br class=""><br class=""></blockquote></blockquote></div></div></blockquote></div></div></div><br class=""></div></blockquote></div><br class=""></div></div>
</div></blockquote></div><br class=""></body></html>