<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Hi Satish and Junchao,</p>
    <p>I just tried replacing all MPI_COMM_WORLD with PETSC_COMM_WORLD, but it
      didn't do the trick. One thing that interests me is that I ran with 40
      ranks but only 2 of them reported the communicator error, which I take to
      mean that at least the other 38 ranks freed their communicators
      properly.</p>
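    <p>For reference, the change is of this form (a minimal sketch using a
      plain PETSc call; my actual code goes through the deal.II wrappers, so
      the call sites differ):</p>
    <pre>
#include &lt;petscmat.h&gt;

int main(int argc, char **argv)
{
  Mat            A;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&amp;argc, &amp;argv, NULL, NULL); if (ierr) return ierr;
  /* before: ierr = MatCreate(MPI_COMM_WORLD, &amp;A); */
  ierr = MatCreate(PETSC_COMM_WORLD, &amp;A); CHKERRQ(ierr); /* the replacement tried everywhere */
  ierr = MatDestroy(&amp;A); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
    </pre>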
    <p>Thanks!</p>
    <p>Feimi<br>
    </p>
    <div class="moz-cite-prefix">On 8/18/21 4:53 PM, Junchao Zhang
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CA+MQGp_PsC5bNr461Ehc5t81cy-PxzVmy5MNhKGMGS7NnPFRew@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div>Hi, Feimi,</div>
        <div>  I need to consult Jed (cc'ed).</div>
          Jed, is this an example of <a
href="https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663"
          moz-do-not-send="true">https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663</a>?
        If Feimi really cannot free the matrices, then we just need to
        attach a hypre communicator to a PETSc inner communicator and pass
        that to hypre.
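        <div><br>
        </div>
        <div>Roughly, the idea is to dup the communicator once, cache the dup
          as an MPI attribute on the PETSc communicator, and hand the cached
          dup to hypre every time, instead of letting each MatCreate_HYPRE()
          call MPI_Comm_dup(). A sketch (the function and keyval names are made
          up, not actual PETSc or hypre code):</div>
        <pre>
#include &lt;mpi.h&gt;
#include &lt;stdlib.h&gt;

static int hypre_comm_keyval = MPI_KEYVAL_INVALID;

/* Return a communicator for hypre, duplicating the PETSc communicator only
 * the first time and caching the dup as an MPI attribute afterwards. */
static MPI_Comm GetHypreComm(MPI_Comm petsc_comm)
{
  MPI_Comm *cached;
  int       found;

  if (hypre_comm_keyval == MPI_KEYVAL_INVALID)
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                           &amp;hypre_comm_keyval, NULL);

  MPI_Comm_get_attr(petsc_comm, hypre_comm_keyval, &amp;cached, &amp;found);
  if (!found) {
    cached = (MPI_Comm *)malloc(sizeof(MPI_Comm));
    MPI_Comm_dup(petsc_comm, cached);   /* the only dup, done once */
    MPI_Comm_set_attr(petsc_comm, hypre_comm_keyval, cached);
    /* a real version would install a delete callback (instead of
     * MPI_COMM_NULL_DELETE_FN) that frees the dup when petsc_comm is freed */
  }
  return *cached;  /* every hypre object reuses this communicator */
}
        </pre>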
        <div>
          <div>
            <div><br clear="all">
              <div>
                <div dir="ltr" class="gmail_signature"
                  data-smartmail="gmail_signature">
                  <div dir="ltr">--Junchao Zhang</div>
                </div>
              </div>
              <br>
            </div>
          </div>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Wed, Aug 18, 2021 at 3:38
          PM Satish Balay <<a href="mailto:balay@mcs.anl.gov"
            moz-do-not-send="true">balay@mcs.anl.gov</a>> wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Is
          the communicator used to create PETSc objects MPI_COMM_WORLD?<br>
          <br>
          If so - try changing it to PETSC_COMM_WORLD<br>
          <br>
          Satish<br>
          <br>
           On Wed, 18 Aug 2021, Feimi Yu wrote:<br>
          <br>
          > Hi Junchao,<br>
          > <br>
          > Thank you for the suggestion! I'm using the deal.II wrapper<br>
          > dealii::PETScWrappers::PreconditionBase to handle the PETSc
          preconditioners,<br>
          > and the wrapper destroys the preconditioner when it is
          reinitialized or goes<br>
          > out of scope. I just double-checked: this is called to make sure
          the old<br>
          > matrices are destroyed:<br>
          > <br>
          >    void<br>
          >    PreconditionBase::clear()<br>
          >    {<br>
          >      matrix = nullptr;<br>
          > <br>
          >      if (pc != nullptr)<br>
          >        {<br>
          >          PetscErrorCode ierr = PCDestroy(&pc);<br>
          >          pc                  = nullptr;<br>
          >          AssertThrow(ierr == 0, ExcPETScError(ierr));<br>
          >        }<br>
          >    }<br>
          > <br>
          > Thanks!<br>
          > <br>
          > Feimi<br>
          > <br>
          > On 8/18/21 4:23 PM, Junchao Zhang wrote:<br>
          > ><br>
          > ><br>
          > ><br>
          > > On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu <<a
            href="mailto:yuf2@rpi.edu" target="_blank"
            moz-do-not-send="true">yuf2@rpi.edu</a><br>
          > > <mailto:<a href="mailto:yuf2@rpi.edu"
            target="_blank" moz-do-not-send="true">yuf2@rpi.edu</a>>>
          wrote:<br>
          > ><br>
          > >     Hi,<br>
          > ><br>
          > >     I was trying to run a simulation with a
          PETSc-wrapped Hypre<br>
          > >     preconditioner, and encountered this problem:<br>
          > ><br>
          > >     [dcs122:133012] Out of resources: all 4095 communicator IDs have been used.<br>
          > >     [19]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>
          > >     [19]PETSC ERROR: General MPI error<br>
          > >     [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error<br>
          > >     [19]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.<br>
          > >     [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown<br>
          > >     [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by CFSIfmyu Wed Aug 11 19:51:47 2021<br>
          > >     [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 communicator IDs have been used.<br>
          > >     [18]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>
          > >     [18]PETSC ERROR: General MPI error<br>
          > >     [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error<br>
          > >     [18]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.<br>
          > >     [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown<br>
          > >     [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by CFSIfmyu Wed Aug 11 19:51:47 2021<br>
          > >     [18]PETSC ERROR: Configure options --download-scalapack --download-mumps --download-hypre --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 --with-debugging=0 --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/<br>
          > >     [18]PETSC ERROR: #1 MatCreate_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120<br>
          > >     [18]PETSC ERROR: #2 MatSetType() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91<br>
          > >     [18]PETSC ERROR: #3 MatConvert_AIJ_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392<br>
          > >     [18]PETSC ERROR: #4 MatConvert() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439<br>
          > >     [18]PETSC ERROR: #5 PCSetUp_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240<br>
          > >     [18]PETSC ERROR: #6 PCSetUp() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015<br>
          > >     Configure options --download-scalapack --download-mumps --download-hypre --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 --with-debugging=0 --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/<br>
          > >     [19]PETSC ERROR: #1 MatCreate_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120<br>
          > >     [19]PETSC ERROR: #2 MatSetType() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91<br>
          > >     [19]PETSC ERROR: #3 MatConvert_AIJ_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392<br>
          > >     [19]PETSC ERROR: #4 MatConvert() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439<br>
          > >     [19]PETSC ERROR: #5 PCSetUp_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240<br>
          > >     [19]PETSC ERROR: #6 PCSetUp() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015<br>
          > ><br>
          > >     It seems that MPI_Comm_dup() at<br>
          > >     petsc/src/mat/impls/hypre/mhypre.c:2120 caused
          the problem. Since<br>
          > >     mine is a time-dependent problem,
          MatCreate_HYPRE() is called<br>
          > >     every time the new system matrix is assembled.
          The above error<br>
          > >     message is reported after ~4095 calls of
          MatCreate_HYPRE(), which<br>
          > >     is around 455 time steps in my code. Here is some basic MPI
          and<br>
          > >     compiler information:<br>
          > ><br>
          > > Can you destroy the old matrices to free their MPI
          communicators?<br>
          > > Otherwise, you run into a limitation we have known about for a
          while.<br>
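          <div>In a minimal sketch, the failure mode looks like this (the 4095
            figure being simply what this MPI implementation allows for
            communicator IDs):</div>
          <pre>
#include &lt;mpi.h&gt;

int main(int argc, char **argv)
{
  MPI_Init(&amp;argc, &amp;argv);
  /* one MPI_Comm_dup() per time step, as in MatCreate_HYPRE() at
   * mhypre.c:2120, with the duplicates never freed */
  for (int step = 0; step &lt; 5000; step++) {
    MPI_Comm hypre_comm;
    MPI_Comm_dup(MPI_COMM_WORLD, &amp;hypre_comm);
    /* ... assemble the matrix and set up the preconditioner on hypre_comm ... */
    /* MPI_Comm_free(&amp;hypre_comm);  without this, the pool of IDs runs dry */
  }
  MPI_Finalize();
  return 0;
}
          </pre>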
          > ><br>
          > >     IBM Spectrum MPI 10.4.0<br>
          > ><br>
          > >     GCC 8.4.1<br>
          > ><br>
          > >     I've never had this problem before with the OpenMPI or
          MPICH<br>
          > >     implementations, so I was wondering whether this can be
          resolved on<br>
          > >     my end or whether it's an implementation-specific problem.<br>
          > ><br>
          > >     Thanks!<br>
          > ><br>
          > >     Feimi<br>
          > ><br>
          > <br>
          > </blockquote>
      </div>
    </blockquote>
  </body>
</html>