<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;
      charset=windows-1252">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Thanks Stefano.  <br>
    <br>
    Reading the manual pages a bit more carefully,<br>
    I think I can see what I should be doing, which is roughly the
    following (see the sketch below):<br>
    <br>
    1. Set up target Seq Vecs on PETSC_COMM_SELF.<br>
    2. Use ISCreateGeneral to create ISs for the target Vecs and for the
    source Vec, which will be an MPI Vec on PETSC_COMM_WORLD.<br>
    3. Create the scatter context with VecScatterCreate.<br>
    4. Call VecScatterBegin/End on each process (instead of using my
    prior routine).<br>
    <br>
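    For concreteness, here is a rough (untested) free-form Fortran sketch
    of steps 1-4; the names build_scatter, nloc, fromidx, locvec, and ctx
    are just placeholders, and I am assuming 0-based global indices into
    the source Vec:<br>
    <pre>
#include &lt;petsc/finclude/petscvec.h&gt;
      subroutine build_scatter(sol, nloc, fromidx, locvec, ctx, ierr)
        use petscvec
        use petscis
        implicit none
        Vec            :: sol           ! parallel source Vec on PETSC_COMM_WORLD (from KSPSolve)
        PetscInt       :: nloc          ! number of entries this process needs
        PetscInt       :: fromidx(nloc) ! global indices into sol (0-based)
        Vec            :: locvec        ! sequential target Vec, created here
        VecScatter     :: ctx
        PetscErrorCode :: ierr
        IS             :: isfrom

        ! 1. target Seq Vec on PETSC_COMM_SELF
        call VecCreateSeq(PETSC_COMM_SELF, nloc, locvec, ierr)

        ! 2. IS selecting the wanted entries of the source Vec
        call ISCreateGeneral(PETSC_COMM_SELF, nloc, fromidx, PETSC_COPY_VALUES, isfrom, ierr)

        ! 3. scatter context; per the man page, PETSC_NULL_IS for the target
        !    should mean "all of locvec, in order"
        call VecScatterCreate(sol, isfrom, locvec, PETSC_NULL_IS, ctx, ierr)
        call ISDestroy(isfrom, ierr)

        ! 4. move the data (called collectively on every process)
        call VecScatterBegin(ctx, sol, locvec, INSERT_VALUES, SCATTER_FORWARD, ierr)
        call VecScatterEnd(ctx, sol, locvec, INSERT_VALUES, SCATTER_FORWARD, ierr)
      end subroutine build_scatter
    </pre>
    <br>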
    Lingering questions:<br>
    <br>
    a. Is there any performance advantage/disadvantage to creating a
    single parallel target Vec instead<br>
    of multiple target Seq Vecs (in terms of the scatter operation)?<br>
    <br>
    b. The data that ends up in the target on each processor needs to be
    in an application<br>
    array.  Is there a clever way to 'move' the data from the scatter
    target to the array (short<br>
    of just running a loop over it and copying)?<br>
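    For instance (a rough, untested sketch; use_local and nloc are made-up
    names), would something like this let me read the scattered values in
    place via VecGetArrayReadF90 rather than copying them?<br>
    <pre>
#include &lt;petsc/finclude/petscvec.h&gt;
      subroutine use_local(locvec, nloc, ierr)
        use petscvec
        implicit none
        Vec                  :: locvec   ! the sequential scatter target
        PetscInt             :: nloc
        PetscErrorCode       :: ierr
        PetscScalar, pointer :: aptr(:)

        ! aptr points at locvec's storage directly; no copy is made
        call VecGetArrayReadF90(locvec, aptr, ierr)
        ! ... read aptr(1:nloc) wherever the application array was read ...
        call VecRestoreArrayReadF90(locvec, aptr, ierr)
      end subroutine use_local
    </pre>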
    <br>
    -sanjay<br>
    <br>
    <br>
    <br>
    <div class="moz-cite-prefix">On 5/31/19 12:02 PM, Stefano Zampini
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:ACE97360-FB5C-454E-B665-87265BA738E0@gmail.com">
      <br class="">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">On May 31, 2019, at 9:50 PM, Sanjay Govindjee
            via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov"
              class="" moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
            wrote:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class=""> Matt,<br
                class="">
                Here is the process as it currently stands:<br class="">
              <br class="">
              1) I have a PETSc Vec (sol), which comes from a KSPSolve<br
                class="">
              <br class="">
              2) Each processor grabs its section of sol via
              VecGetOwnershipRange and VecGetArrayReadF90<br class="">
              and inserts parts of its section of sol in a local array
              (locarr) using a complex but easily computable mapping.<br
                class="">
              <br class="">
              3) The routine you are looking at then exchanges various
              parts of the locarr between the processors.<br class="">
              <br class="">
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        <div>You need a VecScatter object <a
href="https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate"
            class="" moz-do-not-send="true">https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate</a> </div>
        <br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class=""> 4) Each
              processor then does computations using its updated locarr.<br
                class="">
              <br class="">
              Typing it out this way, I guess the answer to your
              question is "yes."  I have a global Vec and I want its
              values<br class="">
              sent in a complex but computable way to local vectors on
              each process.<br class="">
              <br class="">
              -sanjay<br class="">
              <div class="moz-cite-prefix">On 5/31/19 3:37 AM, Matthew
                Knepley wrote:<br class="">
              </div>
              <blockquote type="cite"
cite="mid:CAMYG4Gk_eccMW8e2k0DMZTxQcFcU+AqtUmM0UAgnaF=qFGCrdg@mail.gmail.com"
                class="">
                <div dir="ltr" class="">
                  <div dir="ltr" class="">On Thu, May 30, 2019 at 11:55
                    PM Sanjay Govindjee via petsc-users <<a
                      href="mailto:petsc-users@mcs.anl.gov"
                      moz-do-not-send="true" class="">petsc-users@mcs.anl.gov</a>>
                    wrote:<br class="">
                  </div>
                  <div class="gmail_quote">
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div bgcolor="#FFFFFF" class=""> Hi Junchao,<br
                          class="">
                        Thanks for the hints below; they will take some
                        time to absorb, as the vectors that are being
                        moved around<br class="">
                        are actually partly PETSc vectors and partly
                        local process vectors.<br class="">
                      </div>
                    </blockquote>
                    <div class=""><br class="">
                    </div>
                    <div class="">Is this code just doing a
                      global-to-local map? Meaning, does it just map all
                      the local unknowns to some global</div>
                    <div class="">unknown on some process? We have an
                      even simpler interface for that, where we make the
                      VecScatter</div>
                    <div class="">automatically,</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">  <a
href="https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate"
                        moz-do-not-send="true" class="">https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate</a></div>
                    <div class=""><br class="">
                    </div>
                    <div class="">Then you can use it with Vecs, Mats,
                      etc.</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">  Thanks,</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">     Matt</div>
                    <div class=""> </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div bgcolor="#FFFFFF" class=""> Attached is the
                        modified routine that now works (no leaking
                        memory) with openmpi.<br class="">
                        <br class="">
                        -sanjay<br class="">
                        <div
                          class="gmail-m_-6089453002349408992moz-cite-prefix">On
                          5/30/19 8:41 PM, Zhang, Junchao wrote:<br
                            class="">
                        </div>
                        <blockquote type="cite" class="">
                          <div dir="ltr" class="">
                            <div class=""><br class="">
                              Hi, Sanjay,</div>
                            <div class="">  Could you send your modified
                              data exchange code (psetb.F) with
                              MPI_Waitall? See other inlined comments
                              below. Thanks.</div>
                            <br class="">
                            <div class="gmail_quote">
                              <div dir="ltr" class="gmail_attr">On Thu,
                                May 30, 2019 at 1:49 PM Sanjay Govindjee
                                via petsc-users <<a
                                  href="mailto:petsc-users@mcs.anl.gov"
                                  target="_blank" moz-do-not-send="true"
                                  class="">petsc-users@mcs.anl.gov</a>>
                                wrote:<br class="">
                              </div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex">
                                Lawrence,<br class="">
                                Thanks for taking a look!  This is what
                                I had been wondering about -- my <br
                                  class="">
                                knowledge of MPI is pretty minimal and<br
                                  class="">
                                the routine originated with a
                                programmer we hired a decade+ <br
                                  class="">
                                back from NERSC.  I'll have to look into<br
                                  class="">
                                VecScatter.  It will be great to
                                dispense with our roll-your-own <br
                                  class="">
                                routines (we even have our own reduceALL
                                scattered around the code).<br class="">
                              </blockquote>
                              <div class="">Petsc VecScatter has a very
                                simple interface and you definitely
                                should go with.  With VecScatter, you
                                can think in familiar vectors and
                                indices instead of the low level
                                MPI_Send/Recv. Besides that, PETSc has
                                optimized VecScatter so that
                                communication is efficient.<br class="">
                              </div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex"> <br
                                  class="">
                                Interestingly, MPI_Waitall has
                                solved the problem when using OpenMPI <br
                                  class="">
                                but it still persists with MPICH. 
                                Graphs attached.<br class="">
                                I'm going to run with openmpi for now
                                (but I guess I really still need <br
                                  class="">
                                to figure out what is wrong with MPICH
                                and MPI_Waitall;<br class="">
                                I'll try Barry's suggestion of <br
                                  class="">
--download-mpich-configure-arguments="--enable-error-messages=all <br
                                  class="">
                                --enable-g" later today and report
                                back).<br class="">
                                <br class="">
                                Regarding MPI_Barrier, it was put in due
                                to a problem where some processes <br
                                  class="">
                                were finishing up sending and receiving
                                and exiting the subroutine<br class="">
                                before the receiving processes had
                                completed (which resulted in data <br
                                  class="">
                                loss as the buffers are freed after the
                                call to the routine). <br class="">
                                MPI_Barrier was the solution proposed<br
                                  class="">
                                to us.  I don't think I can dispense
                                with it, but will think about it some <br
                                  class="">
                                more.</blockquote>
                              <div class="">After MPI_Send(), or after
                                MPI_Isend(..,req) and MPI_Wait(req), you
                                can safely free the send buffer without
                                worrying that the receive has not
                                completed. MPI guarantees the receiver
                                can get the data, for example, through
                                internal buffering.</div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex"> <br
                                  class="">
                                I'm not so sure about using MPI_Irecv, as
                                it will require a bit of <br class="">
                                rewriting since right now I process the
                                received<br class="">
                                data sequentially after each blocking
                                MPI_Recv -- clearly slower but <br
                                  class="">
                                easier to code.<br class="">
                                <br class="">
                                Thanks again for the help.<br class="">
                                <br class="">
                                -sanjay<br class="">
                                <br class="">
                                On 5/30/19 4:48 AM, Lawrence Mitchell
                                wrote:<br class="">
                                > Hi Sanjay,<br class="">
                                ><br class="">
                                >> On 30 May 2019, at 08:58,
                                Sanjay Govindjee via petsc-users <<a
                                  href="mailto:petsc-users@mcs.anl.gov"
                                  target="_blank" moz-do-not-send="true"
                                  class="">petsc-users@mcs.anl.gov</a>>
                                wrote:<br class="">
                                >><br class="">
                                >> The problem seems to persist
                                but with a different signature.  Graphs
                                attached as before.<br class="">
                                >><br class="">
                                >> Totals with MPICH (NB: single
                                run)<br class="">
                                >><br class="">
                                >> For the CG/Jacobi         
                                data_exchange_total = 41,385,984;
                                kspsolve_total = 38,289,408<br class="">
                                >> For the GMRES/BJACOBI     
                                data_exchange_total = 41,324,544;
                                kspsolve_total = 41,324,544<br class="">
                                >><br class="">
                                >> Just reading the MPI docs I am
                                wondering if I need some sort of
                                MPI_Wait/MPI_Waitall before my
                                MPI_Barrier in the data exchange
                                routine?<br class="">
                                >> I would have thought that with
                                the blocking receives and the
                                MPI_Barrier, everything would have
                                fully completed and cleaned up before<br
                                  class="">
                                >> all processes exited the
                                routine, but perhaps I am wrong on that.<br
                                  class="">
                                ><br class="">
                                > Skimming the Fortran code you sent,
                                you do:<br class="">
                                ><br class="">
                                > for i in ...:<br class="">
                                >     call MPI_Isend(..., req, ierr)<br
                                  class="">
                                ><br class="">
                                > for i in ...:<br class="">
                                >     call MPI_Recv(..., ierr)<br
                                  class="">
                                ><br class="">
                                > But you never call MPI_Wait on the
                                request you got back from the Isend. So
                                the MPI library will never free the data
                                structures it created.<br class="">
                                ><br class="">
                                > The usual pattern for these
                                non-blocking communications is to
                                allocate an array for the requests of
                                length nsend+nrecv and then do:<br
                                  class="">
                                ><br class="">
                                > for i in nsend:<br class="">
                                >     call MPI_Isend(..., req[i],
                                ierr)<br class="">
                                > for j in nrecv:<br class="">
                                >     call MPI_Irecv(...,
                                req[nsend+j], ierr)<br class="">
                                ><br class="">
                                > call MPI_Waitall(req, ..., ierr)<br
                                  class="">
                                ><br class="">
                                > I note also there's no need for the
                                Barrier at the end of the routine; this
                                kind of communication does neighbourwise
                                synchronisation, so there is no need to
                                add (unnecessary) global synchronisation
                                too.<br class="">
                                ><br class="">
                                > As an aside, is there a reason you
                                don't use PETSc's VecScatter to manage
                                this global to local exchange?<br
                                  class="">
                                ><br class="">
                                > Cheers,<br class="">
                                ><br class="">
                                > Lawrence<br class="">
                                <br class="">
                              </blockquote>
                            </div>
                          </div>
                        </blockquote>
                        <br class="">
                      </div>
                    </blockquote>
                  </div>
                  <br class="" clear="all">
                  <div class=""><br class="">
                  </div>
                  -- <br class="">
                  <div dir="ltr" class="gmail_signature">
                    <div dir="ltr" class="">
                      <div class="">
                        <div dir="ltr" class="">
                          <div class="">
                            <div dir="ltr" class="">
                              <div class="">What most experimenters take
                                for granted before they begin their
                                experiments is infinitely more
                                interesting than any results to which
                                their experiments lead.<br class="">
                                -- Norbert Wiener</div>
                              <div class=""><br class="">
                              </div>
                              <div class=""><a
                                  href="http://www.cse.buffalo.edu/~knepley/"
                                  target="_blank" moz-do-not-send="true"
                                  class="">https://www.cse.buffalo.edu/~knepley/</a><br
                                  class="">
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </blockquote>
              <br class="">
            </div>
          </div>
        </blockquote>
      </div>
      <br class="">
    </blockquote>
    <br>
  </body>
</html>