<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;
      charset=windows-1252">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Thanks Stefano.  <br>
    <br>
    Reading the manual pages a bit more carefully,<br>
    I think I can see what I should be doing, which is roughly the
    following (see the sketch below):<br>
    <br>
    1. Set up target Seq Vecs on PETSC_COMM_SELF.<br>
    2. Use ISCreateGeneral to create ISs for the target Vecs and for the
    source Vec, which will be an MPI Vec on PETSC_COMM_WORLD.<br>
    3. Create the scatter context with VecScatterCreate.<br>
    4. Call VecScatterBegin/End on each process (instead of using my
    prior routine).<br>
    <br>
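    For concreteness, here is a rough (untested) free-form Fortran sketch
    of steps 1-4; the names build_scatter, nloc, fromidx, locvec, and ctx
    are just placeholders, and I am assuming 0-based global indices into
    the source Vec:<br>
    <pre>
#include &lt;petsc/finclude/petscvec.h&gt;
      subroutine build_scatter(sol, nloc, fromidx, locvec, ctx, ierr)
        use petscvec
        use petscis
        implicit none
        Vec            :: sol           ! parallel source Vec on PETSC_COMM_WORLD (from KSPSolve)
        PetscInt       :: nloc          ! number of entries this process needs
        PetscInt       :: fromidx(nloc) ! global indices into sol (0-based)
        Vec            :: locvec        ! sequential target Vec, created here
        VecScatter     :: ctx
        PetscErrorCode :: ierr
        IS             :: isfrom

        ! 1. target Seq Vec on PETSC_COMM_SELF
        call VecCreateSeq(PETSC_COMM_SELF, nloc, locvec, ierr)

        ! 2. IS selecting the wanted entries of the source Vec
        call ISCreateGeneral(PETSC_COMM_SELF, nloc, fromidx, PETSC_COPY_VALUES, isfrom, ierr)

        ! 3. scatter context; per the man page, PETSC_NULL_IS for the target
        !    should mean "all of locvec, in order"
        call VecScatterCreate(sol, isfrom, locvec, PETSC_NULL_IS, ctx, ierr)
        call ISDestroy(isfrom, ierr)

        ! 4. move the data (called collectively on every process)
        call VecScatterBegin(ctx, sol, locvec, INSERT_VALUES, SCATTER_FORWARD, ierr)
        call VecScatterEnd(ctx, sol, locvec, INSERT_VALUES, SCATTER_FORWARD, ierr)
      end subroutine build_scatter
    </pre>
    <br>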
    Lingering questions:<br>
    <br>
    a. Is there any performance advantage/disadvantage to creating a
    single parallel target Vec instead<br>
    of multiple target Seq Vecs (in terms of the scatter operation)?<br>
    <br>
    b. The data that ends up in the target on each processor needs to be
    in an application<br>
    array.  Is there a clever way to 'move' the data from the scatter
    target to the array (short<br>
    of just running a loop over it and copying)?<br>
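    For instance (a rough, untested sketch; use_local and nloc are made-up
    names), would something like this let me read the scattered values in
    place via VecGetArrayReadF90 rather than copying them?<br>
    <pre>
#include &lt;petsc/finclude/petscvec.h&gt;
      subroutine use_local(locvec, nloc, ierr)
        use petscvec
        implicit none
        Vec                  :: locvec   ! the sequential scatter target
        PetscInt             :: nloc
        PetscErrorCode       :: ierr
        PetscScalar, pointer :: aptr(:)

        ! aptr points at locvec's storage directly; no copy is made
        call VecGetArrayReadF90(locvec, aptr, ierr)
        ! ... read aptr(1:nloc) wherever the application array was read ...
        call VecRestoreArrayReadF90(locvec, aptr, ierr)
      end subroutine use_local
    </pre>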
    <br>
    -sanjay<br>
    <br>
    <br>
    <br>
    <div class="moz-cite-prefix">On 5/31/19 12:02 PM, Stefano Zampini
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:ACE97360-FB5C-454E-B665-87265BA738E0@gmail.com">
      <br class="">
      <div><br class="">
        <blockquote type="cite" class="">
          <div class="">On May 31, 2019, at 9:50 PM, Sanjay Govindjee
            via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov"
              class="" moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
            wrote:</div>
          <br class="Apple-interchange-newline">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class=""> Matt,<br
                class="">
                Here is the process as it currently stands:<br class="">
              <br class="">
              1) I have a PETSc Vec (sol), which comes from a KSPSolve<br
                class="">
              <br class="">
              2) Each processor grabs its section of sol via
              VecGetOwnershipRange and VecGetArrayReadF90<br class="">
              and inserts parts of its section of sol in a local array
              (locarr) using a complex but easily computable mapping.<br
                class="">
              <br class="">
              3) The routine you are looking at then exchanges various
              parts of the locarr between the processors.<br class="">
              <br class="">
            </div>
          </div>
        </blockquote>
        <div><br class="">
        </div>
        <div>You need a VecScatter object <a
href="https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate"
            class="" moz-do-not-send="true">https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate</a> </div>
        <br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div text="#000000" bgcolor="#FFFFFF" class=""> 4) Each
              processor then does computations using its updated locarr.<br
                class="">
              <br class="">
              Typing it out this way, I guess the answer to your
              question is "yes."  I have a global Vec and I want its
              values<br class="">
              sent in a complex but computable way to local vectors on
              each process.<br class="">
              <br class="">
              -sanjay<br class="">
              <div class="moz-cite-prefix">On 5/31/19 3:37 AM, Matthew
                Knepley wrote:<br class="">
              </div>
              <blockquote type="cite"
cite="mid:CAMYG4Gk_eccMW8e2k0DMZTxQcFcU+AqtUmM0UAgnaF=qFGCrdg@mail.gmail.com"
                class="">
                <div dir="ltr" class="">
                  <div dir="ltr" class="">On Thu, May 30, 2019 at 11:55
                    PM Sanjay Govindjee via petsc-users <<a
                      href="mailto:petsc-users@mcs.anl.gov"
                      moz-do-not-send="true" class="">petsc-users@mcs.anl.gov</a>>
                    wrote:<br class="">
                  </div>
                  <div class="gmail_quote">
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div bgcolor="#FFFFFF" class=""> Hi Junchao,<br
                          class="">
                        Thanks for the hints below; they will take some
                        time to absorb, as the vectors that are being
                        moved around<br class="">
                        are actually partly PETSc vectors and partly
                        local process vectors.<br class="">
                      </div>
                    </blockquote>
                    <div class=""><br class="">
                    </div>
                    <div class="">Is this code just doing a
                      global-to-local map? Meaning, does it just map all
                      the local unknowns to some global</div>
                    <div class="">unknown on some process? We have an
                      even simpler interface for that, where we make the
                      VecScatter</div>
                    <div class="">automatically,</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">  <a
href="https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate"
                        moz-do-not-send="true" class="">https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate</a></div>
                    <div class=""><br class="">
                    </div>
                    <div class="">Then you can use it with Vecs, Mats,
                      etc.</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">  Thanks,</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">     Matt</div>
                    <div class=""> </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div bgcolor="#FFFFFF" class=""> Attached is the
                        modified routine that now works (no leaking
                        memory) with openmpi.<br class="">
                        <br class="">
                        -sanjay<br class="">
                        <div
                          class="gmail-m_-6089453002349408992moz-cite-prefix">On
                          5/30/19 8:41 PM, Zhang, Junchao wrote:<br
                            class="">
                        </div>
                        <blockquote type="cite" class="">
                          <div dir="ltr" class="">
                            <div class=""><br class="">
                              Hi, Sanjay,</div>
                            <div class="">  Could you send your modified
                              data exchange code (psetb.F) with
                              MPI_Waitall? See other inlined comments
                              below. Thanks.</div>
                            <br class="">
                            <div class="gmail_quote">
                              <div dir="ltr" class="gmail_attr">On Thu,
                                May 30, 2019 at 1:49 PM Sanjay Govindjee
                                via petsc-users <<a
                                  href="mailto:petsc-users@mcs.anl.gov"
                                  target="_blank" moz-do-not-send="true"
                                  class="">petsc-users@mcs.anl.gov</a>>
                                wrote:<br class="">
                              </div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex">
                                Lawrence,<br class="">
                                Thanks for taking a look!  This is what
                                I had been wondering about -- my <br
                                  class="">
                                knowledge of MPI is pretty minimal and<br
                                  class="">
                                the routine originated with a
                                programmer we hired a decade+ <br
                                  class="">
                                back from NERSC.  I'll have to look into<br
                                  class="">
                                VecScatter.  It will be great to
                                dispense with our roll-your-own <br
                                  class="">
                                routines (we even have our own reduceALL
                                scattered around the code).<br class="">
                              </blockquote>
                              <div class="">Petsc VecScatter has a very
                                simple interface and you definitely
                                should go with.  With VecScatter, you
                                can think in familiar vectors and
                                indices instead of the low level
                                MPI_Send/Recv. Besides that, PETSc has
                                optimized VecScatter so that
                                communication is efficient.<br class="">
                              </div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex"> <br
                                  class="">
                                Interestingly, MPI_Waitall has
                                solved the problem when using OpenMPI <br
                                  class="">
                                but it still persists with MPICH. 
                                Graphs attached.<br class="">
                                I'm going to run with openmpi for now
                                (but I guess I really still need <br
                                  class="">
                                to figure out what is wrong with MPICH
                                and MPI_Waitall;<br class="">
                                I'll try Barry's suggestion of <br
                                  class="">
--download-mpich-configure-arguments="--enable-error-messages=all <br
                                  class="">
                                --enable-g" later today and report
                                back).<br class="">
                                <br class="">
                                Regarding MPI_Barrier, it was put in due
                                to a problem where some processes <br
                                  class="">
                                were finishing up sending and receiving
                                and exiting the subroutine<br class="">
                                before the receiving processes had
                                completed (which resulted in data <br
                                  class="">
                                loss as the buffers are freed after the
                                call to the routine). <br class="">
                                MPI_Barrier was the solution proposed<br
                                  class="">
                                to us.  I don't think I can dispense
                                with it, but will think about it some <br
                                  class="">
                                more.</blockquote>
                              <div class="">After MPI_Send(), or after
                                MPI_Isend(..,req) and MPI_Wait(req), you
                                can safely free the send buffer without
                                worrying that the receive has not
                                completed. MPI guarantees the receiver
                                can get the data, for example, through
                                internal buffering.</div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex"> <br
                                  class="">
                                I'm not so sure about using MPI_Irecv, as
                                it will require a bit of <br class="">
                                rewriting since right now I process the
                                received<br class="">
                                data sequentially after each blocking
                                MPI_Recv -- clearly slower but <br
                                  class="">
                                easier to code.<br class="">
                                <br class="">
                                Thanks again for the help.<br class="">
                                <br class="">
                                -sanjay<br class="">
                                <br class="">
                                On 5/30/19 4:48 AM, Lawrence Mitchell
                                wrote:<br class="">
                                > Hi Sanjay,<br class="">
                                ><br class="">
                                >> On 30 May 2019, at 08:58,
                                Sanjay Govindjee via petsc-users <<a
                                  href="mailto:petsc-users@mcs.anl.gov"
                                  target="_blank" moz-do-not-send="true"
                                  class="">petsc-users@mcs.anl.gov</a>>
                                wrote:<br class="">
                                >><br class="">
                                >> The problem seems to persist
                                but with a different signature.  Graphs
                                attached as before.<br class="">
                                >><br class="">
                                >> Totals with MPICH (NB: single
                                run)<br class="">
                                >><br class="">
                                >> For the CG/Jacobi         
                                data_exchange_total = 41,385,984;
                                kspsolve_total = 38,289,408<br class="">
                                >> For the GMRES/BJACOBI     
                                data_exchange_total = 41,324,544;
                                kspsolve_total = 41,324,544<br class="">
                                >><br class="">
                                >> Just reading the MPI docs I am
                                wondering if I need some sort of
                                MPI_Wait/MPI_Waitall before my
                                MPI_Barrier in the data exchange
                                routine?<br class="">
                                >> I would have thought that with
                                the blocking receives and the
                                MPI_Barrier, everything would have
                                fully completed and cleaned up before<br
                                  class="">
                                >> all processes exited the
                                routine, but perhaps I am wrong on that.<br
                                  class="">
                                ><br class="">
                                > Skimming the Fortran code you sent,
                                you do:<br class="">
                                ><br class="">
                                > for i in ...:<br class="">
                                >     call MPI_Isend(..., req, ierr)<br
                                  class="">
                                ><br class="">
                                > for i in ...:<br class="">
                                >     call MPI_Recv(..., ierr)<br
                                  class="">
                                ><br class="">
                                > But you never call MPI_Wait on the
                                request you got back from the Isend. So
                                the MPI library will never free the data
                                structures it created.<br class="">
                                ><br class="">
                                > The usual pattern for these
                                non-blocking communications is to
                                allocate an array for the requests of
                                length nsend+nrecv and then do:<br
                                  class="">
                                ><br class="">
                                > for i in nsend:<br class="">
                                >     call MPI_Isend(..., req[i],
                                ierr)<br class="">
                                > for j in nrecv:<br class="">
                                >     call MPI_Irecv(...,
                                req[nsend+j], ierr)<br class="">
                                ><br class="">
                                > call MPI_Waitall(req, ..., ierr)<br
                                  class="">
                                ><br class="">
                                > I note also there's no need for the
                                Barrier at the end of the routine; this
                                kind of communication does neighbourwise
                                synchronisation, so there is no need to
                                add (unnecessary) global synchronisation
                                too.<br class="">
                                ><br class="">
                                > As an aside, is there a reason you
                                don't use PETSc's VecScatter to manage
                                this global to local exchange?<br
                                  class="">
                                ><br class="">
                                > Cheers,<br class="">
                                ><br class="">
                                > Lawrence<br class="">
                                <br class="">
                              </blockquote>
                            </div>
                          </div>
                        </blockquote>
                        <br class="">
                      </div>
                    </blockquote>
                  </div>
                  <br class="" clear="all">
                  <div class=""><br class="">
                  </div>
                  -- <br class="">
                  <div dir="ltr" class="gmail_signature">
                    <div dir="ltr" class="">
                      <div class="">
                        <div dir="ltr" class="">
                          <div class="">
                            <div dir="ltr" class="">
                              <div class="">What most experimenters take
                                for granted before they begin their
                                experiments is infinitely more
                                interesting than any results to which
                                their experiments lead.<br class="">
                                -- Norbert Wiener</div>
                              <div class=""><br class="">
                              </div>
                              <div class=""><a
                                  href="http://www.cse.buffalo.edu/~knepley/"
                                  target="_blank" moz-do-not-send="true"
                                  class="">https://www.cse.buffalo.edu/~knepley/</a><br
                                  class="">
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </blockquote>
              <br class="">
            </div>
          </div>
        </blockquote>
      </div>
      <br class="">
    </blockquote>
    <br>
  </body>
</html>