Thanks Stefano.

Reading the manual pages a bit more carefully, I think I can see what I
should be doing, which is roughly:

1. Set up target Seq vectors on PETSC_COMM_SELF.
2. Use ISCreateGeneral to create ISs for the target Vecs and the source
   Vec, which will be MPI on PETSC_COMM_WORLD.
3. Create the scatter context with VecScatterCreate.
4. Call VecScatterBegin/End on each process (instead of using my prior
   routine); a rough sketch of these steps is below.
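
Something like the following, I think (sketched in C -- the Fortran calls
mirror these; the index array, sizes, and function name are just
illustrative):

    #include <petscvec.h>

    /* Pull selected entries of a parallel Vec into a Seq Vec on this rank.
       nloc and global_idx[] (the entries this rank wants) are illustrative. */
    PetscErrorCode GatherToLocal(Vec sol, PetscInt nloc,
                                 const PetscInt global_idx[], Vec *vloc)
    {
      IS             is_from, is_to;
      VecScatter     ctx;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      /* 1. Target Seq vector on PETSC_COMM_SELF */
      ierr = VecCreateSeq(PETSC_COMM_SELF, nloc, vloc);CHKERRQ(ierr);

      /* 2. ISs: which global entries of sol to take, and where they go locally */
      ierr = ISCreateGeneral(PETSC_COMM_SELF, nloc, global_idx,
                             PETSC_COPY_VALUES, &is_from);CHKERRQ(ierr);
      ierr = ISCreateStride(PETSC_COMM_SELF, nloc, 0, 1, &is_to);CHKERRQ(ierr);

      /* 3. Scatter context from the MPI source Vec to the Seq target */
      ierr = VecScatterCreate(sol, is_from, *vloc, is_to, &ctx);CHKERRQ(ierr);

      /* 4. Collective: every rank calls Begin/End */
      ierr = VecScatterBegin(ctx, sol, *vloc, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = VecScatterEnd(ctx, sol, *vloc, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

      ierr = VecScatterDestroy(&ctx);CHKERRQ(ierr);
      ierr = ISDestroy(&is_from);CHKERRQ(ierr);
      ierr = ISDestroy(&is_to);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }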

Lingering questions:

a. Is there any performance advantage/disadvantage to creating a single
   parallel target Vec instead of multiple target Seq Vecs (in terms of
   the scatter operation)?

b. The data that ends up in the target on each processor needs to be in
   an application array.  Is there a clever way to 'move' the data from
   the scatter target to the array (short of just running a loop over it
   and copying)?  A sketch of what I mean is below.
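
For concreteness, the copy I have in mind is the loop in the first routine
below; the second routine is a guess on my part (not something from this
thread) that creating the target with VecCreateSeqWithArray would let the
scatter land directly in the application array:

    #include <petscvec.h>

    /* Straightforward copy from the scatter target into the application array. */
    PetscErrorCode CopyTargetToApp(Vec vloc, PetscScalar app_array[])
    {
      const PetscScalar *v;
      PetscInt           i, n;
      PetscErrorCode     ierr;

      PetscFunctionBeginUser;
      ierr = VecGetLocalSize(vloc, &n);CHKERRQ(ierr);
      ierr = VecGetArrayRead(vloc, &v);CHKERRQ(ierr);
      for (i = 0; i < n; i++) app_array[i] = v[i];   /* the loop in question */
      ierr = VecRestoreArrayRead(vloc, &v);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    /* Possible way to skip the copy (untested guess): make the Seq target wrap
       app_array itself, so VecScatterBegin/End writes straight into it. */
    PetscErrorCode MakeTargetOnAppArray(PetscInt n, PetscScalar app_array[], Vec *vloc)
    {
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = VecCreateSeqWithArray(PETSC_COMM_SELF, 1, n, app_array, vloc);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }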

-sanjay

<div class="moz-cite-prefix">On 5/31/19 12:02 PM, Stefano Zampini
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:ACE97360-FB5C-454E-B665-87265BA738E0@gmail.com">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<br class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On May 31, 2019, at 9:50 PM, Sanjay Govindjee
via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov"
class="" moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252" class="">
<div text="#000000" bgcolor="#FFFFFF" class=""> Matt,<br
class="">
Here is the process as it currently stands:<br class="">
<br class="">
1) I have a PETSc Vec (sol), which come from a KSPSolve<br
class="">
<br class="">
2) Each processor grabs its section of sol via
VecGetOwnershipRange and VecGetArrayReadF90<br class="">
and inserts parts of its section of sol in a local array
(locarr) using a complex but easily computable mapping.<br
class="">
<br class="">
3) The routine you are looking at then exchanges various
parts of the locarr between the processors.<br class="">
<br class="">
</div>
You need a VecScatter object:
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate

<br class="">
<blockquote type="cite" class="">
<div class="">
<div text="#000000" bgcolor="#FFFFFF" class=""> 4) Each
processor then does computations using its updated locarr.<br
class="">
<br class="">
Typing it out this way, I guess the answer to your
question is "yes." I have a global Vec and I want its
values<br class="">
sent in a complex but computable way to local vectors on
each process.<br class="">
<br class="">
-sanjay<br class="">
<div class="moz-cite-prefix">On 5/31/19 3:37 AM, Matthew
Knepley wrote:<br class="">
</div>
<blockquote type="cite"
cite="mid:CAMYG4Gk_eccMW8e2k0DMZTxQcFcU+AqtUmM0UAgnaF=qFGCrdg@mail.gmail.com"
class="">
<meta http-equiv="content-type" content="text/html;
charset=windows-1252" class="">
<div dir="ltr" class="">
<div dir="ltr" class="">On Thu, May 30, 2019 at 11:55
PM Sanjay Govindjee via petsc-users <<a
href="mailto:petsc-users@mcs.anl.gov"
moz-do-not-send="true" class="">petsc-users@mcs.anl.gov</a>>
wrote:<br class="">
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF" class=""> Hi Juanchao,<br
class="">
Thanks for the hints below, they will take some
time to absorb as the vectors that are being
moved around<br class="">
are actually partly petsc vectors and partly
local process vectors.<br class="">
</div>
<div class="">Is this code just doing a
global-to-local map? Meaning, does it just map all
the local unknowns to some global</div>
<div class="">unknown on some process? We have an
even simpler interface for that, where we make the
VecScatter</div>
<div class="">automatically,</div>
<div class=""><br class="">
</div>
<div class=""> <a
href="https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate"
moz-do-not-send="true" class="">https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate</a></div>
<div class=""><br class="">
</div>
<div class="">Then you can use it with Vecs, Mats,
etc.</div>
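
A rough usage sketch (the local-to-global index array here is just
illustrative, not taken from your code):

    #include <petscvec.h>

    /* ltog_idx[i] = global number of local unknown i on this rank */
    PetscErrorCode SetMapping(Vec v, PetscInt nlocal, const PetscInt ltog_idx[])
    {
      ISLocalToGlobalMapping map;
      PetscErrorCode         ierr;

      PetscFunctionBeginUser;
      ierr = ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, nlocal, ltog_idx,
                                          PETSC_COPY_VALUES, &map);CHKERRQ(ierr);
      ierr = VecSetLocalToGlobalMapping(v, map);CHKERRQ(ierr);
      /* entries can now be set with local indices, e.g. via VecSetValuesLocal() */
      ierr = ISLocalToGlobalMappingDestroy(&map);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }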
<div class=""><br class="">
</div>
<div class=""> Thanks,</div>
<div class=""><br class="">
</div>
<div class=""> Matt</div>
<div class=""> </div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF" class=""> Attached is the
modified routine that now works (on leaking
memory) with openmpi.<br class="">
<br class="">
-sanjay<br class="">
<div
class="gmail-m_-6089453002349408992moz-cite-prefix">On
5/30/19 8:41 PM, Zhang, Junchao wrote:<br
class="">
</div>
<blockquote type="cite" class="">
<div dir="ltr" class="">
<div class=""><br class="">
Hi, Sanjay,</div>
<div class=""> Could you send your modified
data exchange code (psetb.F) with
MPI_Waitall? See other inlined comments
below. Thanks.</div>
<br class="">
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu,
May 30, 2019 at 1:49 PM Sanjay Govindjee
via petsc-users <<a
href="mailto:petsc-users@mcs.anl.gov"
target="_blank" moz-do-not-send="true"
class="">petsc-users@mcs.anl.gov</a>>
wrote:<br class="">
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Lawrence,
Thanks for taking a look!  This is what I had been wondering about -- my
knowledge of MPI is pretty minimal, and the origins of the routine go back
to a programmer we hired a decade+ back from NERSC.  I'll have to look
into VecScatter.  It will be great to dispense with our roll-your-own
routines (we even have our own reduceALL scattered around the code).

PETSc VecScatter has a very simple interface, and you should definitely go
with it.  With VecScatter, you can think in terms of familiar vectors and
indices instead of low-level MPI_Send/Recv.  Besides that, PETSc has
optimized VecScatter so that communication is efficient.

<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"> <br
class="">
Interestingly, the MPI_WaitALL has
solved the problem when using OpenMPI <br
class="">
but it still persists with MPICH.
Graphs attached.<br class="">
I'm going to run with openmpi for now
(but I guess I really still need <br
class="">
to figure out what is wrong with MPICH
and WaitALL;<br class="">
I'll try Barry's suggestion of <br
class="">
--download-mpich-configure-arguments="--enable-error-messages=all <br
class="">
--enable-g" later today and report
back).<br class="">
<br class="">
Regarding MPI_Barrier, it was put in due
a problem that some processes <br
class="">
were finishing up sending and receiving
and exiting the subroutine<br class="">
before the receiving processes had
completed (which resulted in data <br
class="">
loss as the buffers are freed after the
call to the routine). <br class="">
MPI_Barrier was the solution proposed<br
class="">
to us. I don't think I can dispense
with it, but will think about some <br
class="">
more.</blockquote>
<div class="">After MPI_Send(), or after
MPI_Isend(..,req) and MPI_Wait(req), you
can safely free the send buffer without
worry that the receive has not
completed. MPI guarantees the receiver
can get the data, for example, through
internal buffering.</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"> <br
class="">
I'm not so sure about using MPI_IRecv as
it will require a bit of <br class="">
rewriting since right now I process the
received<br class="">
data sequentially after each blocking
MPI_Recv -- clearly slower but <br
class="">
easier to code.<br class="">
<br class="">
Thanks again for the help.<br class="">
<br class="">
-sanjay<br class="">
<br class="">
On 5/30/19 4:48 AM, Lawrence Mitchell wrote:

> Hi Sanjay,
>
>> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users
>> <petsc-users@mcs.anl.gov> wrote:
>>
>> The problem seems to persist but with a different signature.  Graphs
>> attached as before.
>>
>> Totals with MPICH (NB: single run)
>>
>> For the CG/Jacobi     data_exchange_total = 41,385,984; kspsolve_total = 38,289,408
>> For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544
>>
>> Just reading the MPI docs I am wondering if I need some sort of
>> MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine?
>> I would have thought that with the blocking receives and the MPI_Barrier
>> that everything will have fully completed and cleaned up before
>> all processes exited the routine, but perhaps I am wrong on that.
>
> Skimming the fortran code you sent you do:
>
> for i in ...:
>    call MPI_Isend(..., req, ierr)
>
> for i in ...:
>    call MPI_Recv(..., ierr)
>
> But you never call MPI_Wait on the request you got back from the Isend.
> So the MPI library will never free the data structures it created.
>
> The usual pattern for these non-blocking communications is to allocate an
> array for the requests of length nsend+nrecv and then do:
>
> for i in nsend:
>    call MPI_Isend(..., req[i], ierr)
> for j in nrecv:
>    call MPI_Irecv(..., req[nsend+j], ierr)
>
> call MPI_Waitall(req, ..., ierr)
>
> I note also that there's no need for the Barrier at the end of the
> routine; this kind of communication does neighbourwise synchronisation,
> so there is no need to add (unnecessary) global synchronisation too.
>
> As an aside, is there a reason you don't use PETSc's VecScatter to manage
> this global to local exchange?
>
> Cheers,
>
> Lawrence

<br class="" clear="all">
<div class=""><br class="">
</div>
-- <br class="">
<div dir="ltr" class="gmail_signature">
<div dir="ltr" class="">
<div class="">
<div dir="ltr" class="">
<div class="">
<div dir="ltr" class="">
<div class="">What most experimenters take
for granted before they begin their
experiments is infinitely more
interesting than any results to which
their experiments lead.<br class="">
-- Norbert Wiener</div>
<div class=""><br class="">
</div>
<div class=""><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true"
class="">https://www.cse.buffalo.edu/~knepley/</a><br
class="">
</div>
</div>
</div>
</div>
</div>
</div>
</div>