[petsc-users] Memory growth issue

Sanjay Govindjee s_g at berkeley.edu
Fri May 31 13:50:25 CDT 2019


Matt,
   Here is the process as it currently stands:

1) I have a PETSc Vec (sol), which comes from a KSPSolve

2) Each processor grabs its section of sol via VecGetOwnershipRange and 
VecGetArrayReadF90
and inserts parts of its section into a local array (locarr) using 
a complex but easily computable mapping.

3) The routine you are looking at then exchanges various parts of 
locarr among the processors.

4) Each processor then does computations using its updated locarr.

Typing it out this way, I guess the answer to your question is "yes."  I 
have a global Vec and I want its values
sent in a complex but computable way to local vectors on each process.
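
To make steps 1 and 2 concrete, here is a minimal Fortran sketch (map() and 
locarr are placeholders standing in for the real mapping and local array):

      PetscInt       istart, iend, i
      PetscErrorCode ierr
      PetscScalar, pointer :: solarr(:)

      call VecGetOwnershipRange(sol, istart, iend, ierr)
      call VecGetArrayReadF90(sol, solarr, ierr)
      do i = 1, iend - istart
!        map(i) stands in for the complex but computable mapping
         locarr(map(i)) = solarr(i)
      end do
      call VecRestoreArrayReadF90(sol, solarr, ierr)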

-sanjay

On 5/31/19 3:37 AM, Matthew Knepley wrote:
> On Thu, May 30, 2019 at 11:55 PM Sanjay Govindjee via petsc-users 
> <petsc-users at mcs.anl.gov> wrote:
>
>     Hi Juanchao,
>     Hi Juanchao,
>     Thanks for the hints below; they will take some time to absorb, as
>     the vectors that are being moved around
>     are actually partly PETSc vectors and partly local process vectors.
>
>
> Is this code just doing a global-to-local map? Meaning, does it just 
> map all the local unknowns to some global
> unknown on some process? We have an even simpler interface for that, 
> where we make the VecScatter
> automatically:
>
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate
>
> Then you can use it with Vecs, Mats, etc.
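> 
> A rough Fortran sketch of that route, assuming gidx(1:nloc) already holds 
> the global index for each local unknown (gidx, nloc, and ltog are 
> hypothetical names):
> 
>       call ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, nloc, gidx, PETSC_COPY_VALUES, ltog, ierr)
>       call VecSetLocalToGlobalMapping(sol, ltog, ierr)
> !     with the mapping attached, VecSetValuesLocal() and MatSetValuesLocal()
> !     accept local indices and assembly handles the communication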
>
>   Thanks,
>
>      Matt
>
>     Attached is the modified routine that now works (without leaking
>     memory) with OpenMPI.
>
>     -sanjay
>     On 5/30/19 8:41 PM, Zhang, Junchao wrote:
>>
>>     Hi, Sanjay,
>>       Could you send your modified data exchange code (psetb.F) with
>>     MPI_Waitall? See other inlined comments below. Thanks.
>>
>>     On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users
>>     <petsc-users at mcs.anl.gov> wrote:
>>
>>         Lawrence,
>>         Thanks for taking a look!  This is what I had been wondering
>>         about -- my
>>         knowledge of MPI is pretty minimal, and
>>         the routine originated with a programmer we hired a decade+
>>         back from NERSC.  I'll have to look into
>>         VecScatter.  It will be great to dispense with our roll-your-own
>>         routines (we even have our own reduceALL scattered around the
>>         code).
>>
>>     PETSc's VecScatter has a very simple interface and you should
>>     definitely go with it.  With VecScatter, you can think in terms of
>>     familiar vectors and indices instead of low-level MPI_Send/Recv.
>>     Besides that, PETSc has optimized VecScatter so that
>>     communication is efficient.
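>>
>>     For example, a minimal Fortran sketch, assuming needed_gidx(1:nneed)
>>     lists the global entries of sol that this process wants (the names
>>     isneeded, locvec, and scat are hypothetical):
>>
>>       call ISCreateGeneral(PETSC_COMM_SELF, nneed, needed_gidx, PETSC_COPY_VALUES, isneeded, ierr)
>>       call VecCreateSeq(PETSC_COMM_SELF, nneed, locvec, ierr)
>>       call VecScatterCreate(sol, isneeded, locvec, PETSC_NULL_IS, scat, ierr)
>>       call VecScatterBegin(scat, sol, locvec, INSERT_VALUES, SCATTER_FORWARD, ierr)
>>       call VecScatterEnd(scat, sol, locvec, INSERT_VALUES, SCATTER_FORWARD, ierr)
>>
>>     The same scatter object can be reused every solve and freed with
>>     VecScatterDestroy().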
>>
>>
>>         Interestingly, the MPI_Waitall has solved the problem when
>>         using OpenMPI,
>>         but it still persists with MPICH.  Graphs attached.
>>         I'm going to run with OpenMPI for now (but I guess I really
>>         still need
>>         to figure out what is wrong with MPICH and Waitall;
>>         I'll try Barry's suggestion of
>>         --download-mpich-configure-arguments="--enable-error-messages=all
>>
>>         --enable-g" later today and report back).
>>
>>         Regarding MPI_Barrier, it was put in due to a problem where some
>>         processes
>>         were finishing up sending and receiving and exiting the
>>         subroutine
>>         before the receiving processes had completed (which resulted
>>         in data
>>         loss, as the buffers are freed after the call to the routine).
>>         MPI_Barrier was the solution proposed
>>         to us.  I don't think I can dispense with it, but will think
>>         about it some
>>         more.
>>
>>     After MPI_Send(), or after MPI_Isend(..,req) and MPI_Wait(req),
>>     you can safely free the send buffer without worrying that the
>>     receive has not completed. MPI guarantees the receiver can get
>>     the data, for example, through internal buffering.
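>>
>>     A small Fortran sketch (sbuf, n, dest, and tag are placeholders):
>>
>>       call MPI_Isend(sbuf, n, MPI_DOUBLE_PRECISION, dest, tag, comm, req, ierr)
>>       call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)
>>       deallocate(sbuf)   ! safe: the send has completed locally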
>>
>>
>>         I'm not so sure about using MPI_Irecv, as it will require a
>>         bit of
>>         rewriting since right now I process the received
>>         data sequentially after each blocking MPI_Recv -- clearly
>>         slower but
>>         easier to code.
>>
>>         Thanks again for the help.
>>
>>         -sanjay
>>
>>         On 5/30/19 4:48 AM, Lawrence Mitchell wrote:
>>         > Hi Sanjay,
>>         >
>>         >> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users
>>         <petsc-users at mcs.anl.gov> wrote:
>>         >>
>>         >> The problem seems to persist but with a different
>>         signature.  Graphs attached as before.
>>         >>
>>         >> Totals with MPICH (NB: single run)
>>         >>
>>         >> For the CG/Jacobi data_exchange_total = 41,385,984;
>>         kspsolve_total = 38,289,408
>>         >> For the GMRES/BJACOBI data_exchange_total = 41,324,544;
>>         kspsolve_total = 41,324,544
>>         >>
>>         >> Just reading the MPI docs I am wondering if I need some
>>         sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the
>>         data exchange routine?
>>         >> I would have thought that with the blocking receives and
>>         the MPI_Barrier everything would have fully completed and
>>         been cleaned up before
>>         >> all processes exited the routine, but perhaps I am wrong
>>         on that.
>>         >
>>         > Skimming the Fortran code you sent, you do:
>>         >
>>         > for i in ...:
>>         >     call MPI_Isend(..., req, ierr)
>>         >
>>         > for i in ...:
>>         >     call MPI_Recv(..., ierr)
>>         >
>>         > But you never call MPI_Wait on the request you got back
>>         from the Isend. So the MPI library will never free the data
>>         structures it created.
>>         >
>>         > The usual pattern for these non-blocking communications is
>>         to allocate an array for the requests of length nsend+nrecv
>>         and then do:
>>         >
>>         > for i in nsend:
>>         >     call MPI_Isend(..., req[i], ierr)
>>         > for j in nrecv:
>>         >     call MPI_Irecv(..., req[nsend+j], ierr)
>>         >
>>         > call MPI_Waitall(nsend+nrecv, req, statuses, ierr)
>>         >
>>         > I note also that there's no need for the Barrier at the end of
>>         the routine; this kind of communication does neighbourwise
>>         synchronisation, so there is no need to add (unnecessary) global
>>         synchronisation too.
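>>         >
>>         > A slightly fuller Fortran sketch of that pattern (the buffers,
>>         counts, destination/source ranks, and tag are placeholders, and
>>         req should be allocated with length nsend+nrecv):
>>         >
>>         >     do i = 1, nsend
>>         >        call MPI_Isend(sbuf(1,i), scount(i), MPI_DOUBLE_PRECISION, sdest(i), tag, comm, req(i), ierr)
>>         >     end do
>>         >     do j = 1, nrecv
>>         >        call MPI_Irecv(rbuf(1,j), rcount(j), MPI_DOUBLE_PRECISION, rsrc(j), tag, comm, req(nsend+j), ierr)
>>         >     end do
>>         >     call MPI_Waitall(nsend+nrecv, req, MPI_STATUSES_IGNORE, ierr)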
>>         >
>>         > As an aside, is there a reason you don't use PETSc's
>>         VecScatter to manage this global-to-local exchange?
>>         >
>>         > Cheers,
>>         >
>>         > Lawrence
>>
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ 


