<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


</head>


<body>


<div dir="ltr">


<div>Sanjay,</div>


I tried petsc with MPICH and OpenMPI on my Macbook. I inserted PetscMemoryGetCurrentUsage/PetscMallocGetCurrentUsage at the beginning and end of KSPSolve and then computed the delta and summed over processes. Then I tested with src/ts/examples/tutorials/advection-diffusion-reaction/ex5.c


<div>With OpenMPI, <br>


</div>


<div>


<div>mpirun -n 4 ./ex5 -da_grid_x 128 -da_grid_y 128 -ts_type beuler -ts_max_steps 500 > 128.log<br>


</div>


<div>grep -n -v "RSS Delta=         0, Malloc Delta=         0" 128.log<br>


</div>


<div>1:RSS Delta=     69632, Malloc Delta=         0<br>


2:RSS Delta=     69632, Malloc Delta=         0<br>


3:RSS Delta=     69632, Malloc Delta=         0<br>


4:RSS Delta=     69632, Malloc Delta=         0<br>


9:RSS Delta=9.25286e+06, Malloc Delta=         0<br>


22:RSS Delta=     49152, Malloc Delta=         0<br>


44:RSS Delta=     20480, Malloc Delta=         0<br>


53:RSS Delta=     49152, Malloc Delta=         0<br>


66:RSS Delta=      4096, Malloc Delta=         0<br>


97:RSS Delta=     16384, Malloc Delta=         0<br>


119:RSS Delta=     20480, Malloc Delta=         0<br>


141:RSS Delta=     53248, Malloc Delta=         0<br>


176:RSS Delta=     16384, Malloc Delta=         0<br>


308:RSS Delta=     16384, Malloc Delta=         0<br>


352:RSS Delta=     16384, Malloc Delta=         0<br>


550:RSS Delta=     16384, Malloc Delta=         0<br>


572:RSS Delta=     16384, Malloc Delta=         0<br>


669:RSS Delta=     40960, Malloc Delta=         0<br>


924:RSS Delta=     32768, Malloc Delta=         0<br>


1694:RSS Delta=     20480, Malloc Delta=         0<br>


2099:RSS Delta=     16384, Malloc Delta=         0<br>


2244:RSS Delta=     20480, Malloc Delta=         0<br>


3001:RSS Delta=     16384, Malloc Delta=         0<br>


5883:RSS Delta=     16384, Malloc Delta=         0<br>


</div>


<div><br>


</div>


<div>If I increased the grid</div>


<div>mpirun -n 4 ./ex5 -da_grid_x 512 -da_grid_y 512 -ts_type beuler -ts_max_steps 500 -malloc_test >512.log</div>


<div>


<div>grep -n -v "RSS Delta=         0, Malloc Delta=         0" 512.log<br>


<div>1:RSS Delta=1.05267e+06, Malloc Delta=         0<br>


2:RSS Delta=1.05267e+06, Malloc Delta=         0<br>


3:RSS Delta=1.05267e+06, Malloc Delta=         0<br>


4:RSS Delta=1.05267e+06, Malloc Delta=         0<br>


13:RSS Delta=1.24932e+08, Malloc Delta=         0</div>


<div><br>


So we did see RSS increase in 4k-page sizes after KSPSolve. As long as no memory leaks, why do you care about it? Is it because you run out of memory?</div>


</div>


</div>


</div>


</div>


<br>


<div class="gmail_quote">


<div dir="ltr" class="gmail_attr">On Thu, May 30, 2019 at 1:59 PM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>


</div>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


<br>


   Thanks for the update. So the current conclusions are that using the Waitall in your code<br>


<br>


1) solves the memory issue with OpenMPI in your code<br>


<br>


2) does not solve the memory issue with PETSc KSPSolve <br>


<br>


3) MPICH has memory issues both for your code and PETSc KSPSolve (despite) the wait all fix?<br>


<br>


If you literately just comment out the call to KSPSolve() with OpenMPI is there no growth in memory usage?<br>


<br>


<br>


Both 2 and 3 are concerning, indicate possible memory leak bugs in MPICH and not freeing all MPI resources in KSPSolve()<br>


<br>


Junchao, can you please investigate 2 and 3 with, for example, a TS example that uses the linear solver (like with -ts_type beuler)? Thanks<br>


<br>


<br>


  Barry<br>


<br>


<br>


<br>


> On May 30, 2019, at 1:47 PM, Sanjay Govindjee <<a href="mailto:s_g@berkeley.edu" target="_blank">s_g@berkeley.edu</a>> wrote:<br>


> <br>


> Lawrence,<br>


> Thanks for taking a look!  This is what I had been wondering about -- my knowledge of MPI is pretty minimal and<br>


> this origins of the routine were from a programmer we hired a decade+ back from NERSC.  I'll have to look into<br>


> VecScatter.  It will be great to dispense with our roll-your-own routines (we even have our own reduceALL scattered around the code).<br>


> <br>


> Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI but it still persists with MPICH.  Graphs attached.<br>


> I'm going to run with openmpi for now (but I guess I really still need to figure out what is wrong with MPICH and WaitALL;<br>


> I'll try Barry's suggestion of --download-mpich-configure-arguments="--enable-error-messages=all --enable-g" later today and report back).<br>


> <br>


> Regarding MPI_Barrier, it was put in due a problem that some processes were finishing up sending and receiving and exiting the subroutine<br>


> before the receiving processes had completed (which resulted in data loss as the buffers are freed after the call to the routine). MPI_Barrier was the solution proposed<br>


> to us.  I don't think I can dispense with it, but will think about some more.<br>


> <br>


> I'm not so sure about using MPI_IRecv as it will require a bit of rewriting since right now I process the received<br>


> data sequentially after each blocking MPI_Recv -- clearly slower but easier to code.<br>


> <br>


> Thanks again for the help.<br>


> <br>


> -sanjay<br>


> <br>


> On 5/30/19 4:48 AM, Lawrence Mitchell wrote:<br>


>> Hi Sanjay,<br>


>> <br>


>>> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br>


>>> <br>


>>> The problem seems to persist but with a different signature.  Graphs attached as before.<br>


>>> <br>


>>> Totals with MPICH (NB: single run)<br>


>>> <br>


>>> For the CG/Jacobi          data_exchange_total = 41,385,984; kspsolve_total = 38,289,408<br>


>>> For the GMRES/BJACOBI      data_exchange_total = 41,324,544; kspsolve_total = 41,324,544<br>


>>> <br>


>>> Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine?<br>


>>> I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before<br>


>>> all processes exited the routine, but perhaps I am wrong on that.<br>


>> <br>


>> Skimming the fortran code you sent you do:<br>


>> <br>


>> for i in ...:<br>


>>    call MPI_Isend(..., req, ierr)<br>


>> <br>


>> for i in ...:<br>


>>    call MPI_Recv(..., ierr)<br>


>> <br>


>> But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created.<br>


>> <br>


>> The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do:<br>


>> <br>


>> for i in nsend:<br>


>>    call MPI_Isend(..., req[i], ierr)<br>


>> for j in nrecv:<br>


>>    call MPI_Irecv(..., req[nsend+j], ierr)<br>


>> <br>


>> call MPI_Waitall(req, ..., ierr)<br>


>> <br>


>> I note also there's no need for the Barrier at the end of the routine, this kind of communication does neighbourwise synchronisation, no need to add (unnecessary) global synchronisation too.<br>


>> <br>


>> As an aside, is there a reason you don't use PETSc's VecScatter to manage this global to local exchange?<br>


>> <br>


>> Cheers,<br>


>> <br>


>> Lawrence<br>


> <br>


> <cg_mpichwall.png><cg_wall.png><gmres_mpichwall.png><gmres_wall.png><br>


<br>


</blockquote>


</div>


</body>


</html>