<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 10, 2016 at 2:53 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
> On Jun 10, 2016, at 2:04 PM, Xujun Zhao <<a href="mailto:xzhao99@gmail.com">xzhao99@gmail.com</a>> wrote:<br>
><br>
> Using gamg makes the convergence super slow (I don't know why)<br>
><br>
> --->test in StokesSolver::solve(): Start the KSP solve...<br>
> 0 KSP Residual norm 5.077511671295e+01<br>
> 1 KSP Residual norm 2.902020348172e+01<br>
> 2 KSP Residual norm 2.034415002085e+01<br>
> 3 KSP Residual norm 1.525150176412e+01<br>
> 4 KSP Residual norm 1.332688121023e+01<br>
> 5 KSP Residual norm 1.178298922349e+01<br>
> ..................................<br>
> 28 KSP Residual norm 1.300489085085e+00<br>
> 29 KSP Residual norm 1.200176938138e+00<br>
> 30 KSP Residual norm 1.087425787854e+00<br>
><br>
><br>
><br>
> In addition, I found the process memory is less than the PetscMalloc()ed memory. Shouldn't the process memory usage be PetscMalloc + others?<br>
<br>
</span> Yes, normally the process memory would be larger than what was PetscMalloced. The case where the process memory is smaller arises when some of the space was PetscMalloced but never used. For example, if I allocate an array with 10 million entries but only access the first 100 entries, no pages are allocated for most of the array, because memory pages are created on demand in Unix; hence the process memory would be less than the PetscMalloced memory.<br>
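<br>
To put rough numbers on the 10-million-entry example, here is a back-of-the-envelope sketch. The 8-byte entry size, the 4 KiB page size, and the helper name touched_page_bytes are assumptions made for illustration; the estimate also assumes the array starts on a page boundary and ignores allocator bookkeeping.<br>

```python
import math

PAGE_SIZE = 4096    # assumption: 4 KiB pages
ENTRY_BYTES = 8     # assumption: 8-byte (double precision) entries


def touched_page_bytes(n_entries_touched, entry_bytes=ENTRY_BYTES, page_size=PAGE_SIZE):
    """Bytes of whole pages needed to back a contiguous, page-aligned prefix of an array."""
    touched = n_entries_touched * entry_bytes
    return math.ceil(touched / page_size) * page_size


allocated = 10_000_000 * ENTRY_BYTES    # what the allocator is asked for
resident = touched_page_bytes(100)      # what the OS actually has to back

print(f"requested from allocator : {allocated} bytes")
print(f"backed by physical pages : {resident} bytes")
```

Under those assumptions the request is counted in full as PetscMalloc()ed space, while only the touched pages show up in the process memory.<br>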
<br>
So what conclusions did you reach from this?<br>
<br>
1) hypre is using a lot more memory than GAMG? (only look at process memory) Is this confirmed?<br></blockquote><div>In this example, yes.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
2) gamg is converging very poorly compared to hypre? This is not good. Can you run with -fieldsplit_0_ksp_view_mat and send the resulting binary file to <a href="mailto:petsc-maint@mcs.anl.gov">petsc-maint@mcs.anl.gov</a> so we can see if we can figure out why hypre is doing much better than gamg on the matrix?<br></blockquote><div>Again yes, but only for this specific example and pc setup, especially when a fine mesh is used. Will do.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
3) somewhere in gamg it is allocating much larger memory regions than are needed. This is actually not surprising, since with sparse matrix algorithms such as R A R^T one doesn't know in advance the memory needed and has to make upper bound estimates.<br>
<span class="HOEnZb"><font color="#888888"><br>
Barry<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
<br>
<br>
<br>
<br>
> So the former should be more than the latter.<br>
><br>
> Summary of Memory Usage in PETSc<br>
> Maximum (over computational time) process memory: total 1.4316e+10 max 3.7933e+09 min 3.4463e+09<br>
> Current process memory: total 1.4316e+10 max 3.7933e+09 min 3.4463e+09<br>
> Maximum (over computational time) space PetscMalloc()ed: total 6.5987e+10 max 1.7018e+10 min 1.5901e+10<br>
> Current space PetscMalloc()ed: total 3.0957e+05 max 7.7392e+04 min 7.7392e+04<br>
><br>
> On Fri, Jun 10, 2016 at 12:07 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
><br>
> > On Jun 10, 2016, at 11:43 AM, Xujun Zhao <<a href="mailto:xzhao99@gmail.com">xzhao99@gmail.com</a>> wrote:<br>
> ><br>
> > I found that this was caused by using -fieldsplit_0_pc_type hypre, which ate a lot of memory during the KSPSolve().<br>
> > Does hypre use BoomerAMG as the default pc type?<br>
><br>
> Yes<br>
><br>
> > I am curious why it uses such a huge amount of memory. PETSc's allocated memory is about 22G, but the total process memory is up to 100G. So it looks like the additional 70+G is associated with hypre, which is more than 3 times the size of the PETSc matrices and vectors!<br>
><br>
> You can try -fieldsplit_0_pc_type gamg and see how that goes memory-wise. The memory used by GAMG will be listed as PETSc allocated memory.<br>
><br>
> Barry<br>
><br>
> ><br>
> > -Xujun<br>
> ><br>
> > On Wed, Jun 8, 2016 at 4:29 PM, Xujun Zhao <<a href="mailto:xzhao99@gmail.com">xzhao99@gmail.com</a>> wrote:<br>
> > OK, this makes sense.<br>
> > Now my libMesh code uses SerialMesh, which keeps a copy on each processor, although the operation is parallelized. So it requires more memory if multiple CPUs are used. This may be a potential culprit. But I suppose the 60x60x60 mesh data (all second order) shouldn't be so large... there may be some other bugs.<br>
> ><br>
> > On Wed, Jun 8, 2016 at 4:18 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
> ><br>
> > > On Jun 8, 2016, at 4:08 PM, Xujun Zhao <<a href="mailto:xzhao99@gmail.com">xzhao99@gmail.com</a>> wrote:<br>
> > ><br>
> > > Barry,<br>
> > ><br>
> > > Thank you. I am testing on the blues.<br>
> > > btw, what do the different types of memory usage mean? For example, below is the summary of memory usage for 60x60x60 (with 2.9M dofs). The max process memory is more than 3 times the max space PetscMalloc()ed.<br>
> ><br>
> > PetscMalloced is basically the PETSc data structures; Process memory is size of the program plus PETSc malloced space plus space allocated by any other library, in this case libMesh. It looks like libMesh is requiring a lot of space?<br>
> ><br>
> > Barry<br>
> ><br>
> > ><br>
> > > Summary of Memory Usage in PETSc<br>
> > > Maximum (over computational time) process memory: total 1.0930e+11 max 5.6928e+10 min 5.2376e+10<br>
> > > Current process memory: total 3.1762e+09 max 2.8804e+09 min 2.9583e+08<br>
> > > Maximum (over computational time) space PetscMalloc()ed: total 3.0071e+10 max 1.5286e+10 min 1.4785e+10<br>
> > > Current space PetscMalloc()ed: total 1.5453e+05 max 7.7264e+04 min 7.7264e+04<br>
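<br>
Barry's breakdown (process memory = program + PetscMalloc()ed space + other libraries) can be applied to the 60x60x60 summary quoted just above. The following is a rough sketch only: the text is pasted in by hand, the names SUMMARY, LINE and parse_summary are made up for this illustration, and the two peaks need not occur at the same moment, so the difference is an estimate rather than a measurement.<br>

```python
import re

# Text copied from the 60x60x60 summary quoted above.
SUMMARY = """\
Maximum (over computational time) process memory: total 1.0930e+11 max 5.6928e+10 min 5.2376e+10
Current process memory: total 3.1762e+09 max 2.8804e+09 min 2.9583e+08
Maximum (over computational time) space PetscMalloc()ed: total 3.0071e+10 max 1.5286e+10 min 1.4785e+10
Current space PetscMalloc()ed: total 1.5453e+05 max 7.7264e+04 min 7.7264e+04
"""

# One row: "label: total X max Y min Z"
LINE = re.compile(r"^(.+?):\s+total\s+(\S+)\s+max\s+(\S+)\s+min\s+(\S+)\s*$")


def parse_summary(text):
    """Map each row label to a dict with float 'total', 'max' and 'min' values."""
    rows = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            label, total, high, low = match.groups()
            rows[label] = {"total": float(total), "max": float(high), "min": float(low)}
    return rows


rows = parse_summary(SUMMARY)
process = rows["Maximum (over computational time) process memory"]
petsc = rows["Maximum (over computational time) space PetscMalloc()ed"]

# Memory that is not accounted for by PetscMalloc, summed over all ranks.
outside_petsc = process["total"] - petsc["total"]
# How much larger the busiest rank's process memory is than its PetscMalloc()ed space.
ratio = process["max"] / petsc["max"]

print(f"peak process total - peak PetscMalloc total = {outside_petsc:.4e} bytes")
print(f"peak process max / peak PetscMalloc max     = {ratio:.2f}")
```

The difference is the part to attribute to the program itself and to other libraries such as libMesh or hypre.<br>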
> > ><br>
> > ><br>
> > ><br>
> > > On Wed, Jun 8, 2016 at 1:40 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
> > ><br>
> > > > On Jun 8, 2016, at 1:30 PM, Xujun Zhao <<a href="mailto:xzhao99@gmail.com">xzhao99@gmail.com</a>> wrote:<br>
> > > ><br>
> > > > A quick test of a smaller system on my laptop with 25x25x25 mesh gives the following info.<br>
> > > > The memory used keeps increasing from 1 to 3 CPUs, but slightly decreases with 4 CPUs.<br>
> > ><br>
> > > yes this does not look problematic<br>
> > ><br>
> > > > On the other hand, 60x60x60 mesh (2.9M dofs) is also not a big system...<br>
> > ><br>
> > > True.<br>
> > ><br>
> > > I think you need to run the 60 60 60 system also on 1 2 4 and 8 processes to see how the memory trends. I don't think we should eliminate memory as the culprit yet.<br>
> > ><br>
> > ><br>
> > > Barry<br>
> > ><br>
> > > ><br>
> > > ><br>
> > > > ------------------------------------------------- 1 CPU -------------------------------------------------<br>
> > > > Summary of Memory Usage in PETSc<br>
> > > > Maximum (over computational time) process memory: total 4.7054e+09 max 4.7054e+09 min 4.7054e+09<br>
> > > > Current process memory: total 4.7054e+09 max 4.7054e+09 min 4.7054e+09<br>
> > > > Maximum (over computational time) space PetscMalloc()ed: total 1.6151e+09 max 1.6151e+09 min 1.6151e+09<br>
> > > > Current space PetscMalloc()ed: total 7.7232e+04 max 7.7232e+04 min 7.7232e+04<br>
> > > ><br>
> > > > ------------------------------------------------- 2 CPU -------------------------------------------------<br>
> > > > Summary of Memory Usage in PETSc<br>
> > > > Maximum (over computational time) process memory: total 6.2389e+09 max 3.1275e+09 min 3.1113e+09<br>
> > > > Current process memory: total 6.2389e+09 max 3.1275e+09 min 3.1113e+09<br>
> > > > Maximum (over computational time) space PetscMalloc()ed: total 2.1589e+09 max 1.1193e+09 min 1.0397e+09<br>
> > > > Current space PetscMalloc()ed: total 1.5446e+05 max 7.7232e+04 min 7.7232e+04<br>
> > > ><br>
> > > > ------------------------------------------------- 3 CPU -------------------------------------------------<br>
> > > > Summary of Memory Usage in PETSc<br>
> > > > Maximum (over computational time) process memory: total 7.7116e+09 max 1.9572e+09 min 1.8715e+09<br>
> > > > Current process memory: total 7.7116e+09 max 1.9572e+09 min 1.8715e+09<br>
> > > > Maximum (over computational time) space PetscMalloc()ed: total 2.1754e+09 max 5.8450e+08 min 5.0516e+08<br>
> > > > Current space PetscMalloc()ed: total 3.0893e+05 max 7.7232e+04 min 7.7232e+04<br>
> > > ><br>
> > > > ------------------------------------------------- 4 CPU -------------------------------------------------<br>
> > > > Summary of Memory Usage in PETSc<br>
> > > > Maximum (over computational time) process memory: total 7.1188e+09 max 2.4651e+09 min 2.2909e+09<br>
> > > > Current process memory: total 7.1188e+09 max 2.4651e+09 min 2.2909e+09<br>
> > > > Maximum (over computational time) space PetscMalloc()ed: total 2.1750e+09 max 7.6982e+08 min 6.5289e+08<br>
> > > > Current space PetscMalloc()ed: total 2.3170e+05 max 7.7232e+04 min 7.7232e+04<br>
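<br>
A small consistency check on the four summaries above, offered as a sketch. In each block the "Current space PetscMalloc()ed" row has max equal to min, so dividing its total by that per-rank value should give the number of ranks that produced the block. The numbers are copied from the blocks above; RUNS and implied_ranks are names made up for this illustration.<br>

```python
# (label as posted, current PetscMalloc()ed total, per-rank value, peak process total)
RUNS = [
    ("1 CPU", 7.7232e04, 7.7232e04, 4.7054e09),
    ("2 CPU", 1.5446e05, 7.7232e04, 6.2389e09),
    ("3 CPU", 3.0893e05, 7.7232e04, 7.7116e09),
    ("4 CPU", 2.3170e05, 7.7232e04, 7.1188e09),
]


def implied_ranks(total, per_rank):
    """Rank count implied by a total when every rank reports the same value."""
    return round(total / per_rank)


# Order the runs by the rank count their own totals imply, not by the posted label.
by_ranks = sorted(
    (implied_ranks(total, per_rank), label, peak)
    for label, total, per_rank, peak in RUNS
)

base = by_ranks[0][2]  # peak process total of the 1-rank run
for ranks, label, peak in by_ranks:
    print(f"{ranks} rank(s), posted as '{label}': "
          f"peak process total {peak:.4e} = {peak / base:.2f}x the 1-rank run")
```

If that reasoning holds, the blocks headed "3 CPU" and "4 CPU" may have been swapped when pasting, in which case the peak process total grows steadily with the rank count instead of dipping at 4.<br>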
> > > ><br>
> > > ><br>
> > > > On Wed, Jun 8, 2016 at 11:51 AM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
> > > ><br>
> > > > Signal 9 SIGKILL on batch systems usually means the process was killed because it ran out of time or ran out of memory.<br>
> > > ><br>
> > > > Perhaps there is something in the code that is unscalable and requires more memory with more processes. You can run on 1, 2, and 3 processes and measure the memory usage to see if it goes up with the number of processes, using for example -memory_view<br>
> > > ><br>
> > > > Barry<br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > > > On Jun 8, 2016, at 11:41 AM, Xujun Zhao <<a href="mailto:xzhao99@gmail.com">xzhao99@gmail.com</a>> wrote:<br>
> > > > ><br>
> > > > > Hi all,<br>
> > > > ><br>
> > > > > I am running a FE Stokes solver with a Schur complement type PC on blues. The program runs well when the mesh is 40X40X40 (0.88M dofs), but when I use a 60X60X60 mesh, the program crashes and gives some errors, which look like a segmentation fault. The "strange" thing is that it runs well with 1 CPU or 2 CPUs, but fails on 4 or 8 CPUs. The log files are also attached. It seems like the global matrix and vector are assembled well, and the errors come out before calling KSPSolve().<br>
> > > > ><br>
> > > > > btw, I use the recent PETSc 3.7 dbg version. For libMesh I use both the dbg and opt versions, but neither gives useful information. Has anyone met such a situation before? Many thanks.<br>
> > > > > <ex01_validation_test.o1386940><ex01_validation_test.o1387007><br>
<br>
</div></div></blockquote></div><br></div></div>