[petsc-users] KSPSolve errors on blues
Mark Adams
mfadams at lbl.gov
Wed Jun 15 18:13:30 CDT 2016
And to diagnose GAMG you want to run with -info and grep on GAMG, and send
that to us.
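For example, something like the following (the executable name, process count,
and other options are placeholders, not taken from this thread):

  mpiexec -n 4 ./your_app <your usual options> -info 2>&1 | grep GAMG > gamg.log

and send us gamg.log.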
On Fri, Jun 10, 2016 at 4:45 PM, Xujun Zhao <xzhao99 at gmail.com> wrote:
>
>
> On Fri, Jun 10, 2016 at 2:53 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>> > On Jun 10, 2016, at 2:04 PM, Xujun Zhao <xzhao99 at gmail.com> wrote:
>> >
>> > Using gamg makes the convergence super slow (I don't know why)
>> >
>> > --->test in StokesSolver::solve(): Start the KSP solve...
>> > 0 KSP Residual norm 5.077511671295e+01
>> > 1 KSP Residual norm 2.902020348172e+01
>> > 2 KSP Residual norm 2.034415002085e+01
>> > 3 KSP Residual norm 1.525150176412e+01
>> > 4 KSP Residual norm 1.332688121023e+01
>> > 5 KSP Residual norm 1.178298922349e+01
>> > ..................................
>> > 28 KSP Residual norm 1.300489085085e+00
>> > 29 KSP Residual norm 1.200176938138e+00
>> > 30 KSP Residual norm 1.087425787854e+00
>> >
>> >
>> >
>> > In addition, I found that the process memory is less than the space
>> > PetscMalloc()ed. Shouldn't the process memory usage be the PetscMalloc()ed
>> > space plus everything else?
>>
>>    Yes, normally the process memory would be more than what was
>> PetscMalloc()ed. The exception, where the process memory is smaller than the
>> PetscMalloc()ed space, arises when some of the space that was PetscMalloc()ed
>> is never actually used. For example, if I allocate an array with 10 million
>> entries but only access the first 100 entries, then, since memory pages are
>> created on demand in Unix, no pages are allocated for most of the array, and
>> the process memory ends up less than the PetscMalloc()ed memory.
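>> As a minimal sketch of this effect (not code from this thread), the program
>> below malloc()s 10 million doubles but touches only the first 100; its
>> resident (process) memory stays at a few pages, far below the ~80 MB that
>> was allocated:
>>
>>     #include <stdlib.h>
>>
>>     int main(void)
>>     {
>>       size_t  n = 10*1000*1000;
>>       double *a = (double *)malloc(n*sizeof(double)); /* address space only */
>>       if (!a) return 1;
>>       for (size_t i = 0; i < 100; i++) a[i] = (double)i; /* maps a page or two */
>>       /* resident memory here is tiny; most of the array was never touched */
>>       free(a);
>>       return 0;
>>     }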
>>
>>    So what conclusions did you reach from this?
>>
>> 1) hypre is using a lot more memory than GAMG (looking only at the process
>> memory)? Is this confirmed?
>>
> In this example, yes.
>
>
>>
>> 2) gamg is converging very poorly compared to hypre? This is not good;
>> can you run with -fieldsplit_0_ksp_view_mat and send the resulting binary
>> file to petsc-maint at mcs.anl.gov so we can see if we can figure out why
>> hypre is doing so much better than gamg on this matrix.
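>> For example, the run might look something like this (the executable name,
>> process count, and output file name are placeholders, not taken from this
>> thread):
>>
>>     mpiexec -n 4 ./your_app <your usual options> \
>>         -fieldsplit_0_ksp_view_mat binary:fs0_mat.bin
>>
>> and then mail the resulting fs0_mat.bin file.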
>>
> Again yes, but only for this specific example and PC setup, especially when
> a fine mesh is used. Will do.
>
>>
>> 3) somewhere in gamg it is allocating much larger memory regions than are
>> needed. This is actually not surprising, since with sparse matrix algorithms
>> such as R A R^T one doesn't know the memory needed in advance and has to
>> make upper bound estimates.
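>> As a hedged illustration (not code from this thread; A and P are assumed to
>> be already assembled Mat objects), the "fill" argument of MatPtAP() is
>> exactly such an estimate:
>>
>>     Mat            Ac;    /* will hold the triple product P^T A P */
>>     PetscErrorCode ierr;
>>     /* 2.0 is a guessed upper bound on how much denser the product is than
>>        its inputs; PETSC_DEFAULT lets PETSc pick its own estimate */
>>     ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, &Ac);CHKERRQ(ierr);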
>>
>> Barry
>>
>>
>>
>>
>>
>>
>> > So the former should be more than the latter.
>> >
>> > Summary of Memory Usage in PETSc
>> > Maximum (over computational time) process memory:        total 1.4316e+10  max 3.7933e+09  min 3.4463e+09
>> > Current process memory:                                  total 1.4316e+10  max 3.7933e+09  min 3.4463e+09
>> > Maximum (over computational time) space PetscMalloc()ed: total 6.5987e+10  max 1.7018e+10  min 1.5901e+10
>> > Current space PetscMalloc()ed:                           total 3.0957e+05  max 7.7392e+04  min 7.7392e+04
>> >
>> > On Fri, Jun 10, 2016 at 12:07 PM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> >
>> > > On Jun 10, 2016, at 11:43 AM, Xujun Zhao <xzhao99 at gmail.com> wrote:
>> > >
>> > > I found that this was caused by using -fieldsplit_0_pc_type hypre,
>> which ate a lot of memory during the KSPSolve().
>> > > Does hypre use BoomerAMG as the default pc type?
>> >
>> > Yes
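>> > (Assuming the usual option-prefix composition, selecting it explicitly
>> > would look like -fieldsplit_0_pc_hypre_type boomeramg, i.e. -pc_hypre_type
>> > with the fieldsplit prefix; this spelling is an assumption, not copied from
>> > the thread.)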
>> >
>> > > I am curious why it uses so much memory. The PETSc-allocated memory is
>> > > about 22G, but the total process memory is up to 100G. So it looks like
>> > > the additional 70+G is associated with hypre, which is more than 3 times
>> > > the size of the PETSc matrices and vectors!
>> >
>> > You can try -fieldsplit_0_pc_type gamg and see how that goes
>> memory-wise. The memory used by GAMG will be listed as PETSc allocated
>> memory.
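>> > For example (the executable name and process count are placeholders, not
>> > taken from this thread):
>> >
>> >     mpiexec -n 4 ./your_app <your usual options> \
>> >         -fieldsplit_0_pc_type gamg -memory_view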
>> >
>> > Barry
>> >
>> > >
>> > > -Xujun
>> > >
>> > > On Wed, Jun 8, 2016 at 4:29 PM, Xujun Zhao <xzhao99 at gmail.com> wrote:
>> > > OK, this makes sense.
>> > > Now my libMesh code uses SerialMesh, which keeps a copy of the mesh on
>> > > each processor, although the operations are parallelized. So it requires
>> > > more memory when multiple CPUs are used. This may be a potential culprit.
>> > > But I suppose the 60x60x60 mesh data (all second order) shouldn't be so
>> > > large... there may be some other bugs.
>> > >
>> > > On Wed, Jun 8, 2016 at 4:18 PM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> > >
>> > > > On Jun 8, 2016, at 4:08 PM, Xujun Zhao <xzhao99 at gmail.com> wrote:
>> > > >
>> > > > Barry,
>> > > >
>> > > > Thank you. I am testing on the blues.
>> > > > btw, what do the different types of memory usage mean? For example,
>> > > > below is the summary of memory usage for 60x60x60 (with 2.9M dofs).
>> > > > The max process memory is 3 times the max space PetscMalloc()ed.
>> > >
>> > >    PetscMalloc()ed is basically the PETSc data structures; process
>> > > memory is the size of the program plus the PetscMalloc()ed space plus the
>> > > space allocated by any other library, in this case libMesh. It looks like
>> > > libMesh is requiring a lot of space?
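>> > > As an illustration (a sketch, not code from this thread), both numbers
>> > > can be queried and printed at any point in the code:
>> > >
>> > >     PetscLogDouble rss, mal;
>> > >     ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr); /* process memory */
>> > >     ierr = PetscMallocGetCurrentUsage(&mal);CHKERRQ(ierr); /* PetscMalloc()ed */
>> > >     ierr = PetscPrintf(PETSC_COMM_WORLD,
>> > >                        "process %g bytes, PetscMalloc()ed %g bytes\n",
>> > >                        rss, mal);CHKERRQ(ierr);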
>> > >
>> > > Barry
>> > >
>> > > >
>> > > > Summary of Memory Usage in PETSc
>> > > > Maximum (over computational time) process memory:        total 1.0930e+11  max 5.6928e+10  min 5.2376e+10
>> > > > Current process memory:                                  total 3.1762e+09  max 2.8804e+09  min 2.9583e+08
>> > > > Maximum (over computational time) space PetscMalloc()ed: total 3.0071e+10  max 1.5286e+10  min 1.4785e+10
>> > > > Current space PetscMalloc()ed:                           total 1.5453e+05  max 7.7264e+04  min 7.7264e+04
>> > > >
>> > > >
>> > > >
>> > > > On Wed, Jun 8, 2016 at 1:40 PM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> > > >
>> > > > > On Jun 8, 2016, at 1:30 PM, Xujun Zhao <xzhao99 at gmail.com> wrote:
>> > > > >
>> > > > > A quick test of a smaller system on my laptop with 25x25x25 mesh
>> gives the following info.
>> > > > > The memory used keeps increasing from 1 to 3 CPUs, but slightly
>> decreases with 4 CPUs.
>> > > >
>> > > >    Yes, this does not look problematic.
>> > > >
>> > > > > On the other hand, 60x60x60 mesh (2.9M dofs) is also not a big
>> system...
>> > > >
>> > > > True.
>> > > >
>> > > > I think you need to run the 60x60x60 system also on 1, 2, 4, and 8
>> > > > processes to see how the memory trends. I don't think we should
>> > > > eliminate memory as the culprit yet.
>> > > >
>> > > >
>> > > > Barry
>> > > >
>> > > > >
>> > > > >
>> > > > > ------------------------------------------------- 1 CPU -------------------------------------------------
>> > > > > Summary of Memory Usage in PETSc
>> > > > > Maximum (over computational time) process memory:        total 4.7054e+09  max 4.7054e+09  min 4.7054e+09
>> > > > > Current process memory:                                  total 4.7054e+09  max 4.7054e+09  min 4.7054e+09
>> > > > > Maximum (over computational time) space PetscMalloc()ed: total 1.6151e+09  max 1.6151e+09  min 1.6151e+09
>> > > > > Current space PetscMalloc()ed:                           total 7.7232e+04  max 7.7232e+04  min 7.7232e+04
>> > > > >
>> > > > > ------------------------------------------------- 2 CPU -------------------------------------------------
>> > > > > Summary of Memory Usage in PETSc
>> > > > > Maximum (over computational time) process memory:        total 6.2389e+09  max 3.1275e+09  min 3.1113e+09
>> > > > > Current process memory:                                  total 6.2389e+09  max 3.1275e+09  min 3.1113e+09
>> > > > > Maximum (over computational time) space PetscMalloc()ed: total 2.1589e+09  max 1.1193e+09  min 1.0397e+09
>> > > > > Current space PetscMalloc()ed:                           total 1.5446e+05  max 7.7232e+04  min 7.7232e+04
>> > > > >
>> > > > > ------------------------------------------------- 3 CPU -------------------------------------------------
>> > > > > Summary of Memory Usage in PETSc
>> > > > > Maximum (over computational time) process memory:        total 7.7116e+09  max 1.9572e+09  min 1.8715e+09
>> > > > > Current process memory:                                  total 7.7116e+09  max 1.9572e+09  min 1.8715e+09
>> > > > > Maximum (over computational time) space PetscMalloc()ed: total 2.1754e+09  max 5.8450e+08  min 5.0516e+08
>> > > > > Current space PetscMalloc()ed:                           total 3.0893e+05  max 7.7232e+04  min 7.7232e+04
>> > > > >
>> > > > > ------------------------------------------------- 4 CPU -------------------------------------------------
>> > > > > Summary of Memory Usage in PETSc
>> > > > > Maximum (over computational time) process memory:        total 7.1188e+09  max 2.4651e+09  min 2.2909e+09
>> > > > > Current process memory:                                  total 7.1188e+09  max 2.4651e+09  min 2.2909e+09
>> > > > > Maximum (over computational time) space PetscMalloc()ed: total 2.1750e+09  max 7.6982e+08  min 6.5289e+08
>> > > > > Current space PetscMalloc()ed:                           total 2.3170e+05  max 7.7232e+04  min 7.7232e+04
>> > > > >
>> > > > >
>> > > > > On Wed, Jun 8, 2016 at 11:51 AM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> > > > >
>> > > > > Signal 9 SIGKILL on batch systems usually means the process
>> was killed because it ran out of time or ran out of memory.
>> > > > >
>> > > > >    Perhaps there is something in the code that is unscalable and
>> > > > > requires more memory with more processes. You can run on 1, 2, and 3
>> > > > > processes and measure the memory usage, using for example
>> > > > > -memory_view, to see if it goes up with the number of processes.
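>> > > > > Something along these lines would show the trend (the executable name
>> > > > > and options are placeholders, not taken from this thread):
>> > > > >
>> > > > >     for n in 1 2 3; do
>> > > > >       mpiexec -n $n ./your_app <your usual options> -memory_view
>> > > > >     done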
>> > > > >
>> > > > > Barry
>> > > > >
>> > > > >
>> > > > >
>> > > > > > On Jun 8, 2016, at 11:41 AM, Xujun Zhao <xzhao99 at gmail.com>
>> wrote:
>> > > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > I am running an FE Stokes solver with a Schur complement type PC
>> > > > > > on blues. The program runs well when the mesh is 40x40x40 (0.88M
>> > > > > > dofs), but when I use a 60x60x60 mesh, the program crashes and gives
>> > > > > > some errors that look like a segmentation fault. The "strange" thing
>> > > > > > is that it runs well with 1 CPU and 2 CPUs, but fails on 4 or 8
>> > > > > > CPUs. The log files are also attached. It seems that the global
>> > > > > > matrix and vector are assembled fine, and the errors come out before
>> > > > > > the call to KSPSolve().
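>> > > > > > (For context, this kind of PC is the usual PCFIELDSPLIT Schur
>> > > > > > setup; a typical option spelling, given only as the general pattern
>> > > > > > and not the exact options used here, would be
>> > > > > >
>> > > > > >     -pc_type fieldsplit -pc_fieldsplit_type schur \
>> > > > > >     -fieldsplit_0_pc_type hypre
>> > > > > > )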
>> > > > > >
>> > > > > > btw, I use the recent PETSc 3.7 dbg version. For libMesh I use
>> > > > > > both the dbg and opt versions, but neither gives useful
>> > > > > > information. Has anyone met such a situation before? Many thanks.
>> > > > > > <ex01_validation_test.o1386940><ex01_validation_test.o1387007>
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > >
>> > >
>> > >
>> >
>> >
>>
>>
>