[petsc-users] KSPSolve errors on blues

Xujun Zhao xzhao99 at gmail.com
Wed Jun 8 16:08:47 CDT 2016


Barry,

Thank you. I am testing on blues.
btw, what do the different types of memory usage mean? For example,
below is the summary of memory usage for the 60x60x60 mesh (2.9M dofs).
The maximum process memory is about three times the maximum space
PetscMalloc()ed.

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory:        total 1.0930e+11 max 5.6928e+10 min 5.2376e+10
Current process memory:                                  total 3.1762e+09 max 2.8804e+09 min 2.9583e+08
Maximum (over computational time) space PetscMalloc()ed: total 3.0071e+10 max 1.5286e+10 min 1.4785e+10
Current space PetscMalloc()ed:                           total 1.5453e+05 max 7.7264e+04 min 7.7264e+04
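
For reference: the "process memory" lines report the operating system's
resident set size for the whole process, including the executable, libMesh
and MPI allocations, and anything malloc()ed outside of PETSc, while the
"space PetscMalloc()ed" lines count only memory obtained through
PetscMalloc(). That is why the former can be several times the latter.
Below is a minimal sketch of querying these four numbers directly,
assuming PETSc 3.7's memory-query routines (the printed values are rank
0's; -memory_view reduces them over all ranks as total/max/min):

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscLogDouble rss_now, rss_max, malloc_now, malloc_max;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  /* Must be called early (running with -memory_view also enables it) so
     the running maximum of the process memory is actually tracked. */
  ierr = PetscMemorySetGetMaximumUsage();CHKERRQ(ierr);

  /* ... set up, assemble, and solve here ... */

  ierr = PetscMemoryGetCurrentUsage(&rss_now);CHKERRQ(ierr);    /* resident set size now      */
  ierr = PetscMemoryGetMaximumUsage(&rss_max);CHKERRQ(ierr);    /* peak resident set size     */
  ierr = PetscMallocGetCurrentUsage(&malloc_now);CHKERRQ(ierr); /* bytes PetscMalloc()ed now  */
  ierr = PetscMallocGetMaximumUsage(&malloc_max);CHKERRQ(ierr); /* peak bytes PetscMalloc()ed */
  ierr = PetscPrintf(PETSC_COMM_WORLD,
                     "process: now %g peak %g; PetscMalloc()ed: now %g peak %g\n",
                     rss_now, rss_max, malloc_now, malloc_max);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}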



On Wed, Jun 8, 2016 at 1:40 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> > On Jun 8, 2016, at 1:30 PM, Xujun Zhao <xzhao99 at gmail.com> wrote:
> >
> > A quick test of a smaller system on my laptop with a 25x25x25 mesh
> > gives the following info. The memory used keeps increasing from 1 to
> > 3 CPUs, but slightly decreases with 4 CPUs.
>
>    Yes, this does not look problematic.
>
> > On the other hand, a 60x60x60 mesh (2.9M dofs) is also not a big system...
>
>   True.
>
>    I think you need to run the 60x60x60 system on 1, 2, 4, and 8
> processes as well to see how the memory trends. I don't think we should
> eliminate memory as the culprit yet.
>
>
>   Barry
>
> >
> >
> > ------------------------------------------------- 1 CPU -------------------------------------------------
> > Summary of Memory Usage in PETSc
> > Maximum (over computational time) process memory:        total 4.7054e+09 max 4.7054e+09 min 4.7054e+09
> > Current process memory:                                  total 4.7054e+09 max 4.7054e+09 min 4.7054e+09
> > Maximum (over computational time) space PetscMalloc()ed: total 1.6151e+09 max 1.6151e+09 min 1.6151e+09
> > Current space PetscMalloc()ed:                           total 7.7232e+04 max 7.7232e+04 min 7.7232e+04
> >
> > ------------------------------------------------- 2 CPU -------------------------------------------------
> > Summary of Memory Usage in PETSc
> > Maximum (over computational time) process memory:        total 6.2389e+09 max 3.1275e+09 min 3.1113e+09
> > Current process memory:                                  total 6.2389e+09 max 3.1275e+09 min 3.1113e+09
> > Maximum (over computational time) space PetscMalloc()ed: total 2.1589e+09 max 1.1193e+09 min 1.0397e+09
> > Current space PetscMalloc()ed:                           total 1.5446e+05 max 7.7232e+04 min 7.7232e+04
> >
> > ------------------------------------------------- 3 CPU -------------------------------------------------
> > Summary of Memory Usage in PETSc
> > Maximum (over computational time) process memory:        total 7.7116e+09 max 1.9572e+09 min 1.8715e+09
> > Current process memory:                                  total 7.7116e+09 max 1.9572e+09 min 1.8715e+09
> > Maximum (over computational time) space PetscMalloc()ed: total 2.1754e+09 max 5.8450e+08 min 5.0516e+08
> > Current space PetscMalloc()ed:                           total 3.0893e+05 max 7.7232e+04 min 7.7232e+04
> >
> > ------------------------------------------------- 4 CPU -------------------------------------------------
> > Summary of Memory Usage in PETSc
> > Maximum (over computational time) process memory:        total 7.1188e+09 max 2.4651e+09 min 2.2909e+09
> > Current process memory:                                  total 7.1188e+09 max 2.4651e+09 min 2.2909e+09
> > Maximum (over computational time) space PetscMalloc()ed: total 2.1750e+09 max 7.6982e+08 min 6.5289e+08
> > Current space PetscMalloc()ed:                           total 2.3170e+05 max 7.7232e+04 min 7.7232e+04
> >
> >
> > On Wed, Jun 8, 2016 at 11:51 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >    Signal 9 SIGKILL on batch systems usually means the process was
> > killed because it ran out of time or ran out of memory.
> >
> >     Perhaps there is something in the code that is unscalable and
> > requires more memory with more processes. You can run on 1, 2, and 3
> > processes and measure the memory usage to see if it goes up with the
> > number of processes, using for example -memory_view.
> >
> >   Barry
> >
> >
> >
> > > On Jun 8, 2016, at 11:41 AM, Xujun Zhao <xzhao99 at gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > I am running an FE Stokes solver with a Schur complement type PC on
> > > blues. The program runs well when the mesh is 40x40x40 (0.88M dofs),
> > > but with a 60x60x60 mesh the program crashes with errors that look
> > > like a segmentation fault. The "strange" thing is that it runs well
> > > with 1 or 2 CPUs, but fails on 4 or 8 CPUs. The log files are
> > > attached. It seems the global matrix and vector are assembled fine,
> > > and the errors come out before KSPSolve() is called.
> > >
> > > btw, I use the recent PETSc 3.7 dbg version. For libMesh I use both
> > > the dbg and opt versions, but neither gives useful information. Has
> > > anyone met such a situation before? Many thanks.
> >
> >
>
>
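
To narrow down where the memory grows before KSPSolve(), the same summary
that -memory_view prints at PetscFinalize() can also be emitted at chosen
checkpoints. A minimal sketch, assuming PETSc 3.7's PetscMemoryView() and
a hypothetical LogMemory() helper, to be called for example right after
assembly and again just before KSPSolve():

#include <petscksp.h>

/* Hypothetical helper: print PETSc's memory summary at a labeled
   checkpoint so the step at which per-process memory jumps can be
   identified as the process count grows. */
static PetscErrorCode LogMemory(const char *label)
{
  PetscErrorCode ierr;
  char           msg[256];

  PetscFunctionBeginUser;
  ierr = PetscSNPrintf(msg, sizeof(msg), "Memory usage %s:\n", label);CHKERRQ(ierr);
  ierr = PetscMemoryView(PETSC_VIEWER_STDOUT_WORLD, msg);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Calling, say, LogMemory("after assembly") and LogMemory("before KSPSolve"),
then running the same case on 1, 2, 4, and 8 processes as suggested above,
should show at which step and at which process count the memory jumps.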

