memory problem in parallel on a Linux cluster

Matthew Knepley knepley at gmail.com
Wed Sep 5 06:11:17 CDT 2007


  Here is the trace:

[0]PETSC ERROR: Memory allocated 865987336 Memory used by process 1591005184
[0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
[0]PETSC ERROR: Memory requested 1310720296!
[0]PETSC ERROR: PetscTrMallocDefault() line 188 in src/sys/src/memory/mtr.c
[0]PETSC ERROR: MatStashExpand_Private() line 240 in src/mat/utils/matstash.c
[0]PETSC ERROR: MatStashValuesRow_Private() line 276 in src/mat/utils/matstash.c
[0]PETSC ERROR: MatSetValues_MPIAIJ() line 199 in src/mat/impls/aij/mpi/mpiaij.c
[0]PETSC ERROR: MatSetValues() line 702 in src/mat/interface/matrix.c
[0]PETSC ERROR: User provided function() line 312 in unknowndirectory/src/numerics/petsc_matrix.C

So, you did not write petsc_matrix? What is happening here is that
off-process values are being set with MatSetValues(), so we have to
stash them until assembly. That is not inherently bad, but here the
stash grows so large that memory on the node is exhausted. This is very
rare for a PDE problem on a mesh, which is what leads me to think that
too many values are being generated on a single process; see the sketch
below.
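
A minimal sketch of the alternative, assuming a square MPIAIJ matrix A
that has already been created and sized (assemble_local_rows() and the
diagonal placeholder entries are hypothetical, for illustration only):

  #include <petscmat.h>

  PetscErrorCode assemble_local_rows(Mat A)
  {
    PetscErrorCode ierr;
    PetscInt       rstart, rend, row, col;
    PetscScalar    value;

    /* Rows [rstart, rend) are owned by this process; entries set in this
       range go into local storage rather than the communication stash. */
    ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);

    for (row = rstart; row < rend; row++) {
      col   = row;   /* placeholder: set only a diagonal entry */
      value = 1.0;   /* placeholder value */
      ierr  = MatSetValues(A, 1, &row, 1, &col, &value, ADD_VALUES);CHKERRQ(ierr);
    }

    /* Any off-process contributions that do exist are exchanged here. */
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    return 0;
  }

libMesh drives the assembly loop itself, but the same idea applies: if
one process generates many entries outside the row range reported by
MatGetOwnershipRange(), the stash is what has to absorb them.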

   Matt

On 9/5/07, li pan <li76pan at yahoo.com> wrote:
> hi Matt,
> I'm using libMesh, so I have no idea how the values
> are set. Previously I connected several computers in
> my office and didn't have this problem. Recently I
> installed all the libraries on a Linux cluster and
> now I get this error; I don't know why. mpdtrace
> shows all the nodes I expect. The only difference is
> that all the nodes are mounted to a head node; in my
> office I didn't use mount.
> Could this be the reason?
>
> thanx
>
> pan
>
>
> --- Matthew Knepley <knepley at gmail.com> wrote:
>
> > Are you trying to set all the values from a single
> > processor?
> >
> >   Matt
> >
> > On 9/4/07, li pan <li76pan at yahoo.com> wrote:
> > > Dear all,
> > > I recently installed PETSc on a Linux cluster and
> > > tried to solve a linear equation in parallel. I
> > > used a 3D hex mesh with dimensions 181 x 181 x 41;
> > > the number of DOFs is 1343201.
> > > The serial run had no problem, but the parallel
> > > run failed with a memory allocation error.
> > >
> > >
> > > -----------------------------------------------------------------------
> > > [0]PETSC ERROR: PetscMallocAlign() line 62 in src/sys/src/memory/mal.c
> > > [0]PETSC ERROR: Out of memory. This could be due to allocating
> > > [0]PETSC ERROR: too large an object or bleeding by not properly
> > > [0]PETSC ERROR: destroying unneeded objects.
> > > [3]PETSC ERROR: MatSetValues() line 702 in src/mat/interface/matrix.c
> > > [3]PETSC ERROR: User provided function() line 312 in unknowndirectory/src/numerics/petsc_matrix.C
> > > [cli_3]: aborting job:
> > > application called MPI_Abort(comm=0x84000000, 55) - process 3
> > > [0]PETSC ERROR: Memory allocated 865987336 Memory used by process 1591005184
> > > [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> > > [0]PETSC ERROR: Memory requested 1310720296!
> > > [0]PETSC ERROR: PetscTrMallocDefault() line 188 in src/sys/src/memory/mtr.c
> > > [0]PETSC ERROR: MatStashExpand_Private() line 240 in src/mat/utils/matstash.c
> > > [0]PETSC ERROR: MatStashValuesRow_Private() line 276 in src/mat/utils/matstash.c
> > > [0]PETSC ERROR: MatSetValues_MPIAIJ() line 199 in src/mat/impls/aij/mpi/mpiaij.c
> > > [0]PETSC ERROR: MatSetValues() line 702 in src/mat/interface/matrix.c
> > > [0]PETSC ERROR: User provided function() line 312 in unknowndirectory/src/numerics/petsc_matrix.C
> > > [cli_0]: aborting job:
> > > application called MPI_Abort(comm=0x84000000, 55) - process 0
> > > rank 3 in job 1  hpc16_44261 caused collective abort of all ranks
> > >   exit status of rank 3: return code 55
> > >
> > >
> > > I checked the memory on all the nodes; each of them
> > > has more than 2.5 GB available before the program starts.
> > > What could be the reason?
> > >
> > > thanx
> > >
> > > pan
> > >
> > >
>
>
>
>
> ____________________________________________________________________________________
> Sick sense of humor? Visit Yahoo! TV's
> Comedy with an Edge to see what's on, when.
> http://tv.yahoo.com/collections/222
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener



