memory problem in parallel on a linux cluster

li pan li76pan at yahoo.com
Wed Sep 5 06:36:47 CDT 2007


hi Matt,
I found the error, and it was my fault: I was still using serial code,
so my program tried to write values that belong to other processes,
i.e. the off-process values you mentioned. After I corrected the code
for parallel assembly, it works now.
Sorry for wasting your time. :-)
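
In case it helps anyone else who finds this thread: the fix boils down
to letting each rank insert only the rows it owns. Below is a minimal,
self-contained sketch in plain PETSc C (not the actual libmesh assembly
code; the matrix size and values are placeholders, and it is written
against a recent PETSc, so a few calls differ slightly from the 2.3.x
API). It queries MatGetOwnershipRange() and only inserts into local
rows, so nothing has to be stashed for other processes:

/* sketch_local_assembly.c - illustrative only, not the libmesh code.
 * Each rank inserts entries only for the rows it owns, so
 * MatSetValues()/MatSetValue() never has to stash off-process values. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       i, rstart, rend;
  PetscInt       N = 1343201;        /* global size, e.g. the DOF count below */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);

  /* which contiguous block of rows does this rank own? */
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);

  /* insert only into locally owned rows (just the diagonal here) */
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }

  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

A real element loop inserts whole element matrices with MatSetValues(),
but the principle is the same: as Matt explains below, anything outside
[rstart, rend) goes into the stash, and if one rank generates values for
most of the grid, the stash is what exhausts the memory.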

thanx

pan


--- Matthew Knepley <knepley at gmail.com> wrote:

>   Here is the trace:
> 
> [0]PETSC ERROR: Memory allocated 865987336 Memory used by process 1591005184
> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [0]PETSC ERROR: Memory requested 1310720296!
> [0]PETSC ERROR: PetscTrMallocDefault() line 188 in src/sys/src/memory/mtr.c
> [0]PETSC ERROR: MatStashExpand_Private() line 240 in src/mat/utils/matstash.c
> [0]PETSC ERROR: MatStashValuesRow_Private() line 276 in src/mat/utils/matstash.c
> [0]PETSC ERROR: MatSetValues_MPIAIJ() line 199 in src/mat/impls/aij/mpi/mpiaij.c
> [0]PETSC ERROR: MatSetValues() line 702 in src/mat/interface/matrix.c
> [0]PETSC ERROR: User provided function() line 312 in unknowndirectory/src/numerics/petsc_matrix.C
> 
> So, you did not write petsc_matrix? What is happening here is that
> off-processor values are being set with MatSetValues(). That means we
> have to stash them. This is not inherently bad, but the stash space
> grows so large that memory on the node is exhausted. This is very rare
> with a PDE problem on a mesh, which is what leads me to think that too
> many values are being generated on a single proc.
> 
>    Matt
> 
> On 9/5/07, li pan <li76pan at yahoo.com> wrote:
> > hi Matt,
> > I'm using libmesh, so I have no idea how the values were set. Before,
> > I was connecting several computers in my office and I didn't have this
> > problem. Recently I installed all the libraries on a linux cluster and
> > got this problem; I don't know why. mpdtrace shows all the nodes I want
> > connected. The only difference is that all the nodes are mounted to a
> > headnode; in my office I didn't use mount.
> > Could this be the reason?
> >
> > thanx
> >
> > pan
> >
> >
> > --- Matthew Knepley <knepley at gmail.com> wrote:
> >
> > > Are you trying to set all the values from a single processor?
> > >
> > >   Matt
> > >
> > > On 9/4/07, li pan <li76pan at yahoo.com> wrote:
> > > > Dear all,
> > > > I recently installed PETSc on a linux cluster and tried to solve
> > > > a linear system in parallel. I used a 3D hex mesh with dimensions
> > > > 181 x 181 x 41; the number of DOFs is 1343201.
> > > > The serial run had no problem, but the parallel run hit a memory
> > > > allocation problem.
> > > >
> > > > -----------------------------------------------------------------------
> > > > [0]PETSC ERROR: PetscMallocAlign() line 62 in src/sys/src/memory/mal.c
> > > > [0]PETSC ERROR: Out of memory. This could be due to allocating
> > > > [0]PETSC ERROR: too large an object or bleeding by not properly
> > > > [0]PETSC ERROR: destroying unneeded objects.
> > > > [3]PETSC ERROR: MatSetValues() line 702 in src/mat/interface/matrix.c
> > > > [3]PETSC ERROR: User provided function() line 312 in unknowndirectory/src/numerics/petsc_matrix.C
> > > > [cli_3]: aborting job:
> > > > application called MPI_Abort(comm=0x84000000, 55) - process 3
> > > > [0]PETSC ERROR: Memory allocated 865987336 Memory used by process 1591005184
> > > > [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> > > > [0]PETSC ERROR: Memory requested 1310720296!
> > > > [0]PETSC ERROR: PetscTrMallocDefault() line 188 in src/sys/src/memory/mtr.c
> > > > [0]PETSC ERROR: MatStashExpand_Private() line 240 in src/mat/utils/matstash.c
> > > > [0]PETSC ERROR: MatStashValuesRow_Private() line 276 in src/mat/utils/matstash.c
> > > > [0]PETSC ERROR: MatSetValues_MPIAIJ() line 199 in src/mat/impls/aij/mpi/mpiaij.c
> > > > [0]PETSC ERROR: MatSetValues() line 702 in src/mat/interface/matrix.c
> > > > [0]PETSC ERROR: User provided function() line 312 in unknowndirectory/src/numerics/petsc_matrix.C
> > > > [cli_0]: aborting job:
> > > > application called MPI_Abort(comm=0x84000000, 55) - process 0
> > > > rank 3 in job 1  hpc16_44261   caused collective abort of all ranks
> > > >   exit status of rank 3: return code 55
> > > >
> > > >
> > > > I checked the memory on all the nodes. Each of them has more than
> > > > 2.5 GB free before the program starts.
> > > > What could be the reason?
> > > >
> > > > thanx
> > > >
> > > > pan
> > > >
> > >
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> > > experiments is infinitely more interesting than any results to
> > > which their experiments lead.
> > > -- Norbert Wiener
> > >
> > >
> >
> 
> 
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> 
=== message truncated ===



       



