memory problem in parallel on a linux cluster
li pan
li76pan at yahoo.com
Wed Sep 5 02:10:33 CDT 2007
hi Matt,
I'm using libMesh, so I have no idea how the values are set. Previously I
connected several computers in my office and didn't have this problem.
Recently I installed all the libraries on a Linux cluster, and now I get
this error; I don't know why. mpdtrace shows all the connected nodes I
expect. The only difference is that all the nodes are mounted to a head
node; in my office I didn't use mount.
Could this be the reason?

thanks,
pan
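
For reference, the owner-computes assembly pattern Matt asks about below
looks roughly like the following minimal C sketch. This is written against
the current PETSc C API (the 2007 trace predates it), the ~27 nonzeros per
row is an assumed figure for a trilinear hex stencil, and only the diagonal
is filled here as a placeholder; a real FEM code would insert element
matrices instead.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         A;
  PetscInt    N = 1343201;   /* global dof count from the report below */
  PetscInt    rstart, rend, i;
  PetscScalar v = 1.0;       /* placeholder value */

  PetscInitialize(&argc, &argv, NULL, NULL);

  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
  MatSetFromOptions(A);
  /* assumed preallocation: ~27 nonzeros/row for a 3D hex stencil */
  MatMPIAIJSetPreallocation(A, 27, NULL, 27, NULL);
  MatSeqAIJSetPreallocation(A, 27, NULL);

  /* each rank inserts only the rows it owns, so nothing
     accumulates in the off-process stash */
  MatGetOwnershipRange(A, &rstart, &rend);
  for (i = rstart; i < rend; i++) {
    MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES);
  }

  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

If instead a single rank calls MatSetValues for all N rows, every entry
destined for another rank is buffered in the stash until assembly, which
is the growth visible in the trace below.
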
--- Matthew Knepley <knepley at gmail.com> wrote:
> Are you trying to set all the values from a single processor?
>
> Matt
>
> On 9/4/07, li pan <li76pan at yahoo.com> wrote:
> > Dear all,
> > I recently installed PETSc on a Linux cluster and tried to solve a
> > linear equation in parallel. I used a 3D hex mesh with dimensions
> > 181 x 181 x 41; the number of DOFs is 1,343,201.
> > The serial run was fine, but the parallel run failed with a memory
> > allocation error:
> >
> -----------------------------------------------------------------------
> > [0]PETSC ERROR: PetscMallocAlign() line 62 in src/sys/src/memory/mal.c
> > [0]PETSC ERROR: Out of memory. This could be due to allocating
> > [0]PETSC ERROR: too large an object or bleeding by not properly
> > [0]PETSC ERROR: destroying unneeded objects.
> > [3]PETSC ERROR: MatSetValues() line 702 in src/mat/interface/matrix.c
> > [3]PETSC ERROR: User provided function() line 312 in unknowndirectory/src/numerics/petsc_matrix.C
> > [cli_3]: aborting job:
> > application called MPI_Abort(comm=0x84000000, 55) - process 3
> > [0]PETSC ERROR: Memory allocated 865987336 Memory used by process 1591005184
> > [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> > [0]PETSC ERROR: Memory requested 1310720296!
> > [0]PETSC ERROR: PetscTrMallocDefault() line 188 in src/sys/src/memory/mtr.c
> > [0]PETSC ERROR: MatStashExpand_Private() line 240 in src/mat/utils/matstash.c
> > [0]PETSC ERROR: MatStashValuesRow_Private() line 276 in src/mat/utils/matstash.c
> > [0]PETSC ERROR: MatSetValues_MPIAIJ() line 199 in src/mat/impls/aij/mpi/mpiaij.c
> > [0]PETSC ERROR: MatSetValues() line 702 in src/mat/interface/matrix.c
> > [0]PETSC ERROR: User provided function() line 312 in unknowndirectory/src/numerics/petsc_matrix.C
> > [cli_0]: aborting job:
> > application called MPI_Abort(comm=0x84000000, 55) - process 0
> > rank 3 in job 1 hpc16_44261 caused collective abort of all ranks
> > exit status of rank 3: return code 55
> >
> > I checked the memory on all the nodes; each had more than 2.5 GB
> > free before the program started.
> > What could be the reason?
> >
> > thanks,
> > pan
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
>   -- Norbert Wiener
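
A note on the trace above: MatStashExpand_Private and
MatStashValuesRow_Private show that the failing allocation is the
off-process stash (the buffer MatSetValues uses for entries in rows owned
by other ranks), not the matrix itself. As a rough estimate, assuming the
stash keeps roughly a row index, a column index, and a PetscScalar per
entry (~16 bytes), 1,343,201 rows at an assumed ~27 nonzeros per row is
about 3.6e7 entries, i.e. roughly 580 MB, and the stash grows by doubling,
which is consistent with the failed 1,310,720,296-byte request. That fits
Matt's suspicion that one process is inserting the whole matrix;
distributing the insertion as in the sketch above keeps the stash near
empty. Running with -info, or the -malloc_dump / -malloc_log flags the
error message itself suggests, would help confirm how much is being
stashed.
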