[petsc-users] Out of memory during MatAssemblyBegin

Raeth, Peter PRaeth at hpti.com
Tue Jan 25 07:36:45 CST 2011


The matrix resides on disk. It was generated by a single-process program, and its purpose is to compare that program's results with those generated by a PETSc-based multi-process program. The current approach works well for small and medium-sized matrices, but not for the large matrix.

What I can do is let each process determine which rows it holds locally. Then each process can read its rows and populate its part of the matrix. Just a bit more code. Not a big problem.
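
Roughly what I have in mind is sketched below. This is only a sketch, not our actual code: it assumes the PETSc 3.1 calling sequences from the build in the log below, real double-precision PetscScalar, and that the single-process program wrote the matrix as a flat binary file of doubles in row-major order. LoadLocalRows and the plain-C file handling are illustrative only.

#include <stdio.h>
#include "petscmat.h"

/* Each rank reads and sets only the rows it owns, so MatSetValues()
   never stashes entries destined for another process. */
PetscErrorCode LoadLocalRows(Mat A, PetscInt N, const char *filename)
{
  PetscErrorCode ierr;
  PetscInt       rstart, rend, i, j, *cols;
  PetscScalar    *rowvals;
  FILE           *fp;

  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  ierr = PetscMalloc(N*sizeof(PetscScalar), &rowvals);CHKERRQ(ierr);
  ierr = PetscMalloc(N*sizeof(PetscInt), &cols);CHKERRQ(ierr);
  for (j = 0; j < N; j++) cols[j] = j;

  fp = fopen(filename, "rb");
  if (!fp) { perror(filename); return 1; }   /* plain C error handling keeps the sketch short */

  /* seek to the first locally owned row (use fseeko/64-bit offsets for files > 2 GB) */
  fseek(fp, (long)rstart*(long)N*(long)sizeof(double), SEEK_SET);
  for (i = rstart; i < rend; i++) {
    if (fread(rowvals, sizeof(double), N, fp) != (size_t)N) { fclose(fp); return 1; }
    ierr = MatSetValues(A, 1, &i, N, cols, rowvals, INSERT_VALUES);CHKERRQ(ierr);
  }
  fclose(fp);

  ierr = PetscFree(rowvals);CHKERRQ(ierr);
  ierr = PetscFree(cols);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  return 0;
}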

Thank you very much, Barry, for your input. Let me assure you that I have no intention of faking or hacking.  :)   This project is too important to our transition from shared-memory machines. (See http://www.afrl.hpc.mil/hardware/hawk.php.)


Best,

Peter.

Peter G. Raeth, Ph.D.
Senior Staff Scientist
Signal and Image Processing
High Performance Technologies, Inc
937-904-5147
praeth at hpti.com

________________________________________
From: petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] on behalf of Barry Smith [bsmith at mcs.anl.gov]
Sent: Monday, January 24, 2011 4:23 PM
To: PETSc users list
Subject: Re: [petsc-users] Out of memory during MatAssemblyBegin

On Jan 24, 2011, at 3:08 PM, Raeth, Peter wrote:

> I am running out of memory during MatAssemblyBegin on a dense matrix that spans several processors. My calculations show that the matrices I am using do not require more than 25% of available memory.
>
> What is different about this matrix, compared to the others, is that the program runs out of memory after the matrix has been populated by a single process rather than by multiple processes. I used MatSetValues. Since the values are held in a cache until MatAssemblyEnd is called (as I understand things), is it possible that using one process to populate the entire matrix is causing this problem?


   Yes, absolutely, this is a terrible non-scalable way of filling a parallel matrix. You can fake it by calling MatAssemblyBegin/End() repeatedly with the flag MAT_FLUSH_ASSEMBLY to keep the stash from getting too big. But you really need a much better way of setting values into the matrix. How are these "brought in row by row" matrix entries generated?
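
   A minimal sketch of that flush pattern, assuming rank 0 still sets every row: all ranks execute the loop so the collective flush calls match up; PopulateFromRankZero, chunk, and the zeroed placeholder row values are illustrative only.

#include "petscmat.h"

PetscErrorCode PopulateFromRankZero(Mat A, PetscInt M, PetscInt N)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscInt       i, j, chunk = 1000, *cols;   /* tune chunk to available memory */
  PetscScalar    *rowvals;

  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = PetscMalloc(N*sizeof(PetscScalar), &rowvals);CHKERRQ(ierr);
  ierr = PetscMalloc(N*sizeof(PetscInt), &cols);CHKERRQ(ierr);
  ierr = PetscMemzero(rowvals, N*sizeof(PetscScalar));CHKERRQ(ierr);  /* placeholder values; the real code would read row i here */
  for (j = 0; j < N; j++) cols[j] = j;

  for (i = 0; i < M; i++) {
    if (!rank) {
      /* rank 0 sets row i; entries owned by other ranks go into the stash */
      ierr = MatSetValues(A, 1, &i, N, cols, rowvals, INSERT_VALUES);CHKERRQ(ierr);
    }
    if ((i+1) % chunk == 0) {
      /* collective: push stashed values to their owners before the stash grows too big */
      ierr = MatAssemblyBegin(A, MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = PetscFree(rowvals);CHKERRQ(ierr);
  ierr = PetscFree(cols);CHKERRQ(ierr);
  return 0;
}

   Even with the flushes this is still the non-scalable pattern; having each process set only the rows it owns avoids the stash traffic entirely.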

   Barry


> The data is brought in only one row at a time during the population process. All buffer memory is cleared before the call to MatAssemblyBegin.
>
> The error dump contains:
>
> mpirun -prefix [%g]   -np 256 Peter.x
> [0]  [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]  [0]PETSC ERROR: Out of memory. This could be due to allocating
> [0]  [0]PETSC ERROR: too large an object or bleeding by not properly
> [0]  [0]PETSC ERROR: destroying unneeded objects.
> [0]  [0]PETSC ERROR: Memory allocated 1372407920 Memory used by process -122585088
> [0]  [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [0]  [0]PETSC ERROR: Memory requested 18446744071829395456!
> [0]  [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]  [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 6, Tue Nov 16 17:02:32 CST 2010
> [0]  [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]  [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]  [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]  [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]  [0]PETSC ERROR: Peter.x on a linux-int named hawk-6 by praeth Mon Jan 24 15:44:28 2011
> [0]  [0]PETSC ERROR: Libraries linked from /default/praeth/MATH/petsc-3.1-p6/linux-intel-g/lib
> [0]  [0]PETSC ERROR: Configure run at Tue Dec 21 08:45:25 2010
> [0]  [0]PETSC ERROR: Configure options --download-superlu=1 --download-parmetis=1 --download-superlu_dist=1 --with-debugging=1 --with-error-checking=1 -PETSC_ARCH=linux-intel-g --with-fc="ifort -lmpi" --with-cc="icc -lmpi" --with-gnu-compilers=false
> [0]  [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]  [0]PETSC ERROR: PetscMallocAlign() line 49 in src/sys/memory/mal.c
> [0]  [0]PETSC ERROR: PetscTrMallocDefault() line 192 in src/sys/memory/mtr.c
> [0]  [0]PETSC ERROR: MatStashScatterBegin_Private() line 510 in src/mat/utils/matstash.c
> [0]  [0]PETSC ERROR: MatAssemblyBegin_MPIDense() line 286 in src/mat/impls/dense/mpi/mpidense.c
> [0]  [0]PETSC ERROR: MatAssemblyBegin() line 4564 in src/mat/interface/matrix.c
> [0]  [0]PETSC ERROR: User provided function() line 195 in "unknowndirectory/"Peter.c
> [-1]  MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
> [-1]  MPI: aborting job
> exit
>
> I had tried the suggestion to employ -malloc_dump or -malloc_log, but I do not see any output from the batch run.
>
> Thank you all for any insights you can offer.
>
>
> Best,
>
> Peter.
>


