[petsc-users] Out of memory during MatAssemblyBegin

Barry Smith bsmith at mcs.anl.gov
Tue Jan 25 08:01:23 CST 2011


On Jan 25, 2011, at 7:36 AM, Raeth, Peter wrote:

> The matrix resides on disk. It was generated by a single-process program. Its purpose is to compare that program's results with those generated by a PETSc-based multi-process program. The current approach works well for small and medium-sized matrices but not for the large matrix. 
> 
> What I can do is let each process determine which rows it holds locally. Then each process can read its rows

   You don't want to do this with standard Unix IO. Having all the processes try to access the same file will really stall things out. Since it is a dense matrix you can easily use MPI IO and have each process access its piece of the matrix "in parallel".
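If it helps, here is a minimal sketch of the offset arithmetic each rank would need before an MPI-IO read. The PETSC_DECIDE-style row split and the headerless, row-major, double-precision file layout are assumptions for illustration, not details from this thread:

```c
#include <stddef.h>

/* Hypothetical helper: split M rows over `size` ranks the way
   PETSc's PETSC_DECIDE does -- the first (M % size) ranks get one
   extra row, so the blocks are contiguous and cover all rows. */
void local_row_range(long M, int size, int rank,
                     long *start, long *nlocal)
{
    long base  = M / size;
    long extra = M % size;
    *nlocal = base + (rank < extra ? 1 : 0);
    *start  = rank * base + (rank < extra ? rank : extra);
}

/* Byte offset of this rank's block in a row-major file of doubles
   for an M x N matrix with no header (an assumed file format). */
long long block_offset(long start, long N)
{
    return (long long)start * N * (long long)sizeof(double);
}
```

Each rank would then open the file with MPI_File_open, read its nlocal*N doubles collectively with MPI_File_read_at_all at block_offset(start, N), and insert them with MatSetValues into its own ownership range.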

   Barry

> and populate its part of the matrix. Just a bit more code. Not a big problem.
> 
> Thank you very much Barry for your input. Let me assure you that I have no intention of faking or hacking.  :)   This project is too important to our transition from shared-memory machines. (See http://www.afrl.hpc.mil/hardware/hawk.php.)
> 
> 
> Best,
> 
> Peter.
> 
> Peter G. Raeth, Ph.D.
> Senior Staff Scientist
> Signal and Image Processing
> High Performance Technologies, Inc
> 937-904-5147
> praeth at hpti.com
> 
> ________________________________________
> From: petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] on behalf of Barry Smith [bsmith at mcs.anl.gov]
> Sent: Monday, January 24, 2011 4:23 PM
> To: PETSc users list
> Subject: Re: [petsc-users] Out of memory during MatAssemblyBegin
> 
> On Jan 24, 2011, at 3:08 PM, Raeth, Peter wrote:
> 
>> Am running out of memory while using MatAssemblyBegin on a dense matrix that spans several processors. My calculations show that the matrices I am using do not require more than 25% of available memory.
>> 
>> What is different about this matrix compared to the others is that the program runs out of memory after the matrix has been populated by a single process, rather than by multiple processes. I used MatSetValues. Since the values are held in a cache until MatAssemblyEnd is called (as I understand things), is it possible that using one process to populate the entire matrix is causing this problem?
> 
> 
>   Yes, absolutely, this is a terrible non-scalable way of filling a parallel matrix. You can fake it by calling MatAssemblyBegin/End() repeatedly with the flag MAT_FLUSH_ASSEMBLY to keep the stash from getting too big. But you really need a much better way of setting values into the matrix. How are these "brought in row by row" matrix entries generated?
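The flush-assembly pattern described above can be sketched in skeleton form. Here set_row() and flush() are hypothetical stand-ins for MatSetValues() and the MatAssemblyBegin()/MatAssemblyEnd() pair, and the chunk size is an assumption; real code would pass MAT_FLUSH_ASSEMBLY inside the loop and MAT_FINAL_ASSEMBLY only once, after it:

```c
/* Callback types standing in for the PETSc calls. */
typedef void (*row_fn)(long row, void *ctx);
typedef void (*flush_fn)(void *ctx);

/* Fill M rows, flushing the stash every `chunk` rows so it stays
   bounded. Returns the total number of flushes performed. */
long fill_with_flushes(long M, long chunk, row_fn set_row,
                       flush_fn flush, void *ctx)
{
    long flushes = 0;
    for (long i = 0; i < M; ++i) {
        set_row(i, ctx);             /* MatSetValues(...) for row i */
        if ((i + 1) % chunk == 0) {  /* keep the stash from growing */
            flush(ctx);              /* MatAssemblyBegin/End with
                                        MAT_FLUSH_ASSEMBLY */
            ++flushes;
        }
    }
    flush(ctx);                      /* final pass: MAT_FINAL_ASSEMBLY */
    return flushes + 1;
}

/* Counting stand-ins used to exercise the skeleton. */
long demo_rows = 0, demo_flushes = 0;
void demo_set_row(long row, void *ctx) { (void)row; (void)ctx; ++demo_rows; }
void demo_flush(void *ctx) { (void)ctx; ++demo_flushes; }
```

The point of the periodic flush is that the stash of off-process (or, here, off-owner) values is emptied every `chunk` rows instead of accumulating the whole matrix in one process's memory.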
> 
>   Barry
> 
> 
>> The data is brought in only row by row for the population process. All buffer memory is cleared before the call to MatAssemblyBegin.
>> 
>> The error dump contains:
>> 
>> mpirun -prefix [%g]   -np 256 Peter.x
>> [0]  [0]PETSC ERROR: --------------------- Error Message ------------------------------------
>> [0]  [0]PETSC ERROR: Out of memory. This could be due to allocating
>> [0]  [0]PETSC ERROR: too large an object or bleeding by not properly
>> [0]  [0]PETSC ERROR: destroying unneeded objects.
>> [0]  [0]PETSC ERROR: Memory allocated 1372407920 Memory used by process -122585088
>> [0]  [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
>> [0]  [0]PETSC ERROR: Memory requested 18446744071829395456!
>> [0]  [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]  [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 6, Tue Nov 16 17:02:32 CST 2010
>> [0]  [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [0]  [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [0]  [0]PETSC ERROR: See docs/index.html for manual pages.
>> [0]  [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]  [0]PETSC ERROR: Peter.x on a linux-int named hawk-6 by praeth Mon Jan 24 15:44:28 2011
>> [0]  [0]PETSC ERROR: Libraries linked from /default/praeth/MATH/petsc-3.1-p6/linux-intel-g/lib
>> [0]  [0]PETSC ERROR: Configure run at Tue Dec 21 08:45:25 2010
>> [0]  [0]PETSC ERROR: Configure options --download-superlu=1 --download-parmetis=1 --download-superlu_dist=1 --with-debugging=1 --with-error-checking=1 -PETSC_ARCH=linux-intel-g --with-fc="ifort -lmpi" --with-cc="icc -lmpi" --with-gnu-compilers=false
>> [0]  [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]  [0]PETSC ERROR: PetscMallocAlign() line 49 in src/sys/memory/mal.c
>> [0]  [0]PETSC ERROR: PetscTrMallocDefault() line 192 in src/sys/memory/mtr.c
>> [0]  [0]PETSC ERROR: MatStashScatterBegin_Private() line 510 in src/mat/utils/matstash.c
>> [0]  [0]PETSC ERROR: MatAssemblyBegin_MPIDense() line 286 in src/mat/impls/dense/mpi/mpidense.c
>> [0]  [0]PETSC ERROR: MatAssemblyBegin() line 4564 in src/mat/interface/matrix.c
>> [0]  [0]PETSC ERROR: User provided function() line 195 in "unknowndirectory/"Peter.c
>> [-1]  MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
>> [-1]  MPI: aborting job
>> exit
>> 
>> I tried the suggestion to use -malloc_dump or -malloc_log but do not see any output from the batch run.
>> 
>> Thank you all for any insights you can offer.
>> 
>> 
>> Best,
>> 
>> Peter.
>> 
> 


