FW: [PETSC #18391] PETSc crash with memory allocation in ILU preconditioning

Deng, Ying ying.deng at intel.com
Fri Oct 10 18:29:38 CDT 2008


I am seeing problems when trying to build the petsc-dev code. My configure
line is below; it is the same one I used successfully for 2.3.2-p10. I tried
with MKL 9 and MKL 10 and got the same errors: references to undefined
symbols. Please let me know if you have any experience with this issue
or suggestions for resolving it.


./config/configure.py --with-batch=1 --with-clanguage=C++
--with-vendor-compilers=intel '--CXXFLAGS=-g
-gcc-name=/usr/intel/pkgs/gcc/4.2.2/bin/g++ -gcc-version=420 '
'--LDFLAGS=-L/usr/lib64 -L/usr/intel/pkgs/gcc/4.2.2/lib -ldl -lpthread
-L/usr/intel/pkgs/icc/10.1.008e/lib -lirc' --with-cxx=$ICCDIR/bin/icpc
--with-fc=$IFCDIR/bin/ifort --with-mpi-compilers=0 --with-mpi-shared=0
--with-debugging=yes --with-mpi=yes --with-mpi-include=$MPIDIR/include
--with-scalapack=yes --with-scalapack-include=$MKLDIR/include
--with-scalapack-lib=$MKLLIBDIR/libmkl_scalapack.a --with-blacs=yes
D/Lib/libamd.a\] --with-umfpack-include=$UMFPACKDIR/UMFPACK/Include
--with-parmetis=1 --with-parmetis-dir=$PARMETISDIR --with-mumps=1


undefined reference to `f2cblaslapack311_id_'
undefined reference to `pthread_atfork'


You set a value for --with-blas-lapack-lib=<lib>, but
', '/p/dt/sde/tools/x86-64_linux26/mkl/']
cannot be used

-----Original Message-----
From: Barry Smith [mailto:bsmith at mcs.anl.gov] 
Sent: Thursday, October 09, 2008 12:39 PM
To: Rhew, Jung-hoon
Cc: PETSc-Maint Smith; Linton, Tom; Cea, Stephen M; Stettler, Mark
Subject: Re: [PETSC #18391] PETSc crash with memory allocation in ILU

    We don't have all the code just right to use those packages with
64 bit integers. I will try to get them all
working by Monday and will let you know my progress. To use them you
will need to be using
  so you can switch to
that now if you are not yet using it in preparation for my updates.


On Oct 9, 2008, at 12:52 PM, Rhew, Jung-hoon wrote:

> Hi,
> I found that the root cause of the malloc error was that our PETSc
> library had been compiled without the 64-bit flag. Thus PetscInt
> was defined as "int" instead of "long long", and for large problems
> the requested allocation size exceeds the maximum value of int and
> causes integer overflow.
> But when I tried to build with the 64-bit flag (--with-64-bit-indices=1),
> all files associated with the external libraries built with PETSc (such
> as UMFPACK and MUMPS) started failing to compile, mainly because of the
> incompatibility between "int" in those libraries and "long long" in PETSc.
> I wonder if you can let us know how to resolve this conflict when
> building PETSc with 64-bit indices. The brute-force way is to change the
> source code of those libraries where the conflicts occur, but I
> wonder if there is a neater way of doing this.
> Thanks.
> jr
> Example:
> libfast in: /nfs/ltdn/disks/td_disk49/usr.cdmg/jrhew/work/mds_work/PETSC/mypetsc-2.3.2-p10/src/mat/impls/aij/seq/umfpack
> umfpack.c(154): error: a value of type "PetscInt={long long} *" cannot be used to initialize an entity of type "int *"
>    int          m=A->rmap.n,n=A->cmap.n,*ai=mat->i,*aj=mat->j,status,*ra,idx;
> -----Original Message-----
> From: Barry Smith [mailto:bsmith at mcs.anl.gov]
> Sent: Tuesday, October 07, 2008 6:15 PM
> To: Rhew, Jung-hoon
> Cc: petsc-maint at mcs.anl.gov; Linton, Tom; Cea, Stephen M; Stettler, Mark
> Subject: Re: [PETSC #18391] PETSc crash with memory allocation in ILU preconditioning
>    During the symbolic phase of ILU(N) there is no way to know in advance
> how many new nonzeros are needed in the factored version over the original
> matrix (this is true for LU too). We handle this by starting with a certain
> amount of memory; if that is not enough for the symbolic factor, we double
> the memory allocated, copy the values over from the old copy of the
> symbolic factor (what has been computed so far), and then free the old copy.
>    To avoid this "memory doubling" (which is not very memory
> efficient) you can use the option -pc_factor_fill or PCFactorSetFill()
> to set slightly more than the "correct" value; then only a single malloc
> is needed and you can do larger problems.
>   Of course, the question is "what value should I use for fill?"
> There is no formula; if there were, we would use it automatically.
> So the only way I know is to run smaller problems and get a feel for
> what the ratio should be for your larger problem. Run with
> -info | grep pc_factor_fill and it will tell you what "you should
> have used".
>   Hope this helps,
>    Barry
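
The doubling strategy Barry describes above can be modelled in a few lines. This is a minimal Python sketch, not PETSc's actual code: the function names and all the numbers (5,000,000 nonzeros in the matrix, 200,000,000 entries needed by the symbolic factor) are hypothetical and chosen only to show how a low fill estimate leads to repeated regrows and a higher peak than a single allocation with a fill slightly above the true ratio.

```python
def doubling_allocations(initial_capacity, entries_needed):
    """Return the sequence of capacities allocated under a doubling strategy.

    Model: start with initial_capacity entries; whenever that is not
    enough, allocate twice as much, copy, and free the old block.
    """
    if initial_capacity <= 0:
        raise ValueError("initial_capacity must be positive")
    capacities = [initial_capacity]
    while capacities[-1] < entries_needed:
        capacities.append(2 * capacities[-1])
    return capacities


def peak_entries(capacities):
    """Peak number of entries held at once.

    During a regrow the old and new blocks coexist while values are copied,
    so the peak is the sum of the last two capacities (or the single
    capacity if no regrow happened).
    """
    if len(capacities) == 1:
        return capacities[0]
    return capacities[-2] + capacities[-1]


# Hypothetical numbers: a matrix with 5,000,000 nonzeros whose symbolic
# factor ends up needing 200,000,000 entries (a true fill ratio of 40).
nnz_A = 5_000_000
nnz_factor = 200_000_000

low_fill = doubling_allocations(2 * nnz_A, nnz_factor)    # fill = 2
good_fill = doubling_allocations(41 * nnz_A, nnz_factor)  # fill slightly above 40

print(len(low_fill) - 1, "regrows, peak entries:", peak_entries(low_fill))
print(len(good_fill) - 1, "regrows, peak entries:", peak_entries(good_fill))
```

In this model the fill ratio to aim for is the number of entries in the factor divided by the number of nonzeros in the original matrix, which is the ratio one would try to estimate from smaller runs.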
> On Oct 7, 2008, at 5:46 PM, Rhew, Jung-hoon wrote:
>> Hi,
>> 1. I ran it on a 64-bit machine with 32GB of physical memory but it
>> still crashed. At the crash, the peak memory was 17GB, so there was
>> plenty of memory left. This is why I don't think the simulation
>> needed the full 32GB plus the swap space of more than 64GB.
>> 2. The problem size is too big for a direct solver as it can easily go
>> beyond 32GB. Actually, we use MUMPS for smaller problems.
>> 3. ILU(N) is the most robust preconditioner we found for our
>> production simulation so we want to stick to it.
>> I think I'll send a test case that reproduces the problem.
>> -----Original Message-----
>> From: knepley at gmail.com [mailto:knepley at gmail.com] On Behalf Of Matthew Knepley
>> Sent: Tuesday, October 07, 2008 2:21 PM
>> To: Rhew, Jung-hoon
>> Cc: PETSC Maintenance
>> Subject: Re: [PETSC #18391] PETSc crash with memory allocation in ILU preconditioning
>> It's not hard for ILU(k) to exceed the 32-bit limit for large
>> matrices. I would recommend
>> 1) Using a 64-bit machine with more memory
>> 2) Trying a sparse direct solver like MUMPS
>> 3) Trying another preconditioner, which is of course problem
>> dependent
>>  Thanks,
>>    Matt
>> On Tue, Oct 7, 2008 at 4:03 PM, Rhew, Jung-hoon
>> <jung-hoon.rhew at intel.com> wrote:
>>> Dear PETSc team,
>>> We use PETSc as a linear solver library in our tool, and in some test
>>> cases using the ILU(N) preconditioner we have problems with memory. I'm
>>> not sending our matrix at this time since it is huge, but if you think
>>> it is needed, I'll send it to you.
>>> Thanks for your help in advance.
>>> Log file is attached.
>>> OS: suse 64bit sles9
>>> 2.6.5-7.276.PTF.196309.1-smp #1 SMP Mon Jul 24 10:45:31 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
>>> PETSc ver: petsc-2.3.2-p10
>>> MPI implementation: Intel MPI based on MPICH2 and MVAPICH2
>>> Compiler: GCC 4.2.2
>>> Probable PETSc component: n/a
>>> Problem Description
>>> Solver setting: BCGSL (L=2) and ILU(N=2)
>>> -ksp_rtol=1e-14
>>> -ksp_type=bcgsl
>>> -ksp_bcgsl_ell=2
>>> -pc_factor_levels=2
>>> -pc_factor_reuseordering
>>> -pc_factor_zeropivot=0.0
>>> -pc_type=ilu
>>> -pc_factor_fill=2
>>> -pc_factor_mat_ordering_type=rcm
>>> malloc crash: sparse matrix size ~ 500K by 500K with NNZ ~ 0.002%
>>> (full error message is attached.)
>>> In the debugger, symbolic ILU requires memory beyond the max int. At
>>> line 1089 in aijfact.c, len becomes -2147483648 because
>>> (bi[n])*sizeof(PetscScalar) > max int:
>>> len = (bi[n])*sizeof(PetscScalar);
>>> This then causes the following malloc error in subsequent function
>>> calls (the call stack is also in the attached error message).
>>> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
>>> [0]PETSC ERROR: Out of memory. This could be due to allocating
>>> [0]PETSC ERROR: too large an object or bleeding by not properly
>>> [0]PETSC ERROR: destroying unneeded objects.
>>> [0]PETSC ERROR: Memory allocated -2147483648 Memory used by process -2147483648
>>> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
>>> [0]PETSC ERROR: Memory requested 18446744071912865792!
>>> [0]PETSC ERROR: Petsc Release Version 2.3.2, Patch 10, Wed Mar 28 19:13:22 CDT 2007 HG revision: d7298c71db7f5e767f359ae35d33cab3bed44428
>>> Possibly relevant symptom: the iterative solver with ILU(N) consumes
>>> more memory than the direct solver as N gets larger (>5), even when the
>>> matrix is not big enough to cause a malloc crash like the one above.
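
The overflow reported in the message above can be reproduced as plain arithmetic. The following is a minimal Python sketch, not PETSc code: it assumes PetscScalar is an 8-byte real double and models what typically happens on two's-complement hardware when a byte length is stored in a 32-bit int and later passed on as a 64-bit unsigned size. The helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce value modulo 2**32 into the signed 32-bit range (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def as_uint64(value):
    """Reinterpret a signed value as an unsigned 64-bit integer (sign extension)."""
    return value & 0xFFFFFFFFFFFFFFFF


SIZEOF_SCALAR = 8  # assumption: PetscScalar is a real double

# Smallest entry count whose byte length no longer fits in a 32-bit int.
first_overflowing_count = INT32_MAX // SIZEOF_SCALAR + 1
len32 = wrap_int32(first_overflowing_count * SIZEOF_SCALAR)
print(first_overflowing_count, len32)

# The size reported in the log, decoded back to the signed 32-bit value
# that sign-extends to it.
requested = 18446744071912865792
print(requested - 2**64, as_uint64(wrap_int32(requested)) == requested)
```

Under these assumptions, any factor with 268,435,456 or more entries produces a negative 32-bit length, and the huge "Memory requested" value in the log is what a negative 32-bit int looks like once it is widened to an unsigned 64-bit size. If the true length wrapped only once, the logged value would correspond to 2,498,281,472 bytes, but the log alone cannot confirm that.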
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
