FW: [PETSC #18391] PETSc crash with memory allocation in ILU preconditioning

Matthew Knepley knepley at gmail.com
Fri Oct 10 18:41:43 CDT 2008


On Fri, Oct 10, 2008 at 6:29 PM, Deng, Ying <ying.deng at intel.com> wrote:
> Hi,
>
> I am seeing problems when trying to build the petsc-dev code. My
> configure line is below, the same one I used successfully for
> 2.3.2-p10. I tried with MKL 9 and MKL 10 and got the same errors in
> both cases: references to undefined symbols. Please let me know if you
> have any experience with this issue or suggestions for resolving it.

1) Please always send configure.log. The screen output does not tell us
    enough to debug problems.

2) Specifying libraries directly is not usually a good idea, since some
    packages, like MKL, tend to depend on other libraries (libguide,
    libpthread). I would use --with-blas-lapack-dir=$MKLDIR instead; see
    the sketch after this list.

3) Mail about install problems should go to petsc-maint at mcs.anl.gov;
    petsc-dev is for discussion of development.
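
For instance, a minimal sketch (keeping your compiler settings and
letting configure work out the full MKL link line, libguide,
libpthread, etc., itself):

    ./config/configure.py --with-cxx=$ICCDIR/bin/icpc \
        --with-fc=$IFCDIR/bin/ifort --with-blas-lapack-dir=$MKLDIR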

 Thanks,

   Matt

> Thanks,
> Ying
>
>
> ./config/configure.py --with-batch=1 --with-clanguage=C++
> --with-vendor-compilers=intel
> '--CXXFLAGS=-g -gcc-name=/usr/intel/pkgs/gcc/4.2.2/bin/g++ -gcc-version=420'
> '--LDFLAGS=-L/usr/lib64 -L/usr/intel/pkgs/gcc/4.2.2/lib -ldl -lpthread
> -Qlocation,ld,/usr/intel/pkgs/gcc/4.2.2/x86_64-suse-linux/bin
> -L/usr/intel/pkgs/icc/10.1.008e/lib -lirc'
> --with-cxx=$ICCDIR/bin/icpc --with-fc=$IFCDIR/bin/ifort
> --with-mpi-compilers=0 --with-mpi-shared=0 --with-debugging=yes
> --with-mpi=yes --with-mpi-include=$MPIDIR/include
> --with-mpi-lib=\[$MPIDIR/lib64/libmpi.a,$MPIDIR/lib64/libmpiif.a,$MPIDIR/lib64/libmpigi.a\]
> --with-blas-lapack-lib=\[$MKLLIBDIR/libguide.so,$MKLLIBDIR/libmkl_lapack.so,$MKLLIBDIR/libmkl_solver.a,$MKLLIBDIR/libmkl.so\]
> --with-scalapack=yes --with-scalapack-include=$MKLDIR/include
> --with-scalapack-lib=$MKLLIBDIR/libmkl_scalapack.a
> --with-blacs=yes --with-blacs-include=$MKLDIR/include
> --with-blacs-lib=$MKLLIBDIR/libmkl_blacs_intelmpi_lp64.a
> --with-umfpack=1
> --with-umfpack-lib=\[$UMFPACKDIR/UMFPACK/Lib/libumfpack.a,$UMFPACKDIR/AMD/Lib/libamd.a\]
> --with-umfpack-include=$UMFPACKDIR/UMFPACK/Include
> --with-parmetis=1 --with-parmetis-dir=$PARMETISDIR
> --with-mumps=1
> --download-mumps=$PETSC_DIR/externalpackages/MUMPS_4.6.3.tar.gz
> --with-superlu_dist=1
> --download-superlu_dist=$PETSC_DIR/externalpackages/superlu_dist_2.0.tar.gz
>
>
> ....
>
> /nfs/pdx/proj/dt/pdx_sde02/x86-64_linux26/petsc/petsc-dev/conftest.c:7: undefined reference to `f2cblaslapack311_id_'
> /p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libguide.so: undefined reference to `pthread_atfork'
>
> ....
>
>
> --------------------------------------------------------------------------------------
> You set a value for --with-blas-lapack-lib=<lib>, but
> ['/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libguide.so',
> '/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libmkl_lapack.so',
> '/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libmkl_solver.a',
> '/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libmkl.so']
> cannot be used
> *********************************************************************************
>
>
> -----Original Message-----
> From: Barry Smith [mailto:bsmith at mcs.anl.gov]
> Sent: Thursday, October 09, 2008 12:39 PM
> To: Rhew, Jung-hoon
> Cc: PETSc-Maint Smith; Linton, Tom; Cea, Stephen M; Stettler, Mark
> Subject: Re: [PETSC #18391] PETSc crash with memory allocation in ILU
> preconditioning
>
>
>    We don't have all the code just right to use those packages with
> 64 bit integers. I will try to get them all working by Monday and will
> let you know my progress. To use them you will need to be using
> petsc-dev,
> http://www-unix.mcs.anl.gov/petsc/petsc-as/developers/index.html
> so you may want to switch to it now, if you are not already using it,
> in preparation for my updates.
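>
>    (Roughly, the interface code has to copy PETSc's 64 bit index
> arrays into the 32 bit int arrays those packages expect, checking for
> overflow. A hypothetical sketch, not the actual PETSc code:
>
>    #include <limits.h>
>    #include <stdlib.h>
>
>    /* Copy a 64 bit index array into the 32 bit int array an external
>       package such as UMFPACK expects; returns NULL on overflow. */
>    static int *copy_indices(const long long *ai, long long len) {
>      int *ai32 = (int *)malloc((size_t)len * sizeof(int));
>      if (!ai32) return NULL;
>      for (long long i = 0; i < len; i++) {
>        if (ai[i] > INT_MAX) { free(ai32); return NULL; } /* too large */
>        ai32[i] = (int)ai[i];
>      }
>      return ai32;
>    }
> )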
>
>
>    Barry
>
> On Oct 9, 2008, at 12:52 PM, Rhew, Jung-hoon wrote:
>
>> Hi,
>>
>> I found that the root cause of the malloc error was that our PETSc
>> library had been compiled without the 64 bit flag on.  Thus, PetscInt
>> was defined as "int" instead of "long long", and for large problems
>> the memory allocation requires more bytes than the maximum int can
>> represent, causing integer overflow.
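>>
>> (Schematically, the flag amounts to something like
>>
>>    typedef long long PetscInt;  /* --with-64-bit-indices=1 */
>>
>> instead of
>>
>>    typedef int PetscInt;        /* the default             */
>>
>> so lengths and indices no longer wrap at 2^31 - 1.)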
>>
>> But when I tried to build using the 64 bit flag
>> (--with-64-bit-indices=1), all files associated with the external
>> libraries (such as UMFPACK and MUMPS) built with PETSc started
>> failing to compile, mainly due to the incompatibility between "int"
>> in those libraries and "long long" in PETSc.
>>
>> I wonder if you can let us know how to resolve this conflict when
>> building PETSc with 64 bit indices.  The brute force way is to change
>> the source code of those libraries where the conflicts occur, but I
>> wonder if there is a neater way of doing this.
>>
>> Thanks.
>> jr
>>
>> Example:
>> libfast in: /nfs/ltdn/disks/td_disk49/usr.cdmg/jrhew/work/mds_work/PETSC/mypetsc-2.3.2-p10/src/mat/impls/aij/seq/umfpack
>>
>> umfpack.c(154): error: a value of type "PetscInt={long long} *"
>> cannot be used to initialize an entity of type "int *"
>>    int m=A->rmap.n,n=A->cmap.n,*ai=mat->i,*aj=mat->j,status,*ra,idx;
>>
>>
>> -----Original Message-----
>> From: Barry Smith [mailto:bsmith at mcs.anl.gov]
>> Sent: Tuesday, October 07, 2008 6:15 PM
>> To: Rhew, Jung-hoon
>> Cc: petsc-maint at mcs.anl.gov; Linton, Tom; Cea, Stephen M; Stettler,
>> Mark
>> Subject: Re: [PETSC #18391] PETSc crash with memory allocation in
>> ILU preconditioning
>>
>>
>>    During the symbolic phase of ILU(N) there is no way to know in
>> advance how many new nonzeros the factored matrix needs beyond the
>> original matrix (this is true for LU too).  We handle this by
>> starting with a certain amount of memory; if that is not enough for
>> the symbolic factor, we double the memory allocated, copy the values
>> over from the old copy of the symbolic factor (what has been computed
>> so far), and then free the old copy.
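>>
>> (A minimal sketch of that grow-and-copy pattern, in plain C rather
>> than the actual aijfact.c code:
>>
>>    #include <stdlib.h>
>>    #include <string.h>
>>
>>    /* Grow buf (currently *cap entries) by doubling until it can
>>       hold at least needed entries; keeps the work done so far. */
>>    static int *grow(int *buf, size_t *cap, size_t needed) {
>>      while (*cap < needed) {
>>        int *nbuf = (int *)malloc(2 * *cap * sizeof(int));
>>        if (!nbuf) return NULL;                /* out of memory    */
>>        memcpy(nbuf, buf, *cap * sizeof(int)); /* copy old factor  */
>>        free(buf);
>>        buf  = nbuf;
>>        *cap = 2 * *cap;
>>      }
>>      return buf;
>>    }
>> )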
>>
>>    To avoid this "memory doubling" (which is not very memory
>> efficient) you can use the option -pc_factor_fill or
>> PCFactorSetFill() to set slightly more than the "correct" fill value;
>> then only a single malloc is needed and you can do larger problems.
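>>
>> For example, assuming ksp is your KSP solver and taking an
>> illustrative fill ratio of 3.5:
>>
>>    PC pc;
>>    KSPGetPC(ksp, &pc);
>>    PCFactorSetFill(pc, 3.5);  /* expect ~3.5x the original nonzeros */
>>
>> or equivalently -pc_factor_fill 3.5 on the command line.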
>>
>>   Of course, the question is "what value should I use for fill?"
>> There is no formula; if there were, we would use it automatically. So
>> the only way I know is to run smaller problems and get a feel for
>> what the ratio should be for your larger problem. Run with
>> -info | grep pc_factor_fill and it will tell you what "you should
>> have used".
>>
>>   Hope this helps,
>>
>>    Barry
>>
>>
>>
>> On Oct 7, 2008, at 5:46 PM, Rhew, Jung-hoon wrote:
>>
>>> Hi,
>>>
>>> 1. I ran it on a 64-bit machine with 32GB of physical memory but it
>>> still crashed.  At the crash the peak memory was 17GB, so there was
>>> plenty of memory left.  This is why I don't think the simulation
>>> needed the full 32GB plus the more than 64GB of swap space.
>>>
>>> 2. The problem size is too big for a direct solver, as memory use
>>> can easily go beyond 32GB.  Actually, we use MUMPS for smaller
>>> problems.
>>>
>>> 3. ILU(N) is the most robust preconditioner we have found for our
>>> production simulation, so we want to stick with it.
>>>
>>> I think I'll send a test case that reproduces the problem.
>>>
>>> -----Original Message-----
>>> From: knepley at gmail.com [mailto:knepley at gmail.com] On Behalf Of
>>> Matthew Knepley
>>> Sent: Tuesday, October 07, 2008 2:21 PM
>>> To: Rhew, Jung-hoon
>>> Cc: PETSC Maintenance
>>> Subject: Re: [PETSC #18391] PETSc crash with memory allocation in
>>> ILU preconditioning
>>>
>>> It's not hard for ILU(k) to exceed the 32-bit limit for large
>>> matrices. I would recommend:
>>>
>>> 1) Using a 64-bit machine with more memory
>>>
>>> 2) Trying a sparse direct solver like MUMPS (see the example after
>>> this list)
>>>
>>> 3) Trying another preconditioner, which is of course problem
>>> dependent
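>>>
>>> For instance, in later PETSc releases MUMPS can be selected at run
>>> time with something like
>>>
>>>    -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps
>>>
>>> (the option name has varied across releases, so check the docs for
>>> your version).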
>>>
>>>  Thanks,
>>>
>>>    Matt
>>>
>>> On Tue, Oct 7, 2008 at 4:03 PM, Rhew, Jung-hoon
>>> <jung-hoon.rhew at intel.com> wrote:
>>>> Dear PETSc team,
>>>>
>>>> We use PETSc as a linear solver library in our tool, and in some
>>>> test cases using the ILU(N) preconditioner we have problems with
>>>> memory.  I'm not sending our matrix at this time since it is huge,
>>>> but if you think it is needed, I'll send it to you.
>>>>
>>>> Thanks for your help in advance.
>>>>
>>>>
>>>>
>>>> Log file is attached.
>>>> OS: suse 64bit sles9
>>>>
>>>> 2.6.5-7.276.PTF.196309.1-smp #1 SMP Mon Jul 24 10:45:31 UTC 2006
>>>> x86_64
>>>> x86_64 x86_64 GNU/Linux
>>>>
>>>> PETSc ver: petsc-2.3.2-p10
>>>> MPI implementation: Intel MPI based on MPICH2 and MVAPICH2
>>>> Compiler: GCC 4.2.2
>>>> Probable PETSc component: n/a
>>>> Problem Description
>>>>
>>>> Solver setting: BCGSL (L=2) and ILU(N=2)
>>>>
>>>> -ksp_rtol=1e-14
>>>> -ksp_type=bcgsl
>>>> -ksp_bcgsl_ell=2
>>>> -pc_factor_levels=2
>>>> -pc_factor_reuseordering
>>>> -pc_factor_zeropivot=0.0
>>>> -pc_type=ilu
>>>> -pc_factor_fill=2
>>>> -pc_factor_mat_ordering_type=rcm
>>>>
>>>>
>>>>
>>>> malloc crash: sparse matrix size ~ 500K by 500K with NNZ ~ 0.002%
>>>> (full error message is attached.)
>>>>
>>>> In the debugger, symbolic ILU requires memory beyond the max int.
>>>> At line 1089 in aijfact.c, len becomes -2147483648 because
>>>> (bi[n])*sizeof(PetscScalar) exceeds the max int:
>>>>
>>>> len = (bi[n])*sizeof(PetscScalar);
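>>>>
>>>> (For example, with the default double-precision PetscScalar of 8
>>>> bytes, bi[n] = 2^28 = 268,435,456 factored nonzeros gives
>>>> 268,435,456 * 8 = 2^31 bytes, which wraps to exactly the reported
>>>> -2147483648 in a signed 32-bit int.)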
>>>>
>>>>
>>>>
>>>> Then, it causes the following malloc error in subsequent function
>>>> calls (the call stack is also in the attached error message).
>>>>
>>>> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
>>>> [0]PETSC ERROR: Out of memory. This could be due to allocating
>>>> [0]PETSC ERROR: too large an object or bleeding by not properly
>>>> [0]PETSC ERROR: destroying unneeded objects.
>>>> [0]PETSC ERROR: Memory allocated -2147483648 Memory used by process -2147483648
>>>> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
>>>> [0]PETSC ERROR: Memory requested 18446744071912865792!
>>>> [0]PETSC ERROR: ------------------------------------------------------------------------
>>>> [0]PETSC ERROR: Petsc Release Version 2.3.2, Patch 10, Wed Mar 28 19:13:22 CDT 2007 HG revision: d7298c71db7f5e767f359ae35d33cab3bed44428
>>>>
>>>>
>>>>
>>>> Possibly relevant symptom: the iterative solver with ILU(N)
>>>> consumes more memory than a direct solver as N gets larger (>5),
>>>> although the matrix is not big enough to cause a malloc crash like
>>>> the one above.
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener
>>>
>>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener



