FW: [PETSC #18391] PETSc crash with memory allocation in ILU preconditioning

Deng, Ying ying.deng at intel.com
Fri Oct 10 18:29:38 CDT 2008


Hi, 


I am seeing problems when trying to build the petsc-dev code. My configure
line is below, the same one I used successfully for 2.3.2-p10. I tried with
MKL 9 and MKL 10 and get the same errors: references to undefined symbols.
Please let me know if you have any experience with this issue or suggestions
for resolving it.

Thanks,
Ying


./config/configure.py --with-batch=1 --with-clanguage=C++ \
  --with-vendor-compilers=intel \
  '--CXXFLAGS=-g -gcc-name=/usr/intel/pkgs/gcc/4.2.2/bin/g++ -gcc-version=420 ' \
  '--LDFLAGS=-L/usr/lib64 -L/usr/intel/pkgs/gcc/4.2.2/lib -ldl -lpthread -Qlocation,ld,/usr/intel/pkgs/gcc/4.2.2/x86_64-suse-linux/bin -L/usr/intel/pkgs/icc/10.1.008e/lib -lirc' \
  --with-cxx=$ICCDIR/bin/icpc --with-fc=$IFCDIR/bin/ifort \
  --with-mpi-compilers=0 --with-mpi-shared=0 --with-debugging=yes \
  --with-mpi=yes --with-mpi-include=$MPIDIR/include \
  --with-mpi-lib=\[$MPIDIR/lib64/libmpi.a,$MPIDIR/lib64/libmpiif.a,$MPIDIR/lib64/libmpigi.a\] \
  --with-blas-lapack-lib=\[$MKLLIBDIR/libguide.so,$MKLLIBDIR/libmkl_lapack.so,$MKLLIBDIR/libmkl_solver.a,$MKLLIBDIR/libmkl.so\] \
  --with-scalapack=yes --with-scalapack-include=$MKLDIR/include \
  --with-scalapack-lib=$MKLLIBDIR/libmkl_scalapack.a \
  --with-blacs=yes --with-blacs-include=$MKLDIR/include \
  --with-blacs-lib=$MKLLIBDIR/libmkl_blacs_intelmpi_lp64.a \
  --with-umfpack=1 \
  --with-umfpack-lib=\[$UMFPACKDIR/UMFPACK/Lib/libumfpack.a,$UMFPACKDIR/AMD/Lib/libamd.a\] \
  --with-umfpack-include=$UMFPACKDIR/UMFPACK/Include \
  --with-parmetis=1 --with-parmetis-dir=$PARMETISDIR \
  --with-mumps=1 --download-mumps=$PETSC_DIR/externalpackages/MUMPS_4.6.3.tar.gz \
  --with-superlu_dist=1 --download-superlu_dist=$PETSC_DIR/externalpackages/superlu_dist_2.0.tar.gz


....

/nfs/pdx/proj/dt/pdx_sde02/x86-64_linux26/petsc/petsc-dev/conftest.c:7:
undefined reference to `f2cblaslapack311_id_'
/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libguide.so:
undefined reference to `pthread_atfork'

....


--------------------------------------------------------------------------------------
You set a value for --with-blas-lapack-lib=<lib>, but
['/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libguide.so',
 '/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libmkl_lapack.so',
 '/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libmkl_solver.a',
 '/p/dt/sde/tools/x86-64_linux26/mkl/10.0.2.018/lib/em64t/libmkl.so']
cannot be used
*********************************************************************************


-----Original Message-----
From: Barry Smith [mailto:bsmith at mcs.anl.gov] 
Sent: Thursday, October 09, 2008 12:39 PM
To: Rhew, Jung-hoon
Cc: PETSc-Maint Smith; Linton, Tom; Cea, Stephen M; Stettler, Mark
Subject: Re: [PETSC #18391] PETSc crash with memory allocation in ILU
preconditioning


    We don't yet have all the code in place to use those packages with
64-bit integers. I will try to get them all working by Monday and will
let you know my progress. To use them you will need to be running
petsc-dev (http://www-unix.mcs.anl.gov/petsc/petsc-as/developers/index.html),
so if you are not using it yet you can switch to it now in preparation
for my updates.


    Barry

On Oct 9, 2008, at 12:52 PM, Rhew, Jung-hoon wrote:

> Hi,
>
> I found that the root cause of the malloc error was that our PETSc
> library had been compiled without the 64-bit flag.  Thus, PetscInt
> was defined as "int" instead of "long long", and for large problems
> the memory allocation requires more bytes than the maximum int can
> represent, causing integer overflow.
>
> But when I tried to build with the 64-bit flag
> (--with-64-bit-indices=1), all files associated with the external
> libraries built with PETSc (such as UMFPACK and MUMPS) started failing
> to compile, mainly due to the incompatibility between "int" in those
> libraries and "long long" in PETSc.
>
> I wonder if you can let us know how to resolve this conflict when
> building PETSc with 64-bit indices.  The brute-force way is to change
> the source code of those libraries where the conflicts occur, but I
> wonder if there is a neater way of doing this.
>
> Thanks.
> jr
>
> Example:
> libfast in: /nfs/ltdn/disks/td_disk49/usr.cdmg/jrhew/work/mds_work/
> PETSC/mypetsc-2.3.2-p10/src/mat/impls/aij/seq/umfpack
>
> umfpack.c(154): error: a value of type "PetscInt={long long} *"
> cannot be used to initialize an entity of type "int *"
>    int          m=A->rmap.n,n=A->cmap.n,*ai=mat->i,*aj=mat->j,status,*ra,idx;
>
>
> -----Original Message-----
> From: Barry Smith [mailto:bsmith at mcs.anl.gov]
> Sent: Tuesday, October 07, 2008 6:15 PM
> To: Rhew, Jung-hoon
> Cc: petsc-maint at mcs.anl.gov; Linton, Tom; Cea, Stephen M; Stettler,
> Mark
> Subject: Re: [PETSC #18391] PETSc crash with memory allocation in
> ILU preconditioning
>
>
>    During the symbolic phase of ILU(N) there is no way to know in
> advance how many new nonzeros are needed in the factored version over
> the original matrix (this is true for LU too).  We handle this by
> starting with a certain amount of memory; if that is not enough for
> the symbolic factor we double the memory allocated, copy the values
> over from the old copy of the symbolic factor (what has been computed
> so far), and then free the old copy.
>
>    To avoid this "memory doubling" (which is not super memory
> efficient) you can use the option -pc_factor_fill or PCFactorSetFill()
> to set slightly more than the "correct" fill value; then only a single
> malloc is needed and you can do larger problems.
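For reference, setting the fill up front looks roughly like this (a sketch
only; it assumes a KSP object named ksp already exists, and the fill value
3.5 is just a placeholder to be replaced by your own estimate):

    PC             pc;
    PetscErrorCode ierr;
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCILU);CHKERRQ(ierr);
    ierr = PCFactorSetLevels(pc,2);CHKERRQ(ierr);  /* ILU(2) */
    ierr = PCFactorSetFill(pc,3.5);CHKERRQ(ierr);  /* expected nnz(factor)/nnz(A) */

The same thing from the command line would be -pc_type ilu
-pc_factor_levels 2 -pc_factor_fill 3.5.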
>
>   Of course, the question is "what value should I use for fill?"
> There is no formula; if there were, we would use it automatically. So
> the only way I know is to run smaller problems and get a feel for what
> the ratio should be for your larger problem. Run with -info | grep
> pc_factor_fill and it will tell you what "you should have used".
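A calibration run of that sort might look like the following (the executable
name and the option values are placeholders; the grep just filters the
fill-related lines out of the rather verbose -info output):

    ./myapp -ksp_type bcgsl -pc_type ilu -pc_factor_levels 2 \
            -pc_factor_fill 2 -info 2>&1 | grep -i fill

The symbolic factorization reports the fill ratio it actually needed, which
can then be fed back in through -pc_factor_fill (or PCFactorSetFill()) for
the large production run.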
>
>   Hope this helps,
>
>    Barry
>
>
>
> On Oct 7, 2008, at 5:46 PM, Rhew, Jung-hoon wrote:
>
>> Hi,
>>
>> 1. I ran it on a 64-bit machine with 32 GB of physical memory but it
>> still crashed.  At the crash the peak memory was 17 GB, so there was
>> plenty of memory left.  This is why I don't think the simulation
>> needed the full 32 GB plus swap (more than 64 GB in total).
>>
>> 2. The problem size is too big for a direct solver, as it can easily
>> go beyond 32 GB.  Actually, we use MUMPS for smaller problems.
>>
>> 3. ILU(N) is the most robust preconditioner we have found for our
>> production simulations, so we want to stick with it.
>>
>> I think I'll send a test case that reproduces the problem.
>>
>> -----Original Message-----
>> From: knepley at gmail.com [mailto:knepley at gmail.com] On Behalf Of
>> Matthew Knepley
>> Sent: Tuesday, October 07, 2008 2:21 PM
>> To: Rhew, Jung-hoon
>> Cc: PETSC Maintenance
>> Subject: Re: [PETSC #18391] PETSc crash with memory allocation in
>> ILU preconditioning
>>
>> It's not hard for ILU(k) to exceed the 32-bit limit for large
>> matrices. I would recommend
>>
>> 1) Using a 64-bit machine with more memory
>>
>> 2) Trying a sparse direct solver like MUMPS
>>
>> 3) Trying another preconditioner, which is of course problem
>> dependent
>>
>>  Thanks,
>>
>>    Matt
>>
>> On Tue, Oct 7, 2008 at 4:03 PM, Rhew, Jung-hoon
>> <jung-hoon.rhew at intel.com> wrote:
>>> Dear PETSc team,
>>>
>>> We use PETSc as a linear solver library in our tool, and in some
>>> test cases using the ILU(N) preconditioner we have problems with
>>> memory.  I'm not sending our matrix at this time since it is huge,
>>> but if you think it is needed I'll send it to you.
>>>
>>> Thanks for your help in advance.
>>>
>>>
>>>
>>> Log file is attached.
>>> OS: suse 64bit sles9
>>>
>>> 2.6.5-7.276.PTF.196309.1-smp #1 SMP Mon Jul 24 10:45:31 UTC 2006
>>> x86_64
>>> x86_64 x86_64 GNU/Linux
>>>
>>> PETSc ver: petsc-2.3.2-p10
>>> MPI implementation: Intel MPI based on MPICH2 and MVAPICH2
>>> Compiler: GCC 4.2.2
>>> Probable PETSc component: n/a
>>> Problem Description
>>>
>>> Solver setting: BCGSL (L=2) and ILU(N=2)
>>>
>>> -ksp_rtol=1e-14
>>>
>>> -ksp_type=bcgsl
>>>
>>> -ksp_bcgsl_ell=2
>>>
>>> -pc_factor_levels=2
>>>
>>> -pc_factor_reuseordering
>>>
>>> -pc_factor_zeropivot=0.0
>>>
>>> -pc_type=ilu
>>>
>>> -pc_factor_fill=2
>>>
>>> -pc_factor_mat_ordering_type=rcm
>>>
>>>
>>>
>>> malloc crash: sparse matrix size ~ 500K by 500K with NNZ ~ 0.002%
>>> (full
>>> error message is attached.)
>>>
>>> In the debugger, symbolic ILU requires memory beyond the maximum
>>> int.  At line 1089 in aijfact.c, len becomes -2147483648 because
>>> (bi[n])*sizeof(PetscScalar) > max int.
>>>
>>> len = (bi[n])*sizeof(PetscScalar);
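To see the magnitudes involved, here is a small stand-alone illustration
(the nonzero count is hypothetical; the actual bi[n] for this matrix is not
given in the report):

    #include <stdio.h>
    #include <limits.h>
    int main(void)
    {
      long long nnz   = 270000000LL;   /* hypothetical nonzeros in the ILU(2) factor */
      long long bytes = nnz * 8LL;     /* sizeof(PetscScalar) == 8 for double        */
      printf("bytes needed  : %lld\n", bytes);     /* about 2.16e9                        */
      printf("INT_MAX       : %d\n",  INT_MAX);    /* 2147483647                          */
      printf("as 32-bit int : %d\n",  (int)bytes); /* implementation-defined; negative on
                                                      typical platforms                   */
      return 0;
    }

Once the byte count exceeds INT_MAX, a 32-bit len goes negative, and
sign-extending that negative value into the unsigned 64-bit size requested
from the allocator is what produces the enormous "Memory requested" figure
in the error message below.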
>>>
>>>
>>>
>>> Then, it causes the following malloc error in subsequent function
>>> calls (the
>>> call stack is also in the attached error message).
>>>
>>> [0]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>>
>>> [0]PETSC ERROR: Out of memory. This could be due to allocating
>>>
>>> [0]PETSC ERROR: too large an object or bleeding by not properly
>>>
>>> [0]PETSC ERROR: destroying unneeded objects.
>>>
>>> [0]PETSC ERROR: Memory allocated -2147483648 Memory used by process
>>> -2147483648
>>>
>>> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for
>>> info.
>>>
>>> [0]PETSC ERROR: Memory requested 18446744071912865792!
>>>
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>>
>>> [0]PETSC ERROR: Petsc Release Version 2.3.2, Patch 10, Wed Mar 28
>>> 19:13:22
>>> CDT 2007 HG revision: d7298c71db7f5e767f359ae35d33cab3bed44428
>>>
>>>
>>>
>>> Possibly relevant symptom: the iterative solver with ILU(N) consumes
>>> more memory than a direct solver as N gets larger (>5), although the
>>> matrix is not big enough to cause a malloc crash like the one above.
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
>>
>






