[petsc-users] Problem on LU factorization

Hong Zhang hzhang at mcs.anl.gov
Wed Jan 26 09:39:17 CST 2011


As Matt said, there is not enough information about which solver combination
is being used.
Since superlu_dist works while the sequential LU through PETSc/SuperLU fails,
you might need a parallel LU, or a shift if a zero pivot causes the crash. As
Satish suggested, run the code with a debug build, which will display more
informative error messages.
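The zero-pivot shift mentioned above can be illustrated with a toy dense LU. This is a simplified sketch of the idea only, not how SuperLU or PETSc implement it: an LU factorization without row pivoting stops when it meets a zero pivot, and adding a small shift to that diagonal entry lets it continue, at the cost of factoring a slightly perturbed matrix. The names `lu_nopivot` and `ZeroPivotError` and the default tolerance are hypothetical choices for this example.

```python
class ZeroPivotError(ArithmeticError):
    """Raised when LU without pivoting meets a (numerically) zero pivot."""


def lu_nopivot(a, shift=0.0, tol=1e-14):
    """Right-looking LU without row pivoting, returned as one packed matrix.

    Entries below the diagonal hold the multipliers of the unit lower factor L;
    the diagonal and above hold U. If `shift` is nonzero, a pivot with
    |pivot| < tol is replaced by pivot + shift, so the result factors a
    slightly perturbed matrix (illustrative only).
    """
    n = len(a)
    lu = [list(row) for row in a]  # work on a copy; the input is not modified
    for k in range(n):
        if abs(lu[k][k]) < tol:
            if shift == 0.0:
                raise ZeroPivotError(f"zero pivot in row {k}")
            lu[k][k] += shift  # perturb the diagonal instead of failing
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier l_ik
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu
```

With A = [[0.0, 1.0], [1.0, 1.0]], `lu_nopivot(A)` raises `ZeroPivotError`, while `lu_nopivot(A, shift=1e-8)` completes with a multiplier of about 1e8, which also shows why a very small shift can cost accuracy.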

Hong
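Both Hong and Satish suggest rebuilding PETSc in debug mode. A minimal sketch, assuming you start from the `configure_options` list in the bgl-ibm-goto_lapack.py script quoted below: the hypothetical helper `debug_variant` sets `--with-debugging=1`, swaps the `-O` optimization flags in `-COPTFLAGS`/`-CXXOPTFLAGS`/`-FOPTFLAGS` for `-g`, and gives the build its own `PETSC_ARCH` so the optimized build is not overwritten. Keeping the remaining BG/L-specific XL flags in a debug build is an assumption to check locally.

```python
def debug_variant(options, arch):
    """Return a copy of configure `options` adapted for a debug build.

    Hypothetical helper: sets --with-debugging=1, replaces -O* optimization
    flags in the *OPTFLAGS options with -g, and renames PETSC_ARCH so the
    optimized build is left untouched. All other options are copied as-is.
    """
    optflag_prefixes = ('-COPTFLAGS=', '-CXXOPTFLAGS=', '-FOPTFLAGS=')
    result = []
    for opt in options:
        if opt == '--with-debugging=0':
            result.append('--with-debugging=1')
        elif opt.startswith(optflag_prefixes):
            name, flags = opt.split('=', 1)
            # Drop -O2/-O3 etc.; keep the BG/L-specific flags (assumption).
            kept = [f for f in flags.split() if not f.startswith('-O')]
            result.append(name + '=' + ' '.join(['-g'] + kept))
        elif opt.startswith('-PETSC_ARCH='):
            result.append('-PETSC_ARCH=' + arch)
        else:
            result.append(opt)
    return result
```

In the script, `configure_options = debug_variant(configure_options, 'bgl-ibm-debug')` could go just before `configure.petsc_configure(configure_options)`; the arch name is only an example.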

On Tue, Jan 25, 2011 at 11:19 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> I don't see anything obviously wrong with this build.
>
> I guess the other thing to do is to build a debug version on the machine
> and run it in a debugger to determine the problem. [I believe there is a
> way to debug on BG/L.]
>
> Satish
>
> On Tue, 25 Jan 2011, Rongliang Chen wrote:
>
>> Hi Balay,
>>
>> Thank you for your reply.
>> I have checked my code with valgrind on my own computer and there is no
>> problem.
>> But when I run my code on the IBM Blue Gene/L with
>> "-sub_pc_factor_mat_solver_package superlu", it has such problem.
>> Since valgrind is not available on IBM Blue Gene/L, I cannot test my code
>> with valgrind there.
>>
>> But if I use PETSc's default LU factorization, there is no such problem.
>> So I suspect that there is some problem with my PETSc installation.
>> Can you help me check whether my installation is correct?
>> Below are the details of the installation; configure.log and
>> make.log are attached.
>>
>> Installing PETSc on IBM Blue Gene/L:
>>
>> 1. patch -p0 < /contrib/bgl/petsc/petsc-3.0.0-p4/petsc-3.0.0-p4.patch
>> 2. ./config/bgl-ibm-goto_lapack.py, where "bgl-ibm-goto_lapack.py" is:
>> ******************************************************************************************
>> #!/usr/bin/env python
>> #
>> # BGL has broken 'libc' dependencies. The option 'LIBS' is used to
>> # workaround this problem.
>> #
>> # LIBS="-lc -lnss_files -lnss_dns -lresolv"
>> #
>> # Another workaround is to modify mpicc/mpif77 scripts and make them
>> # link with the corresponding compilers, and these additional
>> # libraries. The following tarball has the modified compiler scripts
>> #
>> # ftp://ftp.mcs.anl.gov/pub/petsc/tmp/petsc-bgl-tools.tar.gz
>> #
>> configure_options = [
>>   '--with-cc=/contrib/bgl/bin/mpxlc',
>>   '--with-cxx=/contrib/bgl/bin/mpxlC',
>>   '--with-fc=/contrib/bgl/bin/mpxlf -qnosave',
>>   '--with-mpi-dir=/bgl/BlueLight/ppcfloor/bglsys',  # required by BLACS to get mpif.h
>>   '--with-lapack-lib=/contrib/bgl/lib/liblapack440.a',
>>   '--with-blas-lib=/contrib/bgl/lib/libblas440.a',
>> #  '--with-blas-lapack-lib=-L/contrib/bgl/lib -llapack440 -L/contrib/bgl/lib -lgoto',
>>
>>   '--with-is-color-value-type=short',
>>   '--with-shared=0',
>>
>>   '-COPTFLAGS=-O2 -qbgl -qarch=440d -qtune=440 -qmaxmem=-1',
>>   '-CXXOPTFLAGS=-O2 -qbgl -qarch=440d -qtune=440 -qmaxmem=-1',
>>   '-FOPTFLAGS=-O2 -qbgl -qarch=440d -qtune=440 -qmaxmem=-1',
>>   '--with-debugging=0',
>>
>>   # the following option gets automatically enabled on BGL with IBM compilers.
>>   # '--with-fortran-kernels=bgl'
>>
>>   '--with-x=0',
>>   '--with-x11=0',
>>   '--with-batch=1',
>>   '--with-memcmp-ok',
>>   '--sizeof-char=1',
>>   '--sizeof-void-p=4',
>>   '--sizeof-short=2',
>>   '--sizeof-int=4',
>>   '--sizeof-long=4',
>>   '--sizeof-size-t=4',
>>   '--sizeof-long-long=8',
>>   '--sizeof-float=4',
>>   '--sizeof-double=8',
>>   '--bits-per-byte=8',
>>   '--sizeof-MPI-Comm=4',
>>   '--sizeof-MPI-Fint=4',
>>   '--have-mpi-long-double=1',
>>
>>   '--download-superlu=/home/rchen/soft/petsc-3.1-p7-nodebug/externalpackages/superlu_4.0-March_7_2010.tar.gz',
>>   '--download-superlu_dist=/home/rchen/soft/petsc-3.1-p7-nodebug/externalpackages/SuperLU_DIST_2.4-hg-v2.tar.gz',
>>   '--download-parmetis=/home/rchen/soft/petsc-3.1-p7-nodebug/externalpackages/ParMetis-dev-p3.tar.gz',
>>   '--download-scalapack=/home/rchen/soft/petsc-3.1-p7-nodebug/externalpackages/scalapack.tgz',
>>   '--download-blacs=/home/rchen/soft/petsc-3.1-p7-nodebug/externalpackages/blacs-dev.tar.gz',
>>   '--download-f-blas-lapack=/home/rchen/soft/petsc-3.1-p7-nodebug/externalpackages/fblaslapack-3.1.1.tar.gz',
>>   '--download-mumps=/home/rchen/soft/petsc-3.1-p7-nodebug/externalpackages/MUMPS_4.9.2.tar.gz',
>>
>> #  '--download-f-blas-lapack=1',
>> #  '--download-hypre=1',
>> #  '--download-spooles=1',
>> #  '--download-superlu=1',
>> #  '--download-parmetis=1',
>> #  '--download-superlu_dist=1',
>> #  '--download-blacs=1',
>>
>>   '-PETSC_ARCH=bgl-ibm-goto-O3_440d'
>>   ]
>>
>> if __name__ == '__main__':
>>   import sys,os
>>   sys.path.insert(0,os.path.abspath('config'))
>>   import configure
>>   configure.petsc_configure(configure_options)
>>
>> # Extra options used for testing locally
>> test_options = []
>> ************************************************************************
>> 3. cqsub -n 1 -t 20 -O conftest -q debug ./conftest
>> 4. ./reconfigure.py
>> 5. make all
>>
>> Thank you!
>>
>> Best,
>>
>> Rongliang
>>
>>
>> ----------------------------------------------------------------------
>> >
>> > Message: 1
>> > Date: Mon, 24 Jan 2011 16:06:22 -0600 (CST)
>> > From: Satish Balay <balay at mcs.anl.gov>
>> > Subject: Re: [petsc-users] Problem on LU factorization
>> > To: PETSc users list <petsc-users at mcs.anl.gov>
>> >
>> > On Mon, 24 Jan 2011, Matthew Knepley wrote:
>> >
>> > > > When I use superlu with command line "-sub_pc_factor_mat_solver_package
>> > > > superlu", it said
>> > > >
>> > > > "[43]PETSC ERROR: ------------------------------------------------------------------------
>> > > > [43]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> > > > [43]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> > > > [43]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
>> > > > [43]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> > > > [43]PETSC ERROR: likely location of problem given in stack below
>> > > > [43]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> > > > [43]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> > > > [43]PETSC ERROR:       INSTEAD the line number of the start of the function
>> > > > [43]PETSC ERROR:       is given.
>> > > > [43]PETSC ERROR: [43] MatLUFactorNumeric_SuperLU line 121 src/mat/impls/aij/seq/superlu/superlu.c
>> > > > [43]PETSC ERROR: [43] MatLUFactorNumeric line 2575 src/mat/interface/matrix.c
>> > > > ............................
>> > > > "
>> > >
>> > > Please confirm that you have the latest patch level. If so, send the matrix
>> > > in PETSc binary format to petsc-maint at mcs.anl.gov
>> > > along with the precise solver options and the output of -ksp_view.
>> >
>> > More likely there is memory corruption somewhere; you should run this
>> > code with valgrind to weed out such issues.
>> >
>> > Satish
>> >
>>
>
>
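Matt asks for the matrix in PETSc binary format. Inside a PETSc program the usual route is `MatView()` on a viewer opened with `PetscViewerBinaryOpen()`. When the matrix is only available outside PETSc, a minimal Python sketch of the layout follows, assuming PETSc's defaults of 32-bit integer indices and real double-precision scalars: a big-endian header holding the matrix class id 1211216, the row count, the column count and the total number of nonzeros, followed by the nonzero count of each row, all column indices and all values. The function name `petsc_binary_aij` is hypothetical; loading the file back with `MatLoad()` is a sensible check before sending it.

```python
import struct

MAT_FILE_CLASSID = 1211216  # class id PETSc writes at the start of a binary matrix


def petsc_binary_aij(nrows, ncols, row_ptr, col_idx, values):
    """Serialize a CSR matrix to bytes in PETSc's binary matrix layout.

    Assumes 32-bit indices and real double-precision values (PETSc defaults);
    every field is written big-endian.
    """
    nnz = row_ptr[-1]
    if len(row_ptr) != nrows + 1 or len(col_idx) != nnz or len(values) != nnz:
        raise ValueError("inconsistent CSR arrays")
    row_lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(nrows)]
    out = struct.pack(">4i", MAT_FILE_CLASSID, nrows, ncols, nnz)
    out += struct.pack(f">{nrows}i", *row_lengths)  # nonzeros per row
    out += struct.pack(f">{nnz}i", *col_idx)        # column indices, row by row
    out += struct.pack(f">{nnz}d", *values)         # matching values
    return out
```

For example, `petsc_binary_aij(2, 2, [0, 2, 3], [0, 1, 1], [4.0, 1.0, 3.0])` encodes the 2x2 matrix [[4, 1], [0, 3]] as 60 bytes, which can then be written to a file opened in `"wb"` mode.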

