[petsc-dev] [petsc-users] Superlu_dist error
Matthew Knepley
knepley at gmail.com
Sun Oct 6 13:11:47 CDT 2013
On Sun, Oct 6, 2013 at 12:52 PM, Jose David Bermeol <jbermeol at purdue.edu> wrote:
> Hi again, now I'm getting the error again. The problem happens when I'm
> using the flag -O3 for the compiler. So what should I do next to solve
> this?
>
This sounds like an Intel compiler bug. We have seen lots of these. Honestly,
for the kinds of operations in PETSc, you will see no speed improvement over
GNU. Do you use something in your code that runs faster with Intel? If so,
can you upgrade?
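
If you want to test that quickly, a rebuild with the GNU wrappers and a
milder optimization level would look something like this (a sketch only;
the mpicc/mpicxx/mpif90 wrapper names depend on your MPI installation):

    ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
      --with-scalar-type=complex --with-shared-libraries=1 --with-debugging=1 \
      --download-f-blas-lapack --download-superlu_dist --download-superlu \
      --download-parmetis --download-metis \
      COPTFLAGS=-O2 CXXOPTFLAGS=-O2 FOPTFLAGS=-O2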
Thanks,
Matt
> Attached is my code, and for this example I'm creating the matrix in the
> code.
>
> Thanks
>
> Nonzeros in L 10
> Nonzeros in U 10
> nonzeros in L+U 10
> nonzeros in LSUB 10
> NUMfact space (MB) sum(procs): L\U 0.00 all 0.03
> Total highmark (MB): All 0.03 Avg 0.02 Max
> 0.02
> Mat conversion(PETSc->SuperLU_DIST) time (max/min/avg):
> 0.000124216 / 4.81606e-05 / 8.61883e-05
> EQUIL time 0.00
> ROWPERM time 0.00
> COLPERM time 0.00
> SYMBFACT time 0.00
> DISTRIBUTE time 0.00
> FACTOR time 0.00
> Factor flops 1.000000e+02 Mflops 0.31
> SOLVE time 0.00
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try
> http://valgrind.org on GNU/linux and Apple Mac OS X to find memory
> corruption errors
> [1]PETSC ERROR: likely location of problem given in stack below
> [1]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [1]PETSC ERROR: INSTEAD the line number of the start of the function
> [1]PETSC ERROR: is given.
> [1]PETSC ERROR: [1] SuperLU_DIST:pzgssvx line 234
> /home/jbermeol/petsc/petsc_superlu_dist/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [1]PETSC ERROR: [1] MatMatSolve_SuperLU_DIST line 198
> /home/jbermeol/petsc/petsc_superlu_dist/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [1]PETSC ERROR: [1] MatMatSolve line 3207
> /home/jbermeol/petsc/petsc_superlu_dist/src/mat/interface/matrix.c
> [1]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [1]PETSC ERROR: Signal received!
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Petsc Release Version 3.4.2, Jul, 02, 2013
> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
> [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [1]PETSC ERROR: See docs/index.html for manual pages.
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: ./test_solver on a arch-linux2-c-debug named
> carter-fe00.rcac.purdue.edu by jbermeol Sun Oct 6 13:43:00 2013
> [1]PETSC ERROR: Libraries linked from
> /home/jbermeol/petsc/petsc_superlu_dist/arch-linux2-c-debug/lib
> [1]PETSC ERROR: Configure run at Sun Oct 6 13:38:20 2013
> [1]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc
> --with-fc=mpiifort --with-scalar-type=complex --with-shared-libraries=1
> --with-debugging=1 --download-f-blas-lapack --download-superlu_dist=yes
> --download-superlu=yes --download-parmetis=yes --download-metis
> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: application called MPI_Abort(MPI_COMM_WORLD, 59) - process
> 1
> Caught signal number 11 SEGV: Segmentation Violation, probably memory
> access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try
> http://valgrind.org on GNU/linux and Apple Mac OS X to find memory
> corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [0]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: [0] SuperLU_DIST:pzgssvx line 234
> /home/jbermeol/petsc/petsc_superlu_dist/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [0]PETSC ERROR: [0] MatMatSolve_SuperLU_DIST line 198
> /home/jbermeol/petsc/petsc_superlu_dist/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [0]PETSC ERROR: [0] MatMatSolve line 3207
> /home/jbermeol/petsc/petsc_superlu_dist/src/mat/interface/matrix.c
> [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [0]PETSC ERROR: Signal received!
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.4.2, Jul, 02, 2013
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: ./test_solver on a arch-linux2-c-debug named
> carter-fe00.rcac.purdue.edu by jbermeol Sun Oct 6 13:43:00 2013
> [0]PETSC ERROR: Libraries linked from
> /home/jbermeol/petsc/petsc_superlu_dist/arch-linux2-c-debug/lib
> [0]PETSC ERROR: Configure run at Sun Oct 6 13:38:20 2013
> [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc
> --with-fc=mpiifort --with-scalar-type=complex --with-shared-libraries=1
> --with-debugging=1 --download-f-blas-lapack --download-superlu_dist=yes
> --download-superlu=yes --download-parmetis=yes --download-metis
> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>
> ----- Original Message -----
> From: "Matthew Knepley" <knepley at gmail.com>
> To: "Jose David Bermeol" <jbermeol at purdue.edu>
> Cc: "For users of the development version of PETSc" <petsc-dev at mcs.anl.gov>,
> petsc-users at mcs.anl.gov
> Sent: Sunday, October 6, 2013 8:19:30 AM
> Subject: Re: [petsc-users] Superlu_dist error
>
>
> On Sun, Oct 6, 2013 at 12:10 AM, Jose David Bermeol <jbermeol at purdue.edu> wrote:
>
> Hi again, I compile with the following configuration:
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-scalar-type=complex --with-shared-libraries=1 --with-debugging=1
> --with-pic=1 --with-clanguage=C++ --with-fortran=1 --with-fortran-kernels=0
> --download-f-blas-lapack --download-superlu_dist=yes --download-superlu=yes
> --download-parmetis=yes --download-metis
>
> Get rid of:
>
>     --with-pic=1 --with-fortran=1 --with-fortran-kernels=0
>
> since they do not really do anything, and just put back MKL. I suspect you
> will get a crash, and then it sounds like an MKL bug, or a bizarre
> incompatibility between SuperLU and MKL. If not, we will explore further.
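>
> Concretely, the trimmed configure line would look something like this (a
> sketch; keep the MKL path as it is on your system):
>
>     ./configure --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort \
>       --with-scalar-type=complex --with-shared-libraries=1 --with-debugging=1 \
>       --with-clanguage=C++ \
>       --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl \
>       --download-superlu_dist --download-superlu --download-parmetis --download-metis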
>
>
> Thanks,
>
>
> Matt
>
>
> And my code runs perfectly, so that means it is an MKL problem or a
> mismatch between versions. So how do I test what the problem is?
>
> Thanks
>
> ----- Original Message -----
> From: "Matthew Knepley" < knepley at gmail.com >
> To: "Jose David Bermeol" < jbermeol at purdue.edu >
> Cc: "For users of the development version of PETSc" <
> petsc-dev at mcs.anl.gov >, petsc-users at mcs.anl.gov
> Sent: Saturday, October 5, 2013 11:55:24 PM
> Subject: Re: [petsc-users] Superlu_dist error
>
>
> On Sat, Oct 5, 2013 at 10:49 PM, Jose David Bermeol <jbermeol at purdue.edu> wrote:
>
> Hi, I'm running petsc trying to solve a linear system with superlu_dist.
> However I have a memory violation; attached is the code, and here is the
> output. Email me if you need something else to figure out what is
> happening.
>
> So it looks like SuperLU_DIST is bombing during an LAPACK operation. It
> could be an MKL problem, or a SuperLU_DIST problem, or our problem, or a
> mismatch between versions. I would try to simplify the configuration in
> order to cut down on the possibilities. Eliminate everything that is not
> necessary for SuperLU_DIST first. Then change to --download-f-blas-lapack.
> If you still have a crash, send us the matrix since that should be
> reproducible and we can report a SuperLU_DIST bug or fix our code.
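>
> If it helps, the call path in the stack trace can be exercised with a short
> driver along these lines (a sketch against the 3.4 API, not your actual
> code; A must be an assembled MPIAIJ matrix and B, X dense matrices of
> matching size):
>
>     Mat            F;
>     IS             perm, iperm;
>     MatFactorInfo  info;
>     PetscErrorCode ierr;
>
>     ierr = MatFactorInfoInitialize(&info);CHKERRQ(ierr);
>     /* natural ordering; external packages ignore it anyway */
>     ierr = MatGetOrdering(A, MATORDERINGNATURAL, &perm, &iperm);CHKERRQ(ierr);
>     ierr = MatGetFactor(A, MATSOLVERSUPERLU_DIST, MAT_FACTOR_LU, &F);CHKERRQ(ierr);
>     ierr = MatLUFactorSymbolic(F, A, perm, iperm, &info);CHKERRQ(ierr);
>     ierr = MatLUFactorNumeric(F, A, &info);CHKERRQ(ierr);
>     ierr = MatMatSolve(F, B, X);CHKERRQ(ierr);  /* the call that SEGVs in the trace */
>
> You can dump the matrix for us in PETSc binary format with:
>
>     PetscViewer viewer;
>     ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.dat", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
>     ierr = MatView(A, viewer);CHKERRQ(ierr);
>     ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
>
> Running under valgrind, e.g. mpiexec -n 2 valgrind -q ./test_solver, will
> often pinpoint the bad access as well.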
>
>
> Thanks,
>
>
> Matt
>
>
> Thanks
>
> mpiexec -n 2 ./test_solver -mat_superlu_dist_statprint
> -mat_superlu_dist_matinput distributed
> Nonzeros in L 10
> Nonzeros in U 10
> nonzeros in L+U 10
> nonzeros in LSUB 10
> NUMfact space (MB) sum(procs): L\U 0.00 all 0.03
> Total highmark (MB): All 0.03 Avg 0.02 Max 0.02
> Mat conversion(PETSc->SuperLU_DIST) time (max/min/avg):
> 0.000146866 / 0.000145912 / 0.000146389
> EQUIL time 0.00
> ROWPERM time 0.00
> COLPERM time 0.00
> SYMBFACT time 0.00
> DISTRIBUTE time 0.00
> FACTOR time 0.00
> Factor flops 1.000000e+02 Mflops 0.31
> SOLVE time 0.00
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try
> http://valgrind.org on GNU/linux and Apple Mac OS X to find memory
> corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [1]PETSC ERROR: likely location of problem given in stack below
> [1]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [0]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [0]PETSC ERROR: [1]PETSC ERROR: INSTEAD the line number of the start of
> the function
> [1]PETSC ERROR: is given.
> [1]PETSC ERROR: [1] SuperLU_DIST:pzgssvx line 234
> /home/jbermeol/Nemo5/libs/petsc/build-cplx/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [1]PETSC ERROR: [1] MatMatSolve_SuperLU_DIST line 198
> /home/jbermeol/Nemo5/libs/petsc/build-cplx/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [1]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: [0] SuperLU_DIST:pzgssvx line 234
> /home/jbermeol/Nemo5/libs/petsc/build-cplx/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [0]PETSC ERROR: [1] MatMatSolve line 3207
> /home/jbermeol/Nemo5/libs/petsc/build-cplx/src/mat/interface/matrix.c
> [1]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [1]PETSC ERROR: [0] MatMatSolve_SuperLU_DIST line 198
> /home/jbermeol/Nemo5/libs/petsc/build-cplx/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [0]PETSC ERROR: [0] MatMatSolve line 3207
> /home/jbermeol/Nemo5/libs/petsc/build-cplx/src/mat/interface/matrix.c
> Signal received!
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Petsc Release Version 3.4.2, Jul, 02, 2013
> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
> [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [1]PETSC ERROR: See docs/index.html for manual pages.
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: ./test_solver on a linux-complex named
> carter-fe02.rcac.purdue.edu by jbermeol Sat Oct 5 23:45:21 2013
> [1]PETSC ERROR: [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [0]PETSC ERROR: Libraries linked from
> /home/jbermeol/Nemo5/libs/petsc/build-cplx/linux-complex/lib
> [1]PETSC ERROR: Configure run at Sat Oct 5 11:19:36 2013
> [1]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc
> --with-fc=mpiifort --with-scalar-type=complex --with-shared-libraries=1
> --with-debugging=1 --with-pic=1 --with-clanguage=C++ --with-fortran=1
> --with-fortran-kernels=0
> --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl
> --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so
> --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include
> --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64
> -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64"
> --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include
> --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1 --COPTFLAGS=-O3
> --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3
> --with-mkl-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include
> --with-mkl-lib="[/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so,/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_thread.so,/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so,/apps/rhel6/intel/composer_xe_2013.3.163/mkl/../compiler/lib/intel64/libiomp5.so]"
> --with-cpardiso-dir=/home/jbermeol/testPetscSolvers/intel_mkl_cpardiso
> --with-hdf5 --download-hdf5=yes --download-metis=yes
> --download-parmetis=yes --download-superlu_dist=yes --download-superlu=yes
> --download-mumps=yes --download-spooles=yes --download-pastix=yes
> --download-ptscotch=yes --download-umfpack=yes --download-sowing
> Signal received!
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.4.2, Jul, 02, 2013
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: ./test_solver on a linux-complex named
> carter-fe02.rcac.purdue.edu by jbermeol Sat Oct 5 23:45:21 2013
> [0]PETSC ERROR: Libraries linked from
> /home/jbermeol/Nemo5/libs/petsc/build-cplx/linux-complex/lib
> [0]PETSC ERROR: Configure run at Sat Oct 5 11:19:36 2013
> [0]PETSC ERROR: application called MPI_Abort(MPI_COMM_WORLD, 59) - process
> 1
> Configure options --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-scalar-type=complex --with-shared-libraries=1 --with-debugging=1
> --with-pic=1 --with-clanguage=C++ --with-fortran=1 --with-fortran-kernels=0
> --with-blas-lapack-dir=/apps/rhel6/intel/composer_xe_2013.3.163/mkl
> --with-blacs-lib=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so
> --with-blacs-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include
> --with-scalapack-lib="-L/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64
> -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64"
> --with-scalapack-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include
> --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1 --COPTFLAGS=-O3
> --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3
> --with-mkl-include=/apps/rhel6/intel/composer_xe_2013.3.163/mkl/include
> --with-mkl-lib="[/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so,/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_thread.so,/apps/rhel6/intel/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so,/apps/rhel6/intel/composer_xe_2013.3.163/mkl/../compiler/lib/intel64/libiomp5.so]"
> --with-cpardiso-dir=/home/jbermeol/testPetscSolvers/intel_mkl_cpardiso
> --with-hdf5 --download-hdf5=yes --download-metis=yes
> --download-parmetis=yes --download-superlu_dist=yes --download-superlu=yes
> --download-mumps=yes --download-spooles=yes --download-pastix=yes
> --download-ptscotch=yes --download-umfpack=yes --download-sowing
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener