[petsc-users] issues with sparse direct solvers

Gideon Simpson gideon.simpson at gmail.com
Sat Aug 22 16:16:01 CDT 2015


Thanks Barry, I’ll look into debugging it. I’m also going to try PETSc 3.6, since it has a newer MUMPS build.

Regarding the SuperLU_DIST bugs, are they bad enough that I should distrust the output even when no errors were generated?


-gideon
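
One direct way to decide whether a solve that finished without an error can be trusted is to look at its residual. PETSc can report this itself, for example with `-ksp_monitor_true_residual` and `-ksp_converged_reason`. The sketch below is not from this thread: it computes the same relative residual ||b - Ax|| / ||b|| independently in plain Python, assuming the matrix has been exported in CSR form (`row_ptr`, `col_idx`, `values`) and the solution vector copied out of PETSc. The helper names are hypothetical and are not part of PETSc.

```python
"""Independent check of a linear solve via the relative residual ||b - A x|| / ||b||.

A minimal sketch, assuming A is available in CSR form (row_ptr, col_idx, values)
and x, b are plain Python lists. Helper names are hypothetical, not PETSc API.
"""
import math


def csr_matvec(row_ptr, col_idx, values, x):
    """Return y = A x for a matrix A stored in CSR format."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i+1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y


def relative_residual(row_ptr, col_idx, values, x, b):
    """Return ||b - A x||_2 / ||b||_2, or the absolute norm if b is zero."""
    ax = csr_matvec(row_ptr, col_idx, values, x)
    r_norm = math.sqrt(sum((bi - axi) ** 2 for bi, axi in zip(b, ax)))
    b_norm = math.sqrt(sum(bi * bi for bi in b))
    return r_norm / b_norm if b_norm > 0 else r_norm


if __name__ == "__main__":
    # 3x3 tridiagonal example: A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]]
    row_ptr = [0, 2, 5, 7]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
    x_exact = [1.0, 1.0, 1.0]
    b = csr_matvec(row_ptr, col_idx, values, x_exact)  # [3.0, 2.0, 3.0]
    print(relative_residual(row_ptr, col_idx, values, x_exact, b))  # 0.0
    # A corrupted "solution" gives a residual of about 0.90, clearly nonzero.
    print(relative_residual(row_ptr, col_idx, values, [1.0, 0.0, 1.0], b))
```

A small relative residual means x solves a nearby system; for an ill-conditioned matrix the error in x itself can still be much larger, but a large residual is a clear sign the result should not be trusted.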

> On Aug 22, 2015, at 5:12 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> 
>> On Aug 22, 2015, at 4:04 PM, Gideon Simpson <gideon.simpson at gmail.com> wrote:
>> 
>> I’m having issues with both SuperLU_DIST and MUMPS, as compiled by PETSc, in the following sense:
>> 
>> 1.  For large enough systems (the threshold seems to vary depending on which computer I’m on), MUMPS, when used as the linear solver within a SNES, seems to just stall and never start. There’s no error message; it just sits there and doesn’t do anything.
> 
>  You will need to use a debugger to figure out where it is "hanging"; we haven't heard reports about this.
>> 
>> 2.  When running with SuperLU_DIST, I got the following error, with no further information:
> 
>  The last release of SuperLU_DIST had some pretty nasty bugs (memory corruption that caused crashes, etc.). We think they are now fixed if you use the maint branch of the PETSc repository with --download-superlu_dist. If you stick with the PETSc release and SuperLU_DIST version you are using now, you will keep seeing these crashes.
> 
>   Barry
> 
> 
>> 
>> [3]PETSC ERROR: ------------------------------------------------------------------------
>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [3]PETSC ERROR: likely location of problem given in stack below
>> [3]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [3]PETSC ERROR:       is given.
>> [3]PETSC ERROR: [3] SuperLU_DIST:pdgssvx line 161 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [3]PETSC ERROR: [3] MatSolve_SuperLU_DIST line 121 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [3]PETSC ERROR: [3] MatSolve line 3104 /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [3]PETSC ERROR: [3] PCApply_LU line 194 /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [3]PETSC ERROR: [3] KSP_PCApplyBAorAB line 258 /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [3]PETSC ERROR: Signal received
>> [3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [3]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 
>> [3]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by simpson Sat Aug 22 17:01:41 2015
>> [3]PETSC ERROR: Configure options --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a --with-lapack-lib=/liblapack.a --download-suitesparse=yes --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [3]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD 
>> with errorcode 59.
>> 
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --------------------------------------------------------------------------
>> [proteusi01:14037] 1 more process has sent help message help-mpi-api.txt / mpi-abort
>> [proteusi01:14037] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>> [6]PETSC ERROR: ------------------------------------------------------------------------
>> [6]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>> [6]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [6]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [6]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [6]PETSC ERROR: likely location of problem given in stack below
>> [6]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [6]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [6]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [6]PETSC ERROR:       is given.
>> [6]PETSC ERROR: [6] SuperLU_DIST:pdgssvx line 161 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [6]PETSC ERROR: [6] MatSolve_SuperLU_DIST line 121 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [6]PETSC ERROR: [6] MatSolve line 3104 /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [6]PETSC ERROR: [6] PCApply_LU line 194 /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [6]PETSC ERROR: [6] KSP_PCApplyBAorAB line 258 /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [6]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [6]PETSC ERROR: Signal received
>> [6]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [6]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 
>> [6]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by simpson Sat Aug 22 17:01:41 2015
>> [6]PETSC ERROR: Configure options --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a --with-lapack-lib=/liblapack.a --download-suitesparse=yes --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [6]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> [7]PETSC ERROR: ------------------------------------------------------------------------
>> [7]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>> [7]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [7]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [7]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [7]PETSC ERROR: likely location of problem given in stack below
>> [7]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [7]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [7]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [7]PETSC ERROR:       is given.
>> [7]PETSC ERROR: [7] SuperLU_DIST:pdgssvx line 161 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [7]PETSC ERROR: [7] MatSolve_SuperLU_DIST line 121 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [7]PETSC ERROR: [7] MatSolve line 3104 /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [7]PETSC ERROR: [7] PCApply_LU line 194 /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [7]PETSC ERROR: [7] KSP_PCApplyBAorAB line 258 /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [7]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [7]PETSC ERROR: Signal received
>> [7]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [7]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 
>> [7]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by simpson Sat Aug 22 17:01:41 2015
>> [7]PETSC ERROR: Configure options --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a --with-lapack-lib=/liblapack.a --download-suitesparse=yes --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [7]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [0]PETSC ERROR: likely location of problem given in stack below
>> [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [0]PETSC ERROR:       is given.
>> [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 161 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [0]PETSC ERROR: [0] MatSolve_SuperLU_DIST line 121 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [0]PETSC ERROR: [0] MatSolve line 3104 /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [0]PETSC ERROR: [0] PCApply_LU line 194 /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 258 /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [0]PETSC ERROR: Signal received
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 
>> [0]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by simpson Sat Aug 22 17:01:41 2015
>> [0]PETSC ERROR: Configure options --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a --with-lapack-lib=/liblapack.a --download-suitesparse=yes --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> [1]PETSC ERROR: ------------------------------------------------------------------------
>> [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [1]PETSC ERROR: likely location of problem given in stack below
>> [1]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [1]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [1]PETSC ERROR:       is given.
>> [1]PETSC ERROR: [1] SuperLU_DIST:pdgssvx line 161 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [1]PETSC ERROR: [1] MatSolve_SuperLU_DIST line 121 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [1]PETSC ERROR: [1] MatSolve line 3104 /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [1]PETSC ERROR: [1] PCApply_LU line 194 /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [1]PETSC ERROR: [1] KSP_PCApplyBAorAB line 258 /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [1]PETSC ERROR: Signal received
>> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [1]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 
>> [1]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by simpson Sat Aug 22 17:01:41 2015
>> [1]PETSC ERROR: Configure options --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a --with-lapack-lib=/liblapack.a --download-suitesparse=yes --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> [2]PETSC ERROR: ------------------------------------------------------------------------
>> [2]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [2]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [2]PETSC ERROR: likely location of problem given in stack below
>> [2]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [2]PETSC ERROR:       is given.
>> [2]PETSC ERROR: [2] SuperLU_DIST:pdgssvx line 161 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [2]PETSC ERROR: [2] MatSolve_SuperLU_DIST line 121 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [2]PETSC ERROR: [2] MatSolve line 3104 /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [2]PETSC ERROR: [2] PCApply_LU line 194 /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [2]PETSC ERROR: [2] KSP_PCApplyBAorAB line 258 /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [2]PETSC ERROR: Signal received
>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [2]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 
>> [2]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by simpson Sat Aug 22 17:01:41 2015
>> [2]PETSC ERROR: Configure options --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a --with-lapack-lib=/liblapack.a --download-suitesparse=yes --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [2]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> [4]PETSC ERROR: ------------------------------------------------------------------------
>> [4]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>> [4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [4]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [4]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [4]PETSC ERROR: likely location of problem given in stack below
>> [4]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [4]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [4]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [4]PETSC ERROR:       is given.
>> [4]PETSC ERROR: [4] SuperLU_DIST:pdgssvx line 161 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [4]PETSC ERROR: [4] MatSolve_SuperLU_DIST line 121 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [4]PETSC ERROR: [4] MatSolve line 3104 /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [4]PETSC ERROR: [4] PCApply_LU line 194 /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [4]PETSC ERROR: [4] KSP_PCApplyBAorAB line 258 /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [4]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [4]PETSC ERROR: Signal received
>> [4]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [4]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 
>> [4]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by simpson Sat Aug 22 17:01:41 2015
>> [4]PETSC ERROR: Configure options --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a --with-lapack-lib=/liblapack.a --download-suitesparse=yes --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [4]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> [5]PETSC ERROR: ------------------------------------------------------------------------
>> [5]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>> [5]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [5]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [5]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [5]PETSC ERROR: likely location of problem given in stack below
>> [5]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [5]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [5]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [5]PETSC ERROR:       is given.
>> [5]PETSC ERROR: [5] SuperLU_DIST:pdgssvx line 161 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [5]PETSC ERROR: [5] MatSolve_SuperLU_DIST line 121 /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [5]PETSC ERROR: [5] MatSolve line 3104 /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [5]PETSC ERROR: [5] PCApply_LU line 194 /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [5]PETSC ERROR: [5] KSP_PCApplyBAorAB line 258 /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [5]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [5]PETSC ERROR: Signal received
>> [5]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [5]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015 
>> [5]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by simpson Sat Aug 22 17:01:41 2015
>> [5]PETSC ERROR: Configure options --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a --with-lapack-lib=/liblapack.a --download-suitesparse=yes --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [5]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> 
>> -gideon
>> 
>> 
> 
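
The log above is long, but it follows one pattern: rank 3 is the only process that reports signal 11 (SEGV), while the other ranks report signal 15 (Terminate) after MPI_ABORT was invoked on rank 3 and Open MPI killed the remaining processes. A small script can pull out which rank caught which signal in logs like this. This is a minimal Python sketch, assuming the log keeps the `[rank]PETSC ERROR: Caught signal number N` line format shown above (optionally behind `>` quote markers); treating signal 15 as collateral is a heuristic, not something PETSc reports.

```python
"""Summarize which MPI rank caught which signal in a PETSc error log.

A minimal sketch, assuming lines of the form
"[rank]PETSC ERROR: Caught signal number N ...", possibly prefixed by
mail-quote markers such as ">> ". Treating SIGTERM (15) as collateral
damage from MPI_ABORT is a heuristic.
"""
import re
import sys

# Optional leading quote markers/whitespace, then "[rank]PETSC ERROR: Caught signal number N".
SIGNAL_LINE = re.compile(r"^[>\s]*\[(\d+)\]PETSC ERROR: Caught signal number (\d+)")


def signals_by_rank(lines):
    """Map each rank number to the signal number it reported."""
    result = {}
    for line in lines:
        m = SIGNAL_LINE.match(line)
        if m:
            result[int(m.group(1))] = int(m.group(2))
    return result


def originating_ranks(by_rank, collateral=(15,)):
    """Return ranks whose signal is not in `collateral` (default: SIGTERM only)."""
    return sorted(rank for rank, sig in by_rank.items() if sig not in collateral)


if __name__ == "__main__":
    by_rank = signals_by_rank(sys.stdin)
    for rank in sorted(by_rank):
        print(f"rank {rank}: signal {by_rank[rank]}")
    print("likely originating rank(s):", originating_ranks(by_rank))
```

Saved under a hypothetical name such as `signal_triage.py` and run as `python3 signal_triage.py < log.txt` on the log above, it should list signal 11 for rank 3, signal 15 for ranks 0-2 and 4-7, and name rank 3 as the likely origin, which is the rank whose stack shows the segmentation fault inside `SuperLU_DIST:pdgssvx`.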


