[petsc-users] SEGV on KSPSolve with multiple processors

Dave May dave.mayhem23 at gmail.com
Tue Jun 18 14:15:20 CDT 2013


You should recompile your code against a debug build of PETSc so that you get
meaningful information from the stack trace when the SEGV occurs.
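
For example, something along these lines (a sketch only: reuse the configure
options shown in your crash log below, changing just the debugging setting;
the arch name and paths here are placeholders):

    ./configure --PETSC_ARCH=path-ompi-debug --with-debugging=yes <remaining options as in the crash log>
    make PETSC_DIR=<petsc dir> PETSC_ARCH=path-ompi-debug all

You can also run the failing parallel case with -start_in_debugger or
-on_error_attach_debugger (as the error message you posted suggests) to attach
a debugger at the point of the SEGV. A sketch of the kind of KSP/PC setup you
describe is at the end of this message, for reference.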

Dave


On Tuesday, 18 June 2013, Brendan C Lyons wrote:

> Hi everyone,
>
> I've run into a strange problem in my Fortran 90 code where it runs fine
> with 1 processor, but then throws a segmentation fault on KSPSolve() when I
> try to run it in parallel.  I'm using PETSc 3.3 with the SuperLU direct
> solver for the sequential case and SuperLU_dist for the parallel case.
>  I've called KSPView before and after KSPSolve.  I'll put the KSPView
> output for the sequential and parallel cases and the crash info for the
> parallel case below (with some details of my system redacted).  Any help
> would be appreciated.  If you need any other information, I'm happy to
> provide it.
>
> Thank you,
>
> ~Brendan
> ------------------------------
>
> KSPView() before sequential solve:
>
> KSP Object: 1 MPI processes
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>   left preconditioning
>   using DEFAULT norm type for convergence test
> PC Object: 1 MPI processes
>   type: lu
>     LU: out-of-place factorization
>     tolerance for zero pivot 2.22045e-14
>     matrix ordering: nd
>   linear system matrix = precond matrix:
>   Matrix Object:   1 MPI processes
>     type: seqaij
>     rows=11760, cols=11760
>     total: nonzeros=506586, allocated nonzeros=509061
>     total number of mallocs used during MatSetValues calls =0
>       not using I-node routines
>
> KSPView() after sequential solve:
>
> KSP Object: 1 MPI processes
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>   left preconditioning
>   using NONE norm type for convergence test
> PC Object: 1 MPI processes
>   type: lu
>     LU: out-of-place factorization
>     tolerance for zero pivot 2.22045e-14
>     matrix ordering: nd
>     factor fill ratio given 0, needed 0
>       Factored matrix follows:
>         Matrix Object:         1 MPI processes
>           type: seqaij
>           rows=11760, cols=11760
>           package used to perform factorization: superlu
>           total: nonzeros=0, allocated nonzeros=0
>           total number of mallocs used during MatSetValues calls =0
>             SuperLU run parameters:
>               Equil: NO
>               ColPerm: 3
>               IterRefine: 0
>               SymmetricMode: NO
>               DiagPivotThresh: 1
>               PivotGrowth: NO
>               ConditionNumber: NO
>               RowPerm: 0
>               ReplaceTinyPivot: NO
>               PrintStat: NO
>               lwork: 0
>   linear system matrix = precond matrix:
>   Matrix Object:   1 MPI processes
>     type: seqaij
>     rows=11760, cols=11760
>     total: nonzeros=506586, allocated nonzeros=509061
>     total number of mallocs used during MatSetValues calls =0
>       not using I-node routines
>
>
> KSPView() before parallel solve:
>
> KSP Object: 2 MPI processes
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>   left preconditioning
>   using DEFAULT norm type for convergence test
> PC Object: 2 MPI processes
>   type: lu
>     LU: out-of-place factorization
>     tolerance for zero pivot 2.22045e-14
>     matrix ordering: natural
>   linear system matrix = precond matrix:
>       Solving Electron Matrix Equation
>   Matrix Object:   2 MPI processes
>     type: mpiaij
>     rows=11760, cols=11760
>     total: nonzeros=506586, allocated nonzeros=520821
>     total number of mallocs used during MatSetValues calls =0
>       not using I-node (on process 0) routines
>
> Crash info for parallel solve:
>
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try
> http://valgrind.org on GNU/linux and Apple Mac OS X to find memory
> corruption errors
> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
> run
> [1]PETSC ERROR: to get more information on the crash.
> [1]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [1]PETSC ERROR: Signal received!
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34
> CST 2013
> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
> [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [1]PETSC ERROR: See docs/index.html for manual pages.
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: <redacted> on a path-ompi named <redacted>
> [1]PETSC ERROR: Libraries linked from <redacted>
> [1]PETSC ERROR: Configure run at Thu Mar 21 14:19:42 2013
> [1]PETSC ERROR: Configure options --PETSC_ARCH=path-ompi
> --PETSC_DIR=<redacted> --CFLAGS="-fPIC -O -mp" --CXXFLAGS="-fPIC -O -mp"
> --FFLAGS="-fPIC -O -mp" --with-debugging=0 --with-dynamic-loadin=no
> --with-mpi=1 --with-mpi-dir=<redacted> --with-superlu=1
> --with-superlu-dir=<redacted> --with-blas-lapack-lib="<redacted>"
> --with-scalapack=1 --with-scalapack-dir=<redacted> --with-superlu_dist=1
> --with-superlu_dist-dir=<redacted> --with-metis=1
> --with-metis-dir=<redacted> --with-parmetis=1
> --with-parmetis-dir=<redacted> --with-blacs-lib="<redacted>"
> --with-blacs-include=<redacted> --with-hypre=1 --download-hypre=1
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
>
>
>
>
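
For reference, the kind of KSP/PC setup described in the quoted message would
look roughly like the following. This is a minimal C sketch under my own
assumptions (placeholder Mat/Vec names and a PETSc 3.3-era calling sequence),
not your actual Fortran code; the same routines exist in the Fortran interface
with a trailing error argument.

    #include <petscksp.h>

    /* Sketch: direct solve with SuperLU (1 process) or SuperLU_DIST (parallel).
       Assumes A (Mat), b and x (Vec) have already been created and assembled. */
    PetscErrorCode SolveDirect(Mat A, Vec b, Vec x)
    {
      KSP            ksp;
      PC             pc;
      PetscMPIInt    size;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr);

      ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
      /* PETSc 3.3 signature; the MatStructure flag was removed in later releases */
      ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
      ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);   /* apply the factorization once */
      ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
      ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
      /* SuperLU for the sequential run, SuperLU_DIST in parallel */
      ierr = PCFactorSetMatSolverPackage(pc,(size == 1) ? MATSOLVERSUPERLU
                                                        : MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

      ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* view before the solve */
      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
      ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* view after the solve */

      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

The same selection can be made at runtime with -ksp_type preonly -pc_type lu
-pc_factor_mat_solver_package superlu (or superlu_dist), which is often easier
when switching packages between the sequential and parallel runs.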