[petsc-users] SEGV on KSPSolve with multiple processors
Dave May
dave.mayhem23 at gmail.com
Tue Jun 18 14:15:20 CDT 2013
You should recompile your code against a debug build of PETSc so that you get
meaningful information from the stack trace when the SEGV occurs.
Dave
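
For reference, the debug build suggested above is usually produced by rerunning configure with --with-debugging=yes under a separate PETSC_ARCH and relinking the application against it. The commands below are only a rough sketch; the arch name "path-ompi-debug" and the executable name "myapp" are placeholders, and the remaining configure options should be copied from the original build rather than from here:

    cd $PETSC_DIR
    ./configure --PETSC_ARCH=path-ompi-debug --with-debugging=yes <other original configure options>
    make PETSC_DIR=$PETSC_DIR PETSC_ARCH=path-ompi-debug all
    # relink the application against the debug arch, then rerun, e.g.
    mpirun -np 2 ./myapp -on_error_attach_debugger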
On Tuesday, 18 June 2013, Brendan C Lyons wrote:
> Hi everyone,
>
> I've run into a strange problem in my Fortran 90 code where it runs fine
> with 1 processor, but then throws a segmentation fault on KSPSolve() when I
> try to run it in parallel. I'm using PETSc 3.3 with the SuperLU direct
> solver for the sequential case and SuperLU_dist for the parallel case.
> I've called KSPView before and after KSPSolve. I'll put the KSPView
> output for the sequential and parallel cases and the crash info for the
> parallel case below (with some details of my system redacted). Any help
> would be appreciated. If you need any other information, I'm happy to
> provide it.
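
(A note on the setup described here: with PETSc 3.3 the LU factorization package can also be selected at run time through the options database rather than in code. The lines below are only an illustrative sketch with a placeholder executable name, not the actual invocation used in this thread:

    ./myapp -pc_type lu -pc_factor_mat_solver_package superlu -ksp_view
    mpirun -np 2 ./myapp -pc_type lu -pc_factor_mat_solver_package superlu_dist -ksp_view

Driving the package choice from the command line makes it easy to switch between the sequential and parallel solvers without recompiling.)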
>
> Thank you,
>
> ~Brendan
> ------------------------------
>
> KSPView() before sequential solve:
>
> KSP Object: 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> left preconditioning
> using DEFAULT norm type for convergence test
> PC Object: 1 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: nd
> linear system matrix = precond matrix:
> Matrix Object: 1 MPI processes
> type: seqaij
> rows=11760, cols=11760
> total: nonzeros=506586, allocated nonzeros=509061
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
>
> KSPView() after sequential solve:
>
> KSP Object: 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> left preconditioning
> using NONE norm type for convergence test
> PC Object: 1 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: nd
> factor fill ratio given 0, needed 0
> Factored matrix follows:
> Matrix Object: 1 MPI processes
> type: seqaij
> rows=11760, cols=11760
> package used to perform factorization: superlu
> total: nonzeros=0, allocated nonzeros=0
> total number of mallocs used during MatSetValues calls =0
> SuperLU run parameters:
> Equil: NO
> ColPerm: 3
> IterRefine: 0
> SymmetricMode: NO
> DiagPivotThresh: 1
> PivotGrowth: NO
> ConditionNumber: NO
> RowPerm: 0
> ReplaceTinyPivot: NO
> PrintStat: NO
> lwork: 0
> linear system matrix = precond matrix:
> Matrix Object: 1 MPI processes
> type: seqaij
> rows=11760, cols=11760
> total: nonzeros=506586, allocated nonzeros=509061
> total number of mallocs used during MatSetValues calls =0
> not using I-node routines
>
>
> KSPView() before parallel solve:
>
> KSP Object: 2 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> left preconditioning
> using DEFAULT norm type for convergence test
> PC Object: 2 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> linear system matrix = precond matrix:
> Solving Electron Matrix Equation
> Matrix Object: 2 MPI processes
> type: mpiaij
> rows=11760, cols=11760
> total: nonzeros=506586, allocated nonzeros=520821
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
>
> Crash info for parallel solve:
>
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try
> http://valgrind.org on GNU/linux and Apple Mac OS X to find memory
> corruption errors
> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
> run
> [1]PETSC ERROR: to get more information on the crash.
> [1]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [1]PETSC ERROR: Signal received!
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34
> CST 2013
> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
> [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [1]PETSC ERROR: See docs/index.html for manual pages.
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: <redacted> on a path-ompi named <redacted>
> [1]PETSC ERROR: Libraries linked from <redacted>
> [1]PETSC ERROR: Configure run at Thu Mar 21 14:19:42 2013
> [1]PETSC ERROR: Configure options --PETSC_ARCH=path-ompi
> --PETSC_DIR=<redacted> --CFLAGS="-fPIC -O -mp" --CXXFLAGS="-fPIC -O -mp"
> --FFLAGS="-fPIC -O -mp" --with-debugging=0 --with-dynamic-loadin=no
> --with-mpi=1 --with-mpi-dir=<redacted> --with-superlu=1
> --with-superlu-dir=<redacted> --with-blas-lapack-lib="<redacted>"
> --with-scalapack=1 --with-scalapack-dir=<redacted> --with-superlu_dist=1
> --with-superlu_dist-dir=<redacted> --with-metis=1
> --with-metis-dir=<redacted> --with-parmetis=1
> --with-parmetis-dir=<redacted> --with-blacs-lib="<redacted>"
> --with-blacs-include=<redacted> --with-hypre=1 --download-hypre=1
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
>
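
The error output above also points to valgrind for tracking down memory corruption. A typical parallel valgrind run (the executable name and trailing options are placeholders) looks roughly like:

    mpirun -np 2 valgrind --tool=memcheck -q --log-file=valgrind.log.%p ./myapp -malloc off <usual options>

Here -malloc off turns off PETSc's own memory tracing so that valgrind reports the underlying allocations directly.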