[petsc-users] SEGV on KSPSolve with multiple processors

Brendan C Lyons bclyons at princeton.edu
Tue Jun 18 14:10:06 CDT 2013


Hi everyone,

I've run into a strange problem in my Fortran 90 code: it runs fine on 1
processor, but throws a segmentation fault in KSPSolve() when I run it in
parallel.  I'm using PETSc 3.3 with the SuperLU direct solver for the
sequential case and SuperLU_dist for the parallel case.  I call KSPView()
before and after KSPSolve(); the KSPView output for the sequential and
parallel cases and the crash info for the parallel case are below (with
some details of my system redacted).  Any help would be appreciated, and
I'm happy to provide any other information you need.
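
For reference, the relevant solver setup looks roughly like the sketch
below (simplified, with the matrix and vector assembly omitted; names
are illustrative and the exact calls in my code may differ slightly):

! Sketch only: direct solve via KSPPREONLY + PCLU, choosing SuperLU
! sequentially and SuperLU_dist in parallel (PETSc 3.3 Fortran API).
program solve_sketch
  implicit none
#include "finclude/petsc.h"
  KSP            ksp
  PC             pc
  Mat            A
  Vec            x, b
  PetscErrorCode ierr
  PetscMPIInt    nproc

  call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
  call MPI_Comm_size(PETSC_COMM_WORLD,nproc,ierr)

  ! ... create and assemble A, b, x here (seqaij / mpiaij) ...

  call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
  call KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN,ierr)
  call KSPSetType(ksp,KSPPREONLY,ierr)
  call KSPGetPC(ksp,pc,ierr)
  call PCSetType(pc,PCLU,ierr)
  ! the package could equally be chosen at runtime with
  ! -pc_factor_mat_solver_package superlu / superlu_dist
  if (nproc .eq. 1) then
     call PCFactorSetMatSolverPackage(pc,'superlu',ierr)
  else
     call PCFactorSetMatSolverPackage(pc,'superlu_dist',ierr)
  endif
  call KSPSetFromOptions(ksp,ierr)

  call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr)
  call KSPSolve(ksp,b,x,ierr)
  call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr)

  call KSPDestroy(ksp,ierr)
  call PetscFinalize(ierr)
end program solve_sketch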

Thank you,

~Brendan
------------------------------

KSPView() before sequential solve:

KSP Object: 1 MPI processes
  type: preonly
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
  left preconditioning
  using DEFAULT norm type for convergence test
PC Object: 1 MPI processes
  type: lu
    LU: out-of-place factorization
    tolerance for zero pivot 2.22045e-14
    matrix ordering: nd
  linear system matrix = precond matrix:
  Matrix Object:   1 MPI processes
    type: seqaij
    rows=11760, cols=11760
    total: nonzeros=506586, allocated nonzeros=509061
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines

KSPView() after sequential solve:

KSP Object: 1 MPI processes
  type: preonly
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
  left preconditioning
  using NONE norm type for convergence test
PC Object: 1 MPI processes
  type: lu
    LU: out-of-place factorization
    tolerance for zero pivot 2.22045e-14
    matrix ordering: nd
    factor fill ratio given 0, needed 0
      Factored matrix follows:
        Matrix Object:         1 MPI processes
          type: seqaij
          rows=11760, cols=11760
          package used to perform factorization: superlu
          total: nonzeros=0, allocated nonzeros=0
          total number of mallocs used during MatSetValues calls =0
            SuperLU run parameters:
              Equil: NO
              ColPerm: 3
              IterRefine: 0
              SymmetricMode: NO
              DiagPivotThresh: 1
              PivotGrowth: NO
              ConditionNumber: NO
              RowPerm: 0
              ReplaceTinyPivot: NO
              PrintStat: NO
              lwork: 0
  linear system matrix = precond matrix:
  Matrix Object:   1 MPI processes
    type: seqaij
    rows=11760, cols=11760
    total: nonzeros=506586, allocated nonzeros=509061
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines


KSPView() before parallel solve:

KSP Object: 2 MPI processes
  type: preonly
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
  left preconditioning
  using DEFAULT norm type for convergence test
PC Object: 2 MPI processes
  type: lu
    LU: out-of-place factorization
    tolerance for zero pivot 2.22045e-14
    matrix ordering: natural
  linear system matrix = precond matrix:
      Solving Electron Matrix Equation
  Matrix Object:   2 MPI processes
    type: mpiaij
    rows=11760, cols=11760
    total: nonzeros=506586, allocated nonzeros=520821
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines

Crash info for parallel solve:

[1]PETSC ERROR:
------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
probably memory access out of range
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
to find memory corruption errors
[1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
run
[1]PETSC ERROR: to get more information on the crash.
[1]PETSC ERROR: --------------------- Error Message
------------------------------------
[1]PETSC ERROR: Signal received!
[1]PETSC ERROR:
------------------------------------------------------------------------
[1]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34
CST 2013
[1]PETSC ERROR: See docs/changes/index.html for recent updates.
[1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[1]PETSC ERROR: See docs/index.html for manual pages.
[1]PETSC ERROR:
------------------------------------------------------------------------
[1]PETSC ERROR: <redacted> on a path-ompi named <redacted>
[1]PETSC ERROR: Libraries linked from <redacted>
[1]PETSC ERROR: Configure run at Thu Mar 21 14:19:42 2013
[1]PETSC ERROR: Configure options --PETSC_ARCH=path-ompi
--PETSC_DIR=<redacted> --CFLAGS="-fPIC -O -mp" --CXXFLAGS="-fPIC -O -mp"
--FFLAGS="-fPIC -O -mp" --with-debugging=0 --with-dynamic-loadin=no
--with-mpi=1 --with-mpi-dir=<redacted> --with-superlu=1
--with-superlu-dir=<redacted> --with-blas-lapack-lib="<redacted>"
--with-scalapack=1 --with-scalapack-dir=<redacted> --with-superlu_dist=1
--with-superlu_dist-dir=<redacted> --with-metis=1
--with-metis-dir=<redacted> --with-parmetis=1
--with-parmetis-dir=<redacted> --with-blacs-lib="<redacted>"
--with-blacs-include=<redacted> --with-hypre=1 --download-hypre=1
[1]PETSC ERROR:
------------------------------------------------------------------------
[1]PETSC ERROR: User provided function() line 0 in unknown directory
unknown file