[petsc-users] SEGV on KSPSolve with multiple processors

Barry Smith bsmith at mcs.anl.gov
Tue Jun 18 15:52:30 CDT 2013


   If possible, you would also benefit from running the debug version under valgrind (http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind). It is possible that memory corruption took place before the point where the code crashes; valgrind will identify any memory corruption as soon as it occurs.
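
   For an MPI run, that usually means launching each rank under valgrind, along the lines the FAQ suggests (a sketch; the executable name and its options are placeholders):

       mpiexec -n 2 valgrind --tool=memcheck -q --num-callers=20 \
           --log-file=valgrind.log.%p ./your_executable -your_options

   Each rank then writes its own valgrind.log.<pid> file, so corruption that happens only on rank 1 is still captured.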

    Barry

On Jun 18, 2013, at 2:15 PM, Dave May <dave.mayhem23 at gmail.com> wrote:

> You should recompile your code using a debug build of PETSc so that you get some meaningful information from the stack trace when the SEGV occurs.
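> 
> For example (a sketch; reuse the same configure options under a new PETSC_ARCH so the optimized build is kept):
> 
>     ./configure PETSC_ARCH=path-ompi-debug --with-debugging=yes <your other configure options>
>     make PETSC_ARCH=path-ompi-debug all
> 
> and then relink your application against the debug arch.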
> 
> Dave
> 
> 
> On Tuesday, 18 June 2013, Brendan C Lyons wrote:
> Hi everyone,
> 
> I've run into a strange problem in my Fortran 90 code: it runs fine with 1 processor, but throws a segmentation fault in KSPSolve() when I try to run it in parallel.  I'm using PETSc 3.3 with the SuperLU direct solver for the sequential case and SuperLU_dist for the parallel case.  I've called KSPView() before and after KSPSolve(); the KSPView output for the sequential and parallel cases and the crash info for the parallel case are below (with some details of my system redacted).  Any help would be appreciated.  If you need any other information, I'm happy to provide it.
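> 
> For reference, the solver selection above corresponds to runtime options along these lines (a sketch; the code may set the equivalent through the API instead, and my_code stands in for the redacted executable):
> 
>     # sequential case: SuperLU
>     ./my_code -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu
>     # parallel case: SuperLU_dist
>     mpiexec -n 2 ./my_code -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist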
> 
> Thank you,
> 
> ~Brendan
> ------------------------------
> 
> KSPView() before sequential solve:  
> 
> KSP Object: 1 MPI processes
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>   left preconditioning
>   using DEFAULT norm type for convergence test
> PC Object: 1 MPI processes
>   type: lu
>     LU: out-of-place factorization
>     tolerance for zero pivot 2.22045e-14
>     matrix ordering: nd
>   linear system matrix = precond matrix:
>   Matrix Object:   1 MPI processes
>     type: seqaij
>     rows=11760, cols=11760
>     total: nonzeros=506586, allocated nonzeros=509061
>     total number of mallocs used during MatSetValues calls =0
>       not using I-node routines
> 
> KSPView() after sequential solve:
>   
> KSP Object: 1 MPI processes
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>   left preconditioning
>   using NONE norm type for convergence test
> PC Object: 1 MPI processes
>   type: lu
>     LU: out-of-place factorization
>     tolerance for zero pivot 2.22045e-14
>     matrix ordering: nd
>     factor fill ratio given 0, needed 0
>       Factored matrix follows:
>         Matrix Object:         1 MPI processes
>           type: seqaij
>           rows=11760, cols=11760
>           package used to perform factorization: superlu
>           total: nonzeros=0, allocated nonzeros=0
>           total number of mallocs used during MatSetValues calls =0
>             SuperLU run parameters:
>               Equil: NO
>               ColPerm: 3
>               IterRefine: 0
>               SymmetricMode: NO
>               DiagPivotThresh: 1
>               PivotGrowth: NO
>               ConditionNumber: NO
>               RowPerm: 0
>               ReplaceTinyPivot: NO
>               PrintStat: NO
>               lwork: 0
>   linear system matrix = precond matrix:
>   Matrix Object:   1 MPI processes
>     type: seqaij
>     rows=11760, cols=11760
>     total: nonzeros=506586, allocated nonzeros=509061
>     total number of mallocs used during MatSetValues calls =0
>       not using I-node routines
> 
> 
> KSPView() before parallel solve:  
> 
> KSP Object: 2 MPI processes
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>   left preconditioning
>   using DEFAULT norm type for convergence test
> PC Object: 2 MPI processes
>   type: lu
>     LU: out-of-place factorization
>     tolerance for zero pivot 2.22045e-14
>     matrix ordering: natural
>   linear system matrix = precond matrix:
>       Solving Electron Matrix Equation
>   Matrix Object:   2 MPI processes
>     type: mpiaij
>     rows=11760, cols=11760
>     total: nonzeros=506586, allocated nonzeros=520821
>     total number of mallocs used during MatSetValues calls =0
>       not using I-node (on process 0) routines
> 
> Crash info for parallel solve:
> 
> [1]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
> [1]PETSC ERROR: to get more information on the crash.
> [1]PETSC ERROR: --------------------- Error Message ------------------------------------
> [1]PETSC ERROR: Signal received!
> [1]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013 
> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
> [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [1]PETSC ERROR: See docs/index.html for manual pages.
> [1]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: <redacted> on a path-ompi named <redacted>
> [1]PETSC ERROR: Libraries linked from <redacted>
> [1]PETSC ERROR: Configure run at Thu Mar 21 14:19:42 2013
> [1]PETSC ERROR: Configure options --PETSC_ARCH=path-ompi --PETSC_DIR=<redacted> --CFLAGS="-fPIC -O -mp" --CXXFLAGS="-fPIC -O -mp" --FFLAGS="-fPIC -O -mp" --with-debugging=0 --with-dynamic-loadin=no --with-mpi=1 --with-mpi-dir=<redacted> --with-superlu=1 --with-superlu-dir=<redacted> --with-blas-lapack-lib="<redacted>" --with-scalapack=1 --with-scalapack-dir=<redacted> --with-superlu_dist=1 --with-superlu_dist-dir=<redacted> --with-metis=1 --with-metis-dir=<redacted> --with-parmetis=1 --with-parmetis-dir=<redacted> --with-blacs-lib="<redacted>" --with-blacs-include=<redacted> --with-hypre=1 --download-hypre=1
> [1]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
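> 
> (As the trace suggests, another way to localize this is to launch under a debugger, e.g. a sketch with the executable name as a placeholder:
> 
>     mpiexec -n 2 ./my_code -start_in_debugger
> 
> which opens one debugger per rank, so the faulting rank can be inspected at the crash site.)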
> 