[petsc-users] Strange behavior of MatLUFactorNumeric()

Jinquan Zhong jzhong at scsolutions.com
Tue Aug 14 17:39:09 CDT 2012


Thanks, Matt.


1.       Yes, I have checked the values of x returned by

MatSolve(F,b,x)

                The norm error check on x completes for N=75 and N=2028.

2.       Good point, Matt.  Here is the complete message for Rank 391.  The others are similar to this one.


[391]PETSC ERROR: ------------------------------------------------------------------------
[391]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[391]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[391]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[391]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[391]PETSC ERROR: likely location of problem given in stack below
[391]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[391]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[391]PETSC ERROR:       INSTEAD the line number of the start of the function
[391]PETSC ERROR:       is given.
[391]PETSC ERROR: [391] MatLUFactorNumeric_SuperLU_DIST line 284 /nfs/06/com0488/programs/libraries/PETSc/petsc-3.3-p2/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
[391]PETSC ERROR: [391] MatLUFactorNumeric line 2778 /nfs/06/com0488/programs/libraries/PETSc/petsc-3.3-p2/src/mat/interface/matrix.c
[391]PETSC ERROR: --------------------- Error Message ------------------------------------
[391]PETSC ERROR: Signal received!
[391]PETSC ERROR: ------------------------------------------------------------------------
[391]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012
[391]PETSC ERROR: See docs/changes/index.html for recent updates.
[391]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[391]PETSC ERROR: See docs/index.html for manual pages.
[391]PETSC ERROR: ------------------------------------------------------------------------
[391]PETSC ERROR: /nfs/06/com0488/programs/examples/ZSOL0.2431/ZSOL on a arch-linu named n0272.ten.osc.edu by com0488 Sun Aug 12 23:18:07 2012
[391]PETSC ERROR: Libraries linked from /nfs/06/com0488/programs/libraries/PETSc/petsc-3.3-p2/arch-linux2-cxx-debug/lib
[391]PETSC ERROR: Configure run at Fri Aug  3 17:44:00 2012
[391]PETSC ERROR: Configure options --with-blas-lib=/nfs/06/com0488/programs/libraries/ScaLAPACK/2.0.1/lib/librefblas.a --with-lapack-lib=/nfs/06/com0488/programs/libraries/ScaLAPACK/2.0.1/lib/libreflapack.a --download-blacs --download-scalapack --with-mpi-dir=/usr/local/mvapich2/1.7-gnu --with-mpiexec=/usr/local/bin/mpiexec --with-scalar-type=complex --with-precision=double --with-clanguage=cxx --with-fortran-kernels=generic --download-mumps --download-superlu_dist --download-parmetis --download-metis --with-fortran-interfaces
[391]PETSC ERROR: ------------------------------------------------------------------------
[391]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
[cli_391]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 391
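
The post does not say which norm check was applied to x in item 1. One common choice is a relative residual test, ||A*x - b|| <= tol * ||b||, sketched here in plain C for a small dense complex system. The helper names `abs2` and `residual_ok`, the row-major layout, and the tolerance argument are assumptions made for illustration; inside PETSc the same test would normally be assembled from MatMult(), VecAXPY() and VecNorm().

```c
#include <complex.h>
#include <stddef.h>

/* Squared magnitude of one complex value. */
static double abs2(double complex z)
{
    return creal(z) * creal(z) + cimag(z) * cimag(z);
}

/* Returns 1 when ||A*x - b||_2 <= tol * ||b||_2 for a dense n-by-n matrix A
   stored row-major, and 0 otherwise. Squared norms are compared, so no
   square root is needed. Hypothetical helper, not part of PETSc. */
int residual_ok(size_t n, const double complex *A, const double complex *x,
                const double complex *b, double tol)
{
    double r2 = 0.0, b2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double complex ri = -b[i];          /* start from -b_i ...        */
        for (size_t j = 0; j < n; j++)
            ri += A[i * n + j] * x[j];      /* ... and add row i of A*x   */
        r2 += abs2(ri);
        b2 += abs2(b[i]);
    }
    return r2 <= tol * tol * b2;
}
```

A check of this kind only confirms that a returned x is accurate; it says nothing about runs where the factorization itself aborts, as in the N=21180 case.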


From: petsc-users-bounces at mcs.anl.gov [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Matthew Knepley
Sent: Tuesday, August 14, 2012 3:34 PM
To: PETSc users list
Subject: Re: [petsc-users] Strange behavior of MatLUFactorNumeric()

On Tue, Aug 14, 2012 at 5:26 PM, Jinquan Zhong <jzhong at scsolutions.com> wrote:
Dear PETSc folks,

I have observed strange behavior when using MatLUFactorNumeric() on dense matrices of different orders N.  Here is the situation:


1.       I use ./src/mat/tests/ex137.c as an example of how to direct PETSc to select SuperLU_DIST and MUMPS.  The calling sequence is

MatGetOrdering(A,...)

MatGetFactor(A,...)

MatLUFactorSymbolic(F, A,...)

MatLUFactorNumeric(F, A,...)

MatSolve(F,b,x)

2.       I have three dense matrices A of three different orders: N=75, 2028 and 21180.

3.       The calling sequence works for N=75 and 2028.  But when N=21180, the program hung when calling MatLUFactorNumeric(...).  It seemed to be a segmentation fault, with the following error message:


[1]PETSC ERROR: --------------------- Error Message ------------------------------------
[1]PETSC ERROR: Signal received!

ALWAYS send the entire error message. How can we tell anything from a small snippet?

Since you have [1], this was run in parallel, so you need 3rd party packages. But you do
not seem to be checking return values. Check them to make sure those packages are installed
correctly.

   Matt

Has anybody had a similar experience?

Thanks a lot!

Jinquan
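
For the three matrix orders in the question, a rough size estimate shows how different the N=21180 case is from the two that work. The sketch below assumes a fully dense N-by-N matrix with 16 bytes per entry, which matches the complex, double-precision build shown in the configure options; the helper names `dense_bytes` and `report_dense_size` are made up for illustration. It counts only the matrix entries, not the LU factors or any solver workspace, and it is not a diagnosis of the segmentation fault.

```c
#include <stdint.h>
#include <stdio.h>

/* Bytes needed for the entries of a dense n-by-n matrix.
   64-bit arithmetic keeps the product from wrapping at these sizes. */
uint64_t dense_bytes(uint64_t n, uint64_t bytes_per_entry)
{
    return n * n * bytes_per_entry;
}

/* Print the estimate for one matrix order, assuming complex double entries. */
void report_dense_size(uint64_t n)
{
    const uint64_t complex_double = 16; /* 8-byte real + 8-byte imaginary part */
    uint64_t bytes = dense_bytes(n, complex_double);
    printf("N=%llu: %llu bytes (%.2f GiB)\n",
           (unsigned long long)n, (unsigned long long)bytes,
           (double)bytes / (1024.0 * 1024.0 * 1024.0));
}
```

Calling report_dense_size() with 75, 2028 and 21180 gives 90,000 bytes, about 66 MB, and about 7.2 GB in total, so the largest case is roughly a hundred times bigger than the next one before any factorization fill or workspace is counted.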



--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener