[petsc-users] Strange behavior of MatLUFactorNumeric()
Barry Smith
bsmith at mcs.anl.gov
Tue Aug 14 21:54:23 CDT 2012
On Aug 14, 2012, at 6:05 PM, Jinquan Zhong <jzhong at scsolutions.com> wrote:
> Barry,
>
> The machine I ran this program on does not have valgrind.
>
> Another interesting observation: when I ran the same three matrices using PETSc 3.2, MatLUFactorNumeric() hung even for N=75 and 2028 until I specified -mat_superlu_dist_colperm. However, MatLUFactorNumeric() still did not work for N=21180, even when I used
>
> -mat_superlu_dist_rowperm NATURAL -mat_superlu_dist_colperm NATURAL -mat_superlu_dist_parsymbfact YES
>
> I suspect that something in the factored matrix from SuperLU_DIST is incompatible with MatLUFactorNumeric() in PETSc 3.2. PETSc 3.3 fixed this issue for matrices with small N, but the issue reappears for large N in PETSc 3.3.
It is using SuperLU_DIST for this factorization (and the SuperLU_DIST version changed with PETSc 3.3); the problem is with SuperLU_DIST, not PETSc. Valgrind will likely find an error in SuperLU_DIST.
Barry
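
For readers following the valgrind suggestion, here is a minimal sketch of such a run, modeled on the command given in the PETSc FAQ linked further down in this thread. The process count and the ./ZSOL path are placeholders (assumptions, not the poster's actual job setup), and -malloc off is the option form used by PETSc releases of this era; check the FAQ for the version you run.

```shell
#!/bin/sh
# Assumed values for illustration only; substitute the real
# process count and the path to your PETSc executable.
NPROCS=4
APP=./ZSOL

# Check the prerequisites first, since not every cluster node has valgrind.
if ! command -v valgrind >/dev/null 2>&1; then
    STATUS="valgrind not found on this machine"
elif ! command -v mpiexec >/dev/null 2>&1; then
    STATUS="mpiexec not found on this machine"
elif [ ! -x "$APP" ]; then
    STATUS="executable $APP not found"
else
    # One log file per MPI process (%p expands to the process ID).
    # -malloc off turns off PETSc's own malloc debugging layer so that
    # valgrind sees the raw allocations.
    mpiexec -n "$NPROCS" valgrind --tool=memcheck -q --num-callers=20 \
        --log-file=valgrind.log.%p "$APP" -malloc off
    STATUS="valgrind run finished; see valgrind.log.*"
fi
echo "$STATUS"
```

Running first on the smallest case that still fails keeps the valgrind slowdown manageable; the per-process logs can then be searched for invalid reads or writes inside SuperLU_DIST routines.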
>
> Jinquan
>
>
> -----Original Message-----
> From: petsc-users-bounces at mcs.anl.gov [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Barry Smith
> Sent: Tuesday, August 14, 2012 3:55 PM
> To: PETSc users list
> Subject: Re: [petsc-users] Strange behavior of MatLUFactorNumeric()
>
>
> Can you run with valgrind?
>
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>
>
>
> On Aug 14, 2012, at 5:39 PM, Jinquan Zhong <jzhong at scsolutions.com> wrote:
>
>> Thanks, Matt.
>>
>> 1. Yes, I have checked the returned values from x obtained from
>> MatSolve(F,b,x)
>>
>> The norm error check for x is complete for N=75, 2028.
>>
>> 2. Good point, Matt. Here is the complete message for Rank 391. The others are similar to this one.
>>
>>
>> [391]PETSC ERROR: ------------------------------------------------------------------------
>> [391]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [391]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [391]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [391]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [391]PETSC ERROR: likely location of problem given in stack below
>> [391]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>> [391]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [391]PETSC ERROR: INSTEAD the line number of the start of the function
>> [391]PETSC ERROR: is given.
>> [391]PETSC ERROR: [391] MatLUFactorNumeric_SuperLU_DIST line 284 /nfs/06/com0488/programs/libraries/PETSc/petsc-3.3-p2/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [391]PETSC ERROR: [391] MatLUFactorNumeric line 2778 /nfs/06/com0488/programs/libraries/PETSc/petsc-3.3-p2/src/mat/interface/matrix.c
>> [391]PETSC ERROR: --------------------- Error Message ------------------------------------
>> [391]PETSC ERROR: Signal received!
>> [391]PETSC ERROR: ------------------------------------------------------------------------
>> [391]PETSC ERROR: Petsc Release Version 3.3.0, Patch 2, Fri Jul 13 15:42:00 CDT 2012
>> [391]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [391]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [391]PETSC ERROR: See docs/index.html for manual pages.
>> [391]PETSC ERROR: ------------------------------------------------------------------------
>> [391]PETSC ERROR: /nfs/06/com0488/programs/examples/ZSOL0.2431/ZSOL on a arch-linu named n0272.ten.osc.edu by com0488 Sun Aug 12 23:18:07 2012
>> [391]PETSC ERROR: Libraries linked from /nfs/06/com0488/programs/libraries/PETSc/petsc-3.3-p2/arch-linux2-cxx-debug/lib
>> [391]PETSC ERROR: Configure run at Fri Aug 3 17:44:00 2012
>> [391]PETSC ERROR: Configure options --with-blas-lib=/nfs/06/com0488/programs/libraries/ScaLAPACK/2.0.1/lib/librefblas.a --with-lapack-lib=/nfs/06/com0488/programs/libraries/ScaLAPACK/2.0.1/lib/libreflapack.a --download-blacs --download-scalapack --with-mpi-dir=/usr/local/mvapich2/1.7-gnu --with-mpiexec=/usr/local/bin/mpiexec --with-scalar-type=complex --with-precision=double --with-clanguage=cxx --with-fortran-kernels=generic --download-mumps --download-superlu_dist --download-parmetis --download-metis --with-fortran-interfaces
>> [391]PETSC ERROR: ------------------------------------------------------------------------
>> [391]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
>> [cli_391]: aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 391
>>
>> From: petsc-users-bounces at mcs.anl.gov
>> [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Matthew Knepley
>> Sent: Tuesday, August 14, 2012 3:34 PM
>> To: PETSc users list
>> Subject: Re: [petsc-users] Strange behavior of MatLUFactorNumeric()
>>
>> On Tue, Aug 14, 2012 at 5:26 PM, Jinquan Zhong <jzhong at scsolutions.com> wrote:
>> Dear PETSc folks,
>>
>> I have a strange observation about using MatLUFactorNumeric() for dense matrices of different orders N. Here is the situation:
>>
>> 1. I use ./src/mat/tests/ex137.c as an example to direct PETSc in selecting superLU-dist and mumps. The calling sequence is
>>
>> MatGetOrdering(A,...)
>>
>> MatGetFactor(A,...)
>>
>> MatLUFactorSymbolic(F, A,...)
>>
>> MatLUFactorNumeric(F, A,...)
>>
>> MatSolve(F,b,x)
>>
>> 2. I have three dense matrices A at three different dimensions: N=75, 2028 and 21180.
>>
>> 3. The calling sequence works for N=75 and 2028. But when N=21180, the program hung when calling MatLUFactorNumeric(...). It appeared to be a segmentation fault, with the following error message:
>>
>>
>>
>> [1]PETSC ERROR: --------------------- Error Message ------------------------------------
>> [1]PETSC ERROR: Signal received!
>>
>> ALWAYS send the entire error message. How can we tell anything from a small snippet?
>>
>> Since you have [1], this was run in parallel, so you need 3rd party
>> packages. But you do not seem to be checking return values. Check them
>> to make sure those packages are installed correctly.
>>
>> Matt
>>
>> Does anybody have similar experience on that?
>>
>> Thanks a lot!
>>
>> Jinquan
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>