[petsc-users] Segmentation violation

Zhang, Hong hzhang at mcs.anl.gov
Tue Oct 30 20:41:06 CDT 2018


Santiago,
The shift '-eps_target -2e-3+1.01i' is very close to the eigenvalues. What happens if you pick a target a little farther away from your eigenvalues?
I suspect MUMPS encounters a zero pivot during numerical factorization. There are options to handle it, but I need matrices A and B to investigate.
I am not sure whether the problem comes from a memory bug.
Anyway, I'm cc'ing the MUMPS developers here.

Hong
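
[Editorial sketch, not part of Hong's message.] With shift-and-invert, the matrix being factored is A - sigma*B, which becomes singular when the target sigma coincides with an eigenvalue. To put a number on "very close", the short Python sketch below measures the distance from the target used in the thread to each eigenvalue in the expected output posted in the original report. The function name distances_to_target is hypothetical and used only for illustration. The sketch only quantifies the closeness. It does not show that closeness is the cause of the segmentation violation.

```python
# Target (shift) taken from the command line in the thread:
#   -eps_target -2e-3+1.01i
target = complex(-2e-3, 1.01)

# Eigenvalues copied from the expected output posted in the original report.
eigenvalues = [
    complex(-0.002806, 1.009827),
    complex(-0.002980, 1.008417),
    complex(-0.002676, 1.011755),
    complex(-0.003201, 1.007367),
]


def distances_to_target(values, sigma):
    """Return (eigenvalue, |eigenvalue - sigma|) pairs, nearest first.

    Hypothetical helper for illustration; not part of PETSc or SLEPc.
    """
    pairs = [(lam, abs(lam - sigma)) for lam in values]
    return sorted(pairs, key=lambda pair: pair[1])


ranked = distances_to_target(eigenvalues, target)

for lam, dist in ranked:
    # Relative distance compares the gap with the size of the eigenvalue.
    rel = dist / abs(lam)
    print(f"{lam.real:+.6f}{lam.imag:+.6f}i  "
          f"|lambda - sigma| = {dist:.3e}  relative = {rel:.3e}")
```

All four distances come out below 3e-3, and the nearest one is below 1e-3, for eigenvalues of magnitude about 1. Trying a target farther away, as suggested above, would increase these gaps.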

On Tue, Oct 30, 2018 at 8:09 PM Smith, Barry F. via petsc-users <petsc-users at mcs.anl.gov> wrote:

  Yeah, this doesn't look good for MUMPS, but it isn't for sure the problem either.

   The valgrind output should be sent to the MUMPS developers.

   Hong,

         Can you send this to the MUMPS developers and see what they say?

    Thanks

   Barry


> On Oct 30, 2018, at 2:04 PM, Santiago Andres Triana <repepo at gmail.com> wrote:
>
> This is the output of
> mpiexec -n 2 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./ex7 -malloc off -f1 A.petsc -f2 B.petsc -eps_nev 4 -eps_target -2e-3+1.01i -st_type sinvert
>
> Generalized eigenproblem stored in file.
>
>  Reading COMPLEX matrices from binary files...
> [1]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [1]PETSC ERROR: likely location of problem given in stack below
> [1]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [1]PETSC ERROR:       INSTEAD the line number of the start of the function
> [1]PETSC ERROR:       is given.
> [1]PETSC ERROR: [1] MatFactorNumeric_MUMPS line 1205 /home/spin2/petsc-3.10.2/src/mat/impls/aij/mpi/mumps/mumps.c
> [1]PETSC ERROR: [1] MatLUFactorNumeric line 3054 /home/spin2/petsc-3.10.2/src/mat/interface/matrix.c
> [1]PETSC ERROR: [1] PCSetUp_LU line 59 /home/spin2/petsc-3.10.2/src/ksp/pc/impls/factor/lu/lu.c
> [1]PETSC ERROR: [1] PCSetUp line 894 /home/spin2/petsc-3.10.2/src/ksp/pc/interface/precon.c
> [1]PETSC ERROR: [1] KSPSetUp line 304 /home/spin2/petsc-3.10.2/src/ksp/ksp/interface/itfunc.c
> [1]PETSC ERROR: [1] STSetUp_Sinvert line 96 /home/spin2/slepc-3.10.1/src/sys/classes/st/impls/sinvert/sinvert.c
> [1]PETSC ERROR: [1] STSetUp line 233 /home/spin2/slepc-3.10.1/src/sys/classes/st/interface/stsolve.c
> [1]PETSC ERROR: [1] EPSSetUp line 104 /home/spin2/slepc-3.10.1/src/eps/interface/epssetup.c
> [1]PETSC ERROR: [1] EPSSolve line 129 /home/spin2/slepc-3.10.1/src/eps/interface/epssolve.c
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: Signal received
> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.10.2, Oct, 09, 2018
> [1]PETSC ERROR: ./ex7 on a arch-linux2-c-opt named wobble-wkst-as by spin2 Tue Oct 30 19:42:18 2018
> [1]PETSC ERROR: Configure options --download-mpich -with-scalar-type=complex --download-mumps --download-parmetis --download-metis --download-scalapack --download-fblaslapack --with-debugging=1 --download-superlu_dist --download-ptscotch
> [1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1
>
>
>
> and one of the two valgrind logs (the other was empty):
>
> ==63004== Use of uninitialised value of size 8
> ==63004==    at 0x694F8FF: zmumps_redistribution_ (zfac_distrib_distentry.F:367)
> ==63004==    by 0x68E1266: zmumps_fac_driver_ (zfac_driver.F:1777)
> ==63004==    by 0x6869F63: zmumps_ (zmumps_driver.F:1686)
> ==63004==    by 0x6861B64: zmumps_f77_ (zmumps_f77.F:267)
> ==63004==    by 0x685FB43: zmumps_c (mumps_c.c:417)
> ==63004==    by 0x5B741CD: MatFactorNumeric_MUMPS (mumps.c:1227)
> ==63004==    by 0x53C3DDB: MatLUFactorNumeric (matrix.c:3065)
> ==63004==    by 0x626E652: PCSetUp_LU (lu.c:131)
> ==63004==    by 0x6387B8D: PCSetUp (precon.c:932)
> ==63004==    by 0x649CD41: KSPSetUp (itfunc.c:391)
> ==63004==    by 0x4A0E3F7: STSetUp_Sinvert (sinvert.c:132)
> ==63004==    by 0x4A4033F: STSetUp (stsolve.c:271)
> ==63004==    by 0x4B586FB: EPSSetUp (epssetup.c:263)
> ==63004==    by 0x4B5D43A: EPSSolve (epssolve.c:135)
> ==63004==    by 0x10B6FD: main (ex7.c:134)
> ==63004==
> ==63004== Invalid read of size 4
> ==63004==    at 0x694F8FF: zmumps_redistribution_ (zfac_distrib_distentry.F:367)
> ==63004==    by 0x68E1266: zmumps_fac_driver_ (zfac_driver.F:1777)
> ==63004==    by 0x6869F63: zmumps_ (zmumps_driver.F:1686)
> ==63004==    by 0x6861B64: zmumps_f77_ (zmumps_f77.F:267)
> ==63004==    by 0x685FB43: zmumps_c (mumps_c.c:417)
> ==63004==    by 0x5B741CD: MatFactorNumeric_MUMPS (mumps.c:1227)
> ==63004==    by 0x53C3DDB: MatLUFactorNumeric (matrix.c:3065)
> ==63004==    by 0x626E652: PCSetUp_LU (lu.c:131)
> ==63004==    by 0x6387B8D: PCSetUp (precon.c:932)
> ==63004==    by 0x649CD41: KSPSetUp (itfunc.c:391)
> ==63004==    by 0x4A0E3F7: STSetUp_Sinvert (sinvert.c:132)
> ==63004==    by 0x4A4033F: STSetUp (stsolve.c:271)
> ==63004==    by 0x4B586FB: EPSSetUp (epssetup.c:263)
> ==63004==    by 0x4B5D43A: EPSSolve (epssolve.c:135)
> ==63004==    by 0x10B6FD: main (ex7.c:134)
> ==63004==  Address 0xe5ffda3101296ca0 is not stack'd, malloc'd or (recently) free'd
> ==63004==
>
>
> Hope it gives enough info. Thanks!
>
>
>
> On Tue, Oct 30, 2018 at 6:50 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
>    Can you run the code on the "failing" machine using valgrind? https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>
>   Barry
>
>
> > On Oct 30, 2018, at 12:10 PM, Santiago Andres Triana via petsc-users <petsc-users at mcs.anl.gov> wrote:
> >
> > Hi petsc-users,
> >
> > I am solving a generalized eigenvalue problem using ex7 in $SLEPC_DIR/src/eps/examples/tutorials/. I provide the A and B matrices.
> > The program runs fine, with correct solutions, on a 12-core node and also on a Mac laptop.
> >
> > However, on a 16-core workstation running Debian testing (fresh install), with fresh installs of petsc and slepc as well, I get the following error:
> >
> >  $ mpiexec -n 2 ./ex7 -f1 A.petsc -f2 B.petsc -st_type sinvert -eps_nev 4 -eps_target -2e-3+1.01i
> >
> > Generalized eigenproblem stored in file.
> >
> >  Reading COMPLEX matrices from binary files...
> > [1]PETSC ERROR: ------------------------------------------------------------------------
> > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> > [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > [1]PETSC ERROR: likely location of problem given in stack below
> > [1]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > [1]PETSC ERROR:       INSTEAD the line number of the start of the function
> > [1]PETSC ERROR:       is given.
> > [1]PETSC ERROR: [1] MatFactorNumeric_MUMPS line 1205 /home/spin2/petsc-3.10.2/src/mat/impls/aij/mpi/mumps/mumps.c
> > [1]PETSC ERROR: [1] MatLUFactorNumeric line 3054 /home/spin2/petsc-3.10.2/src/mat/interface/matrix.c
> > [1]PETSC ERROR: [1] PCSetUp_LU line 59 /home/spin2/petsc-3.10.2/src/ksp/pc/impls/factor/lu/lu.c
> > [1]PETSC ERROR: [1] PCSetUp line 894 /home/spin2/petsc-3.10.2/src/ksp/pc/interface/precon.c
> > [1]PETSC ERROR: [1] KSPSetUp line 304 /home/spin2/petsc-3.10.2/src/ksp/ksp/interface/itfunc.c
> > [1]PETSC ERROR: [1] STSetUp_Sinvert line 96 /home/spin2/slepc-3.10.1/src/sys/classes/st/impls/sinvert/sinvert.c
> > [1]PETSC ERROR: [1] STSetUp line 233 /home/spin2/slepc-3.10.1/src/sys/classes/st/interface/stsolve.c
> > [1]PETSC ERROR: [1] EPSSetUp line 104 /home/spin2/slepc-3.10.1/src/eps/interface/epssetup.c
> > [1]PETSC ERROR: [1] EPSSolve line 129 /home/spin2/slepc-3.10.1/src/eps/interface/epssolve.c
> > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > [1]PETSC ERROR: Signal received
> > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > [1]PETSC ERROR: Petsc Release Version 3.10.2, Oct, 09, 2018
> > [1]PETSC ERROR: ./ex7 on a arch-linux2-c-opt named wobble-wkst-as by spin2 Tue Oct 30 17:40:51 2018
> > [1]PETSC ERROR: Configure options --download-mpich -with-scalar-type=complex --download-mumps --download-parmetis --download-metis --download-scalapack --download-fblaslapack --with-debugging=1 --download-superlu_dist --download-ptscotch
> > [1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1
> >
> >
> >
> > The expected output is the following (on a compute node running petsc-3.9.2 and also on a Mac laptop running petsc-3.10.2):
> >
> > $ mpiexec -n 2 ./ex7 -f1 A.petsc -f2 B.petsc -st_type sinvert -eps_nev 4 -eps_target -2e-3+1.01i
> >
> > Generalized eigenproblem stored in file.
> >
> >  Reading COMPLEX matrices from binary files...
> >  Number of iterations of the method: 2
> >  Number of linear iterations of the method: 27
> >  Solution method: krylovschur
> >
> >  Number of requested eigenvalues: 4
> >  Stopping condition: tol=1e-08, maxit=63157
> >  Linear eigensolve converged (4 eigenpairs) due to CONVERGED_TOL; iterations 2
> >  ---------------------- --------------------
> >             k             ||Ax-kBx||/||kx||
> >  ---------------------- --------------------
> >   -0.002806+1.009827i       2.00821e-19
> >   -0.002980+1.008417i       8.08359e-17
> >   -0.002676+1.011755i       9.49342e-18
> >   -0.003201+1.007367i       1.50869e-16
> >  ---------------------- --------------------
> >
> >
> > Just in case, the matrices can be downloaded from here if anyone wants to give them a try:
> > https://www.dropbox.com/s/ejpa9owkv8tjnwi/A.petsc?dl=0
> > https://www.dropbox.com/s/urjtxaezl0cv3om/B.petsc?dl=0
> >
> >
> > I tried different petsc/slepc versions to no avail, including an OS reinstall. So any help would be highly appreciated. Thanks in advance!
> >
> > Santiago
>
