[petsc-users] [SLEPc] SIGFPE Arithmetic exception in EPSGD

Smith, Barry F. bsmith at mcs.anl.gov
Thu May 3 22:11:10 CDT 2018


 at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_improvex.c:1112
1112	  if (d->nR[i]/a < data->fix) {

The problem is likely the division by a when a is zero. Perhaps the code needs a check, above this line, that a is not zero. Or rewrite the check as

if (d->nR[i] < a*data->fix) {
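
A minimal sketch of why the rewritten check avoids the exception, assuming a is a nonnegative quantity such as a norm and that floating-point traps are enabled (which is what turns a division by zero into SIGFPE instead of an inf or NaN result). The helper name ratio_below is hypothetical and is not part of SLEPc; it only mirrors the comparison on line 1112.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not SLEPc code: tests whether nR/a < fix
   without ever performing the division. Assumes a >= 0. */
static bool ratio_below(double nR, double a, double fix)
{
  assert(a >= 0.0); /* assumption stated above: a is nonnegative */
  /* For a > 0, nR/a < fix is equivalent to nR < a*fix.
     For a == 0 the product is 0, so a nonnegative nR yields false
     and no division by zero can be trapped. */
  return nR < a * fix;
}
```

With a > 0 this gives the same answer as the original test; with a == 0 it returns false for any nonnegative nR, so whether that is the desired branch in dvd_improvex.c still has to be decided by someone who knows the algorithm.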

   Barry


> On May 3, 2018, at 7:58 PM, Harshad Sahasrabudhe <harshad.sahasrabudhe at gmail.com> wrote:
> 
> Hello,
> 
> I am solving for the lowest eigenvalues and eigenvectors of symmetric positive definite matrices in the generalized eigenvalue problem. I am using the GD solver with the default settings of PCBJACOBI. When I run a standalone executable on 16 processes which loads the matrices from a file and solves the eigenproblem, I get converged results in ~600 iterations. I am using PETSc/SLEPc 3.5.4.
> 
> However, when I use the same settings in my software, which uses LibMesh (0.9.5) for FEM discretization, I get a SIGFPE. The backtrace is:
> 
> Program received signal SIGFPE, Arithmetic exception.
> 0x00002aaab377ea26 in dvd_improvex_jd_lit_const_0 (d=0x1d29078, i=0, theta=0x1f396f8, thetai=0x1f39718, maxits=0x7fffffff816c, tol=0x7fffffff8140)
>     at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_improvex.c:1112
> 1112	  if (d->nR[i]/a < data->fix) {
> 
> #0  0x00002aaab377ea26 in dvd_improvex_jd_lit_const_0 (d=0x1d29078, i=0, theta=0x1f396f8, thetai=0x1f39718, maxits=0x7fffffff816c, tol=0x7fffffff8140)
>     at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_improvex.c:1112
> #1  0x00002aaab3774316 in dvd_improvex_jd_gen (d=0x1d29078, r_s=0, r_e=1, size_D=0x7fffffff821c) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_improvex.c:316
> #2  0x00002aaab3731ec4 in dvd_updateV_update_gen (d=0x1d29078) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_updatev.c:360
> #3  0x00002aaab3730296 in dvd_updateV_extrapol (d=0x1d29078) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_updatev.c:193
> #4  0x00002aaab3727cbc in EPSSolve_XD (eps=0x1d0ee10) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/davidson.c:299
> #5  0x00002aaab35bafc8 in EPSSolve (eps=0x1d0ee10) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/interface/epssolve.c:99
> #6  0x00002aaab30dbaf9 in libMesh::SlepcEigenSolver<double>::_solve_generalized_helper (this=0x1b19880, mat_A=0x1c906d0, mat_B=0x1cb16b0, nev=5, ncv=20, tol=9.9999999999999998e-13, m_its=3000) at src/solvers/slepc_eigen_solver.C:519
> #7  0x00002aaab30da56a in libMesh::SlepcEigenSolver<double>::solve_generalized (this=0x1b19880, matrix_A_in=..., matrix_B_in=..., nev=5, ncv=20, tol=9.9999999999999998e-13, m_its=3000) at src/solvers/slepc_eigen_solver.C:316
> #8  0x00002aaab30fb02e in libMesh::EigenSystem::solve (this=0x1b19930) at src/systems/eigen_system.C:241
> #9  0x00002aaab30e48a9 in libMesh::CondensedEigenSystem::solve (this=0x1b19930) at src/systems/condensed_eigen_system.C:106
> #10 0x00002aaaacce0e78 in EMSchrodingerFEM::do_solve (this=0x19d6a90) at EMSchrodingerFEM.cpp:879
> #11 0x00002aaaadaae3e5 in Simulation::solve (this=0x19d6a90) at Simulation.cpp:789
> #12 0x00002aaaad52458b in NonlinearPoissonFEM::do_my_assemble (this=0x19da050, x=..., residual=0x7fffffff9eb0, jacobian=0x0) at NonlinearPoissonFEM.cpp:179
> #13 0x00002aaaad555eec in NonlinearPoisson::my_assemble_residual (x=..., r=..., s=...) at NonlinearPoisson.cpp:1469
> #14 0x00002aaab30c5dc3 in libMesh::__libmesh_petsc_snes_residual (snes=0x1b9ed70, x=0x1a50330, r=0x1a47a50, ctx=0x19e5a60) at src/solvers/petsc_nonlinear_solver.C:137
> #15 0x00002aaab41048b9 in SNESComputeFunction (snes=0x1b9ed70, x=0x1a50330, y=0x1a47a50) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/snes/interface/snes.c:2033
> #16 0x00002aaaad1c9ad8 in SNESShellSolve_PredictorCorrector (snes=0x1b9ed70, vec_sol=0x1a2a5a0) at PredictorCorrectorModule.cpp:413
> #17 0x00002aaab4653e3d in SNESSolve_Shell (snes=0x1b9ed70) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/snes/impls/shell/snesshell.c:167
> #18 0x00002aaab4116fb7 in SNESSolve (snes=0x1b9ed70, b=0x0, x=0x1a2a5a0) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/snes/interface/snes.c:3743
> #19 0x00002aaab30c7c3c in libMesh::PetscNonlinearSolver<double>::solve (this=0x19e5a60, jac_in=..., x_in=..., r_in=...) at src/solvers/petsc_nonlinear_solver.C:714
> #20 0x00002aaab3136ad9 in libMesh::NonlinearImplicitSystem::solve (this=0x19e4b80) at src/systems/nonlinear_implicit_system.C:183
> #21 0x00002aaaad5791f3 in NonlinearPoisson::execute_solver (this=0x19da050) at NonlinearPoisson.cpp:1218
> #22 0x00002aaaad554a99 in NonlinearPoisson::do_solve (this=0x19da050) at NonlinearPoisson.cpp:961
> #23 0x00002aaaadaae3e5 in Simulation::solve (this=0x19da050) at Simulation.cpp:789
> #24 0x00002aaaad1c9657 in PredictorCorrectorModule::do_solve (this=0x19c0210) at PredictorCorrectorModule.cpp:334
> #25 0x00002aaaadaae3e5 in Simulation::solve (this=0x19c0210) at Simulation.cpp:789
> #26 0x00002aaaad9e8f4a in Nemo::run_simulations (this=0x63ba80 <Nemo::instance()::impl>) at Nemo.cpp:1367
> #27 0x0000000000426f36 in main (argc=2, argv=0x7fffffffd0f8) at main.cpp:452
> 
> 
> Here is the log_view from the standalone executable:
> 
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
> ************************************************************************************************************************
> 
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
> 
> ./libmesh_solve_eigenproblem on a linux named conte-a373.rcac.purdue.edu with 16 processors, by hsahasra Thu May  3 20:56:03 2018
> Using Petsc Release Version 3.5.4, May, 23, 2015 
> 
>                          Max       Max/Min        Avg      Total 
> Time (sec):           2.628e+01      1.00158   2.625e+01
> Objects:              6.400e+03      1.00000   6.400e+03
> Flops:                3.576e+09      1.00908   3.564e+09  5.702e+10
> Flops/sec:            1.363e+08      1.00907   1.358e+08  2.172e+09
> MPI Messages:         1.808e+04      2.74920   1.192e+04  1.907e+05
> MPI Message Lengths:  4.500e+07      1.61013   3.219e+03  6.139e+08
> MPI Reductions:       8.522e+03      1.00000
> 
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
> 
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
>  0:      Main Stage: 2.6254e+01 100.0%  5.7023e+10 100.0%  1.907e+05 100.0%  3.219e+03      100.0%  8.521e+03 100.0% 
> 
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> 
> --- Event Stage 0: Main Stage
> 
> MatMult             1639 1.0 4.7509e+00 1.7 3.64e+08 1.1 1.9e+05 3.0e+03 0.0e+00 13 10100 93  0  13 10100 93  0  1209
> MatSolve            1045 1.0 6.4188e-01 1.0 2.16e+08 1.1 0.0e+00 0.0e+00 0.0e+00  2  6  0  0  0   2  6  0  0  0  5163
> MatLUFactorNum         1 1.0 2.0798e-02 3.5 9.18e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   664
> MatILUFactorSym        1 1.0 1.1777e-02 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin       4 1.0 1.3677e-01 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         4 1.0 3.7882e-02 1.3 0.00e+00 0.0 4.6e+02 7.5e+02 1.6e+01  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            1 1.0 7.1526e-06 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 2.3198e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries        33 1.0 1.1992e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatLoad                2 1.0 2.9271e-01 1.0 0.00e+00 0.0 5.5e+02 7.6e+04 2.6e+01  1  0  0  7  0   1  0  0  7  0     0
> VecCopy             2096 1.0 4.0181e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              1047 1.0 1.7598e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin     1639 1.0 4.3395e-01 2.0 0.00e+00 0.0 1.9e+05 3.0e+03 0.0e+00  1  0100 93  0   1  0100 93  0     0
> VecScatterEnd       1639 1.0 3.2399e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     0
> VecReduceArith      2096 1.0 5.6402e-02 1.1 3.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  9287
> VecReduceComm       1572 1.0 5.5213e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+03 20  0  0  0 18  20  0  0  0 18     0
> EPSSetUp               1 1.0 9.0121e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 5.2e+01  0  0  0  0  1   0  0  0  0  1     0
> EPSSolve               1 1.0 2.5917e+01 1.0 3.58e+09 1.0 1.9e+05 3.0e+03 8.5e+03 99100100 93100  99100100 93100  2200
> STSetUp                1 1.0 4.8380e-03 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetUp               1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve            1045 1.0 6.8107e-01 1.0 2.17e+08 1.1 0.0e+00 0.0e+00 0.0e+00  3  6  0  0  0   3  6  0  0  0  4886
> PCSetUp                2 1.0 2.3827e-02 2.8 9.18e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   580
> PCApply             1045 1.0 7.0819e-01 1.0 2.17e+08 1.1 0.0e+00 0.0e+00 0.0e+00  3  6  0  0  0   3  6  0  0  0  4699
> BVCreate             529 1.0 3.7145e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.2e+03 11  0  0  0 37  11  0  0  0 37     0
> BVCopy              1048 1.0 1.3941e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> BVMult              3761 1.0 3.6953e+00 1.1 2.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 56  0  0  0  14 56  0  0  0  8674
> BVDot               2675 1.0 9.6611e+00 1.3 1.08e+09 1.0 6.8e+04 3.0e+03 2.7e+03 34 30 36 33 31  34 30 36 33 31  1791
> BVOrthogonalize      526 1.0 4.0705e+00 1.1 7.89e+08 1.0 6.8e+04 3.0e+03 5.9e+02 15 22 36 33  7  15 22 36 33  7  3092
> BVScale             1047 1.0 1.6144e-02 1.1 8.18e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  8105
> BVSetRandom            5 1.0 4.7204e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> BVMatProject        1046 1.0 5.1708e+00 1.4 6.11e+08 1.0 0.0e+00 0.0e+00 1.6e+03 18 17  0  0 18  18 17  0  0 18  1891
> DSSolve              533 1.0 9.7243e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
> DSVectors           1048 1.0 1.3440e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> DSOther             2123 1.0 8.8778e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ------------------------------------------------------------------------------------------------------------------------
> 
> Memory usage is given in bytes:
> 
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
> 
> --- Event Stage 0: Main Stage
> 
>               Viewer     3              2         1504     0
>               Matrix  3196           3190     31529868     0
>               Vector  2653           2651    218802920     0
>       Vector Scatter     2              0            0     0
>            Index Set     7              7        84184     0
> Eigenvalue Problem Solver     1              1         4564     0
>          PetscRandom     1              1          632     0
>   Spectral Transform     1              1          828     0
>        Krylov Solver     2              2         2320     0
>       Preconditioner     2              2         1912     0
>        Basis Vectors   530            530      1111328     0
>               Region     1              1          648     0
>        Direct solver     1              1       201200     0
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> Average time for MPI_Barrier(): 0.0004704
> Average time for zero size MPI_Send(): 0.000118256
> #PETSc Option Table entries:
> -eps_monitor
> -f1 A.mat
> -f2 B.mat
> -log_view
> -matload_block_size 1
> -ncv 70
> -st_ksp_tol 1e-12
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --with-x=0 --download-hdf5=1 --with-scalar-type=real --with-single-library=1 --with-pic=1 --with-shared-libraries=0 --with-clanguage=C++ --with-fortran=1 --with-debugging=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --download-metis=1 --download-parmetis=1 --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1/ --download-mumps=1 --with-fortran-kernels=0 --download-superlu_dist=1 --download-scalapack --download-fblaslapack=1
> -----------------------------------------
> Libraries compiled on Thu Sep 22 10:19:43 2016 on carter-g008.rcac.purdue.edu 
> Machine characteristics: Linux-2.6.32-573.8.1.el6.x86_64-x86_64-with-redhat-6.7-Santiago
> Using PETSc directory: /depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real
> Using PETSc arch: linux
> -----------------------------------------
> 
> Using C compiler: mpicxx  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O   -fPIC   ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: mpif90  -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -O   ${FOPTFLAGS} ${FFLAGS} 
> -----------------------------------------
> 
> Using include paths: -I/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/include -I/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/include -I/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/include -I/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/include -I/apps/rhel6/valgrind/3.8.1/include -I/depot/apps/ncn/conte/mpich-3.1/include
> -----------------------------------------
> 
> Using C linker: mpicxx
> Using Fortran linker: mpif90
> Using libraries: -Wl,-rpath,/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/lib -L/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/lib -lpetsc -Wl,-rpath,/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/lib -L/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist_3.3 -lflapack -lfblas -lparmetis -lmetis -lpthread -lssl -lcrypto -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -Wl,-rpath,/depot/apps/ncn/conte/mpich-3.1/lib -L/depot/apps/ncn/conte/mpich-3.1/lib -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -L/apps/rhel6/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib64 -L/apps/rhel6/gcc/5.2.0/lib64 -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib -L/apps/rhel6/gcc/5.2.0/lib -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/depot/apps/ncn/conte/mpich-3.1/lib -L/depot/apps/ncn/conte/mpich-3.1/lib -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -L/apps/rhel6/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib64 -L/apps/rhel6/gcc/5.2.0/lib64 -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib -L/apps/rhel6/gcc/5.2.0/lib -ldl -Wl,-rpath,/depot/apps/ncn/conte/mpich-3.1/lib -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl  
> -----------------------------------------
> 
> Can you please point me to what could be going wrong with the larger software?
> 
> Thanks!
> Harshad


