[petsc-users] [SLEPc] SIGFPE Arithmetic exception in EPSGD
Smith, Barry F.
bsmith at mcs.anl.gov
Thu May 3 22:11:10 CDT 2018
at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_improvex.c:1112
1112 if (d->nR[i]/a < data->fix) {
The problem is likely the division by a when a is zero. Perhaps the code needs a check above this line that a is not zero. Or rewrite the check as
if (d->nR[i] < a*data->fix) {
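
To illustrate the rewritten check, here is a minimal sketch in plain C. The helper name ratio_below and its parameters are hypothetical and not part of SLEPc; they stand in for d->nR[i], a, and data->fix. It assumes a and fix are non-negative, in which case the multiplied form agrees with the divided form whenever a > 0 and performs no division when a is zero.

```c
#include <stdbool.h>

/* Hypothetical helper (not SLEPc code): decides whether res_norm / a < fix
   without dividing. Assumes a >= 0 and fix >= 0. */
static bool ratio_below(double res_norm, double a, double fix)
{
    /* For a > 0 this is equivalent to res_norm / a < fix.
       For a == 0 it is false for any res_norm >= 0, and no division
       is executed, so no divide-by-zero exception can be raised here. */
    return res_norm < a * fix;
}
```

With a equal to zero the test simply evaluates to false instead of dividing. Note that a floating-point division by zero normally raises SIGFPE only when floating-point exception trapping is enabled; with the default IEEE 754 behavior it yields inf or NaN. That difference might be one reason the same solve behaves differently in two programs, though this is an assumption, not something the backtrace confirms.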
Barry
> On May 3, 2018, at 7:58 PM, Harshad Sahasrabudhe <harshad.sahasrabudhe at gmail.com> wrote:
>
> Hello,
>
> I am solving for the lowest eigenvalues and eigenvectors of symmetric positive definite matrices in the generalized eigenvalue problem. I am using the GD solver with the default settings of PCBJACOBI. When I run a standalone executable on 16 processes which loads the matrices from a file and solves the eigenproblem, I get converged results in ~600 iterations. I am using PETSc/SLEPc 3.5.4.
>
> However, when I use the same settings in my software, which uses LibMesh (0.9.5) for FEM discretization, I get a SIGFPE. The backtrace is:
>
> Program received signal SIGFPE, Arithmetic exception.
> 0x00002aaab377ea26 in dvd_improvex_jd_lit_const_0 (d=0x1d29078, i=0, theta=0x1f396f8, thetai=0x1f39718, maxits=0x7fffffff816c, tol=0x7fffffff8140)
> at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_improvex.c:1112
> 1112 if (d->nR[i]/a < data->fix) {
>
> #0 0x00002aaab377ea26 in dvd_improvex_jd_lit_const_0 (d=0x1d29078, i=0, theta=0x1f396f8, thetai=0x1f39718, maxits=0x7fffffff816c, tol=0x7fffffff8140)
> at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_improvex.c:1112
> #1 0x00002aaab3774316 in dvd_improvex_jd_gen (d=0x1d29078, r_s=0, r_e=1, size_D=0x7fffffff821c) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_improvex.c:316
> #2 0x00002aaab3731ec4 in dvd_updateV_update_gen (d=0x1d29078) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_updatev.c:360
> #3 0x00002aaab3730296 in dvd_updateV_extrapol (d=0x1d29078) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/dvd_updatev.c:193
> #4 0x00002aaab3727cbc in EPSSolve_XD (eps=0x1d0ee10) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/impls/davidson/common/davidson.c:299
> #5 0x00002aaab35bafc8 in EPSSolve (eps=0x1d0ee10) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/slepc/build-real/src/eps/interface/epssolve.c:99
> #6 0x00002aaab30dbaf9 in libMesh::SlepcEigenSolver<double>::_solve_generalized_helper (this=0x1b19880, mat_A=0x1c906d0, mat_B=0x1cb16b0, nev=5, ncv=20, tol=9.9999999999999998e-13, m_its=3000) at src/solvers/slepc_eigen_solver.C:519
> #7 0x00002aaab30da56a in libMesh::SlepcEigenSolver<double>::solve_generalized (this=0x1b19880, matrix_A_in=..., matrix_B_in=..., nev=5, ncv=20, tol=9.9999999999999998e-13, m_its=3000) at src/solvers/slepc_eigen_solver.C:316
> #8 0x00002aaab30fb02e in libMesh::EigenSystem::solve (this=0x1b19930) at src/systems/eigen_system.C:241
> #9 0x00002aaab30e48a9 in libMesh::CondensedEigenSystem::solve (this=0x1b19930) at src/systems/condensed_eigen_system.C:106
> #10 0x00002aaaacce0e78 in EMSchrodingerFEM::do_solve (this=0x19d6a90) at EMSchrodingerFEM.cpp:879
> #11 0x00002aaaadaae3e5 in Simulation::solve (this=0x19d6a90) at Simulation.cpp:789
> #12 0x00002aaaad52458b in NonlinearPoissonFEM::do_my_assemble (this=0x19da050, x=..., residual=0x7fffffff9eb0, jacobian=0x0) at NonlinearPoissonFEM.cpp:179
> #13 0x00002aaaad555eec in NonlinearPoisson::my_assemble_residual (x=..., r=..., s=...) at NonlinearPoisson.cpp:1469
> #14 0x00002aaab30c5dc3 in libMesh::__libmesh_petsc_snes_residual (snes=0x1b9ed70, x=0x1a50330, r=0x1a47a50, ctx=0x19e5a60) at src/solvers/petsc_nonlinear_solver.C:137
> #15 0x00002aaab41048b9 in SNESComputeFunction (snes=0x1b9ed70, x=0x1a50330, y=0x1a47a50) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/snes/interface/snes.c:2033
> #16 0x00002aaaad1c9ad8 in SNESShellSolve_PredictorCorrector (snes=0x1b9ed70, vec_sol=0x1a2a5a0) at PredictorCorrectorModule.cpp:413
> #17 0x00002aaab4653e3d in SNESSolve_Shell (snes=0x1b9ed70) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/snes/impls/shell/snesshell.c:167
> #18 0x00002aaab4116fb7 in SNESSolve (snes=0x1b9ed70, b=0x0, x=0x1a2a5a0) at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/snes/interface/snes.c:3743
> #19 0x00002aaab30c7c3c in libMesh::PetscNonlinearSolver<double>::solve (this=0x19e5a60, jac_in=..., x_in=..., r_in=...) at src/solvers/petsc_nonlinear_solver.C:714
> #20 0x00002aaab3136ad9 in libMesh::NonlinearImplicitSystem::solve (this=0x19e4b80) at src/systems/nonlinear_implicit_system.C:183
> #21 0x00002aaaad5791f3 in NonlinearPoisson::execute_solver (this=0x19da050) at NonlinearPoisson.cpp:1218
> #22 0x00002aaaad554a99 in NonlinearPoisson::do_solve (this=0x19da050) at NonlinearPoisson.cpp:961
> #23 0x00002aaaadaae3e5 in Simulation::solve (this=0x19da050) at Simulation.cpp:789
> #24 0x00002aaaad1c9657 in PredictorCorrectorModule::do_solve (this=0x19c0210) at PredictorCorrectorModule.cpp:334
> #25 0x00002aaaadaae3e5 in Simulation::solve (this=0x19c0210) at Simulation.cpp:789
> #26 0x00002aaaad9e8f4a in Nemo::run_simulations (this=0x63ba80 <Nemo::instance()::impl>) at Nemo.cpp:1367
> #27 0x0000000000426f36 in main (argc=2, argv=0x7fffffffd0f8) at main.cpp:452
>
>
> Here is the log_view from the standalone executable:
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./libmesh_solve_eigenproblem on a linux named conte-a373.rcac.purdue.edu with 16 processors, by hsahasra Thu May 3 20:56:03 2018
> Using Petsc Release Version 3.5.4, May, 23, 2015
>
> Max Max/Min Avg Total
> Time (sec): 2.628e+01 1.00158 2.625e+01
> Objects: 6.400e+03 1.00000 6.400e+03
> Flops: 3.576e+09 1.00908 3.564e+09 5.702e+10
> Flops/sec: 1.363e+08 1.00907 1.358e+08 2.172e+09
> MPI Messages: 1.808e+04 2.74920 1.192e+04 1.907e+05
> MPI Message Lengths: 4.500e+07 1.61013 3.219e+03 6.139e+08
> MPI Reductions: 8.522e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 2.6254e+01 100.0% 5.7023e+10 100.0% 1.907e+05 100.0% 3.219e+03 100.0% 8.521e+03 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 1639 1.0 4.7509e+00 1.7 3.64e+08 1.1 1.9e+05 3.0e+03 0.0e+00 13 10100 93 0 13 10100 93 0 1209
> MatSolve 1045 1.0 6.4188e-01 1.0 2.16e+08 1.1 0.0e+00 0.0e+00 0.0e+00 2 6 0 0 0 2 6 0 0 0 5163
> MatLUFactorNum 1 1.0 2.0798e-02 3.5 9.18e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 664
> MatILUFactorSym 1 1.0 1.1777e-02 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 4 1.0 1.3677e-01 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 4 1.0 3.7882e-02 1.3 0.00e+00 0.0 4.6e+02 7.5e+02 1.6e+01 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 1 1.0 7.1526e-06 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 2.3198e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 33 1.0 1.1992e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatLoad 2 1.0 2.9271e-01 1.0 0.00e+00 0.0 5.5e+02 7.6e+04 2.6e+01 1 0 0 7 0 1 0 0 7 0 0
> VecCopy 2096 1.0 4.0181e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 1047 1.0 1.7598e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 1639 1.0 4.3395e-01 2.0 0.00e+00 0.0 1.9e+05 3.0e+03 0.0e+00 1 0100 93 0 1 0100 93 0 0
> VecScatterEnd 1639 1.0 3.2399e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0
> VecReduceArith 2096 1.0 5.6402e-02 1.1 3.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 9287
> VecReduceComm 1572 1.0 5.5213e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+03 20 0 0 0 18 20 0 0 0 18 0
> EPSSetUp 1 1.0 9.0121e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 5.2e+01 0 0 0 0 1 0 0 0 0 1 0
> EPSSolve 1 1.0 2.5917e+01 1.0 3.58e+09 1.0 1.9e+05 3.0e+03 8.5e+03 99100100 93100 99100100 93100 2200
> STSetUp 1 1.0 4.8380e-03 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1045 1.0 6.8107e-01 1.0 2.17e+08 1.1 0.0e+00 0.0e+00 0.0e+00 3 6 0 0 0 3 6 0 0 0 4886
> PCSetUp 2 1.0 2.3827e-02 2.8 9.18e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 580
> PCApply 1045 1.0 7.0819e-01 1.0 2.17e+08 1.1 0.0e+00 0.0e+00 0.0e+00 3 6 0 0 0 3 6 0 0 0 4699
> BVCreate 529 1.0 3.7145e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.2e+03 11 0 0 0 37 11 0 0 0 37 0
> BVCopy 1048 1.0 1.3941e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> BVMult 3761 1.0 3.6953e+00 1.1 2.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 56 0 0 0 14 56 0 0 0 8674
> BVDot 2675 1.0 9.6611e+00 1.3 1.08e+09 1.0 6.8e+04 3.0e+03 2.7e+03 34 30 36 33 31 34 30 36 33 31 1791
> BVOrthogonalize 526 1.0 4.0705e+00 1.1 7.89e+08 1.0 6.8e+04 3.0e+03 5.9e+02 15 22 36 33 7 15 22 36 33 7 3092
> BVScale 1047 1.0 1.6144e-02 1.1 8.18e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 8105
> BVSetRandom 5 1.0 4.7204e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> BVMatProject 1046 1.0 5.1708e+00 1.4 6.11e+08 1.0 0.0e+00 0.0e+00 1.6e+03 18 17 0 0 18 18 17 0 0 18 1891
> DSSolve 533 1.0 9.7243e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
> DSVectors 1048 1.0 1.3440e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> DSOther 2123 1.0 8.8778e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Viewer 3 2 1504 0
> Matrix 3196 3190 31529868 0
> Vector 2653 2651 218802920 0
> Vector Scatter 2 0 0 0
> Index Set 7 7 84184 0
> Eigenvalue Problem Solver 1 1 4564 0
> PetscRandom 1 1 632 0
> Spectral Transform 1 1 828 0
> Krylov Solver 2 2 2320 0
> Preconditioner 2 2 1912 0
> Basis Vectors 530 530 1111328 0
> Region 1 1 648 0
> Direct solver 1 1 201200 0
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> Average time for MPI_Barrier(): 0.0004704
> Average time for zero size MPI_Send(): 0.000118256
> #PETSc Option Table entries:
> -eps_monitor
> -f1 A.mat
> -f2 B.mat
> -log_view
> -matload_block_size 1
> -ncv 70
> -st_ksp_tol 1e-12
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --with-x=0 --download-hdf5=1 --with-scalar-type=real --with-single-library=1 --with-pic=1 --with-shared-libraries=0 --with-clanguage=C++ --with-fortran=1 --with-debugging=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --download-metis=1 --download-parmetis=1 --with-valgrind-dir=/apps/rhel6/valgrind/3.8.1/ --download-mumps=1 --with-fortran-kernels=0 --download-superlu_dist=1 --download-scalapack --download-fblaslapack=1
> -----------------------------------------
> Libraries compiled on Thu Sep 22 10:19:43 2016 on carter-g008.rcac.purdue.edu
> Machine characteristics: Linux-2.6.32-573.8.1.el6.x86_64-x86_64-with-redhat-6.7-Santiago
> Using PETSc directory: /depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real
> Using PETSc arch: linux
> -----------------------------------------
>
> Using C compiler: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fPIC ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -O ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/include -I/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/include -I/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/include -I/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/include -I/apps/rhel6/valgrind/3.8.1/include -I/depot/apps/ncn/conte/mpich-3.1/include
> -----------------------------------------
>
> Using C linker: mpicxx
> Using Fortran linker: mpif90
> Using libraries: -Wl,-rpath,/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/lib -L/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/lib -lpetsc -Wl,-rpath,/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/lib -L/depot/ncn/apps/conte/conte-gcc-petsc35/libs/petsc/build-real/linux/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist_3.3 -lflapack -lfblas -lparmetis -lmetis -lpthread -lssl -lcrypto -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lz -Wl,-rpath,/depot/apps/ncn/conte/mpich-3.1/lib -L/depot/apps/ncn/conte/mpich-3.1/lib -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -L/apps/rhel6/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib64 -L/apps/rhel6/gcc/5.2.0/lib64 -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib -L/apps/rhel6/gcc/5.2.0/lib -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/depot/apps/ncn/conte/mpich-3.1/lib -L/depot/apps/ncn/conte/mpich-3.1/lib -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -L/apps/rhel6/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib64 -L/apps/rhel6/gcc/5.2.0/lib64 -Wl,-rpath,/apps/rhel6/gcc/5.2.0/lib -L/apps/rhel6/gcc/5.2.0/lib -ldl -Wl,-rpath,/depot/apps/ncn/conte/mpich-3.1/lib -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
> -----------------------------------------
>
> Can you please point me to what could be going wrong with the larger software?
>
> Thanks!
> Harshad