[petsc-users] SEGV in PCSetUp_LU/MatFactorNumeric_MUMPS
Jose E. Roman
jroman at dsic.upv.es
Tue Dec 22 04:14:00 CST 2015
> On 22 Dec 2015, at 6:27, Katharine Hyatt <kshyatt at physics.ucsb.edu> wrote:
>
> Hello PETSc Users,
>
> I hope this is the right mailing list. I am hitting a SEGV in my code, which runs SLEPc + PETSc to do a shift-and-invert eigensolution. I have run the code through valgrind and nothing pops up before the crash (which happens about 20 minutes in):
>
> [1]PETSC ERROR: #1 User provided function() line 0 in unknown file
> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [2]PETSC ERROR: INSTEAD the line number of the start of the function
> [2]PETSC ERROR: is given.
> [2]PETSC ERROR: [2] MatFactorNumeric_MUMPS line 1151 /home/kshyatt/software/petsc-3.6.0/src/mat/impls/aij/mpi/mumps/mumps.c
> [2]PETSC ERROR: [2] MatLUFactorNumeric line 2937 /home/kshyatt/software/petsc-3.6.0/src/mat/interface/matrix.c
> [2]PETSC ERROR: [2] PCSetUp_LU line 99 /home/kshyatt/software/petsc-3.6.0/src/ksp/pc/impls/factor/lu/lu.c
> [4]PETSC ERROR: [4] PCSetUp_LU line 99 /home/kshyatt/software/petsc-3.6.0/src/ksp/pc/impls/factor/lu/lu.c
> [4]PETSC ERROR: [4] PCSetUp line 944 /home/kshyatt/software/petsc-3.6.0/src/ksp/pc/interface/precon.c
> [4]PETSC ERROR: [4] KSPSetUp line 247 /home/kshyatt/software/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c
> [4]PETSC ERROR: [4] STSetUp_Sinvert line 116 /home/kshyatt/sources/slepc-3.6.0/src/sys/classes/st/impls/sinvert/sinvert.c
> [4]PETSC ERROR: [8]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [8]PETSC ERROR: Signal received
> [8]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [2]PETSC ERROR: [2] PCSetUp line 944 /home/kshyatt/software/petsc-3.6.0/src/ksp/pc/interface/precon.c
> [2]PETSC ERROR: [2] KSPSetUp line 247 /home/kshyatt/software/petsc-3.6.0/src/ksp/ksp/interface/itfunc.c
> [2]PETSC ERROR: [2] STSetUp_Sinvert line 116 /home/kshyatt/sources/slepc-3.6.0/src/sys/classes/st/impls/sinvert/sinvert.c
> [2]PETSC ERROR: [2] STSetUp line 273 /home/kshyatt/sources/slepc-3.6.0/src/sys/classes/st/interface/stsolve.c
> [2]PETSC ERROR: [4] STSetUp line 273 /home/kshyatt/sources/slepc-3.6.0/src/sys/classes/st/interface/stsolve.c
> [4]PETSC ERROR: [4] EPSSetUp line 58 /home/kshyatt/sources/slepc-3.6.0/src/eps/interface/epssetup.c
> [4]PETSC ERROR: [4] EPSSolve line 83 /home/kshyatt/sources/slepc-3.6.0/src/eps/interface/epssolve.c
> [8]PETSC ERROR: Petsc Release Version 3.6.0, Jun, 09, 2015
> [8]PETSC ERROR: ./diagonalize on a arch-linux2-c-debug named node22 by kshyatt Mon Dec 21 15:54:32 2015
> [8]PETSC ERROR: Configure options --download-hdf5 --download-parmetis --download-superlu_dist --download-superlu --download-cmake --download-metis --download-mumps --download-scalapack --with-blas-lapack-dir=/opt/intel/composer_xe_2015/mkl/lib/intel64/
>
> I have tried SuperLU_dist as well, and the error has not happened in over 6 hours so far. I’ve attached the code I am using. I have run this code successfully on smaller matrices (dimension 63,503 x 63,503, with about 698,540 nonzero elements), and I haven’t attached the bigger matrix I’m trying now because the HDF5 file is very large. The larger matrix is 853,775 x 853,775 and has 11,099,080 nonzero elements. If you need it to reproduce the problem, I can copy it somewhere you can access, or I can send you a Julia script to generate an equivalent one. Running with valgrind and SuperLU_dist doesn’t show anything.
>
> I’m running the script across 10 nodes, with 12 CPUs each, and about 47GB of RAM + swap. Even when I run with just 1 MPI process per node, I see the segfault. I’m running the application with:
>
> mpirun -npernode 1 -machinefile $PBS_NODEFILE ./diagonalize 1 bilayer-24-01-8-4.h5 1-bilayer-24-01-8-4.h5 4 24 -eps_nev 250 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -st_ksp_rtol 1e-7 -eps_monitor
>
> I don’t see any output from SLEPc’s EPS monitoring before the segfault (which makes sense to me, because it’s happening in the inversion step, right?). The segfault has also happened to me with version 3.6.3 of PETSc on SDSC’s Comet.
>
> Is this a problem with my code? With PETSc? Or with MUMPS?
>
> Thanks,
> Katharine
>
> <diagonalize.cpp>
The SLEPc code seems correct. We strongly recommend EPSKRYLOVSCHUR instead of EPSLANCZOS, although this has nothing to do with the error you are reporting.
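Switching the solver type is a one-line change; a minimal sketch against the SLEPc C API (here "eps" is assumed to be your already-created EPS object), or simply pass -eps_type krylovschur at run time:

    ierr = EPSSetType(eps, EPSKRYLOVSCHUR); CHKERRQ(ierr);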
I would bet it is an out-of-memory problem. On the SLEPc side, you can reduce the memory requirements by using the mpd parameter (see section 2.6.5 of the users manual), for instance -eps_mpd 30.
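With -eps_nev 250 that would just be -eps_mpd 30 on the command line. If you prefer to hard-code it, a minimal sketch using the SLEPc C API (again assuming your EPS object is called eps; ncv is left at its default):

    /* nev = 250, ncv = default, mpd = 30 */
    ierr = EPSSetDimensions(eps, 250, PETSC_DEFAULT, 30); CHKERRQ(ierr);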
However, most of the memory is being consumed by MUMPS. Using Scotch instead of ParMetis for the ordering may reduce fill-in; see the option sketch below. You can also try running on more nodes.
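The ordering used by MUMPS can be selected through PETSc runtime options; for example, ICNTL(7)=3 selects SCOTCH for the sequential ordering. This is only a suggestion and assumes MUMPS was built with Scotch support, which your configure line does not seem to include, so a reconfigure (e.g. with --download-ptscotch) may be needed:

    -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_7 3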
Jose