[petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert
dazza simplythebest
sayosale at hotmail.com
Fri Aug 27 09:12:56 CDT 2021
Dear All,
Okay, thanks for the tip and all the guidance thus far. I will also investigate SuperLU_DIST as the linear solver.
I have a good test problem now, at least!
Have a good weekend and many thanks once again,
Dan.
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Thursday, August 26, 2021 3:53 PM
To: dazza simplythebest <sayosale at hotmail.com>
Cc: Jose E. Roman <jroman at dsic.upv.es>; PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert
On Thu, Aug 26, 2021 at 8:32 AM dazza simplythebest <sayosale at hotmail.com> wrote:
Dear Jose and Matthew,
Many thanks for your assistance; this would seem to explain what the problem was.
So, judging by this test case, there seems to be a memory versus computational-time tradeoff in
choosing whether or not to shift-invert: shift-invert greatly reduces the
number of required iterations, but at a higher memory cost?
I have been trying a few values of -st_mat_mumps_icntl_14 (and also the alternative
-st_mat_mumps_icntl_23) today, but have not yet found one that fits on the
workstation I am using (although setting these parameters does at least seem to guarantee
that an error message is generated).
Thus I will probably need to reduce the number of MPI
processes and thereby reduce the memory requirement. In this regard, the MUMPS documentation
suggests that a hybrid MPI-OpenMP approach is optimal for their software, whereas I remember reading
somewhere else that OpenMP threading was not a good choice for use with PETSc. Would you have any
general advice on this?
Memory does not really track the number of MPI processes. MUMPS does a lot of things redundantly. For minimum memory, I
would suggest trying SuperLU_dist, which you can get by configuring PETSc with
--download-superlu_dist
I do not think OpenMP will have much influence at all.
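For comparing the two direct solvers from the command line, the run-time options can be assembled as in the sketch below. It assumes the usual SLEPc shift-invert setup, in which the spectral transformation options carry the st_ prefix and the factorization package is chosen with -st_pc_factor_mat_solver_type. The helper name sinvert_options and the launch line are illustrative only, and any ICNTL(14) value passed in is just an example.

```python
def sinvert_options(solver, mumps_icntl_14=None):
    """Build command-line options for shift-invert with a direct LU solver.

    solver: "mumps" or "superlu_dist" (PETSc must be configured with it).
    mumps_icntl_14: optional percentage increase of the MUMPS workspace.
    """
    if solver not in ("mumps", "superlu_dist"):
        raise ValueError(f"unsupported solver: {solver}")
    opts = [
        "-st_type", "sinvert",
        "-st_ksp_type", "preonly",
        "-st_pc_type", "lu",
        "-st_pc_factor_mat_solver_type", solver,
    ]
    # The ICNTL(14) option only makes sense when MUMPS does the factorization.
    if solver == "mumps" and mumps_icntl_14 is not None:
        opts += ["-st_mat_mumps_icntl_14", str(mumps_icntl_14)]
    return opts


if __name__ == "__main__":
    # Hypothetical launch line; adjust the process count and executable.
    launch = ["mpiexec", "-n", "4", "./stab1.exe"] + sinvert_options("superlu_dist")
    print(" ".join(launch))
```

Switching the argument to "mumps" with, say, mumps_icntl_14=100 gives the corresponding MUMPS run, so the memory use of the two packages can be compared on the same test problem.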
Thanks,
Matt
I was thinking that maybe a version of SLEPc/PETSc compiled with OpenMP support,
with the number of threads set appropriately but without explicit OpenMP directives in
the user's code, may be the way forward? That way PETSc will (?) just ignore the threading, whereas
threading will be available to MUMPS when execution passes to those routines?
Many thanks once again,
Dan.
________________________________
From: Jose E. Roman <jroman at dsic.upv.es>
Sent: Wednesday, August 25, 2021 1:40 PM
To: dazza simplythebest <sayosale at hotmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Improving efficiency of slepc usage
The MUMPS documentation (section 8) indicates that INFOG(1)=-9 means insufficient workspace. Try running with
-st_mat_mumps_icntl_14 <percentage>
where <percentage> is the percentage by which you want to increase the workspace, e.g. 50 or 100 or more.
See ex43.c for an example showing how to set this option in code.
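As a rough aid to judging how far short the workspace fell, the sketch below pulls the INFO(2) values out of PETSc error lines of the form quoted in this thread and converts them to bytes. It rests on assumptions that the thread itself does not confirm: that, for error -9, a positive INFO(2) is the number of missing workspace entries and a negative one is that count in millions (this is how the MUMPS user guide is understood to describe it), and that each entry is a 16-byte double-precision complex number, which matches a build configured with --with-scalar-type=complex --with-precision=double. The function names are illustrative only.

```python
import re

# Matches lines such as:
# [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045
PATTERN = re.compile(r"\[(\d+)\]PETSC ERROR: .*INFOG\(1\)=(-?\d+), INFO\(2\)=(-?\d+)")

BYTES_PER_ENTRY = 16  # assumption: double-precision complex scalars


def missing_entries(info2):
    """Assumed reading of INFO(2) for error -9: positive values count
    missing entries, negative values count them in millions."""
    return info2 if info2 >= 0 else -info2 * 1_000_000


def workspace_shortfalls(log_text):
    """Return {rank: missing_bytes} for every rank reporting INFOG(1)=-9."""
    shortfalls = {}
    for rank, infog1, info2 in PATTERN.findall(log_text):
        if int(infog1) == -9:
            shortfalls[int(rank)] = missing_entries(int(info2)) * BYTES_PER_ENTRY
    return shortfalls


if __name__ == "__main__":
    sample = (
        "[6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045\n"
        "[7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925\n"
    )
    for rank, nbytes in sorted(workspace_shortfalls(sample).items()):
        print(f"rank {rank}: about {nbytes / 2**20:.0f} MiB short")
```

Very small values, such as the INFO(2)=6 reported by ranks 0 to 5 in the quoted log, are probably not shortfalls at all. A plausible reading is that those ranks are naming the process on which the error was raised, but that is an interpretation rather than something stated in the thread, so such entries are best disregarded when sizing the increase.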
Jose
> On 25 Aug 2021, at 14:11, dazza simplythebest <sayosale at hotmail.com> wrote:
>
>
>
> From: dazza simplythebest <sayosale at hotmail.com>
> Sent: Wednesday, August 25, 2021 12:08 PM
> To: Matthew Knepley <knepley at gmail.com>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>
> Dear Matthew and Jose,
> To allow faster experimentation, I have derived a smaller program from the original one by constructing
> matrices of the same size but filling their entries randomly instead of computing the correct
> fluid dynamics values. This modified code seems to behave
> similarly, again failing for the large matrix case with the SIGKILL error, so I first report
> results from that code here. Firstly, I can confirm that I am using Fortran and compiling with the
> Intel compiler, which it seems places automatic arrays on the stack. The stack size, as reported
> by ulimit -a, is:
> stack size (kbytes, -s) 8192
>
> [1] Okay, so I followed your suggestion and used Ctrl-C followed by 'where' in one of the non-SIGKILL gdb windows.
> I have pasted the output at the bottom of this email (see [1] output). It does look like the problem occurs somewhere in the call
> to the MUMPS solver?
>
> [2] I have also today gained access to another workstation, and so have tried running the (original) code on that machine.
> This new machine has two (more powerful) CPU nodes and a larger memory (both machines feature Intel Xeon processors).
> On this new machine the large matrix case again failed with the familiar SIGKILL report when I used 16 or 12 MPI
> processes, ran to the end without error for 4 or 6 MPI processes, and failed, but with a PETSc error message,
> when I used 8 MPI processes; I have pasted that message below (see [2] output). Does this point to some sort of resource
> demand that exceeds some limit as the number of MPI processes increases?
>
> Many thanks once again,
> Dan.
>
> [2] output
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6
>
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021
> [0]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug
> [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: Error in external library
> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6
>
> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021
> [1]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug
> [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [1]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [1]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [1]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [1]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [2]PETSC ERROR: Error in external library
> [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6
>
> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021
> [2]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug
> [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [2]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [2]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [2]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [2]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [3]PETSC ERROR: Error in external library
> [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6
>
> [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021
> [3]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug
> [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [3]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [3]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [3]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [3]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [3]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [3]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [3]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [4]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [4]PETSC ERROR: Error in external library
> [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6
>
> [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021
> [4]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug
> [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [4]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [4]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [4]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [4]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [4]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [4]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [4]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [4]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [5]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [5]PETSC ERROR: Error in external library
> [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6
>
> [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021
> [5]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug
> [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [5]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [5]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [5]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [5]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [5]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [5]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [5]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [5]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [6]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [6]PETSC ERROR: Error in external library
> [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045
>
> [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021
> [6]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug
> [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [6]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [6]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [6]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [6]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [6]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [6]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [6]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [6]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [7]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [7]PETSC ERROR: Error in external library
> [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925
>
> [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021
> [7]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug
> [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [7]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [7]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [7]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [7]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [7]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [7]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [7]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [7]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [0]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [0]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [0]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [0]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [0]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [0]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [0]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [0]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [1]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [1]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [1]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [1]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [2]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [2]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [2]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [2]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [3]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
>
>
>
> [1] output
>
> Continuing.
> [New Thread 0x7f6f5b2d2780 (LWP 794037)]
> [New Thread 0x7f6f5aad0800 (LWP 794040)]
> [New Thread 0x7f6f5a2ce880 (LWP 794041)]
> ^C
> Thread 1 "my.exe" received signal SIGINT, Interrupt.
> 0x00007f72904927b0 in ofi_fastlock_release_noop ()
> from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> (gdb) where
> #0 0x00007f72904927b0 in ofi_fastlock_release_noop ()
> from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> #1 0x00007f729049354b in ofi_cq_readfrom ()
> from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> #2 0x00007f728ffe8f0e in rxm_ep_do_progress ()
> from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags ()
> from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg ()
> from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392,
> comm=1, flag=0x0, status=0xffffffffffffffff)
> at /usr/include/rdma/fi_tagged.h:109
> #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0,
> v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90)
> at ../../src/binding/fortran/mpif_h/iprobef.c:276
> #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0,
> blocking=<error reading variable: Cannot access memory at address 0x1>,
>
> --Type <RET> for more, q to quit, c to continue without paging--cont
> irecv=<error reading variable: Cannot access memory at address 0x0>, message_received=<error reading variable: Cannot access memory at address 0xffffffffffffffff>, msgsou=1, msgtag=-1, status=..., bufr=..., lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, root=<error reading variable: value of type `zmumps_root_struc' requires 766016 bytes, which is more than max-value-size>, opassw=0, opeliw=0, itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) at zfac_process_message.F:730
> #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., liw=<error reading variable: Cannot access memory at address 0x1>, a=..., la=<error reading variable: Cannot access memory at address 0xffffffffffffffff>, nstk_steps=..., nbprocfils=..., nd=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., nstepsdone=1690339657, opass=<error reading variable: Cannot access memory at address 0x5>, opeli=<error reading variable: Cannot access memory at address 0x0>, nelva=50400, comp=259581, maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv=<error reading variable: Cannot access memory at address 0x2>, noffnegpv=<error reading variable: Cannot access memory at address 0x0>, nb22t1=<error reading variable: Cannot access memory at address 0x0>, nb22t2=<error reading variable: Cannot access memory at address 0x0>, nbtiny=<error reading variable: Cannot access memory at address 0x0>, det_exp=<error reading variable: Cannot access memory at address 0x0>, det_mant=<error reading variable: Cannot access memory at address 0x0>, det_sign=<error reading variable: Cannot access memory at address 0x0>, ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., lpool=<error reading variable: Cannot access memory at address 0x0>, rinfo=<error reading variable: Cannot access memory at address 0x0>, posfac=<error reading variable: Cannot access memory at address 0x0>, iwpos=<error reading variable: Cannot access memory at address 0x0>, lrlu=<error reading variable: Cannot access memory at address 0x0>, iptrlu=<error reading variable: Cannot access memory at address 0x0>, lrlus=<error reading variable: Cannot access memory at address 0x0>, leaf=<error reading variable: Cannot access memory at address 0x0>, nbroot=<error reading variable: Cannot access memory at address 0x0>, nbrtot=<error reading variable: Cannot access memory at address 0x0>, uu=<error reading variable: Cannot 
access memory at address 0x0>, icntl=<error reading variable: Cannot access memory at address 0x0>, ptlust=..., ptrfac=..., info=<error reading variable: Cannot access memory at address 0x0>, keep=<error reading variable: Cannot access memory at address 0x3ff0000000000000>, keep8=<error reading variable: Cannot access memory at address 0x0>, procnode_steps=..., slavef=<error reading variable: Cannot access memory at address 0x4ffffffff>, myid=<error reading variable: Cannot access memory at address 0xffffffff>, comm_nodes=<error reading variable: Cannot access memory at address 0x0>, myid_nodes=<error reading variable: Cannot access memory at address 0x0>, bufr=..., lbufr=0, lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, lrgroups=...) at zfac_par_m.F:182
> #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=<error reading variable: Cannot access memory at address 0x1>, liw=<error reading variable: Cannot access memory at address 0x0>, sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=<error reading variable: Cannot access memory at address 0x0>, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=<error reading variable: Cannot access memory at address 0x25344>, info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, comm_nodes=-2048052411, myid=<error reading variable: Cannot access memory at address 0x81160>, myid_nodes=-1683330500, bufr=..., lbufr=<error reading variable: Cannot access memory at address 0x11db4c>, lbufr_bytes=<error reading variable: Cannot access memory at address 0xc4e0>, zmumps_lbuf=<error reading variable: Cannot access memory at address 0x4>, intarr=..., dblarr=..., root=<error reading variable: Cannot access memory at address 0x11dbec>, nelt=<error reading variable: Cannot access memory at address 0x3>, frtptr=..., frtelt=..., comm_load=<error reading variable: Cannot access memory at address 0x0>, ass_irecv=<error reading variable: Cannot access memory at address 0x0>, seuil=<error reading variable: Cannot access memory at address 0x0>, seuil_ldlt_niv2=<error reading variable: Cannot access memory at address 0x0>, mem_distrib=<error reading variable: Cannot access memory at address 0x0>, dkeep=<error reading variable: Cannot access memory at address 0x0>, pivnul_list=..., lpn_list=<error reading variable: Cannot access memory at address 0x0>, lrgroups=...) at zfac_b.F:243
> #10 0x00007f7308610ff7 in zmumps_fac_driver (id=<error reading variable: value of type `zmumps_struc' requires 386095520 bytes, which is more than max-value-size>) at zfac_driver.F:2421
> #11 0x00007f7308569256 in zmumps (id=<error reading variable: value of type `zmumps_struc' requires 386095520 bytes, which is more than max-value-size>) at zmumps_driver.F:1883
> #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=<error reading variable: Cannot access memory at address 0x1>, comm_f77=<error reading variable: Cannot access memory at address 0x0>, n=<error reading variable: Cannot access memory at address 0xffffffffffffffff>, nblk=1, icntl=..., cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) at zmumps_f77.F:289
> #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485
> #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, A=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683
> #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, mat=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> #20 0x00007f7309130462 in STSetUp (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85
> #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., b_pet=..., jthisone=<error reading variable: Cannot access memory at address 0x1>, isize=<error reading variable: Cannot access memory at address 0x0>) at small_slepc_example_program.F:322
> #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549
> #26 0x00000000004023f2 in main ()
> #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0 <main>, argc=14, argv=0x7ffda7b024e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffda7b024d8) at ../csu/libc-start.c:308
> #28 0x00000000004022fe in _start ()
>
> From: Matthew Knepley <knepley at gmail.com>
> Sent: Tuesday, August 24, 2021 3:59 PM
> To: dazza simplythebest <sayosale at hotmail.com>
> Cc: Jose E. Roman <jroman at dsic.upv.es>; PETSc <petsc-users at mcs.anl.gov>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>
> On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest <sayosale at hotmail.com> wrote:
>
> Dear Matthew and Jose,
> Apologies for the delayed reply, I had a couple of unforeseen days off this week.
> Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS
> to solve linear systems (the code is using a distributed MPI matrix to solve the generalised
> non-Hermitian complex problem).
>
> I have tried the gdb debugger as per Matthew's suggestion.
> A note in case someone else is following this: at first it didn't work (gdb couldn't 'attach'),
> but after some googling I found a tip suggesting the command
> echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
> which seemed to get it working.
>
> I then first ran the debugger on the small matrix case that worked.
> That stopped in gdb almost immediately after starting execution
> with a report regarding 'nanosleep.c':
> ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
> However, issuing the 'cont' command caused the program to run through to the end of the
> execution without any problems, and with correct-looking results, so I am guessing this error
> is not particularly important.
>
> We do that on purpose when the debugger starts up. Typing 'cont' is correct.
>
> I then tried the same debugging procedure on the large matrix case that fails.
> The code again stopped almost immediately after the start of execution with
> the same nanosleep error as before, and I was able to set the program running
> again with 'cont' (see full output below). I was running the code with 4 MPI processes,
> and so had 4 gdb windows appear. Thereafter the code ran for some time until completing the
> matrix construction, and then one of the gdb process windows printed a
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> message. I then typed 'where' into this terminal but just received the message
> No stack.
>
> I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays
> on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed
> the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you
> have a large structure on the stack?
>
> Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you
> the prompt and then "where".
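
One way to act on the stack-size question above is to compare the process stack limit with the size of any large local arrays. The snippet below is only a sketch: it is Unix-only, the helper name `fits_on_stack` is made up for illustration, and the array sizes (one complex vector of length 50400, and 300 of them) are assumptions rather than anything taken from the actual Fortran code.

```python
import resource

def fits_on_stack(n_elements, limit_bytes, bytes_per_element=16):
    """True if an array of n_elements scalars is smaller than the stack limit."""
    return n_elements * bytes_per_element < limit_bytes

# Soft stack limit of the current process (what `ulimit -s` reports, in bytes here).
soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
if soft == resource.RLIM_INFINITY:
    print("stack size: unlimited")
else:
    print(f"stack size: {soft / 2**20:.0f} MiB")
    # Hypothetical sizes: one complex vector of length 50400, then 300 of them.
    print(fits_on_stack(50400, soft), fits_on_stack(50400 * 300, soft))
```

With a stack limit of 8 MiB (a common Linux default), a single complex vector of that length would fit but 300 of them would not.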
>
> Thanks,
>
> Matt
>
> The other gdb windows basically seemed to be left in limbo until I issued the 'quit'
> command in the window that reported the SIGKILL, and then they vanished.
>
> I paste the full output from the gdb window that recorded the SIGKILL below here.
> I guess it is necessary to somehow work out where the SIGKILL originates?
>
> Thanks once again,
> Dan.
>
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./stab1.exe...
> Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919
> Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15...
> Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15...
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so...
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so...
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg...
> Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so...
> Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...
> Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug...
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12...
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12...
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg...
> Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so...
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so)
> Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so...
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so)
> Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...
> (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1)
> Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so)
> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5)
> Reading symbols from /lib64/ld-linux-x86-64.so.2...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so...
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1)
> Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so)
> Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so...
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so)
> Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2)
> 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=<optimized out>, clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffdc641a9a0, rem=rem@entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
> 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
> (gdb) cont
> Continuing.
> [New Thread 0x7f9e49c02780 (LWP 676559)]
> [New Thread 0x7f9e49400800 (LWP 676560)]
> [New Thread 0x7f9e48bfe880 (LWP 676562)]
> [Thread 0x7f9e48bfe880 (LWP 676562) exited]
> [Thread 0x7f9e49400800 (LWP 676560) exited]
> [Thread 0x7f9e49c02780 (LWP 676559) exited]
>
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> (gdb) where
> No stack.
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> From: Matthew Knepley <knepley at gmail.com>
> Sent: Friday, August 20, 2021 2:12 PM
> To: dazza simplythebest <sayosale at hotmail.com>
> Cc: Jose E. Roman <jroman at dsic.upv.es>; PETSc <petsc-users at mcs.anl.gov>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>
> On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest <sayosale at hotmail.com> wrote:
> Dear Jose,
> Many thanks for your response, I have been investigating this issue with a few more calculations
> today, hence the slightly delayed response.
>
> The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things
> I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters
> the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original
> larger matrix but to lower accuracy.
>
> Results
>
> Small matrix (N = 21168) - everything good!
> This converged when using the -eps_largest_real approach (taking 92 iterations for nev = 10,
> tol = 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging
> very impressively in a single iteration! Interestingly it did this both for a non-zero -eps_target
> and also for a zero -eps_target.
>
> Large matrix (N = 50400) - works for -eps_largest_real, fails for -st_type sinvert
> I have just double checked that the code does run properly when we use the -eps_largest_real
> option - indeed I ran it with a small nev and large tolerance (nev = 4, tol = 5.0e-4, ncv = 300)
> and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the
> machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large
> higher resolution case (although with a looser slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i
> as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution)
> found this eigenvalue to be 1831.11845726501 -4787.54519511345i, which means the agreement is in line
> with expectations.
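
To put a number on that agreement, the relative difference between the two quoted eigenvalues can be computed directly. This is just a sketch of the arithmetic; the variable names are illustrative and the values are copied from the message above.

```python
# Leading eigenvalues as quoted in the message above.
lam_large = complex(1789.56816314173, -4724.51319554773)  # N = 50400, tol 5.0e-4
lam_small = complex(1831.11845726501, -4787.54519511345)  # N = 21168, tol 5.0e-6

# Relative difference, measured against the small-matrix value.
rel_diff = abs(lam_large - lam_small) / abs(lam_small)
print(f"relative difference = {rel_diff:.2%}")
```

The two values differ by roughly 1.5% in the complex plane, which is the sense in which they are "ballpark" consistent.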
>
> Unfortunately the code does still crash when I try to do shift-invert for the large matrix case,
> whether or not I use a non-zero -eps_target. For reference this is the command line used:
> -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt
> To be precise the code crashes soon after calling EPSSolve (it successfully calls
> MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions).
> By crashes I mean that I do not get any error messages from slepc/PETSc, and do not even get the
> 'EPS Object: 16 MPI processes' message - I simply get an MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message
> as soon as EPSSolve is called.
>
> Hi Dan,
>
> It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with
>
> -start_in_debugger
>
> which should launch the debugger (usually), and then type
>
> cont
>
> to continue, and then
>
> where
>
> to get the stack trace when it crashes, or 'bt' on lldb.
>
> Thanks,
>
> Matt
>
> Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using
> -eps_largest_real? The fact that the program works and produces correct results
> when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification
> of the problem or the matrices? It is strange that there is no error message from slepc/PETSc... the
> only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden
> shutdown? For your reference, when running the large matrix case with the -eps_largest_real option I am using
> about 36 GB of the 148 GB available on this machine - does the shift-invert approach require substantially
> more memory, for example?
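
A rough sense of scale for the memory question: with -st_type sinvert the solver has to hold LU factors (the backtrace earlier in this message goes through STSetUp_Sinvert, PCSetUp_LU and MatFactorNumeric_MUMPS), on top of the matrices and the basis vectors. The sketch below only computes a worst-case bound in which the factors fill in completely. That dense-fill assumption is mine; the true figure depends on the sparsity pattern, the ordering, and the extra workspace MUMPS allocates, none of which are known from the thread. The function names are illustrative.

```python
BYTES_PER_COMPLEX = 16  # one double-precision complex scalar

def gb(n_bytes):
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9

def dense_factor_bytes(n):
    """Worst case: the LU factors fill in completely (n x n entries)."""
    return n * n * BYTES_PER_COMPLEX

def basis_bytes(n, ncv):
    """Storage for ncv basis vectors of length n."""
    return n * ncv * BYTES_PER_COMPLEX

for n in (21168, 50400):
    print(f"N = {n}: dense factors <= {gb(dense_factor_bytes(n)):.1f} GB, "
          f"ncv=300 basis = {gb(basis_bytes(n, 300)):.2f} GB")
```

Under that assumption the factors could reach tens of GB for N = 50400, while the basis vectors stay well under 1 GB, so the factorization rather than the Krylov basis is the part to watch.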
>
> I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further,
> the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to
> get that working for the full-size problem.
>
> Many thanks and best wishes,
> Dan.
>
>
>
> From: Jose E. Roman <jroman at dsic.upv.es>
> Sent: Thursday, August 19, 2021 7:58 AM
> To: dazza simplythebest <sayosale at hotmail.com>
> Cc: PETSc <petsc-users at mcs.anl.gov>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>
> In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target.
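
The reason approach B) favours eigenvalues near the target is the spectral transformation itself: shift-and-invert maps each eigenvalue lambda to theta = 1/(lambda - sigma), so the eigenvalues closest to sigma become the largest in magnitude. A small sketch follows; the spectrum is hypothetical and chosen only for illustration, and `shift_invert` is not a SLEPc function.

```python
def shift_invert(eigenvalues, sigma):
    """Map each eigenvalue lam to theta = 1 / (lam - sigma)."""
    return [1.0 / (lam - sigma) for lam in eigenvalues]

# Hypothetical spectrum, chosen only for illustration.
spectrum = [complex(0.3, 0.2), complex(-5.0, 1.0),
            complex(40.0, -30.0), complex(1800.0, -4700.0)]
sigma = 0.1  # target, as in -eps_target 0.1

thetas = shift_invert(spectrum, sigma)

# Original eigenvalue closest to the target ...
closest = min(spectrum, key=lambda lam: abs(lam - sigma))
# ... and the one whose transformed value has the largest magnitude.
dominant = spectrum[max(range(len(thetas)), key=lambda i: abs(thetas[i]))]
print(closest, dominant)
```

Both selections pick the same eigenvalue, which is why a good choice of target matters so much for B).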
>
> In B), what do you mean when you say it crashes? If you get an error about factorization, it means that your A-matrix is singular. In that case, try using a nonzero target, e.g. -eps_target 0.1
>
> Jose
>
>
> > On 19 Aug 2021, at 7:12, dazza simplythebest <sayosale at hotmail.com> wrote:
> >
> > Dear All,
> > I am planning on using slepc to do a large number of eigenvalue calculations
> > of a generalized eigenvalue problem, called from a program written in fortran using MPI.
> > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster,
> > and on smaller test problems everything is working well; the matrices are efficiently and
> > correctly constructed and slepc returns the correct spectrum. I am just now starting to move
> > towards solving the full-size 'production run' problems, and would appreciate some
> > general advice on how to improve the solver's performance.
> >
> > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices
> > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are
> > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part,
> > although in other cases I will also be interested in finding the eigenvalues whose real part
> > is close to zero.
> >
> > A)
> > Calling slepc's EPS solver with the following options:
> >
> > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt
> >
> >
> > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations
> > (examining the monitor output it did appear to be very slowly approaching convergence).
> >
> > B)
> > On the same problem I have also tried a shift-invert transformation using the options
> >
> > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert
> >
> > - in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options?
> >
> >
> > Does anyone have any suggestions as to how to improve this performance (or find out more about the problem)?
> > In the case of A) I can see from watching the slepc videos that increasing ncv
> > may help, but I am wondering, since 600 is a large number of iterations, whether there
> > may be something else going on - e.g. perhaps some alternative preconditioner may help?
> > In the case of B), I guess there must be some mistake in these command line options?
> > Again, any advice will be greatly appreciated.
> > Best wishes, Dan.
>
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/